Taking action on commit messages

Many modern code hosting platforms (e.g. GitHub and GitLab) parse commit messages to do something smart with them. The most common is probably to look for references to an issue number and create a link or close the issue. For example: “Fixes #37”. Commit messages can also be used to notify or reference other users. For example: “I think @funnelfiasco broke it. Again.”

These automated actions have a lot of utility. They simplify the communication process. Manually linking to issues, users, etc. would be a pain, which means it would never happen. The missing links would hurt not only the project developers, but also the users trying to dive into troubleshooting a problem.

But it’s not all candy and rainbows. As an example, a coworker removed the “deprecated” decorator from some Python code. His commit message included “un-@deprecated”. Our GitLab instance saw the “@” and decided to add the “deprecated” group to the issue. That added the entire engineering and operations teams to the issue.
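To make the failure mode concrete, here is a minimal sketch of a naive mention parser. The pattern and the name `find_mentions` are made up for illustration; this is an assumption about how a single-character trigger might be matched, not GitLab's actual implementation.

```python
import re

# Hypothetical rule: an "@" followed by word characters, anywhere in the text.
NAIVE_MENTION = re.compile(r"@(\w+)")

def find_mentions(message):
    """Return every name that follows an "@" in a commit message."""
    return NAIVE_MENTION.findall(message)

# An intended mention is picked up...
print(find_mentions("I think @funnelfiasco broke it. Again."))  # ['funnelfiasco']

# ...and so is a Python decorator name that was never meant as one.
print(find_mentions("un-@deprecated the old helper"))  # ['deprecated']
```

A parser like this has no way to tell a decorator from a group name, because the only signal it looks at is the "@".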

The obvious solution is to require a more explicit markup than a single character. Something like “HEYDOTHIS-NOTIFY-funnelfiasco” reduces the possibility of accidentally triggering an action. On the other hand, it’s a giant pain in the ass. This, as above, means it’s likely to not be used. Even if it is still used, manual syntax is prone to error.
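Here is a rough sketch of that trade-off, using the made-up “HEYDOTHIS-NOTIFY-” marker from above. The function name `find_notifications` is hypothetical, and no real platform uses this syntax as far as I know.

```python
import re

# Hypothetical explicit marker, taken from the made-up example in the text.
EXPLICIT_NOTIFY = re.compile(r"\bHEYDOTHIS-NOTIFY-(\w+)")

def find_notifications(message):
    """Return the users named with the explicit marker."""
    return EXPLICIT_NOTIFY.findall(message)

# The accidental trigger no longer fires.
print(find_notifications("un-@deprecated the old helper"))  # []

# The intended notification works when typed exactly right...
print(find_notifications("Broke it again. HEYDOTHIS-NOTIFY-funnelfiasco"))  # ['funnelfiasco']

# ...but a one-character slip (underscore for hyphen) is silently ignored.
print(find_notifications("Broke it again. HEYDOTHIS-NOTIFY_funnelfiasco"))  # []
```

The last case is the error-prone part: the committer gets no feedback that the notification never went out.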

So what’s the answer? I don’t have a good solution. Projects parse commit messages on a daily basis to simplify workflows and improve communication. The Asterisk community, as an example, uses more than just simple tagging. The drawbacks are mostly nuisance at this point, and I don’t think they outweigh the benefits.

What might change my mind is if commit message parsing could be used to execute arbitrary code on the server. If several vulnerabilities align in just the right way, I suppose it’s a theoretical possibility. Of course, people you trust with commit access to the repo could do damage the old-fashioned way. But it would be an attack vector for pull requests, albeit an amusing one. “Hey, I improved your project with this code, but my commit message will also add your server to my botnet if you merge it.”

November Opensource.com articles

I’ve decided to make this a regular thing: near the beginning of every month, I’ll recap the articles I’ve written for Opensource.com in the previous month. This seems better than scattershot posts that may or may not include all of my articles. So here’s November:

Deploying Fedora 18 documentation: learning git the hard way

If you haven’t heard, the Fedora team released Fedora 18 today. It’s the culmination of many months of effort, and some very frustrating schedule delays. I’m sure everyone was relieved to push it out the door, even as some contributors worked to make sure the mirrors were stable and to update translations. I remembered that I had forgotten to push the Fedora 18 versions of the Live Images Guide and the Burning ISOs Guide, so I quickly did that. Then I noticed that several of the documents that were on the site earlier weren’t anymore. Crap.

Here’s how the Fedora Documentation site works: contributors write guides in DocBook XML, build them with a tool called Publican, and then check the built documents into a git repository. Once an hour, the web server clones the git repo to update the content on the site. Looking through the commits, it seemed that a few hours prior, someone had published a document without updating their local copy of the web repo first, which blew away previously published Fedora 18 docs.

The fix seemed simple enough: I’d just revert to a few commits prior and then we could re-publish the most recent updates. So I did a `git reset --hard` and then tried to push. It was suggested that a `--force` might help, so I added one. That’s when I learned that this basically sends the local git repo to the remote as if the remote were empty (someone who understands git better would undoubtedly correct this explanation), which makes sense. For many repos, this probably isn’t too big a deal. For the Docs web repo, which contains many images, PDFs, epubs, etc. and is roughly 8 GB on disk, this can be a slow process. On a residential cable internet connection which throttles uploads to about 250 KiB/s after the first minute, it’s a very slow process.

I sent a note to the docs mailing list letting people know I was cleaning up the repo and that they shouldn’t push any docs to the web. After an hour or so, the push finally finished. It was…a failure? Someone hadn’t seen my email and pushed a new guide shortly after I had started the push-of-doom. Fortunately I discovered the `git revert` command in the meantime. `revert`, instead of pretending like the past never happened, creates new commits that back out the changes from the bad commit(s). After reverting four commits and pushing, we were back to where we were when life was happy. It was simple to re-publish the docs after that, and a reminder was sent to the group to ensure the repo is up-to-date before pushing.
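The difference between the two approaches can be shown with a toy model. This is a sketch only: history is a plain Python list of commit subjects, the function names are made up, and real git tracks a graph of commits rather than a list. The one git fact it leans on is that a plain push is accepted only when the remote's history is a prefix (an ancestor) of what you are pushing.

```python
def reset_hard(history, keep):
    """Toy `git reset --hard`: drop everything after the first `keep` commits."""
    return history[:keep]

def revert(history, bad):
    """Toy `git revert`: keep all history and append one undo commit per
    bad commit, newest first."""
    undo = [f'Revert "{c}"' for c in reversed(history) if c in bad]
    return history + undo

def is_fast_forward(remote, local):
    """A plain push works only if local starts with everything on the remote."""
    return local[:len(remote)] == remote

# Hypothetical commit subjects, not the real Fedora docs history.
remote = ["publish guide A", "publish guide B", "clobbering publish"]

after_reset = reset_hard(remote, 2)
after_revert = revert(remote, {"clobbering publish"})

print(is_fast_forward(remote, after_reset))   # False: needs --force
print(is_fast_forward(remote, after_revert))  # True: an ordinary push
print(after_revert[-1])                       # Revert "clobbering publish"
```

In this model, resetting rewrites the past, so the remote has to be overwritten wholesale. Reverting only adds to the end, so it pushes like any other commit.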

The final result is that some documents were unavailable for a few hours. The good news is that I learned a little bit more about git today. The better news is that this should serve as additional motivation to move to Publican 3, which will allow us to publish guides via RPMs instead of an unwieldy git repo.