Microsoft bought GitHub. Now what?

Last Monday, a weekend of rumors proved to be true. Microsoft announced plans to buy code-hosting site GitHub for $7.5 billion. Microsoft’s past, particularly before Satya Nadella took the corner office a few years ago, was full of hostility to open source. “Embrace, extend, extinguish” was the operative phrase. It should come as no surprise, then, that many projects responded by abandoning the platform.

But beyond the knee-jerk reaction, there are two questions to consider. First: can open source projects trust Microsoft? Second: should open source (and free software in particular) projects rely on corporate hosting?

Microsoft as a friend

Let’s start with the first question. With such a long history of active assault on open source, can Microsoft be trusted? Understanding that some people will never be convinced, I say “yes”. Both from the outside and from my time as a Microsoft employee, it’s clear that the company has changed under Nadella. Microsoft recognizes that open source projects are not only complementary, but strategically important.

This is driven by a change in the environment that Microsoft operates in. The operating system is less important than ever. Desktop-based office suites are giving way to web-based tools for many users. Licensed revenue may be the past and much of the present, but it’s not the future. Subscription revenue, be it from services like Office 365 or Infrastructure-as-a-Service offerings, is the future. And for many of these, adoption and consumption will be driven by open source projects and the developers (developers! developers! developers! developers!) that use them.

Microsoft’s change of heart is undoubtedly driven by business needs, but that doesn’t make it any less real. Jim Zemlin, Executive Director at the Linux Foundation, expressed his excitement, implying it was a victory for open source. Tidelift ran the numbers to look at Microsoft’s contributions to non-Microsoft projects. Their conclusion?

…today the company is demonstrating some impressive traction when it comes to open source community contributions. If we are to judge the company on its recent actions, the data shows what Satya Nadella said in his announcement about Microsoft being “all in on open source” is more than just words.

And in any acquisition, you should always ask “if not them, then who?” CNBC reported that GitHub was also in talks with Google. While Google may have a better reputation among the developer community, I’m not sure they’d be better for GitHub. After all, Google had Google Code, which it shut down in 2016. Would a second attempt in this space fare any better? Google Code had a two-year head start on GitHub, but it languished.

As for other major tech companies, this tweet sums it up pretty well:

Can you trust anyone to host?

My friend Lyz Joseph made an excellent point on Facebook the day the acquisition was announced:

Unpopular opinion: If you’re an open source project using GitHub, you already sold out. You traded freedom for convenience, regardless of what company is in control.

People often forget that GitHub itself is not open source. Some projects have avoided hosting on GitHub for that very reason. Even though the code repo itself is easily mirrored or migrated, that’s not the real value in GitHub. The “social coding” aspects (the issues, fork tracking, wikis, ease of pull requests, etc.) are what make GitHub valuable. Chris Siebenmann called it “sticky in a soft way.”

GitLab, at least, offers a “community edition” that projects can self-host. In a fantasy world, each project would run its own infrastructure, perhaps with federated authentication for ease of use when you’re a participant in many projects. But that’s not the reality we live in. Hosting servers costs money and time. Small projects in particular lack both of those. Third-party infrastructure will always be attractive for this reason. And as good as competition is, having a dominant social coding site is helpful to users in the same way that a dominant social network is simpler: network effects are powerful.

So now what?

The deal isn’t expected to close for a while, and Microsoft plans to seek regulatory approval, which will not speed the process. Nothing will change immediately. In the medium term, I don’t expect much to change either. Microsoft has made it clear that it plans to run GitHub as a fairly autonomous business (the way it does with LinkedIn). GitHub gets the stability that comes from the support of one of the world’s largest companies. Microsoft gets a chance to improve its reputation and an opportunity to make it easier for developers to use Azure services.

Full disclosure: I am a recent employee of Microsoft and a shareholder. I was not involved in the acquisition and had no inside knowledge pertinent to the acquisition or future plans for GitHub.

Taking action on commit messages

Many modern code hosting platforms (e.g. GitHub and GitLab) parse commit messages to do something smart with them. The most common is probably to look for references to an issue number and create a link or close the issue. For example: “Fixes #37”. Commit messages can also be used to notify or reference other users. For example: “I think @funnelfiasco broke it. Again.”

These automated actions have a lot of utility. They simplify the communication process. Manually linking to issues, users, etc. would be a pain, which means it would never happen. That would hurt not only the project developers, but also the users trying to dive into troubleshooting a problem.

But it’s not all candy and rainbows. As an example, a coworker removed the “deprecated” decorator from some Python code. His commit message included “un-@deprecated”. Our GitLab instance saw the “@” and decided to add the “deprecated” group to the issue. That added the entire engineering and operations teams to the issue.
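To make the failure mode concrete, here is a minimal sketch of a naive commit message parser. It is not how GitHub or GitLab actually implement this; the patterns and the function name `parse_commit_message` are assumptions made up for illustration. It shows why a single-character trigger is easy to trip by accident.

```python
import re

# Hypothetical patterns for illustration only; real platforms apply their
# own, more elaborate rules.
ISSUE_REF = re.compile(
    r"\b(?:fix(?:es|ed)?|close[sd]?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)
MENTION = re.compile(r"@([A-Za-z0-9_-]+)")


def parse_commit_message(message):
    """Return the issue numbers to close and the names to notify."""
    closes = [int(number) for number in ISSUE_REF.findall(message)]
    notifies = MENTION.findall(message)
    return {"closes": closes, "notifies": notifies}


# The intended uses work as expected.
print(parse_commit_message("Fixes #37"))
print(parse_commit_message("I think @funnelfiasco broke it. Again."))

# The accidental case: any "@" followed by a word counts as a mention,
# so this message would notify a user or group named "deprecated".
print(parse_commit_message("un-@deprecated the helper"))
```

The last call reports "deprecated" as something to notify, which is the same kind of surprise described above.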

The obvious solution is to require a more explicit markup than a single character. Something like “HEYDOTHIS-NOTIFY-funnelfiasco” reduces the possibility of accidentally triggering an action. On the other hand, it’s a giant pain in the ass. This, as above, means it’s likely to not be used. Even if it is still used, manual syntax is prone to error.
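As a rough sketch of that trade-off, the snippet below matches only the long, explicit marker. The marker format comes from the example above; the regular expression and the function name `explicit_mentions` are assumptions for illustration, not any platform's real syntax.

```python
import re

# Hypothetical explicit marker: the whole prefix must be present before a
# name is treated as someone to notify.
EXPLICIT_NOTIFY = re.compile(r"\bHEYDOTHIS-NOTIFY-([A-Za-z0-9_]+)")


def explicit_mentions(message):
    """Return only the names that follow the full explicit marker."""
    return EXPLICIT_NOTIFY.findall(message)


# A stray "@" no longer triggers anything.
print(explicit_mentions("un-@deprecated the helper"))

# The deliberate form works.
print(explicit_mentions("HEYDOTHIS-NOTIFY-funnelfiasco broke it. Again."))

# A small typo (a space instead of the last hyphen) silently does nothing.
print(explicit_mentions("HEYDOTHIS-NOTIFY funnelfiasco broke it. Again."))
```

The accidental trigger goes away, but the third call shows the cost: one mistyped character and the notification is silently lost.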

So what’s the answer? I don’t have a good solution. Projects parse commit messages on a daily basis to simplify workflows and improve communication. The Asterisk community, as an example, uses more than just simple tagging. The drawbacks are mostly nuisance at this point, and I don’t think they outweigh the benefits.

What might change my mind is if commit message parsing could be used to execute arbitrary code on the server. If several vulnerabilities align in just the right way, I suppose it’s a theoretical possibility. Of course, people you trust with commit access to the repo could do damage the old fashioned way. But it would be an attack vector for pull requests, albeit an amusing one. “Hey, I improved your project with this code, but my commit message also will add your server to my botnet if you merge it.”

GitHub as a community management platform?

GitHub is the dominant platform for hosting open source code. It’s hardly ubiquitous; there are other hosting services and many projects self-host. Nonetheless, it’s the go-to place for many FLOSS projects and has lowered the barrier to contribution. Arguably, it’s brought the barrier too low.

At least, that’s my interpretation of an open letter to GitHub published on Thursday. Signed by dozens of project maintainers, the letter identifies troubles that often arise on the GitHub platform and offers suggestions for fixes.

The issues raised in the letter are legitimate, and they’re expressed quite reasonably for something published on the Internet, but they highlight what GitHub is and isn’t. GitHub is a source code management platform; it is not a community management platform.

That’s not to say it can’t be. GitHub is great for what it does, but it could be even better. Managing code is easy; managing contributors and other community members is not. For GitHub to take the next step in promoting open source software development, it needs to provide tools that aid in community management. That includes bug and issue tracking, communication (mailing lists?), and other features that turn a project’s users into community members.

Hacktoberfest

I’m a little late to the game, but over the weekend I heard about Hacktoberfest. Sponsored by DigitalOcean in partnership with GitHub, Hacktoberfest is intended to get people to make contributions to open source projects. While I’ve made contributions before, the prospect of a free t-shirt that I don’t need was enough to get me to submit three pull requests on Saturday.

Wouldn’t it be great if I submitted a pull request every week? Then I looked at my todo list and my calendar and walked it back. I think a pull request (or direct commit to a project I have access to) per month is a reasonable goal. I’ve been meaning to make more contributions to projects for a while, so this may be just the motivation I need.

I need to come up with a catchy name, but I’ll use this blog as a record of what I contribute. In the meantime, if you haven’t signed up for Hacktoberfest, go do that. Happy contributing!

The SourceForge treason

Many of you have undoubtedly heard of what happens to projects that SourceForge deems inactive: the installers get wrapped in a bundled adware installer. Popular projects like nmap, VLC, and the GIMP have found their packages subject to this hijacking lately. Although legally permissible (at least from a licensing standpoint), it’s ethically disturbing. SourceForge says this is a feature — an opportunity for projects to generate a little bit of revenue (assuming they opt in and aren’t hijacked), but it is antithetical to the philosophy of many projects.

Part of the problem is the “Hotel California” nature of SourceForge. Projects can check out any time they like, but they can never leave. Once a project is hosted on SourceForge, it is reportedly near-impossible for the project to be removed, even if it moves active development to a different platform. On the one hand, this is beneficial to the community, since it ensures abandoned or closed packages will remain available for download. On the other hand, it allows for undesired insertion of adware and other antisocial activities.

SourceForge is clearly no longer a safe place for developers to host projects or for users to download software. That’s a shame, because SourceForge had a take on hosting that other sites seem to ignore. GitHub, BitBucket, and others are very focused on serving as a platform for hosting code. They focus on the developer experience. GitHub’s simple interfaces for forking projects and submitting pull requests (whatever technical limitations they may have) have done a great deal for making source code readily accessible and encouraging open development.

SourceForge’s strength was that it provided an easy way for users to search for projects and to download compiled releases. The casual user almost certainly lacks the skills and desire to build packages from source (and many others who are capable certainly don’t want to). The misleading “Click here to Download!” ads aside, SourceForge made it easy for users to get an installer. GitHub has a “Releases” feature which attempts to do this, though it’s not clear that the feature is widely used (full disclosure: I have not used it as a developer or a consumer).

The looming death of SourceForge leaves a real gap in the accessibility of open source to the casual user. It also highlights the dangers of relying on a third-party hosting service (what’s to stop GitHub from doing something similar except the fear of irrelevance?). Self-hosting is not the easy answer it seems, though. Developers are not necessarily systems administrators and may not have the skill or the available resources to maintain their own hosting site, especially for trivial code or code that becomes widely popular. Hopefully SourceForge serves as an example of what not to do and an inspiration for someone to fill the gap better.