Decentralization is more appealing in theory than in reality

One of the appeals of open standards is that they allow market forces to work. Consumers can choose the tool or service that best meets their specific needs and if someone is a bad actor, the consumer can flee. Centralized proprietary services, in contrast, tie you to a specific provider. Consumers have no choice if they want/need to use the service.

It’s not as great as it sounds

This is true in theory, but reality is more complicated. Centralization allows for lower friction. Counterintuitively, centralization can allow for greater advancement. As Moxie Marlinspike wrote in an Open Whisper Systems blog post, open standards got the Internet “to the late 90s.” Decentralization is (in part) why email isn’t end-to-end encrypted by default. Centralization is (again, in part) why Slack has largely supplanted IRC and XMPP.

Particularly for services with a directory component (social networks for sure, but also tools like GitHub), centralization makes a lot of sense. It lowers the friction of finding those you care about. It also makes moderation easier.

Of course, those benefits can also be disadvantages. Easier moderation also means easier censorship. But not everyone is capable of or willing to run their own infrastructure. Or to find the “right” service among twenty nearly-identical offerings. The free market requires an informed consumer, and most consumers lack the knowledge necessary to make an informed choice.

Decentralization in open source

Centralized services versus federated (or isolated) services is a common discussion topic in open source. Jason Baker recently wrote a comment on a blog post that read in part:

I use Slack and GitHub and Google * and many other services because they’re simply easier – both for me, and for (most) of the people I’m collaborating with. The cost of being easier for most people I collaborate with is that I’m also probably excluding someone. Is that okay? I’m not sure. I go back and forth on that question a lot. In general, though, I try to be flexible to accommodate the people I’m actually working with, as opposed to solving the hypothetical/academic/moral question.

Centralization to some degree is inevitable. Whether built on open standards or not, most projects would rather work on their project than run their infrastructure. And GitHub (like SourceForge before it) has enabled many small projects to flourish because they don’t need to spend time on infrastructure. Imagine if every project needed to run its own issue tracker, code repository, etc. The barrier to entry would be too high.

Striking a balance

GitHub provides an instructive example. It uses an open, decentralized technology (git) and layers centralized services on top. Users can get the best of both worlds in this sense. Open purists may not find this acceptable, but I think a pragmatic view is more appropriate. If allowing some proprietary services enables a larger and more robust open software ecosystem, isn’t that worthwhile?

Pay maintainers! No, not like that!

A lot of people who work on open source software get paid to do so. Many others do not. And as we learned during the Heartbleed aftermath, sometimes the unpaid (or under-paid) projects are very important. Projects have changed their licenses (e.g. MongoDB, which is now not an open source project by the Open Source Initiative’s definition) in order to cut off large corporations that don’t pay for the free software.

There’s clearly a broad recognition that maintainers need to be paid in order to sustain the software ecosystem. So if you expect that people are happy with GitHub’s recent announcement of GitHub Sponsors, you have clearly spent no time in open source software communities. The reaction has had a lot of “pay the maintainers! No, not like that!” which strikes me as being obnoxious and unhelpful.

GitHub Sponsors is not a perfect model. Bradley Kuhn and Karen Sandler of the Software Freedom Conservancy called it a “quick fix to sustainability”. That’s the most valid criticism. It turns out that money doesn’t solve everything. Throwing money at a project can sometimes add to the burden, not lessen it. Money adds a lot of messiness and overhead to manage it, especially if there’s not a legal entity behind the project. That’s where the services provided by fiscal sponsor organizations like Conservancy come in.

But throwing money at a problem can sometimes help it. Projects can opt in to accepting money, which means they can avoid the problems if they want. On the other hand, if they want to take in money, GitHub just made it pretty easy. The patronage model has worked well for artists; it could also work for coders.

The other big criticism that I’ll accept is that it puts the onus on individual sponsorships (indeed, that’s the only kind available at the moment), not on corporate sponsorships.

Like with climate change or reducing plastic waste, the individual’s actions are insignificant compared to the effects of corporate action. But that doesn’t mean individual action is bad. If iterative development is good for software, then why not iterate on how we support the software? GitHub just reduced the friction of supporting open source developers significantly. Let’s start there and fix the system as we go.

Apache Software Foundation moves to GitHub

Last week, GitHub and the Apache Software Foundation (ASF) announced that ASF migrated their git repositories to GitHub. This caused a bit of a stir. It’s not every day that “the world’s largest open source foundation” moves to a proprietary hosting platform.

Free software purists expressed dismay. One person described it as “a really strange move In part because Apache’s key value add [was] that they provided freely available infrastructure.” GitHub, while it may be “free as in beer”, is definitely not “free as in freedom”. git itself is open source software, but GitHub’s “special sauce” is not.

For me, it’s not entirely surprising that ASF would make this move. I’ve always seen ASF as a more pragmatically-minded organization than, for example, the Free Software Foundation (FSF). I’d argue that the ecosystem benefits from having both ASF- and FSF-type organizations.

It’s not clear what savings ASF gets from this. Their blog post says they maintain their own mirrors, so there’s still some infrastructure involved. Of course, it’s probably smaller than running the full service, but by how much?

More than a reduced infrastructure footprint, I suspect the main benefit to the ASF is that it lowers the barrier to contribution. Like it or not, GitHub is the go-to place to find open source code. Mirroring to GitHub makes the code available, but you don’t get the benefits of integrating issues and pull requests (at least not trivially). Major contributors will do what it takes to adopt the tool, but drive-by contributions should be as easy as possible.

There’s also another angle, which probably didn’t drive the decision but brings a benefit nonetheless. Events like Hacktoberfest and 24 Pull Requests help motivate new contributors, but they’re based on GitHub repositories. Using GitHub as your primary forge means you’re accessible to the thousands of developers who participate in these events.

In a more ideal world, ASF would use a more open platform. In the present reality, this decision makes sense.

GitHub’s new status feature

Two weeks ago, GitHub added a new feature for all users: the ability to set a status. I’m in favor of this. First, it appeals to my AOL Instant Messenger nostalgia. Second, I think it provides a valuable context for open source projects. It allows maintainers to say “hey, I’m not going to be very responsive for a bit”. In theory, this should let people filing issues and pull requests not get so angry if they don’t get a quick response.

Jessie Frazelle described it as the “cure for open source guilt”.

I was surprised at the amount of blowback this got. (See, for example, the replies to Nat Friedman’s tweet.) Some of the responses are of the dumb “oh noes you’re turning GitHub into a social media platform. It should be about the code!” variety. To those people I say “fine, don’t use this feature.” Others raise a point about not advertising being on vacation.

I’m sympathetic to that. I’m generally pretty quiet about the house being empty on public or public-ish platforms. It’s a good way to advertise yourself to vandals and thieves. To be honest, I’m more worried about something like Nextdoor where the users are all local than GitHub where anyone who cares is probably a long way away. Nonetheless, it’s a valid concern, especially for people with a higher profile.

I agree with Peter that it’s not wise to set expectations for maintainers to share their private details. That said, I do think it’s helpful for maintainers to let their communities know what to expect from them. There are many reasons that someone might need to step away from their project for a week or several. A simple “I’m busy with other stuff and will check back in on February 30th” or something to that effect would accomplish the goal of setting community expectations without being too revelatory.

The success of this feature will rely on users making smart decisions about what they choose to reveal. That’s not always a great bet, but it does give people some control over the impact. The real question will be: how much do people respect it?

Microsoft bought GitHub. Now what?

Last Monday, a weekend of rumors proved to be true. Microsoft announced plans to buy code-hosting site GitHub for $7.5 billion. Microsoft’s past, particularly before Satya Nadella took the corner office a few years ago, was full of hostility to open source. “Embrace, extend, extinguish” was the operative phrase. It should come as no surprise, then, that many projects responded by abandoning the platform.

But beyond the kneejerk reaction, there are two questions to consider. First: can open source projects trust Microsoft? Second: should open source (and free software in particular) projects rely on corporate hosting?

Microsoft as a friend

Let’s start with the first question. With such a long history of active assault on open source, can Microsoft be trusted? Understanding that some people will never be convinced, I say “yes”. Both from the outside and from my time as a Microsoft employee, it’s clear that the company has changed under Nadella. Microsoft recognizes that open source projects are not only complementary, but strategically important.

This is driven by a change in the environment that Microsoft operates in. The operating system is less important than ever. Desktop-based office suites are giving way to web-based tools for many users. Licensed revenue may be the past and much of the present, but it’s not the future. Subscription revenue, be it from services like Office 365 or Infrastructure-as-a-Service offerings, is the future. And for many of these, adoption and consumption will be driven by open source projects and the developers (developers! developers! developers! developers!) that use them.

Microsoft’s change of heart is undoubtedly driven by business needs, but that doesn’t make it any less real. Jim Zemlin, Executive Director at the Linux Foundation, expressed his excitement, implying it was a victory for open source. Tidelift ran the numbers to look at Microsoft’s contributions to non-Microsoft projects. Their conclusion?

…today the company is demonstrating some impressive traction when it comes to open source community contributions. If we are to judge the company on its recent actions, the data shows what Satya Nadella said in his announcement about Microsoft being “all in on open source” is more than just words.

And in any acquisition, you should always ask “if not them, then who?” CNBC reported that GitHub was also in talks with Google. While Google may have a better reputation among the developer community, I’m not sure they’d be better for GitHub. After all, Google had Google Code, which it shut down in 2016. Would a second attempt in this space fare any better? Google Code had a two-year head start on GitHub, but it languished.

As for other major tech companies, this tweet sums it up pretty well:

Can you trust anyone to host?

My friend Lyz Joseph made an excellent point on Facebook the day the acquisition was announced:

Unpopular opinion: If you’re an open source project using GitHub, you already sold out. You traded freedom for convenience, regardless of what company is in control.

People often forget that GitHub itself is not open source. Some projects have avoided hosting on GitHub for that very reason. Even though the code repo itself is easily mirrored or migrated, that’s not the real value in GitHub. The “social coding” aspects (the issues, fork tracking, wikis, ease of pull requests, etc.) are what make GitHub valuable. Chris Siebenmann called it “sticky in a soft way.”

GitLab, at least, offers a “community edition” that projects can self-host. In a fantasy world, each project would run their own infrastructure, perhaps with federated authentication for ease of use when you’re a participant in many projects. But that’s not the reality we live in. Hosting servers costs money and time. Small projects in particular lack both of those. Third-party infrastructure will always be attractive for this reason. And as good as competition is, having a dominant social coding site is helpful to users in the same way that a dominant social network is simpler: network effects are powerful.

So now what?

The deal isn’t expected to close for a while, and Microsoft plans to seek regulatory approval, which will not speed the process. Nothing will change immediately. In the medium term, I don’t expect much to change either. Microsoft has made it clear that it plans to run GitHub as a fairly autonomous business (the way it does with LinkedIn). GitHub gets the stability that comes from the support of one of the world’s largest companies. Microsoft gets a chance to improve its reputation and an opportunity to make it easier for developers to use Azure services.

Full disclosure: I am a recent employee of Microsoft and a shareholder. I was not involved in the acquisition and had no inside knowledge pertinent to the acquisition or future plans for GitHub.

Taking action on commit messages

Many modern code hosting platforms (e.g. GitHub and GitLab) parse commit messages to do something smart with them. The most common is probably to look for references to an issue number and create a link or close the issue. For example: “Fixes #37”. Commit messages can also be used to notify or reference other users. For example: “I think @funnelfiasco broke it. Again.”
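
To make that concrete, here is a minimal Python sketch of what this kind of parsing might look like. The regular expressions and the `parse_commit_message` name are assumptions for illustration only; they are not the actual rules GitHub or GitLab apply.

```python
import re

# Hypothetical patterns for illustration; real platforms use their own rules.
CLOSES_ISSUE = re.compile(
    r"\b(?:fix(?:es|ed)?|close[sd]?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)
MENTION = re.compile(r"@([A-Za-z0-9_-]+)")


def parse_commit_message(message):
    """Return the issue numbers to close and the users to notify."""
    issues = [int(number) for number in CLOSES_ISSUE.findall(message)]
    users = MENTION.findall(message)
    return {"closes": issues, "mentions": users}


print(parse_commit_message("Fixes #37. I think @funnelfiasco broke it. Again."))
```

A hosting platform would then act on the returned values, for example by closing issue 37 and notifying the mentioned user.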

These automated actions have a lot of utility. They simplify the communication process. Manually linking to issues, users, etc. would be a pain, which means it would never happen. That would hurt not only the project developers, but also the users trying to dive into troubleshooting a problem.

But it’s not all candy and rainbows. As an example, a coworker removed the “deprecated” decorator from some Python code. His commit message included “un-@deprecated”. Our GitLab instance saw the “@” and decided to add the “deprecated” group to the issue. That added the entire engineering and operations teams to the issue.

The obvious solution is to require a more explicit markup than a single character. Something like “HEYDOTHIS-NOTIFY-funnelfiasco” reduces the possibility of accidentally triggering an action. On the other hand, it’s a giant pain in the ass. This, as above, means it’s likely to not be used. Even if it is still used, manual syntax is prone to error.
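
The trade-off can be sketched in a few lines of Python. Both patterns below are hypothetical stand-ins (neither is GitLab’s real implementation), and `find_mentions` is a name made up for this example. The single-character pattern fires on the accidental “un-@deprecated”, while the explicit prefix does not.

```python
import re

# Hypothetical patterns; neither is GitLab's actual implementation.
NAIVE_MENTION = re.compile(r"@([A-Za-z0-9_-]+)")
EXPLICIT_MENTION = re.compile(r"\bHEYDOTHIS-NOTIFY-([A-Za-z0-9_]+)")


def find_mentions(pattern, message):
    """Return every name the given pattern would notify."""
    return pattern.findall(message)


accidental = "un-@deprecated the helper functions"
intentional = "HEYDOTHIS-NOTIFY-funnelfiasco please review"

print(find_mentions(NAIVE_MENTION, accidental))      # ['deprecated']
print(find_mentions(EXPLICIT_MENTION, accidental))   # []
print(find_mentions(EXPLICIT_MENTION, intentional))  # ['funnelfiasco']
```

The explicit form avoids the false positive, but only at the cost of the extra typing described above.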

So what’s the answer? I don’t have a good solution. Projects parse commit messages on a daily basis to simplify workflows and improve communication. The Asterisk community, as an example, uses more than just simple tagging. The drawbacks are mostly nuisance at this point, and I don’t think they outweigh the benefits.

What might change my mind is if commit message parsing could be used to execute arbitrary code on the server. If several vulnerabilities align in just the right way, I suppose it’s a theoretical possibility. Of course, people you trust with commit access to the repo could do damage the old-fashioned way. But it would be an attack vector for pull requests, albeit an amusing one. “Hey, I improved your project with this code, but my commit message also will add your server to my botnet if you merge it.”

GitHub as a community management platform?

GitHub is the dominant platform for hosting open source code. It’s hardly ubiquitous: there are other hosting services, and many projects self-host. Nonetheless, it’s the go-to place for many FLOSS projects and has lowered the barrier to contribution. Arguably, it’s brought the barrier too low.

At least, that’s my interpretation of an open letter to GitHub published on Thursday. Signed by dozens of project maintainers, the letter identifies troubles that often arise on the GitHub platform and offers suggestions for fixes.

The issues raised in the letter are legitimate, and they’re expressed quite reasonably for something published on the Internet, but they highlight what GitHub is and isn’t. GitHub is a source code management platform; it is not a community management platform.

That’s not to say it can’t be. GitHub is great for what it does, but it could be even better. Managing code is easy; managing contributors and other community members is not. For GitHub to take the next step in promoting open source software development, it needs to provide tools that aid in community management. That includes bug and issue tracking, communication (mailing lists?), and other features that turn a project’s users into community members.

Hacktoberfest

I’m a little late to the game, but over the weekend I heard about Hacktoberfest. Sponsored by DigitalOcean in partnership with GitHub, Hacktoberfest is intended to get people to make contributions to open source projects. While I’ve made contributions before, the prospect of a free t-shirt that I don’t need was enough to get me to submit three pull requests on Saturday.

Wouldn’t it be great if I submitted a pull request every week? Then I looked at my todo list and my calendar and walked it back. I think a pull request (or direct commit to a project I have access to) per month is a reasonable goal. I’ve been meaning to make more contributions to projects for a while, so this may be just the motivation I need.

I need to come up with a catchy name, but I’ll use this blog as a record of what I contribute. In the meantime, if you haven’t signed up for Hacktoberfest, go do that. Happy contributing!

The SourceForge treason

Many of you have undoubtedly heard of what happens to projects that SourceForge deems inactive: the installers get wrapped in a bundled adware installer. Popular projects like nmap, VLC, and the GIMP have found their packages subject to this hijacking lately. Although legally permissible (at least from a licensing standpoint), it’s ethically disturbing. SourceForge says this is a feature — an opportunity for projects to generate a little bit of revenue (assuming they opt in and aren’t hijacked), but it is antithetical to the philosophy of many projects.

Part of the problem is the “Hotel California” nature of SourceForge. Projects can check out any time they like, but they can never leave. Once a project is hosted on SourceForge, it is reportedly near-impossible for the project to be removed, even if it moves active development to a different platform. On the one hand, this is beneficial to the community, since it ensures abandoned or closed packages will remain available for download. On the other hand, it allows for undesired insertion of adware and other antisocial activities.

SourceForge is clearly no longer a safe place for developers to host projects or for users to download software. That’s a shame, because SourceForge had a take on hosting that other sites seem to ignore. GitHub, BitBucket, and others are very focused on serving as a platform for hosting code. They focus on the developer experience. GitHub’s simple interfaces for forking projects and submitting pull requests (whatever technical limitations they may have) have done a great deal for making source code readily accessible and encouraging open development.

SourceForge’s strength was that it provided an easy way for users to search for projects and to download compiled releases. The casual user almost certainly lacks the skills and desire to build packages from source (and many others who are capable certainly don’t want to). The misleading “Click here to Download!” ads aside, SourceForge made it easy for users to get an installer. GitHub has a “Releases” feature which attempts to do this, though it’s not clear that the feature is widely used (full disclosure: I have not used it as a developer or a consumer).

The looming death of SourceForge leaves a real gap in the accessibility of open source to the casual user. It also highlights the dangers of relying on a third-party hosting service (what’s to stop GitHub from doing something similar except the fear of irrelevance?). Self-hosting is not the easy answer it seems, though. Developers are not necessarily systems administrators and may not have the skill or the available resources to maintain their own hosting site, especially for trivial code or code that becomes widely popular. Hopefully SourceForge serves as an example of what not to do and an inspiration for someone to fill the gap better.