Apache Software Foundation moves to GitHub

Last week, GitHub and the Apache Software Foundation (ASF) announced that ASF migrated their git repositories to GitHub. This caused a bit of a stir. It’s not every day that “the world’s largest open source foundation” moves to a proprietary hosting platform.

Free software purists expressed dismay. One person described it as “a really strange move In part because Apache’s key value add [was] that they provided freely available infrastructure.” GitHub, while it may be “free as in beer”, is definitely not “free as in freedom”. git itself is open source software, but GitHub “special sauce” is not.

For me, it’s not entirely surprising that ASF would make this move. I’ve always seen ASF as a more pragmatically-minded organization than, for example, the Free Software Foundation (FSF). I’d argue that the ecosystem benefits from having both ASF- and FSF-type organizations.

It’s not clear what savings ASF gets from this. Their blog post says they maintain their own mirrors, so there’s still some infrastructure involved. Of course, it’s probably smaller than running the full service, but by how much?

More than a reduced infrastructure footprint, I suspect the main benefit to the ASF is that it lowers the barrier to contribution. Like it or not, GitHub is the go-to place to find open source code. Mirroring to GitHub makes the code available, but you don’t get the benefits of integrating issues and pull requests (at least not trivially). Major contributors will do what it takes to adopt the tool, but drive by contributions should be as easy as possible.

There’s also another angle, which probably didn’t the drive the decision but brings a benefit nonetheless. Events like Hacktoberfest and 24 Pull Requests help motivate new contributors, but they’re based on GitHub repositories. Using GitHub as your primary forge means you’re accessible to the thousands of developers who participate in these events.

In a more ideal world, ASF would use a more open platform. In the present reality, this decision makes sense.

Releasing open source software is not immoral

Matt Stancliff recently made a bold statement on Twitter:

He made this comment in the context of the small amount of money the largest tech companies use to fund open source. With the five largest companies contributing less than a percentage of their annual revenue, open source projects would have two billion dollars of support. These projects are already subsidizing the large corporations, he argues, so they deserve some of the rewards.

This continues the recent trend of people being surprised that people will take free things and not pay for them. Developers who choose to release software under an open source license do so with the knowledge that someone else may use their software to make boatloads of money. Downstream users are under no obligation to remunerate or support upstreams in any way.

That said, I happen to think it’s the right thing to do. I contributed to Fedora as a volunteer for years as a way to “pay back” the community that gave me a free operating system. At a previous company, we made heavy use of an open source job scheduler/resource manager. We provided support on the community mailing lists and sponsored a reception at the annual conference. This was good marketing, of course, but it was also good community citizenship.

At any rate, if you want to make a moral judgment about open source, it’s not the release of open source software that’s the issue. The issue is parasitic consumption of open source software. I’m sure all of the large tech companies would say they support open source software, and they probably do in their own way. But not necessarily in the way that allows small-but-critical projects to thrive.

Toward a more moral ecosystem

Saying “releasing open source software has become immoral” is not helpful. Depriving large companies of open source would also deprive small companies and consumers. And it’s the large companies who could best survive the loss. Witness how MongoDB’s license change has Amazon using DocumentDB instead; meanwhile Linux distributions like Fedora are dropping MongoDB.

It’s an interesting argument, though, because normally when morality and software are in the mix, it’s the position that open source (or “free software” in this context, generally) is the moral imperative. That presents us with one possible solution: licensing your projects under a copyleft license (e.g. the GNU General Public License (GPL)). Copyleft-licensed software can still be used by large corporations to make boatloads of money, but at least it requires them to make source (including of derived works) available. With permissively-licensed software, you’re essentially saying “here’s my code, do whatever you want with it.” Of course people are going to take you up on that offer.

Picking communication tools for your community

Communication is key to the success of any project. The tools we use to communicate play a part in how effective our communication is. Recent discussions in Fedora and other projects have made me consider what tool selection looks like. Should Discourse replace mailing lists? Should Telegram replace IRC? I’m not going to answer those questions.

There’s no one right tool, just a set of considerations to think about in selecting communications tooling. Each community needs to arrive at a consensus about what works best for their workflow and culture, and keep in mind that the decision may attract some contributors while driving others away.

In this post, I’m going to broadly lump tools into two categories: synchronous and asynchronous. Many tools can be used for both to a decent approximation, but most will pretty obviously fall into one category or the other. Picking one tool to rule them all is a valid option, but be aware that it immediately favors one category of communication over the other. And keep in mind that for large projects, some sub-teams may choose different platforms. That’s fine so long as people who want to participate know where to look.

Considerations for all tools

Self-hosted or externally-hosted. Do you have the resources to maintain the tool? If you do, that’s a way to save money and maintain control, but it’s also time that your community members can’t spend working on whatever your community is doing. Externally-hosted tooling (either free or paid) might give you less flexibility, but it can also be more isolated from internal infrastructure outages.

Open source or proprietary. This is entirely a value judgement for your community. For some communities, anything that’s not open source is a non-starter. Others might not care at all one way or another. Most will fall somewhere on the spectrum between.

Federated or centralized. Can the community connect their own tools together (e.g. like with email) or is it a centralized system (like most social media platforms)? The trend is definitely toward centralized systems these days, so you may have to work harder to find a federated system that meets your needs.

Public or private. Can outsiders see what you’re saying? For many open source projects, public visibility is important. But even in those communities, some conversations may need to take place in private or semi-private.

Archived or ephemeral. Do you want to be able to go back and see what was said last month, last year, or last decade? Some conversations aren’t worth keeping, but records of important decisions probably are. Does your tool allow you to meet your archival needs?

Considerations for synchronous tools

Sometimes you really need to talk to people in real time.

Mobile experience. It’s 2019. People do a lot on their phones, especially if their contribution to your community happens during their workday or if they travel frequently. What is the mobile experience like for the tools you’re evaluating? It’s not just a matter of if clients exist, but what’s the whole experience. If they disconnect while on an airplane, do they lose all the messages that were sent in their absence?

Status and alerting. What happens if someone stays logged in and goes away for a little bit? Do they have the ability to suppress notifications? Is there any way to let others know “I’m away or busy, don’t expect an immediate reply”?

Audio, video, and screen sharing. Sometimes you need the high-bandwidth modes of communication in order to get your full message across (or just shortcut a lot of back-and-forth). Does the tool you’re looking at provide this? Is it usable for those who can’t participate due to bandwidth or other constraints?

Integrations. Can you display GIFs? The ability to speak entirely in animated images can be either a feature or a bug, depending on the community’s culture. But if it’s important one way or another, you’ll want to make sure your tool matches your needs. Of course, there are other integrations that might matter to. Can your build system post alerts? Does the tool automatically recognize certain links and display them in an particular manner?

Considerations for asynchronous tools

Of course, you’re not all going to be sitting at your computer at the same time. People go on vacation. They live in different time zones. They step away for 10 minutes to get a cup of coffee. Whatever the reason, you’ll need to communicate asynchronously sometimes.

Push or pull. Email is a push mechanism. Your message arrives in my inbox whether I’ve asked it to or not. Web fora are a pull mechanism. I have to go check them (yes, some forum tools provide an email interface). Which works better for your workflow and community? Pull mechanisms are easier to ignore when you want to step away for a little while, but they also mean you might forget to check when you do want to pay attention.

Is it a ticket system? I haven’t really talked about ticket systems/issue trackers because I don’t consider them a general communication tool. But for some projects, all the discussion that needs to happen happens in GitHub issues or another ticket tracker. If that works for you, there’s no point in adding a new tool to the mix.

Microsoft bought GitHub. Now what?

Last Monday, a weekend of rumors proved to be true. Microsoft announced plans to buy code-hosting site GitHub for $7.5 billion. Microsoft’s past, particularly before Satya Nadella took the corner office a few years ago, was full of hostility to open source. “Embrace, extend, extinguish” was the operative phrase. It should come as no surprise, then, that many projects responded by abandoning the platform.

But beyond the kneejerk reaction, there are two questions to consider. First: can open source projects trust Microsoft? Secondly, should open source (and free software in particular) projects rely on corporate hosting.

Microsoft as a friend

Let’s start with the first question. With such a long history of active assault on open source, can Microsoft be trusted? Understanding that some people will never be convinced, I say “yes”. Both from the outside and from my time as a Microsoft employee, it’s clear that the company has changed under Nadella. Microsoft recognizes that open source projects are not only complementary, but strategically important.

This is driven by a change in the environment that Microsoft operates in. The operating system is less important than ever. Desktop-based office suites are giving way to web-based tools for many users. Licensed revenue may be the past and much of the present, but it’s not the future. Subscription revenue, be it from services like Office 365 or Infrastructure-as-a-Service offerings, is the future. And for many of these, adoption and consumption will be driven by open source projects and the developers (developers! developers! developers! developers!) that use them.

Microsoft’s change of heart is undoubtedly driven by business needs, but that doesn’t make it any less real. Jim Zemlin, Executive Director at the Linux Foundation, expressed his excitement, implying it was a victory for open source. Tidelift ran the numbers to look at Microsoft’s contributions to non-Microsoft projects. Their conclusion?

…today the company is demonstrating some impressive traction when it comes to open source community contributions. If we are to judge the company on its recent actions, the data shows what Satya Nadella said in his announcement about Microsoft being “all in on open source” is more than just words.

And in any acquisition, you should always ask “if not them, then who?” CNBC reported that GitHub was also in talks with Google. While Google may have a better reputation among the developer community, I’m not sure they’d be better for GitHub. After all, Google had Google Code, which it shut down in 2016. Would a second attempt in this space fare any better? Google Code had a two year head start on GitHub, but it languished.

As for other major tech companies, this tweet sums it up pretty well:

Can you trust anyone to host?

My friend Lyz Joseph made an excellent point on Facebook the day the acquisition was announced:

Unpopular opinion: If you’re an open source project using GitHub, you already sold out. You traded freedom for convenience, regardless of what company is in control.

People often forget that GitHub itself is not open source. Some projects have avoided hosting on GitHub for that very reason. Even though the code repo itself is easily mirrored or migrated, that’s not the real value in GitHub. The “social coding” aspects — the issues, fork tracking, wikis, ease of pull requests, etc — are what make GitHub valuable. Chris Siebenmann called it “sticky in a soft way.

GitLab, at least, offers a “community edition” that projects can self-host. In a fantasy world, each project would run their own infrastructure, perhaps with federated authentication for ease of use when you’re a participant in many projects. But that’s not the reality we live in. Hosting servers costs money and time. Small projects in particular lack both of those. Third-party infrastructure will always be attractive for this reason. And as good as competition is, having a dominant social coding site is helpful to users in the same way that a dominant social network is simpler: network effects are powerful.

So now what?

The deal isn’t expected to close for a while, and Microsoft plans to seek regulatory approval, which will not speed the process. Nothing will change immediately. In the medium term, I don’t expect much to change either. Microsoft has made it clear that it plans to run GitHub as a fairly autonomous business (the way it does with LinkedIn). GitHub gets the stability that comes from the support of one of the world’s largest companies. Microsoft gets a chance to improve its reputation and an opportunity to make it easier for developers to use Azure services.

Full disclosure: I am a recent employee of Microsoft and a shareholder. I was not involved in the acquisition and had no inside knowledge pertinent to the acquisition or future plans for GitHub.

sudo is not as bad as Linux Journal would have you believe

Fear, uncertainty, and doubt (FUD) is often used to undercut the use of open source solutions, particularly in enterprise settings. And the arguments are sometimes valid, but that’s not a requirement. So long as you make open source seem risky, it’s easier to push your solution.

I was really disappointed to see Linux Journal run a FUD article as sponsored content recently. I don’t begrudge them for running sponsored content generally. They clearly label it and it takes money to run a website. Linux Journal pays writers and that money has to come from somewhere. But this particular article was tragic.

Chad Erbe uses “Four Hidden Costs and Risks of Sudo Can Lead to Cybersecurity Risks and Compliance Problems on Unix and Linux Servers” to sow FUD far and wide. sudo, if you’re not familiar with it, is a Unix command that allows authorized users to run authorized commands with elevated privileges. The most common use case is to allow administrators to run commands as the root user, but it can also be used to give, for example, webmasters the ability to restart the web server without giving them full access.

So what’s wrong with this article?

Administrative costs

Erbe argues that using sudo adds administrative overhead because you have to maintain the configuration file. It’s 2017: if you’re not using configuration management already then you’re probably a lost cause. You’re not adding a whole new layer, you’re adding one more file to the dozens (or more) you’re coordinating across your environment.

Erbe sets up a “most complicated setup” strawman and knocks it down by saying commercial solutions could help. He doesn’t say how, though, and there’s a reason for that: the concerns he raises apply to any technology that provides the solution. I have seen sites that use commercial solutions to replace sudo, and they still have to configure which users are authorized to use which commands on which servers.

Forensics and audit risks

sudo doesn’t have a key logger or log chain of custody. That’s true, but that doesn’t mean it’s the wild west. Erbe says configuration management systems can repair modified configuration files, but with a delay. That’s true, but tools like Tripwire are designed to catch these very cases. And authentication/authorization logs can be forwarded to a centralized log server. That’s probably something sysadmins should have set up already.

sudo provides a better level of audit logging compared to switching to the root account. It logs every command run and who runs it. Putting a key logger in it would provide no additional benefit. The applications launched with sudo (or the operating system itself) would need it.

Business continuity risks

You can’t rollback sudo and you can’t get support. Except that you can, in fact, downgrade the sudo version if it contains a critical bug. And you can get commercial support. Not for sudo specifically, but for your Linux installs generally.

Lack of enterprise support

This sems like a repeat of the last point, with a different focus. There’s no SLA for fixing bugs in sudo, but that doesn’t mean it’s inherently less secure. How many products developed by large commercial vendors have security issues discovered years later? A given package being open source does not imply that it is more or less secure than a proprietary counterpart, only that its source code is available.

A better title for this artice

Erbe raises some good points, but loses them in the FUD. This article would be much better titled “why authorization management is hard”. That approach, followed by “and here’s how my proprietary solution addresses those difficulties” would be a very interesting article. Instead, all we get is someone paying to knock down some poorly-constructed strawmen. The fact that it appears in Linux Journal gives it a false sense of credibility and that’s what makes it dangerous.

Why I choose the licenses I do

In the discussion around Richard Stallman’s speech at Purdue and my subsequent blog post about it, there was a great focus on the philosophy of various licenses. On Twitter, the point was raised that the GPL restricts the freedom of those making derivative works. The implication being that it imposes selfish restrictions. As a response to that (and because I told “MissMorphine” that I would write on this), I thought it would be a good idea to write about my own license choices and the personal philosophy behind them. Never mind the fact that it’s five years late.

First, my general philosophy. I am a proponent of the free sharing of information, knowledge, and software. I’m also aware that producing these things requires effort and resources, so I’m sympathetic to people who prefer to receive compensation. For myself, I generally opt to make things available. In exchange, I ask that people who would build off my work not restrict the freedoms of others. In order to do that, I must restrict the freedom to restrict freedoms. This is the choice I make for my own work.

Open source licenses (and I use the term broadly here to include the Creative Commons licenses and similar) maximize freedom in one of two ways: they maximize freedom for the next person downstream or they maximize freedom to any level downstream. It’s shouldnt be a surprise to learn that the former is often favored by developers and particularly by commercial entities, as it allows open source code to be used in closed products.

My default licenses are the GNU General Public License version 2 for code and Creative Commons Attribution-ShareAlike 4.0 for non-code. The gist (and to be clear, I’m hand waving a lot of detail here) of these licenses is “do what you want with my work, but give me credit and let others have the same rights.” However, I’ve licensed work under different licenses when there’s a case for it (e.g. contributing to a project or ecosystem that has a different license).

The important thing to remember when choosing a license is to pick one that matches your goals.

 

Other writing in January 2017

Where have I been writing when I haven’t been writing here?

The Next Platform

I’m freelancing for The Next Platform as a contributing author. Here are the articles I wrote last month:

Opensource.com

Over on Opensource.com, we had our fourth consecutive month with a milion-plus page views and set a record with 1,122,064. I wrote the articles below.

Also, the 2016 Open Source Yearbook is now available. You can get a free PDF download now or wait for the print version to become available. Or you can do both!

Cycle Computing

Meanwhile, I wrote or edited a few things for work, too:

  • Use AWS EBS Snapshots to speed instance setup — Staging reference data can be a time-expensive operation. This post describes one way we cut tens of minutes off of time for a cancer research workload.
  • Various ghost-written pieces. I’ll never tell which ones!

Come see me at these conferences in the next few months

I thought I should share some upcoming conference where I will be speaking or in attendance.

  • 9/16 — Indy DevOps Meetup (Indianapolis, IN) — It’s an informal meetup, but I’m speaking about how Cycle Computing does DevOps in cloud HPC
  • 10/1 — HackLafayette Thunder Talks (Lafayette, IN) — I organize this event, so I’ll be there. There are some great talks lined up.
  • 10/26-27 — All Things Open (Raleigh, NC) — I’m presenting the results of my M.S. thesis. This is a really great conference for open source, so if you can make it, you really should.
  • 11/14-18 — Supercomputing (Salt Lake City, UT) — I’ll be working the Cycle Computing booth most of the week.
  • 12/4-9 — LISA (Boston, MA) — The 30th version of the premier sysadmin conference looks to be a good one. I’m co-chairing the Invited Talks track, and we have a pretty awesome schedule put together if I do say so myself.

March Opensource.com articles

I ended up only writing one article for Opensource.com this month, but it turned out to be pretty popular, garnering well over 10,000 views. However, that’s nothing compared to the over one million views the whole site had in the month of March. The final total is roughly 200,000 more than the previous monthly record. I’d like to congratulate the editorial team, my fellow Community Moderators, and all of the other Red Hat & community contributors who have worked so hard to reach this milestone. I’m honored to be a part of such a great team.

December Opensource.com articles

Here are the articles I wrote for Opensource.com in December: