Balancing incoming tasks in volunteer projects

Open source (and other volunteer-driven) communities are often made up of a “team of equals.” Each member of the group is equally empowered to act on incoming tasks. But balancing the load is not easy. One of two things happens: everyone is busy with other work and assumes someone else will handle it, or a small number of people immediately jump on every task that comes in. Both of these present challenges for the long-term health of the team.

Bystander effect

The first situation is known as the “bystander effect.” Because every member of the team bears an equal responsibility, each member of the team assumes that someone else will take an incoming task. The sociological research is apparently mixed, but I’ve observed this enough to know that it’s at least possible in some teams. You’ve likely heard the saying “if everyone is responsible then no one is.”

The bystander effect has two outcomes. The first is that the team drops the task. No one acts on it. If the task happens to be an introduction from a new member or the submission of content, this demoralizes the newcomer. If the team drops enough tasks, the new tasks stop coming.

The other possibility is that someone eventually notices that no one else is taking the task, so they take it. In my experience, it’s generally the same person who does this every time. Eventually, they begin to resent the other members of the team. They may burn out and leave.

Oxygen theft

Sometimes one or two team members jump on new tasks before anyone else does. Like the delayed version in the bystander effect scenario, this can lead to burnout. But worse, it can drive away team members who want to take tasks. If they’re constantly missing work because they weren’t able to immediately jump on it, they’ll go find other places to contribute. I call this “oxygen theft” because it’s like sucking all of the oxygen out of the room: it puts out the flames.

I have been an oxygen thief myself. Shortly after I started as the Fedora Program Manager, I became an editor on the Fedora Community Blog. I was publishing regular posts and I happen to be a decent editor, so it made sense to give me that privilege. But because Fedora was my day job, I was often the first to notice new submissions. Over time, I became the only editor working on posts. By accident, the editorial team became a team of one. That’s on my list to fix in the near future.

Solving the problem

Letting either the bystander effect or oxygen theft go on for too long harms the team. But with volunteers, it’s hard to balance the work. Team members may not have consistent availability. For example, one team member’s day job schedule may vary from week to week. They probably don’t have evenly distributed availability, either. Someone who is paid to be on a project will likely have a lot more time available than someone volunteering.

One way to solve the problem is to take turns being in charge of the incoming tasks for a period of time. This addresses “if everyone is responsible then no one is” by making a single person responsible. But by making it a rotating duty, you can spread the load.

After learning my lesson with the Fedora Community Blog, I was hesitant to be too aggressive with taking tasks as an editor of the Fedora Magazine. But the Magazine team was definitely suffering from the bystander effect.

To fix this, I proposed having an Editor of the Week. Each week, one person volunteers to be responsible for making sure new article pitches get timely responses and the comments are moderated. Any of the editors are free to help with those tasks, but the Editor of the Week is the one accountable for them.

It’s not a perfect system. The Editor of the Week role is taken on a volunteer basis, so some editors serve more frequently than others. Still, it seems to work well for us overall. Pitches get feedback more quickly than in the past, and we’re not putting all of the work on one person’s plate.
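For teams that would rather assign the duty than wait for volunteers, a fixed rotation is one option. The following is only a minimal sketch of that idea, not how the Magazine team does it: the `editor_schedule` function and the editor names are hypothetical, and it assumes weeks start on Monday.

```python
from datetime import date, timedelta
from itertools import cycle


def editor_schedule(editors, start, weeks):
    """Return (week_start, editor) pairs, one per week, in round-robin order."""
    # Snap the start date back to the Monday of its week (an assumption).
    monday = start - timedelta(days=start.weekday())
    rotation = cycle(editors)
    return [(monday + timedelta(weeks=i), next(rotation)) for i in range(weeks)]


# Hypothetical editor names, for illustration only.
schedule = editor_schedule(["alice", "bala", "chen"], date(2021, 3, 3), 5)
for week_start, editor in schedule:
    print(week_start.isoformat(), editor)
```

A strict rotation spreads the load evenly but ignores availability, so in practice a team would probably still let editors swap weeks.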

[If you are intrigued by this half-baked post, you’ll enjoy my book on program management for open source projects, coming from The Pragmatic Bookshelf in 2022.]

Projects shouldn’t write their own tools

Over the weekend, the PHP project learned that its git server had been compromised. Attackers inserted malicious code into the repo. This is very bad. As a result, the project moved development to GitHub.

It’s easy to say that open source projects should run their own infrastructure. It’s harder to do that successfully. The challenges compound when you add in writing the infrastructure applications.

I understand the appeal. It’s zero-price (to write; you still need the hardware to run it). Bespoke software meets your needs exactly. And it can be a fun diversion from the main thing you’re working on: who doesn’t like going to chase a shiny for a little bit?

Of course, there’s always the matter of “the thing you wanted didn’t exist when you started the project.” PHP’s first release predates the launch of GitHub by 13 years. It’s 10 years older than git, even.

Of course, this means that at some point PHP moved from some other version control system to Git. That also means they could have moved from their homegrown platform to GitHub. I understand why they’d want to avoid the pain of making that switch, but sometimes it’s worthwhile.

Writing secure and reliable infrastructure is hard. For most projects, the effort and risk of writing their own tooling isn’t worth the benefit. If the core mission of your project isn’t the production of infrastructure applications, don’t write them.

Sidebar: Self-hosting

The question of whether or not to write your infrastructure applications is different from the question of whether or not to self-host. While the former has a pretty easy answer of “no”, the latter is mixed. Self-hosting still costs time and resources, but it allows for customization and integration that might be difficult with software-as-a-service. It also avoids being at the whims of a third party who may or may not share your project’s values. But in general, projects should do the minimum that they can reasonably justify. Sometimes that means running your own instances of an infrastructure application. Very rarely does it mean writing a bespoke infrastructure application.

The FSF does not represent my views

Earlier this week, Richard Stallman announced that he was rejoining the board of the Free Software Foundation. You may recall that he resigned as president and board member in 2019 after making unacceptable remarks about the sexual assault of a minor. This was not the first instance of unacceptable behavior. The FSF made no real changes to address the issue and now has welcomed Stallman back.

I’m thankful that the people I choose to associate with have universally condemned this as harmful. I wrote in 2012 that I think he hurts his own ideological cause. At the time I wrote the post, I was thinking entirely of his rigid adherence to free software over all else. In truth, the harm he does goes well beyond that. For me, the licensing terms of free and open source software are not as important as the human impact.

As I wrote last month, free and open source software is not the end goal. What good is free software that is used to harm others? And what good is a free software movement that is not willing to include underindexed groups? We can neither tolerate nor enable this sort of behavior.

The Fedora Council spent a lot of time debating our vision statement.

The Fedora Project envisions a world where everyone benefits from free and open source software built by inclusive, welcoming, and open-minded communities.

Fedora Project vision

The inclusion of the “built by” is no accident. We want our community to be vibrant and healthy. That cannot happen when bad behavior is allowed to persist.

I think it’s too late for the FSF. They painted themselves into a corner long ago. This only cements that. Still, perhaps with a new slate, the organization can be reborn into something that aids the cause it purports to champion. That is why I have signed the open letter calling for the resignation of Stallman and the entire FSF Board of Directors.

Should we treat OSD compliance as a binary?

So often, we think about whether a software license complies with the Open Source Definition (OSD) as a binary: it complies or it doesn’t. But the OSD has 10 criteria. If a license complies with all except for one of those criteria, it’s non-compliant, but is it non-compliant in the same way as a license that doesn’t comply with four criteria?

I got to thinking about this as I tried to come up with names for the four quadrants in Tobie Langel’s license classification chart. It occurred to me that the bottom half represented two concepts: not explicitly OSD-compliant because it was never submitted and explicitly not OSD-compliant because it violates one or more criteria.

A diagram of the open source landscape considering licenses and norms. Created by Tobie Langel and used under CC BY-SA 4.0.

There must be 50 ways to violate the OSD

Knowing how many (and which) criteria a non-compliant license meets is important. I argue that not allowing derived works is far more important to the idea of “open source in spirit” than not restricting other software by requiring all software distributed alongside it be free.

To add even more complication, not all violations of the same criterion are equal. A license that restricts users from hunting humans for sport would be seen more favorably than a license that restricts users from making ice cream.

Saying a license is OSD-compliant tells us something. Saying it is non-compliant tells us nothing. I don’t know if there’s a succinct way to express the 1,023 possible ways a license could be non-compliant (every pass/fail combination of the 10 criteria except the one that passes them all). Certainly there is not if you also include the specific reasoning.
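The counting can be checked with a short sketch. Treating each of the ten OSD criteria as a simple pass/fail (itself a simplification) gives 2^10 = 1,024 patterns, exactly one of which is full compliance. The two example licenses are hypothetical, and the helper names `failed` and `score` are mine.

```python
from itertools import product

# The ten criteria of the Open Source Definition, in order.
OSD_CRITERIA = [
    "Free Redistribution",
    "Source Code",
    "Derived Works",
    "Integrity of The Author's Source Code",
    "No Discrimination Against Persons or Groups",
    "No Discrimination Against Fields of Endeavor",
    "Distribution of License",
    "License Must Not Be Specific to a Product",
    "License Must Not Restrict Other Software",
    "License Must Be Technology-Neutral",
]

# Every license maps to one pass/fail pattern across the ten criteria.
patterns = list(product([True, False], repeat=len(OSD_CRITERIA)))
non_compliant = [p for p in patterns if not all(p)]
print(len(patterns), len(non_compliant))


def failed(pattern):
    """Names of the criteria a pass/fail pattern violates."""
    return [name for name, ok in zip(OSD_CRITERIA, pattern) if not ok]


def score(pattern):
    """Fraction of criteria met; deliberately lossy."""
    return sum(pattern) / len(pattern)


# Two hypothetical licenses that each fail exactly one criterion.
license_a = tuple(name != "Derived Works" for name in OSD_CRITERIA)
license_b = tuple(
    name != "License Must Not Restrict Other Software" for name in OSD_CRITERIA
)
print(score(license_a), failed(license_a))
print(score(license_b), failed(license_b))
```

Both hypothetical licenses get the same score, yet they fail different criteria, which is why a single number hides the part that matters.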

As I showed above, saying a license is 90% compliant is not particularly useful if the 10% is really important to you. And not all 90%s are created equal. It doesn’t make sense to put the criteria on a spectrum and describe the license by how far along it gets. Again, the violation may or may not matter for your purposes. And how can we say which criteria are most important in a way that will garner any sort of widespread support?

It may be possible to group the criteria into two or three broader categories. I’m not entirely sure that would be easy to express—certainly not in a simple chart.

Do we care?

And then there’s the question of whether that even matters. I wrote last week’s “free and open source software is not the end goal” post as I thought about this question. From an intellectual property law standpoint, OSD compliance matters. (In that it gives you at least a broad idea of what you’re working with.) From a “why the hell am I writing this software to begin with?” standpoint, I’m not sure that it does.

We’re back to the beginning. If the goal is to write software that advances the state of humanity, you may choose a license that is explicitly not OSD-compliant because you don’t want it used for nefarious purposes. That’s a valid choice, although a very complicated one. Is it reasonable to lump that in with all of the other non-compliant licenses? The answer depends on your context.

There is no easy answer. Tobie’s other axis (follows norms) is also messy. Even more, probably, because there’s no defined standard to measure against. Perhaps for this purpose we continue to treat it as a binary after all. The model can show which quadrant a project falls in; understanding why is left as an exercise to the reader.

Refining the model to account for all (okay, some) of the complexities I’ve discussed would make an excellent dissertation topic for an aspiring PhD student.

Free and open source software is not the end goal

When I first started thinking about this article, the title was going to be “I don’t care about free software anymore.” But I figured that would be troll bait and I thought I should be a little less spicy. It’s true in a sense, though. I don’t care about free/open source software as an end goal.

The Free Software Foundation (FSF) says “free software is about having control over the technology we use in our homes, schools and businesses”. The point isn’t that the software itself is freely-licensed, it’s about what the software license permits or restricts. I used to think that free software was a necessary-but-insufficient condition for users having control over their computing. I don’t think that’s necessarily the case anymore.

Why free software might not matter

Software isn’t useful until someone uses it. So we should evaluate software in that context. And most software use these days involves 1. data and 2. computers outside the user’s control. We’ll get back to #2 in a moment, but I want to focus on the data. If Facebook provided the source code to their entire stack tomorrow—indeed, if they had done it from the beginning—that would do nothing to prevent the harms caused by that platform. One, it does nothing to diminish the “joys” of spreading disinformation. Two, it would be no guarantee that something else isn’t reading the data.

While we were so focused on the software, we essentially ignored the data. Now, the data is at least as important as the software. There are plenty of examples of this in my talk “We won. Now what?” presented at DevConf.CZ (25 minutes) and DevConf.US (40 minutes) last year. Being open is no guarantee of data protection, just as being proprietary is no guarantee of data harm.

We’ll always use other people’s computers

Let’s return to the “computers outside the user’s control” point. There’s a lot of truth to the “there is no cloud, there’s only other people’s computers” argument. And certainly if everyone ran their own services, that would reduce the risk of harm.

But here in the real world, that’s not going to happen. Most people cannot run their own software services—they have neither the skill nor the resources. Among those who do, many have no desire to. Apart from the impossibility of people running their own services, there’s the fact that communication means that the information lives in two places, so you’re still using someone else’s computer.

It’s all very complicated

There’s also the question of whether or not the absolutist view of software freedom is the right approach. The free software movement seems to be very libertarian in nature: if each user has freedom over their computing, that is a benefit to everyone. Others would argue (as the Ethical Source movement has) that enabling unethical uses of software is harmful. These two positions are at odds.

Whether or not you think the software license is the appropriate place to address this issue, I suspect many, if not most, developers would prefer that their software not be used for evil purposes. In order to enforce that, the software becomes non-free.

This is a complicated issue, with no right answer and no universal agreement. I don’t know what the way forward is, but I know that we cannot act like free software is the end goal. If we want to get the general public on board, we have to convince them in terms that make sense to their values and concerns, not ours. We must make software that is useful and usable in addition to being free. And we must understand that people choosing non-free software is not a moral failing but a decision to optimize for other values. We must update our worldview to match the 2020s; the 1990s are not coming back.

What does “open source” mean in 2021?

The licensing discourse in the last few weeks has highlighted a difference between what “open source” means and what we’re talking about when we use the term. Strictly speaking, open source software is software released under a license approved by the Open Source Initiative. In most practical usage, we’re talking about software developed in a particular way. When we talk about open source, we talk about the communities of users and developers, (generally) not the license. “Open source” has come to define an ethos that we all have our own definition of.


Open source is still not a business model

If you thought 2021 was going to be the year without big drama in the world of open source licensing, you didn’t have to wait long to be disappointed. Two stories have already sprung up in the first few weeks of the year. They’re independent, but related. Both of them remind us that open source is a development model, not a business model.

Elasticsearch and Kibana

A few years ago, it seemed like I couldn’t go to any sysadmin/DevOps conference or meetup without hearing about the “ELK stack”. ELK stands for the three pieces of software involved: Elasticsearch, Logstash, and Kibana. Because it provided powerful aggregation, search, and visualization of arbitrary log files, it became very popular. This also meant that Amazon Web Services (AWS) saw value in providing an Elasticsearch service.

As companies moved more workloads to AWS, it made sense to pay AWS for Amazon Elasticsearch Service instead of paying Elastic. This represented what you might call a revenue problem for Elastic. So they decided to follow MongoDB’s lead and change their license to the Server Side Public License (SSPL).

The SSPL is essentially a “you can’t use it, AWS” license. This makes it decidedly not open source. Insultingly, Elastic’s announcement and follow-up messaging include phrases like “doubling down on open”, implying that the SSPL is an open source license. It is not. It is a source-available license. And, as open source business expert VM Brasseur writes, it creates business risk for companies that use Elasticsearch and Kibana.

Elastic is, of course, free to use whatever license it wants for the software it develops. And it’s free to want to make money. But it’s not reasonable to get mad at companies using the software under the license you chose to use for it. Picking a license is a business decision.

Shortly before I sat down to write this post, I saw that Amazon has forked Elasticsearch and Kibana. They will take the last-released versions and continue to develop them as open source projects under the Apache License v2. This is entirely permissible and to be expected when a project makes a significant licensing change. So now Elastic is in danger of a sizable portion of the community moving to the fork and away from their projects. If that pans out, it may end up being more harmful than Amazon Elasticsearch Service ever was.

Nmap Public Source License

The second story actually started in the fall of 2020, but didn’t seem to get much notice until after the new year. The developers of nmap, the widely-used security scanner, began using a new license. Prior to the release of version 7.90, nmap was under a modified version of the GNU General Public License version 2 (GPLv2). This license had some additional “gloss”, but was generally accepted by Linux distributions to be a valid free/open source software license.

With version 7.90, nmap is now under the Nmap Public Source License (NPSL). Version 0.92 of this license contained some phrasing that seemed objectionable. The Gentoo licenses team brought their concerns to the developers in a GitHub issue. Some of their concerns seemed like non-issues to me (and to the lawyers at work I consulted with on this), but one part in particular stood out.

Proprietary software companies wishing to use or incorporate Covered Software within their programs must contact Licensor to purchase a separate license

It seemed clear that the intent was to restrict proprietary software, not otherwise-compliant projects from companies that produce proprietary software. Nonetheless, as it was written, it constituted a violation of the Open Source Definition, and we rejected it for use in Fedora.

To their credit, the developers took the feedback well and quickly released an updated version of the license. They even retroactively licensed affected releases under the updated license. Unfortunately, version 0.93 still contains some problems. In particular, the annotations still express field of endeavor restrictions.

While the license text is the most important part, the annotations still matter. They indicate the intent of the license and guide the interpretation by lawyers and judges. So newer versions of nmap remain unsuitable for some distributions.

Licenses are not for you to be clever

Like with Elastic, I’m sympathetic to the nmap developers’ position. If someone is going to use their project to make money, they’d like to get paid, too. That’s an entirely reasonable position to take. But the way they went about it isn’t right. As noted in the GitHub issue, they’re not copyright attorneys. If they were, the license would be much better.

It seems like the developers are fine with people free-riding profit off of nmap so long as the software used to generate the profit is also open source. In that case, why not just use a professionally-drafted and vetted license like the AGPL? The NPSL is already using the GPLv2 and adding more stuff on top of it, and it’s the more stuff on top of it that’s causing problems.

Trying to write your business model into a software license that purports to be open source is a losing proposition.

Apache Software Foundation moves to GitHub

Last week, GitHub and the Apache Software Foundation (ASF) announced that ASF migrated their git repositories to GitHub. This caused a bit of a stir. It’s not every day that “the world’s largest open source foundation” moves to a proprietary hosting platform.

Free software purists expressed dismay. One person described it as “a really strange move In part because Apache’s key value add [was] that they provided freely available infrastructure.” GitHub, while it may be “free as in beer”, is definitely not “free as in freedom”. git itself is open source software, but GitHub’s “special sauce” is not.

For me, it’s not entirely surprising that ASF would make this move. I’ve always seen ASF as a more pragmatically-minded organization than, for example, the Free Software Foundation (FSF). I’d argue that the ecosystem benefits from having both ASF- and FSF-type organizations.

It’s not clear what savings ASF gets from this. Their blog post says they maintain their own mirrors, so there’s still some infrastructure involved. Of course, it’s probably smaller than running the full service, but by how much?

More than a reduced infrastructure footprint, I suspect the main benefit to the ASF is that it lowers the barrier to contribution. Like it or not, GitHub is the go-to place to find open source code. Mirroring to GitHub makes the code available, but you don’t get the benefits of integrating issues and pull requests (at least not trivially). Major contributors will do what it takes to adopt the tool, but drive-by contributions should be as easy as possible.

There’s also another angle, which probably didn’t drive the decision but brings a benefit nonetheless. Events like Hacktoberfest and 24 Pull Requests help motivate new contributors, but they’re based on GitHub repositories. Using GitHub as your primary forge means you’re accessible to the thousands of developers who participate in these events.

In a more ideal world, ASF would use a more open platform. In the present reality, this decision makes sense.

Releasing open source software is not immoral

Matt Stancliff recently made a bold statement on Twitter:

https://twitter.com/mattsta/status/1117794650742513664

He made this comment in the context of the small amount of money the largest tech companies use to fund open source. With the five largest companies contributing less than a percent of their annual revenue, open source projects would have two billion dollars of support. These projects are already subsidizing the large corporations, he argues, so they deserve some of the rewards.

This continues the recent trend of people being surprised that people will take free things and not pay for them. Developers who choose to release software under an open source license do so with the knowledge that someone else may use their software to make boatloads of money. Downstream users are under no obligation to remunerate or support upstreams in any way.

That said, I happen to think it’s the right thing to do. I contributed to Fedora as a volunteer for years as a way to “pay back” the community that gave me a free operating system. At a previous company, we made heavy use of an open source job scheduler/resource manager. We provided support on the community mailing lists and sponsored a reception at the annual conference. This was good marketing, of course, but it was also good community citizenship.

At any rate, if you want to make a moral judgment about open source, it’s not the release of open source software that’s the issue. The issue is parasitic consumption of open source software. I’m sure all of the large tech companies would say they support open source software, and they probably do in their own way. But not necessarily in the way that allows small-but-critical projects to thrive.

Toward a more moral ecosystem

Saying “releasing open source software has become immoral” is not helpful. Depriving large companies of open source would also deprive small companies and consumers. And it’s the large companies who could best survive the loss. Witness how MongoDB’s license change has Amazon using DocumentDB instead; meanwhile Linux distributions like Fedora are dropping MongoDB.

It’s an interesting argument, though, because normally when morality and software are in the mix, it’s the position that open source (or “free software” in this context, generally) is the moral imperative. That presents us with one possible solution: licensing your projects under a copyleft license (e.g. the GNU General Public License (GPL)). Copyleft-licensed software can still be used by large corporations to make boatloads of money, but at least it requires them to make source (including of derived works) available. With permissively-licensed software, you’re essentially saying “here’s my code, do whatever you want with it.” Of course people are going to take you up on that offer.

Picking communication tools for your community

Communication is key to the success of any project. The tools we use to communicate play a part in how effective our communication is. Recent discussions in Fedora and other projects have made me consider what tool selection looks like. Should Discourse replace mailing lists? Should Telegram replace IRC? I’m not going to answer those questions.

There’s no one right tool, just a set of considerations to think about in selecting communications tooling. Each community needs to arrive at a consensus about what works best for their workflow and culture, and keep in mind that the decision may attract some contributors while driving others away.

In this post, I’m going to broadly lump tools into two categories: synchronous and asynchronous. Many tools can be used for both to a decent approximation, but most will pretty obviously fall into one category or the other. Picking one tool to rule them all is a valid option, but be aware that it immediately favors one category of communication over the other. And keep in mind that for large projects, some sub-teams may choose different platforms. That’s fine so long as people who want to participate know where to look.

Considerations for all tools

Self-hosted or externally-hosted. Do you have the resources to maintain the tool? If you do, that’s a way to save money and maintain control, but it’s also time that your community members can’t spend working on whatever your community is doing. Externally-hosted tooling (either free or paid) might give you less flexibility, but it can also be more isolated from internal infrastructure outages.

Open source or proprietary. This is entirely a value judgement for your community. For some communities, anything that’s not open source is a non-starter. Others might not care at all one way or another. Most will fall somewhere on the spectrum between.

Federated or centralized. Can the community connect their own tools together (e.g. like with email) or is it a centralized system (like most social media platforms)? The trend is definitely toward centralized systems these days, so you may have to work harder to find a federated system that meets your needs.

Public or private. Can outsiders see what you’re saying? For many open source projects, public visibility is important. But even in those communities, some conversations may need to take place in private or semi-private.

Archived or ephemeral. Do you want to be able to go back and see what was said last month, last year, or last decade? Some conversations aren’t worth keeping, but records of important decisions probably are. Does your tool allow you to meet your archival needs?
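One way to work through these considerations is to write them down as yes/no attributes and filter candidates against what your community requires. This is only an illustrative sketch: the `Tool` class, the `shortlist` helper, and the tool names and attribute values are all hypothetical, and real tools rarely reduce to clean booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """One candidate tool, described by the five considerations above."""
    name: str
    self_hosted: bool
    open_source: bool
    federated: bool
    public: bool
    archived: bool


def shortlist(tools, **required):
    """Names of the tools whose attributes match every stated requirement."""
    return [
        tool.name
        for tool in tools
        if all(getattr(tool, key) == value for key, value in required.items())
    ]


# Hypothetical candidates with made-up attribute values.
candidates = [
    Tool("tool-a", True, True, True, True, True),
    Tool("tool-b", False, False, False, True, True),
    Tool("tool-c", False, True, False, False, False),
]

print(shortlist(candidates, open_source=True, archived=True))
```

Considerations the community doesn’t care about are simply left out of the requirements, so they don’t exclude anything.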

Considerations for synchronous tools

Sometimes you really need to talk to people in real time.

Mobile experience. It’s 2019. People do a lot on their phones, especially if their contribution to your community happens during their workday or if they travel frequently. What is the mobile experience like for the tools you’re evaluating? It’s not just a matter of if clients exist, but what’s the whole experience. If they disconnect while on an airplane, do they lose all the messages that were sent in their absence?

Status and alerting. What happens if someone stays logged in and goes away for a little bit? Do they have the ability to suppress notifications? Is there any way to let others know “I’m away or busy, don’t expect an immediate reply”?

Audio, video, and screen sharing. Sometimes you need the high-bandwidth modes of communication in order to get your full message across (or just shortcut a lot of back-and-forth). Does the tool you’re looking at provide this? Is it usable for those who can’t participate due to bandwidth or other constraints?

Integrations. Can you display GIFs? The ability to speak entirely in animated images can be either a feature or a bug, depending on the community’s culture. But if it’s important one way or another, you’ll want to make sure your tool matches your needs. Of course, there are other integrations that might matter too. Can your build system post alerts? Does the tool automatically recognize certain links and display them in a particular manner?

Considerations for asynchronous tools

Of course, you’re not all going to be sitting at your computer at the same time. People go on vacation. They live in different time zones. They step away for 10 minutes to get a cup of coffee. Whatever the reason, you’ll need to communicate asynchronously sometimes.

Push or pull. Email is a push mechanism. Your message arrives in my inbox whether I’ve asked it to or not. Web fora are a pull mechanism. I have to go check them (yes, some forum tools provide an email interface). Which works better for your workflow and community? Pull mechanisms are easier to ignore when you want to step away for a little while, but they also mean you might forget to check when you do want to pay attention.

Is it a ticket system? I haven’t really talked about ticket systems/issue trackers because I don’t consider them a general communication tool. But for some projects, all the discussion that needs to happen happens in GitHub issues or another ticket tracker. If that works for you, there’s no point in adding a new tool to the mix.