Open source is selfish: that’s good and bad

Back in May, Devin Prater wrote an excellent piece on Medium titled “Linux Accessibility: an unmaintained Mess”. Devin talks about the poor state of accessibility on mainstream Linux distributions. While blind people have certainly used Linux, it’s generally not an easy task. There’s a simple explanation for this: most open source contributors aren’t blind.

There’s no rule that you can’t make software accessible if you don’t personally need a particular accessibility feature. But for many open source contributors, contributions come down to “scratching their own itch”: people work on the things that are personally interesting to them or that impact them in some way.

That’s a good thing! It means they’re invested in how well the software works. I’m sure you’ve used some applications where you thought “there’s no way the people who made this actually used it.”

The problem comes when we’re excluding potential users and contributors. People with vision problems can’t contribute because they can’t easily use the software. And when they can use it, the tools for contributing add another barrier. I can’t imagine trying to understand a patch or an XML file read aloud, but there are people who have to do that.

In Program Management for Open Source Software, I wrote “software is only useful to the degree that people can use it”. I don’t have a great solution. As a community, we need to figure out how to keep the good part of the selfishness while being more inclusive.

SaaS makes the Linux desktop work

That take got fewer bites than I expected, so either it’s not very spicy or I need to repeat myself. But I want to give myself some room to expand on this idea.

To the average user, operating systems are boring. In fact, they’re mostly irrelevant. With the exception of some specialized applications (either professional or gaming), the vast majority of users could sit down at any computer and do what they need to do. Give them a web browser, and they can get to everything else.

For the purposes that matter, the Linux desktop has won. Except it’s not traditional distributions like Fedora or Debian. It’s Android and ChromeOS. And it’s not on desktop PCs. It’s on phones, tablets, and some laptops. If we meant something else when we spoke of “The Year of the Linux Desktop”, we should have been more specific.

That said, Linux desktops, as Linux enthusiasts envisioned them, are suitable for mainstream users. But it’s not because of native, locally-running apps; it’s because software-as-a-service (SaaS) makes the OS irrelevant.

This is not a cause for alarm. It’s actually an opportunity. It’s never been easier to move someone from Windows or macOS to Linux. You don’t have to give them a mapping of all of their old apps; you just say “here’s your browser. Have fun!” That’s not to say that the ecosystem lacks first-rate applications. Great FOSS applications exist for all OSes. But with SaaS, the barrier to changing the OS is dramatically reduced.

Of course, SaaS has problems, both technical and philosophical. We shouldn’t ignore those. The concerns have just moved up to another layer. But we have the opportunity to move more people to Linux while we—as both FOSS communities and society in general—address the concerns of SaaS. Or, perhaps more likely, move them up another layer.

Balancing advancement and legacy

Later today, I’ll submit a contentious Change proposal to the Fedora Engineering Steering Committee. Several contributors proposed deprecating support for legacy BIOS starting in Fedora Linux 37. The feedback on the mailing list thread and on social media is…let’s call it “mixed”.

The bulk of the objections distill down to: I have old hardware and it should still work. Indeed, when proprietary operating system vendors (in both the PC and mobile spaces) embrace varying forms of planned obsolescence, open source operating systems can allow users to continue using the hardware they own. Why shouldn’t it continue to be supported?

Nothing comes for free. Maintaining legacy support requires work. Bugs need fixes. Existing code can hamper the addition of new features. Even in a community-driven project, time is not unlimited. It’s hard to ask people to keep supporting software that they’re no longer interested in.

I think some distros should strive to provide indefinite support for older hardware. I don’t think all distros need to. In particular, Fedora does not need to. That’s not what Fedora is. “First” is one of our Four Foundations for a reason. Other distros focus on long-term support and less on integrating the latest from upstreams. That’s good. We want different distros to focus on different benefits.

That’s not to say that we should abandon old hardware willy-nilly. It’s a balance between legacy support and advancing innovation. The balance isn’t always easy to find, but it’s there somewhere. There are always tradeoffs.

I don’t have a strong opinion on this specific case because I don’t know enough about it. We have to make this decision at some point. Is that now? Maybe, or maybe not.

Sidebar: it’s hard to know

One of the benefits of (most) open source operating systems also makes these kinds of decisions harder. We don’t collect detailed data about installations. This is a boon for user privacy, but it means we’re generally left guessing about the hardware that runs Fedora Linux. Some educated guesses can be made from the hardware architectures reported in bug reports or from opt-in hardware surveys. But those aren’t necessarily representative. So we’re largely left with hunches and anecdata.

Maybe we should think about how we use language ecosystems

Over the weekend, Bleeping Computer reported on thousands of packages breaking because a developer inserted infinite loops into his package. He did this intentionally; he had grown frustrated with his volunteer labor being used by large corporations with no compensation. This brings up at least three issues, as I see it.

FOSS sustainability

How many times have we had to relearn this lesson? A key package somewhere in the dependency chain relies entirely on volunteer or vastly-underfunded labor. The XKCD “Dependency” comic is only a year and a half old, but it represents a truth that we’ve known since at least the 2014 Heartbleed vulnerability. More recently, a series of log4j vulnerabilities made the holidays very unpleasant for folks tasked with remediation.

The log4j developers were volunteers, maintaining code that they didn’t particularly like but felt obligated to support. And they worked their butts off while receiving all manner of insults. The fact that seemingly the entire world depended on their code only became apparent once it was a problem.

Many people are paid well to maintain software on behalf of their employer. But certainly not everyone. And companies are generally not investing in the sustainability of the projects they rely on.

We depend on good behavior

The reason companies don’t invest in FOSS in proportion to the value they get from it is simple. They don’t have to. Open source licenses don’t (and can’t) require payment. And I don’t think they should. But companies have to see open source software as something to invest in for the long-term success of their own business. When they don’t, it harms the whole ecosystem.

I’ve seen a lot of “well you chose a license that let them do that, so it’s your fault.” Yes and no. Just because people can build wildly profitable companies while underinvesting in the software they use doesn’t mean they should. I’m certainly sympathetic to the developer’s position here. Even the small, mostly unknown software that I’ve developed sometimes invokes an “ugh, why am I doing this for free?” from me—and no one is making money off it!

But we also depend on maintainers behaving. When they get frustrated, we expect they won’t take their ball and go home (as in the left-pad case) or insert malicious code (as in this case). While the anger is understandable, a lot of other people got hurt in the process.

Blindly pulling from package repos is a bad idea

Speaking of lessons we’ve learned over and over again, it turns out that blindly pulling the latest version of a package from a repo is not a great idea. You never know what’s going to break, even if the breakage is accidental. This still seems to be a common practice in some language ecosystems, and it baffles me. With the increasing interest in software supply chains, I wonder if we’ll start seeing that as an area where large companies suddenly decide to start paying attention.
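The mitigations are well known: pin exact versions and verify what you fetched before trusting it. Most ecosystems support this natively (pip’s hash-checking mode, npm’s lockfiles). As a minimal sketch of the underlying idea in Python, where the URL and digest are placeholders rather than real values:

```python
# Sketch: pin an exact artifact and verify its checksum before using it.
# The URL and expected digest below are placeholders, not real values.
import hashlib
import urllib.request

PINNED_URL = "https://example.com/pkgs/somepkg-1.3.0.tar.gz"  # hypothetical
EXPECTED_SHA256 = "0" * 64  # stand-in for the known-good digest

def fetch_verified(url: str, expected_sha256: str) -> bytes:
    """Download an artifact and fail loudly if its contents changed."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"checksum mismatch for {url}: got {digest}")
    return data

# fetch_verified(PINNED_URL, EXPECTED_SHA256)
```

A floating “latest” can’t make that guarantee. A pinned version plus a digest means an upstream change, malicious or accidental, fails the build instead of shipping.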

The right of disattribution

While discussing the ttyp0 font license, Richard Fontana and I had a disagreement about its suitability for Fedora. My reasoning for putting it on the “good” list was taking shape as I wrote. Now that I’ve had some time to give it more thought, I want to share a more coherent (I hope) argument. The short version: authors have a fundamental right to require disattribution.

What is disattribution?

Disattribution is a word I invented because the dictionary has no antonym for attribution. Attribution, in the context of open works, means saying who authored the work you’re building on. For example, this post is under the Creative Commons Attribution-ShareAlike 4.0 license. That means you can use and remix it, provided you credit me (Attribution) and also let others use and remix your remix (ShareAlike). On the other hand, disattribution would say something like “you can use and remix this work, but don’t put my name on it.”

Why disattribution?

There are two related reasons an author might want to require disattribution. The first is that either the original work or potential derivatives are embarrassing. Here’s an example: in 8th grade, my friend wrote a few lines of a song about the destruction of Pompeii. He told me that I could write the rest of it on the condition that I not tell anyone he had anything to do with it.

The other reason is more like brand protection. Or perhaps avoiding market confusion. This isn’t necessarily due to embarrassment. Open source maintainers are often overworked. Getting bug reports and support requests meant for a derivative project because users are confused is a situation worth avoiding.

Licenses that require attribution are uncontroversial. If we can embrace the right of authors to require attribution, we can embrace the right of authors to require disattribution.

Why not disattribution?

Richard’s concerns seemed less philosophical and more practical. Open source licenses are generally concerned with copyright law. Disattribution, particularly for the second reason above, is closer to trademark law. But licenses are the tool we have available; don’t be surprised when we ask them to do more than they should.

Perhaps the bigger concern is the constraint it places on derivative works. The ttyp0 license requires not using “UW” as the foundry name. Richard’s concern was that two letters is too short a string to exclude. I don’t agree. There are plenty of ways to name a project that avoid one specific word. Even in this specific case, a name like “nuwave”—which contains “uw”—should be fine, because the “uw” is part of an unrelated word.

Excluding a specific word is fine. A requirement that excludes many words or provides some other unreasonable constraint would be the only reason I’d reject such a license.

Reading code is just as important as writing it

Recently, my friend said she quit a job leading development at a company after the CTO went on a 15-minute rant when she said reading code is just as important as writing it. I don’t blame her. I agree. I might even go as far as saying it’s more important.

Of course, there’s no code to read if no one writes code. I’m not saying that writing code is unimportant. But even for developers, reading is a critical skill. You’re not writing new code all of the time. So much development work is reading existing code to fix bugs, add new features, etc. Of course, developers spend a lot of time reading code people wrote on Stack Overflow, too.

Reading code is also critical for people who aren’t developers. I could probably code my way out of a proverbial paper sack, but just barely. But I’ve been able to point developers in the right direction by being able to dig into code. The ability to read code—even if you don’t fully comprehend it—is invaluable in troubleshooting technical issues.

Isn’t it better to contribute code than money?

Recently, I was in a discussion about making contributions to open source projects. One person said it would be nice if their employer gave each employee a budget that could be directed to open source projects at the employee’s discretion. The idea is that it would be a way for employees to support the specific projects that make their jobs or lives better. Another person said “isn’t it better to contribute code to the project?”

No, it is not. Even in software companies, a large percentage of employees lack the skills necessary to make meaningful code contributions to projects. That’s true even when you consider (the very valuable) non-code contributions like documentation, testing, graphic design, et cetera. Money is quicker and easier.

Money gives the project maintainers the flexibility to put it where they need it. They could buy test hardware, pay for web hosting, hire a contractor, buy themselves a nice cup of coffee. Whatever. This is the same reason charities prefer money over goods for disaster relief donations.

Of course, money isn’t perfect either. Not all projects are equipped to accept financial donations. Even if there’s a way to route money to them, they may not want to deal with tax implications. Loosely-governed projects may not have a good mechanism for deciding how to spend the money. Money can make relationships go south in a hurry.

If you’re a company looking for ways to let employees support the open source projects they depend on, I advocate the “¿por qué no los dos?” (why not both?) approach. Give your employees time to contribute effort in whatever way they’re able. But also give them a pool of money to sprinkle on the projects that provide value to your company.

GitHub is my Copilot

It isn’t, but I thought that made for a good title. You have probably heard about GitHub Copilot, the new AI-driven pair programming buddy. Copilot is trained on a wealth of publicly-available code, including code under copyleft licenses like the GPL. This has led many people to question the legality of using Copilot. Does code developed with it require a copyleft license?

The legal parts

Reminder: I am not a lawyer.

No. While I’d love to see the argument play out in court, I don’t think it’s an issue. For as much as I criticize how we apply AI in society, I don’t think this particular application is illegal. In the same way that the book I’m writing on program management isn’t a derivative work of all of the books and articles I’ve read over the years, Copilot-produced code isn’t a derivative work either.

“But, Ben,” you say. “What about the cases where machine learning models have produced verbatim snippets of code?” In those cases, I doubt the snippets rise to the level of copyrightability on their own. It’d be one thing to reproduce a dozen-line function. But two or three lines…eh.

The place where verbatim reproduction gets interesting is leaked secrets. I’ve seen anecdotal tales of Copilot helpfully suggesting private keys. This is either Copilot producing gibberish strings because it expects gibberish there, or Copilot reproducing a string that someone accidentally checked into a repo. The latter seems more likely. And it’s not a licensing concern at that point. I’m not sure it’s any legal concern at all. But it’s a concern to the owner of the secret if that information gets out into the wild.

The community parts

But being legally permissible doesn’t mean Copilot is acceptable to the community. It certainly feels like a two-trillion-dollar company (Microsoft, the parent of GitHub) taking advantage of individual and small-team developers—people who are generally under-resourced. I can’t argue with that. I understand why people would find it gross, even if it’s legal. Of course, open source licenses by nature often permit behavior we don’t like.

Pair programming works well, or so I’m told. If a service like Copilot can be that second pair of eyes sometimes, then it will have a net benefit for open and proprietary code alike. In the right context, I think it’s a good idea. The execution needs some refinement. It would be good to see GitHub proactively address the concerns of the community in services like this. I don’t think Copilot is necessarily the best solution, but it’s a starting point.

[Full disclosure: I own a limited number of Microsoft shares.]

How much will Windows 11 benefit Linux?

Microsoft’s announcement of the hardware requirements for Windows 11 caused quite a stir recently. In particular, the TPM 2.0 and processor requirements exclude a lot of perfectly-usable hardware. I’ve heard folks in the Linux community say this could be an opportunity for Linux to make inroads on the consumer desktop. I disagree.

In free/open source software, we have a tendency to assume that other people care about what we care about. That’s why our outreach efforts often fall flat. As I wrote in February: If we want to get the general public on board, we have to convince them in terms that make sense to their values and concerns, not ours.

The idea that Windows 11 will be a benefit for Linux is founded on the idea that people care what operating system they’re running. They’ll want to upgrade to Windows 11, the thinking goes, but realize they can’t. So this is an opportunity for them to try Linux instead.

The logic is sound, but the premise is flawed. The average user does not care—or maybe even know!—what operating system they have. They care about what the computer does, not what it is. They’ll keep using it until Microsoft drops support for the OS…and then they’ll keep using it well beyond that. That’s why Windows XP had a greater install base in August 2020 (6+ years after support ended) than Windows 8. It’s why Fedora Linux 20 machines still show up in repo data a dozen releases later. And it’s not just consumer devices: EPEL 5 still had plenty of activity long after RHEL 5 reached end of life.

For most people, the way they upgrade their operating system these days is by buying a new computer. So it never matters to them if their current computer can run the new version.

Do I like this move by Microsoft? No. I also didn’t like it when Fedora considered changing the CPU baseline last year. Thankfully, the community agreed that it was not the right decision. But whether I like it or not, I don’t expect that it will provide any meaningful boost in Linux desktop adoption.

We’ll have to find other ways to make inroads. Ways that resonate with how people use their computers.

What does it mean for a Linux distribution to be “fresh”?

I recently had a discussion with Luboš Kocman of openSUSE about how distros can monitor their “freshness”. In other words: how close is a distro to upstream? From our perspectives, it’s helpful to know which packages are significantly behind their upstreams. These packages represent areas that might need attention, whether that be a gentle nudge to the maintainer or recruiting additional volunteers from the community.

The challenge is that freshness can mean different things. The Repology project monitors a large number of distributions and upstreams to report on their status. But simply comparing the upstream version number to the packaged version number ignores a lot of very important context.
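For the curious, Repology exposes this data programmatically. Here’s a small Python sketch, assuming its public JSON API at /api/v1/project/&lt;name&gt; returns entries with repo, version, and status fields (the shape documented at the time of writing):

```python
# Sketch: list how one package fares across Fedora repos, per Repology.
# Assumes the /api/v1/project/<name> endpoint with repo/version/status fields.
import json
import urllib.request

def repology_status(project: str, repo_prefix: str = "fedora"):
    """Return (repo, version, status) rows for repos matching a prefix."""
    url = f"https://repology.org/api/v1/project/{project}"
    req = urllib.request.Request(url, headers={"User-Agent": "freshness-check/0.1"})
    with urllib.request.urlopen(req) as resp:
        entries = json.load(resp)
    return [(e["repo"], e["version"], e.get("status", "?"))
            for e in entries if e["repo"].startswith(repo_prefix)]

for repo, version, status in repology_status("curl"):
    print(f"{repo}: {version} ({status})")
```

But as the rest of this section argues, an “outdated” flag in that output doesn’t automatically mean a maintainer is asleep at the wheel.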

Updating to the latest upstream version as soon as it comes out is the most obvious definition of “fresh”, but it’s not always the best. Rolling releases (and their users) probably want that. In Fedora, the policy is to not do “major updates” within a release. Many other release-oriented distributions have a similar policy, with varying degrees of “major”. Enterprise distributions add another wrinkle: they’ll backport security fixes (and sometimes key features), so the difference in version numbers doesn’t necessarily tell you what’s missing.

Of course, the upstream’s version number doesn’t necessarily tell you much. Semantic versioning is great, but not everyone uses it. And not everyone who uses it uses it well. If a distribution has version 1.4 and upstream released 1.5, is that a lack of freshness or an intentional decision to avoid mid-release compatibility changes?
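To make that concrete, here’s a rough Python sketch of “fresh within policy”: only flag updates that a release-oriented distro would actually ship. It uses the third-party packaging library and assumes upstream follows semantic versioning, which, as just noted, many projects don’t:

```python
# Sketch: a package is "stale" only if upstream has a newer release in the
# same major series, mirroring a no-major-updates-within-a-release policy.
# Requires the third-party "packaging" library (pip install packaging).
from packaging.version import Version

def stale_within_policy(packaged: str, upstream: str) -> bool:
    p, u = Version(packaged), Version(upstream)
    if u <= p:
        return False  # already current (or carrying backports)
    if u.major != p.major:
        return False  # major bump: excluded by policy, not counted as stale
    return True       # a policy-compatible update exists: worth a nudge

print(stale_within_policy("1.4", "1.5"))  # True: compatible update available
print(stale_within_policy("1.4", "2.0"))  # False: major update, policy says wait
```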

I don’t have a good answer. This is a hard problem to solve. Something like Repology may be the best we can do with reasonable effort. But I’d love to have a more accurate view of how fresh Fedora packages are within the bounds of policy.