Should we treat OSD compliance as a binary?

So often, we think about whether a software license complies with the Open Source Definition (OSD) as a binary: it complies or it doesn’t. But the OSD has 10 criteria. If a license complies with all except for one of those criteria, it’s non-compliant, but is it non-compliant in the same way that a license that doesn’t comply with four criteria?

I got to thinking about this as I tried to come up with names for the four quadrants in Tobie Langel’s license classification chart. It occurred to me that the bottom half represented two concepts: not explicitly OSD-compliant because it was never submitted and explicitly not OSD-compliant because it violates one or more criteria.

A diagram of the open source landscape considering licenses and norms. Created by Tobie Langel and used under CC BY-SA 4.0.

There must be 50 ways to violate the OSD

Knowing how many (and which) criteria a non-compliant license meets is important. I argue that not allowing derived works is far more important to the idea of “open source in spirit” than not restricting other software by requiring all software distributed alongside it be free.

To add even more complication, not all violations of the same criteria are equal. A license that restricts users from hunting humans for sport would be seen more favorably than a license that restricts users from making ice cream.

Saying a license is OSD-compliant tells us something. Saying it is non-compliant tells us nothing.I don’t know if there’s a succinct way to express the 1,024 possible ways a license could be non-compliant. Certainly there is not if you also include the specific reasoning.

As I showed above, saying a license is 90% compliant is not particularly useful if the 10% is really important to you. And not all 90%s are created equal. It doesn’t make sense to put the criteria on a spectrum and describe the license by how far along it gets. Again, the violation may or may not matter for your purposes. And how can we say which criteria are most important in a way that will garner any sort of widespread support?

It may be possible to group the criteria into two or three broader categories. I’m not entirely sure that would be easy to express—certainly not in a simple chart.

Do we care?

And then there’s the question of if that even matters. I wrote last week’s “free and open source software is not the end goal” post as I thought about this question. From an intellectual property law standpoint, OSD compliance matters. (In that it gives you at least a broad idea of what you’re working with.) From a “why the hell am I writing this software to begin with?” standpoint, I’m not sure that it does.

We’re back to the beginning. If the goal is to write software that advances the state of humanity, you may choose a license that is explicitly not OSD-compliant because you don’t want it used for nefarious purposes. That’s a valid choice, although a very complicated one. Is it reasonable to lump that in with all of the other non-compliant licenses? The answer depends on your context.

There is no easy answer. Tobie’s other axis (follows norms) is also messy. Even more, probably, because there’s no defined standard to measure against. Perhaps for this purpose we continue to treat it as a binary after all. The model can show which quadrant a project falls in; understanding why is left as an exercise to the reader.

Refining the model to account for all (okay, some) of the complexities I’ve discussed would make an excellent dissertation topic for an aspiring PhD student.

Free and open source software is not the end goal

When I first started thinking about this article, the title was going to be “I don’t care about free software anymore.” But I figured that would be troll bait and I thought I should be a little less spicy. It’s true in a sense, though. I don’t care about free/open source software as an end goal.

The Free Software Foundation (FSF) says “free software is about having control over the technology we use in our homes, schools and businesses”. The point isn’t that the software itself is freely-licensed, it’s about what the software license permits or restricts. I used to think that free software was a necessary-but-insufficient condition for users having control over their computing. I don’t think that’s necessarily the case anymore.

Why free software might not matter

Software isn’t useful until someone uses it. So we should evaluate software in that context. And most software use these days involves 1. data and 2. computers outside the user’s control. We’ll get back to #2 in a moment, but I want to focus on the data. If Facebook provided the source code to their entire stack tomorrow—indeed, if they had done it from the beginning—that would do nothing to prevent the harms caused by that platform. One, it does nothing to diminish the “joys” of spreading disinformation. Two, it would be no guarantee that something else isn’t reading the data.

While we were so focused on the software, we essentially ignored the data. Now, the data is just as important, if not more, as the software. There are plenty of examples of this in my talk “We won. Now what?” presented at DevConf.CZ (25 minutes) and DevConf.US (40 minutes) last year. Being open is no guarantee of data protection, just as being proprietary is not guarantee of data harm.

We’ll always use other people’s computers

Let’s return to the “computers outside the user’s control” point. There’s a lot of truth to the “there is no cloud, there’s only other people’s computers” argument. And certainly if everyone ran their own services, that would reduce the risk of harm.

But here in the real world, that’s not going to happen. Most people cannot run their own software services—they have neither the skill nor the resources. Among those who do, many have no desire to. Apart from the impossibility of people running their own services, there’s the fact that communication means that the information lives in two places, so you’re still using someone else’s computer.

It’s all very complicated

There’s also the question of whether or not the absolutist view of software freedom is the right approach. The free software movement seems to be very libertarian in nature: if each user has freedom over their computing, that is a benefit to everyone. Others would argue (as the Ethical Source movement has) that enabling unethical uses of software is harmful. These two positions are at odds.

Whether or not you think the software license is the appropriate places to address this issue, I suspect many, if not most, developers would prefer that their software not be used for evil purposes. In order to enforce that, the software becomes non-free.

This is a complicated issue, with no right answer and no universal agreement. I don’t know what the way forward is, but I know that we cannot act like free software is the end goal. If we want to get the general public on board, we have to convince them in terms that make sense to their values and concerns, not ours. We must make software that is useful and usable in addition to being free. And we must understand that people choosing non-free software is not a moral failing but a decision to optimize for other values. We must update our worldview to match the 2020s; the 1990s are not coming back.

Thoughts on Elastic License v2

Yesterday Elastic announced a revision to their not-great Elastic License. The Elastic License v2 was updated based on feedback from the community and apparently had a lawyer’s input. And while they seem to be backing off trying to imply that it is open source (because it decidedly is not), it still doesn’t seem like a good license.

First of all, it doesn’t comply with the Open Source Definition, so if that’s important to you, that’s all you need to know. I’m assuming if you’re reading this, you care about the license beyond that. And while I’m not a lawyer (so this is very much not legal advice), here are my thoughts: it’s vague! Seriously, the vagueness makes it a big risk whether or not you care about OSD compliance (and there are many reasons you might not, as I’ll discuss in an upcoming post).

The first line in the Limitations section reads thus:

You may not provide the software to third parties as a hosted or managed service, where the service provides users with access to any substantial set of the features or functionality of the software.

This contains two things I have questions about. First of all, what is a “managed service” exactly? Does that include consulting services where someone provides direct management of a customer’s software? I have a good idea of what “managed service” means in industry terms, but if a licensor using this software decides they don’t like what you’re doing, there’s enough vagueness there for them to cause you problems. And of course, if you want to use it in a Software-as-a-Service model, you can’t use it under this license. You can use it under the SSPL, of course, but that is a non-starter for a lot of users.

Secondly, what is a “substantial set of the features or functionality of the software”? If someone does their own implementation of the functionality, does that count? If someone develops additional code that extends the functionality of the software and the upstream project later adds that functionality, does the additional code now violate the license?

Another problem is that it treats “you” and “your company” as distinct entities. This doesn’t make a lot of sense to me. If I use software on behalf of my employer, the employer is the licensee. The “Patents” section contains the only uses of “your company” and says “[i]f your company makes such a claim, your patent license ends immediately for work on behalf of your company”, but that’s redundant because the license was always for my company, not for me.

Frankly, I don’t see why anyone would use this license, particularly now that Amazon has forked the project.

What does “open source” mean in 2021?

The licensing discourse in the last few weeks has highlighted a difference between what “open source” means and what we’re talking about when we use the term. Strictly speaking, open source software is software released under a license approved by the Open Source Initiative. In most practical usage, we’re talking about software developed in a particular way. When we talk about open source, we talk about the communities of users and developers, (generally) not the license. “Open source” has come to define an ethos that was all have our own definition of.

Continue reading

Open source is still not a business model

If you thought 2021 was going to be the year without big drama in the world of open source licensing, you didn’t have to wait long to be disappointed. Two stories have already sprung up in the first few weeks of the year. They’re independent, but related. Both of them remind us that open source is a development model, not a business model.

Elasticsearch and Kibana

A few years ago, it seemed like I couldn’t go to any sysadmin/DevOps conference or meetup without hearing about the “ELK stack“. ELK stands for the three pieces of software involved: Elasticsearch, Logstash, and Kibana. Because it provided powerful aggregation, search, and visualization of arbitrary log files, it became very popular. This also meant that Amazon Web Services (AWS) saw value in providing an Elasticsearch service.

As companies moved more workloads to AWS it made sense to pay AWS for Amazon Elasticsearch Service instead of paying Elastic. This represented what you might call a revenue problem for Elastic. So they decided to follow MongoDB’s lead and change their license to the Server Side Public License (SSPL).

The SSPL is essentially a “you can’t use it, AWS” license. This makes it decidedly not open source. Insultingly, Elastic’s announcement and follow-up messaging include phrases like “doubling down on open”, implying that the SSPL is an open source license. It is not. It a source-available license. And, as open source business expert VM Brasseur writes, it creates business risk for companies that use Elasticsearch and Kibana.

Elastic is, of course, free to use whatever license it wants for the software it develops. And it’s free to want to make money. But it’s not reasonable to get mad at companies using the software under the license you chose to use for it. Picking a license is a business decision.

Shortly before I sat down to write this post, I saw that Amazon has forked Elasticsearch and Kibana. They will take the last-released versions and continue to develop them as open source projects under the Apache License v2. This is entirely permissible and to be expected when a project makes a significant licensing change. So now Elastic is in danger of a sizable portion of the community moving to the fork and away from their projects. If that pans out, it may end up being more harmful than Amazon Elasticsearch Service ever was.

Nmap Public Source License

The second story actually started in the fall of 2020, but didn’t seem to get much notice until after the new year. The developers of nmap, the widely-used security scanner, began using a new license. Prior to the release of version 7.90, nmap was under a modified version of the GNU General Public License version 2 (GPLv2). This license had some additional “gloss”, but was generally accepted by Linux distributions to be a valid free/open source software license.

With version 7.90, nmap is now under the Nmap Public Source License (NPSL). Version 0.92 of this license contained some phrasing that seemed objectionable. The Gentoo licenses team brought their concerns to the developers in a GitHub issue. Some of their concerns seemed like non-issues to me (and to the lawyers at work I consulted with on this), but one part in particular stood out.

Proprietary software companies wishing to use or incorporate Covered Software within their programs must contact Licensor to purchase a separate license

It seemed clear that the intent was to restrict proprietary software, not otherwise-compliant projects from companies that produce proprietary software. Nonetheless, as it was written, it constituted a violation of the Open Source Definition, and we rejected it for use in Fedora.

To their credit, the developers took the feedback well and quickly released an updated version of the license. They even retroactively licensed affected releases under the updated license. Unfortunately, version 0.93 still contains some problems. In particular, the annotations still express field of endeavor restrictions.

While the license text is the most important part, the annotations still matter. They indicate the intent of the license and guide the interpretation by lawyers and judges. So newer versions of nmap remain unsuitable for some distributions.

Licenses are not for you to be clever

Like with Elastic, I’m sympathetic to the nmap developers’ position. If someone is going to use their project to make money, they’d like to get paid, too. That’s an entirely reasonable position to take. But the way they went about it isn’t right. As noted in the GitHub issue, they’re not copyright attorneys. If they were, the license would be much better.

It seems like the developers are fine with people free-riding profit off of nmap so long as the software used to generate the profit is also open source. In that case, why not just use a professionally-drafted and vetted license like the AGPL? The NPSL is already using the GPLv2 and adding more stuff on top of it, and it’s the more stuff on top of it that’s causing problems.

Trying to write your business model into a software license that purports to be open source is a losing proposition.

Did software stagnate in 1996?

Betteridge’s Law says “no”. But in a blog post last week, Jonathan Edwards says “yes”. Specifically, he says:

Software is eating the world. But progress in software technology itself largely stalled around 1996. 

It’s not clear what Edwards thinks happened in 1996. Maybe he blames the introduction of the Palm PIlot? In any case, he argues that the developments since 1996 have all been incremental improvements upon existing technology. Nothing revolutionary has happened in programming languages, databases, etc.

This has real “old man yells at cloud” energy. Literally. He includes “AWS” in his list of technology he dismisses.

Edwards sets up a strawman to knock down. Maybe “[t]his is as good as it gets: a 50 year old OS, 30 year old text editors, and 25 year old languages,” he proposes. “Bullshit,” he says.

I’d employ my expletive differently: who gives a shit?

Programming does not exist for the benefit of programmers. Software is written to do something for people. The universe of what is possible with computing is inarguably broader than in 1996. Much of that is owed to improvements in hardware, to be sure. And you can certainly argue of what’s possible with computing is bad. But that’s not what’s at issue here.

I don’t see carpenters bemoaning the lack of innovation in hammers. Software development isn’t special. It’s a trade like any other. And if the tools are working, let them work.

I won’t even bother with his “open source is stifling innovation” nonsense. Rebutting that is left as an exercise to the reader.

Moving the website to Lektor

Years ago, I moved all of funnelfiasco.com (except the blog, which runs on WordPress) from artisinally hand-crafted HTML to using a static site generator. At the time, I chose a project called “blatter” which used jinja2 templates to generate a site. This gave me the opportunity to change basic information across the whole site at once. Not something I do often, but it’s a pain when I do.

Unfortunately, blatter was apparently quietly abandoned by the developer. This wasn’t really a problem until Python 2 reached end of life. Fedora (reasonably) retired much of the Python 2 ecosystem. I tried to port it to Python 3, but ran into a few problems. And frankly, the idea of taking on the maintenance burden for a project that hadn’t been updated in years was not at all appealing. So I went looking for something else.

I wanted to find something that used jinja2 in order to minimize the amount of work involved. I also wanted something focused on websites, not blogs specifically. It seems like so many platforms today are blog-first. That’s fine, it’s just not what I want. After some searching and a little bit of trial and error, I ended up selecting Lektor.

The good

Lektor is written in (primarily) Python 3 and uses jinja2 templates, so it hit my most important points. It has a command to run a local webserver for testing. In addition, you can set up multiple servers configurations for deployment. So I can have the content sync to my local web server to verify it and then deploy that to my “production” webserver. Builds are destructive, but the deploys are not, which means I don’t have to shoe-horn everything into Lektor.

Another great feature is the ability to programmatically generate thumbnails of images. I’ve made a little bit of use of that for the time being. In the future, especially if I ever go storm chasing again, I can see myself using that feature a lot more.

Lektor optionally supports writing the page content in markdown. I haven’t done this much since I was migrating pre-written content. I expect new content will be much markdownier. Markdown isn’t flexible enough for a lot of web purposes, but it covers some use cases well. Why write HTML when it’s not needed?

Lektor uses databags to provide input data to templates. I do this using JSON files. Complex operations with that are a lot easier than the embedded Python data structures that Blatter supported.

If I were interested in translating my site into multiple languages, Lektor has good support for that (including changing URLs). It also has a built-in admin and editing console, which is not something I use, but I can see the appeal.

The bad

Unlike Blatter, Lektor puts contents and templates in separate files. This makes it a little more difficult to special-case a specific site.

It also has a “one directory, one file” paradigm. Directories can have “attachments”, which can include html files, but they won’t get processed, so they need to stand alone. This is not such an issue if you’re starting from scratch. Since I’m not, it was more of a headache. You can overwrite the page’s slug, but that also makes certain assumptions.

For the Forecast Discussion Hall of Fame, I wanted to keep URLs as-is. That site has been linked to from a lot of places, and I’d hate to break those inbound links. Writing an htaccess file to redirect to the new URLs didn’t sound ideal either. I ended up writing a one-line patch that passed the argument I need to the python-slugify library. I tried to do it the right way so that it would be configurable, but it was beyond my skill to do so.

The big down side is the fact that the development has ground to a halt. It’s not abandoned, but the development activity happens in spurts. Right now it’s doing what I need it to do, but I worry at some point I’ll have to make a switch again. I’d like to contribute more upstream, but my skills are not advanced enough for this.

GitHub should stand up to the RIAA over youtube-dl

Earlier this week, GitHub took down the repository for the youtube-dl project. This came in response to a request from the RIAA—the recording industry’s lobbying and harassment body. youtube-dl is a tool for downloading videos. The RIAA argued that this violates the anticircumvention protections of the Digital Millennium Copyright Act (DMCA). While GitHub taking down the repository and its forks is true to the principle of minimizing corporate risk, it’s the wrong choice.

Microsoft—currently the world’s second-most valuable company with a market capitalization of $1.64 trillion—owns GitHub. If anyone is in a position to fight back on this, it’s Microsoft. Microsoft’s lawyers should have a one word answer to the RIAA’s request: “no”. (full disclosure: I own a small number of shares of Microsoft)

The procedural argument

The first reason to tell the RIAA where to stick it is procedural. The RIAA isn’t arguing that youtube-dl is infringing its copyrights or circumventing its protections. It is arguing that youtube-dl infringes YouTube’s protections. So even if it is, that’s YouTube’s problem, not the RIAA’s.

The factual argument

I have some sympathy for the anticircumvention argument. I’m not familiar with the specifics of how youtube-dl works, but it’s at least possible that youtube-dl circumvents YouTube’s copy protection. This would be a reasonable basis for YouTube to take action. Again, YouTube, not the RIAA.

I have less sympathy for the infringement argument. youtube-dl doesn’t induce infringement more than a web browser or screen recorder does. There are a variety of uses for youtube-dl that are not infringing. Foremost is the fact that some YouTube videos are under a license that explicitly allows sharing and remixing. Archivers use it to archive content. Some people who have time-variable Internet billing use it to download videos overnight.

So, yes, youtube-dl can be used to infringe the RIAA’s copyrights. It can also be used for non-infringing purposes. The code itself does not infringe. There’s nothing about it that gives the RIAA a justification to take it down.

youtube-dl isn’t the whole story

youtube-dl provides a focal point, but there’s more to it. Copyright law is now used to suppress instead of promote creative works. The DMCA, in particular, favors the large rightsholders over smaller developers and creators. It essentially forces sites to act on a “guilty until proven innocent” model. Companies in a position to push back have an obligation to do so. Microsoft has become a supporter of open source, now it’s time to show they mean it.

We should also consider the risks of consolidation. git is a decentralized system. GitHub has essentially centralized it. Sure, many competitors exist, but GitHub has become the default place to host open source code projects. The fact that GitHub’s code is proprietary is immaterial to this point. A FOSS service would pose the same risk if it became the centralized service.

I saw a quote on this discussion (which I can’t find now) that said “code is free, infrastructure is not.” And while projects self-hosting their code repository, issue tracker, etc may be philosophically appealing, that’s not realistic. Software-as-a-Service has lowered the barrier for starting projects, which is a good thing. But it doesn’t come without risk, which we are now seeing.

I don’t know what the right answer is for this. I know the answer won’t be easy. But both this specific case and the general issues they highlight are important for us to think about.

Linux distros should be opinionated

Last week, the upstream project for a package I maintain was discussing whether or not to enable autosave in the default configuration. I said if the project doesn’t, I may consider making that the default in the Fedora package. Another commenter said “is it a good idea to have different default settings per packaging ? (ubuntu/fedora/windows)”

My take? Absolutely yes. As I said in the post on “rolling stable” distros, a Linux distribution is more than an assortment of packages; it is a cohesive whole. This necessarily requires changes to upstream defaults.

Changes to enable a functional, cohesive whole are necessary, of course. But there’s more than “it works”, there’s “it works the way we think it should.” A Linux distribution targets a certain audience (or audiences). Distribution maintainers have to make choices to make the distro meet that audience’s needs. They are not mindless build systems.

Of course, opinions do have a cost. If a particular piece of software works differently from one distro to another, users get confused. Documentation may be wrong, sometimes harmfully so. Upstream developers may have trouble debugging issues if they are not familiar with the distro’s changes.

Thus, opinions should be implemented judiciously. But when a maintainer has given a change due thought, they should make it.

What do “rolling release” and “stable” mean in the context of operating systems?

In a recent post on his blog, Chris Siebenmann wrote about his experience with Fedora upgrades and how, because of some of the non-standard things he does, upgrades are painful for him. At the end, he said “What I really want is a rolling release of ‘stable’ Fedora, with no big bangs of major releases, but this will probably never exist.”

I’m sympathetic to that position. Despite the fact that developers have worked to improve the ease of upgrades over the years, they are inherently risky. But what would a stable rolling release look like?

“Arch!” you say. That’s not wrong, but it also misses the point. What people generally want is new stuff so long as it doesn’t cause surprise. Rolling releases don’t prevent that, they spread it out. With Fedora’s policy, for example, major changes (should) happen as the release is being developed. Once it’s out, you get bugfixes and minor enhancements, but no big changes. You get the stability.

On the other hand, you can run Fedora Rawhide, which gets you the new stuff as soon as it’s available, but you don’t know when the big changes will come. And sometimes, the changes (big and little) are broken. It can be nice because you get the newness quickly. And the major changes (in theory) don’t all come at once.

Rate of change versus total change

For some people, it’s the distribution of change, not the total amount of change that makes rolling releases compelling. And in most cases, the changes aren’t that dramatic. When updates are loosely-coupled or totally independent, the timing doesn’t matter. The average user won’t even notice the vast majority of them.

But what happens when a really monumental change comes in? Switching the init system, for example, is kind of a big deal. In this case, you generally want the integration that most distributions provide. It’s not just that you get an assortment of packages from your distribution, it’s that you get a set of packages that work together. This is a fundamental feature for a Linux distribution (excepting those where do-it-yourself is the point).

Applying it to Fedora

An alternate phrasing of what I understand Chris to want is “release-quality packages made available when they’re ready, not on the release schedule.” That’s perfectly reasonable. And in general, that’s what Fedora wants Rawhide to be. It’s something we’re working on, particularly with the ability to gate Rawhide updates.

But part of why we have defined releases is to ensure the desired stability. The QA team and other testers put a lot of effort into automated and manual tests of releases. It’s hard to test against the release criteria when the target keeps shifting. It’s hard to make the distribution a cohesive whole instead of a collection of packages.

What Chris asks for isn’t wrong or unreasonable. But it’s also a difficult task to undertake and sustain. This is one area where ostree-based variants like Fedora CoreOS (for servers/cloud), Silverblue (for desktops), and IoT (for edge devices) bring a lot benefit. The big changes can be easily rolled back if there are problems.