What I told my kids this morning

It’s more important than ever that we work very hard to be the people we want to be. We must be kind and loving. We must care for ourselves, each other, and those around us who need our help.

Things are not okay. I cannot promise that they will be okay. What I can promise is that we will do our best to love and support you no matter what goes on in the world around us.

Other writing: October 2024

What have I been writing when I haven’t been writing here?

Duck Alignment Academy

Kusari

IT Business Net

GUAC

100 miles on a bike

Sometime in August, I saw a post on Facebook about a fundraiser for the American Heart Association. The goal was to bike for 100 miles in the month of September and get donations. “What the heck? Why not?” I said to myself in a fit of committing myself to things I don’t have the capacity for. It’s on brand, you have to give me that much.

Keep in mind, I haven’t biked 100 miles in the last decade. There was a time when I commuted to work on my bike a couple of times a week. I was much younger then. But what the heck, there’s no reason I couldn’t do this. Plus, my doctor wants me to lose a few pounds, anyway.

The ride

The hardest part, I knew, would be finding the time to ride. It’s a busy time with kids’ activities and whatnot, so I had to get the miles in where I could. I got off to a strong start on Labor Day weekend, and used Sundays to good effect, generally. I snuck in some midday and evening rides when I could.

A bar chart of daily miles for the month of September. On the first, second, 8th, and 15th, I rode more than 10 miles.

Here’s a thing you might not know: Indiana isn’t all flat. The area I live now is far flatter than where I grew up, but it’s not without some hills. The Wabash River, over the millennia, has carved some contours into the elevation map. As an unfortunate result, most of the interesting places to ride are downhill from my house. I used the bike rack at first, but after I’d done a few rides, I got up the nerve to tackle the hill. As you might have guessed, I survived, but it wasn’t always pleasant. On one ride, I went all the way up the trail through Happy Hollow Park (and then back down Happy Hollow Road, which was fun). Only later did I think “oh yeah, I still need to get back up to my house.” My heart rate hit the low 180s, but I got home without walking the bike.

I also met my (admittedly modest) fundraising goal. I tried to goad people into donating more by saying I’d add an extra mile for every $10 over the goal before I reach the 100 mark. But I was chicken and didn’t make that offer until I was almost there.

Line graph of miles and donations over the course of September.

The joy

The exercise was good, as was the fundraising. But the best part was just the joy of being out and about. I’m unabashedly a fan of Greater Lafayette, and I tried to plan my routes in such a way that I could enjoy some of what makes it My City. Some of the places I enjoyed:

Sometimes I rode solo, which gave me some rare alone time. Sometimes I rode with my wife. Sometimes I rode with my youngest kid. Sometimes my two youngest kids and my wife and I all rode around the neighborhood together.

I don’t know if I’ll want to put myself through the stress of trying to make sure I can meet my goal again, but it definitely got me more active and wanting to spend more time on my bike.

Other writing: September 2024

What have I been writing when I haven’t been writing here?

Security Boulevard

Duck Alignment Academy

GUAC

Kusari

Other writing: August 2024

What have I been writing when I haven’t been writing here?

Duck Alignment Academy

  • “Helping” versus being helpful — Every situation is different, so ask “would it be helpful if I _?” When you get agreement on your help, your work will be helpful.
  • Volunteers can’t be blockers — Figure out the truly critical processes and accept that non-critical ones may get missed. Give critical processes to paid contributors.
  • Companies: make sustainable contributions — We learned over the past year or two that corporate participation in an open source project is not guaranteed. The contributions a company makes need to be sustainable long after the company stops participating.
  • Strategic use of bikeshedding — Painting a bike shed can be an excellent icebreaker when getting a group to review a document draft.

GUAC

Kusari

Does it matter if you anger practitioners and enthusiasts?

I put this topic on my kanban board in August of 2023. At the time, Hashicorp had just angered a lot of people by switching their popular tools to the Business Source License (BuSL). This was not long after Red Hat angered a lot of people by changing how they made source code for Red Hat Enterprise Linux (RHEL) public. While the anger was numerous and voluminous, my suspicion was that it didn’t matter.

The people who buy enterprise software are, by and large, not the enthusiasts who care about the practical or philosophical importance of open source software. You can anger the nerds (I use that term with endearment, as I am one, too) all you want, because they’re not the ones who sign the checks. This is a cynical take, but Hashicorp’s subsequent (pending) acquisition by IBM would seem to reinforce it. Much of the analysis I read at the time suggested that switching to the BuSL was part of Hashicorp trying to be more appealing to potential buyers and perhaps the reason that IBM made the acquisition offer. I’m not convinced of the latter, since IBM was willing to pay a lot more for Red Hat, who open sourced everything*.

It doesn’t seem to matter

Shortly after the Hashicorp acquisition was announced, SJVN wrote an article in which he noted that IBM’s share price had fallen 8% on the news. This, combined with the fact that Hashicorp stock fell 22% between the announcement of the BuSL switch and the acquisition news, led him to wonder if the acquisition was a “blunder” on IBM’s part. But as of Friday’s close, just over four months since the news was announced, IBM shares are up almost 10% from the pre-announcement price and have reached a one-year high. If the price isn’t quite at an all-time high, it’s within spitting distance of the 2013 peak.

But now we have actual analysis. RedMonk analyst Rachel Stephens published an article last week examining the effect of license changes on Hashicorp, MongoDB, Elastic, and Confluent. You should read the whole article, but the quick summary is that the license changes appear to have no impact on revenue. The changes in valuation and net income are a little messier, but the conclusion holds: “here does not seem to be a clear link between moving from an open source to proprietary license and increasing the company’s value.”

Importantly, though, the analysis does not show a clear link between moving to a proprietary license and decreasing the company’s value. It’s a small sample size, but the best we can tell, changing the license doesn’t matter either way. If your primary business is enterprises, not SMBs or individuals, there’s little evidence that a change that angers practitioners matters, at least in the short term.

What about Elasticsearch?

Elastic dropped a surprise on Thursday. They announced that Elasticsearch is once again open source, this time under the AGPL. Elastic announced a change from the Apache Software License in January 2021, driven in part by issues with Amazon Web Services. Does this mean they finally caved to pressure from the people who were upset about the initial change?

I don’t think so. On the same day Elastic announced the shift to the AGPL, they also released their quarterly earnings. Although the earnings and revenue beat expectations, the company cut their guidance from the coming quarter. The market was displeased, and about a quarter of Elastic’s market value went poof in a single day.

I’m not a market analyst and I don’t have any inside knowledge, so the best I can do is speculate based on what I know about the software industry. But my take is that this license change is less about responding to the pressure from open source fans and more about reducing ambiguity for potential adoptees. The Server Side Public License has some vague terms that could scare away someone trying the free offering, and a better-known license can remove that friction. Is this a win? Maybe, but not a philosophically-driven one.

Elastic’s re-re-license announcement mentions working issues out with AWS. The fact that they switched to the AGPL seems like a defensive move against future perceived shenanigans. If this is the primary motivator, then it seems like the move away from open source was successful and justifies the first license change.

So what do we do?

Anger from practitioners doesn’t seem like a meaningful consequence, so how do we prevent the sort of license changes that have caused such a stir in the last few years? If I had an easy answer, I’d get paid a lot of money as a consultant. (Spoiler: I do not get paid a lot of money as a consultant.)

But it’s clear when making the case for open source at our employers that we should not rely on philosophical arguments (because those only work when money is cheap and times are good). Nor should we rely on a “this will kill us with enthusiasts” because it turns out that might not matter.

The case for open source has to be presented in terms that matter to the business. This doesn’t have to be revenue or profit, which is good since the links are not always direct. But it does need to connect in clear, direct ways to the business’s goals, strategy, or other decision influences.

The role of the Linux distro in modern computing

John Mark Walker recently published a post on how the Linux distribution has lost prominence in the DevOps ecosystem (right around the time we started using the term “DevOps”) and how it might be poised for a comeback. His take is centered around the idea of the “download your dependencies on the fly as you build and deploy software” being a risky proposition, supply-chain-securitally-speaking.

I can’t disagree with that. I’ve always felt that modern language ecosystems are bonkers in that regard. Yes, it’s great that there are better dependency relationships than in C/C++ where the relationships might as well be “LOL, GFY”. But the idea that you’d just download some libraries on the fly and hope it all works as expected? Well, we’ve learned a few times how that can go sideways.

So, yeah. I generally agree with what John Mark says here. Heck, I’ve even given a talk on several occasions with a premise of “operating systems are boring.” So can the Linux distribution become un-boring again and help us fix our woeful supply chains?

Why distros aren’t the answer

The fundamental problem that Linux distributions face is sometimes called “too fast, too slow.” Distributions prioritize stability and coherence to some degree, even the cutting edge distributions. Enterprise distributions — or more accurately, their customers — place a particular emphasis on long-term stability. But development tools and programming languages change quickly.

Some applications need the latest version of libraries. Others update slowly and need old versions. It’s hard to meet both of those needs at the same time. Various attempts have been made, like Fedora’s Modularity, but there’s no great answer. The modern language ecosystems exist, in part, to sidestep this problem.

Another problem is that not all distribution package maintainers are technically solid or even ethical. The vast majority are, of course, but there are exceptions. As a package maintainer myself, I’m not adding a ton of value from a supply chain security standpoint. I take the upstream sources, wrap them in an RPM spec file, and submit them to the build system.

Distributions also create a disconnect between developers and end users. This separation has some value, but it also creates additional points for slow bug fixes, malware injection, and social engineering attacks.

Why distros can be the answer

While it’s true that not all distribution package maintainers are capable of fixing issues in the packages they maintain, some can. And even when a maintainer can’t fix an issue, they can get help from other maintainers in the distribution’s community. Issues that upstream chooses not to fix can be fixed in the distribution’s build.

Distributions aren’t just RPMs or debs or whatever anymore. Contained application delivery methods like Flatpak, Snap, and toolbx give distributions the ability to provide curated environments in a way that improves parallel installability. Of course, upstreams can produce their own containers, but this gives end users the ability to pick the right option for their needs.

In conclusion

I’m not sure I see distributions making the comeback that John Mark hopes for. Enterprise distros move too slowly, by and large, to address the needs of language ecosystems. And there’s not a ton of value in distributions blindly packaging language libraries.

John Mark suggested that organizations would want to “outsource risk mitigation to a curated distribution as much as possible.” I don’t disagree. But community distributions can’t take on the risk that companies want to outsource.

That said, the problem won’t fix itself, so let’s work toward something. If that ends up being Linux distributions, then great. If not, distributions will still have an important role to play.

Other writing: July 2024

What have I been writing when I haven’t been writing here?

Duck Alignment Academy

Kusari

GUAC

Other writing: June 2024

What have I been writing when I haven’t been writing here?

Duck Alignment Academy

Kusari

Open source AI and open data

I’m a little late to the party with this post, but I need to get it out of my head. The question of “what is ‘open source AI’, exactly?” has been a hot topic in some circles for a while now. The Open Source Initiative, keepers of the Open Source Definition, have been working on developing a definition for open source AI. The latest draft notably does not require the training data to be available under an open license. I believe this is a mistake.

Open source AI must include open data

Data is critical to modern computing. I called this out in a 2020 DevConf talk and I can hardly claim to be the first or only person to make this observation. More recently, Tom “spot” Callaway wrote his objections to a definition of “open source AI” that doesn’t include open data. My objections (and I venture to say spot’s as well) have nothing to do with ideological purity. I wrote over three years ago that I don’t care about free/open source software as an end goal. What matters is the human impact.

Even before ChatGPT hit the scene, there were countless examples of AI exacerbating biases and inequities. Part of addressing that issue is providing a better training data set. But if we don’t know what an AI model is trained on, we don’t know what sort of biases it’s reproducing. This is a data problem, not a model weights problem. The most advanced AI in the world is still going to produce biased output if trained on biased sources.

OSI attempts to address this by requiring “data information.” This is insufficient. I’ll again defer to spot to make this case better than I could. OSI raises valid points about how rules governing data can be different than those covering code. Oh well. The solution is to acknowledge that some models won’t meet the requirements instead of watering down the requirements.

No one is owed an “open source AI”

Part of the motivation behind OSI’s choices here seem to be the creation of a definition that commercially-viable AI models can meet. They say “We need an Open Source AI Definition that can effectively guide users and developers to make the right choice. We need one that doesn’t put developers of Open Source AI at a disadvantage compared to proprietary ones.” Tara Tarakiyee wrote in response “Well, if the price of making Open Source ‘AI’ competitive with proprietary ‘AI’ is to break the openness that is fundamental to the definition, then why are we doing it?”

I agree with Tara. His whole post is well worth a read. But what this particular thread comes down to is this: we don’t owe anyone a commercially-viable definition just because doing otherwise is hard. There’s nothing in the Open Source Definition that says “but you can skip some of these requirements if you can’t figure out how to make money.”

“Can’t” and “won’t” aren’t the same thing

I’ve seen some people argue that creating an definition that results in zero “open source AI” models is useless. It’s important to distinguish here between “can’t” and “won’t”: they are not the same.

It’s true that a definition that no model could possibly meet is useless. But a definition that no model currently chooses to meet is valuable. AI developers could certainly choose to make their training data available. If they don’t want to, they don’t get to call their model open source. It’s the same as wanting to release software under a license that doesn’t meet some part of the Open Source Definition. As I said in the previous section, no one is owed a definition that meets their business needs.

The argument is silly, anyway. There are at least two models that would meet a more appropriate definition.

Where to go from here?

I wrote this post because I needed to get the words out of my head and onto “paper”. I have no expectation it will change the direction of OSI’s next draft. They seem pretty committed to their choice at this point. I’m not really sure what is gained by making this compromise. Nothing of worth, I think.

This is a problem we should have been addressing years ago, instead of rushing to catch up once the cat was out of the proverbial bag, Collectively, we seem to have a tendency to skate to where the puck was, not where it will be. This isn’t the first time. At FOSDEM 2021, Bradley Kuhn said something to the effect of “if I would have known proprietary software would be funded by advertising instead of license sales, I would have done a lot of things differently.”

I’m not sure what the next big challenge will be. But you can be sure if I figure it out, I’ll push a lot harder to address it before we get passed by again.