Maybe we should think about how we use language ecosystems

Over the weekend, Bleeping Computer reported on thousands of packages breaking because a developer intentionally inserted infinite loops into his package. He had grown frustrated with his volunteer labor being used by large corporations with no compensation. This brings up at least three issues that I see.

FOSS sustainability

How many times have we had to relearn this lesson? A key package somewhere in the dependency chain relies entirely on volunteer or vastly underfunded labor. The XKCD “Dependency” comic is only a year and a half old, but it represents a truth that we’ve known since at least the 2014 Heartbleed vulnerability. More recently, a series of log4j vulnerabilities made the holidays very unpleasant for folks tasked with remediation.

The log4j developers were volunteers, maintaining code that they didn’t particularly like but felt obligated to support. And they worked their butts off while receiving all manner of insults. The fact that seemingly the entire world depended on their code only became apparent once it was a problem.

Many people are paid well to maintain software on behalf of their employer. But certainly not everyone. And companies are generally not investing in the sustainability of the projects they rely on.

We depend on good behavior

The reason companies don’t invest in FOSS in proportion to the value they get from it is simple. They don’t have to. Open source licenses don’t (and can’t) require payment. And I don’t think they should. But companies have to see open source software as something to invest in for the long-term success of their own business. When they don’t, it harms the whole ecosystem.

I’ve seen a lot of “well, you chose a license that let them do that, so it’s your fault.” Yes and no. Just because people can build wildly profitable companies while underinvesting in the software they use doesn’t mean they should. I’m certainly sympathetic to the developer’s position here. Even the small, mostly unknown software that I’ve developed sometimes invokes an “ugh, why am I doing this for free?” from me, and no one is making money off it!

But we also depend on maintainers behaving. When they get frustrated, we expect they won’t take their ball and go home as in the left-pad case or insert malicious code as in this case. While the anger is understandable, a lot of other people got hurt in the process.

Blindly pulling from package repos is a bad idea

Speaking of lessons we’ve learned over and over again, it turns out that blindly pulling the latest version of a package from a repo is not a great idea. You never know what’s going to break, even when the breakage is accidental. This still seems to be a common practice in some language ecosystems, and it baffles me. With the increasing interest in software supply chains, I wonder if this is an area where large companies will suddenly decide to start paying attention.
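To illustrate the alternative, here’s a minimal sketch, assuming a Python project (the package names and version numbers are made up), of pinning dependencies and checking what’s actually installed rather than silently taking whatever is newest:

# A deliberately tiny check: compare installed versions against explicit pins.
# The packages and versions below are hypothetical examples.
from importlib.metadata import version, PackageNotFoundError

PINNED = {
    "requests": "2.28.1",
    "urllib3": "1.26.12",
}

for name, expected in PINNED.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        installed = "not installed"
    if installed != expected:
        print(f"{name}: pinned {expected}, found {installed}")

Lockfiles and pinned version specifiers in your package manager do this more robustly, of course. The point is simply that an explicit, reviewable record of versions beats implicitly grabbing the latest.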

My 2021 Todoist year in review

I use Todoist to manage my to-do lists. I was surprised to receive a “2021 year in review” email from the service the other day, so I thought I’d share some thoughts on it. I’m not what I’d call a productivity whiz, or even a particularly productive person. But the data provide some interesting insights.

2021 status

First, I apparently completed 2900 tasks in 2021. That’s an interestingly round number. I completed the most tasks in November, which was a little surprising. I don’t feel like November was a particularly busy month for me. However, I’m not surprised that May had my lowest number. I was pretty burnt out, had just handed off a major project at work, and took over e-learning for my kids. May was unpleasant.

Month        Tasks
January        201
February       205
March          254
April          261
May            196
June           260
July           201
August         227
September      251
October        280
November       296
December       268

Looking at the days of the week, Thursday had the most completed tasks. Saturday had the fewest (yay! I’m doing an okayish job of weekending.)

Zooming in on the time of day, the 10 AM–12 PM window was my most productive. That makes sense, since it’s after I’ve had a chance to sort through email and have my early meetings. I definitely tend to feel most productive during that time. Perhaps I should start blocking that out for focus work. Meanwhile, I was most likely to postpone tasks at 3 PM. That’s generally when I either recognize that I’m not going to get something done that day or decide I just don’t want to do that thing.

Making sense of it all

Todoist says I’m in the top 2% of users in 2021. Perhaps that argues against my “I’m not particularly productive” assertion. It’s more likely that I just outsource my task management more than the average person. I put a lot of trivial tasks in so that I can get that sweet dopamine hit, but also so that I just don’t have to think about them.

I don’t remember if Todoist did a year in review last year, but if they did I spent no time thinking about it. But based on what I’ve learned about the past year, I’m going to guard my late-morning time a little more jealously. I’ll try to save the trivial tasks for the last hour of the work day. This may prove challenging for me. It’s basically a more boring version of the marshmallow test.

Other writing: December 2021

What have I been writing when I haven’t been writing here?

Stuff I wrote

Fedora

Stuff I curated

Sysadvent

Fedora

Indiana COVID-19 update: 21 December 2021

Oh hey, one of these again. I’m mostly doing this as a timestamp of sorts. Indiana identified its first case of the Omicron variant about a week and a half ago. Given the 2–3 day doubling interval seen elsewhere, Indiana could potentially see daily case records by mid-January. Even if Omicron proves to be less virulent, the increased transmissibility may result in steady or increased hospitalizations and deaths. So that’s what the future might hold. Where does the present stand?

Current state

Cases have peaked after climbing since around Halloween. My “weekly cumulative cases change” (the change in the sum of the daily positive cases for the last seven days compared to the sum for the seven days prior) has been in the negative single digits for the last nine days. It was as high as 95% earlier this month. The rate of decrease has slowed a bit in the last few days, though.
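For the curious, the calculation is simple enough to sketch in a few lines of Python. The daily counts below are made up for illustration; this isn’t the actual dashboard code.

# "Weekly cumulative cases change": the last seven days' total compared to
# the previous seven days' total, expressed as a percentage.
daily_cases = [550, 600, 580, 620, 640, 610, 590,   # seven days prior
               560, 580, 555, 570, 565, 550, 545]   # most recent seven days

last_week = sum(daily_cases[-7:])
prior_week = sum(daily_cases[-14:-7])
change = 100 * (last_week - prior_week) / prior_week
print(f"{change:+.1f}%")  # about -6.3% for these made-up numbers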

Week-over-week (blue) and week-over-two-week (red) differences in COVID-19 cases

Hospitalizations have peaked as well. We spent five consecutive days above 3,000. While we’re below that number again, we’re still at a higher hospitalization rate than the peak of the Delta variant wave in late summer. Late last week, we hit a pandemic low for the percentage of available ICU bed capacity statewide. As I told my friend the other day, I’m not personally concerned about COVID; I’m concerned about driving.

Day-over-day (blue) and week-over-week (red) changes in hospitalizations.

It’s hard to tell if deaths are peaking or not. The numbers tend to get revised upward for longer and longer periods these days. I do know that, as of this writing, 53 people died a week ago. That’s the highest single-day death toll since early February. While the current cumulative weekly death difference shows a decline starting yesterday, I think the 15–20% numbers from a few days back are probably closer to reality. Considering that hospitalizations just peaked on Thursday, we’re probably a few days out from the peak in deaths.

Coming up

The Institute for Health Metrics and Evaluation hasn’t done an Indiana model run since 17 November. They’re currently trying to incorporate Omicron into the model. Looking at last winter, we’re at or slightly ahead of where we were this time a year ago. Depending on what measure you look at, the peak last year came in roughly mid-December. With vaccines available, we should see hospitalization and death rates far below that miserable winter. On the other hand, indoor masking is nearly non-existent and the Omicron variant presents a rather significant unknown.

Indiana is the worst state for COVID safety, with low vaccination and high hospitalization. This is a failure of leadership, especially considering that most deaths since July 1 would have been prevented with better vaccination rates. Nearly 25% of Indiana’s total COVID-19 fatalities could have been avoided had right-wing politicians and media not made COVID-19 into a culture war.

As usual, I’ll keep my dashboard updated most days that the Department of Health provides data.

The right of disattribution

While discussing the ttyp0 font license, Richard Fontana and I had a disagreement about its suitability for Fedora. My reasoning for putting it on the “good” list was taking shape as I wrote. Now that I’ve had some time to give it more thought, I want to share a more coherent (I hope) argument. The short version: authors have a fundamental right to require disattribution.

What is disattribution?

Disattribution is a word I invented because the dictionary has no antonym for attribution. Attribution, in the context of open works, means saying who authored the work you’re building on. For example, this post is under the Creative Commons Attribution-ShareAlike 4.0 license. That means you can use and remix it, provided you credit me (Attribution) and also let others use and remix your remix (ShareAlike). On the other hand, disattribution would say something like “you can use and remix this work, but don’t put my name on it.”

Why disattribution?

There are two related reasons an author might want to require disattribution. The first is that either the original work or potential derivatives are embarrassing. Here’s an example: in 8th grade, my friend wrote a few lines of a song about the destruction of Pompeii. He told me that I could write the rest of it on the condition that I didn’t tell anyone he had anything to do with it.

The other reason is more like brand protection, or perhaps avoiding market confusion. This isn’t necessarily due to embarrassment. Open source maintainers are often overworked. Getting bug reports and support requests about a derivative project from users confused about its origin is a situation worth avoiding.

Licenses that require attribution are uncontroversial. If we can embrace the right of authors to require attribution, we can embrace the right of authors to require disattribution.

Why not disattribution?

Richard’s concerns seemed less philosophical and more practical. Open source licenses are generally concerned with copyright law. Disattribution, particularly for the second reason, is closer to trademark law. But licenses are the tool we have available; don’t be surprised when we ask them to do more than they should.

Perhaps the bigger concern is the constraint it places on derivative works. The ttyp0 license requires not using “UW” as the foundry name. Richard’s concern was that two letters is too short a string to exclude. I don’t agree. There are plenty of ways to name a project that avoid one specific word. Even in this specific case, a name like “nuwave” would presumably be fine, despite containing “uw”, because it’s an unrelated word.

Excluding a specific word is fine. A requirement that excludes many words or imposes some other unreasonable constraint would be the only reason I’d reject such a license.

Other writing: November 2021

What have I been writing when I haven’t been writing here?

Now you see me

  • Compiler s01e08 (podcast) — I talk about contributing technical documentation in open source projects and why you (yes, you!) should contribute.

Stuff I wrote

Fedora

Stuff I curated

Fedora

Using variables in Smartsheet task names

I use Smartsheet to generate the Fedora Linux release schedules. I generally copy the previous release’s schedule forward and update the target release date. But then I have to search for the release number (and the next release number, the previous release number, and the previous-previous release number) to update them. Find and replace is a thing, but I don’t want to do it blindly.

But last week, I figured out a trick to use variables in the task names. This way when I copy a new schedule, I just have to update the number once and all of the numbers are updated automatically.

First you have to create a field in the Sheet Summary view. I called it “Release” and set it to be of the Text/Number type. I put the release number in there.

Then in the task name, I can use that field. What tripped me up at first was that I was trying to do variable substitution like you might do in the Bash shell. But really, what you need to do is string concatenation. So I’d use

="Fedora Linux " + Release# + " release"

This results in “Fedora Linux 37 release” when Release is set to 37. To get the next release, you do math on the variable:

="Fedora Linux " + (Release# + 1) + " release"

This results in “Fedora Linux 38 release” when Release is set to 37. Subtraction handles the previous releases the same way (see the example below). This might be obvious to people who use Smartsheet deeply, but for me, it was a fun discovery. It saves me literally minutes of work every three years.
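To spell out the previous-release case (my extrapolation from the pattern above, not a formula quoted from the actual schedule):

="Fedora Linux " + (Release# - 1) + " release"

This should give “Fedora Linux 36 release” when Release is set to 37, and (Release# - 2) should cover the previous-previous release.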

You can do real HPC in the cloud

The latest Top 500 list came out earlier this week. I’m generally indifferent to the Top 500 these days (in part because China has two exaflop systems that it didn’t submit benchmarks for). But for better or worse, it’s still an important measure for many HPC practitioners. And that’s why the fact that Microsoft Azure cracked the top 10 is such a big deal.

For years, I heard that the public cloud can’t be used for “real” HPC. Sure, you can do throughput workloads, or small MPI jobs as a code test, but once it’s time to run the production workload, it has to be bare metal. That has never been true. With a public cloud cluster as the 10th most powerful supercomputer* in the world, there’s no question that it can be done.

So the question becomes: should you do “real” HPC in the cloud? For whatever “real” means. There are cases where buying hardware and running it makes sense. There are cases where the flexibility of infrastructure-as-a-service wins. The answer has always been—and always will be—run the workload on the infrastructure that best fits the needs. To dismiss cloud for all use cases is petty gatekeeping.

I congratulate my friends at Azure for their work in making this happen. I couldn’t be happier for them. Most of the world’s HPC happens in small datacenters, not the large HPC centers that tend to dominate the Top 500. The better public cloud providers can serve the majority of the market, the better it is for us all.

Book review: The Address Book

How did your street get its name? When did we start numbering buildings? What does it mean to have an address—or to not have one? If any of these questions are interesting to you, you’ll appreciate The Address Book: What Street Addresses Reveal About Identity, Race, Wealth, and Power by Deirdre Mask.

I first heard about this book on the podcast “Every Little Thing”. Mask was a guest on a recent episode and shared the story of a project to name roads in rural West Virginia. That story brought back a memory I had long forgotten. Although I grew up on a named road, we didn’t have a numbered address until 911 service came to the area when I was in early elementary school. Prior to that, addresses were just box numbers on rural routes.

But newly-named and newly-numbered roads are not unique to the US. Mask explores how roads were named and renamed in different places over the centuries. Naming, of course, is an expression of power, so names and numbers reflect the power at the time. Even today, there are millions of people who don’t have addresses, which increasingly cuts them off from what we understand as modern society.

I’d love a book of trivia about road names. The Address Book is not that. But it’s a fascinating look at the deeper meaning behind the act of naming.

Zillow’s failure isn’t AI, it’s hubris

Zillow’s exit from the house-flipping arena was big news recently. In business news, the plummeting stock price and looming massive layoffs made headlines. In tech circles, the talk was about artificial intelligence and how Zillow’s algorithms failed them. And while I love me some AI criticism, I don’t think that’s what’s at play here.

Other so-called “iBuyers” haven’t suffered the same fate as Zillow. In fact, from the reporting I heard, they vastly outperformed Zillow. Maybe the competitors aren’t as reliant on AI as Zillow and that’s the difference. But I think a more likely cause is one we see time and time again: smart people believing in themselves too much.

Being smart isn’t a singular value. Domain and context play big roles. And yet we often see people who are very smart speak confidently on topics they know nothing about. (And yes, this post may be an example of that. I’d counter that this post isn’t really about Zillow; it’s about overconfidence, a subject I have a lot of experience with.) Zillow is really good at being a search engine for houses. It’s okay at estimating the value of houses. But that doesn’t necessarily translate to being good at flipping houses.

I’m sure there are ways the algorithm failed, too. But as in many cases, the problem isn’t AI as a technology; it’s how the AI is used. The lesson here, as in every AI failure, should be that we have to be a lot more careful with the decisions we trust to computers.