Reading code is just as important as writing it

Recently, my friend said she quit a job leading development at a company after the CTO went on a 15-minute rant when she said reading code is just as important as writing it. I don’t blame her. I agree. I might even go as far as saying it’s more important.

Of course, there’s no code to read if no one writes code. I’m not saying that writing code is unimportant. But even for developers, reading is a critical skill. You’re not writing new code all of the time. So much development work is reading existing code to fix bugs, add new features, etc. Of course, developers spend a lot of time reading code people wrote on Stack Overflow, too.

Reading code is also critical for people who aren’t developers. I could probably code my way out of a proverbial paper sack, but just barely. But I’ve been able to point developers in the right direction by being able to dig into code. The ability to read code—even if you don’t fully comprehend it—is invaluable in troubleshooting technical issues.

Isn’t it better to contribute code than money?

Recently, I was in a discussion about making contributions to open source projects. One person said it would be nice if their employer gave each employee a budget that could be directed to open source projects at the employee’s discretion. The idea is that it would be a way for employees to support the specific projects that make their jobs or lives better. Another person asked, “isn’t it better to contribute code to the project?”

No, it is not. Even in software companies, a large percentage of employees lack the skills necessary to make meaningful code contributions to projects. Even when you consider (the very valuable) non-code contributions like documentation, testing, graphic design, et cetera, money is quicker and easier.

Money gives the project maintainers the flexibility to put it where they need it. They could buy test hardware, pay for web hosting, hire a contractor, or buy themselves a nice cup of coffee. Whatever. This is the same reason charities prefer money over goods for disaster relief donations.

Of course, money isn’t perfect either. Not all projects are equipped to accept financial donations. Even if there’s a way to route money to them, they may not want to deal with tax implications. Loosely-governed projects may not have a good mechanism for deciding how to spend the money. Money can make relationships go south in a hurry.

If you’re a company looking for ways to let employees support the open source projects that they depend on, I advocate the “¿por que no los dos?” approach. Give your employees time to contribute effort in whatever way they’re able. But also give them a pool of money to sprinkle on the projects that provide value to your company.

GitHub is my Copilot

It isn’t, but I thought that made for a good title. You have probably heard about GitHub Copilot, the new AI-driven pair programming buddy. Copilot is trained on a wealth of publicly-available code, including code under copyleft licenses like the GPL. This has led many people to question the legality of using Copilot. Does code developed with it require a copyleft license?

The legal parts

Reminder: I am not a lawyer.

No. While I’d love to see the argument play out in court, I don’t think it’s an issue. For as much as I criticize how we apply AI in society, I don’t think this is an illegal case. In the same way that the book I’m writing on program management isn’t a derivative work of all of the books and articles I’ve read over the years, Copilot-produced code isn’t a derivative work either.

“But, Ben,” you say. “What about the cases where machine learning models have produced verbatim snippets from code?” In those cases, I doubt the snippets rise to the level of copyrightability on their own. It’d be one thing to reproduce a dozen-line function. But even giving two or three lines…eh.

Where verbatim reproduction gets interesting is in leaking secrets. I’ve seen anecdotal tales of Copilot helpfully suggesting private keys. This is either Copilot producing strings that are gibberish because it expects gibberish, or Copilot producing a string that someone accidentally checked into a repo. The latter seems more likely. And it’s not a licensing concern at that point. I’m not sure it’s any legal concern at all. But it’s a concern to the owner of the secret if that information gets out into the wild.

The community parts

But being legally permissible doesn’t mean Copilot is acceptable to the community. It certainly feels like it’s a two-trillion dollar company (Microsoft, the parent of GitHub) taking advantage of individual and small-team developers—people who are generally under-resourced. I can’t argue with that. I understand why people would find it gross, even if it’s legal. Of course, open source licenses by nature often permit behavior we don’t like.

Pair programming works well, or so I’m told. If a service like Copilot can be that second pair of eyes sometimes, then it will have a net benefit for open and proprietary code alike. In the right context, I think it’s a good idea. The execution needs some refinement. It would be good to see GitHub proactively address the concerns of the community in services like this. I don’t think Copilot is necessarily the best solution, but it’s a starting point.

[Full disclosure: I own a limited number of Microsoft shares.]

How much will Windows 11 benefit Linux?

Microsoft’s announcement of the hardware requirements for Windows 11 caused quite a stir recently. In particular, the TPM 2.0 and processor requirements exclude a lot of perfectly-usable hardware. I’ve heard folks in the Linux community say this could be an opportunity for Linux to make inroads on the consumer desktop. I disagree.

In free/open source software, we have a tendency to assume that other people care about what we care about. That’s why our outreach efforts often fall flat. As I wrote in February: If we want to get the general public on board, we have to convince them in terms that make sense to their values and concerns, not ours.

The idea that Windows 11 will be a benefit for Linux is founded on the idea that people care what operating system they’re running. They’ll want to upgrade to Windows 11, the thinking goes, but realize they can’t. So this is an opportunity for them to try Linux instead.

The logic is sound, but the premise is flawed. The average user does not care—or maybe even know!—what operating system they have. They care about what the computer does, not what it is. They’ll keep using it until Microsoft drops support for the OS…and then they’ll keep using it well beyond that. That’s why Windows XP had a greater install base in August 2020 (6+ years after support ended) than Windows 8. It’s why Fedora Linux 20 machines still show in repo data a dozen releases later. And it’s not just consumer devices. EPEL 5 still had plenty of activity long after RHEL 5 reached end of life.

For most people, the way they upgrade their operating system these days is by buying a new computer. So it never matters to them if their current computer can run the new version.

Do I like this move by Microsoft? No. I also didn’t like it when Fedora considered changing the CPU baseline last year. Thankfully, the community agreed that it was not the right decision. But whether I like it or not, I don’t expect that it will provide any meaningful boost in Linux desktop adoption.

We’ll have to find other ways to make inroads. Ways that resonate with how people use their computers.

What does it mean for a Linux distribution to be “fresh”?

I recently had a discussion with Luboš Kocman of openSUSE about how distros can monitor their “freshness”. In other words: how close is a distro to upstream? From our perspectives, it’s helpful to know which packages are significantly behind their upstreams. These packages represent areas that might need attention, whether that be a gentle nudge to the maintainer or recruiting additional volunteers from the community.

The challenge is that freshness can mean different things. The Repology project monitors a large number of distributions and upstreams to report on the status. But simply comparing the upstream version number to the packaged version number ignores a lot of very important context.

Updating to the latest upstream version as soon as it comes out is the most obvious definition of “fresh”, but it’s not always the best. Rolling releases (and their users) probably want that. In Fedora, policy is to not do “major updates” within a release. Many other release-oriented distributions have a similar policy, with varying degrees of “major”. Enterprise distributions add another wrinkle: they’ll backport security fixes (and sometimes key features), so the difference in version number doesn’t necessarily tell you what’s missing.

Of course, the upstream’s version number doesn’t necessarily tell you much. Semantic versioning is great, but not everyone uses it. And not everyone that uses it uses it well. If a distribution has version 1.4 and upstream released 1.5, is that a lack of freshness or an intentional decision to avoid mid-release compatibility changes?

I don’t have a good answer. This is a hard problem to solve. Something like Repology may be the best we can do with reasonable effort. But I’d love to have a more accurate view of how fresh Fedora packages are within the bounds of policy.
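
As a rough illustration of what “fresh within the bounds of policy” could mean, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names (version_gap, needs_attention) are hypothetical, versions are assumed to be purely numeric and dot-separated, and the first three components are assumed to carry semver-style meaning. It is not how Repology or Fedora works, and it cannot see backported fixes.

```python
# Hypothetical sketch: classify how far a packaged version trails upstream.
# Assumes purely numeric, dot-separated versions with semver-like meaning.

LABELS = ("major", "minor", "patch")


def parse_version(text):
    """Split '1.4.2' into (1, 4, 2). Raises ValueError on input like '1.4rc1'."""
    return tuple(int(part) for part in text.split("."))


def version_gap(packaged, upstream):
    """Name the most significant component where upstream is ahead."""
    ours, theirs = parse_version(packaged), parse_version(upstream)
    # Pad the shorter version with zeros so '1.4' compares like '1.4.0'.
    width = max(len(ours), len(theirs))
    ours += (0,) * (width - len(ours))
    theirs += (0,) * (width - len(theirs))
    if theirs <= ours:
        return "current"
    for index, (mine, latest) in enumerate(zip(ours, theirs)):
        if mine != latest:
            # Anything past the third component is treated as a patch-level gap.
            return LABELS[index] if index < len(LABELS) else "patch"


def needs_attention(packaged, upstream, allowed_gaps=frozenset({"patch"})):
    """Flag only gaps that the (assumed) release policy would let a maintainer close."""
    return version_gap(packaged, upstream) in allowed_gaps
```

With the default policy, a package at 1.4.0 when upstream has 1.4.3 would be flagged, while 1.4 versus 1.5 would not. Whether that second case is intentional is the judgment call a plain version comparison can’t make, and a rolling release would simply pass a wider allowed_gaps set.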

How I configure sshd at home

My “server” at home isn’t particularly important to the outside world. But by virtue of being on the Internet, it’s subject to a lot of SSH logins. The easiest thing to do is to shut it off from the outside world. But I need to access it when away from home, so that’s not a particularly useful solution.

So what I’ve done is use the SSH daemon’s (sshd) configuration to reduce the risk profile. The first thing I wanted to do is forbid login as root:

PermitRootLogin no

I also don’t want anyone to be able to log in with passwords. “Anyone” is essentially me here, but since I have sudo on the box, if someone is able to figure out my password they are able to get root remotely.

PasswordAuthentication no
ChallengeResponseAuthentication no

Finally, I want to restrict remote login to only explicitly-permitted users. I do this with a dedicated Unix group that I call “sshusers”.

AllowGroups sshusers

These are pretty standard changes and not really worth a blog post. But it turns out that sshd has a very flexible configuration. When a client is coming from inside the LAN, I want to enable password authentication. This is particularly helpful when I’m installing a new system and don’t have SSH keys set up yet.

Match Address 192.168.1.*
    PasswordAuthentication yes

Also within the LAN, it’s easier to run Ansible playbooks across machines if the root user can SSH in with a key. So I combine user and address matching to permit key-based root login only from the server with the Ansible playbooks.

Match User root Address 192.168.1.10
    AllowGroups root sshusers
    PermitRootLogin prohibit-password

Finally, I want my ex to be able to access the server in order to access photos, etc. So I set up her account so that she can use an sftp client but can’t log in (not that she would anyway, but it was a fun challenge to set this up).

Match User angie
    ForceCommand internal-sftp
    PasswordAuthentication yes
    PermitTunnel no
    AllowAgentForwarding no
    AllowTcpForwarding no
    X11Forwarding no
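
To see how these pieces interact, here is a toy Python model of the lookup. It relies on two behaviors described in the sshd_config man page: keywords in a satisfied Match block override the global section, and if a keyword appears in several satisfied Match blocks, only the first instance applies. The rest is assumption: the user name alice and the outside address are made up, the global section is assumed to set PasswordAuthentication no as the prose describes, only a few keywords are modeled, and fnmatchcase is a simplified stand-in for sshd’s own pattern matching (which also handles things like CIDR ranges and negation).

```python
# Toy model of sshd_config keyword resolution; not sshd's actual code.
from fnmatch import fnmatchcase

# Global section (a subset of the keywords discussed above).
GLOBAL_SETTINGS = {
    "PermitRootLogin": "no",
    "PasswordAuthentication": "no",
    "AllowGroups": "sshusers",
}

# Match blocks in file order: (criteria, keywords set inside the block).
MATCH_BLOCKS = [
    ({"Address": "192.168.1.*"}, {"PasswordAuthentication": "yes"}),
    (
        {"User": "root", "Address": "192.168.1.10"},
        {"AllowGroups": "root sshusers", "PermitRootLogin": "prohibit-password"},
    ),
    (
        {"User": "angie"},
        {"ForceCommand": "internal-sftp", "PasswordAuthentication": "yes"},
    ),
]


def effective_settings(user, address):
    """Return the modeled settings for one incoming connection."""
    connection = {"User": user, "Address": address}
    from_matches = {}
    for criteria, keywords in MATCH_BLOCKS:
        # A block applies only if every criterion on its Match line is satisfied.
        if all(fnmatchcase(connection[k], pattern) for k, pattern in criteria.items()):
            for keyword, value in keywords.items():
                # First satisfied block to set a keyword wins.
                from_matches.setdefault(keyword, value)
    # Values from Match blocks override the global section.
    return {**GLOBAL_SETTINGS, **from_matches}


if __name__ == "__main__":
    for who, where in [("alice", "203.0.113.5"), ("alice", "192.168.1.57"),
                       ("root", "192.168.1.10"), ("angie", "203.0.113.5")]:
        print(who, where, effective_settings(who, where))
```

For the real thing, sshd can report this itself: something like sshd -T -C user=alice,addr=192.168.1.57 prints the effective configuration for a given connection (older versions may also want a host= value).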

Why didn’t you … ?

The configuration above isn’t the only way to secure my SSH server from the outside world. It’s not even necessarily the best. I could, for example, move SSH to a different port, which would cut down on the drive-by attempts significantly. I resisted that in the past because I felt “security through obscurity isn’t security.” But in practice, it can be a layer in a more secure approach. In the past, I also recall some clients I used (particularly on mobile) not having the ability to use a non-default port. If that recollection is correct, it seems to also be outdated now. So basically I’m still on port 22 because of inertia.

I could also set up a VPN server and use that for remote access. That requires an additional service to manage, of course. And it also presents challenges when I’m also connected to a work VPN server. The sshd configuration approach is a simpler way for my needs.

Using Element as an IRC client

Like many who work in open source communities, I count IRC as a key part of my daily life. Its simplicity has made it a mainstay. But the lack of richness also makes it unattractive to many newcomers. As a result, newer chat protocols are gaining traction. Matrix is one of those. I first created a Matrix account to participate in the Fedora Social Hour. But since Matrix.org is bridged to Freenode, I thought I’d give Element (a popular Matrix client) a try as an IRC client, too.

I’ve been using Element almost exclusively for the last few months. Here’s what I think of it.

Pros

The biggest pro for me is also the most surprising. I like getting IRC notifications on my phone. Despite being bad at it (as you may have read last week), I’m a big fan of putting work aside when I’m done with work. But I’m also an anxious person who constantly worries about what’s going on when I’m not around. It’s not that I think the place will fall apart because I’m not there. I just worry that it happens to be falling apart when I’m not there.

Getting mobile notifications means I can look, see that everything is fine (or at least not on fire enough that I need to jump in and help), and then go back to what I’m doing. But it also means I can engage with conversations if I choose to without having to sit at my computer all day. As someone who has previously had to learn and re-learn not to have work email alert on the phone, I’m surprised at my reaction to having chat notifications on my phone.

Speaking of notifications, I like the ability to set per-room notification settings. I can set different levels of notification for each channel and those settings reflect across all devices. This isn’t unique to Element, but it’s a nice feature nonetheless. In fact, I wish it were even richer. Ideally, I’d like to have my mobile notifications be more restrictive than my desktop notifications. Some channels I want to see notifications for when I’m at my desk, but don’t care enough to see them when I’m away.

I also really like the fact that I can have one fewer app open. Generally, I have Element, Signal, Slack, and Telegram, plus Google Chat all active. Not running a standalone IRC client saves a little bit of system resources and also lets me find the thing that dinged at me a little quicker.

Cons

By far the biggest drawback, and the reason I still use Konversation sometimes, is the mishandling of multi-line copy/paste. Element sends it as a single multi-line message, which appears on the IRC side as “bcotton has sent a long message: <url>”. When running an IRC meeting, I often have reason to paste several lines at once. I’d like them to be sent as individual lines so that IRC clients (and particularly our MeetBot implementation) see them.

The Matrix<->IRC bridge is also laggy sometimes. Every so often, something gets stuck and messages don’t go through for up to a few minutes. This is not how instant messaging is supposed to work and is particularly troublesome in meetings.

Overall

Generally, using Element for IRC has been a net positive. I’m looking forward to more of the chats I use becoming Matrix-native so I don’t have to worry about the IRC side as much. I’d also like the few chats I have on Facebook Messenger and Slack to move to Matrix. But that’s not a windmill I’m willing to tilt at for now. In the meantime, I’ll keep using Element for most of my IRC needs, but I’m not quite ready to uninstall Konversation.

Setting boundaries when working in communities

Kat Cosgrove recently had a tweet that hit home:

I haven’t taken any meaningful time off of work in the last 14 months because it feels kinda pointless. I’m just going to be sitting at home thinking about work so I might as well be doing work. Invariably, what I fear is happening while I’m not at work is much worse than what is actually happening. Yay, anxiety!

But also, there’s some guilt when you’re paid to work in a community where a lot of people are volunteering. I don’t feel like I can say “hey, it’s after my work hours” because many in my community only participate outside of their work hours. Add to that the global nature of open source communities and that means that there’s always something to devote my time to.

I think it would be easier to come in as an outsider who is just doing the job for a paycheck. But working in a community where you previously volunteered makes the urge to be around all the time so much stronger. It can be really hard to set boundaries because it feels like you’re devaluing the donated time of others.

It’s a blessing and a curse. I happen to think I’m pretty good at my job (and the fact that I’m anything other than a failure should tell you something) and I know that’s because it’s more than a paycheck to me. But that’s also what makes it so hard to draw boundaries.

My manager (who is very good at reminding me to take care of myself) recently compared it to working for a startup. Everyone pitches in wherever they can, even when it’s not on the job description. That’s incredibly true in open source projects, except there’s no exit. It’s not like you’re working hard now so you’ll get a stupid-large pile of cash when a big company acquires you or you have an IPO. If the project is successful it…keeps being a startup forever.

For now, I’m holding up pretty well. I’m balancing working too much with non-work interests (even if a lot of them look like work to the outside observer). But I wonder how long that can hold. And I wonder how others in a similar position make it work over the long term.

FOSS licenses permit, not restrict

Last week, Matthew Wilson shared a very correct take on Twitter:

A few people in the mentions argued that the GPL is doing it wrong by his definition. This is incorrect. Copyleft licenses do not prevent the user from doing things, they ensure that subsequent users can do the same thing.

This may seem like a semantic argument, but there’s substance to it. All licenses (except those that amount to a public domain dedication) contain some conditions, minimal though they may be. It’s important to remember that the default is that you can do nothing with a work. Copyright is by definition a monopoly on a work. The entire point of free and open source software licenses is to tell you what you can do, because the default position is that you can’t.

One of the most annoying things about license wars is the argument that one category of license is somehow more free than another. That’s dumb. Both copyleft and permissive licenses promote freedom, just from different perspectives. Permissive licenses give the next person in line the freedom to do (essentially) whatever they want. Copyleft licenses preserve freedoms for all subsequent users, no matter how many hands the work passes through. There are plenty of philosophical and practical reasons you might choose one class of license over the other (I tend to prefer copyleft licenses, myself), but it’s wrong to paint one or the other as anti-freedom.

Getting back to Matthew’s point, there has been a fair amount of license weaponization in the last few years. By this I mean the use of a license to try to exclude a certain class of user. Some of this I’m sympathetic to (e.g. the “ethical source” movement), some of this I’m not (e.g. the various “you can do what you want, just don’t make a successful software-as-a-service offering” licenses that have popped up). In both cases, I think copyright is the wrong mechanism for achieving the goals.

Excluding classes of users is antithetical to the ideals of free software and open source. That may be okay. As I’ve written, free software is not the end goal. But if you’re going to claim to be open source, you should act open source.

On CLAs, DCOs, and pinky swears

Recently, Van Lindberg decided to kick over a hornets’ nest on Twitter:

I don’t think either of them particularly change the risk profile to the end user of a project. Both a contributor license agreement (CLA) and developer certificate of origin (DCO) depend on the contributor asserting something that is correct. In my experience, the most common issue is a developer submitting code they aren’t allowed to submit. This could be because they’re reusing code under an incompatible (including proprietary) license.

Another possibility is that they are not the copyright owner. This can be the case when contributing as part of a job or while using their employer’s resources. Van suggests that a CLA helps prevent this because it passes through the contributor’s employer’s legal department. That strikes me as naïve. Most contributors, I suspect, will sign the CLA on their own without consulting anyone else.

Fundamentally, CLAs and DCOs depend on contributors understanding enough about intellectual property to ensure their contributions are valid. Neither mechanism is particularly effective at that.

This doesn’t mean they’re useless. My 2018 Opensource.com article gives more information on that.