Why subscribe to a newsletter you don’t read?

Why would you subscribe to a newsletter that you don’t read? I mean, maybe you intend to. Maybe it’s sitting there in your inbox unread, just waiting for you to get around to it Real Soon Now. Or maybe you filter it off to some folder where email goes to die. I get that. I do that all the time.

No, what I’m thinking about is the case where an obvious spam account signs up for a newsletter. As of this writing, my newsletter has 283 subscribers — a number that has grown 27% in the past month. But only 40 people at most have ever opened it. The number of opens has stayed relatively constant even as the subscriber count has gone up.

So why do I think the accounts are spam? For one, there’s the fact that most of them haven’t opened any newsletters. Sure, maybe there’s a reason for that. But they also look…spammy. The addresses are often Yahoo or other domains that have fallen out of favor. The names represented by the addresses don’t look like the names of people I know. I can’t imagine why people I do know read my newsletter, never mind why strangers would. Taken all together, I feel safe calling many of these accounts spam.

But to what end? I understand spam accounts on Twitter liking random posts in the hopes that someone will look at the profile and click a link to whatever thing someone’s trying to peddle. Or maybe follow the account and get clicks that way. That makes sense to me. But what can a spammer do with a newsletter subscription? Is it a really crappy denial of service attack? Do they hope that after a few years my subscriber list will exceed Mailchimp’s free tier? Maybe it’s done to hide nefarious activity in a flood of confirmation emails. That seems like the most likely answer, but it doesn’t seem very efficient. Then again, I’m not a spammer, so what do I know?

If everyone followed good password advice, we’d be less secure

Passwords are hard. To be useful, they must be hard to guess. But the rules we put in place to make them hard to guess also make them hard to remember. So people do the minimum they can get away with.

Earlier this week, security company Webroot took a look at the unintended consequences of password constraints. The rules organizations set in order to ensure passwords are sufficiently complex reduce the total number of possible passwords. This can make automated password guessing more effective.

Good passwords are easy for the user to remember and hard for computers and other humans to guess. Let’s say I wanted to use a password like 2Clippy2Furious!! Various password checking sites rate it highly. It’s 18 characters long and contains upper- and lower-case letters, digits, and special characters. But because it contains consecutive repeating letters, some companies won’t allow it.

Writing for Webroot, Randy Abrams says “it’s length, not complexity that matters.” And he’s right. That’s the point behind the “correct horse battery staple” password in XKCD #936. So let’s all do that, right?

Well…it’s not so simple. If I were trying to brute force passwords and I knew everyone was using four (or five or six) words, then suddenly “CorrectHorseBatteryStaple” isn’t 25 characters — it’s four symbols. Granted, the alphabet grows from 95 characters to (using /usr/share/dict/words on my laptop) 479,828 words. Still, “CorrectHorseBatteryStaple” is many powers of 10 more secure if the attacker doesn’t know you’re using words.
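A quick back-of-the-envelope calculation shows how big that gap is. (The word count here is the one from my laptop’s dictionary; yours will differ.)

```python
import math

# Two attacker models for the same password, "CorrectHorseBatteryStaple".
charset = 95      # printable ASCII characters, tried blindly
words = 479_828   # lines in /usr/share/dict/words on my laptop

blind = charset ** 25    # attacker treats it as 25 arbitrary characters
wordlist = words ** 4    # attacker knows it's exactly 4 dictionary words

print(f"blind search space:    10^{math.log10(blind):.1f}")     # about 10^49.4
print(f"wordlist search space: 10^{math.log10(wordlist):.1f}")  # about 10^22.7
```

Roughly 27 orders of magnitude separate the two models — which is exactly why the secret of “it’s all dictionary words” matters so much.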

And let’s be real: they don’t. It will be a long time before this hypothetical weakness becomes a real concern. Don’t believe me? Just look at the password dumps when a site gets hacked. There are a lot of really bad passwords out there. If we took all the constraints off (except for minimum length), people would just use really dumb, easily guessed passwords again. But it amuses me that if everyone followed good password advice, we’d actually make it worse for ourselves. Passwords are hard.

Sidebar: Yes, I know

The savvier among you probably read this and thought “it’s better to use a random string that you never have to memorize because your password manager handles it for you. Just set a very long and memorable password on that and you’re good to go.” Yes, you’re right. But people, even those who use password managers, will often go to memorable passwords for low-risk sites or passwords they have to use often (e.g. to log in to their computer so they can access the password manager). 

You are responsible for (thinking about) how people use your software

Earlier this week, Marketplace ran a story about Michael Osinski. You probably haven’t heard of Osinski, but he played a role in the financial crisis of 2008. Osinski wrote software that made it easier for banks to package loans into a tradeable security. These “mortgage-backed securities” played a major role in the collapse of the financial sector ten years ago.

It’s not fair to say that Osinski is responsible for the Great Recession. But it is fair to say he did not give sufficient consideration to how his software might be (mis)used. He told Marketplace’s Eliza Mills:

Most people realized that we wrote a good piece of software that we sold in the marketplace. How people use that software is … you know, you really can’t control that.

Osinski is right that he couldn’t control how people used the software he wrote. Whenever we release software to the world, it will get used how the user wants to use it — even if the license prohibits certain fields of endeavor. This could be innocuous misuse, the way graduate students design conference posters in PowerPoint or businesspeople use Excel for all conceivable tasks. But it could also be malicious misuse, the way Russian troll farms use social media to spread false news or sow discord.

So when we design software, we must consider how actual users — both benevolent and malign — will use it. To the degree we can, we should mitigate abuse, or at least give users a way to defend themselves from it. We are long past the point where we can pretend technology is amoral.

In a vacuum, technological tools are amoral. But we don’t use technology in a vacuum. The moment we put it to use, it becomes a multiplier for both good and evil. If we want to make the world a better place, we cannot pretend it will happen on its own.

“You’ve been hacked” corrects behavior

Part of running a community means enforcing community norms. This can be an awkward and uncomfortable task. I recently saw a Tweet that suggests it might be easier than you thought:

It’s nice because it’s subtle and gives people a chance to self-correct. On the other hand, there’s some value in letting community members (and potential community members) see enforcement actions. Not as a punitive measure, but as a signal that you take your code of conduct seriously.

This won’t work for every case, but I do like the idea as a response to the first violation, so long as it’s a minor violation. Repeated or flagrant violation of the community’s code of conduct will have to be dealt with more strongly.

Twitter interactions are not a polling mechanism

Way back in the day, clever Brands tried to conduct Twitter polls by saying “retweet for the first choice and favorite (now like) for the second choice.” This was obviously very prone to bias. The first choice’s fans will spread the poll, so virality favors the first option. But it was also the best choice available, other than linking to an external poll site (which means a much lower interaction rate).

Then Twitter introduced native polls. Now you can post a question with up to four answers, and Twitter even makes a nice bar chart of the results. And yet the retweet-and-like polls persist. Twitter interactions are not a polling mechanism, so why are you using them?!

The answer lies in the word “interaction”. Social media interactions are a way for Brands to measure the success of their social media efforts. Conducting polls via interactions instead of the native polling mechanism is a cheap way to drive up interactions. It’s a good indication that you’re not interested in the answers. People who want actual answers can use polls.

This concludes today’s episode of “Old man yells at cloud”.

Date-based conditional formatting in Google Sheets

Sometimes a “real” project management tool is too heavy. And spreadsheets may be the most-abused software tool. So if you want to track the status of some tasks, you might want to drop them into a Google spreadsheet.

You have a spreadsheet with four columns: task, due, completed, and owner. For each row, you want that row to be formatted strikethrough if it’s complete, highlighted in yellow if it’s due today, and highlighted in red if it’s overdue. You could write conditional formatting rules for each row individually, but that sounds painful. Instead, we’ll use a custom formula.

Apply each of the following rules to the range A2:D.

The first rule will strike out completed items. We’ll base this on whether or not column C (completed) has content. The custom formula is =$C:$C<>"". Set the formatting style to Custom, clear the color fill, and select strikethrough.

The second rule will highlight overdue tasks. We only want to highlight incomplete overdue tasks. If it’s done, we stop caring whether it was done on time. So we need to check that the due date (column B) is before today and that the completion date (column C) is blank. The rule to use here is =AND($C:$C="",$B:$B<today(),$A:$A<>""). Here, you can select the “Red highlight” style.

Lastly, we need to highlight the tasks due today. As with the overdue tasks, we only care if they’re not done: =AND($C:$C="",$B:$B=today(),$A:$A<>""). This time, use the “Yellow highlight” style.
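Taken together, the three rules implement a simple precedence: done beats overdue beats due-today. As a sanity check, here’s the same logic sketched in Python (a hypothetical helper for illustration, not Sheets syntax):

```python
from datetime import date

def row_format(due: date, completed: bool, today: date) -> str:
    """Mirror the three conditional-formatting rules, in order."""
    if completed:
        return "strikethrough"     # rule 1: done, regardless of dates
    if due < today:
        return "red highlight"     # rule 2: overdue and not done
    if due == today:
        return "yellow highlight"  # rule 3: due today and not done
    return "no format"
```

For example, a task due June 1 that’s still incomplete on June 15 gets the red highlight, while the same task marked complete gets struck through instead.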

And that’s it. You can fill in as many tasks as you’d like and get the color coding populated automatically. I created an example sheet for reference.

Microsoft bought GitHub. Now what?

Last Monday, a weekend of rumors proved to be true. Microsoft announced plans to buy code-hosting site GitHub for $7.5 billion. Microsoft’s past, particularly before Satya Nadella took the corner office a few years ago, was full of hostility to open source. “Embrace, extend, extinguish” was the operative phrase. It should come as no surprise, then, that many projects responded by abandoning the platform.

But beyond the kneejerk reaction, there are two questions to consider. First: can open source projects trust Microsoft? Second: should open source (and free software in particular) projects rely on corporate hosting?

Microsoft as a friend

Let’s start with the first question. With such a long history of active assault on open source, can Microsoft be trusted? Understanding that some people will never be convinced, I say “yes”. Both from the outside and from my time as a Microsoft employee, it’s clear that the company has changed under Nadella. Microsoft recognizes that open source projects are not only complementary, but strategically important.

This is driven by a change in the environment that Microsoft operates in. The operating system is less important than ever. Desktop-based office suites are giving way to web-based tools for many users. Licensed revenue may be the past and much of the present, but it’s not the future. Subscription revenue, be it from services like Office 365 or Infrastructure-as-a-Service offerings, is the future. And for many of these, adoption and consumption will be driven by open source projects and the developers (developers! developers! developers! developers!) that use them.

Microsoft’s change of heart is undoubtedly driven by business needs, but that doesn’t make it any less real. Jim Zemlin, Executive Director at the Linux Foundation, expressed his excitement, implying it was a victory for open source. Tidelift ran the numbers to look at Microsoft’s contributions to non-Microsoft projects. Their conclusion?

…today the company is demonstrating some impressive traction when it comes to open source community contributions. If we are to judge the company on its recent actions, the data shows what Satya Nadella said in his announcement about Microsoft being “all in on open source” is more than just words.

And in any acquisition, you should always ask “if not them, then who?” CNBC reported that GitHub was also in talks with Google. While Google may have a better reputation among the developer community, I’m not sure they’d be better for GitHub. After all, Google had Google Code, which it shut down in 2016. Would a second attempt in this space fare any better? Google Code had a two-year head start on GitHub, but it languished.

As for other major tech companies, this tweet sums it up pretty well:

Can you trust anyone to host?

My friend Lyz Joseph made an excellent point on Facebook the day the acquisition was announced:

Unpopular opinion: If you’re an open source project using GitHub, you already sold out. You traded freedom for convenience, regardless of what company is in control.

People often forget that GitHub itself is not open source. Some projects have avoided hosting on GitHub for that very reason. Even though the code repo itself is easily mirrored or migrated, that’s not the real value in GitHub. The “social coding” aspects — the issues, fork tracking, wikis, ease of pull requests, and so on — are what make GitHub valuable. Chris Siebenmann called it “sticky in a soft way.”

GitLab, at least, offers a “community edition” that projects can self-host. In a fantasy world, each project would run its own infrastructure, perhaps with federated authentication for ease of use when you’re a participant in many projects. But that’s not the reality we live in. Hosting servers costs money and time, and small projects in particular lack both. Third-party infrastructure will always be attractive for that reason. And as good as competition is, a dominant social coding site helps users the same way a dominant social network does: network effects are powerful.

So now what?

The deal isn’t expected to close for a while, and Microsoft plans to seek regulatory approval, which will not speed the process. Nothing will change immediately. In the medium term, I don’t expect much to change either. Microsoft has made it clear that it plans to run GitHub as a fairly autonomous business (the way it does with LinkedIn). GitHub gets the stability that comes from the support of one of the world’s largest companies. Microsoft gets a chance to improve its reputation and an opportunity to make it easier for developers to use Azure services.

Full disclosure: I am a recent employee of Microsoft and a shareholder. I was not involved in the acquisition and had no inside knowledge pertinent to the acquisition or future plans for GitHub.

You’re an SEO company

Business owners, regardless of their industry, often view themselves in terms of what their business does. “We’re a bookstore, a coffee shop, a web design company,” or whatever goods or services that customers pay money for. But a recent conversation made me realize that most small businesses in a mature market are really a search engine optimization (SEO) company.

Okay, there are a few caveats here. I’m thinking of mature markets as fields where there are many small or small-ish players that are attempting to serve a large number of users. Think generally of the early and late majority sections of the technology adoption life cycle. Ride sharing, for example, is out of scope. It’s pretty solidly in the middle of the bell curve, but it has three players: Uber, Lyft, and everyone else.

The subject of the conversation was a VPN service. A friend was using VPN software and observed that it would be easy to share his server with others for a fee. All the other challenges of running a business aside, I immediately asked what his differentiation is.

VPN services may not be mainstream exactly, but the market is mainstream enough. And there are a lot of players with no one particularly dominant. So how does a new entry set itself apart? There’s a little bit of room to differentiate on price, location, service, etc, but not much. So the best way to differentiate and get new customers is to be better at search engine optimization than the rest of the field.

In essence, making a business successful requires skills entirely unrelated to the business itself. When you can’t easily differentiate your product, you have to differentiate your marketing.

Book review: Habeas Data

What does modern technology say about you? What can the police or other government agencies learn? What checks on their power exist? These questions are the subject of a new book from technology reporter Cyrus Farivar.

Habeas Data (affiliate link) explores the jurisprudence that has come to define modern privacy law, drawing on interviews with lawyers, police officers, professors, and others who have shaped the precedent. What makes this such an interesting subject is the very nature of American privacy law. Almost nothing is explicitly defined by legislation. Instead, legal notions of privacy come from how courts interpret the Fourth Amendment to the United States Constitution. This gives government officials the incentive to push as far as they can in the hopes that no court cases arise to challenge their methods.

For the first two centuries or so, this served the republic fairly well. Search and seizure were constrained to the physical realm. Technological advances did little to improve the efficiency of law enforcement. This started to change with the advent of the telegraph and then the telephone, but it’s the rapid advances in computing and mobility that have rendered this unworkable.

As slow as legislatures can be to react to technological advances, courts are even slower. And while higher court rulings have generally been more favorable to a privacy-oriented view, not everyone agrees. The broad question that courts must grapple with is which matters more: the practical effects of the technology changes or the philosophical underpinnings?

To his credit, Farivar does not claim to have an answer. Ultimately, it’s a matter of what society determines is the appropriate balance between individual rights and the needs of the society at large. Farivar has his opinions, to be sure, but Habeas Data does not read like an advocacy piece. It is written by a seasoned reporter looking to inform the populace. Only by understanding the issues can the citizenry make an informed decision.

With that in mind, Habeas Data is an excellent book. Someone looking for fiery advocacy will likely be disappointed, but for anyone looking to understand the issue, it’s a great fit. Technology law and ethics courses would be well-advised to use this book as part of the curriculum. It is deep and well-researched while still remaining readable.

It has its faults, too. The flow of chapters seems a little haphazard at times. On the other hand, they can largely be treated as standalone studies on particular issues. And the book needed one more copy-editing pass. I saw a few typographic errors, which are bound to happen in any first-run book, but was jarred by a phrase that appeared to have been accidentally copy/pasted in the middle of a word.

None of this should be used as a reason to pass on this book. I strongly recommend Habeas Data to anyone interested in the law and policy of technology, and even more strongly to those who aren’t interested. The shape that privacy law takes in the next few years will have impacts for decades to come.