Maybe your tech conference needs less tech

My friend Ed runs a project called “Open Sourcing Mental Illness”, which seeks to change how the tech industry talks about mental health (to the extent we talk about it at all). Part of the work involves the publication of handbooks developed by mental health professionals, but a big part of it is Ed giving talks at conferences. Last month he shared some feedback on Twitter:

So I got feedback from a conf a while back where I did a keynote. A few people said they felt like it wasn’t right for a tech conf. It was the only keynote. Some felt it wasn’t appropriate for a programming conf. Time could’ve been spent on stuff that’d help career. Tonight a guy from a company that sponsored the conf said one of team members is going to seek help for anxiety about work bc of my talk. That’s why I do it. Maybe it didn’t mean much to you, but there are lots of hurting, scared people who need help. Ones you don’t see.

Cate Huston had similar feedback from a talk she gave in 2016:

the speaker kept talking about useless things like feelings

The tech industry as a whole, and some areas more than others, likes to imagine that it is as cool and rational as the computers it works with. Conferences should be full of pure technology. And yet we bemoan the fact that so many of our community are real jerks to work with.

I have a solution: maybe your tech conference needs less technology. After all, the only reason anyone pays us to do this stuff is because it (theoretically) solves problems for human beings. I’m biased, but I think the USENIX LISA conference does a great job of this. LISA has three core areas: architecture, engineering, and culture. You could look at it this way: designing, implementing, and making it so people will help you the next time around.

Culture is more than just sitting around asking “how does this make you feeeeeeeel?” It includes things like how to avoid burnout and how to train the next generation of practitioners. It also, of course, includes how to not be an insensitive jerk who inflicts harm on others with no regard for the impact they cause.

I enjoy good technical content, but I find that over the course of a multi-day conference I don’t retain very much of it. For a few brief hours in 2011, I understood SELinux and I was all set to get it going at home and work. Then I attended a dozen other sessions, and by the time I got home, I had forgotten all of the details. My notes helped, but it wasn’t the same. On the other hand, the cultural talks tend to be the ones that stick with me. I might not remember the details, but the general principles are lasting and actionable.

Every conference is different, but I like having one-third of content be not-tech as a general starting point. We’re all humans participating in these communities, and it serves no one to pretend we aren’t.

Other writing in December 2016

Happy new year! Where have I been writing when I haven’t been writing here?

SysAdvent

Once again, SysAdvent was a great success. The large community that has built up around this project means I do less than in years past. I want to give others the opportunity to get involved, too. This year I edited one article:

The Next Platform

I’m freelancing for The Next Platform as a contributing author. Here are the articles I wrote last month:

Opensource.com

Over on Opensource.com, we hit the million page view mark for the third consecutive month. I wrote the articles below.

Cycle Computing

Meanwhile, I wrote or edited a few things for work, too:

  • LISA 16 Cloud HPC BoF — I summarized a BoF session at the LISA Conference in Boston.
  • Various ghost-written pieces. I’ll never tell which ones!

Other writing in November 2016

Where have I been writing when I haven’t been writing here?

The Next Platform

I’m freelancing for The Next Platform as a contributing author. Much like my role with Opensource.com as a Community Moderator, I look at the other names on the list and I just say “wow! How did I end up in such good company?” The articles I wrote last month:

  • Advances in in situ processing tie to exascale targets — The growth in FLOPS is outpacing the growth in IOPS. Analyzing simulations as they run is becoming increasingly important for scientists and engineers.
  • Microsoft Research pens Quill for data intensive analysis — Collecting data is only useful to the extent that the data is analyzed. We have more data these days, but no platform that can handle both real-time streaming and post hoc analysis. The Quill project aims to change that.
  • JVM Boost shows warm Java is better than cold — The Java Virtual Machine allows “write once, run anywhere” but it imposes a performance penalty. For short-running jobs, the hit can be significant. The HotTub project speeds up these jobs (up to 30x in some cases!) by reusing JVM processes.

Opensource.com

Over on Opensource.com, I agreed to coordinate the Doc Dish column. I also wrote the articles below. It was a great month for the site. Three times during November, we set a single-day page view record. We also crossed the million page view mark for the second consecutive month and the third time in site history.

Cycle Computing

Meanwhile, I wrote or edited a few things for work, too:

  • Scale in a Cloudy World — I contributed an article to HPC Source about how to scale cloud HPC environments.
  • Various ghost-written pieces. I’ll never tell which ones!

Other writing in October 2016

Where have I been writing when I haven’t been writing here?

Over on Opensource.com, we had our second-ever month with a million page views! While I didn’t have any articles published, I did agree to coordinate the Doc Dish column, so there’s that.

Meanwhile, I wrote or edited a few things for work, too:

I also spoke at the All Things Open conference in Raleigh, NC. It went okay.

Other writings in September 2016

Where have I been writing when I haven’t been writing here?

Over on Opensource.com, we had another 900k+ page views in the month: the fourth time in site history and the second consecutive month. I contributed two articles:

Meanwhile, I wrote a few things for work, too:

  • Cycle Computing: The cloud startup that just keeps kicking — The Next Platform wrote a very nice article about us, so I wrote a blog post talking about how nice it was. (Hey, I’m in marketing now. It’s what we do).
  • Cloud-Agnostic Glossary — Supporting multiple cloud-service providers means having to translate terms between them. I put together a Rosetta Stone to help translate relevant terms between AWS, Azure, and Google Cloud.
  • The question isn’t cost, it’s value — When people talk about the cost of cloud computing, they’re usually looking at the raw dollar value. Since it takes money to make money, that’s not always the right way to look at it. It’s better to consider the value generated.

Come see me at these conferences in the next few months

I thought I should share some upcoming conferences where I will be speaking or in attendance.

  • 9/16 — Indy DevOps Meetup (Indianapolis, IN) — It’s an informal meetup, but I’m speaking about how Cycle Computing does DevOps in cloud HPC
  • 10/1 — HackLafayette Thunder Talks (Lafayette, IN) — I organize this event, so I’ll be there. There are some great talks lined up.
  • 10/26-27 — All Things Open (Raleigh, NC) — I’m presenting the results of my M.S. thesis. This is a really great conference for open source, so if you can make it, you really should.
  • 11/14-18 — Supercomputing (Salt Lake City, UT) — I’ll be working the Cycle Computing booth most of the week.
  • 12/4-9 — LISA (Boston, MA) — The 30th version of the premier sysadmin conference looks to be a good one. I’m co-chairing the Invited Talks track, and we have a pretty awesome schedule put together if I do say so myself.

Changing how HTCondor is packaged in Fedora

The HTCondor grid scheduler and resource manager follows the old Linux kernel versioning scheme: for release x.y.z, if y is an even number, it’s a “stable” series that gets only bugfixes; behavior changes and major features go in the odd-numbered “development” series. For a long time, the HTCondor packages in Fedora used the development series. However, this leads to a choice between introducing behavior changes when a new development HTCondor release comes out and pinning a Fedora release to a particular HTCondor release, which means no bugfixes.

This ignores the Fedora Packaging Guidelines, too:

As a result, we should avoid major updates of packages within a stable release. Updates should aim to fix bugs, and not introduce features, particularly when those features would materially affect the user or developer experience. The update rate for any given release should drop off over time, approaching zero near release end-of-life; since updates are primarily bugfixes, fewer and fewer should be needed over time.

Although the HTCondor developers do an excellent job of preserving backward compatibility, behavior changes can happen between x.y.1 and x.y.2. HTCondor is not a major part of Fedora, but we should still attempt to be good citizens.

After discussing the matter with upstream and the other co-maintainers, I’ve submitted a self-contained change for Fedora 25 that will

  1. Upgrade the HTCondor version to 8.6
  2. Keep HTCondor in Fedora on the stable release series going forward

Most of the bug reports against the condor-* packages have been packaging issues and not HTCondor bugs, so upstream isn’t losing a massive testing resource here. I think this will be a net benefit to Fedora since it prevents unexpected behavior changes and makes it more likely that I’ll package upstream releases as soon as they come out.
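The even/odd rule described above can be sketched in a few lines of Python. This is only an illustration of the scheme as this post describes it; the function name `htcondor_series` is made up for the example and is not part of HTCondor or of the Fedora packaging.

```python
def htcondor_series(version: str) -> str:
    """Classify an x.y.z version string as 'stable' or 'development'.

    Assumes the rule described in this post: an even y is a stable
    series, an odd y is a development series.
    """
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected x.y.z, got {version!r}")
    minor = int(parts[1])  # the "y" in x.y.z
    return "stable" if minor % 2 == 0 else "development"


print(htcondor_series("8.6.0"))  # even y: stable
print(htcondor_series("8.5.4"))  # odd y: development
```

A packaging script could use a check like this to refuse an update that would move a Fedora release onto a development series.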

Looking for my replacement

It’s been nearly three years since I joined Cycle Computing as a Senior Support Engineer. Initially, I led a team of me, but since then we’ve grown the organization. I’d like to think I did a good job of growing not only the team, but the tooling and processes to enable my company to provide excellent support to enterprise customers across a variety of fields.

But now, it is time to hire my replacement. I’m taking my talents across the (proverbial) hall to begin working as a Technical Evangelist. I’ll be working on technical marketing materials, conferences, blog posts, and all kinds of neat stuff like that. I think it’s a good overlap of my skills and interests, and it will certainly be a new set of challenges.

So while this move is good for me, and good for Cycle Computing’s marketing efforts, it also means we need a new person to manage our support team. The job has been posted to our job board. If you’re interested, I encourage you to apply. It’s a great team at a great company. If you have any questions, I’d be happy to talk to you about it.

Hints for using HTCondor’s credd and condor_store_cred

HTCondor has the ability to run jobs as either an unprivileged “nobody” user or as the submitting user. On Linux, enabling this is fairly easy: the administrator just sets the UID_DOMAIN configuration to the same value on the submit and execute machines and away you go. On Windows, you need to run the credential daemon (condor_credd) and the user must store credentials using condor_store_cred.

The manual does a pretty good job of describing the basic setup of the credd, though there are some important pieces missing. With help from HTCondor technical lead Todd Tannenbaum, I’ve submitted some improvements to the docs, but in the meantime…

The main thing to consider when configuring your pool to use the credd is that it wants things to be secure. That makes sense, considering its entire job is to securely store and transfer user credentials. The credd will not hand out the password unless the client is authenticated and using a secure connection. The method of authentication is not important (if you really, really trust your network, you can use the CLAIMTOBE method), so long as authentication occurs somehow.

So where do the condor_store_cred hints come in? Often, the credd runs on the same machine as the schedd, and users log in there to submit jobs. In that case, everything’s probably fine. But if you’re submitting jobs from a machine outside the pool (for example, a user’s workstation), it can get a little hairier.

Before running condor_store_cred, HTCondor needs to be told where to look for the credd, and the client settings mentioned above need to meet the credd’s requirements. (I’m using CLAIMTOBE here for simplicity.) If the machine the user submits from is not in the pool, condor_store_cred will need to know where to find the collector, too.

CREDD_HOST = scheduler.example.com
COLLECTOR_HOST = centralmanager.example.com
SEC_CLIENT_AUTHENTICATION_METHODS = CLAIMTOBE
SEC_CLIENT_AUTHENTICATION = PREFERRED
SEC_CLIENT_ENCRYPTION = PREFERRED

As of this writing, condor_store_cred gives an unhelpful error message if something goes wrong. It will always say “Make sure your ALLOW_WRITE setting includes this host.”, so if your ALLOW_WRITE setting already includes the host in question, you might get stuck. Use the -debug option to get better output. For example:

02/16/16 12:23:51 STORE_CRED: In mode 'query'
02/16/16 12:23:51 Warning: Collector information was not found in the configuration file. ClassAds will not be sent to the collector and this daemon will not join a larger Condor pool.
02/16/16 12:23:51 STORE_CRED: Failed to start command.
02/16/16 12:23:51 STORE_CRED: Unable to contact the REMOTE schedd.

This tells you that you forgot to set the COLLECTOR_HOST in your configuration.
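If you end up reading this output often, a small script can do the pattern matching for you. The sketch below is a hypothetical helper, not something that ships with HTCondor: the `KNOWN_HINTS` table and the `diagnose` function are names invented here, and the only pattern it knows is the collector warning from the log above. It assumes you have already captured the -debug output as text.

```python
from typing import List

# Hypothetical lookup table: a substring of a -debug line mapped to
# the likely cause. Only the one case from this post is included.
KNOWN_HINTS = {
    "Collector information was not found in the configuration file":
        "COLLECTOR_HOST is not set in the client configuration.",
}


def diagnose(debug_output: str) -> List[str]:
    """Return a hint for each known marker found in the debug output."""
    hints = []
    for marker, hint in KNOWN_HINTS.items():
        if marker in debug_output and hint not in hints:
            hints.append(hint)
    return hints


sample = """\
02/16/16 12:23:51 STORE_CRED: In mode 'query'
02/16/16 12:23:51 Warning: Collector information was not found in the configuration file. ClassAds will not be sent to the collector and this daemon will not join a larger Condor pool.
02/16/16 12:23:51 STORE_CRED: Failed to start command.
02/16/16 12:23:51 STORE_CRED: Unable to contact the REMOTE schedd.
"""

for hint in diagnose(sample):
    print(hint)
```

New markers can be added to the table as you run into them; anything the table does not recognize simply produces no hint.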

Another hint is that if your scheduler name is different from the machine name (e.g. if you run multiple condor_schedd processes on a single machine and have Q1@hostname, Q2@hostname, etc.), you might need to include “-name Q1@hostname” in the arguments. Unlike most other HTCondor client commands, you cannot specify a “sinful string” as a target using the “-addr” option.

Hopefully this helps you save a little bit of time getting run_as_owner working on your Windows pool, until such time as I sit down to write that “Administering HTCondor” book that I’ve been meaning to work on for the last 5 years.

Supercomputing ’15

Last week, I spent a few days in Austin, Texas for the Supercomputing conference. Despite having worked in HPC for years, I had never been to SC. It’s a big conference. Since everyone heard I was going, they set a record this year with over 12,000 attendees. That’s roughly 10x the size of LISA, where I had been a few days earlier.

I missed Alan Alda’s keynote, so my trip was basically ruined. That’s not true, actually. I spent most of the time in my company’s booth giving demos and talking to people. I had a lot of fun doing that. I’m sure the technical sessions were swell, but that’s okay. I look forward to going again next year, hopefully for the whole week and not immediately following another week-long conference.

Ben with a minion