Why HTCondor is a pretty awesome scheduler

In early March, The Next Platform published an article I wrote about cHPC, a container project aimed at HPC applications. But as I wrote it, I kept thinking about how HTCondor has been addressing many of the same concerns for a long time. Since I’m in Madison for HTCondor Week right now, I thought this was a good time to explain some of the ways this project is awesome.

No fixed walltime. This is a benefit or a detriment, depending on the circumstances, but most schedulers require the user to request a walltime at submission. If the job isn’t done at the end of that time, the scheduler kills it. Sorry about your results; get back in line and ask for more walltime. HTCondor’s flexible configuration allows administrators to enforce such a limit if desired, but by default, users are not forced to make a guess that they’re probably going to get wrong.
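As a rough illustration of how such a limit can be opted into, here is a small Python sketch that builds a ClassAd expression which becomes true once a job has been running longer than a chosen limit. The helper name `walltime_limit_expr` and the one-hour limit are made up for this example, and the expression is only one common way to write it; a real site may well use something different. It relies on the `periodic_remove` submit command, the `SYSTEM_PERIODIC_REMOVE` configuration macro, and the `JobStatus` and `EnteredCurrentStatus` job attributes.

```python
def walltime_limit_expr(max_seconds):
    """Build a ClassAd expression that is true once a job has been in the
    Running state (JobStatus == 2) for more than max_seconds.

    Hypothetical helper for illustration; not part of HTCondor.
    """
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    # EnteredCurrentStatus resets when the job changes state, so this
    # limits each run attempt rather than the job's cumulative runtime.
    return f"(JobStatus == 2) && ((time() - EnteredCurrentStatus) > {int(max_seconds)})"


if __name__ == "__main__":
    expr = walltime_limit_expr(3600)
    # A user can opt in for a single job in the submit description:
    print(f"periodic_remove = {expr}")
    # An administrator can apply the same idea to every job in the
    # schedd's configuration:
    print(f"SYSTEM_PERIODIC_REMOVE = {expr}")
```

The point is that the limit is just another expression that somebody chooses to set, not something the scheduler demands up front.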

Flexible requirements and resource monitoring. HTCondor natively lets users request CPUs, memory, and GPUs. With partitionable slots, a machine’s resources can be carved up on the fly to fit each job’s request. And HTCondor has “concurrency limits”, which allow for customizable resource constraints (e.g., software licenses, database connections, etc.).
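To make that concrete, here is a minimal Python sketch that renders a submit description using the `request_cpus`, `request_memory`, `request_gpus`, and `concurrency_limits` submit commands. The helper `render_submit`, the executable `analyze.sh`, and the limit name `matlab` are all hypothetical, and a concurrency limit only means something if the pool administrator has defined it.

```python
def render_submit(executable, cpus=1, memory_mb=1024, gpus=0, limits=None):
    """Render an HTCondor submit description as text.

    Hypothetical helper for illustration; not part of HTCondor.
    """
    lines = [
        f"executable = {executable}",
        f"request_cpus = {cpus}",
        # A bare number for request_memory is interpreted as MiB.
        f"request_memory = {memory_mb}",
    ]
    if gpus:
        lines.append(f"request_gpus = {gpus}")
    if limits:
        # "name:count" asks for `count` units of a pool-wide limit,
        # e.g. two seats of a (made-up) "matlab" license limit.
        pairs = ",".join(f"{name}:{count}" for name, count in limits.items())
        lines.append(f"concurrency_limits = {pairs}")
    lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_submit("analyze.sh", cpus=4, memory_mb=8192, gpus=1,
                        limits={"matlab": 2}))
```

Writing the output to a file and handing it to `condor_submit` would queue one job that asks for four cores, 8 GiB of memory, one GPU, and two units of the `matlab` limit.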

So many platforms. Despite the snobbery of HPC sysadmins, people do real work on Windows. HTCondor has almost-full feature parity on Windows. It also has “universes” for Docker and virtual machines.

Federation. Want to overflow to your friend’s resource? You can do that! You can even submit jobs from HTCondor to other schedulers.

Support for disappearing resources. In the cloud, this is the best feature. HTCondor was designed for resource scavenging on desktops, and it still supports that as a first-class use case. That means machines can come and go without much hassle. Contrast this with other schedulers, where some explicit external action has to happen in order to add or remove a node.

Free as in freedom and free as in beer. Free beer is also the best way to get something from the HTCondor team. But HTCondor is licensed under the Apache 2.0 license, so anyone can use it for any purpose.

HTCondor isn’t perfect, and there are some use cases where it doesn’t make sense (e.g., low-latency workloads), but it’s a pretty awesome project. And it’s been around for over three decades.
