Back in February, Pete Cheslock quipped “100,000 cores – cause it sounds more impressive than 2000 servers.” Patrick Cable pointed out that HPC people in particular talk in cores. I told them both that the “user perspective is all about cores. The number of machines it takes to provide them means squat.” Andrew Clay Shafer disagreed, with a link to some performance benchmarks.
He’s technically correct (the best kind of correct), but misses the point. Yes, there are performance impacts when the number of machines changes (interestingly, fewer machines is better for parallel jobs, while more machines is better for serial jobs), but that’s not necessarily of concern to the user. Data movement and other constraints can wash out any performance differences the machine count introduces.
But really, the concern with core count is misplaced, too. What should really concern the user is the time-to-results. It’s up to the IT service provider to translate that need into technical requirements (this is more true for operational computing than research, and it assumes the workload has a fair degree of predictability). The user says “I need to do X amount of computation and get the results in Y amount of time.” Whether that’s done on one huge machine or ten thousand small machines doesn’t really matter. This plays well into cloud environments, where you can use a mixture of instance types to get to the size you need.
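To make the “X computation in Y time” translation concrete, here’s a back-of-the-envelope sketch. It assumes the workload scales linearly with core count (no data-movement or serialization penalties), and the function name and numbers are purely illustrative:

```python
import math

def cores_needed(core_hours: float, deadline_hours: float) -> int:
    """Minimum whole cores to finish `core_hours` of work within
    `deadline_hours`, assuming perfectly linear scaling."""
    return math.ceil(core_hours / deadline_hours)

# 100,000 core-hours of computation due in 12 hours:
print(cores_needed(100_000, 12))  # → 8334 cores
```

Whether those 8,334 cores come from a few big machines or a pile of small cloud instances is an implementation detail the provider can optimize; the user only cares that the deadline holds.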