Earlier this week, I attended CCA11 at Argonne National Laboratory. I was there to present an extended abstract and take in what I could. I had never presented at a conference before (unless you count a short talk to kick off the Condor BoF at LISA ’10, which I don’t) and the subject of my abstract was work that we’ve only partially done, so I was a bit nervous. Fortunately, our work was well received. It was encouraging enough that I might be talked into writing another paper at some point.
One thing I learned from the poster session and the invited talks is that the definition of “cloud” is just as ambiguous as ever. I continue to hate the term, although the field (however you define it) is doing interesting things. There’s a volunteer effort underway at NASA to use MapReduce to generate on-demand product visualization for disasters. An early prototype for Namibian flooding is at http://matsu.opencloudconsortium.org/.
Perhaps one of the largest concerns is the sheer volume of data. For example, the National Institutes of Health have over two petabytes of genomics data available, but how can you transfer that? Obviously, in most cases a user would only request a subset of data, but if there’s a use case that requires the whole data set, then what? One abstract presenter championed the use of sneakernet and argued that network bandwidth is the greatest challenge going forward.
One application that wasn’t mentioned is the cloud girlfriend. Maybe next year?
Thanks to Andy Howard and Preston Smith for their previous work and for helping me write the abstract.
Last week, the Internet let out a collective LOL about a news story from South Africa. It seems a pigeon with a 4 GB USB stick got better bandwidth than the local DSL service. While humorous, this is not exactly a new idea. A rafting tour company in the Rocky Mountains sends photos back to the base via pigeon. PigeonNet is common enough that there’s actually been an RFC developed for it, although the date diminishes its credibility somewhat.
To be serious, though, transferring data over the network isn’t always the best option. You should never underestimate the bandwidth of a box truck. The transfer of large files over a network connection can take hours, days, or even weeks. A delivery service can often have a disk with the same data delivered overnight. We experienced this in my department a few years ago. One of our faculty was working with a colleague in China who had several terabytes of climate model data to share. If they had tried sending it over the network, we’d still be waiting for all of it to arrive. Instead, a box full of disks arrived after less than a week.
The common term for this kind of manual transfer of data is SneakerNet. SneakerNet can be very fast and reliable in certain situations, but it’s important to consider all of the factors. The time it takes to transfer data via SneakerNet is not just the time it takes to ship a disk from one place to another. There’s also a non-trivial amount of time to copy the data onto and off of the disk. Over a USB connection, it may take half a day or more to transfer a terabyte of data to or from a disk.
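To make that comparison concrete, here is a rough back-of-the-envelope sketch in Python. The function names and all of the numbers (a sustained 10 Mbit/s link, a 25 MB/s copy rate over USB, 24 hours of shipping) are assumptions I picked for illustration, not measurements. The copy rate was chosen so that one terabyte takes roughly the half a day mentioned above.

```python
def network_hours(size_tb: float, link_mbps: float) -> float:
    """Hours to send size_tb terabytes over a link sustaining link_mbps megabits/s."""
    bits = size_tb * 1e12 * 8  # decimal terabytes -> bits
    return bits / (link_mbps * 1e6) / 3600


def sneakernet_hours(size_tb: float, copy_mb_per_s: float, shipping_hours: float) -> float:
    """Hours to copy the data onto a disk, ship it, and copy it off again."""
    copy_one_way = size_tb * 1e12 / (copy_mb_per_s * 1e6) / 3600
    # The data is copied twice: once at the source, once at the destination.
    return 2 * copy_one_way + shipping_hours


if __name__ == "__main__":
    size_tb = 1.0  # assumed data set size
    net = network_hours(size_tb, link_mbps=10)  # assumed sustained link speed
    sneaker = sneakernet_hours(size_tb, copy_mb_per_s=25, shipping_hours=24)  # assumed rates
    print(f"network:    {net:.1f} h")
    print(f"sneakernet: {sneaker:.1f} h")
```

With these assumed numbers, the network transfer comes out to about 222 hours and SneakerNet to about 46 hours, nearly half of which is copy time rather than shipping. Plug in a faster link or a slower disk and the answer can easily flip.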
Sometimes SneakerNet is chosen not for bandwidth reasons, but because of architecture. If you have a separate internal network (as is often used for classified government data), you may have no choice but to transfer data “by hand.” Or it may be that the cost of setting up a more automated system is not worth the effort. In my department, we rarely have to install software on Macs, so it’s to our benefit to walk down the hall with a CD instead of going through the effort of standing up a software installation service.
There are plenty of reasons and methods for transferring data the old-fashioned way. I’ll leave it to my ones of readers to come up with their own justifications.