Cloud Pipeline: the future of inter-cloud-provider sneaker-nets

One of the notable frictions surrounding the use of cloud computing providers has been the difficulty of getting large data sets into and out of the provider's domain. Once your data set grows beyond a certain size, it's simply not feasible to transfer it over the public network. Amazon began addressing this friction in May 2009 by offering an import feature, whereby one can ship them data (on an external SATA/USB drive) and they'll load it into their S3 storage service. And just recently, Amazon added a similar export feature. This is extremely useful between the customer and Amazon, but I believe it's only the beginning of a broader trend toward inter-cloud "sneaker nets".

There is a slew of interesting use-cases for transferring data sets between various kinds of providers without the customer ever touching the data, or ever sending physical devices. This, of course, would require some set of standards/formats for inter-provider transfer. There are obvious and well-known uses, such as shipping data off-site for DR (Disaster Recovery) purposes. If the DR site is also a cloud provider, the transfer should ideally occur between the two providers without the customer being involved in sending media devices. Under the right circumstances, data could be sent directly from the source provider to the destination provider, eliminating the need for the destination to send a media device at all; that would remove one delivery day from the equation. Doing this would likely require some combination of encrypting the data and properly scrubbing the media of its previous contents. Alternatively, the source provider could start with a pool of fresh devices, and the destination provider could ship the used device back to the customer, with the extra cost of the device added.
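To make the "customer never touches the data" point concrete, here is a minimal Python sketch of the customer-side preparation step: the data set is encrypted and checksummed before it ever lands on a shippable device, so neither provider nor courier can read it. The file names, manifest format, and helper function are illustrative assumptions, not any provider's actual tooling; it uses the third-party cryptography package.

```python
# Minimal sketch: encrypt a data set and record a checksum manifest before it
# is written to a shippable device. The key stays with the customer, so
# neither the shipping providers nor the courier can read the payload.
# Paths and the manifest layout are made up for illustration.
import hashlib
import json
from cryptography.fernet import Fernet

def prepare_for_shipment(src_path: str, dst_path: str, manifest_path: str) -> bytes:
    key = Fernet.generate_key()          # retained by the customer only
    fernet = Fernet(key)

    # Reads the whole file for brevity; a real tool would stream in chunks.
    with open(src_path, "rb") as src:
        ciphertext = fernet.encrypt(src.read())

    with open(dst_path, "wb") as dst:
        dst.write(ciphertext)

    # The manifest lets a receiving provider verify integrity without the key.
    manifest = {
        "file": dst_path,
        "sha256": hashlib.sha256(ciphertext).hexdigest(),
        "bytes": len(ciphertext),
    }
    with open(manifest_path, "w") as mf:
        json.dump(manifest, mf, indent=2)

    return key  # shared with the destination provider out of band, if at all

if __name__ == "__main__":
    prepare_for_shipment("dataset.bin", "dataset.enc", "manifest.json")
```

In a workflow like this, only the encrypted image and the manifest ever touch the shipped media; the key travels separately (or not at all, if the destination merely stores the bits).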

"Cloud Pipeline"

More efficient and seamless transport of large data sets will be a key enabler, allowing the cloud computing landscape to evolve both in usage and in the number of providers. With that evolution will come a lot of other "touch-less" uses, such as exporting your database via FedEx from a provider such as Amazon to an analytics-specific provider that houses an army of specialized columnar analytics database engines. Or perhaps to a provider that specializes in render-farm activities, real-time 3D server-side rendering, massively parallel HPC (High Performance Computing), compliance, de-duplication, compression, bio-informatics, or a host of other specialties. One could actually set up a 'pipeline' of cloud services this way, moving data to the stage in the pipeline where it is processed most efficiently, based on capabilities, geographies, current pricing, etc. Perhaps the next big Pixar-like animation studio will make use of a Cloud Pipeline. Or perhaps the next big biotech company. It wouldn't be surprising if they started out with some stages of the pipeline in-house and farmed out an increasing amount of work as the cloud evolves.
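To sketch what "moving data to the stage where it is processed most efficiently" might look like, here is a toy Python routing example. The providers, capabilities, regions, and prices are all invented for illustration; the point is simply that each pipeline stage can be matched to whichever provider best satisfies its constraints at the moment the data moves.

```python
# Toy sketch of a "Cloud Pipeline": an ordered set of stages, each routed to
# whichever provider best matches the stage's requirements right now.
# Provider names, prices, and capabilities here are invented for illustration.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    capabilities: set[str]
    region: str
    price_per_tb: float      # transport + processing, in arbitrary units

@dataclass
class Stage:
    task: str                # e.g. "render", "columnar-analytics", "dedup"
    allowed_regions: set[str]

def route(stage: Stage, providers: list[Provider]) -> Provider:
    """Pick the cheapest provider that can do the task in an allowed region."""
    candidates = [p for p in providers
                  if stage.task in p.capabilities
                  and p.region in stage.allowed_regions]
    if not candidates:
        raise ValueError(f"no provider can handle stage {stage.task!r}")
    return min(candidates, key=lambda p: p.price_per_tb)

providers = [
    Provider("render-farm-x", {"render"}, "us", 40.0),
    Provider("column-db-y", {"columnar-analytics"}, "us", 55.0),
    Provider("column-db-z", {"columnar-analytics", "dedup"}, "eu", 48.0),
]

pipeline = [Stage("columnar-analytics", {"us", "eu"}), Stage("render", {"us"})]

for stage in pipeline:
    chosen = route(stage, providers)
    print(f"{stage.task}: ship data set to {chosen.name} ({chosen.region})")
```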

Cloud Marketplace

Ultimately, there will be marketplaces for cloud computing. But initially, many things must be developed and normalized/standardized before the compute side can be fully "marketized". One example is e-metering, which can't simply be bolted on as an afterthought, but needs to be deeply integrated into the layers of the cloud fabric. It may take quite some time before that side becomes marketplace-friendly.

But inter-modal data transport (aka the "sneaker net" or "FedEx net") is a level of abstraction at which a cloud marketplace gets interesting in the shorter term. Here we have the opportunity for a given data set to be copied or multiplexed to a set of receiving providers, based on pre-arranged or real-time market criteria. By the time the data movement occurs, a provider may have become available that can process the data more efficiently, at a lower price for the same efficiency, or perhaps simply in a more geographically or politically friendly locale. Perhaps a given provider just rolled in a bank of analytics database engines, or maybe they added banks of GPGPUs. These are the kinds of events that could make one provider much more competitive in the market (10x or 100x). As long as a customer can periodically and inexpensively transport copies of their data to another (backup) site, much of the vendor lock-in problem vanishes. It becomes more of a data-format standards issue, one over which the customer has more influence.
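As a rough sketch of what "multiplexing to a set of receiving providers based on market criteria" could mean, here is a toy Python example that picks which providers receive a copy of a data set from a list of live offers. The offer fields, thresholds, and provider names are assumptions made up for illustration, not any real marketplace's API.

```python
# Toy sketch of marketplace-style multiplexing: given live offers from
# providers, decide which ones should receive a copy of the data set.
# The Offer fields, thresholds, and names are invented for illustration.
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str
    price_per_tb: float     # current quoted price, arbitrary units
    region: str
    has_gpgpu_bank: bool    # example of a capability event that shifts the market

def select_recipients(offers, max_price, allowed_regions, copies=2):
    """Pick up to `copies` providers meeting price and locale criteria, cheapest first."""
    eligible = [o for o in offers
                if o.price_per_tb <= max_price and o.region in allowed_regions]
    return sorted(eligible, key=lambda o: o.price_per_tb)[:copies]

offers = [
    Offer("provider-a", 60.0, "us", False),
    Offer("provider-b", 35.0, "eu", True),   # just rolled in GPGPUs, cut prices
    Offer("provider-c", 52.0, "us", False),
]

for offer in select_recipients(offers, max_price=55.0, allowed_regions={"us", "eu"}):
    print(f"dispatch a copy to {offer.provider} at {offer.price_per_tb}/TB")
```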

Keep in mind that such a cloud marketplace would require APIs; orchestrating workflows across a cloud pipeline with real-time, market-based routing would need them. ;-)

Disclosure: no positions