Wednesday, May 1, 2013

Can your cloud bring the rain?




There is a large amount of press, hype, and debate about cloud computing. There are arguments about which cloud platform is better: vCloud Director, OpenStack, CloudStack, and so on. I've seen very passionate debates on buy versus build with relation to cloud, and many people are left trapped in a fog of rhetoric trying to sort it all out. It reminds me of the early days of Linux and the wars over which distribution was better. Instead of joining in, I wanted to offer a slightly different perspective on the problem. I'm less interested in what kind of cloud it is; I'm more interested to know if your cloud can bring the rain!

To explain what I mean, let's first step back in time to my first SAN. I was working on a project that needed storage and several Solaris servers (this was probably in 2001 or so). I recommended a SAN even though our group was VERY skilled at adding local drives and using volume managers, and even at doing host clustering with shared SCSI enclosures (and we had NO SAN knowledge, much less expertise). After weeks of reading, vendor meetings, design sessions, learning about fiber cabling, zoning, masking, and configuring HBAs, and some long nights dealing with bugs, the moment of truth arrived. I had a new LUN available on my host that I now needed to partition, add to the volume manager, and make a file system on, just like I had always done. When I was finished, I had a directory I could store files in.

It was quite possibly the most anticlimactic moment of my IT career. Months' worth of work gave me exactly the same result as sticking a hard drive into the server. I took refuge in the fact that my LUN was faster than a regular disk (but I could have done that with software RAID). It was more flexible because I didn't have to use standard disk sizes (but my volume manager did that for me already). It was more reliable (though having multipathing software, HBAs, switches, controllers, and disks, all with new ways to manage and monitor them, meant there were tons of new things to break).

Do I regret having my SAN? Absolutely NOT! When I built my first 3- and 4-node clusters I smiled. I looked at the cost savings from not stranding storage all over the place, and from not dealing with lots of SCSI disk cabinets that held only a few drives, or cabinets that were full just when I needed to add drives (not to mention never dealing with high voltage and low voltage differential SCSI EVER again). Allocating storage from home in my slippers was a nice bonus. One of the real wins, though, came when a developer called and you could hear the desperation in his voice. The production instance of a database had died and there was no ETA on a fix. Could we repurpose a test system? Within 30 minutes the old test LUNs were unpresented, new LUNs were created, data was being restored, and clients were being redirected to the new database. After two days things were fixed on the production system and life was good. He was braced for "you'll be down for a week or two in order to rebuild the test system," and I basically said that we just need to reboot and present the old LUNs, and you are back to where you were with no rebuilding or reloading. I can do it while you are at lunch.

The very long point here is that it wasn't about the SAN. It was about the fact that the SAN made me think differently about how we offered services, and that let us solve real-world problems in ways we just couldn't before. I see a very parallel lesson with the cloud.

After spending months researching and building an internal cloud in the lab, and rethinking IP management and assignment, account management, and the sacred rules I had about DNS management, I had a nice little system that could automatically provision IaaS. Click a button and you had a Linux server with accounts and an IP address, and I could log into it with a DNS name. It was great! Then it hit me: the same feeling I had after provisioning my first LUN on the SAN. It's just another server... so what?

The problem was not the cloud I had built or the technology I built it with. The problem was that I didn't have a service or platform worthy of a cloud. I'm not saying there wasn't some benefit (it did make my life easier and more automated, and that is always good). But something was missing. Even as I moved on to automating a specific platform, like a web server that deployed with some content already on it, something was still missing.

“There is nothing quite so useless, as doing with great efficiency, something that should not be done at all.”
― Peter F. Drucker

In the end, the problem seems to be that building a cloud to deliver the IT services you are using today (assuming you do not have a cloud) will probably miss a great opportunity to rethink the services that you offer and what they can provide. If your cloud provides great elasticity and the web or app server platforms you are offering do not evolve to match, you may actually be doing yourself a disservice. How do you manage content in a web farm that can constantly grow and shrink in terms of servers? Deploying content on the template isn't really a good solution because you will be in constant template management mode. A shared back end may be good, but how do you make sure it isn't a single point of failure or a bottleneck? If you are integrating with a load balancer so that your shiny new web layer can scale, are you locking yourself into your old way of thinking and ensuring that your platform only works in one type of environment and couldn't help you at something like AWS should you choose to leverage their service? If you start at AWS, are you going to leverage their pre-built services and APIs so heavily that you are stuck in AWS? How do you wrap up all those things and offer a web platform that is simple to use, scales up and down, has current content, and isn't constrained by every process the IT organization has ever invented?
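One way to picture the "content on the template" question is the small Python sketch below. It is only an illustration under stated assumptions: `ContentStore`, `InMemoryStore`, and `WebNode` are hypothetical names invented here, not part of any cloud platform or library. The idea is that the server template carries no content at all, and every node, including one added later during a scale-up, pulls whatever release is live from a store hidden behind a small provider-neutral interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class ContentStore(Protocol):
    """Hypothetical provider-neutral interface; not a real library API."""

    def current_version(self) -> str: ...
    def fetch(self, version: str) -> Dict[str, bytes]: ...


@dataclass
class InMemoryStore:
    """Stand-in back end for this sketch; a real one might wrap object storage."""
    releases: Dict[str, Dict[str, bytes]] = field(default_factory=dict)
    live: str = ""

    def publish(self, version: str, files: Dict[str, bytes]) -> None:
        # Record the release and mark it as the one nodes should serve.
        self.releases[version] = dict(files)
        self.live = version

    def current_version(self) -> str:
        return self.live

    def fetch(self, version: str) -> Dict[str, bytes]:
        return dict(self.releases[version])


@dataclass
class WebNode:
    """A web server built from a generic template that holds no content."""
    name: str
    store: ContentStore
    version: str = ""
    docroot: Dict[str, bytes] = field(default_factory=dict)

    def sync(self) -> bool:
        """Pull the live release if this node is behind; True if it changed."""
        live = self.store.current_version()
        if live == self.version:
            return False
        self.docroot = self.store.fetch(live)
        self.version = live
        return True


store = InMemoryStore()
store.publish("v1", {"index.html": b"hello"})

# Two nodes boot from the same empty template and pull v1.
farm = [WebNode("web1", store), WebNode("web2", store)]
for node in farm:
    node.sync()

# Content changes, then the farm grows; nobody rebuilds a template.
store.publish("v2", {"index.html": b"hello again"})
farm.append(WebNode("web3", store))
changed = [node.sync() for node in farm]
```

Because the nodes only know about the two-method interface, the back end could in principle be swapped for one that lives in a different environment without touching `WebNode`. The sketch does nothing about the single point of failure or bottleneck question; the store itself still has to be made redundant somehow.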

It's also interesting to me to see how some people are solving these problems. While we more or less standardized on three-tier architectures to run websites, cloud applications and platforms are different and scale differently, and there are a lot of options out there. I find it useful to watch AWS extend the services it offers, and you can maybe take some lessons for your private cloud. Initially they offered virtual machines and some storage via S3. Now they offer redundant databases across AWS zones to help you keep your business available even when they have a disaster. They also offer things like DynamoDB to give you a scale-out NoSQL database. The platforms and services they offer have evolved greatly because, in the end, these are the services that transform how we write applications and how we do business. I also see people not using these services and opting to build their own versions so that they are not locked into AWS and can be redundant across cloud providers.

One other view seems to be that if your cloud doesn't offer those kinds of services (redundant databases, scalable storage, application platforms that are as scalable as your cloud will enable them to be), the applications will evolve and deploy that type of functionality strictly as part of application design. Some of these projects (Apache Cassandra, for example) didn't come along to compete with Amazon's services. They came along because Amazon didn't offer those services and that is what applications and businesses needed. Once that happens, your cloud might be stuck with being cheaper and faster as its only selling points, versus the potential of providing a game-changing way of computing for your organization. So this is what I mean when I ask if your cloud can bring the rain. Can you deliver capabilities with your cloud that truly transform your business, or are you leveraging your cloud to deploy business as usual faster? Are you altering your platforms in parallel with building out your cloud to let them take advantage of your new capabilities?

It's a journey, and there are crawl, walk, and run phases to all this. I'm not saying don't build a cloud until you can offer everything AWS has and maybe more. But the debates about cloud technologies and which cloud platforms are better seem to be IT people getting lost in IT instead of focusing on how IT can enable the business to achieve things we could not offer it before. Rain is the stuff that makes things grow. Make sure you are building a cloud that can bring the rain.