I apologize for the buzzwordy, markety blog title. But I am confident that the war between cloud databases and relational databases is partially, perhaps primarily, a question of marketing, so a marketing concept is relevant: disruptive technology.
Disruptive technologies (like microcomputers or TCP/IP) start at the low end of the market and work their way up, disrupting “enterprise” solutions like minicomputers and value-added networks. Just as microcomputers did not completely replace minis and mainframes, I don’t expect cloud databases to completely eradicate traditional databases. But I hope to show why they will grow tremendously in the next few years, to the point that it is very unclear what will happen to traditional databases over the long term.
By a “cloud” database, I mean one with the following properties:
- provided as part of a hosting package by large web companies like Amazon, Google, Rackspace, Microsoft and even smaller ones like Joyent and Heroku.
- designed so that they scale up automatically as traffic and content grow, rather than requiring new architectural and deployment decisions at each stage of growth. The hardware behind a cloud is finite, but consumers should only bump into limitations as they approach the limits of the cloud itself, rather than the limits of a particular box or cluster.
These products are still in their infancy, and I suspect that some of the vendors listed in the first point do not actually meet the linear scalability requirement. I haven’t tested them all myself.
I know the limitations of databases like App Engine and SimpleDB, which is why Ayogo has not jumped on that bandwagon (yet).
But imagine I were a decade younger and making a decision about what system to use for my world-beating new startup, headquartered in a dorm room.
On my left shoulder is a little relational database angel; on my right shoulder is a little cloud database devil.
SQL Angel: “Relational database theory is powerful, and SQL is VERY flexible.”
Cloud Devil: “Our query languages are simpler, and you can just use program code to do anything tricky (like summation or joining).”
SQL Angel: “Relational databases have been proven to scale. Like at stock exchanges and stuff. You just need to understand the query language, the optimizer, indexing, the disk layout, RAID striping, master-slave replication, read/write splitting, snapshot backups and the physics of how hard disks spin. Just learn that stuff and you’ll be able to scale beautifully. With SQL, you’ll be the next eBay.”
Cloud Devil: “Our database just scales. Avoid huge numbers of writes to a single row. That’s all you need to learn about scalability. Everything else will just work. With your data in the cloud, you’ll be the next Google.”
SQL Angel: “You lose a lot of ad hoc query capability if you don’t use a database as flexible as a relational database. What about the analytics???”
Cloud Devil: “Analytics? Doesn’t that sound like some kind of stupid idea that would come out of marketing? Or finance? Do you really need that? Anyhow, Google uses MapReduce for analytics. Major cloud databases either have, or will soon have, access to MapReduce.”
SQL Angel: “But SQL databases are used in finance. Like at Goldman Sachs. Those guys are bad-ass enough to lend money to the country of Greece!”
Cloud Devil: “Cloud databases are used at Google. Those guys are bad-ass enough to give the Chinese government the finger.”
SQL Angel: “And there are free relational databases. Like MySQL! Which is used by Facebook and YouTube.”
Cloud Devil: “MySQL sucks. Facebook and YouTube are migrating away from it and onto Cassandra, BigTable, etc.”
SQL Angel: “But there are other free relational databases!”
Cloud Devil: “Wow… this is getting complicated. Do you really want to choose a minority relational database that, for some reason, is less popular than MySQL?”
SQL Angel: “Plus there are some amazing commercial RDBMSes.”
Cloud Devil: “Before you have a dollar of revenue, you’re supposed to shell out cash for a commercial database? When App Engine is FREE and SimpleDB is CHEAP?”
SQL Angel: “SQL is a multi-vendor standard! You can port your code from one SQL engine to another!”
Cloud Devil: “Did you ever actually try porting SQL code and a large volume of data from one server to another? What a headache. If you pick something that scales all the way up from the start, you won’t have to worry about porting. Anyhow, maybe standards will arise for cloud databases too. Trust us.”
SQL Angel: “None of this is new… we already discarded this stuff in the 1960s!”
Cloud Devil: “Do you really care about stuff that happened before you were born? Anyhow, go ask the SQL guy about ‘Paxos’, published in 1998, ‘MapReduce’, described in 2004, and the ‘CAP theorem’, from 2000. Ask about transparent sharding and automatic replication.”
SQL Angel: “The cloud devil is oversimplifying things. Look: SQL databases are just what PROFESSIONALS use.”
Cloud Devil: “Do you want things simple… or complex? Do you want to be like Google or like a health insurance company?”
I hope you can see how this debate is going to turn out.
You might wonder what the inevitable popularity of cloud databases will have to do with you or your business. That’s where we need to consider the nature of disruptive technologies: they start with the snotty-nosed kids. Some of those kids go on to be Larry or Sergey or Mark Z or Kevin Rose or Jerry Yang, and become influential because of their success.
And what about the rest of those kids? Eventually (a few years from now) they arrive at your department, having failed at their dorm-room startups. And they start asking why you’re maintaining your own servers and a managed relational database for the customer support site instead of just deploying it for free into the cloud. After all, not all corporate data is equally secret.
Then the snotty-nosed kid gets the ear of an architect or CIO and asks: “Why don’t we run a cloud database cluster within the firewall? Wouldn’t that be easier than tweaking each relational database to the requirements of each app?” The next thing you know, the old-timers need to defend their decision to use “legacy” technology that does not “scale smoothly” and has been rejected by “thought leaders” like “Google”.
I’m not claiming that the old-timers will necessarily and consistently lose that debate. Analytics and flexibility are very valuable in the enterprise. But I am saying that by the time you have that debate, you’ll probably have about 15 small, internal, departmental systems running on cloud databases that you did not authorize, rather than on the “blessed” Oracle installations that you’re paying through the nose for. At some point, the relational database will need to constantly justify its existence, rather than being the default choice.