The common thread behind the cloud services from Amazon, Microsoft, and Google is that each is an outgrowth of the computing infrastructure its parent used to run some or all of its business. One of the most awaited of those services was Google Cloud Spanner, the public version of the database behind what still accounts for the majority of Alphabet’s business: AdWords.
Today, Google is announcing that Cloud Spanner will carry the same multiregional ACID capabilities as the internal database on which Google has long relied.
Cloud Spanner is Google’s managed relational transaction database that delivers the global scalability and availability associated with NoSQL databases, but with the full consistency, durability, and SQL support for which relational transaction databases are known. But when it was first announced last winter, it was limited to single-region support.
To recap, the secret sauce of Spanner is TrueTime, Google’s answer to the constraints of the CAP Theorem. That is the riddle that in a distributed database, you can have only two of the following three: consistency, where reads are guaranteed to return the most recent write; availability, where the node that receives a request responds without a timeout; or partition tolerance, where the system keeps performing writes and reads even when a network partition cuts nodes off from one another. In other words, because a globally distributed database cannot rule out network partitions, you will have to sacrifice either consistency or availability when one occurs.
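The trade-off can be illustrated with a toy model. The sketch below is not how Spanner or any real database is built; the `Cluster`, `Replica`, and `Unavailable` names are hypothetical, and the model assumes just two replicas and a partition flag that is flipped by hand. It only shows the choice a system faces once replicas cannot talk to each other: refuse the request (favoring consistency) or answer with possibly stale data (favoring availability).

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk inconsistency."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Toy two-replica cluster; mode is "CP" or "AP" (hypothetical labels)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when a and b cannot reach each other

    def write(self, replica, value):
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            self.a.value = self.b.value = value
            return
        if self.mode == "CP":
            # Consistency first: reject writes that cannot be replicated.
            raise Unavailable("write refused during partition")
        # Availability first: accept locally and let the replicas diverge.
        replica.value = value

    def read(self, replica):
        if self.partitioned and self.mode == "CP":
            # Cannot prove this is the latest write, so refuse to answer.
            raise Unavailable("read refused during partition")
        return replica.value  # in AP mode this may be stale


if __name__ == "__main__":
    ap = Cluster("AP")
    ap.write(ap.a, "v1")
    ap.partitioned = True
    ap.write(ap.a, "v2")
    print("AP read from b during partition:", ap.read(ap.b))

    cp = Cluster("CP")
    cp.write(cp.a, "v1")
    cp.partitioned = True
    try:
        cp.read(cp.b)
    except Unavailable as exc:
        print("CP read from b during partition:", exc)
```

Real systems are less binary than this: quorum-based designs, for example, typically keep the majority side of a partition available while the minority side stops accepting writes.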
Anyone who has read Flash Boys understands the constraints that the speed of light imposes. Google contends with the inevitable delay by engineering its own private network to keep latency and outages to a minimum across global, redundant backbones. Then there are atomic clocks and GPS devices at each data center; they adjust for time differences to ensure that transactions are committed in the same sequence, regardless of location. Finally, Spanner uses its own proprietary Paxos algorithms for determining which updates to commit.
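How tightly synchronized clocks translate into a consistent commit order can be sketched with the "commit wait" idea from Google's published description of Spanner: the clock reports an interval that is guaranteed to contain the true time, a transaction takes the upper bound as its timestamp, and the commit is not made visible until that timestamp is certainly in the past. The code below is a simplified illustration, not Google's API; `SimulatedTrueTime`, `FakeClock`, `commit`, and the 4 ms uncertainty value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # true time is assumed to be no earlier than this
    latest: float    # true time is assumed to be no later than this


class SimulatedTrueTime:
    """Hypothetical bounded-uncertainty clock (a stand-in, not Google's API)."""

    def __init__(self, clock: Callable[[], float], epsilon: float):
        self._clock = clock
        self.epsilon = epsilon  # assumed worst-case clock error, in seconds

    def now(self) -> TTInterval:
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)


class FakeClock:
    """Manually advanced clock so the sketch runs deterministically."""

    def __init__(self, start: float = 0.0):
        self.t = start

    def __call__(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


def commit(
    tt: SimulatedTrueTime,
    sleep: Callable[[float], None],
    log: List[Tuple[float, str]],
    payload: str,
    poll: float = 0.001,
) -> float:
    """Assign a timestamp, then wait until it is definitely in the past."""
    ts = tt.now().latest
    # Commit wait: hold the commit until even the earliest possible
    # current time has moved beyond the chosen timestamp.
    while tt.now().earliest <= ts:
        sleep(poll)
    log.append((ts, payload))
    return ts


if __name__ == "__main__":
    clock = FakeClock(start=100.0)
    tt = SimulatedTrueTime(clock, epsilon=0.004)
    log: List[Tuple[float, str]] = []
    first = commit(tt, clock.sleep, log, "debit account A")
    second = commit(tt, clock.sleep, log, "credit account B")
    print("timestamps strictly increasing:", first < second)
```

The cost of this approach is that every commit waits roughly twice the clock uncertainty, which is why keeping that uncertainty small with atomic clocks and GPS matters: the tighter the bound, the shorter the wait.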
Spanner joins a wave of cloud-native databases that rethink how to manage data. They take liberal advantage of inexpensive storage and fast networks to automatically replicate and distribute data, and introduce new approaches to ACID based on globally distributed architectures. Amazon Aurora, for instance, re-implements MySQL and Postgres; while the APIs are compatible, underneath there is a different approach to ACID that relies on change logs rather than pages. Microsoft Cosmos DB, the closest relative to Spanner, provides a choice of five levels of consistency for a globally distributed platform. Oracle, meanwhile, has taken advantage of controlling the environment of its own public cloud by introducing a self-running database where configurations will be driven by machine learning.
And if imitation is the sincerest form of flattery, there’s CockroachDB, developed by one of the members of the original Google team as an open source, clean-room rethink of the system without all the custom hardware.
Obviously, it’s early days for Google Cloud Spanner. With the service having gone GA only last May, customers are at the tire-kicking point. The question is, who besides Google needs such a globally scalable, distributed transaction database? Early references are dominated by online SaaS providers like Evernote, Redknee, and Marketo. As for the appeal to mainstream enterprises, consider scenarios involving global enterprises that are optimizing supply chains, or performing algorithmic trading and looking for the right market in which to place the trade; it shouldn’t be hard to imagine that demand for such a platform exists.