The cloud has successfully provided file storage since long before it was called the cloud, but does that success translate to databases?
In my previous article, I introduced you to the concept of files in the cloud, whether through a file-sharing service such as Dropbox or a whole-enterprise backup system using something like Carbonite or CrashPlan. In this article, I want to focus on a relative newcomer to the playing field: databases in the cloud. These are true relational databases (RDBMSs) or, in the case of NoSQL, anti-RDBMSs in the sky. This article explains the difference between the two and introduces you to some of the players in today's market.
RDBMS vs. NoSQL
If you read Part 1 of this series, you got a little homework assignment: learn about NoSQL and also research and compare ACID (Atomicity, Consistency, Isolation, Durability) and BASE (Basically Available, Soft state, Eventual consistency). These concepts are critical to this discussion because they define the primary decision point when choosing a data storage technology. The two assignments are highly related: while not universally accepted, the term NoSQL is in common enough use that it stands as a good umbrella term for the database technologies that implement features distinct from RDBMSs. The primary differences between the two types of database center on the different consistency models, and that's where ACID and BASE come in.
ACID is the primary way that DB2 operates. If you're a green-screen person like me, you may not even be familiar with the concepts of ACID. But think of it this way: when you write a record to a database, you can be assured that the next program that reads that record will get the values you wrote. This is most assuredly not the case with a BASE model. Most of the rest of the ACID concepts have to do with things we associate more with SQL, such as transactions and constraints. In the final analysis, databases that adopt the ACID model are always in sync: all database rules are followed, and all the database changes that make up a transaction are either committed together or not at all. BASE implementations do not ensure that the data is consistent across multiple machines at any given moment; inconsistencies are handled when they are encountered.
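If you have never worked with commitment control, a small sketch may help show what "committed together or not at all" means in practice. The example below uses Python's built-in sqlite3 module purely as a stand-in for an ACID database such as DB2; the account table, the balance rule, and the function names are all made up for illustration.

```python
import sqlite3

def open_ledger():
    """Create an in-memory database with one rule: balances may not go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as a single all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the earlier credit is undone too.
        return False

def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

conn = open_ledger()
ok = transfer(conn, "A", "B", 30)        # both updates are committed
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 500)   # debit violates the rule, nothing is kept
after_failed = balances(conn)
print(ok, after_ok)
print(failed, after_failed)
```

The second transfer credits account B before the debit on account A fails, yet the balances afterward are the same as before the attempt. That is the atomicity and consistency half of ACID at work: the rule is enforced and the partial change never becomes visible.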
RDBMS in the Sky
This fine-grained, immediate synchronization makes for a very robust infrastructure upon which to build business applications. It does not, however, scale as well horizontally, because the more database servers you have, the more work you have to do to keep them synchronized. Even though we have the ability to remotely access a database using ODBC, opening that access up to the cloud can become difficult. Having hundreds or thousands of simultaneous distributed requests can result in lock waits or other delays that will effectively shut down a distributed application. That doesn't stop vendors from trying, though. Most vendors offer software stacks that you can access through the cloud. In these cases, they're nearly always talking about either dedicated or virtual servers running in a data center someplace, with your software on them. For example, IBM recently purchased a company called SoftLayer that provides cloud services. However, as noted, the primary service provided by SoftLayer is dedicated or virtual servers. These servers are effectively standalone servers; if you lease several servers, they may communicate via IP, but they aren't designed to be part of a larger integrated network. If the idea is to have distributed servers all providing read and write access to a common database, this isn't the way to get there, at least not out of the box.
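The synchronization cost mentioned at the start of this section can be pictured with a toy model. The sketch below is not how any real database replicates data; it is a hypothetical in-memory class (ReplicatedStore and its method names are invented here) that contrasts a write that updates every server before returning with a write that updates one server and lets the others catch up later.

```python
class ReplicatedStore:
    """Toy key-value store replicated across several in-memory 'servers'."""

    def __init__(self, n_replicas):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []          # lazy writes not yet copied to every replica
        self.replica_updates = 0   # replica updates performed while a writer waited

    def write_sync(self, key, value):
        """ACID-style: update every replica before the write returns."""
        for replica in self.replicas:
            replica[key] = value
            self.replica_updates += 1

    def write_lazy(self, key, value, replica=0):
        """BASE-style: update one replica now and queue the change for the rest."""
        self.replicas[replica][key] = value
        self.replica_updates += 1
        self.pending.append((key, value))

    def propagate(self):
        """Background catch-up: apply queued writes to every replica, in order."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def read(self, key, replica):
        return self.replicas[replica].get(key)

store = ReplicatedStore(5)
store.write_sync("order-1", "open")
sync_cost = store.replica_updates                # one update per server
store.write_lazy("order-1", "shipped", replica=0)
lazy_cost = store.replica_updates - sync_cost    # a single update
stale = store.read("order-1", replica=4)         # still the old value
store.propagate()
fresh = store.read("order-1", replica=4)         # now caught up
print(sync_cost, lazy_cost, stale, fresh)
```

With five servers, the synchronous write does five updates before the caller can continue, and that number grows with every server you add. The lazy write does one, at the price of a window in which another server still reports the order as open.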
The other option for RDBMS-based applications is to explore cloud-based managed relational database services. The two primary vendors are no surprise: Amazon and Google. Amazon in particular has so many offerings that you have to be careful which one you choose. For example, Amazon's Elastic Compute Cloud (EC2) is primarily a virtual server farm like the one outlined in the previous paragraph. You can run almost anything you want, including DB2 or Informix, but each server is an island unto itself. But Amazon also provides Amazon RDS (Relational Database Service), which is another feature of its overall Amazon Web Services offering. Amazon RDS provides virtual access to various databases, including MySQL, Oracle, PostgreSQL, and Microsoft SQL Server. The other large player in this space is Google, with its Cloud SQL offering. Less mature and less fully featured than Amazon's, it supports only MySQL. Both provide automatic backup and replication along with two-tiered pricing: price per usage as well as flat-fee monthly charges. The interesting thing about Amazon RDS is that, under certain configurations, you can actually scale outward to multiple servers for high-load applications; I haven't found a similar feature in the Google offering.
NoSQL as an Option
Given the problems with relational data in the cloud, you may find that you're not interested in replacing your locally hosted DB2 database. At the same time, you may find that trying to deal with non-relational data (everything from images to PDFs to videos to XML files) is becoming a problem. Many enterprises are switching to CLOB storage for documents (actually writing your documents to a physical DB2 file with a field of type CLOB), and while I highly recommend CLOBs as an option for some circumstances, you may decide that it's easier to keep the files. At that point, you have to decide whether it's a better business decision to keep the files on the IFS or not on the IBM i at all.
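For readers who haven't seen document storage in a database column, here is a minimal sketch of the idea. It uses Python's sqlite3 module as a stand-in for DB2 (SQLite accepts CLOB as a column type and stores it as text); the document table, the helper functions, and the sample invoice are all hypothetical. A character LOB suits text such as XML; binary content such as images or PDFs would normally go in a binary column instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document ("
    " name TEXT PRIMARY KEY,"
    " body CLOB NOT NULL)"  # SQLite treats a CLOB column as text
)

def store_document(conn, name, text):
    """Write (or overwrite) a whole document as one column value."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO document (name, body) VALUES (?, ?)",
            (name, text),
        )

def load_document(conn, name):
    """Return the document text, or None if no such document exists."""
    row = conn.execute(
        "SELECT body FROM document WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

# Hypothetical sample document
invoice_xml = '<invoice id="1001"><total>249.95</total></invoice>'
store_document(conn, "invoice-1001.xml", invoice_xml)
round_trip = load_document(conn, "invoice-1001.xml")
missing = load_document(conn, "no-such-file.xml")
print(round_trip == invoice_xml, missing)
```

The appeal of this approach is that the document is covered by the same backup, security, and transaction rules as the rest of the database; the cost is that every read and write now goes through the database rather than the file system.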
If you decide to go the latter route, you might find something simple like Dropbox adequate, but you might also want to consider a more robust solution. NoSQL solutions provide non-relational database access that works well with document-like data. Many of the common NoSQL databases (for example, MongoDB or Couchbase) have cloud offerings through one of the big two vendors or sometimes through their own websites. Cloud services are usually provided either as a dedicated server solution or a virtual configuration. The primary difference between these two is that with dedicated servers you have to work more to keep your physical servers configured and running. Whether that makes sense to you depends on your business model: how frequently and how fast do you need access to this data, and how often do you update it? Those questions and others play into the decision of where to host your database.
I'll leave you with one final vendor: Rackspace. This company provides you with a wide range of hosting options, including the ability to configure hybrid environments with cloud-based virtual servers for one part of the system (typically, the front end) and dedicated physical servers for your back-end database processes. If you get a chance, stop by their website and see whether one of their configurations might make sense as a base for your distributed database services.
MC Press Online