Comparing Hadoop Big Data closely with traditional relational database solutions helps you understand the advantages and drawbacks of each. Knowing the available technologies and techniques is a key first step toward more meaningful IT discussions and better-informed business decisions. As Viktor Mayer-Schönberger and Kenneth Cukier put it:
Just as the telescope enabled us to comprehend the universe and the microscope allowed us to understand germs, the new techniques for collecting and analyzing huge bodies of data will help us make sense of our world in ways we are just starting to appreciate.
What is Hadoop?
According to SAS, a Client Spectrum software partner:
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.[i]
A more concise colleague put it this way:
Hadoop is a technology architecture that makes use of commodity hardware in a highly distributed and scalable fashion, enabling fast data retrieval at a lower cost.
Both definitions are admirably succinct, and both show how the world (and the market) is transforming the way data, in small and large amounts alike, is collected and stored. It’s time to get on board.
Hadoop Big Data Vs. Relational Databases
To see how well Hadoop Big Data stands up against Relational Database solutions like IBM Campaign (formerly IBM Unica), we compared the two, designating seven different characteristics from the outset. In our study, Hadoop Big Data and traditional Relational Databases went head-to-head in the following arenas:
- Security
- IT support
- Static customer profiles
- Profile integration
- Unstructured data
- Real-time interaction
- High volume
Figure 1 reveals that Hadoop has the upper hand in the last three categories: unstructured data, real-time interaction and the ability to handle high volumes of data. That’s where the “Big Data” comes in.
As more organizations realize that more comprehensive data can flesh out CRM platforms, streamline the flow of data into current marketing solutions or enhance ongoing Business Intelligence (BI) initiatives, Big Data solutions like Hadoop become very attractive.
If you look back at Figure 1, however, you’ll see that Hadoop Big Data is no cure-all. In fact, more traditional relational databases are still superior when it comes to security, IT support, static customer profiles and profile integration.
And why is that?
Schema “On Read” vs. Schema “On Write”
Hadoop Big Data and Relational Databases function in markedly different ways.
Relational databases follow a principle known as Schema “On Write.” Hadoop uses Schema “On Read.”
When data is written under Schema “On Write” (in the relational database behind IBM Campaign, for example), information about its structure is taken into account up front. That information is used to define tables, joins, rules and constraints, and every write must conform to them. The advantage is clean data: the database itself enforces the specified rules and structures.
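To make the “On Write” idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any relational database. The customers table, its columns and the sample rows are assumptions invented for illustration; they are not taken from IBM Campaign.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create an in-memory database whose schema is fixed before any write."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            email       TEXT NOT NULL UNIQUE,
            age         INTEGER CHECK (age BETWEEN 0 AND 130)
        )
        """
    )
    return conn


def try_insert(conn, customer_id, email, age) -> bool:
    """Return True if the row satisfied every constraint and was stored."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customers (customer_id, email, age) VALUES (?, ?, ?)",
                (customer_id, email, age),
            )
        return True
    except sqlite3.IntegrityError:
        # The database refused the write; the bad row never reaches storage.
        return False


conn = build_store()
results = [
    try_insert(conn, 1, "ana@example.com", 34),  # valid row
    try_insert(conn, 2, "ana@example.com", 51),  # duplicate email
    try_insert(conn, 3, None, 28),               # missing email
    try_insert(conn, 4, "raj@example.com", -5),  # age fails the CHECK rule
]
stored = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(results, stored)
```

Only the first row is accepted. The three problem rows are rejected at write time, so anyone who reads the table later can rely on its rules.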
Hadoop, on the other hand, uses a Schema “On Read” approach: it typically “dumps” data, effectively ignoring all structure at write time, which is why the stored data is described as “unstructured.” As a result, cleaning and interpreting the data is left to whoever queries Hadoop, at “read” time.
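The “On Read” approach can be sketched in plain Python, with a list standing in for a raw file store. This does not use any Hadoop API; the field names (customer_id, id, email, mail, age) and the sample lines are hypothetical, chosen to show how inconsistent input is accepted on write and only sorted out on read.

```python
import json

# Raw lines as different (hypothetical) source systems might emit them.
RAW_LINES = [
    '{"customer_id": 1, "email": "Ana@Example.com", "age": "34"}',
    '{"id": 1, "mail": "ana@example.com"}',
    'not even json',
    '{"customer_id": 2, "email": "raj@example.com", "age": -5}',
]


def write_raw(store, line):
    """Write step: append the line as-is, with no validation at all."""
    store.append(line)


def read_customers(store):
    """Read step: the reader imposes a schema and does the cleaning."""
    rows, skipped = [], 0
    for line in store:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # unparseable line, discovered only now
            continue
        # Reconcile the different field names used by different sources.
        customer_id = record.get("customer_id", record.get("id"))
        email = record.get("email", record.get("mail"))
        if customer_id is None or email is None:
            skipped += 1
            continue
        try:
            age = int(record.get("age"))
        except (TypeError, ValueError):
            age = None  # missing or non-numeric age
        if age is not None and not 0 <= age <= 130:
            age = None  # implausible value, treated as unknown
        rows.append(
            {"customer_id": int(customer_id), "email": email.lower(), "age": age}
        )
    return rows, skipped


store = []
for line in RAW_LINES:
    write_raw(store, line)  # every line is accepted

rows, skipped = read_customers(store)
print(rows, skipped)
```

Every line is stored, including the one that is not valid JSON. The reader ends up with two records for customer 1, and deciding whether they describe the same person is also left to the reader.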
Implications and Consequences
The absence of identifiable rules, constraints and overall structure makes it difficult to maintain a static customer profile that is unambiguous and free of duplicate data. Relational databases are better suited to storing and maintaining clear systems of customer records, especially for critical information. A typical Hadoop query isn’t looking for a single, specific column or row; it searches for patterns, probabilities and ambiguous recurrences.
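The difference between the two kinds of question can be illustrated with a small plain-Python sketch. It is not Hadoop code, and the profiles, events and page names are invented for the example: one function fetches a single known record by key, the other scans every raw event to estimate a probability.

```python
# Hypothetical curated profiles, keyed the way a relational table would be.
PROFILES = {
    1: {"name": "Ana", "email": "ana@example.com"},
    2: {"name": "Raj", "email": "raj@example.com"},
}

# Hypothetical raw event records: (customer reference, page viewed).
EVENTS = [
    ("ana", "pricing"), ("ana", "checkout"),
    ("raj", "pricing"), ("raj", "support"),
    ("li", "pricing"), ("li", "checkout"),
]


def lookup(profiles, customer_id):
    """Relational-style question: fetch exactly one known record by key."""
    return profiles[customer_id]


def checkout_rate_after(events, page):
    """Pattern-style question: of the customers who viewed `page`,
    what share also reached checkout?"""
    pages_by_customer = {}
    for customer, viewed in events:
        pages_by_customer.setdefault(customer, set()).add(viewed)
    saw_page = [pages for pages in pages_by_customer.values() if page in pages]
    if not saw_page:
        return 0.0
    return sum("checkout" in pages for pages in saw_page) / len(saw_page)


profile = lookup(PROFILES, 1)
rate = checkout_rate_after(EVENTS, "pricing")
print(profile["email"], round(rate, 2))
```

The lookup depends on a clean, unique key. The rate calculation tolerates messy input because it summarizes many records instead of trusting any single one.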
Your organization may have already invested in advanced tools, such as ETL (“Extract, Transform, Load”) software, that do not easily transfer to Hadoop. What’s more, chances are that your organization has already based its applications, such as IBM Campaign, and maybe its entire infrastructure on relational databases.
Through it all, it is important to remember that technologies, requirements, skillsets and objectives can, and will, change. Learn all you can and ask the right questions.
Speaking of questions, if you’d like to know more about Hadoop, take a look at our webinar, Connecting Hadoop Big Data to IBM Campaign & Interact, in which we discuss how one Client Spectrum customer is using Hadoop Big Data efficiently within a marketing automation ecosystem to advance a variety of nurturing campaigns.
Mayer-Schönberger, Viktor and Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. New York: Houghton Mifflin Harcourt, 2013, p. 7.