Do You Need a Relational Database for Big Data?

INTRODUCTION

Teradata, Greenplum, Netezza, DB2, and Oracle’s Exadata aren’t “Big Data” databases, if by that you mean databases that are routinely used to handle large data sets that are unstructured, rapidly changing, and usually come with few or only vague quality measures for the data.

These are relational databases, and all have their strengths and weaknesses. Again, which one is best is going to be determined by what you are using the database for, the entire ecosystem that the database will live in, and how well it can be maintained and managed.

Database-wise for “Big Data” (I hate that term, by the way; it’s nothing but marketing fluff), why do you need a database? Relational databases are great for organizing data into rows and columns, something that the data usually referred to as “Big Data” doesn’t do naturally or well. In fact, if you’re using a database to store “Big Data”, then you aren’t really doing “Big Data”.
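To make the rows-and-columns point concrete, here is a small sketch in Python. The three events and all of their field names are invented purely for illustration; the only point is that forcing irregular records into one fixed table leaves most of the cells empty.

```python
# Hypothetical, made-up events: each record carries different fields.
events = [
    {"user": "alice", "action": "click", "page": "/home"},
    {"sensor": "t-17", "temp_c": 21.5},
    {"user": "bob", "tweet": "hello", "tags": ["db", "hadoop"]},
]

# A relational table needs one fixed set of columns shared by every row,
# so take the union of all field names seen across the events.
columns = sorted({key for event in events for key in event})

# Flatten each event into that schema; missing fields become None (NULL).
rows = [[event.get(col) for col in columns] for event in events]

total_cells = len(rows) * len(columns)
empty_cells = sum(cell is None for row in rows for cell in row)

print(columns)
print(f"{empty_cells} of {total_cells} cells are empty")
```

With real data the mismatch is usually worse, since new fields keep showing up and the schema has to change every time.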

In a Big Data approach, what you should use to store the data is HDFS (the Hadoop Distributed File System). Then, if you also need some database functionality, a NoSQL database such as MongoDB might be appropriate, but again, with NoSQL it depends on what type of database you want to build: document, key-value, table-style (wide-column), or graph.
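As a rough illustration of how those four NoSQL styles differ, the sketch below models the same two made-up users in each style using nothing but plain Python dicts and lists. None of this is any real product's API; the keys, column names, and the `followed_by` helper are all hypothetical, and real systems add distribution, indexing, and query languages on top.

```python
import json

# Document style: one nested, self-describing record per entity.
document_store = {
    "user:1": {"name": "Alice", "langs": ["SQL", "Pig"], "follows": ["user:2"]},
    "user:2": {"name": "Bob", "langs": [], "follows": []},
}

# Key-value style: the store only sees a key and an opaque value;
# the application has to decode the value itself.
key_value_store = {key: json.dumps(doc) for key, doc in document_store.items()}

# Table (wide-column) style: row key -> column name -> value,
# and each row may carry a different set of columns.
table_store = {
    "user:1": {"info:name": "Alice", "lang:SQL": "1", "lang:Pig": "1"},
    "user:2": {"info:name": "Bob"},
}

# Graph style: nodes plus explicit, first-class relationships (edges).
nodes = {"user:1": {"name": "Alice"}, "user:2": {"name": "Bob"}}
edges = [("user:1", "follows", "user:2")]


def followed_by(user_id):
    """Return the names of everyone the given user follows (graph style)."""
    return [
        nodes[dst]["name"]
        for src, rel, dst in edges
        if src == user_id and rel == "follows"
    ]


print(document_store["user:1"]["langs"])
print(json.loads(key_value_store["user:2"])["name"])
print(sorted(table_store["user:1"]))
print(followed_by("user:1"))
```

Which shape fits depends on the questions you ask of the data: lookups by key, whole-record reads, sparse column scans, or relationship traversals.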

BTW, Teradata, IBM (Netezza/PureSystems), and Oracle all have HDFS appliances that use some form of Apache Hadoop. They usually run on commodity hardware or purpose-built hardware, so the “best” is really going to depend on the ancillary databases around it.