Infrastructure

The 2017 Big Data Landscape

IS BIG DATA STILL A THING? Because Big Data is largely "plumbing", it has been subject to enterprise adoption cycles that are much slower than the hype cycle. As a result, it took...

What is a Graph Database?

INTRODUCTION We live in a connected world. There are no isolated pieces of information, but rich, connected domains all around us. Only a database that embraces relationships as a core aspect of its data model...
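To make "relationships as a core aspect of the data model" concrete, here is a minimal sketch in plain Python. It is not the API of any graph database. `TinyGraph`, `relate`, `two_hops` and the `KNOWS` relationship are hypothetical names chosen for illustration. Relationships are stored directly on each node, so a "friends of friends" query follows those links instead of joining tables.

```python
from collections import defaultdict


class TinyGraph:
    """Toy in-memory property graph: nodes plus typed, directed relationships."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties
        self.out_edges = defaultdict(list)  # node id -> [(rel_type, target id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, src, rel_type, dst):
        # The relationship is stored with the source node itself.
        self.out_edges[src].append((rel_type, dst))

    def neighbors(self, node_id, rel_type):
        return [dst for rel, dst in self.out_edges[node_id] if rel == rel_type]

    def two_hops(self, node_id, rel_type):
        """Nodes exactly two hops away, excluding the start node and its direct neighbors."""
        direct = set(self.neighbors(node_id, rel_type))
        result = set()
        for mid in direct:
            for dst in self.neighbors(mid, rel_type):
                if dst != node_id and dst not in direct:
                    result.add(dst)
        return result


g = TinyGraph()
for name in ["alice", "bob", "carol", "dave"]:
    g.add_node(name, label="Person")
g.relate("alice", "KNOWS", "bob")
g.relate("bob", "KNOWS", "carol")
g.relate("bob", "KNOWS", "dave")
g.relate("alice", "KNOWS", "dave")
print(sorted(g.two_hops("alice", "KNOWS")))  # ['carol']
```

A real graph database adds persistence, indexing and a query language on top of this idea. The core point is the same: traversal cost depends on the relationships you follow, not on the total size of the data.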

SQL vs NoSQL: High-Level Differences

INTRODUCTION Most of you are already familiar with SQL databases and have good knowledge of MySQL, Oracle, or other SQL databases. In the last several years, NoSQL databases have become widely adopted to...
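One of the high-level differences can be shown with the same data modeled in two ways. This is a minimal sketch using only the Python standard library. `sqlite3` stands in for a relational SQL database, and a JSON-serialized dict stands in for a document-style NoSQL store. It does not depict any particular NoSQL product, and the table and field names are hypothetical.

```python
import json
import sqlite3

# Relational (SQL) model: fixed schema, data normalized across tables, joined at query time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    item TEXT
);
INSERT INTO customers VALUES (1, 'Alice');
INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'mouse');
""")
rows = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()
print(rows)  # [('Alice', 'laptop'), ('Alice', 'mouse')]

# Document (NoSQL-style) model: one self-contained, schema-flexible record per customer,
# with the orders embedded instead of joined.
customer_doc = {
    "_id": 1,
    "name": "Alice",
    "orders": [{"id": 10, "item": "laptop"}, {"id": 11, "item": "mouse"}],
}
serialized = json.dumps(customer_doc)
items = [o["item"] for o in json.loads(serialized)["orders"]]
print(items)  # ['laptop', 'mouse']
```

The relational version enforces the schema and avoids duplicating customer data. The document version reads a customer and all their orders in one lookup, at the cost of handling consistency in the application.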

Do you need a Relational Database for Big Data?

INTRODUCTION Teradata, Greenplum, Netezza, DB2 and Oracle's Exadata aren't "Big Data" databases, if we define those as databases that are routinely used to handle large data sets that are unstructured, rapidly changing and usually with little or...

Hadoop Ecosystem Table

INTRODUCTION The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using...
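The map, shuffle and reduce steps behind Hadoop-style distributed processing can be sketched in a single process. This is only an illustration of the programming model, not the Hadoop API. `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical names. In a real cluster each input split would be mapped on a different node, and the framework would move the grouped data between machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values emitted for the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine the grouped values for one key.
    return key, sum(values)


def word_count(splits):
    # Here the "splits" are processed in a plain loop instead of across nodes.
    mapped = chain.from_iterable(map_phase(line) for line in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


counts = word_count(["big data big clusters", "data on clusters"])
print(counts)  # {'big': 2, 'clusters': 2, 'data': 2, 'on': 1}
```

Because mappers only see their own record and reducers only see one key's values, both phases can run in parallel on many machines. That independence is what makes the model scale out.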

What is Hadoop?

INTRODUCTION Apache Hadoop is an open source software framework for storage and large-scale processing of datasets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global...

How to choose a data format

INTRODUCTION It’s easy to become overwhelmed when it comes time to choose a data format. Picture it: you have just built and configured your new Hadoop Cluster. But now you must figure out how to...

Apache Spark Basics

INTRODUCTION Apache Spark is an open source parallel processing framework that enables users to run large-scale data analytics applications across clustered computers. BASICS Apache Spark can process data from a variety of data repositories, including the Hadoop...

SQL Engines for Hadoop: Hive vs Impala vs Spark

INTRODUCTION Hive, Impala and Spark SQL all fit into the SQL-on-Hadoop category. Apache Hive and Spark are both top-level Apache projects. Impala is developed by Cloudera and shipped by Cloudera, MapR, Oracle and Amazon....

Hive vs Impala

Although Hive is popular in the Hadoop world, it has its own drawbacks, such as excessive MapReduce operations for certain queries and JVM overhead during MapReduce jobs. Impala is designed to improve the query performance accessing data...

What is Big Data?