Data Quality

What is Data Quality?

INTRODUCTION Data Quality is an essential characteristic that determines the reliability of data for making decisions. Data quality helps you identify revenue opportunities, meet regulatory compliance requirements, and respond to customer issues in a timely manner. We’ve all heard of the many horrors...

Cleaning Big Data: Most Time-Consuming Data Science Task

INTRODUCTION A new survey of data scientists found that they spend most of their time massaging rather than mining or modeling data. Still, most are happy with having the sexiest job of the 21st century....


SQL Engines for Hadoop: Hive vs Impala vs Spark

INTRODUCTION Hive, Impala, and Spark SQL all fit into the SQL-on-Hadoop category. Apache Hive and Spark are both top-level Apache projects. Impala is developed...

What is Big Data?