
Machine learning vs Deep learning


Understanding the latest advancements in artificial intelligence can seem overwhelming, but it really boils down to two concepts you’ve likely heard of before: machine learning and deep learning. These terms are often thrown around in ways that can make them seem like interchangeable buzzwords, hence why it’s important to understand the differences.

And those differences should be known! Examples of machine learning and deep learning are everywhere. It’s how Netflix knows which show you’ll want to watch next or how Facebook knows whose face is in a photo. And it’s how a customer service representative will know if you’ll be satisfied with their support before you even take a customer satisfaction (CSAT) survey.

So what are these concepts that dominate the conversations about artificial intelligence and how exactly are they different?

What is machine learning?

Here’s a basic definition of machine learning:

“Algorithms that parse data, learn from that data, and then apply what they’ve learned to make informed decisions”

An easy example of a machine learning algorithm is an on-demand music streaming service. For the service to make a decision about which new songs or artists to recommend to a listener, machine learning algorithms associate the listener’s preferences with other listeners who have similar musical taste.
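
To make that idea concrete, here is a minimal sketch of the recommendation logic in Python: find the listener with the most similar taste (using cosine similarity over play counts) and suggest songs that listener played but you haven't. The listeners, songs, and play counts are all invented for illustration; real services use far richer models.

```python
# Toy collaborative filtering: recommend songs from the most similar listener.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Rows: listeners; columns: play counts for songs s1..s4 (all made up).
plays = {
    "you":   [5, 3, 0, 0],
    "alice": [4, 2, 4, 1],
    "bob":   [0, 0, 5, 5],
}

def recommend(user, data, songs):
    # pick the other listener with the most similar taste
    best = max((u for u in data if u != user),
               key=lambda u: cosine_similarity(data[user], data[u]))
    # suggest songs the neighbor played that the user hasn't
    return [s for s, mine, theirs in zip(songs, data[user], data[best])
            if mine == 0 and theirs > 0]

print(recommend("you", plays, ["s1", "s2", "s3", "s4"]))  # → ['s3', 's4']
```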

Machine learning fuels all sorts of automated tasks and spans across multiple industries, from data security firms hunting down malware to finance professionals looking out for favorable trades. They’re designed to work like virtual personal assistants, and they work quite well.

Machine learning is a lot of complex math and coding that, at the end of day, serves a mechanical function the same way a flashlight, a car, or a television does. When something is capable of “machine learning”, it means it’s performing a function with the data given to it, and gets progressively better at that function. It’s like if you had a flashlight that turned on whenever you said “it’s dark”, so it would recognize different phrases containing the word “dark”.
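
In code, the "flashlight" above starts out as nothing more than a hand-written keyword trigger; what machine learning would add is the ability to improve this rule from data. This sketch shows only the fixed, non-learning version:

```python
# The hand-coded starting point: a fixed keyword trigger.
# A "machine learning" flashlight would refine this rule from examples.
def flashlight_should_turn_on(phrase: str) -> bool:
    return "dark" in phrase.lower()

print(flashlight_should_turn_on("It's dark in here"))  # → True
print(flashlight_should_turn_on("Hello there"))        # → False
```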

Now, the way machines can learn new tricks gets really interesting (and exciting) when we start talking about deep learning.

Deep learning vs Machine learning

In practical terms, deep learning is just a subset of machine learning. It technically is machine learning and functions in a similar way (hence why the terms are sometimes loosely interchanged), but its capabilities are different.

Basic machine learning models do become progressively better at whatever their function is, but they still need some guidance. If an ML algorithm returns an inaccurate prediction, an engineer needs to step in and make adjustments. With a deep learning model, however, the algorithms can determine on their own whether a prediction is accurate or not.

Let’s go back to the flashlight example: it could be programmed to turn on when it recognizes the audible cue of someone saying the word “dark”. Eventually, it could pick up any phrase containing that word. Now if the flashlight had a deep learning model, it could maybe figure out that it should turn on with the cues “I can’t see” or “the light switch won’t work”. A deep learning model is able to learn through its own method of computing – its own “brain”, if you will.

How does deep learning work?

A deep learning model is designed to continually analyze data with a logic structure similar to how a human would draw conclusions. To achieve this, deep learning uses a layered structure of algorithms called an artificial neural network (ANN). The design of an ANN is inspired by the biological neural network of the human brain. This makes for machine intelligence that’s far more capable than that of standard machine learning models.
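
As a rough illustration of that layered structure, here is a toy two-layer network forward pass in Python. The weights and biases are fixed example numbers chosen by hand; in a real ANN they would be learned from data:

```python
# A tiny two-layer neural network forward pass with hand-picked weights.
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases, activation):
    # each neuron: weighted sum of inputs, plus a bias, through an activation
    return [activation(sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [1.0, 0.5]                                              # input features
h = layer(x, [[0.4, -0.2], [0.3, 0.8]], [0.0, -0.1], relu)  # hidden layer
y = layer(h, [[1.0, -1.0]], [0.2], sigmoid)                 # output layer
print(y)  # a single value between 0 and 1
```

Stacking more `layer` calls is what makes the network "deep"; training consists of adjusting the weight lists so the final output matches known answers.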

It’s a tricky prospect to ensure that a deep learning model doesn’t draw incorrect conclusions (which is probably what keeps Elon up at night), but when it works as it’s intended to, functional deep learning is a scientific marvel and the potential backbone of true artificial intelligence.

A great example of deep learning is Google’s AlphaGo. Google created a computer program that learned to play the abstract board game called Go, a game known for requiring sharp intellect and intuition. By playing against professional Go players, AlphaGo’s deep learning model learned how to play at a level not seen before in artificial intelligence, and all without being told when it should make a specific move (as it would with a standard machine learning model). It caused quite a stir when AlphaGo defeated multiple world-renowned “masters” of the game; not only could a machine grasp the complex and abstract aspects of the game, it was becoming one of the greatest players of it as well.

To recap the differences between the two:

    • Machine learning uses algorithms to parse data, learn from that data, and make informed decisions based on what it has learned
    • Deep learning structures algorithms in layers to create an “artificial neural network” that can learn and make intelligent decisions on its own
    • Deep learning is a subfield of machine learning. While both fall under the broad category of artificial intelligence, deep learning is what powers the most human-like artificial intelligence

A simple explanation

We get it – all of this might still seem complicated. The easiest takeaway for understanding the difference between machine learning and deep learning is to know that deep learning is machine learning.

More specifically, it’s the next evolution of machine learning – it’s how machines can make their own accurate decisions without a programmer telling them so.

An analogy to be excited about

Another thing to be excited about with deep learning, and a key part in understanding why it’s becoming so popular, is that it’s powered by massive amounts of data. The “Big Data Era” of technology is providing huge amounts of opportunities for new innovations in deep learning. We’re bound to see things in the next 10 years that we can’t even fathom yet.

Andrew Ng, the chief scientist of China’s major search engine Baidu and one of the leaders of the Google Brain Project, shared a great analogy for deep learning with Wired Magazine: “I think AI is akin to building a rocket ship. You need a huge engine and a lot of fuel,” he told Wired journalist Caleb Garling. “If you have a large engine and a tiny amount of fuel, you won’t make it to orbit. If you have a tiny engine and a ton of fuel, you can’t even lift off. To build a rocket you need a huge engine and a lot of fuel.”

“The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the huge amounts of data we can feed to these algorithms.”

– Andrew Ng (source: Wired)


So what do machine learning and deep learning mean for customer service?

Many of today’s AI applications in customer service utilize machine learning algorithms, primarily to drive self-service, increase agent productivity, and make workflows more reliable. The data fed into those algorithms comes from a constant flux of incoming customer queries, which in turn leads to quick and accurate predictions. Artificial intelligence is an exciting prospect for many businesses, and industry leaders speculate that the most practical applications of business-related AI will be for customer service.

And as deep learning becomes more refined, we’ll see even more advanced applications of artificial intelligence in customer service. A great example is Zendesk’s own Answer Bot, which incorporates a deep learning model to understand the context of a support ticket and learn which help articles it should suggest to a customer.

Expect to see even more innovative applications of deep learning in the near future (especially in self-service), and expect machines to provide even better personalized assistance to customer service representatives.

Source: https://www.zendesk.com/blog/machine-learning-and-deep-learning/

Top 10 Most Popular AI Models


While AI and ML provide ample possibilities for businesses to improve their operations and maximize their revenues, there is no such thing as a “free lunch.”


The “no free lunch” problem is the AI/ML industry’s version of the age-old “no one-size-fits-all” problem. The array of problems businesses face is huge, and the variety of ML models used to solve them is equally wide, as some algorithms handle certain types of problems better than others. That said, you need a clear understanding of what each type of ML model is good for, and today we list the 10 most popular AI algorithms:

1. Linear regression

2. Logistic regression

3. Linear discriminant analysis

4. Decision trees

5. Naive Bayes

6. K-Nearest Neighbors

7. Learning vector quantization

8. Support vector machines

9. Bagging and random forest

10. Deep neural networks

We will explain the basic features and areas of application for all these algorithms below. However, we have to explain the basic principle of Machine Learning beforehand.

All Machine Learning models aim to learn some function (f) that maps the input values (X) to the output values (Y) as accurately as possible:

Y = f(X)

The most common case is when we have historical data X and Y and can deploy the AI model to find the best mapping between these values. The result cannot be 100% accurate; if it could be, the problem would be a simple mathematical calculation with no need for machine learning. Instead, the trained function f can be used to predict new Y values from new X values, enabling predictive analytics. Various ML models achieve this result by employing diverse approaches, yet the main concept above remains unchanged.
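
The Y = f(X) idea fits in a few lines of code: fit f on historical (x, y) pairs, then use it to predict y for a new, unseen x. Here f is a straight line fitted with the classic least-squares formulas, and the data points are made up for illustration:

```python
# Learn a simple mapping f from historical data, then predict on new input.
xs = [1.0, 2.0, 3.0, 4.0]      # historical inputs (invented)
ys = [2.1, 3.9, 6.1, 8.0]      # historical outputs, roughly y = 2x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# closed-form least-squares fit of a line
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

def f(x):                      # the learned mapping Y = f(X)
    return intercept + slope * x

print(f(5.0))                  # predict y for a new, unseen x
```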

Linear Regression

Linear regression has been used in mathematical statistics for more than 200 years. The point of the algorithm is to find the values of the coefficients (B) that make the function f we are training fit the data as precisely as possible. The simplest example is
y = B0 + B1 * x,
where B0 (the intercept) and B1 (the slope) are the coefficients the algorithm learns


By adjusting these coefficients, data scientists get varying outcomes of the training. The core requirements for succeeding with this algorithm are clean data without much noise (low-value information) and the removal of input variables with similar values (correlated inputs).

This makes the linear regression algorithm, typically fitted via gradient descent optimization, a staple for statistical data in the financial, banking, insurance, healthcare, marketing, and other industries.
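
The coefficient adjustment described above can be sketched as plain gradient descent on y = B0 + B1 * x with mean squared error. The data is illustrative (it lies exactly on y = 1 + 2x), and the learning rate and iteration count are arbitrary choices:

```python
# Fitting y = b0 + b1 * x by gradient descent on mean squared error.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]        # exactly y = 1 + 2x (invented data)

b0, b1 = 0.0, 0.0
lr = 0.05                        # learning rate (arbitrary choice)
for _ in range(5000):
    # gradients of mean squared error with respect to b0 and b1
    errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
    g0 = 2 * sum(errors) / len(xs)
    g1 = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    b0 -= lr * g0
    b1 -= lr * g1

print(round(b0, 3), round(b1, 3))   # should approach 1 and 2
```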

Logistic Regression

Logistic regression is another popular AI algorithm, able to provide binary results. This means that the model can both predict the outcome and specify one of two classes for the y value. The model is also based on adjusting weights, but it differs in that a non-linear logistic function is used to transform the outcome. This function can be represented as an S-shaped line separating the true values from the false ones.

The success requirements are the same as for linear regression: removing correlated input samples and reducing the amount of noise (low-value data). This is quite a simple function that can be mastered relatively quickly and is great for performing binary classification.
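
The S-shaped transformation is easy to show in code: the logistic (sigmoid) function squashes any weighted sum into a value between 0 and 1, which is read as the probability of the positive class. The weights below are picked by hand for illustration, not trained:

```python
# Logistic regression prediction with hand-picked weights.
import math

def predict_proba(x, w0, w1):
    # sigmoid of the weighted sum: always between 0 and 1
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))

def predict_class(x, w0, w1, threshold=0.5):
    return 1 if predict_proba(x, w0, w1) >= threshold else 0

# with w0 = -4 and w1 = 2, the decision boundary sits at x = 2
print(predict_class(1.0, -4.0, 2.0), predict_class(3.0, -4.0, 2.0))  # → 0 1
```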

Linear Discriminant Analysis (LDA)

This is an extension of the logistic regression model that can be used when more than two classes may exist in the output. Statistical properties of the data, like the mean value for every class separately and the variance summed up across all classes, are calculated by this model. A prediction is made by computing a discriminant value for each class and choosing the class with the highest value. To work correctly, this model requires the data to be distributed according to the Gaussian bell curve, so all major outliers should be removed beforehand. LDA is a great and quite simple model for classifying data and building predictive models on top of it.
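
As a simplified one-dimensional illustration of the statistics LDA relies on, the sketch below computes per-class means and a pooled variance, then scores each class with a discriminant function and picks the larger. The two toy classes are invented; real LDA generalizes this to many dimensions:

```python
# 1-D LDA sketch: per-class means, pooled variance, discriminant scores.
import math

class_a = [1.0, 1.2, 0.8, 1.1]   # invented samples of class A
class_b = [3.0, 3.2, 2.9, 3.1]   # invented samples of class B

def mean(xs):
    return sum(xs) / len(xs)

mu_a, mu_b = mean(class_a), mean(class_b)
# pooled (shared) variance across both classes
deviations = [(x - mu_a) ** 2 for x in class_a] + \
             [(x - mu_b) ** 2 for x in class_b]
var = sum(deviations) / (len(class_a) + len(class_b) - 2)

def discriminant(x, mu, prior=0.5):
    # LDA score: higher means this class is more likely for input x
    return x * mu / var - mu ** 2 / (2 * var) + math.log(prior)

def classify(x):
    return "A" if discriminant(x, mu_a) > discriminant(x, mu_b) else "B"

print(classify(1.4), classify(2.8))  # → A B
```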

Decision Trees

This is one of the oldest, most used, simplest, and most efficient ML models around. It is a classic binary tree with a Yes or No decision at each split until the model reaches a result node.

This model is simple to learn, it doesn’t require data normalization and can help to solve multiple types of problems.
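
Written out by hand, a decision tree is just nested Yes/No conditions. The splits and thresholds below are invented for illustration; a real model learns them from data:

```python
# A hand-written decision tree: each "if" is one Yes/No split.
def approve_loan(income: float, years_employed: float) -> str:
    if income > 50_000:                 # first split
        if years_employed > 1:          # second split
            return "approve"
        return "review"                 # high income, short tenure
    return "deny"                       # low income

print(approve_loan(60_000, 3), approve_loan(40_000, 5))  # → approve deny
```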

Naive Bayes

Naive Bayes algorithm is a simple, yet very strong model for solving a wide variety of complex problems. It can calculate 2 types of probabilities:

1. A chance of each class appearing

2. A conditional probability for each class, given an additional input value x.

The model is called naive because it operates on the assumption that all the input values are unrelated to each other. While that assumption rarely holds in the real world, this simple algorithm can be applied to a multitude of normalized data flows to predict results with a high degree of accuracy.
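
The two probability types can be computed directly from a toy labelled dataset: the prior P(class) and the conditional P(feature | class). A naive Bayes classifier multiplies these (one conditional factor per feature) and picks the class with the largest product. The spam example below is entirely made up:

```python
# Naive Bayes on a toy dataset: priors times conditionals, largest wins.
data = [  # (message contains the word "free", message is spam)
    (True, True), (True, True), (False, True),
    (True, False), (False, False), (False, False),
]

def prior(cls):
    # P(class): how often the class appears overall
    return sum(1 for _, y in data if y == cls) / len(data)

def conditional(feature, cls):
    # P(feature | class): how often the feature appears within the class
    in_class = [f for f, y in data if y == cls]
    return sum(1 for f in in_class if f == feature) / len(in_class)

# score a new message containing the word "free" for each class
spam_score = prior(True) * conditional(True, True)
ham_score = prior(False) * conditional(True, False)
print("spam" if spam_score > ham_score else "ham")  # → spam
```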

K-Nearest Neighbors

This is quite a simple yet very powerful ML model that uses the whole training dataset as its representation. A prediction for a new data point is made by checking the whole dataset for the K data points with the most similar values (the so-called neighbors), using the Euclidean distance (which can easily be calculated from the value differences), and combining their outcome values.

Such models can require lots of computing resources to store and process the data, suffer accuracy loss when there are many attributes, and have to be constantly curated. On the other hand, they are trivially quick to set up (there is no training phase) and are very accurate and efficient at finding the needed values in large datasets.
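
Here is the whole algorithm from scratch: scan the training set, take the K points closest in Euclidean distance, and let them vote. All the data points and labels are invented:

```python
# K-Nearest Neighbors from scratch: nearest K points vote on the label.
import math
from collections import Counter

train = [  # ((feature1, feature2), label) — all invented
    ((1.0, 1.0), "red"), ((1.2, 0.8), "red"), ((0.9, 1.1), "red"),
    ((4.0, 4.0), "blue"), ((4.2, 3.9), "blue"), ((3.8, 4.1), "blue"),
]

def knn_predict(point, k=3):
    # sort the whole training set by Euclidean distance to the query point
    by_distance = sorted(train, key=lambda item: math.dist(point, item[0]))
    # majority vote among the k nearest neighbors
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 1.0)), knn_predict((4.1, 4.0)))  # → red blue
```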

Learning Vector Quantization

The only major downside of KNN is the need to store and update huge datasets. Learning Vector Quantization (LVQ) is an evolved form of the KNN idea: a neural network that uses a small set of codebook vectors to summarize the training data and codify the required results. The vectors are random at first, and the process of learning involves adjusting their values to maximize prediction accuracy.

In the end, the codebook vector with the most similar values to a new data point determines the predicted outcome.
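
The learning step can be sketched as follows: keep one codebook vector per class, and for each training point nudge the nearest vector toward it (if the class matches) or away from it (if it doesn't). The data, starting vectors, and learning rate below are all illustrative:

```python
# LVQ sketch: codebook vectors are nudged toward/away from training points.
import math

codebook = {"red": [0.0, 0.0], "blue": [5.0, 5.0]}   # initial vectors
train = [([1.0, 1.0], "red"), ([4.0, 4.0], "blue"),
         ([1.2, 0.9], "red"), ([4.1, 3.8], "blue")]
lr = 0.3                                             # learning rate

for _ in range(20):
    for x, label in train:
        # find the nearest codebook vector to this training point
        nearest = min(codebook, key=lambda c: math.dist(codebook[c], x))
        sign = 1.0 if nearest == label else -1.0     # attract or repel
        codebook[nearest] = [v + sign * lr * (xi - v)
                             for v, xi in zip(codebook[nearest], x)]

def classify(x):
    # prediction: the class of the nearest codebook vector
    return min(codebook, key=lambda c: math.dist(codebook[c], x))

print(classify([1.1, 1.0]), classify([4.0, 3.9]))  # → red blue
```

Note that only two vectors are stored here, no matter how large the training set grows; that is exactly the storage saving over KNN described above.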

Support Vector Machines

This algorithm is one of the most widely discussed among data scientists, as it provides very powerful capabilities for data classification. The so-called hyperplane is a boundary that separates data points with different class values; the points closest to the hyperplane are the support vectors, since they are the instances that determine where the boundary lies.

The best hyperplane is the one with the largest margin, i.e. the greatest distance to the nearest data points of each class while still separating them correctly. This is an extremely powerful classification machine that can be applied to a wide range of classification problems.
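
The hyperplane itself is simple to express in code: with a weight vector w and bias b, the sign of w·x + b tells you which side a point falls on. The w and b below are chosen by hand for illustration; a real SVM learns them so that the margin to the nearest points (the support vectors) is as large as possible:

```python
# Classifying by which side of a fixed hyperplane a point falls on.
def side_of_hyperplane(x, w=(1.0, 1.0), b=-5.0):
    # the hyperplane is the set of points where w . x + b = 0
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "positive" if score >= 0 else "negative"

print(side_of_hyperplane((4.0, 4.0)))  # → positive
print(side_of_hyperplane((1.0, 1.0)))  # → negative
```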

Random Decision Forests or Bagging

Random decision forests are formed from decision trees: multiple samples of the data are processed by separate trees, and the results are aggregated (like collecting many samples in a bag) to find a more accurate output value.

Instead of finding one optimal route, multiple suboptimal routes are defined, thus making the overall result more precise. If decision trees solve the problem you are after, random forests are a tweak in the approach that provides an even better result.
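The aggregation step can be sketched with three hand-written stand-in "trees" that vote, with the majority winning; in a real random forest these would be trees trained on different samples of the data:

```python
# Bagging sketch: several imperfect "trees" vote, the majority wins.
from collections import Counter

def tree1(x): return "yes" if x > 2 else "no"
def tree2(x): return "yes" if x > 3 else "no"
def tree3(x): return "yes" if x > 1 else "no"

def forest_predict(x, trees=(tree1, tree2, tree3)):
    votes = Counter(t(x) for t in trees)
    return votes.most_common(1)[0][0]

print(forest_predict(2.5), forest_predict(0.5))  # → yes no
```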

Deep Neural Networks

DNNs are among the most widely used AI and ML algorithms. There have been significant improvements in deep learning-based text and speech apps, deep neural networks for machine perception and OCR, as well as the use of deep learning to power reinforcement learning and robotic movement, along with other applications of DNNs.

Final Thoughts on 10 Most Popular AI Algorithms

As you can see, there is an ample variety of AI algorithms and ML models. Some are better suited for data classification, while others excel in other areas. No single model fits every case, so choosing the best one for your problem is essential.

How do you know if a given model is the right one? Consider the following factors:

  1. The 3 V’s of Big Data you need to process (volume, variety, and velocity of input)
  2. The number of computing resources at your disposal
  3. The time you can spend on data processing
  4. The goal of data processing

That said, if one model provides 94% prediction accuracy at the cost of twice the processing time of an 86%-accurate algorithm, the right choice depends on which of these factors matters most to you.

However, the biggest problem is usually the general lack of the high-level expertise needed to design and implement a data analysis and Machine Learning solution. This is why most businesses choose a Managed Services Provider specializing in Big Data and AI solutions.

Source: https://dzone.com/articles/top-10-most-popular-ai-models

What Is The Difference Between Business Intelligence And Analytics?



If someone put you on the spot, could you tell them what the difference between business intelligence and analytics is? If you feel a bit uncertain about the specifics, you’re not alone: experts aren’t in agreement either! There is no clear line between business intelligence and analytics; they are closely connected and interlaced in their approach to resolving business issues, providing insights on past and present data, and informing future decisions. Some experts emphasize that business analytics also covers predictive modeling and advanced statistics to evaluate what will happen in the future, while BI is more focused on the present state of the data, making decisions (and shaping the future of a company) based on current insights. But let’s look in more detail at what the experts say and how we can connect and differentiate the two.

We already saw earlier this year the benefits of Business Intelligence and Business Analytics. Let’s dig deeper now and figure out what this is all about, what makes them different, and how they are complementary to each other.

What Do The Experts Say?

In an article tackling BI and Business Analytics, Better Buys asked seven different BI pros what their thoughts were on the difference between business intelligence and analytics. Each and every professional had a different take. Here are a few snippets of their opinions:

“BI is needed to run the business while Business Analytics are needed to change the business.” – Pat Roche, Vice President of Engineering at Magnitude Software

“BI is looking in the rearview mirror and using historical data. Business Analytics is looking in front of you to see what is going to happen.” – Mark van Rijmenam, CEO / Founder at BigData-Startups

“What’s the difference between Business Analytics and Business Intelligence? The correct answer is: everybody has an opinion, but nobody knows, and you shouldn’t care.” – Timo Elliot, Innovation Evangelist at SAP

Well, what if you do care about the difference between business intelligence and data analytics? It doesn’t matter whether you run a small business operation or an enterprise: if you have to make decisions that will affect you in the short or long run, it is wise to use both. Business intelligence and analytics will give a company a holistic view of its raw data and make its decisions more successful and cost-efficient.

What Is Business Intelligence And Analytics?

Business intelligence and analytics are data management solutions implemented in companies and enterprises to collect historical and present data, analyze the raw information using statistics and software, and deliver insights for making better future decisions.

Let’s face it: both provide insights into the business operation and future decisions, but the distinction comes down to how they do it and exactly what information they provide.

It seems clear that there isn’t one standard “correct” definition of the differences between the two terms. The varying opinions given by the experts are evidence of that. So, instead of trying to find the “right” answer, let’s find a useful distinction between the two that can be used simply and clearly to help you in your work. The most straightforward and useful difference between business intelligence and data analytics boils down to two factors:

  1. What direction in time are we facing; the past or the future?
  2. Are we concerned with what happened, how it happened, or why it happened?

Keeping in mind that this is all a matter of opinion, here are our simplified definitions of business intelligence vs business analytics.

Business intelligence – Deals with what happened in the past and how it happened leading up to the present moment. It identifies big trends and patterns without digging too much into the why’s or predicting the future.

Business analytics – Deals with the why’s of what happened in the past. It breaks down contributing factors and causality. It also uses these why’s to make predictions of what will happen in the future.

Confused yet? Let’s use an example from football as a metaphor to help clarify things.

Business Intelligence vs Business Analytics As Seen Through Football

Let’s say you’re on the coaching staff of a football team and you want to review the most recent game. You do this to see how you can fix your errors and replicate your successes.

Using our previous definitions, BI would be the process of identifying all the statistics and plays that led to your team winning. It would identify that you kept possession of the ball for much longer than your opponents. It would also identify the trend that your right side of the field was instrumental in retaining possession through excellent passing.

Business analytics would be more concerned with why you had possession of the ball for longer than your opponent and why your right side of the field did so well at passing.

Was it because:

  • Your opponent’s defenders on that side were weaker players than their defenders on the other?
  • Your right-side players had been putting in more time on the field together than your left side?
  • One of your players on the right was simply having a phenomenal performance which carried over to the rest of that side?

These questions are important. They allow you to figure out how you can replicate your success, or prevent your failure, in the future. Asking the right business intelligence questions will lead you to better analytics. With a business dashboard, all the insights can be brought together in a single place, making meaningful decisions much faster. But first, we need to analyze the difference further, as that will help us understand what to do in a company’s operational process and how to choose the best tool to manage your insights.

Without further ado, let’s dive deeper into the difference between business intelligence and data analytics. In order to do so, we need to examine the distinction between correlation and causation.

Correlation Is Not Causation

When two things are correlated, it means that when one happens, the other tends to happen at the same time. When two things have a causal relationship, it means that one thing leads directly or indirectly to the other happening.

A famous example of the difference between these two is the fact that ice cream consumption and city homicide rates are highly correlated. Now, of course, ice cream does not cause people to murder each other. So clearly there is not a causal relationship.

The two are correlated due to the fact that homicide rates rise when temperatures rise in the late summer. It is theorized that since warmer weather brings more people outside, this leads to more social interaction, some of which is violent.
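
To see how correlation is actually measured (and why a high value alone proves nothing about causation), here is Pearson's r computed from scratch on invented numbers that mimic the ice-cream pattern from the text:

```python
# Pearson correlation from scratch: high r, yet no causal link.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Monthly figures, entirely invented for illustration.
ice_cream_sales = [200, 240, 310, 400, 460, 470]
homicides = [2, 3, 4, 6, 7, 7]

print(round(pearson(ice_cream_sales, homicides), 3))  # close to 1.0
```

Both series track a hidden third variable (temperature), which is what produces the near-perfect correlation without any causal link between them.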

You Can’t Always Trust What You See

You can find examples of people confusing correlation and causation everywhere you look. For example, that muscular person at the gym who always likes to give you workout advice may or may not actually know what they are talking about. The advice they’re giving you, while correlated with being known by a muscular person, may not actually lead to being muscular. Instead, they may simply have good genetics. They may be muscular not because of their knowledge, but actually in spite of it.

Moving into the lighter side of things, there are some hilarious examples of things being correlated that clearly don’t have a causal relationship. Many of them are shown on the website Spurious Correlations. For example, divorce rates in Maine are very closely correlated with per capita consumption of margarine… Maybe married couples should switch to butter instead?


Source: Spurious Correlations

In all seriousness, it can be extremely difficult, depending on the field, to separate correlation and causation; very large-scale and expensive research trials are often run just to find evidence of causal relationships (the butterfly effect is another famous illustration of how tangled cause and effect can be). But rather than go further into those details, let’s examine the business side of things, concentrate on the specifics of business intelligence vs data analytics, and provide insights on correlation and causation in the business realm.

How Does This Apply To Business?

Can you understand the factors that are causing your business success or failure rather than just the factors that are associated with your business success or failure? If so, it’s much more likely that you will be able to predict the future in your marketplace and act accordingly. However, it’s important to note that you need to know what’s correlated with something before you can know causation.

In other words, you need to know what happened and how it happened (BI) before you have the ability to say why things happened (BA) with any reasonable degree of certainty.

That is the difference between business intelligence and analytics, and that’s why both of them are crucial. They fit together like two pieces of a jigsaw puzzle, a puzzle that helps your business be more profitable. It is crucially important to define and use KPI examples that help establish a business goal and work through the correlation and causation of business analytics vs business intelligence. While it may sound complicated at the beginning, the deeper you dig with a data analysis tool, the more sense it will make to establish qualified insights and make better decisions. That is what it’s all about: understanding the difference between business intelligence and business analytics helps a company adjust its operations in a cost-effective and insightful way. Using both in the process of creating a successful business intelligence strategy will only make a company more competitive in the market.


Use-Case Scenarios

Enough with the descriptions and metaphors. Let’s solidify things and wrap up this post with business examples, illustrating the difference between business intelligence and business analytics.

Let’s say you work for a marketing firm that uses both business intelligence and analytics to help large e-commerce companies launch new products. In order to understand what new products would be most likely to succeed (analytics), you would need to figure out:

  • What products had been most successful in the past (BI)
  • The seasonal trends that had influenced success for past launches (BI)
  • Why customers bought the past successful products (BA)

For example, let’s say that your hypothetical e-commerce store sold boutique women’s fashion. You will need to work with your retail analytics to understand what products will work.

First, you would examine what categories of clothing are driving the most profits. Then, you can examine what times in the year those successful products had been launched. Finally, you could do a series of in-depth customer interviews in order to figure out why customers liked those pieces or categories more than the others.

If you did enough market research, and you had a large enough sample size, you should be able to predict with a great deal of accuracy which new products would be likely to succeed.

This could lead to surprises in the way that you think about your products because your customers often have a very different way of looking at your products than you do.

BI and Analytics Dismantle Assumptions

For example, maybe your assumption was that your customers mainly cared about the price point of your garments.

After your research, however, you found your customers were actually willing to spend more on your products if you emphasized your humane sourcing practices, such as not utilizing sweatshops.

Then, your focus would be on continuing to use that positioning in your marketing messages as opposed to worrying about the price points of your garments so much when doing a product launch.

The above example illustrates one of the fundamentally important points of business intelligence and analytics. Your assumptions about your company, your customers, your marketplace, and your products are often flat out wrong, or at the very least incomplete. After asking the right questions, analytics are here to help: whatever your industry or sector, be it healthcare analytics or financial business intelligence, you need both BI and BA for success.

Business Intelligence And Analytics Industry Examples

The difference between the two is easiest to examine through real-life examples, so let’s analyze a few industries that show the value of both terms.

Human Resources: What are my recruiting options?

In Human Resources, it’s all about the workforce: employee engagement, overtime hours, training costs, overall productivity, cost per hire, recruiting conversion rate, time to fill a position, retention efficiency, part-time employees, and so on. When you establish the right HR KPIs for your business, you need to dig into what happened and how (BI), and then why it happened (BA), to understand how to perform in the future. Let’s take a simple look at one full-scale dashboard and see how business analytics vs business intelligence performs.

The online dashboard above is created as a simple yet effective overview of the recruiting process in a company. It can be used by a recruiting agency or in-house; it is meant for HR managers and professionals who need more data and insight into their process to inform future decisions and decrease costs. The goal is to find the right recruitment approach, giving you the best candidates at the lowest cost. To put it into practice, you want to define what happened during the recruiting process and how it happened (BI), and the next question would be why it happened the way you see it on the dashboard (business analytics). Let’s say the average time to fill a position (by department, in days) didn’t go as planned. You can inspect further and see that the conversion rate of the recruiting professionals didn’t go as expected, and you have lost precious time and resources in keeping up with the market (you have used your historical data and connected it to the present moment, finding that you are losing resources). The average cost of hiring will help you determine the patterns that occur as part of the recruiting cycle. By grasping this data with an online data visualization tool, the time needed to gain those insights is reduced and can be spent on other business processes.

In this example, we can define what happened, how, and then why. Don’t be afraid to do your own analysis and create your own HR report that showcases the power of business intelligence vs analytics. This is important because you want to identify the influences on your operations when considering future undertakings; you want to know what happened, how it happened, and why. This is the holistic formula of a successful business.

But let’s dig deeper into other industries.

Procurement: Is it possible to outperform my supply delivery process?

Another example that better shows what business intelligence and analytics are, and one you can explore further by yourself, is the procurement dashboard, which expounds on supplier delivery performance. As mentioned before, after you establish your indicators, in this case procurement KPIs, you can dig deeper into the analysis of your business processes to establish better performance and decide on future aspects of company success. Business intelligence and analytics are often used together (even in the wording), which can help you get a holistic overview, as in the dashboard presented above. That doesn’t mean they cannot be used separately, but to make better decisions, you need the best tools you can utilize in this competitive market. That being said, business intelligence vs. analytics can show the mentioned correlations and causations that will provide extensive value to general business operations and to the reasoning behind important future decisions.

With our last example, we will wrap up what business intelligence analytics can do for a company and how to use it. The advantages are clear, but what about the indispensable features a simple visual overview can provide? Using your raw data to assemble a visual representation of all your important performance indicators, historical and present, you can create a powerful insight tool that gathers and connects the most significant acumen a business needs to manage its small, mid, and big-scale operations, while making balanced decisions and creating a sustainable process. Let’s see this through an example.

Sales: How to decrease the sales cycle length?

The sales dashboard visualized above reflects the sales cycle needed to perform the complete process, from potential opportunity to a paid invoice. By compiling the historical data (calculating the average over a defined time period) with the present insights and trends occurring in the sales process, we can dig deeper into the BI of the cycle. We can examine the sales funnel (which can also be customized to the particular needs of a business or department), what trends and patterns happened during each funnel stage, how they affected the complete sales cycle, and who the top-performing representatives on the team were. By drilling down into productivity, into the outperforming processes and the least efficient ones, a company can easily spot what is working and what is not. If you see an average sales cycle length of 18 days, but your benchmarks tell you it should be no more than 15 days, then digging deeper into the BI angle can show you where those 3 days are being lost. This gives an extra edge for the next sales cycle, as you can easily pinpoint the issue and brainstorm solutions.
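The 18-day-versus-15-day scenario above can be sketched in a few lines of Python. The deal records and benchmark below are hypothetical, purely to illustrate how the average cycle length and per-rep breakdown behind such a dashboard are computed:

```python
from statistics import mean

# Hypothetical closed deals: (sales_rep, days from opportunity to paid invoice)
closed_deals = [
    ("Alice", 14), ("Alice", 20), ("Bob", 22),
    ("Bob", 19), ("Carol", 15), ("Carol", 18),
]

BENCHMARK_DAYS = 15  # assumed internal benchmark

# Overall average cycle length and the gap against the benchmark
avg_cycle = mean(days for _, days in closed_deals)
gap = avg_cycle - BENCHMARK_DAYS

# Per-rep averages help pinpoint where the cycle drags
reps = {rep for rep, _ in closed_deals}
per_rep = {rep: mean(d for r, d in closed_deals if r == rep) for rep in reps}

print(f"average cycle: {avg_cycle} days, {gap} over benchmark")
print(per_rep)
```

Here the overall average comes out at 18 days, 3 over the benchmark, and the per-rep breakdown immediately shows which representatives drive the overrun.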

By detailing the factors that caused these insights (in plain language, why something happened), adding predictive analytics, and examining the already mentioned why of these processes, a business can utilize the business analytics point of view, which helps gather the interconnected data into a comprehensive data story. If you dig into the raw data sets and leverage the power of statistics to predict your future performance, then you have taken advantage of the entire sequence of the business intelligence vs. data analytics sphere.

Exclusive Bonus Content: BI vs Analytics: What Is The Difference?
Get our “at a glance” guide to learn the difference between the two!

To conclude the matter examined in this article, with the aim of differentiating both terms, used separately and in correlation with each other, you can now establish which will perform better during your decision-making process and what you can expect from each. Nevertheless, we would like to stress that through the complementary uses of business intelligence and business analytics, you can unpack your assumptions and get more accurate and useful data. Put simply, BI and BA give you the tools to see reality as clearly as possible. That being said, the significance of both will affect the business operations of a company, be it small or big, and the need to combine them will certainly become an operational must-have for a successful market presence.

Source: https://www.datapine.com/blog/difference-between-business-intelligence-and-analytics/

2018 Top 10 Business Intelligence Trends


Whether you’re a data rockstar or an IT hero or an executive building your BI empire, these 10 Business Intelligence Trends could help take your organization to the next level.

1.- How Machine Learning Will Enhance the Analyst

Popular culture is fueling a dystopian view of what machine learning can do. But while research and technology continue to improve, machine learning is rapidly becoming a valuable supplement for the analyst. In fact, machine learning is the ultimate assistant to the analyst.

Imagine needing to quickly look at the impact of a price change on a given product. To do this, you would run a linear regression on your data. Before Excel, R, or Tableau, you had to do this all manually, and the process took hours. Thanks to machine learning, you can now see the product’s consumption in a matter of minutes, if not seconds. As an analyst, you don’t need to do that heavy lifting, and you can move on to the next question—were the higher consumption months due to an extrinsic factor such as a holiday? Was there a new release? Was there news coverage influencing product purchase or awareness? What you’re not thinking about is how you wish you could have spent more time perfecting your regression model.
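As a rough illustration of the regression an analyst would once have ground through by hand, here is a minimal ordinary-least-squares fit in plain Python. The price and consumption figures are hypothetical:

```python
# Hypothetical monthly data: unit price vs. units consumed
prices = [9.99, 10.49, 10.99, 11.49, 11.99, 12.49]
units  = [980,  940,   905,   860,   820,   790]

n = len(prices)
mean_x = sum(prices) / n
mean_y = sum(units) / n

# Ordinary least squares: slope = cov(x, y) / var(x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(prices, units))
         / sum((x - mean_x) ** 2 for x in prices))
intercept = mean_y - slope * mean_x

print(f"each $1 price increase costs ~{abs(slope):.0f} units/month")
print(f"predicted demand at $13: {slope * 13 + intercept:.0f} units")
```

This is the kind of mechanical computation a tool now does instantly, freeing the analyst to ask the follow-up questions about holidays, releases, and news coverage.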

Machine Learning helps you look under lots and lots of rocks when you need assistance getting an answer.

There are two ways in which machine learning assists the analyst. The first is efficiency. With the example above, the analyst doesn’t spend valuable time on basic math. The analyst now has more time to think about business implications and the next logical steps. Secondly, it helps the analyst explore and stay in the flow of their data analysis because they no longer have to stop and crunch the numbers. Instead, the analyst is asking the next question. As Ryan Atallah, Staff Software Engineer describes it, “ML helps you look under lots and lots of rocks when you need assistance getting an answer.”

Machine learning’s potential to aid an analyst is undeniable, but it’s critical to recognize that it should be embraced when there are clearly defined outcomes. “Machine learning is not great when your data is subjective,” says Andrew Vigneault, Staff Product Manager with Tableau. For example, when surveying customers about product satisfaction, ML cannot always pick up on qualitative words.

Additionally, the analyst needs to understand success metrics for the data to make sense of it in a way that is actionable. In other words, inputs into a machine don’t make the outputs meaningful. Only a human can understand if the right amount of context has been applied—which means that machine learning cannot be done in isolation (without an understanding of the model and what inputs/outputs are being made).

While there might be concern over being replaced, machine learning will actually supercharge analysts and make them more efficient, more precise, and more impactful to the business. Instead of fearing machine learning technology, embrace the opportunities it presents.


IDC forecasts revenues from AI and machine learning systems to total $46 billion by 2020.


In 2020, AI will become a positive net job motivator, creating 2.3 million jobs while eliminating only 1.8 million jobs. (Gartner)

2.- The Human Impact of Liberal Arts in the Analytics Industry

As the analytics industry continues to seek skilled data workers, and organizations look to elevate their analytics teams, we may have had a plethora of talent at our fingertips all along. We are familiar with how art and storytelling have helped influence the data analytics industry. That doesn’t come as a surprise. What does come as a surprise is how the technical aspects of creating an analytical dashboard, previously reserved for IT and power users, are being taken over by users who understand the art of storytelling—a skill set primarily coming from the liberal arts. Furthermore, organizations are placing a higher value on hiring workers who can use data and insights to affect change and drive transformation through art and persuasion, not only on the analytics itself.

As technology platforms become easier to use, the focus on tech specialties decreases. Everyone can play with data without needing to have the deep technical skills once required. This is where people with broader skills, including the liberal arts, come into the fold and drive impact where industries and organizations have a data worker shortage. As more organizations focus on data analytics as a business priority, these liberal arts data stewards will help companies realize that empowering their workforce is a competitive advantage.

Not only do we see broad-based appeal to help hire a new generation of data workers, we’ve also observed several instances where technology-based companies were led or heavily impacted by founders with a liberal arts education. This includes founders and executives from Slack, LinkedIn, PayPal, Pinterest, and several other high-performing technology companies.

It takes a certain amount of skill to build a dashboard and do analysis, but there’s something you can’t really teach—and that’s the way you tell a story with the data.

One powerful example of bringing the liberal arts into a predominantly technology company comes from Scott Hartley’s recent book, “The Fuzzy and the Techie.” Nissan hired PhD anthropologist Melissa Cefkin to lead the company’s research into human-machine interaction, specifically the interaction between self-driving cars and humans. The technology behind self-driving vehicles has come a long way, but it still faces hurdles in mixed human-machine environments. At a four-way stop, for example, humans typically analyze the situation on a case-by-case basis, which makes it nearly impossible to teach a machine. To combat this, Cefkin was tasked with leveraging her anthropology background to identify the patterns in human behavior that can better teach these self-driving cars, and in turn communicate those patterns back to the human riding in the car.

As analytics evolves to be more art and less science, the focus has shifted from simply delivering the data to crafting data-driven stories that inevitably lead to decisions. Organizations are embracing data at a much larger scale than ever before and the natural progression means more of an emphasis on storytelling and shaping data. The golden age of data storytelling is upon us and somewhere within your organization is a data storyteller waiting to uncover your next major insight.


Liberal arts grads are joining the tech workforce 10% more rapidly than technical grads. (LinkedIn)


One third of all Fortune 500 CEOs have liberal arts degrees. (Fast Company)

3.- The Promise of Natural Language Processing

2018 will see natural language processing (NLP) grow in prevalence, sophistication, and ubiquity. As developers and engineers continue to refine their understanding of NLP, its integration into unrealized areas will also grow. The rising popularity of Amazon Alexa, Google Home, and Microsoft Cortana has nurtured people’s expectations that they can speak to their software and it will understand what to do. For example, by stating the command, “Alexa, play ‘Yellow Submarine’,” the Beatles’ hit plays in your kitchen while you make dinner. This same concept is also being applied to data, making it easier for everyone to ask questions of and analyze the data they have at hand.

Gartner predicts by 2020 that 50 percent of analytical queries will be generated via search, NLP or voice. This means that suddenly it will be much easier for the CEO on the go to quickly ask his mobile device to tell him: “Total sales by customers who purchased staples in New York,” then filter to “orders in the last 30 days,” and then group by “project owner’s department.” Or, your child’s school principal could ask: “What was the average score of students this year,” then filter to “students in 8th grade,” and group by “teacher’s subject.” NLP will empower people to ask more nuanced questions of data and receive relevant answers that lead to better everyday insights and decisions.
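Under the hood, a query engine behind such an NLP interface ultimately reduces the spoken request to filters and a group-by. As a rough sketch of the CEO’s query above, using hypothetical order rows and a fixed “today” so the example is reproducible:

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical order rows the engine might query
orders = [
    {"product": "staples", "city": "New York", "dept": "Sales",   "amount": 120.0, "date": date(2018, 3, 28)},
    {"product": "staples", "city": "New York", "dept": "Finance", "amount": 80.0,  "date": date(2018, 3, 5)},
    {"product": "staples", "city": "Boston",   "dept": "Sales",   "amount": 50.0,  "date": date(2018, 3, 29)},
    {"product": "pens",    "city": "New York", "dept": "Sales",   "amount": 30.0,  "date": date(2018, 3, 30)},
]

today = date(2018, 3, 31)  # fixed "now" for reproducibility

# "Total sales by customers who purchased staples in New York,
#  orders in the last 30 days, grouped by department"
totals = defaultdict(float)
for o in orders:
    if (o["product"] == "staples" and o["city"] == "New York"
            and (today - o["date"]) <= timedelta(days=30)):
        totals[o["dept"]] += o["amount"]

print(dict(totals))  # {'Sales': 120.0, 'Finance': 80.0}
```

The hard part of NLP is not this final aggregation but mapping the ambiguous spoken phrasing onto these filters and groupings reliably.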

[NLP] can open the analysts’ eyes a little bit and gives them some self-assurance and some confidence in what they’re able to do. 

Simultaneously, developers and engineers will make great strides in learning and understanding how people use NLP. They will examine how people ask questions, ranging from instant gratification (“which product had the most sales?”) to exploration (“I don’t know what my data can tell me—how’s my department doing?”). As Ryan Atallah, Staff Software Engineer for Tableau, notes, “This behavior is very much tied to the context in which the question is being asked.” If the end user is on their mobile, they are more likely to ask a question that generates instant gratification, whereas, if they are sitting at a desk looking at a dashboard, they’re probably looking to explore and examine a deeper question.

The biggest analytics gains will come from understanding the diverse workflows that NLP can augment. As Vidya Setlur, Staff Software Engineer with Tableau also puts it, “Ambiguity is a hard problem,” so understanding workflows becomes more important than the input of a specific question. When there are multiple ways of asking the same question of the data (e.g. “What sales rep had the most sales this quarter?” or “Who had the most sales this quarter?”), the end user doesn’t wish to think about the “right” way to ask it, they just want the answer.

Consequently, the opportunity will arise not from placing NLP in every situation, but making it available in the right workflows so it becomes second nature to the person using it.


By 2019, 75% of workers whose daily tasks involve the use of enterprise applications will have access to intelligent personal assistants to augment their skills and expertise. (IDC)


By 2021, more than 50% of enterprises will be spending more per annum on bots and chatbot creation than traditional mobile app development. (Gartner)

4.- The Debate for Multi-Cloud Rages On

If your organization is exploring and evaluating a multi-cloud strategy in 2018, you’re not alone.

“There’s a stampede of organizations moving their data to the cloud and moving their core applications,” said Chief Product Officer Francois Ajenstat. “And whether it’s a ‘lift and shift’ or a re-platforming, we see customers adopting the cloud at a much faster rate than ever.”

According to a recent Gartner study, “a multi-cloud strategy will become the common strategy for 70 percent of enterprises by 2019, up from less than 10 percent today.” Customers are growing sensitive about being locked into a single legacy software solution that doesn’t match their future needs. However, switching and migration have become relatively easy thanks to similar APIs and the use of open standards like Linux, Postgres, MySQL, and others.

It’s likely your organization is also evaluating how data centers are designed and run. Your IT department is evaluating hosting environments based on risk, complexity, speed and cost—all factors that increase the difficulty in finding one, single solution for your organization’s needs.

Evaluating and implementing a multi-cloud environment can help determine who provides the best performance and support for your situation. According to the Boston Herald, GE re-aligned its cloud hosting strategy to leverage both Microsoft Azure and Amazon Web Services, with the intention to understand the best performing hosting environment and see which contract provides the lowest cost to pass to their customers.

This multi-cloud or hybrid cloud strategy is becoming increasingly important to help reduce risk and provide more choice and flexibility for customers.

But the multi-cloud trend doesn’t come without a healthy awareness of the merits and challenges of moving to this type of environment. While flexibility is a plus, a multi-cloud environment increases overhead cost from splitting your organization’s workloads across multiple providers. And a multi-cloud environment forces an internal developer team to learn multiple platforms and have additional governance processes in place, depending on the different environments they have to support.

Additionally, a multi-cloud strategy could potentially diminish the buying power of a company or organization. If a company is splitting what they buy across multiple providers, it will hurt their volume discounts. This creates a model where a company is buying less at a worse price.

Surveys and stats, such as the Gartner data point above, indicate multi-cloud adoption is on the rise. However, they don’t indicate how much of a given platform was adopted. In many multi-cloud cases, organizations use one provider for most of their needs and the others very little. Most of these use cases amount to implementing a second cloud hosting environment as a backup in case the main environment fails or underperforms.

While multi-cloud adoption is on the rise in 2018, organizations will have to maneuver through the nuance of assessing whether their strategy measures how much of each cloud platform was adopted, internal usage, and the workload demands and implementation costs.


70% of enterprises will be implementing a multi-cloud strategy by 2019. (Gartner)


74% of Tech Chief Financial Officers say cloud computing will have the most measurable impact on their business in 2017. (Forbes)

5.- Rise of the Chief Data Officer

Data and analytics are becoming core to every organization. That is undebatable. As organizations evolve, they’re prioritizing a new level of strategic focus and accountability regarding their analytics.

Historically, most business intelligence efforts were assigned to the Chief Information Officer (CIO), who oversaw standardizing, consolidating, and governing data assets across the organization, which needed consistent reporting. This put BI initiatives (data governance, building analytical models, etc.) in competition with other strategic initiatives (such as IT architecture, system security, or network strategy) under the purview of the CIO—and often inhibited the success and impact of BI.

In some cases, a gap between the CIO and the business has formed due to speed to insight versus security and governance of the data. So to derive actionable insights from data through analytics investments, organizations are increasingly realizing the need for accountability in the C-Suite to create a culture of analytics. For a growing number of organizations, the answer is appointing a Chief Data Officer (CDO) or Chief Analytics Officer (CAO) to lead business process change, overcome cultural barriers, and communicate the value of analytics at all levels of the organization. This allows the CIO to have a more strategic focus on things such as data security.

My job is to bring tools and technologies and empower the team.

The fact that CDOs and/or CAOs are being appointed and assigned accountability for business impact and improved outcomes also demonstrates the strategic value of data and analytics in modern organizations. There is now a proactive conversation at the C-level about how to deploy an analytics strategy. Instead of waiting for requests for a particular report, CDOs are asking, “How can we anticipate or quickly adapt to business requests?”

To best facilitate a highly effective team under this C-level position, organizations are dedicating more money and resources. According to Gartner, 80 percent of large enterprises will have a CDO office fully implemented by 2020. Currently, the average number of employees in the office is 38, but 66 percent of organizations surveyed expect that the allocated budget for the office will grow.

Josh Parenteau, Tableau’s Market Intelligence Director, notes that the role of the CDO is “outcome focused.” He states that “it’s not just about putting data into a data warehouse and hopefully someone uses it—they’re there to define what the use is and make sure that you’re getting value.” This outcome focus is critical, especially as it aligns with the top three objectives in Gartner’s 2016 CDO survey, which include greater customer intimacy, an increased competitive advantage, and an improvement in efficiency. These objectives are fueling companies like Wells Fargo, IBM, Aetna, and Ancestry to implement CDOs with the intent to take their data strategy to the next level, making the role of Chief Data Officer a business staple in 2018.


By 2019, 90% of large companies will have a CDO role in place. (Gartner)


By 2020, 50% of leading organizations will have a CDO with similar levels of strategy influence and authority as their CIO.

6.- The Future of Data Governance is Crowdsourced

The modern business intelligence outfit has progressed from data and content lockdowns to the empowerment of business users everywhere to use trusted, governed data for insights. And as people are learning to use data in more situations, their input on better governance models has become a monumental force within organizations.

It’s an understatement to say that self-service analytics has disrupted the world of business intelligence. The paradigm shifted to anyone having the capacity to create analytics leading to the asking and answering of critical questions across the organization. The same disruption is happening with governance. As self-service analytics expands, a funnel of valuable perspectives and information begins to inspire new and innovative ways to implement governance.

Governance is as much about using the wisdom of the crowd to get the right data to the right person as it is locking down the data from the wrong person.

For the business user, the last responsibility they want is the security of the data. Good governance policies allow the business user to ask and answer questions, while allowing them to find the data they need, when they need it.

BI and analytics strategies will embrace the modern governance model: IT departments and data engineers will curate and prepare trusted data sources, and as self-service is mainstreamed, end users will have the freedom to explore data that is trusted and secure. Top-down processes that only address IT control will be discarded in favor of a collaborative development process combining the talents of IT and end users. Together, they will identify the data that is most important to govern and create rules and processes that maximize the business value of analytics without compromising security.


45% of data citizens say that less than half of their reports have good quality data. (Collibra)


61% of C/V Suite leaders say their own companies’ decision-making is only somewhat or rarely data driven. (PwC)

7.- Vulnerability Leads to a Rise in Data Insurance

For many companies, data is a critical business asset. But how do you measure the value of that data? And what happens when that data is lost or stolen? As we have seen with recent high profile data breaches, a threat to a company’s data can be crippling and potentially cause irreparable damage to the brand.

According to a 2017 study by the Ponemon Institute, the average total cost of a data breach was estimated at $3.62 million.

But are companies doing everything they can to protect and insure their data? One industry rapidly growing in response to data breaches is the cybersecurity insurance market. This industry has seen 30 percent year-over-year growth, with the industry set to reach $5.6 billion in annual gross written premium by 2020. (AON)

Cyber and privacy insurance covers a business’ liability for a data breach in which the customer’s personal information is exposed or stolen by a hacker.

However, even with the market’s growth and the continued threat of data breaches, only 15 percent of U.S. companies have an insurance policy that covers data breaches and cybersecurity. Furthermore, when you look at those 15 percent of U.S. companies covered, a majority come from large, established financial institutions.

You have to decide where the pain point is. What is the real risk to your business?

The need for policies with financial institutions is clear. But the trend will broaden to other verticals because nobody is immune to the threat of a data breach.

Doug Laney, Gartner Analyst, recently wrote a book titled, “Infonomics: How to Monetize, Manage, and Measure Information for Competitive Advantage.” He gives distinct models on how companies across all industries can review the value of their data, both in non-financial models and financial models.

Non-financial models focus on the intrinsic value, the business value, and the performance value of the data. These values can measure a company’s uniqueness, accuracy, relevancy, internal efficiencies and overall impact on its usage.

Financial models focus on the cost value, the economic value, and the market value of the data. These values can measure the cost of acquiring data, administering the data internally, and the value of selling or licensing your data.

Data as a commodity means its value will only increase, and ultimately drive new questions and conversations around how this raw material will continue to project companies to greater heights and advantages. And like any product, what good is it if it can be pilfered without consequence?

The average total cost of a data breach was estimated at $3.62 million. (Ponemon)


Only 15% of US companies have an insurance policy specifically for their data. (Ponemon)

8.- Increased prominence of the data engineer role

Here is a certainty: you can’t create a dashboard without having all of your charts built out so you can understand the story you’re trying to communicate. Another principle you likely know: you can’t have a reliable data source without first understanding the type of data that goes into a system and how to get it out.

Data engineers will continue to be an integral part of an organization’s movement to use data to make better decisions about their business. Between 2013 and 2015, the number of data engineers more than doubled. And as of October 2017, there were over 2,500 open positions with “data engineer” in the title on LinkedIn, indicating the growing and continued demand for this specialty.

Data engineers play a fundamental part in enabling self-service for the modern analytics platform.

So what is this role and why is it so important? The data engineer is responsible for designing, building, and managing a business’s operational and analytics databases. In other words, they are responsible for extracting data from the foundational systems of the business in a way that can be used and leveraged to make insights and decisions. As data volumes and storage capacity grow, someone with deep technical knowledge of the different systems and architecture, and the ability to understand what the business wants or needs, becomes ever more crucial.

Yet the data engineer role requires a unique skill set. They need to understand the backend, what’s in the data, and how it can serve the business user. The data engineer also needs to develop technical solutions to make the data usable.

In the words of Michael Ashe, Senior Recruiter for Tableau, “I’m no spring chicken. I’ve been in technical recruiting for over 17 years. And it’s no surprise that data and storage capacity has continued to grow—I’ve seen it happen in quantum leaps. The data will always need tweaking. Businesses need to plug into this role. They need to dive into specific data to make business decisions. The data engineer most definitely will continue to grow as a role.”


A 2016 Gartner study found respondent organizations were losing an average of $9.7 million annually as a result of poor data quality.


Data scientists and analysts can spend as much as 80% of their time cleaning and preparing data. (TechRepublic)

9.- The Location of Things will Drive IoT Innovation

It’s an understatement to say that the proliferation of the internet of things (IoT) has driven monumental growth in the number of connected devices we see in the world. All of these devices interact with each other and capture data that creates a more connected experience. In fact, Gartner predicts that by 2020 the number of IoT devices available to consumers will more than double, “with 20.4 billion IoT devices online.”

Even with this growth, the use cases and implementation of IoT data haven’t followed the same desirable path. Companies have concerns about security, but most also lack the right organizational skill sets, or the internal technical infrastructure connecting other applications and platforms, to support IoT data.

When most people think location or geospatial, they think of it as a dimension. It’s something I’m going to analyze…the new trend is that it is becoming an input into the analytical process.

One positive trend we are seeing is the usage and benefits of leveraging location-based data with IoT devices. This subcategory, termed the “location of things,” enables IoT devices to sense and communicate their geographic position. Knowing where an IoT device is located allows us to add context, better understand what is happening, and predict what will happen in a specific location.

For companies and organizations seeking to capture this data, different technologies are being used. For example, hospitals, stores, and hotels have begun to use Bluetooth Low Energy (BLE) technology for indoor location services, where GPS typically struggles to provide contextual location. The technology can be used to track specific assets or people, and even to interact with mobile devices like smartwatches, badges, or tags in order to provide personalized experiences.
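As a rough illustration of how a BLE beacon supports indoor positioning, here is the standard log-distance path-loss estimate of range from signal strength. The calibrated 1 m power and path-loss exponent below are typical placeholder values, not vendor specifics:

```python
def ble_distance_m(rssi_dbm, tx_power_dbm=-59, path_loss_exponent=2.0):
    """Rough distance estimate (meters) from a BLE beacon's RSSI using the
    log-distance path-loss model. tx_power_dbm is the calibrated RSSI at 1 m;
    the exponent is ~2 in free space and higher indoors with obstructions."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))

# An RSSI equal to the calibrated 1 m power implies ~1 m
print(ble_distance_m(-59))            # 1.0
# A weaker signal implies a greater distance
print(round(ble_distance_m(-79), 1))  # 10.0 under the free-space assumption
```

In practice, indoor systems combine many such noisy per-beacon estimates (plus filtering) rather than trusting a single reading.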

As it relates to analyzing the data, location-based figures can be viewed as an input versus an output of results. If the data is available, analysts can incorporate this information with their analysis to better understand what is happening, where it is happening, and what they should expect to happen in a contextual area.


IoT endpoints will grow to 30 billion by 2020. (IDC)


Explosive growth of IoT is expected, exceeding $5 billion by year-end 2020. (Gartner)

10.- Universities Double Down on Data Science & Analytics Programs

North Carolina State University is home to the first Master of Science Analytics program. The MSA is housed within their Institute of Advanced Analytics (IAA), a data hub with the mission to “produce the world’s finest analytics practitioners—individuals who have mastered complex methods and tools for large-scale data modeling [and] who have a passion for solving challenging problems…” As the first of its type, the NC State program has foreshadowed academia’s pronounced investment in data science and analytics curriculum.

Earlier this year, the University of California, San Diego launched a first for their institution—an undergraduate major and minor in data science. They didn’t stop there. The university also made plans, supercharged by an alumnus donation, to create a data science institute. Following suit, UC Berkeley, UC Davis, and UC Santa Cruz have all increased their data science and analytics options for students, with demand exceeding expectations. But why?

“I’m constantly surprised by what the students come up with, and blown away by how they’re able to just intuitively look at the data and play with the data and come up with some visualizations.”

According to a recent PwC study, 69 percent of employers will demand data science and analytics skills from job candidates by 2021. In 2017, Glassdoor also reported that “data science,” for the second consecutive year, was a “top job.” As demand from employers grows, the urgency to fill a funnel of highly skilled data fiends becomes more critical. But there’s a reality gap: the same PwC report cites that only 23 percent of college graduates will have the necessary skills to compete at the level employers demand, and a recent MIT survey found that 40 percent of managers are having trouble hiring analytical talent.

The hard skills of analytics are no longer an elective; they are a mandate. 2018 will begin to see a more rigorous approach to making sure students possess the skills to join the modern workforce. And as companies continue to refine their data to extract the most value, the demand for a highly data-savvy workforce will exist — and grow.

Source: https://www.tableau.com/sites/default/files/pages/838266_2018_bi_trends_whitepaper_1.pdf

How to become a Professional Data Scientist?



Data scientists apply advanced math and statistics to build the technical cases around the hypotheses that the business analysts construct; they are tasked with building the models required to test those theories. This modeling work is central to big data. You start with a hypothesis. For example: if we change the branding colors on a product on a given day, publish that on Twitter, and it is positively received, we can expect an increase in sales of 4 percent. That is the hypothesis.

Create the mathematical models. These models measure what positive sentiment means and then determine what tests need to be run to find correlations between sentiment and sales increases.

Discover patterns, trends, and correlations. Some tasks may not necessarily start with a hypothesis. This is where the real power of big data comes in. You find patterns and trends you didn’t even know existed.
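The hypothesis-testing workflow above can be sketched in a few lines of Python. All figures here are invented purely for illustration: daily shares of positive tweets paired with daily percentage changes in sales, checked for correlation.

```python
# A minimal sketch of testing the sentiment-drives-sales hypothesis:
# quantify positive sentiment per day, then measure how strongly it
# correlates with the observed change in sales.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical daily figures: share of positive tweets vs. % sales change.
positive_sentiment = [0.55, 0.60, 0.72, 0.80, 0.66]
sales_change_pct = [1.1, 1.8, 3.2, 4.1, 2.0]

r = pearson(positive_sentiment, sales_change_pct)
print(round(r, 2))  # close to 1.0: evidence consistent with the hypothesis
```

A real test would of course need far more data, a control period, and a significance test; the point is only that the hypothesis becomes a measurable quantity.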

The skill required here is to take a business idea and model it with numbers and data. Data scientists take that data and turn it into information. There can be a fine line between what data scientists do and what computer scientists do; there is some overlap, but there are also jobs with significant differences, notably in scientific and academic research.



Assessing your interest

As with business analysts, there is a set of questions you can ask yourself to see whether you’re a fit for this type of job. So, consider the following questions carefully.

Are you naturally inquisitive?

Just as a business analyst needs to think in terms of building hypotheses, the data scientist needs aptitude in this area: constructing models that can prove or disprove a given business hypothesis. Can you see beyond the surface issues and go deep? Do you know when a result has potential and needs further testing? Are you passionate about technology?

Can you focus for a long time?

The journey required to complete a PhD or advanced degree in the big data field can be a long one. You have to commit a significant amount of study to a specific area of research. Are there areas of math, statistics, or computer science that you have a passion for studying? Do you want to address big problems that may take years to solve? Do you like to write . . . a lot? Can you maintain intense focus on a few topics for many years — maybe for an entire career?

Are you self-motivated?

Data scientists need to be able to direct their own intellectual paths. Do you naturally follow a solution to its end? Do you have a knack for knowing where to find answers if you don’t know them?

Are you multidisciplined?

Data scientists need to be knowledgeable in multiple areas — math, statistics, and computer science. Can you pick up computer science languages and concepts easily? Does the idea of a new language excite you or intimidate you? Can you easily collaborate with others to learn new things?

Idea to reality

Data modeling requires the ability to take business concepts and ideas and model those within a world driven by numbers and data concepts. Do you have the aptitude or interest to build experiments that capture the business value?

Responsibilities and skills

Key responsibilities:

  • Providing big data solutions for our clients, including analytical consulting, statistical modeling, and quantitative solutions
  • Mentoring sophisticated organizations on large-scale data and analytics and working closely with client teams to deliver results
  • Helping to translate business cases to clear research projects, be they exploratory or confirmatory, to help our clients utilize data to drive their businesses
  • Collaborating and communicating across geographically distributed teams and with external clients

Required skills/experience include:

  • BS or MS in Computer Science, Math, or equivalent work experience
  • Coursework in mathematics, statistics, machine learning, and data mining
  • Proficiency in R or other math packages (Matlab, SAS, and so on)
  • Excellent programming skills in object-oriented languages
  • Adept at learning and applying new technologies
  • Excellent verbal and written communication skills
  • Strong team player capable of working in a demanding startup environment
  • Experience with Java and Python

You don’t have to have a PhD to be a data scientist, but the role does require a deep understanding of data modeling, programming, machine learning, and math.

Source: https://www.edvancer.in/what-it-really-takes-to-become-a-professional-data-scientist/

What is Data Quality?



Data Quality is an essential characteristic that determines the reliability of data for making decisions. High-quality data helps you identify revenue opportunities, meet regulatory compliance requirements, and respond to customer issues in a timely manner.

We’ve all heard the many horrors of poor Data Quality: companies with millions of records listing “(000)000-0000” as the customer contact number, “99/99/99” as the date of purchase, 12 different gender values, shipping addresses with no state information, and so on. The cost of “dirty data” to enterprises and organizations is real. For example, the US Postal Service estimated that it spent $1.5 billion processing undeliverable mail in 2013 because of bad data. The sources of poor Data Quality are many, but they can be broadly categorized as data entry, data processing, data integration, data conversion, and data going stale over time.


So what can you do to make sure that your data is consistently of high quality? There is increasing awareness of how critical data is to making informed decisions, and of how inaccurate data can lead to disastrous consequences.

Data Quality determines the reliability of data for making decisions.

The challenge lies in ensuring that enterprises collect or source relevant data for their business, manage and govern that data in a meaningful and sustainable way to ensure quality golden records for key Master Data, and analyze the high-quality data to accomplish stated business objectives. Here is a 6-step Data Quality process based on best practices from data quality experts.

Step 1 – Definition

Define the business goals for Data Quality improvement, data owners / stakeholders, impacted business processes, and data rules.

  • Examples for customer data:

    • Goal: Ensure all customer records are unique, contain accurate information (e.g., address, phone number), are consistent across multiple systems, etc.
    • Data owner: Sales Vice President
    • Stakeholders: Finance, Marketing, and Production
    • Impacted business processes: Order entry, Invoicing, Fulfillment etc.
    • Data Rules: Rule 1 – Customer name and address together should be unique; Rule 2 – All addresses should be verified against an approved address reference database.
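Rules like these are most useful when they are executable. Here is a minimal Python sketch, with hypothetical field names and the approved address reference stubbed as an in-memory set:

```python
# Rule 1: customer name and address together must be unique.
def check_rule_1(records):
    seen, duplicates = set(), []
    for rec in records:
        key = (rec["name"].strip().lower(), rec["address"].strip().lower())
        if key in seen:
            duplicates.append(rec)
        seen.add(key)
    return duplicates

# Stand-in for an approved address reference database (illustrative only).
APPROVED_ADDRESSES = {"1 main st, springfield", "22 oak ave, shelbyville"}

# Rule 2: every address must exist in the approved reference set.
def check_rule_2(records):
    return [r for r in records
            if r["address"].strip().lower() not in APPROVED_ADDRESSES]

customers = [
    {"name": "Ada Lovelace", "address": "1 Main St, Springfield"},
    {"name": "ada lovelace", "address": "1 Main St, Springfield"},  # duplicate
    {"name": "Alan Turing",  "address": "99 Nowhere Rd"},           # unverified
]
print(len(check_rule_1(customers)), len(check_rule_2(customers)))  # 1 1
```

In practice the reference lookup would hit an address verification service, but the shape of the rule checks stays the same.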

Step 2 – Assessment

Assess the existing data against rules specified in Definition Step. Assess data against multiple dimensions such as accuracy of key attributes, completeness of all required attributes, consistency of attributes across multiple data sets, timeliness of data etc. Depending on the volume and variety of data and the scope of Data Quality project in each enterprise, we might perform qualitative and/or quantitative assessment using some profiling tools. This is the stage to assess existing policies (data access, data security, adherence to specific industry standards/guidelines etc.) as well.

  • Examples: Assess the % of customer records that are unique (on name and address together), the % of non-null values in key attributes, etc.

Step 3 – Analysis

Analyze the assessment results on multiple fronts. One area to analyze is the gap between DQ business goals and current data. Another area to analyze is the root causes for inferior data quality (if that is the case).

  • Examples: If customer addresses are inaccurate beyond the business-defined goal, what is the root cause? Are the order entry application’s data validations the problem? Or is the reference address data inaccurate?

Step 4 – Improvement

Design and develop improvement plans based on the prior analysis. The plans should account for the timeframes, resources, and costs involved.

  • Examples: All applications modifying addresses must validate against selected address reference database; Customer name can only be modified via order entry application; The intended changes to systems will take 6 months to implement and requires XYZ resources and $$$.

Step 5 – Implementation

Implement the solutions determined in the Improvement step, addressing both technical changes and any business-process changes. Put in place a comprehensive ‘Change Management’ plan to ensure that all stakeholders are appropriately trained.

Step 6 – Control

Verify at periodic intervals that the data is consistent with the business goals and the data rules specified in the Definition Step. Communicate the Data Quality metrics and current status to all stakeholders on a regular basis to ensure that Data Quality discipline is maintained on an ongoing basis across the organization.

Data Quality is not a one-time project but a continuous process, and it requires the entire organization to be data-driven and data-focused. With appropriate focus from the top, Data Quality Management can pay rich dividends for organizations.

Source: https://digitaltransformationpro.com/data-quality-simple-6-step-process/

The 2017 Big Data Landscape



Since Big Data is largely “plumbing”, it has been subject to enterprise adoption cycles that are much slower than the hype cycle. As a result, it took several years for Big Data to evolve from cool new technologies into core enterprise systems actually deployed in production.

In 2017, we’re now well into this deployment phase.  The term “Big Data” continues to gradually fade away, but the Big Data space itself is booming.  Everywhere we look, anecdotal evidence points to more mature products, more substantial adoption in Fortune 1000 companies, and rapid revenue growth for many startups.

Meanwhile, the froth has indisputably moved to the machine learning and artificial intelligence side of the ecosystem. In the last few months, AI experienced a “Big Bang” in the collective consciousness not entirely dissimilar to the excitement around Big Data a few years ago, but with even more velocity.

2017 is also shaping up to be an exciting year from another perspective: long-awaited IPOs.  The first few months of this year have seen a burst of activity for Big Data startups on that front, with warm reception from the public markets.

All in all, in 2017 the data ecosystem is firing on all cylinders.  As every year, we’ll use the annual revision of our Big Data Landscape to do a long-form, “State of the Union” roundup of the key trends we’re seeing in the industry.

“Big Data + AI = The New Stack”

2016 was the year when every startup became a “machine learning company”, “.ai” became the must-have domain name, and the “wait, but we do this with machine learning” slide became ubiquitous in fundraising decks.

Faced with an enormous avalanche of AI press, panels, newsletters and tweets, many people who had a long-standing interest in machine learning reacted the way one does when a local band suddenly becomes huge: on the one hand, pride; on the other, a distinct distaste for all the poseurs who show up late to the party, with the ensuing predictions of impending gloom.

While it’s easy to poke gentle fun at the trend, the evolution is both undeniable and major: machine learning is quickly becoming a key building block for many applications.

We’re witnessing the emergence of a new stack, where Big Data technologies are used to handle core data engineering challenges, and machine learning is used to extract value from the data (in the form of analytical insights, or actions).

In other words: Big Data provides the pipes, and AI provides the smarts.

Of course, this symbiotic relationship has existed for years, but its implementation was only available to a privileged few.

The democratization of those technologies has now started in earnest.  “Big Data + AI” is becoming the default stack upon which many modern applications (whether targeting consumers or enterprise) are being built.  Both startups and some Fortune 1000 companies are leveraging this new stack.

Often, but not always, the cloud is the third leg of the stool. This trend is precipitated by all the efforts of the cloud giants, who are now in an open war to provide access to a machine learning cloud (more on this below).

Does democratization of AI mean commoditization in the short term? The reality is that AI remains technically very hard.  While many engineers are scrambling to build AI skills, deep domain experts are, as of now, still in very rare supply around the world.

However, there is no reversing this democratization trend, and machine learning is going to evolve from competitive advantage to table stakes sooner or later.

This has consequences both for startups and large companies. For startups: unless you’re building AI software as your final product, it’s quickly going to become meaningless to present yourself as a “machine learning company”.  For large organizations: if you’re not actively building a Big Data + AI strategy at this point (either homegrown or by partnering with vendors), you’re exposing yourself to obsolescence.  People have been saying this for years about Big Data, but with AI now running on top of it, things are accelerating in earnest.

Enterprise Budgets: Follow the Money

In our conversations with both buyers and vendors of Big Data technologies over the last year, we’re seeing a strong increase in budgets allocated to upgrading core infrastructure and analytics in Fortune 1000 companies, with a key focus on Big Data technologies.  Analyst firms seem to concur – IDC expects the Big Data and Analytics market to grow from $130 billion in 2016 to more than $203 billion in 2020.

Many buyers in Fortune 1000 companies are increasingly sophisticated and discerning when it comes to Big Data technologies.  They have done a lot of homework over the last few years, and are now in full deployment mode.  This is now true across many industries, not just the more technology-oriented ones.

This acceleration is further propelled by the natural cycle of replacement of older technologies, which happens every few years in large enterprises.  What was previously a headwind for Big Data technologies (hard to rip and replace existing infrastructure) is now gradually turning into a tailwind (“we need to replace aging technologies, what’s best in class out there?”).

Certainly, many large companies (“late majority”) are still early in their Big Data efforts, but things now seem to be evolving quickly.

Enterprise Data Moving to the Cloud

As recently as a couple of years ago, suggestions that enterprise data could be moving to the public cloud were met with “over my dead body” reactions from large enterprise CIOs, except perhaps as a development environment or to host the odd non-critical, external-facing application.

The tone seems to have started to change, most noticeably in the last year or so.  We’re hearing a lot more openness – a gradual acknowledgement that “our customer data is already in the cloud in Salesforce anyway” or that “we’ll never have the same type of cyber-security budget as AWS does”.  This is somewhat ironic considering that security was for many years the major strike against the cloud, but it is a testament to all the hard work that cloud vendors have put into security and compliance (e.g., HIPAA).

Undoubtedly, we’re still far from a situation where most enterprise data goes to the public cloud, in part because of legacy systems and regulation.

However, the evolution is noticeable, and will keep accelerating.  Cloud vendors will do anything to facilitate it, including sending a truck to get your data.


Without further ado, here’s our 2017 landscape.

Source: http://mattturck.com/bigdata2017/

What is a Graph Database?



We live in a connected world. There are no isolated pieces of information, but rich, connected domains all around us. Only a database that embraces relationships as a core aspect of its data model is able to store, process, and query connections efficiently. While other databases compute relationships expensively at query time, a graph database stores connections as first-class citizens, readily available for any “join-like” navigation operation. Accessing those already persistent connections is an efficient, constant-time operation and allows you to quickly traverse millions of connections per second per core.

Independent of the total size of your dataset, graph databases excel at managing highly connected data and complex queries. Armed only with a pattern and a set of starting points, graph databases explore the larger neighborhood around the initial starting points — collecting and aggregating information from millions of nodes and relationships — leaving the billions outside the search perimeter untouched.

The Property Graph Model

If you’ve ever worked with an object model or an entity relationship diagram, the labeled property graph model will seem familiar. The property graph contains connected entities (the nodes) which can hold any number of attributes (key-value-pairs). Nodes can be tagged with labels representing their different roles in your domain. In addition to contextualizing node and relationship properties, labels may also serve to attach metadata—​index or constraint information—​to certain nodes.

Relationships provide directed, named, semantically relevant connections between two node entities. A relationship always has a direction, a type, a start node, and an end node. Like nodes, relationships can have any properties. In most cases, relationships have quantitative properties, such as weights, costs, distances, ratings, time intervals, or strengths. Because relationships are stored efficiently, two nodes can share any number or type of relationships without sacrificing performance. Note that although they are directed, relationships can always be navigated regardless of direction.

The building blocks of the Property Graph

There is one core consistent rule in a graph database: “No broken links”. Since a relationship always has a start and end node, you can’t delete a node without also deleting its associated relationships. You can also always assume that an existing relationship will never point to a non-existing endpoint.
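As a rough illustration (not how a real graph engine is implemented), the model above can be sketched in a few lines of Python: nodes carry labels and properties, relationships are directed and typed, the “no broken links” rule is enforced on insert, and traversal works regardless of direction.

```python
# A toy in-memory property graph. Real graph databases such as Neo4j
# store and index this structure natively; this sketch only mirrors
# the data model described above.
class Graph:
    def __init__(self):
        self.nodes = {}   # node id -> {"labels": set, "props": dict}
        self.rels = []    # (start_id, type, end_id, props)

    def add_node(self, node_id, labels, **props):
        self.nodes[node_id] = {"labels": set(labels), "props": props}

    def add_rel(self, start, rel_type, end, **props):
        # "No broken links": both endpoints must already exist.
        assert start in self.nodes and end in self.nodes
        self.rels.append((start, rel_type, end, props))

    def neighbors(self, node_id, rel_type=None):
        """Traverse relationships in either direction."""
        out = []
        for s, t, e, _ in self.rels:
            if rel_type and t != rel_type:
                continue
            if s == node_id:
                out.append(e)
            elif e == node_id:
                out.append(s)
        return out

g = Graph()
g.add_node("alice", ["Person"], name="Alice")
g.add_node("bob", ["Person"], name="Bob")
g.add_node("neo4j", ["Database"], model="property graph")
g.add_rel("alice", "KNOWS", "bob", since=2015)
g.add_rel("alice", "USES", "neo4j")
print(g.neighbors("alice"))  # ['bob', 'neo4j']
```

Note how `neighbors("bob")` still finds Alice even though the `KNOWS` relationship points the other way: direction is stored, but traversal is not restricted by it.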

Source: https://neo4j.com/developer/graph-database/

SQL vs NoSQL: High-Level Differences



Most of you are already familiar with SQL databases and have good knowledge of MySQL, Oracle, or other SQL databases. In the last several years, NoSQL databases have been widely adopted to solve various business problems.

It is helpful to understand the differences between SQL and NoSQL databases, as well as some of the NoSQL databases available for you to experiment with.



  • SQL databases are primarily called Relational Databases (RDBMS), whereas NoSQL databases are primarily called non-relational or distributed databases.
  • SQL databases are table-based, whereas NoSQL databases are document stores, key-value stores, graph databases, or wide-column stores. This means that SQL databases represent data in the form of tables consisting of rows of data, whereas NoSQL databases are collections of key-value pairs, documents, graphs, or wide columns that do not have standard schema definitions to which they must adhere.
  • SQL databases have a predefined schema, whereas NoSQL databases have a dynamic schema for unstructured data.
  • SQL databases are vertically scalable, whereas NoSQL databases are horizontally scalable. SQL databases are scaled by increasing the horsepower of the hardware; NoSQL databases are scaled by adding database servers to the pool of resources to reduce the load.
  • SQL databases use SQL (Structured Query Language) for defining and manipulating data, which is very powerful. In NoSQL databases, queries are focused on collections of documents; this is sometimes called UnQL (Unstructured Query Language), and its syntax varies from database to database.
  • SQL database examples: MySQL, Oracle, SQLite, PostgreSQL, and MS SQL Server. NoSQL database examples: MongoDB, BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j, and CouchDB.
  • For complex queries: SQL databases are a good fit for query-intensive environments, whereas NoSQL databases are not. At a high level, NoSQL databases don’t have standard interfaces for performing complex queries, and the queries themselves are not as powerful as the SQL query language.
  • For the type of data to be stored: SQL databases are not the best fit for hierarchical data storage, but NoSQL databases fit hierarchical data better, since they follow the key-value pair way of storing data, similar to JSON data. NoSQL databases are highly preferred for large data sets (i.e., for big data); HBase is an example for this purpose.
  • For scalability: in most typical situations, SQL databases are vertically scalable; you manage increasing load by adding CPU, RAM, SSD, etc., on a single server. NoSQL databases, on the other hand, are horizontally scalable; you can simply add a few more servers to your NoSQL database infrastructure to handle large traffic.
  • For highly transactional applications: SQL databases are the best fit for heavy-duty transactional applications, as they are more stable and promise atomicity as well as integrity of the data. While you can use NoSQL for transactional purposes, it is still not comparable or stable enough under high load and for complex transactional applications.
  • For support: excellent support is available for all SQL databases from their vendors, and there are also many independent consultants who can help with very large-scale SQL deployments. For some NoSQL databases you still have to rely on community support, and only limited outside experts are available to help you set up and deploy large-scale NoSQL deployments.
  • For properties: SQL databases emphasize the ACID properties (Atomicity, Consistency, Isolation, and Durability), whereas NoSQL databases follow Brewer’s CAP theorem (Consistency, Availability, and Partition tolerance).
  • For DB types: at a high level, SQL databases can be classified as either open source or closed source from commercial vendors. NoSQL databases can be classified by how they store data: graph databases, key-value stores, document stores, column stores, and XML databases.
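The table-versus-document contrast above can be sketched with Python’s built-in sqlite3 module standing in for a SQL database, and a plain dictionary of JSON documents standing in for a key-value document store. All names and records here are purely illustrative.

```python
import json
import sqlite3

# SQL: a predefined schema; every row has the same columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'London')")
row = conn.execute("SELECT name FROM users WHERE city = 'London'").fetchone()
print(row[0])  # Ada

# NoSQL-style documents: dynamic schema; each document can have
# different fields, and records are fetched by key.
doc_store = {
    "user:1": json.dumps({"name": "Ada", "city": "London"}),
    "user:2": json.dumps({"name": "Alan", "interests": ["logic", "codes"]}),
}
doc = json.loads(doc_store["user:2"])
print(doc["interests"][0])  # logic
```

Notice that adding an `interests` field to one document required no schema change, while adding a column to the SQL table would require an `ALTER TABLE` affecting every row.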

Source: http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/

Do you need a Relational Databases for Big Data ?



Teradata, Greenplum, Netezza, DB2, and Oracle’s Exadata aren’t “Big Data” databases, if by that we mean databases routinely used to handle large data sets that are unstructured, rapidly changing, and usually governed by few or vague quality measures.

These are relational databases, and all have their strengths and weaknesses. Again, “best” is going to be determined by what you are using the database for, the entire ecosystem the database will live in, and how well it can be maintained and managed.

Database-wise, for “Big Data” (I hate that term, by the way; it’s nothing but marketing fluff), why do you need a database at all? Databases are great for organizing data into rows and columns, something the data usually referenced in “Big Data” doesn’t do naturally or well. In fact, if you’re using a relational database to store “Big Data,” then you aren’t really doing “Big Data.”

In a Big Data approach, you should use an HDFS cluster to store the data. Then, if you also need some database functionality, a NoSQL store such as MongoDB might be appropriate; but again with NoSQL, it depends on what type of NoSQL database you want to build: document, key-value, table-style, or graph.

By the way, Teradata, IBM (Netezza/PureSystems), and Oracle all have HDFS appliances that use some form of Apache Hadoop. It’s usually commodity or purpose-built hardware, so the “best” choice really depends on the ancillary databases around it.