Data Science for Beginners: The 5 Basic Questions

INTRODUCTION

This is a quick introduction to data science for anyone who wants to do data science, or who works with people who do, and wants to start with the most basic concepts.

In this video, Senior Data Scientist Brandon Rohrer explains the kinds of questions that data science can answer using a number or a category.

THE FIVE QUESTIONS

  • Is this A or B?
  • Is this weird?
  • How much – or – How many?
  • How is this organized?
  • What should I do next?

Each one of these questions is answered by a separate family of machine learning methods, called algorithms.

It’s helpful to think about an algorithm as a recipe and your data as the ingredients. An algorithm tells you how to combine and mix the data in order to get an answer. Computers are like a blender: they do most of the hard work of the algorithm for you, and they do it pretty fast.

1.- Is this A or B? – Classification Algorithms

Let’s start with the question: Is this A or B?

This family of algorithms is called two-class classification. It’s useful for any question that has just two possible answers.

For example:

  • Will this tire fail in the next 1,000 miles: Yes or no?
  • Which brings in more customers: a $5 coupon or a 25% discount?

This question can also be rephrased to include more than two options: Is this A or B or C or D, etc.? This is called multiclass classification, and it’s useful when you have several, or several thousand, possible answers. Multiclass classification chooses the most likely one.
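As a rough illustration of the idea, here is a minimal sketch of a classifier in Python. It uses a nearest-centroid rule, which is only one of many possible classification methods and is not taken from the video. The tire measurements, the labels, and the function names fit_centroids and predict are all made up for this example.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Average the feature vectors of each class to get one centroid per label."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical tire data: (tread depth in mm, age in years) and a known outcome.
samples = [(1.5, 6.0), (2.0, 5.0), (7.0, 1.0), (8.0, 2.0)]
labels = ["fail", "fail", "ok", "ok"]

centroids = fit_centroids(samples, labels)
prediction = predict(centroids, (2.5, 5.5))  # a worn, older tire
print(prediction)
```

Nothing in the sketch is specific to two classes: passing three or more distinct labels turns the same code into a multiclass classifier that picks the closest, and in that sense most likely, answer.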

 

2.- Is this weird? – Anomaly Detection Algorithms

The next question data science can answer is: Is this weird? This question is answered by a family of algorithms called anomaly detection.

If you have a credit card, you’ve already benefitted from anomaly detection. Your credit card company analyzes your purchase patterns, so that they can alert you to possible fraud. Charges that are “weird” might be a purchase at a store where you don’t normally shop or buying an unusually pricey item.

This question can be useful in lots of ways. For instance:

  • If you have a car with pressure gauges, you might want to know: Is this pressure gauge reading normal?
  • If you’re monitoring the internet you’d want to know: Is this message from the internet typical?

Anomaly detection flags unexpected or unusual events or behaviors. It gives clues where to look for problems.
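One simple way to flag "weird" values, sketched below in Python, is to measure how far a new reading sits from the average of past readings. This z-score style rule is an assumption chosen for illustration, not the method a credit card company or any particular product uses. The pressure readings, the threshold of 3 standard deviations, and the function name find_anomalies are hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(history, new_readings, threshold=3.0):
    """Flag readings more than `threshold` standard deviations from the historical mean."""
    center = mean(history)
    spread = stdev(history)
    return [r for r in new_readings if abs(r - center) > threshold * spread]


# Hypothetical tire pressure history (psi) and some new gauge readings.
history = [32.0, 31.5, 32.5, 32.0, 31.8, 32.2, 32.1, 31.9]
new_readings = [32.3, 25.0, 31.7, 40.0]

anomalies = find_anomalies(history, new_readings)
print(anomalies)
```

The flagged readings are only clues about where to look. Whether a flagged value is a real problem still needs a person or a follow-up check.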

3.- How Much? How Many? – Regression Algorithms

Machine learning can also predict the answer to How much? or How many? The algorithm family that answers this question is called regression.

Regression algorithms make numerical predictions, such as:

  • What will the temperature be next Tuesday?
  • What will my fourth quarter sales be?

They help answer any question that asks for a number.
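To make the idea of a numerical prediction concrete, here is a minimal sketch of the simplest regression method, a straight line fitted by least squares. The sales figures are invented and deliberately lie on a perfect line so the result is easy to check by hand. Real data would be noisier, and the function name fit_line is an assumption for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales (in thousands) for the last seven quarters.
quarters = [1, 2, 3, 4, 5, 6, 7]
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]

slope, intercept = fit_line(quarters, sales)
next_quarter_sales = slope * 8 + intercept  # "How much?" for quarter 8
print(next_quarter_sales)
```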

4.- How is this Organized? – Clustering Algorithms

These last two questions are a bit more advanced.

Sometimes you want to understand the structure of a data set – How is this organized? For this question, you don’t have examples whose outcomes you already know.

There are a lot of ways to tease out the structure of data. One approach is clustering. It separates data into natural “clumps,” for easier interpretation. With clustering there is no one right answer.

Common examples of clustering questions are:

  • Which viewers like the same types of movies?
  • Which printer models fail the same way?

By understanding how data is organized, you can better understand – and predict – behaviors and events.
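The sketch below shows one common clustering method, k-means, in plain Python. It is an illustrative choice, not the only way to cluster and not something the video prescribes. The viewer ratings are invented. Starting from the first k points as initial cluster centers is a simplifying assumption that keeps the example deterministic.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each point to its nearest center."""
    centroids = list(points[:k])  # assumption: the first k points seed the clusters
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to the closest current centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the average of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical viewers: (rating for action movies, rating for romance movies).
viewers = [(9, 1), (8, 2), (9, 2), (1, 9), (2, 8), (1, 8)]

labels, centroids = k_means(viewers, k=2)
print(labels)
```

Note that the code is never told which viewers belong together. The groups come out of the data alone, and a different choice of k or of starting centers could give a different, equally defensible grouping.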

5.- What Should I Do Next? – Reinforcement Learning Algorithms

The last question – What should I do next? – uses a family of algorithms called reinforcement learning.

Reinforcement learning was inspired by how the brains of rats and humans respond to punishment and rewards. These algorithms learn from outcomes, and decide on the next action.

Typically, reinforcement learning is a good fit for automated systems that have to make lots of small decisions without human guidance.

The questions reinforcement learning answers are always about what action should be taken – usually by a machine or a robot. Examples are:

  • If I’m a temperature control system for a house: Adjust the temperature or leave it where it is?
  • If I’m a self-driving car: At a yellow light, brake or accelerate?
  • For a robot vacuum: Keep vacuuming, or go back to the charging station?

Reinforcement learning algorithms gather data as they go, learning from trial and error.
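The trial-and-error loop can be sketched with tabular Q-learning, one well-known reinforcement learning method, chosen here purely for illustration. The toy robot vacuum world below is entirely hypothetical: the two battery states, the rewards, and the learning settings (alpha, gamma, epsilon) are assumptions made so the example stays small.

```python
import random

ACTIONS = ("vacuum", "recharge")
STATES = ("high", "low")  # battery level


def step(state, action):
    """Hypothetical environment: return (reward, next_state) for an action."""
    if action == "recharge":
        return 0.0, "high"   # no cleaning done, battery topped up
    if state == "high":
        return 1.0, "low"    # cleaned a room, battery is now low
    return -10.0, "high"     # battery died mid-room; robot was rescued and recharged


def learn_policy(steps=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn which action to take in each state by trial and error (Q-learning)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "high"
    for _ in range(steps):
        # Mostly take the best-known action, but sometimes explore at random.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward, next_state = step(state, action)
        # Nudge the estimate toward the reward plus the discounted future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


policy = learn_policy()
print(policy)
```

In this toy world the learner is never told the rules. It discovers from rewards and punishments alone that vacuuming on a low battery is costly.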

CONCLUSION

So that’s it – The 5 questions data science can answer.

Source: Channel MSDN