Machine Learning 101

By Steve MacLauchlan / November 2, 2018

By now you’re probably well aware that Big Data and Artificial Intelligence are major disruptors in almost every single vertical. Understanding the landscape can be challenging, particularly for business customers who want to innovate but aren’t sure where to start. In today’s blog, I hope to leave you, dear readers, with a basic understanding of both how Machine Learning works and how it might be beneficial for your organization.

First, let’s discuss where Machine Learning lives in the Big Data world. Forgive the Wikipedia quote, but it’s a good summary: “Machine Learning is a subset of Artificial Intelligence in the field of computer science that often uses statistical techniques to give computers the ability to ‘learn’ (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.” In other words, Machine Learning is a method for deriving complex models and algorithms that can be applied to a set of data to perform a specific task.

Figure 1: Original image by Dahl Winters, 2015


Machine Learning is a subset of Artificial Intelligence and is itself an umbrella for a myriad of approaches (examples you may have heard of include Artificial Neural Networks, Genetic Algorithms, and Decision Trees).

At a high level, Machine Learning searches data for patterns and uses them to build a model that can accurately predict an outcome. Given sufficient data for the complexity of the task, mathematical and statistical optimization produces a model that captures relationships a human might not readily recognize, given the volume or complexity of the data.

Hypothetical Example

Consider an organization that provides elderly care. We can assume it has sufficient data about the daily lives, interactions, purchase history, etc. of the seniors under its supervision. Now, let’s assume that this organization wants to minimize the impact of the flu virus among its patients. Without Machine Learning, we know the flu symptoms… but the problem is that by the time symptoms show, it’s already too late! We want to be able to look for patterns in the behavior, environment and demographics of a person to identify when they are at high risk of catching the flu and alter our approach as a result (e.g., targeted literature to inform them of their risk). We can take a volume of data and feed it into a Machine Learning algorithm to look for emergent patterns that a human may not have recognized. Perhaps we discover that elders who live in the same city as their children are at increased risk (grandchildren are germ factories), or that those who regularly play bingo on Thursday nights are, strangely, at lower risk! These patterns are the result of Machine Learning. We will revisit this example to understand how we got there.

There are two main approaches to training a Machine Learning algorithm: supervised learning and unsupervised learning.

In supervised learning, the data provided to train the algorithm contains both the inputs and the outputs, or outcomes of that data. In this way, the algorithm receives “feedback” by being able to compare its results with the actual results. Supervised learning represents the majority of Machine Learning. It’s ideal for most applications, but organizations don’t always have the luxury of a dataset that includes the expected results.
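To make the “feedback” idea concrete, here is a minimal sketch of supervised learning in plain Python. It assumes a handful of invented patient records with two hypothetical inputs (lives near children, plays Thursday bingo) and a known outcome (caught the flu), and it uses logistic regression trained by gradient descent as one possible algorithm among many. The data, feature names, and settings are illustrative assumptions, not real results.

```python
import math

# Invented toy records, for illustration only.
# Inputs: (lives_near_children, plays_thursday_bingo); output: caught_flu.
records = [
    ((1, 0), 1), ((1, 0), 1), ((1, 0), 1),
    ((1, 1), 0), ((1, 1), 0),
    ((0, 1), 0), ((0, 1), 0),
    ((0, 0), 0),
]


def predict_probability(weights, bias, features):
    """Estimated probability of catching the flu for one person."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def train(records, epochs=2000, learning_rate=0.5):
    """Fit a logistic regression model with batch gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(records)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, outcome in records:
            # The "feedback": how far the prediction is from the known outcome.
            error = predict_probability(weights, bias, features) - outcome
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # Nudge the model in the direction that reduces the average error.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, features):
    """Turn the estimated probability into a yes/no prediction."""
    return 1 if predict_probability(weights, bias, features) >= 0.5 else 0


weights, bias = train(records)
accuracy = sum(predict(weights, bias, f) == y for f, y in records) / len(records)
print("weights:", weights, "bias:", bias, "training accuracy:", accuracy)
```

The sign of each learned weight hints at the direction of a relationship in this toy data: a positive weight raises the estimated risk and a negative one lowers it. In practice you would also evaluate the model on data it was not trained on.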

Unsupervised learning, by contrast, usually produces a very different kind of result from supervised learning. Rather than answering a specific question, it is typically limited to offering insights into the data. It may identify patterns, associations, and clusters worth investigating, but it is unlikely to be able to perform a specific task.
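As a rough illustration of clustering, the sketch below runs k-means, one common unsupervised technique, on invented two-dimensional points in plain Python. The features (weekly social outings, weekly store visits), the starting centroids, and the choice of two clusters are all assumptions made for the example. Note that no outcome column is involved.

```python
# Invented toy data, for illustration only.
# Each point: (weekly_social_outings, weekly_store_visits) for one person.
points = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Group points around centroids; no outcome labels are used."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [squared_distance(p, c) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Step 2: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Start from the first and last points so the run is repeatable.
centroids, clusters = kmeans(points, [points[0], points[-1]])
for centroid, cluster in zip(centroids, clusters):
    print("centroid:", centroid, "members:", cluster)
```

The algorithm reports that the data falls into two groups, but it says nothing about what the groups mean. Interpreting them, and deciding whether they matter, is still up to a person.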

Revisiting Our Example

If our organization has years of data on its patients, including whether or not they caught the flu in a specific season, we have a good baseline for performing supervised learning. If we could not tie outcomes to that specific question, the data would be unfit for supervised learning, and we might instead use unsupervised learning to search for interesting patterns in the data that could lead to new hypotheses.

Machine Learning: Competitive Advantage

Your data is a valuable corporate asset and applying Machine Learning is one way of extracting business value from that virtual goldmine. Many data-driven organizations are utilizing these capabilities to drive market insight, create competitive products and better serve their customers.

About the author

More than anything, Steve MacLauchlan loves solving problems collaboratively with our clients at UDig and building lasting relationships. He enjoys meeting new people, learning about the businesses we help and understanding emerging technologies. He’s passionate about data as the key to the future and believes the ability to leverage it is paramount to remaining competitive. Any problem can be solved with the right application of people, process, and technology: often in that order. UDig is a technology consulting company headquartered in Richmond, Virginia, that focuses on solving business challenges with innovative and modern technical solutions. We specialize in Digital, Data and Engineering initiatives.
