Techniques and Algorithms in Data Science for Big Data

In simple terms, Big Data, when combined with Data Science, allows managers to measure and assess significantly more information about the subtleties of their businesses, and to use that information to make more intelligent decisions. In 2011, when the growth of Big Data was first gaining significant notice throughout the Data Management industry, it was said that Big Data “is evolving into the key basis for competition.” It has since evolved, and data volumes continue to grow. The question is no longer whether it is a new trend and what effects it will have, but how to leverage Big Data in meaningful ways for the enterprise. Data Science has been around much longer than Big Data, but it wasn’t until data volumes reached contemporary levels that Data Science became a necessary component of enterprise-level Data Management.

The Big Data revolution has arguably provided a more powerful information foundation than any previous digital advancement. We can now measure and manage massive amounts of information with remarkable precision. This evolutionary step allows managers to target and provide more finely tuned solutions and to use data in areas historically reserved for the “gut and intuition” decision-making process.

Flexibility and agility are two states of mind useful in dealing with Big Data. Successfully exploiting the value of Big Data requires experimentation and exploration. Whether creating new products or looking for ways to gain a competitive advantage, getting optimum results from Big Data requires curiosity and an entrepreneurial outlook. In her Enterprise Data World 2015 Conference presentation, titled “Techniques and Algorithms in Data Science for Big Data,” Laila Moretto suggested a questioning mindset is preferable to one easily satisfied with assurances.

The philosophies and software of Big Data have become more popular, and they are now influencing and altering long-standing beliefs about the value of flexibility, long-term thinking, and decision-making. Leaders from all industries are using the insights gained from Big Data Analytics as management tools. Incorporating Big Data technologies into an established organization can pose sizable problems, and in most cases it still requires significant leadership. Key individuals often continue to resist change, and that resistance will have to be addressed, preferably through retraining and counseling. In spite of this resistance, it is a revolution executives need to take seriously if they wish to remain competitive.

The past few years have seen a significant rise in tools to deal with Big Data and its numerous associated data types, but many enterprises are only just beginning to understand how best to deal with their new assets. Fortunately, the cost of computing and organizing corporate data has been declining quite steadily. Mobile phones, social networks, GPS, sensors, online shopping, and a host of other sources are producing a flood of data, and the hoped-for end result of these new data sources is “useful information.”

Broadly speaking, there are five ways all this data can be used. First, it can make information much more transparent, much more quickly. Second, organizations can accurately collect and analyze more digital data. Third, the use of such data can create much more precisely tailored products or services for customers. Fourth, combined with the right analytics and Data Science, the decision-making process becomes significantly more efficient. Fifth, it can be used to improve the next generation of services and products for a business’s customer base.

Decision Analytics and Machine Learning

Laila Moretto believes that successfully incorporating Big Data and Data Science into an organization requires asking some basic questions:

  • What kind of analytics are going to be used?
  • Should the analytics lean more toward Machine Learning (for tasks such as facial recognition or reading handwriting)?
  • Or would Decision Analytics be more useful (examples include the new “automatic brakes” on cars and store coupons tailored specifically to individual customers)?
  • Who will you hire to best deal with these technologies?

The algorithm or algorithms chosen for an analytics program will depend on the goals that have been established.

Big Data analytics can reveal solutions previously hidden by the sheer volume of data available, for example through an analysis of customer transactions or patterns of sales. The most successful internet startups are good examples of how Big Data with Data Science is used to enable new services and products. Facebook, for example, has combined a large number of signals from a user’s actions and those of their friends, and in doing so has been able to craft a highly personalized user experience and create a new kind of advertising business. It’s no coincidence that some of the earliest ideas and tools for dealing with Big Data have come from Facebook, Google, Yahoo, and Amazon.

Many Useful Algorithms

A variety of Machine Learning and data mining algorithms are available for creating valuable analytic platforms. Established goals will determine which algorithms are used to sort out and process the information available. Various algorithms have been developed to deal specifically with business problems. Others were designed to augment existing algorithms, or to perform in new ways. According to Moretto, some algorithms will be more appropriate than others. There is a range of algorithms to choose from, and they can do anything from recognizing faces to reminding clients they have an appointment.

Algorithm models take different shapes, depending on their purpose. They can come as a collection of scenarios, an advanced mathematical analysis, or even a decision tree. Running different algorithms on the same data and comparing the results can reveal some surprising things about that data, and these comparisons give a manager more insight into business problems and solutions. Some models function best only for certain data and analyses. For example, classification algorithms with decision rules can be used to screen out problems, such as a loan applicant with a high probability of defaulting.
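To make the loan-screening example concrete, here is a minimal sketch of a one-rule classifier in Python. It is not taken from Moretto's presentation: the feature (a debt-to-income ratio), the toy applicant records, and the function names are all hypothetical, invented purely for illustration. The sketch learns a single cut-off from past outcomes and then uses the rule "ratio at or above the cut-off means likely default" to flag new applicants.

```python
from typing import List, Optional, Tuple

# Hypothetical toy records, invented for illustration only:
# (debt_to_income_ratio, defaulted_on_loan)
APPLICANTS: List[Tuple[float, bool]] = [
    (0.10, False), (0.15, False), (0.22, False), (0.30, False),
    (0.35, True), (0.41, False), (0.48, True), (0.55, True), (0.62, True),
]


def fit_threshold_rule(rows: List[Tuple[float, bool]]) -> Tuple[Optional[float], int]:
    """Find the cut-off t for the rule 'ratio >= t -> likely default'
    that misclassifies the fewest rows. Returns (t, number_of_errors)."""
    best_t: Optional[float] = None
    best_errors = len(rows) + 1
    # Try every observed ratio as a candidate cut-off, lowest first.
    for t in sorted({ratio for ratio, _ in rows}):
        errors = sum((ratio >= t) != defaulted for ratio, defaulted in rows)
        if errors < best_errors:  # strict '<' keeps the lowest cut-off on ties
            best_t, best_errors = t, errors
    return best_t, best_errors


def screen(ratio: float, threshold: float) -> bool:
    """Apply the decision rule: True means 'flag as high default risk'."""
    return ratio >= threshold


threshold, errors = fit_threshold_rule(APPLICANTS)
print(f"rule: flag applicants with ratio >= {threshold} ({errors} training error(s))")
print("applicant with ratio 0.50 flagged:", screen(0.50, threshold))
```

A real screening model would use many features and be validated on data it was not fitted to; a single threshold is only the simplest possible instance of a decision rule.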

Unsupervised clustering algorithms can be used to find relationships within an organization’s dataset. These algorithms can be used to find different kinds of groupings within a customer base, or to decide which customers and services can be grouped together. An unsupervised clustering approach can offer some distinct advantages compared with supervised learning approaches. One example is the way novel applications can be discovered by studying how the connections are grouped when a new cluster is formed.
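As a rough illustration of how unsupervised clustering groups customers without any labels, the following Python sketch implements a bare-bones K Means loop using only the standard library. The customer points (monthly visits and monthly spend), the starting centroids, and the function name are hypothetical choices made for this example, and the starting centroids are fixed by hand so the result is repeatable. Production implementations usually choose starting centroids more carefully and run several restarts.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def k_means(points: Sequence[Point], centroids: Sequence[Point],
            max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Plain K Means: alternate between assigning each point to its nearest
    centroid and moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids: List[Point] = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(members), sum(ys) / len(members)))
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (visits per month, spend per month in hundreds).
customers: List[Point] = [(1, 1), (2, 1), (1, 2), (8, 9), (9, 8), (9, 9)]
labels, centers = k_means(customers, centroids=[customers[0], customers[3]])
print(labels)   # cluster index for each customer
print(centers)  # final centroid of each cluster
```

With this toy data the first three customers land in one group and the last three in another, which is the kind of segment a marketing team might then inspect and name.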

Laila Moretto covered the primary uses of many algorithms in her presentation (see the video link at the bottom for a deeper discussion of each algorithm), including:

  • K Means Clustering
  • Association Rules
  • Linear Regression
  • Logistic Regression
  • Naïve Bayesian Classifier
  • Decision Trees
  • Time Series Analysis
  • Text Analysis
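To give a flavor of one entry in this list, the sketch below computes the two standard measures behind Association Rules, support and confidence, for a rule of the form "customers who buy X also buy Y". It is an illustration written for this article rather than material from the presentation, and the shopping baskets and function names are hypothetical.

```python
from typing import FrozenSet, List

# Hypothetical shopping baskets, invented for illustration only.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: FrozenSet[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent. Assumes the antecedent occurs at least once."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


bread, milk = frozenset({"bread"}), frozenset({"milk"})
print("support(bread, milk):", support(bread | milk, BASKETS))
print("confidence(bread -> milk):", confidence(bread, milk, BASKETS))
```

A rule with high support and high confidence is a candidate for the kind of tailored store coupon mentioned earlier; full association-rule mining searches over many itemsets rather than checking one rule by hand.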

Choosing Data Scientists for Employment

Businesses such as Facebook and Google have numerous Data Scientists on their staff. Companies like Target and Macy’s are moving in that direction. The skills of Data Scientists are necessary in setting up the data system, in choosing an algorithm, and in interpreting the results. Choosing the right algorithms for an organization involves a combination of science and art. The “artistic” part is based on data mining experience, combined with knowledge of the business and its customer base. These abilities play a crucial role in choosing an algorithm model capable of answering business queries accurately. For this to happen, a competent staff of Data Scientists needs to be in place.

Laila Moretto has the following suggestions when interviewing a Data Scientist:

  • Ask, “Was your education more related to Machine Learning, or decision-making analytics?” (A business may need one of each, or more.)
  • Look for graduates who have done Machine Learning projects or capstone projects, or who have worked in competitions. (Essentially, people with some hands-on experience.)
  • Look for graduates who have done internships in areas similar to the ones being planned.

The use of Big Data, when coupled with Data Science, allows organizations to make more intelligent decisions. Its evolution has resulted in a rapid increase in insights for enterprises utilizing such advancements. Learning to understand Big Data, and hiring a competent staff, are key to staying on the cutting edge in the information age.


Here is the video of the Enterprise Data World 2015 presentation:
