
Machine Learning Boosts Data-Scientist Learning, and Vice Versa

October 28, 2013

by James Kobielus

Data scientists are human, from what I’ve seen. The same can be said about the subject-matter domain specialists who work hand-in-hand with data scientists. Sometimes one and the same human performs both functions. Sometimes it is different humans engaged in a very wetware-intensive collaboration.

Humans are only human. We have limited capabilities, attention spans, and the like. But our data, and the knowledge that we might gain from it, are seemingly unlimited. As a result, even the data scientists and domain experts of this world have to prioritize, focusing their efforts on extracting insights from some relevant portion of the vast ocean of information that surges around them.

There are only so many hours in the day. That’s why scientists and analysts must leverage every big data productivity, automation, and acceleration tool in their arsenals to sift, sort, search, infer, predict, and otherwise make sense of the data that’s “out there.” And that’s why so many have embraced machine learning.

Fundamentally, machine learning is a productivity tool that helps data scientists get smarter, just as machine learning algorithms get smarter only through ongoing training by data scientists. Wikipedia defines it as a “system that can learn from data.” Machine learning allows data scientists to train a model on an example data set, and then leverage algorithms that automatically generalize and learn both from that example and from fresh feeds of data.
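To make that train-then-keep-learning loop concrete, here is a minimal sketch in plain Python. It is not taken from any product or library: the NearestCentroid class, the two-number feature vectors, and the "low"/"high" labels are all made up for illustration. The toy model is first trained on a small example set and then updated from a fresh feed, and its answer for the same input changes as a result.

```python
from collections import defaultdict


class NearestCentroid:
    """Toy classifier (hypothetical): keeps one running-mean centroid per label."""

    def __init__(self):
        self.sums = {}                  # label -> per-feature running sums
        self.counts = defaultdict(int)  # label -> number of examples seen

    def learn(self, features, label):
        # Fold one labeled example into the running sums for its label.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
        self.sums[label] = [s + x for s, x in zip(self.sums[label], features)]
        self.counts[label] += 1

    def predict(self, features):
        # Return the label whose centroid is closest (squared Euclidean distance).
        def distance(label):
            centroid = [s / self.counts[label] for s in self.sums[label]]
            return sum((c - x) ** 2 for c, x in zip(centroid, features))

        return min(self.sums, key=distance)


# Step 1: train on an example data set (made-up numbers).
examples = [
    ([1.0, 1.0], "low"),
    ([1.5, 0.5], "low"),
    ([8.0, 9.0], "high"),
    ([9.0, 8.0], "high"),
]
model = NearestCentroid()
for features, label in examples:
    model.learn(features, label)

first_guess = model.predict([4.0, 4.0])

# Step 2: keep learning from a fresh feed of labeled data (also made up).
fresh_feed = [
    ([4.0, 4.5], "high"),
    ([5.0, 4.0], "high"),
    ([4.5, 5.0], "high"),
]
for features, label in fresh_feed:
    model.learn(features, label)

second_guess = model.predict([4.0, 4.0])
print(first_guess, second_guess)
```

The point of the sketch is only the shape of the workflow: a human supplies the labeled examples, and the model's generalization shifts as new data arrives. Real systems use far richer models and data.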

To varying degrees, you’ll see the terms “unsupervised learning,” “deep learning,” “computational learning,” “cognitive computing,” “machine perception,” “pattern recognition,” and “artificial intelligence” used in this same general context. Unsupervised learning is the practice that most analysts think of when the discussion turns to machine learning. As I noted in this LinkedIn blog, data scientist (and astrophysicist) Kirk Borne has referred to unsupervised learning as: “essentially the purest form of Data Mining (in my opinion): it is data-driven, evidence-based, unfettered by models or preconceived notions regarding the patterns in the data. It is used to discover the patterns, anomalies, categories, correlations, and features in the data, both big and small.”
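As a rough illustration of discovering categories without labels, the sketch below runs a bare-bones k-means clustering over a handful of made-up one-dimensional readings. The kmeans_1d function, its deterministic starting centers, and the sample values are assumptions chosen for this example; k-means is just one of many unsupervised techniques, not the one Borne singles out.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters; no labels are supplied (illustrative sketch)."""
    ordered = sorted(values)
    # Deterministic start: pick k values spread evenly through the sorted data.
    step = max(1, len(ordered) // k)
    centers = [ordered[min(i * step, len(ordered) - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]

    for _ in range(iterations):
        # Assignment step: attach every value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)

        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers

    return centers, clusters


# Made-up readings that happen to fall into three loose groups.
readings = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.3, 8.7]
centers, clusters = kmeans_1d(readings, k=3)
print(centers)
print(clusters)
```

Nobody told the algorithm which reading belongs to which group; the grouping emerges from the data alone. A human still has to choose k and decide whether the resulting clusters mean anything.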

That may be true, but unsupervised learning doesn’t necessarily mean that the resultant learning is superior to what a human analyst might have achieved through more manual “knowledge discovery” techniques. For example, this recent article reports on Stanford researchers building an open-source machine learning model for sentiment analysis that, they claim, “accurately classif[ies] the sentiment of a sentence 85 percent of the time … [compared to] the previous state of the art for this task … [peaking] at about 80 percent accuracy.” I’ll bet that a reasonably intelligent human being can identify the sentiment of an average sentence (positive, neutral, or negative) with 90-plus percent accuracy.
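For readers unsure what an accuracy figure like those measures, here is a small sketch of the arithmetic: the share of sentences whose predicted sentiment matches a human-assigned label. The accuracy function and both label lists are invented for illustration and have nothing to do with the Stanford data or results.

```python
def accuracy(gold_labels, predicted_labels):
    """Fraction of predictions that match the human-assigned (gold) labels."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("label lists must be the same length")
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)


# Ten made-up sentences: labels from a human annotator vs. a hypothetical model.
gold = ["positive", "negative", "neutral", "positive", "negative",
        "neutral", "positive", "negative", "neutral", "positive"]
predicted = ["positive", "negative", "neutral", "positive", "positive",
             "neutral", "positive", "negative", "negative", "positive"]

score = accuracy(gold, predicted)
print(f"{score:.0%} of sentences classified correctly")
```

Note that the yardstick itself is human judgment: the gold labels come from people, so a model's accuracy is a measure of how often it agrees with them.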

Still, you don’t need to believe that machines can think better than or as well as humans to see the value of machine learning. We gladly offload many cognitive processes to automated systems where there just aren’t enough flesh-and-blood humans to exercise their highly evolved brains on various analytics tasks.

About the author

James Kobielus, Lead Analyst, Wikibon

Jim is Wikibon's Lead Analyst for Data Science, Deep Learning, and Application Development. Previously, Jim was IBM's data science evangelist. He managed IBM's thought leadership, social and influencer marketing programs targeted at developers of big data analytics, machine learning, and cognitive computing applications. Prior to his 5-year stint at IBM, Jim was an analyst at Forrester Research, Current Analysis, and the Burton Group. He is also a prolific blogger, a popular speaker, and a familiar face from his many appearances as an expert on theCUBE and at industry events.
