Data scientists are human, from what I’ve seen. The same can be said about the subject-matter domain specialists who work hand-in-hand with data scientists. Sometimes one and the same human performs both functions. Sometimes it is different humans engaged in a very wetware-intensive collaboration.
Humans are only human. We have limited capabilities, attention spans, and the like. But our data, and the knowledge that we might gain from it, are seemingly unlimited. As such, even the data scientists and domain experts of this world have to prioritize their efforts, extracting insights from whatever relevant portion they can of the vast ocean of information that surges around them.
There are only so many hours in the day. That’s why scientists and analysts must leverage every big data productivity, automation, and acceleration tool in their arsenals to sift, sort, search, infer, predict, and otherwise make sense of the data that’s “out there.” And that’s why so many have embraced machine learning.
Fundamentally, machine learning is a productivity tool for data scientists. It helps them get smarter, just as machine learning algorithms can’t get smarter without some ongoing training by data scientists. Wikipedia defines it as a “system that can learn from data.” Machine learning allows data scientists to train a model on an example data set, and then leverage algorithms that automatically generalize and learn both from that example and from fresh feeds of data.
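To make the "train on examples, then generalize" idea concrete, here is a minimal sketch in plain Python. It assumes a nearest-centroid classifier, which is only one of many possible learning algorithms and not one the sources above single out. The customer segments, the feature values, and the names `train` and `predict` are all hypothetical, chosen purely for illustration.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learn one centroid (average point) per label from (features, label) pairs."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose learned centroid is closest to a new point."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical example data set: (hours_active, purchases) -> customer segment
examples = [
    ((1.0, 0.0), "casual"),
    ((2.0, 1.0), "casual"),
    ((8.0, 5.0), "loyal"),
    ((9.0, 7.0), "loyal"),
]

model = train(examples)

# A fresh data point the model never saw during training
prediction = predict(model, (7.5, 6.0))
print(prediction)
```

The model never saw the point (7.5, 6.0), yet it can still assign it to a segment because it generalized from the examples. Feeding newer examples into `train` would shift the centroids, which is the "fresh feeds of data" part in miniature.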
To varying degrees, you’ll see the terms “unsupervised learning,” “deep learning,” “computational learning,” “cognitive computing,” “machine perception,” “pattern recognition,” and “artificial intelligence” used in this same general context. Unsupervised learning is the practice that most analysts think of when the discussion turns to machine learning. As I noted in this LinkedIn blog, data scientist (and astrophysicist) Kirk Borne has referred to unsupervised learning as: “essentially the purest form of Data Mining (in my opinion): it is data-driven, evidence-based, unfettered by models or preconceived notions regarding the patterns in the data. It is used to discover the patterns, anomalies, categories, correlations, and features in the data, both big and small.”
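For a rough feel of what "unfettered by models or preconceived notions" can look like in practice, consider this small sketch of k-means clustering on unlabeled numbers. It is an assumption on my part that k-means is a fair stand-in here; it is just one common unsupervised technique, and the data values and the function name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group unlabeled numbers into k clusters using plain k-means."""
    ordered = sorted(values)
    # Deterministic seeding: pick k evenly spaced points from the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled measurements; nobody tells the algorithm there are two groups of behavior.
readings = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(readings, k=2)
print(centroids)
print(clusters)
```

Note that even here a human still chooses `k`, so the "purest form" of data mining is not entirely free of human judgment.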
That may be true, but unsupervised learning doesn’t necessarily mean that the resultant learning is superior to what a human analyst might have achieved through more manual “knowledge discovery” techniques. For example, this recent article reports on Stanford researchers building an open-source machine learning model for sentiment analysis that, they claim, “accurately classif[ies] the sentiment of a sentence 85 percent of the time … [compared to] the previous state of the art for this task … [peaking] at about 80 percent accuracy.” I’ll bet that a reasonably intelligent human being can identify the sentiment of an average sentence – positive, neutral, or negative – with 90-plus percent accuracy.
Still, you don’t need to believe that machines can think better than or as well as humans to see the value of machine learning. We gladly offload many cognitive processes to automated systems where there just aren’t enough flesh-and-blood humans to exercise their highly evolved brains on various analytics tasks.