Vetting the Actual Science Behind Data Science

By James Kobielus

Everybody wants to rule the world–or, at the very least, discover the fundamental rules that rule the world. That’s why we have scientists.

Statistical models are the heart of most scientific inquiry. In business applications, for example, data scientists often work with behavioral data that is drawn from fields such as sociology and psychology that have been probabilistic from the get-go. Even B.F. Skinner, the most famously deterministic behaviorist of the past century, was unable to reduce all behavior to a clockwork stimulus-response framework. Even if it were possible to predict human behavior with high confidence, social scientists would always confront the wild card of the human mind. The genetic, cognitive, affective, and other “unseen” factors driving our actions are too complex, multilayered, and mercurial to predict with any precision.

Determinism is a dangerous assumption in the study of human behavior. However, watered-down determinism continues to lurk around the edges of the behavioral sciences, aided and abetted by the predictive prowess of big data analytics and the Internet of Things (IoT). As Nicholas Carr stated in this article, “With smartphones ubiquitous, Facebook inescapable, and wearable computers … emerging, society is gaining a digital sensing system. People’s location and behavior are being tracked as they go through their days, and the resulting information is being transmitted instantaneously to vast server farms. Once we write the algorithms needed to parse all that ‘big data,’ many sociologists and statisticians believe, we’ll be rewarded with a much deeper understanding of what makes society tick.”

Owing to its deterministic connotations, one of the more unfortunate new terms in the behavioral sciences is the notion of “social physics.” This implies the possibility of using IoT-sourced “quantified self” (QS) data and other behavioral sources to find strongly predictive natural laws of human engagement. It also implies that these quasi-deterministic insights can be used to refine the arts of social manipulation. Per the website of MIT’s Human Dynamics Lab, the linkage to “social engineering” is explicit: “The engine that drives social physics is big data: the newly ubiquitous digital data that is becoming available about all aspects of human life. By using these data to build a predictive, computational theory of human behavior we can hope to engineer better social systems.”

Even if purveyors of “social physics” don’t adopt a strictly deterministic outlook, this framework makes it too easy to pass off any stray IoT/QS-sourced behavioral-analytics project as confirmed science. A case in point is this recent blog post by Damian Fernandez-Lamela. In the piece, he defines two categories of “social physics” interactions: engagement and exploration. That would be fine if he didn’t also make unsubstantiated assertions that imply scientific corroboration of laws associated with each category of interaction. On the one hand, he states that a “higher frequency of engagement interactions are a predictor of productivity, as it helps coordinating the behavior of a group.” On the other, he states that “having more exploration interactions can be a predictor of the level of innovation of a company or team.”

Those observations may or may not describe the phenomena to which they allude. But for Fernandez-Lamela to simply assert that some variable “can be a predictor” is far from asserting a verified scientific finding. In data science, a “predictor” is any independent variable that you may choose to incorporate into a statistical model to identify its correlation—if any—to the dependent variable of interest. Anything can be a “predictor,” but that doesn’t necessarily make it strongly predictive. In fact, its connection to the dependent variable of interest may be purely spurious, though, in the context of a statistical model, it remains a (useless) “predictor.”
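The distinction between being a “predictor” and being predictive can be made concrete with a small simulation. What follows is only a sketch on made-up data: the variable names (driver, unrelated, outcome), the sample sizes, and the coefficient of 2.0 are all assumptions chosen for illustration, not anything drawn from the research discussed here. It compares the share of variance explained (r squared) by a variable that really does drive the outcome with that of a variable that has no connection to it, and then shows how, in a small sample, the best of many unrelated candidates can still look respectable purely by chance.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical large sample: "driver" really influences the outcome,
# "unrelated" is pure noise with no connection to it.
n = 5000
driver = [rng.gauss(0, 1) for _ in range(n)]
unrelated = [rng.gauss(0, 1) for _ in range(n)]
outcome = [2.0 * d + rng.gauss(0, 1) for d in driver]

# Both variables can be entered into a model as "predictors",
# but only one of them explains much of the outcome's variance.
r2_driver = pearson_r(driver, outcome) ** 2
r2_unrelated = pearson_r(unrelated, outcome) ** 2

# Hypothetical small sample: screen 200 unrelated candidates against
# 20 observations and keep whichever one happens to fit best.
small_n = 20
small_outcome = [rng.gauss(0, 1) for _ in range(small_n)]
best_chance_r2 = max(
    pearson_r([rng.gauss(0, 1) for _ in range(small_n)], small_outcome) ** 2
    for _ in range(200)
)

print(f"r^2, real driver:               {r2_driver:.3f}")
print(f"r^2, unrelated variable:        {r2_unrelated:.4f}")
print(f"best r^2 among 200 noise vars:  {best_chance_r2:.3f}")
```

The last figure is the cautionary one: none of the 200 candidates has any real link to the outcome, yet the best of them can appear to explain a noticeable share of it, which is why an unreplicated “predictor” deserves skepticism.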

Later on in the article, Fernandez-Lamela alludes to unnamed “research” that “also shows that in-person interactions are orders of magnitude more important than online interactions.” Once again, that’s interesting, but, unless he actually identifies this research and supplies a smidgen of information about it, we would be ill-advised to accept these findings as confirmed scientific fact.

Don’t get me wrong. I think there is scientific merit to the sorts of research being conducted under the “social physics” umbrella. As I noted in this post, the scientific establishment is beginning to realize the potential of IoT/QS tools for gaining primary data directly from human subjects in a way that is organic to the biological, behavioral, and psychological phenomena being studied. Professional scientists in many fields (biology, sociology, genomics, etc.) are tapping into data from IoT/QS, crowdsourcing, and other sources. Doing so gives them access to data that they previously may have been unable to collect.

But let’s not sanctify every statistical analysis of IoT/QS data with the implication that it’s settled scientific fact. You would need controlled trials, independent verification, peer review, and the like to validate any such claim. And you would need researchers to refrain from implying that they’re uncovering deterministic laws about how people think, feel, and behave.

As the physicists know, it’s not a clockwork universe and human cognition doesn’t operate on cogs, flywheels, and mainsprings.
