
3 Ways to Avoid Disappointment with Data Science Projects


by Angela Guess

Sergo Grigalashvili recently wrote in The Enterprisers Project, “You know the old saying: ‘lies, damned lies, and statistics.’ The same can be said about data science broadly. We can easily, by mistake or not, say or imply something using numbers that are not there in reality; predictive models can be good on historical data but useless in a real-world situation; data analysis can yield interesting findings that are not very actionable and practically worthless. There are many reasons why this is so pervasive, and none of the reasons are new. Statisticians have known them well for decades. The problem is that the scale of data science today magnifies the pitfalls and their impact (as it does benefits). Below are the big three from my experience.”

Grigalashvili goes on, “Perhaps the biggest and most overlooked pitfall is the sampling bias. For example, marketers often look at Twitter for trends around what is generating interest. But the Twitter data set is one of the most biased datasets out there to measure interest. Active Twitter users are predominantly urban, young, white, and in media, entertainment, or marketing industries. Any production or advertising decision made based solely on Twitter data is bound to be extremely biased. This is not to be critical of Twitter. In reality, any data set is biased, and we often fail to recognize this. We assume – or we just choose to believe – that our sample is a good representation of the population. For data science to work, analysts must understand the bias and factor it into the analysis rather than ignore and make poorly informed decisions and create poor models.”
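One common way to "factor the bias into the analysis" is post-stratification: compute the metric within each demographic group, then weight the groups by their known share of the target population rather than their share of the sample. The sketch below is only an illustration of that idea, not a method described by Grigalashvili. The group names, sample counts, and population shares are all made up, and the function names are hypothetical.

```python
# Illustrative sketch of post-stratification. All data below is invented.

def naive_mean(sample):
    """Average over every observation, ignoring who was sampled."""
    values = [v for group_values in sample.values() for v in group_values]
    return sum(values) / len(values)


def poststratified_mean(sample, population_shares):
    """Weight each group's mean by that group's share of the population.

    sample: dict mapping group -> list of observations (1 = interested, 0 = not)
    population_shares: dict mapping group -> share of the target population
    """
    total_share = sum(population_shares.values())
    estimate = 0.0
    for group, share in population_shares.items():
        values = sample.get(group, [])
        if not values:
            # No data for a group means the estimate cannot be corrected for it.
            raise ValueError(f"no sample observations for group {group!r}")
        group_mean = sum(values) / len(values)
        estimate += (share / total_share) * group_mean
    return estimate


# Hypothetical sample that over-represents young users (80 of 100 respondents).
sample = {
    "young": [1] * 60 + [0] * 20,  # 75% of young respondents are interested
    "older": [1] * 5 + [0] * 15,   # 25% of older respondents are interested
}

# Hypothetical population shares (assumed known from an outside source).
population_shares = {"young": 0.3, "older": 0.7}

naive = naive_mean(sample)
adjusted = poststratified_mean(sample, population_shares)
print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

With these invented numbers, the naive estimate is dominated by the over-sampled group, while the adjusted estimate reflects the assumed population mix. The correction is only as good as the population shares and the choice of groups: bias along dimensions that are not modeled remains.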

Read more here.

Photo credit: Flickr
