
Social Data Quality Will Take Back Seat to Data Relevance

By James Kobielus  /  July 22, 2013

People prevaricate, mislead, and exaggerate in every possible social context. It’s no surprise that their Tweets and other social media remarks are full of the same. If you imagine that the social streams you’re filtering are rich founts of only honest sentiment, you’re unfortunately mistaken.

However, even bald-faced lies can be valuable intelligence, if we vet them effectively. Interpreted in the proper context, verbal insincerity can illuminate the distance between people’s inner desires and their outward behaviors. If we’re aware of this tendency, we can apply the appropriate predictive weights to behavioral models that rely heavily on verbal evidence, such as Tweets, logs of interactions with call-center agents, and responses to satisfaction surveys.
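To make the weighting idea concrete, here is a minimal sketch in Python. All channel names and reliability weights are hypothetical illustrations, not figures from any real model: the point is simply that verbal evidence from channels with a history of insincerity gets discounted before it feeds a behavioral model.

```python
# Hypothetical per-channel reliability: how often stated sentiment has
# historically matched observed behavior for that channel. These numbers
# are illustrative assumptions, not measured values.
CHANNEL_RELIABILITY = {
    "tweet": 0.4,          # short, performative, often insincere
    "call_center": 0.7,    # higher-stakes, typically more candid
    "survey": 0.55,        # prone to politeness bias and satisficing
}

def weighted_sentiment(signals):
    """Combine (channel, sentiment_score) pairs into one weighted score.

    Each sentiment_score is assumed to lie in [-1.0, 1.0]. Channels with
    lower reliability contribute proportionally less to the result.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for channel, score in signals:
        w = CHANNEL_RELIABILITY.get(channel, 0.3)  # default for unknown channels
        weighted_sum += w * score
        total_weight += w
    return weighted_sum / total_weight if total_weight else 0.0
```

In this sketch, a glowing Tweet paired with a negative call-center interaction nets out slightly negative, because the call-center evidence carries the larger weight.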

Customer data has various criteria for quality. For your enterprise’s system-of-record data, the literal accuracy of customer names, addresses, billing information, and the like is the most important quality criterion. But where those same customers’ Tweets, Facebook updates, and other social media streams are concerned, the key quality metric is relevance: how well the data helps you compile a 720-degree portrait of the behavioral tendencies and predispositions of various population segments. For example, you don’t want to invest in an expensive promotional campaign if your target demographic isn’t likely to back up their half-hearted statement that your new product is “interesting” by whipping out their wallets at the point of sale.

This thought came to me recently as I read a blog about the quality of “crowd-sourced” and “self-reported” data, such as Twitter, Facebook, and other social feeds. The author, Jim Harris, quotes a reader of his as calling for a metric of “relevance” and for a “broadening [of] data quality to embrace information and knowledge quality.”

This is exactly right. You may not be able to verify the literal accuracy of customer sentiment data, be it “crowd-sourced” or gathered through well-established market research approaches. You may not even need or want to, because social sentiment data has weak quality and governance requirements. This data is ephemeral and is aggregated for patterns and trends relating to broad customer segments, not individual customers. But you can learn plenty by aggregating various feeds of semi-reliable sentiment data, plus associated metadata, and by assessing its correlation to specific behaviors of interest (e.g., purchases).

Relevance means having good-enough intelligence, warts and all, to support whatever decision scenario confronts you. What you can learn from quality-uncertain social sentiment data is the situational contexts in which some customer segments are likely to be telling the truth about their deep intentions. You can also identify the channels in which they prefer to reveal those truths.

In the process, you can also determine which sources of customer sentiment data to prioritize and which to ignore in various application contexts.

That’s high quality social intelligence, for sure. Call it “meta-intelligence,” if you wish.

About the author

James Kobielus, Wikibon, Lead Analyst Jim is Wikibon's Lead Analyst for Data Science, Deep Learning, and Application Development. Previously, Jim was IBM's data science evangelist. He managed IBM's thought leadership, social and influencer marketing programs targeted at developers of big data analytics, machine learning, and cognitive computing applications. Prior to his 5-year stint at IBM, Jim was an analyst at Forrester Research, Current Analysis, and the Burton Group. He is also a prolific blogger, a popular speaker, and a familiar face from his many appearances as an expert on theCUBE and at industry events.
