Learn more about James Kobielus.
Some people never retire, but not necessarily because they still need to put food on the table. You’ll find many individuals over a certain age who are financially well off and could easily stop working. But they choose to keep at it for various reasons, such as loving what they do.
In this regard, data scientists are like anybody else. They choose to enter and remain in this profession for many reasons, with the intellectual challenge of the work high up on their priority list. In the hands of a data scientist, statistical models can unveil correlations and other patterns in real-world data that would otherwise have been overlooked. In addition, many data scientists are changing the world through disruptive applications of advanced analytics and algorithmic processes throughout business and industry.
The best data scientists stay in this field because it’s under their skin and they’re always scratching the itch known as curiosity. They live, love, and breathe data and are always doggedly mining and modeling it for fresh insights. And those are the sorts of individuals that I had the good fortune to engage with at Datapalooza a few weeks ago in San Francisco. At the inaugural session of this monthly community event, attendees ranged from data-science newbies to seasoned veterans. What they all had in common was a deep passion for data science, a need to engage their peers, and a desire to show others what they themselves can produce.
In viewing the presentations at Datapalooza, what I found most fascinating was the matter of where exactly data scientists get their ideas. In terms of ideation, it would appear that the “scratch an itch” motivation for data science is just as common as any grandiose desire to change the world.
A good example of the “scratch an itch” impetus is the Caltrain schedule mobile app presented at Datapalooza by Edd Dumbill and associates from Silicon Valley Data Science (SVDS). As I discussed in this recent blog, the SVDS team’s reasons for pursuing the project weren’t world-shattering, but rather, of an incremental and personal nature in terms of the impacts to be achieved. They used data science to build a real-time mobile app to predict train arrivals throughout the Caltrain system in the Bay Area.
SVDS’ reasons for undertaking this were twofold. First, they sought to make their own lives as Caltrain commuters a bit more predictable, by building a tool to supplement the unreliable train arrival data coming directly from the Caltrain system. Second, they sought to minimize Caltrain-caused noise disruptions on their office conference calls, by using their tools to identify more precisely the intervals of time when such disruption is likely to come.
By the same token, I saw a fair number of other projects that combined the “scratch an itch” motivation (in the sense that the data scientists presenting them had a clear personal passion at work) with some larger cause they were using those projects to advance. This mix of motivations can be seen in the range of Spark-related projects that were either presented at Datapalooza or on display several blocks away at the recently established Spark Technology Center (STC).
As I discussed in this recent blog, the projects vary widely in scope, objectives, and real-world impact. Most are very much in the “change the world” category, while some have a bit of intellectual “scratch an itch” built in them. In no particular order, the projects have the following aims:
- Broadcast the most serious missing children cases through AMBER Alert;
- Enable powerful social sentiment queries and algorithms on massive amounts of data in parallel processing environments;
- Understand how genetics contribute to complex disease;
- Act on real-time, data-driven insights discovered from Twitter;
- Enable users to compete against an algorithmic agent in 3 rounds of the classic childhood game, Rock Paper Scissors;
- Accelerate real-time facial detection, recognition, and intelligence in customer engagement scenarios;
- Analyze 100 million radio events that have been collected over several years in order to identify faint signals indicative of intelligent extraterrestrial life;
- Enable more powerful predictive crime prevention;
- Sift in real time through Twitter data to gauge customer emotions on multiple tone dimensions, ranging from anger to cheerfulness to openness; and
- Monitor Twitter feeds to capture the words that are most associated with a specific stock, track current stock prices, and send buy, sell, or hold recommendations to the user.
Nevertheless, there were several clear change-the-world data-science initiatives on display at Datapalooza. As I noted in this blog, IBM described its role in the following initiatives:
- Support for the United Nations Sustainable Development Goals, focusing on data science initiatives to promote education of girls and address other global concerns;
- Partnership with DataKind and MicroCred Group to use data science to expand access to financial services in the developing world; and
- Partnership with Galvanize to provide full data-science scholarships to women from science, technology, engineering, and mathematics backgrounds, thereby contributing to diversity in the profession.
Clearly, there are as many itches to scratch with data science as there are deserving causes in the world. And the value from scratching those itches may be global, or it may simply make life a little easier and more pleasant for data scientists and people like them.
Itching for more data-science stimulation? Click here to become part of the STC community and contribute projects, design, and code to Apache Spark.
Also, Datapalooza may soon be coming to a city near you. Stay tuned here for updates. We hope to engage the world’s brightest data scientists wherever and whenever it makes sense for you.