
Modeling Mind and Body with Spark

August 3, 2015

by James Kobielus

What is Apache Spark best for? It’s the right tool for modeling any massively distributed, continually streaming, in-memory connectivity graph that processes sensor data and drives optimized next best actions in a spatio-temporal context.

What fits that exceptionally complex use-case description better than the neural network of which one’s brain is the hub and the connected sensorimotor apparatus the periphery? On that score, there is no big-data analytics framework better suited than Apache Spark for neuroscientific modeling. And few doubt that open-source frameworks such as Spark are the way to go in this and other scientific disciplines that involve disparate communities of researchers working on projects that range from basic research to real-world applications.

Spark is becoming the tool of choice for whole-brain neuroscientific modeling. As this recent article notes, Spark’s advantages for this type of modeling go beyond its neuroscience-friendly development abstraction for streaming, graph, and machine-learning analytics, and beyond its highly scalable distributed runtime for massively parallel processing. Another principal advantage is Spark’s abstraction for in-memory caching, which makes rapid, interactive, exploratory queries practical. Yet another is its powerful, flexible, and intuitive APIs in Scala, Java, and Python. Spark’s Python API in particular lets neuroscientists combine existing Python tools for scientific computing (NumPy, SciPy, scikit-learn) and visualization (matplotlib, seaborn, mpld3).
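To make the pattern concrete: the appeal of Spark’s Python API is that ordinary NumPy code runs unchanged on each worker’s slice of the data. Spark itself may not be installed here, so in this hypothetical sketch plain Python’s `map` stands in for what PySpark would express as `sc.parallelize(records).map(summarize).collect()`; the per-neuron function is the part a neuroscientist would actually write.

```python
import numpy as np

def summarize(record):
    """Per-neuron summary statistics computed with NumPy.
    Under PySpark this same function would be shipped to the
    cluster via rdd.map(summarize); nothing in it is Spark-specific."""
    neuron_id, series = record
    series = np.asarray(series, dtype=float)
    return neuron_id, {
        "mean": series.mean(),   # average firing rate
        "std": series.std(),     # variability over time
        "peak": series.max(),    # strongest response
    }

# Toy data: three neurons' firing-rate time series.
records = [
    ("n0", [0.1, 0.4, 0.9, 0.3]),
    ("n1", [0.0, 0.2, 0.1, 0.0]),
    ("n2", [1.0, 0.8, 0.9, 1.1]),
]

# Stand-in for: sc.parallelize(records).map(summarize).collect()
stats = dict(map(summarize, records))
```

The point of the design is that scaling from three neurons to millions changes only the first line of the pipeline, not the analysis function.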

The article describes how a team statistically mapped neural activation patterns at scale by constructing Spark models and executing them on big-data analytics clusters. The team built dynamic models of brain activity by correlating two sets of time-series data: neural and environmental. The neural series comes from sensors that read the activity of many neurons firing in parallel; the environmental series describes properties of the outside world that the central nervous system senses, processes, and responds to. The author, computational neuroscientist Jeremy Freeman, describes how the researchers employed dimensionality reduction to capture key dynamic properties of the brain without oversimplifying the complex processes at work.
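The article does not spell out the pipeline, but the two steps it names, correlating each neuron’s series against an environmental regressor and then reducing dimensionality, can be sketched with NumPy alone on synthetic data (Spark would simply distribute the per-neuron arithmetic; the stimulus, neuron counts, and noise level below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 200, 50                                     # time points, neurons

# Environmental time series: a sinusoidal stimulus.
stimulus = np.sin(np.linspace(0, 8 * np.pi, T))

# Neural time series: half the neurons track the stimulus, half are noise.
drive = np.outer(np.r_[np.ones(25), np.zeros(25)], stimulus)
activity = drive + 0.3 * rng.standard_normal((N, T))

# Step 1: Pearson correlation of each neuron's series with the stimulus.
z_act = (activity - activity.mean(1, keepdims=True)) / activity.std(1, keepdims=True)
z_stim = (stimulus - stimulus.mean()) / stimulus.std()
tuning = z_act @ z_stim / T                        # one r value per neuron

# Step 2: dimensionality reduction (PCA via SVD) of population activity.
centered = activity - activity.mean(1, keepdims=True)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt[:3]                                # top 3 temporal components
explained = (S[:3] ** 2) / (S ** 2).sum()          # variance each explains
```

Because the stimulus-driven neurons share one underlying signal, a single principal component captures most of the population’s variance, which is the sense in which dimensionality reduction summarizes the dynamics without discarding them.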

What’s most interesting about Freeman’s discussion is how he relates this modeling effort to the search for basic principles of what he calls “neural coding.” Specifically, he notes the limitations of tools and algorithms, such as those in Spark, for building so-called “neural networks” that emulate brain activity. “[T]here remains a significant gap between these networks and real brains,” he states. “In most artificial networks, each node does essentially the same kind of thing, whereas everywhere we look in the brain we see diversity. There are hundreds or thousands of different kinds of neurons, with diverse morphologies, functions, patterns of connectivity, and forms of communication. Perhaps related, real organisms do not solve just one highly specific task with a clear objective (e.g. face recognition); they flexibly navigate and interact with a dynamic, ever-changing world.”

He refers to this underlying principle of biological computation as “neural diversity,” and notes that its mechanism remains a mystery to neuroscience. Hence, Spark and other tools built on a single-function neural computational abstraction can yield only crude approximations of the real thing.

What this means is that Spark, though very powerful for such modeling, may be distinctly limited in its ability to help unlock the full secrets of the brain.

About the author

James Kobielus is Wikibon’s Lead Analyst for Data Science, Deep Learning, and Application Development. Previously, Jim was IBM’s data science evangelist, managing IBM’s thought-leadership, social, and influencer marketing programs targeted at developers of big data analytics, machine learning, and cognitive computing applications. Prior to his five-year stint at IBM, Jim was an analyst at Forrester Research, Current Analysis, and the Burton Group. He is also a prolific blogger, a popular speaker, and a familiar face from his many appearances as an expert on theCUBE and at industry events.
