
Putting Notebooks to Work in Data Science

By Angela Guess

Dan Osipov recently wrote in Datanami, “Interactive notebooks are experiencing a rise in popularity. How do we know? They’re replacing PowerPoint in presentations, shared around organizations, and they’re even taking workload away from BI suites (more on that later). Even though they’ve become prominent in the past few years, they have a long history. The first notebooks were available in packages like Mathematica and Matlab, used primarily in academia. More recently they’ve started getting traction in the Python community with iPython Notebook. Today there are many notebooks to choose from: Jupyter (successor to the iPython Notebook), R Markdown, Apache Zeppelin, Spark Notebook, Databricks Cloud, and more. There are kernels/backends for multiple languages, such as Python, Julia, Scala, SQL, and others.”

Osipov goes on, “Traditionally, notebooks have been used to document research and make results reproducible, simply by rerunning the notebook on source data. But why would one choose to use a notebook instead of a favorite IDE or command line? There are many limitations in the current browser-based notebook implementations that prevent them from offering a comfortable environment to develop code, but what they do offer is an environment for exploration, collaboration, and visualization. Notebooks are typically used by data scientists for quick exploration tasks. In that regard they offer a number of advantages over any local scripts or tools. When properly set up by the organization, a notebook offers direct connections to all necessary sources of data, without additional effort on the part of the user. While it may seem like a trivial task, connecting to the right data source can be far from simple.”
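To illustrate the kind of quick exploration Osipov describes, here is a minimal sketch of a notebook cell in Python. The file name and column names (events.csv, timestamp, value) are hypothetical placeholders, not drawn from the article; the point is that loading, summarizing, and plotting happen in one place, with the chart rendered inline beneath the cell.

# Notebook-style exploration cell: load data, inspect it, and plot a quick chart.
# "events.csv", "timestamp", and "value" are assumed placeholders for illustration.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("events.csv", parse_dates=["timestamp"])

# Quick look at the shape and summary statistics -- the kind of step a data
# scientist repeats while getting familiar with a new source.
print(df.shape)
print(df.describe())

# Inline visualization: a daily mean of the value column, shown right below the cell.
df.set_index("timestamp")["value"].resample("1D").mean().plot(title="Daily mean value")
plt.show()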

Read more here.

Photo credit: Flickr/theilr
