
Python Libraries for Better Data Science

By Angela Guess

Tyler Keenan recently wrote in Business2Community, “One of Python’s greatest assets is its extensive set of libraries. Libraries are sets of routines and functions that are written in a given language. A robust set of libraries can make it easier for developers to perform complex tasks without rewriting many lines of code. In this article, we’ll introduce you to some of the libraries that have helped make Python the most popular language for data science in Stack Overflow’s 2016 developer poll. These are the basic libraries that transform Python from a general purpose programming language into a powerful and robust tool for data analysis and visualization. Sometimes called the SciPy Stack, they’re the foundation that the more specialized tools are built on.”

Keenan’s list begins: “(1) NumPy is the foundational library for scientific computing in Python, and many of the libraries on this list use NumPy arrays as their basic inputs and outputs. In short, NumPy introduces objects for multidimensional arrays and matrices, as well as routines that allow developers to perform advanced mathematical and statistical functions on those arrays with as little code as possible. (2) SciPy builds on NumPy by adding a collection of algorithms and high-level commands for manipulating and visualizing data. This package includes functions for computing integrals numerically, solving differential equations, optimization, and more. (3) Pandas adds data structures and tools that are designed for practical data analysis in finance, statistics, social sciences, and engineering. Pandas works well with incomplete, messy, and unlabeled data (i.e., the kind of data you’re likely to encounter in the real world), and provides tools for shaping, merging, reshaping, and slicing datasets.”
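To make the three descriptions concrete, here is a minimal sketch that touches each library once. It assumes recent releases of NumPy, SciPy, and pandas are installed. The specific calls (`integrate.quad`, `integrate.solve_ivp`, `optimize.minimize_scalar`, `DataFrame.merge`, `DataFrame.pivot`) are illustrative picks, not functions named in Keenan's article. The sales and target figures are made up for the example.

```python
import numpy as np
import pandas as pd
from scipy import integrate, optimize

# NumPy: a 2-D array plus vectorized math and statistics, with no explicit loops.
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
col_means = matrix.mean(axis=0)   # per-column mean -> [2.0, 3.0]
product = matrix @ matrix.T       # matrix multiplication

# SciPy: numerical integration, a differential equation, and optimization,
# all operating on NumPy values.
area, _abs_err = integrate.quad(lambda x: x ** 2, 0.0, 3.0)      # approximately 9
best = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)         # best.x is near 2
# Solve dy/dt = -y with y(0) = 1 on [0, 1]; the exact answer at t = 1 is e^-1.
sol = integrate.solve_ivp(lambda t, y: -y, (0.0, 1.0), [1.0], rtol=1e-8, atol=1e-10)
y_end = sol.y[0, -1]

# pandas: labeled tabular data with a missing value (hypothetical figures).
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "revenue": [100.0, None, 120.0, 90.0],  # the None becomes NaN
})
targets = pd.DataFrame({"region": ["north", "south"], "target": [200.0, 150.0]})

mean_revenue = sales["revenue"].mean()                 # NaN is skipped by default
merged = sales.merge(targets, on="region", how="left")  # merging two tables
wide = merged.pivot(index="region", columns="quarter", values="revenue")  # reshaping
north_only = merged.loc[merged["region"] == "north", ["quarter", "revenue"]]  # slicing

print(col_means, round(area, 6), round(best.x, 4), round(y_end, 6))
print(wide)
```

The pandas half mirrors the "shaping, merging, reshaping, and slicing" wording: the missing south/Q1 revenue survives the merge and pivot as NaN instead of raising an error, which is one concrete sense in which pandas "works well with incomplete" data.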

Keenan's full article in Business2Community covers more than a dozen additional libraries.

Photo credit: Flickr