by Angela Guess
An article by Mike Loukides discusses the still important value of data hand tools. Loukides begins, “The flowering of data science has both driven, and been driven by, an explosion of powerful tools. R provides a great platform for doing statistical analysis, Hadoop provides a framework for orchestrating large clusters to solve problems in parallel, and many NoSQL databases exist for storing huge amounts of unstructured data… But these tools haven’t negated the value of much simpler tools; in fact, they’re an essential part of a data scientist’s toolkit. Hilary Mason and Chris Wiggins wrote that ‘Sed, awk, grep are enough for most small tasks,’ and there’s a layer of tools below sed, awk, and grep that are equally useful.”
Loukides continues, “The advent of cloud computing, Amazon’s EC2 in particular, also places a premium on fluency with simple command-line tools. In conversation, Mike Driscoll of Metamarkets pointed out the value of basic tools like grep to filter your data before processing it or moving it somewhere else. Tools like grep were designed to do one thing and do it well. Because they’re so simple, they’re also extremely flexible, and can easily be used to build up powerful processing pipelines using nothing but the command line. So while we have an extraordinary wealth of power tools at our disposal, we’ll be the poorer if we forget the basics.”
Learn more by reading the full article here.