by Angela Guess
Joseph Misiti recently responded to a GigaOM article, "Becoming a Data Scientist Might Be Easier Than You Think," which argues that newcomers can pick up the role quickly. He writes, "You can take the ML course on Coursera and you're magically a data scientist, because three really intelligent people did it. I disagree. I'm not claiming the people referenced in this article are not data scientists who score high in Kaggle competitions. They're probably really intelligent people who picked up a new skill and excelled at it (although one was already an actuary, so he is basically doing machine learning in some form already). Here is my problem with it: being a data scientist usually requires a much larger skill set than a basic understanding of a few learning algorithms. I'm taking the Coursera ML course right now, and I think it is great! Here is what I didn't learn, though."
His list begins, "Most data scientists and the companies that employ them are not using Matlab/Octave. They have backend web services written in Java, Python, Scala, or Ruby. These languages are not covered. Python has libraries like SciPy, NumPy, and scikit-learn that are great for solving numerical problems. Java has a bunch of libraries too, like the Mahout math library [2]. R is used by most statisticians (again, not covered in the course). When your boss (or a customer) comes to you and says you need to integrate an algorithm into a pre-existing web service (for example, they need a recommendation engine), and you say 'I only know Matlab,' that is going to be a huge problem. You don't just pick up Java/Python/C++/Scala/whatever in a few days on the job. You have to be somewhat familiar with these languages to understand large, pre-existing code bases."
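To give a rough sense of what the "recommendation engine" example can look like outside Matlab, here is a minimal sketch (not taken from Misiti's post) of item-based collaborative filtering in Python with NumPy. The function name `recommend` and the toy ratings matrix are hypothetical, and the technique shown is simple cosine-similarity scoring; a real service would wrap logic like this behind an existing web endpoint and data layer.

```python
import numpy as np


def recommend(ratings: np.ndarray, user: int, k: int = 2) -> list[int]:
    """Hypothetical helper: return up to k item indices the user has not
    rated, ranked by item-item cosine similarity to the items they rated.

    ratings is a (n_users x n_items) matrix where 0 means "not rated".
    """
    # Normalize each item column; avoid dividing by zero for unrated items.
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0
    normalized = ratings / norms

    # Item-item cosine similarity matrix (n_items x n_items).
    sim = normalized.T @ normalized

    # Score every item as a similarity-weighted sum of the user's ratings.
    user_ratings = ratings[user]
    scores = sim @ user_ratings

    # Never recommend items the user has already rated.
    scores[user_ratings > 0] = -np.inf

    # Highest scores first; drop any excluded (-inf) items.
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked[:k] if np.isfinite(scores[i])]


if __name__ == "__main__":
    # Toy data: 3 users x 4 items.
    R = np.array([[5, 4, 0, 0],
                  [4, 5, 1, 0],
                  [0, 1, 5, 4]], dtype=float)
    print(recommend(R, user=0))
```

With the toy matrix, user 0 has rated items 0 and 1, so the sketch ranks item 2 (which user 1 rated alongside those items) ahead of item 3. The point is less the algorithm than the fact that it has to live in whatever language the surrounding service already uses.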