Click to learn more about Jonathan Buckley.
The prevailing opinion is that machines and computers are cold and calculating, not prone to the knee-jerk reactions and passions that humans tend to have. In other words, they’re unbiased: they take a certain amount of input and produce a result based on their programming. That may be the dominant view of computers, but it is not entirely true. Were you to say machines have the same sorts of biases humans have, you’d likely be laughed at, but research has found that machines can have blind spots of their own and may even end up discriminating against certain groups, all unintentionally of course. The issue has only become more pronounced as big data analytics has entered the mainstream, and because most people still think machines are unbiased, the misconception may be damaging in more ways than one.
That big data analytics may be biased will likely come as a surprise to many people. After all, most experts have tended to think that using more data is a significant benefit, since more data should mean fewer biases. With more information, there’s a more stringent process of filtering out data that might be tainted, swayed, or otherwise slanted. In other words, it replaces gut instinct with cold hard numbers. And this data is analyzed by machines programmed with specific code. With all this in mind, many will wonder how machines could still be biased, but looking past the general concept of analytics reveals where bias is likely to be introduced.
When it comes to analytics, the trend is to adopt machine learning algorithms. Basically, these are algorithms that tell a machine how to learn in order to come up with new solutions. This isn’t a recipe, where ingredients are put in and a clear product comes out on the other end. Machine learning is a process, where two different machines could come up with two radically different ways to solve a problem or produce a result. Essentially, machine learning is the next step toward artificial intelligence, and let us not forget that one of the goals of artificial intelligence is to mimic the human brain as closely as possible. Needless to say, the human way of thinking lends itself to certain biases.
An important note is that much of the bias machines can have comes from the data they are using. Big data may seem just as unfeeling as the machines themselves, but underneath the surface, certain biases can persist. One example comes from Kate Crawford, a principal researcher at Microsoft Research: social media data collected during Hurricane Sandy would have given an inaccurate picture of the scope of the disaster if it were all we had to rely on. The vast majority of tweets about Sandy originated in Manhattan, as should be expected. However, the areas that were hit the hardest sent out fewer tweets. That meant poorer neighborhoods and the places that needed help the most were underrepresented in the data. The same pattern can be seen in many data sets, with some groups and less visible places being excluded.
There are many biases that can afflict big data analytics. One is selection bias, where the data that is included is arbitrarily limited by data scientists. Another is the exclusion or inclusion of outliers, which may skew results in ways that obscure the real picture. This is a serious matter that most data scientists have to take into account when using analytics to solve certain problems.
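To make the effect of outliers and selection concrete, here is a minimal sketch in Python. The income figures and the cutoff of 45 are invented purely for illustration and do not come from any real data set. It shows how a single outlier moves the mean far more than the median, and how an arbitrary filter on which records are included changes the summary entirely.

```python
from statistics import mean, median

# Hypothetical household incomes (in thousands); the last value is an outlier.
incomes = [32, 35, 38, 41, 44, 47, 50, 950]


def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


# Including vs. excluding the outlier.
with_outlier = summarize(incomes)
without_outlier = summarize(incomes[:-1])

# Selection bias: keep only records at or above an arbitrary cutoff.
selected = [x for x in incomes if x >= 45]
selected_summary = summarize(selected)

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
print("after selection:", selected_summary)
```

None of the three summaries is wrong in itself. The point is that each reflects a choice about which data to include, and that choice should be stated alongside the result.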
And that’s where the solution to the problem of machine bias likely lies. Better training within data science programs can help data scientists be aware of the problem and account for possible discrimination. Better screening of algorithms is another solution, one that can help ensure the results are of high quality for different populations. Above all else, it’s important to know that while machines may not think and act exactly as humans do, biases can still be a problem, and one that needs to be addressed directly. Whether using a big data tool like Hadoop or Spark or programming new software, biases are one part of the equation that can’t be ignored.
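One simple form of screening an algorithm for different populations is to compare its accuracy group by group rather than only overall. The sketch below assumes a hypothetical list of (group, predicted, actual) records with made-up group labels "A" and "B". It illustrates the idea and is not a complete fairness audit.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Given (group, predicted, actual) tuples, return {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical predictions for two groups (invented for illustration).
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 0), ("B", 0, 1), ("B", 1, 1),
]

overall = sum(p == a for _, p, a in records) / len(records)
per_group = accuracy_by_group(records)

print("overall accuracy:", overall)
print("accuracy by group:", per_group)
```

In this invented example, the overall figure looks acceptable while one group fares much worse than the other, which is the kind of gap a single aggregate number hides.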