Alistair Croll recently argued that Big Data is this generation's civil rights issue. He explains, "In the old, data-is-scarce model, companies had to decide what to collect first, and then collect it. A traditional enterprise data warehouse might have tracked sales of widgets by color, region, and size. This act of deciding what to store and how to store it is called designing the schema, and in many ways, it’s the moment where someone decides what the data is about. It’s the instant of context. That needs repeating: You decide what data is about the moment you define its schema."
He goes on, "With the new, data-is-abundant model, we collect first and ask questions later. The schema comes after the collection. Indeed, Big Data success stories like Splunk, Palantir, and others are prized because of their ability to make sense of content well after it’s been collected—sometimes called a schema-less query. This means we collect information long before we decide what it’s for. And this is a dangerous thing. When bank managers tried to restrict loans to residents of certain areas (known as redlining) Congress stepped in to stop it (with the Fair Housing Act of 1968.) They were able to legislate against discrimination, making it illegal to change loan policy based on someone’s race."
Image: Courtesy Wikipedia