Artificial intelligence and machine learning techniques are altering the way organizations gather, process, and protect data. They are being used to gather massive amounts of information about internet users in the form of big data, and to secure and protect it. The challenge is how to maximize the use of big data, while simultaneously safeguarding the information and protecting the privacy of individuals. Europe’s General Data Protection Regulation (GDPR), which focuses on privacy, has several features that demand additional protection for the privacy of Europeans. (Expect the U.S. to follow within a few years.) The introduction of the GDPR has created the need for more complicated machine learning systems. Lilian Edwards, a University of Strathclyde law professor in Glasgow, Scotland, stated:
“Big data is completely opposed to the basis of data protection. I think people have been very glib about saying we can make the two reconcilable, because it’s very difficult.”
The collection of personal data starts with an effective data intelligence system that can select the desired data, determine to whom it belongs, and decide how it is to be used. Classification, correlation, data discovery, identity, and privacy-specific requirements (such as consent checking), each use different techniques, different training models, and different reasoning. In spite of the differences, these elements must be organized into a cohesive model, and per the GDPR, must include the ability to maintain data privacy.
A modern collection system must be capable of finding, classifying, correlating, tracking, and cataloging data as it is collected and processed. These tasks can be difficult to coordinate initially, and become even more difficult when the diversity of mobile phones and IoT applications is added. Discovering trustworthy patterns within the data requires a high degree of accuracy and involves eliminating false positives and unnecessary noise. This can be accomplished with machine learning.
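As a rough illustration of the "finding and classifying" step, the sketch below tags pieces of free text that look like personal data. It is a rule-based baseline rather than a machine learning model, and the function name `find_personal_data`, the two labels, and the deliberately simplified patterns are all assumptions made for this example. A production system would need far broader coverage and would typically add learned classifiers to cut false positives.

```python
import re

# Simplified, illustrative patterns (assumptions for this sketch only).
# Real-world email and IP address detection is considerably more involved.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_personal_data(text):
    """Return a dict mapping each label to the list of matches found in text."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


found = find_personal_data("Contact jane@example.com from 192.168.0.1 today")
for label, matches in found.items():
    print(label, matches)
```

Each match could then be cataloged alongside its source and the identity it belongs to, which is where correlation and tracking come in.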
Machine learning (ML) uses algorithms that are designed to progressively improve themselves. They do this by processing data, which acts as a form of training. The more training, the better the algorithm gets at finding patterns. Using speech patterns for training helps a bot to sound more human and recognize what is being said. Training with visual patterns can help autonomous cars to identify items on the road. Customer behavior patterns can train a system to associate buying patterns. This differs from artificial intelligence (AI).
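The idea that more training data tends to produce a better model can be sketched with a toy classifier. The example below is an illustration under stated assumptions, not a description of any particular product: the data is synthetic (two one-dimensional Gaussian classes), the "model" is simply the average value seen for each label, and all function names are invented for this sketch. With very small training sets the result depends heavily on luck, so accuracy is not guaranteed to rise at every step.

```python
import random


def make_samples(n_per_class, rng):
    """Draw (value, label) pairs from two overlapping synthetic classes."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(4.0, 1.0), 1) for _ in range(n_per_class)]
    return data


def train_centroids(samples):
    """'Training' here means averaging the values observed for each label."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for value, label in samples:
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Assign the label whose learned average is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(centroids, samples):
    hits = sum(1 for value, label in samples if predict(centroids, value) == label)
    return hits / len(samples)


def learning_curve(train_sizes, seed=0):
    """Train on increasingly large samples and score each model on one test set."""
    rng = random.Random(seed)
    test_set = make_samples(1000, rng)
    curve = {}
    for n in train_sizes:
        centroids = train_centroids(make_samples(n, rng))
        curve[n] = accuracy(centroids, test_set)
    return curve


curve = learning_curve([1, 10, 1000])
for n, acc in curve.items():
    print(f"{n} training examples per class: accuracy {acc:.3f}")
```

With many examples the learned averages settle close to the true class centers, which is the same effect, in miniature, that lets larger speech or image datasets improve real systems.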
AI vs. Machine Learning vs. Deep Learning
Machine learning and artificial intelligence are two popular catchphrases that are often used interchangeably. These two concepts are not the same thing, however, and confusion between the two can lead to breakdowns in communication. The two terms are often used when discussing analytics or big data. Artificial intelligence (AI) as a concept came first, with an emphasis on imitating human intelligence. Machine learning (ML) followed, as a method for achieving artificial intelligence.
In the 1990s, as AI began using neural networks for its foundation, machine learning advocates shifted their focus to more basic, repetitive behaviors, and began creating ML programs designed to analyze huge amounts of data and learn from the process. Machine learning, without the goal of imitating human intelligence, is being used to handle internet purchases, collect information, “personalize” an internet experience, recommend similar products, and analyze sales data. In this role, machine learning is less an attempt to imitate human intelligence than a system of “limited” responsive behavior patterns, which are learned and developed through experience, significantly reducing the need for manual programming.
Deep learning is a “modern” training process for AI using neural networks. Deep learning models attempt to imitate the processing and communication patterns used in biological nervous systems (especially human brains). The majority of modern deep learning models are built on artificial neural networks.
Security—The Good News and the Bad
AI systems are being designed to fight cyber threats by outwitting them. Many researchers are adding a process known as “Attack, Detect, and Protect” to protect their AI systems and applications. The systems being protected include facial recognition, automobiles, medical data, and other applications that identify people. Researchers can also model a potential hacker, simulating attacks and creating countermeasures before the attacks occur.
Unfortunately, hackers are just as industrious, and have a variety of attack methods and ways to use artificial intelligence to their advantage. “Evasion attacks” provide one example. In this situation, the system is flooded with false negatives (essentially malware disguised as benign code), which cause security analysts to ignore alerts.
“Poisoning attacks” provide another example: the attacker injects false data designed to poison the AI training data and create biases in certain classifications. This kind of attack can change the AI model significantly, affecting its decisions and outcomes. Hackers can also, unfortunately, use their own AI, sending it crawling through the internet in search of vulnerabilities.
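A poisoning attack can be sketched with a deliberately tiny classifier. Everything in this example is an assumption made for illustration: the single numeric "risk score" feature, the made-up training values, and the nearest-average model. It shows only the general mechanism, in which mislabeled training records shift what the model has learned.

```python
def train(samples):
    """Learn the average feature value for each label."""
    totals = {}
    for value, label in samples:
        running_sum, count = totals.get(label, (0.0, 0))
        totals[label] = (running_sum + value, count + 1)
    return {label: s / c for label, (s, c) in totals.items()}


def classify(model, value):
    """Pick the label whose learned average is closest to the value."""
    return min(model, key=lambda label: abs(value - model[label]))


# Made-up training data: low scores are benign, high scores are malicious.
clean = [(1, "benign"), (2, "benign"), (3, "benign"),
         (7, "malicious"), (8, "malicious"), (9, "malicious")]

# The attacker injects high-scoring records that are falsely labeled benign.
poison = [(9, "benign")] * 6

clean_model = train(clean)
poisoned_model = train(clean + poison)

suspicious_score = 6
print("clean model:   ", classify(clean_model, suspicious_score))
print("poisoned model:", classify(poisoned_model, suspicious_score))
```

The injected records pull the learned "benign" average from 2.0 up to about 6.7, so a score of 6 that the clean model flags as malicious is accepted as benign by the poisoned one.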
For many internet businesses, the goal is to analyze incoming data using relationships, not just similarities. ML and AI can provide ways to achieve this goal, while simultaneously supporting privacy and data protection. Machine learning systems containing personal data must, under the GDPR, be able to locate information, alter it, and limit what is done with it. Article 5(1)(a) of the GDPR says the personal data of individuals must be “processed lawfully, fairly and in a transparent manner in relation to the data subject.”
The diversity of mobile phones and IoT applications adds to the technical difficulty. The more connections a system contains, the greater the potential for security threats. That means the demands on security are becoming increasingly complex and may extend to devices not yet included in a data security program.
The GDPR complicates the process by requiring explicit transparency and by limiting the amounts and kinds of data that can be collected. Among its requirements:
- When an organization collects personal data, it must state what the collected data will be used for. The data cannot be used for any other purpose, which includes sharing it with third parties.
- Only the minimum amount of data needed for a project or process is to be collected. Data can only be held for a limited time.
- An organization must tell people what data about them it has and what is being done with it.
- An organization must alter or eliminate an individual’s personal data, if requested.
- When personal data is being used for automated decisions about people, the organization must be able to explain the logic behind the decision-making process.
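Several of the obligations listed above can be sketched as a small data store. This is a simplified illustration, not legal guidance and not a reference to any real library: the class `PersonalDataStore`, its method names, the purpose-to-fields mapping, and the sample data are all hypothetical. It covers purpose limitation, data minimization, limited retention, access, correction, and erasure. It does not attempt the last item, explaining automated decisions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    subject_id: str
    data: dict
    purpose: str            # the purpose stated at collection time
    collected_at: datetime
    retention: timedelta    # how long the data may be held

    def expired(self, now):
        return now >= self.collected_at + self.retention


class PersonalDataStore:
    def __init__(self, fields_by_purpose):
        # Maps each declared purpose to the fields considered necessary for it.
        self.fields_by_purpose = fields_by_purpose
        self.records = []

    def collect(self, subject_id, data, purpose, now, retention):
        """Keep only the fields needed for the purpose (data minimization).
        An undeclared purpose raises KeyError."""
        allowed = self.fields_by_purpose[purpose]
        minimal = {k: v for k, v in data.items() if k in allowed}
        self.records.append(Record(subject_id, minimal, purpose, now, retention))

    def use(self, subject_id, purpose, now):
        """Return data only for the purpose it was collected for, and only
        while the retention period is still running."""
        return [r.data for r in self.records
                if r.subject_id == subject_id
                and r.purpose == purpose
                and not r.expired(now)]

    def access_report(self, subject_id):
        """Tell a person what is held about them and why."""
        return [{"purpose": r.purpose, "data": dict(r.data)}
                for r in self.records if r.subject_id == subject_id]

    def rectify(self, subject_id, field_name, new_value):
        """Correct a stored field on request."""
        for r in self.records:
            if r.subject_id == subject_id and field_name in r.data:
                r.data[field_name] = new_value

    def erase(self, subject_id):
        """Delete everything held about a person; return the number removed."""
        kept = [r for r in self.records if r.subject_id != subject_id]
        removed = len(self.records) - len(kept)
        self.records = kept
        return removed


# Made-up sample data for illustration.
store = PersonalDataStore({"shipping": {"name", "address"}})
t0 = datetime(2024, 1, 1)
store.collect(
    "u1",
    {"name": "Ada", "address": "1 Main St", "birthday": "1990-01-01"},
    "shipping", t0, timedelta(days=30),
)
print(store.use("u1", "shipping", t0))    # birthday was never stored
print(store.use("u1", "marketing", t0))   # different purpose: nothing returned
```

Tying every record to a stated purpose and a retention period at collection time is one possible design. It makes the later checks simple filters rather than after-the-fact audits.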
How could an organization legitimately justify collecting and storing data that could be used to infer the sexual preferences and political and religious beliefs of individuals? There are some unethical individuals who would use this information for monetary profit, or to manipulate people’s behavior.
The GDPR’s purpose is to protect the privacy and rights of people in the European Union. These efforts are intended to keep Europeans from being manipulated into making unnecessary purchases or being targeted with false information about political candidates and issues. Lilian Edwards commented on the problems facing big data research:
“Big data challenges purpose limitation, data minimization, and data retention. Most people never get rid of it with big data. It challenges transparency and the notion of consent, since you can’t consent lawfully without knowing to what purposes you’re consenting. Algorithmic transparency means you can see how the decision is reached—but you can’t, with machine learning systems, because it’s not rule-based software.”