When Maria Voreh started at the Federal Bureau of Investigation (FBI), her first assignment was working on the Integrated Automated Fingerprint System (IAFS), technology often seen in police movies and television shows such as “CSI.” “They take a print and the machine does this magic work, and sub-seconds later the suspect is found.” Voreh is Chief Data Officer (CDO) at the FBI and was a keynote speaker at the DATAVERSITY® DG Vision Conference. Her presentation was titled Data Governance in the Modern Era: Balancing Data Security with Data Availability.
The data IAFS searches is not the FBI’s — the data comes from all the law enforcement communities in the country. To protect it, there are rules, constraints, and policy in place about how it can be used or shared. That’s how Voreh really started working in the FBI, she said: Working on rules to maintain the integrity and security of personal data, and at the same time, making it available in sub-second time to the law enforcement officer.
“If you have a law enforcement officer on the side of the road, he doesn’t have five minutes to wait for an answer. He literally has seconds or sub-seconds to know whether the person in this car is just mom on her way to soccer and she’s late, or the guy who’s wanted for a mass murder somewhere down the road.”
Although data security and availability have been Voreh’s life for the last 20 years at the FBI, she thinks of her focus not in terms of the data but in terms of bringing a service to users while protecting the interests of those who are willing to share their data.
Massive Data Cycle Challenges
“We are inundated with data.” Grubhub has over 8,000 requests for food every minute, while 18 million texts are sent, and 45 million Google searches are performed in the same amount of time, she said. “We literally have so much data going around in our lives,” and that same amount of data is flowing in and out of government as well.
Technology is providing more data and the opportunity to know more than ever before, but making sense of it requires additional layers of technology and data. “This becomes a self-licking ice cream cone. We literally are causing our own data exponential growth, but we can’t stop it — and we shouldn’t stop it.”
Legal, Ethical and Moral Challenges
The Privacy Act for personal information came out in 1974, as did the Family Educational Rights and Privacy Act (FERPA), which protected personal data (name, address, date of birth, social security number) in educational institutions. Later, in the mid-1990s, additional legislation was enacted to protect financial data, and the Federal Trade Commission (FTC) was allowed to start monitoring the use of financial data. Recent legislation such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) has gone beyond protecting a specific kind of data to protecting the rights of the consumer or resident to control what companies do with personal data.
This same change in focus is also felt in government, where it’s no longer simply a process of protecting certain data fields in a regulated way. Now responsibility extends to protecting that information from misuse, from becoming compromised, and from abuse of the rights to use it.
Voreh said that data practitioners need to seriously consider:
- What is the impact on the person, and on humanity, if I use that data?
- Am I setting a precedent?
- Do the ends justify the means?
- Am I doing more good than harm?
One misstep, and customer trust is lost. “In my business, if I lose the trust of the public, my job becomes impossible.”
The Reality of Trust
Voreh shared the following statistics from an article by J. Clement titled Online Privacy — Statistics & Facts:
- 22 percent of online users stated that saving sensitive data online was not secure enough
- 40 percent of online users stated concern about their online data being misused
- 53 percent of online users are more concerned about their privacy compared to a year prior
- 24 percent of global online users trust their government with the management and protection of their personal information
- 66 percent of online users stated that they were increasingly concerned with their online privacy due to their own government
Voreh said that the last figure scares her: 66 percent are concerned that she is going to misuse their data. “When I see that, I get worried. What do I need to do to give them the trust that I’m going to take care of their data?” Professionals in the private sector and academia share that same concern because a misuse of trust results in lost customers, lost revenue, or fewer students.
Emerging Technology Challenges
Emerging technology has provided unprecedented options to analyze data, exploit data, and use it to make a real difference. In 2001 when the FBI was working on the Enron case, they were working with a couple of gigabytes of data and dedicated 100 agents to it.
With the 2013 Boston Marathon bombing, they were working with terabytes of data. “We were starting to get a lot of closed caption TV, some videos from phones, and we threw some technology and a lot of agents at it.”
By the time of the 2017 Las Vegas shooting, they had 22,000 hours of closed circuit TV, as well as personal videos and were exceeding a petabyte of data. “We had tools and we were trying to leverage some computer vision technology to detect images and people, but again, the technology was there, but we were concerned about using some of it because of the unintended outcomes.”
Panda or Gibbon?
Technologies such as artificial intelligence (AI), machine learning, and neural networks have the potential to put us on the cusp of greatness, but they aren’t entirely proven yet. “We can’t explain them to our users, and because of that, we get into unintended outcomes,” which makes these technologies vulnerable to exploitation.
Voreh talked about an experiment using a neural network to identify a photograph of a panda. Researchers changed .04 percent of the pixels in the image, causing the neural network to change its classification from “Panda, with 57.7 percent confidence” to “Gibbon, with 99.3 percent confidence.” To the human eye, the change is undetectable. The picture looks like a panda, but the computer disagrees. “So we can’t remove the human in the loop. We as humans still have to be part of our technology, training it and retraining it.”
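The panda result is an instance of what is commonly called an adversarial example. The mechanism can be sketched on a much smaller scale. The following is a minimal illustration, assuming a toy linear scorer rather than the neural network from the experiment Voreh described; the weights, the 1,000-pixel "image," and all function names are hypothetical. The sign-based step is the fast gradient sign idea, used here only because it is simple to show.

```python
# Toy illustration only: a hypothetical linear "panda vs. gibbon" scorer,
# not the neural network from the experiment described in the talk.

def score(weights, pixels):
    """Weighted sum of pixel values; a positive score means 'panda'."""
    return sum(w * p for w, p in zip(weights, pixels))

def classify(weights, pixels):
    return "panda" if score(weights, pixels) > 0 else "gibbon"

def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def perturb(weights, pixels, epsilon):
    """Shift every pixel by at most epsilon in the direction that lowers the score.

    For a linear scorer, the gradient of the score with respect to each pixel
    is that pixel's weight, so stepping against sign(weight) is the
    fast gradient sign idea applied to this toy model.
    """
    return [p - epsilon * sign(w) for w, p in zip(weights, pixels)]

# Made-up weights and a made-up 1,000-pixel "image" (values are assumptions).
weights = [0.5 if i % 2 == 0 else -0.5 for i in range(1000)]
image = [0.51 if i % 2 == 0 else 0.50 for i in range(1000)]

adversarial = perturb(weights, image, epsilon=0.01)

# Largest change made to any single pixel.
largest_change = max(abs(a - b) for a, b in zip(adversarial, image))

print(classify(weights, image), classify(weights, adversarial), largest_change)
```

In this sketch no pixel moves by more than about 0.01, yet the label flips, because many tiny changes all push the score in the same direction. Real image classifiers are far more complex, but the same accumulation effect is one reason a human reviewer remains useful.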
Quality data and good Data Management practices are needed to train and retrain algorithms, because new technology trained on bad data produces bad results. “That’s where we start losing the public trust,” she said.
New technology also requires trained people who understand the value of the data, how to handle it appropriately, and how to annotate metadata to document the source, the purpose, and permitted uses for the data. “Without that, this new technology is just fun and games. It doesn’t do anything for our actual mission purposes.”
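One way to picture that kind of annotation is as a small record attached to each dataset. The following is a minimal sketch, assuming hypothetical field names and example values; it is not a description of any FBI system or of a specific metadata standard.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical annotation record; all field names are illustrative."""
    name: str
    source: str    # who shared the data
    purpose: str   # why the data was collected
    permitted_uses: frozenset = field(default_factory=frozenset)

def is_use_permitted(meta, requested_use):
    """Allow a use only if the annotation explicitly lists it."""
    return requested_use in meta.permitted_uses

# Example values are made up for illustration.
record = DatasetMetadata(
    name="example_prints",
    source="Example County Sheriff",
    purpose="identity verification",
    permitted_uses=frozenset({"identity verification", "audit"}),
)

print(is_use_permitted(record, "audit"))      # listed in permitted_uses
print(is_use_permitted(record, "marketing"))  # not listed, so refused
```

The design choice here is deny-by-default: a use that was never documented is treated as not permitted, which keeps the annotation, rather than the person asking, as the authority on how shared data may be used.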
Data Security vs. Data Availability
Data security should no longer be exclusively the responsibility of IT, she said. Protection, access control, privacy, confidentiality, compliance, and risk reduction should be shared. Policies and tools can facilitate this, but ultimately users need to share the responsibility for data.
On the other hand, Voreh said that data practitioners and people in IT often hear these demands from users:
- I need to have more and better data so I can do
- I need to be able to make that visualization right now
- I need the data in a format I can use. Don’t make me have to do conversions of date and time or figure out which field is the last name
- I have to have accessibility on my home computer, when I’m not at work, on the plane
- I need to have standards so that I can do all that instantly
“It’s a balance. I cannot have one without the other. I cannot focus so much on the attributes of data security, that I ignore the availability,” she said.
The Solution: Data Governance and Data Management
- Understand what data needs to be managed and when
- Have good programs and practices that provide guardrails and boundaries for users
- Provide a governance process that applies a framework for Data Management, allowing open and inclusive decision-making
The first instinct is to simply hire a bunch of scientists, but if the quality of the data isn’t there and the data is not available to them, the likely result is a bunch of high-priced scientists and no insight, she said.
“I could buy solutions, but without a concept of what I’m managing and why, I’m just putting that cart before the horse.”
Governance Connects Horse and Cart
First showing a cartoon of a cart (data solutions) pulling a horse (Data Management), Voreh then reversed the two, putting the horse first with the shaft connecting horse and cart labeled “Governance.” “What we really need to do is think about what our organization’s need is for Data Management,” in light of the business goals for the next five years.
A plan aligned with the business is key, but because each organization is unique, “You cannot borrow a Data Management plan from anybody else.” A Data Management strategy has to be customized, and governance will tie it to your solutions.
Voreh said that when stakeholders are included in the development of Governance, Data Management practices are democratic, and governance is accepted by the culture in the wider organization. “Otherwise, you’re making up rules in a bureaucratic way, and nobody’s going to follow them.”
How to Grow Data Governance
Voreh achieved success with governance at the FBI by focusing on the mission and vision of the organization, and suggested considering three questions during the process of growing a Data Governance program:
- What are we as an organization?
- What is our mission?
- What is the vision that the director wants us to achieve in five years?
From there, a Data Strategy can be used to prioritize Data Management practices, and advance those that most align with the mission. Start with the most critical data in your organization and apply quality standards, then address compliance and oversight. “Putting a policy out that doesn’t have an effective oversight and compliance is like writing a memo nobody will read.” In the process of adopting these policies, she said, involve the business people and other users of the data.
Facilitate with Openness
For what is usually called “issue management,” Voreh prefers the term “openness” to refer to the process of listening to and addressing user concerns. “My job is not to dictate, my job is to facilitate.” It’s important to be able to accept problems as they are defined by users and guide them toward a solution they own.
Last year the FBI combined several teams to jointly address issues around people, process, capability, and data: “It has been a huge success,” she said. The people on the board include data scientists, analysts, data stewards, along with front line users, legal, policy people, agents, and intelligence analysts. Discussing problems from a 360-degree view allows decisions to be made using the capability of the people maintaining data, collecting it, using it, and providing IT services.
Voreh said it’s easy to feel like all the problems have to be solved at once. “You can’t, unless you have a lot of money and a lot of staff. Take the baby steps, work on your foundation, and let it grow to help you.”