Machine Learning, Data Governance, and Data Ethics

Data ethics — the collection and use of data in an ethical manner — can be combined with Data Governance and machine learning. Data ethics places a strong emphasis on the behavior of people or organizations as they gather or distribute data.

Laws become necessary when some people choose to ignore ethical behavior.

At its most basic level, ethical behavior involves choosing the best course of action for an individual to support another person or persons, the larger culture, or the world, preferably with a minimum of damage to the individual. The most common ethical concerns stress not lying to people and not stealing.

Because use of the internet has advanced so quickly, cultures have not yet developed ethics and norms for interacting and doing business on the internet. While many internet businesses take the big picture perspective, using honesty to promote long-term business relationships, others choose the short-term approach.

The unethical use of data, particularly the abuse of people’s personal data, has led to the creation of laws such as Europe’s GDPR, Brazil’s LGPD, and California’s CCPA. Data ethics (originally called “big data ethics”) is based on six principles:

Ownership: A person owns their own personal data
Consent: Informed and explicitly expressed consent is needed for an individual, business, or legal entity to use someone’s personal data
Transaction Transparency: When personal data is used, the owner should have transparent and easy access to the algorithm used in generating aggregate data
Privacy: If data is collected, reasonable efforts should be made to preserve the individual’s privacy
Currency: People should be made aware of the profits gained by the use of their personal data
Openness: Aggregate data should be free and easily available

Unethical actors take advantage of the internet’s “wild, wild West” environment, stealing people’s personal information or selling it on the dark web.

Another example of unethical data practices involves banks using algorithms that have built-in prejudices against BIPOC citizens. The prejudice is based on home location; the algorithm is prejudiced against people living in specific neighborhoods. If you live in a certain neighborhood, algorithmic bias will make it harder for you to get a loan.
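One rough way to look for the neighborhood-based bias described above is to compare approval rates across neighborhoods. The sketch below is illustrative only: the sample records, the neighborhood labels, and the 0.8 review threshold (borrowed from the “four-fifths” rule of thumb used in US employment contexts) are assumptions, not any bank’s actual process.

```python
from collections import defaultdict

def approval_rates(decisions):
    """Return the share of approved applications per neighborhood.

    `decisions` is an iterable of (neighborhood, approved) pairs.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for neighborhood, was_approved in decisions:
        totals[neighborhood] += 1
        if was_approved:
            approved[neighborhood] += 1
    return {n: approved[n] / totals[n] for n in totals}

def disparate_impact_ratio(rates):
    """Ratio of the lowest approval rate to the highest one."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved anywhere, so no group is favored
    return min(rates.values()) / highest

# Hypothetical loan decisions: (neighborhood, approved?)
sample = (
    [("north", True)] * 8 + [("north", False)] * 2
    + [("south", True)] * 4 + [("south", False)] * 6
)

rates = approval_rates(sample)
ratio = disparate_impact_ratio(rates)
flagged = ratio < 0.8  # assumed threshold for triggering a human review
print(rates, ratio, flagged)
```

A low ratio does not prove discrimination on its own, but it is a signal that the decisions deserve a closer look by people.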

The Toronto Declaration

The Toronto Declaration promotes data ethics for machine learning practitioners and various governing bodies. It is primarily concerned with non-discrimination and individual privacy.

The Toronto Declaration suggests all ML (machine learning) practitioners should be educated on the potential risks to human rights.

Machine learning engineers should start their work/projects with an awareness of human rights. The declaration includes as human rights:

“The right to privacy and data protection, the right to freedom of expression and association, to participation in cultural life, equality before the law, and access to effective remedy.”

Data Governance

When ethics is applied to data, the appropriate automated behaviors should be stored in, and applied by, the Data Governance program.

A Data Governance program carries out the instructions it’s been given, focusing on organizing data with the standards, procedures, and policies that have been developed and downloaded into the computer’s memory, and accessed by the program. While Data Governance has traditionally focused on data optimization and data-driven insights, in 2019 it began to include policies protecting people’s personal information, and has advanced to include a variety of legal policies and regulations from around the world.

A computer system’s Data Governance program is ideal for housing a data ethics subprogram.
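As a rough illustration of an ethics rule stored in, and applied by, a governance layer, the sketch below checks recorded consent before personal data is released for a given purpose. The `ConsentRecord` class, the `may_use` and `filter_by_consent` functions, and the purpose names are all hypothetical; a real Data Governance platform would have its own policy model.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRecord:
    """Purposes a person has explicitly agreed to (hypothetical model)."""
    person_id: str
    allowed_purposes: set = field(default_factory=set)

def may_use(record, purpose):
    """Allow use only when consent for this exact purpose was recorded."""
    return record is not None and purpose in record.allowed_purposes

def filter_by_consent(rows, consents, purpose):
    """Keep only rows whose owner consented to `purpose`.

    `rows` are dicts with a "person_id" key; `consents` maps
    person_id -> ConsentRecord. Rows with no consent on file are dropped.
    """
    return [
        row for row in rows
        if may_use(consents.get(row["person_id"]), purpose)
    ]

consents = {
    "p1": ConsentRecord("p1", {"billing", "marketing"}),
    "p2": ConsentRecord("p2", {"billing"}),
}
rows = [
    {"person_id": "p1", "email": "a@example.com"},
    {"person_id": "p2", "email": "b@example.com"},
    {"person_id": "p3", "email": "c@example.com"},  # no consent on file
]

marketing_rows = filter_by_consent(rows, consents, "marketing")
print([row["person_id"] for row in marketing_rows])
```

The design choice here is "deny by default": a missing record is treated the same as a refusal, which matches the consent principle of informed and explicitly expressed permission.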

Data Ethics and Machine Learning

Machine learning is based on algorithms, and because algorithms are currently written by humans, human bias may creep in, intentionally or unintentionally. These biased algorithms can do serious damage to people, ranging from denying home loans to blocking inmates from parole because of the color of their skin.

Machine learning systems can be trained, through feedback, to include biases when certain features and characteristics are given preference. Machine learning can stop someone from getting a job, based on facial expressions during a job interview.

Machine learning algorithms can be designed to learn from a user’s feedback. If managers consistently hire white male candidates for certain jobs, the algorithm learns and adjusts its criteria to offer job listings only to white males (not based on being white, but on characteristics common to white candidates).

The algorithm learns that providing job listings to people with certain characteristics is more often considered “correct.”
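The feedback loop described above can be sketched with a toy scoring model. This is a deliberately simple additive update, not a specific machine learning method, and the feature names (`has_skill`, `club_a`) and the hiring history are invented for illustration.

```python
def train_on_feedback(weights, feedback, learning_rate=0.1):
    """Nudge feature weights toward candidates the manager hired.

    `feedback` is a list of (features, hired) pairs, where `features`
    maps a feature name to 0 or 1. Hired candidates push the weights of
    their features up; rejected candidates push them down.
    """
    updated = dict(weights)
    for features, hired in feedback:
        direction = 1 if hired else -1
        for name, value in features.items():
            change = direction * learning_rate * value
            updated[name] = updated.get(name, 0.0) + change
    return updated

def score(weights, features):
    """Higher score means the listing is more likely to be shown."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Hypothetical history: managers hired only candidates with "club_a",
# a characteristic that has nothing to do with job skill.
history = [
    ({"has_skill": 1, "club_a": 1}, True),
    ({"has_skill": 1, "club_a": 0}, False),
    ({"has_skill": 1, "club_a": 1}, True),
    ({"has_skill": 1, "club_a": 0}, False),
]

weights = train_on_feedback({}, history)
insider = score(weights, {"has_skill": 1, "club_a": 1})
outsider = score(weights, {"has_skill": 1, "club_a": 0})
print(weights, insider > outsider)
```

Both candidates are equally skilled, yet the model ends up ranking the one with the irrelevant characteristic higher, because that is the pattern the feedback rewarded.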

On the other hand, machine learning that has been trained with supervised learning (as opposed to algorithms that learn from on-the-job feedback) can be used to detect and stop biases. Using human evaluators (people representing those affected by the algorithms) should be part of the algorithm-building process, and can help in creating unbiased algorithms.

Data Ethics Frameworks

A data ethics framework is a set of ethical principles that have been created to appropriately guide the use of data.

The data ethics framework helps to guide individuals in using data responsibly while they work for the government and the wider public sector. It can also be adapted for use by private businesses. It helps public servants, private sector managers, and business owners to understand ethical considerations.

The United Kingdom’s Department for Digital, Culture, Media & Sport has developed a data ethics framework that works with Data Governance, and provides a model for other governments and businesses.

The UK’s model is essentially a guide for human behavior and can be adjusted to fit a variety of public and private organizations.

Data ethics frameworks are typically based on three principles: transparency, accountability, and fairness. These principles, supported by five specific actions, provide a process for minimizing biases. The five actions are:

Define the goal or benefit
Use diverse teams to minimize bias (evaluators may be part-time contractors)
Comply with the laws
Assure the data is accurate
Evaluate the impact from a big-picture perspective

The Benefits of Data Ethics

As a general rule, people are understandably cautious when doing business with a website for the first time. Without regulated codes of ethics, businesses can profit from their customers in ways that are unethical, but not illegal.

Businesses that are ethical will develop good reputations and a strong repeat customer base. A good foundation of ethical behavior produces three important business benefits:

Trust: The use of a data ethics model helps to gain and maintain the trust of customers. It helps to build loyalty and enhances brand value.
Fair Practices: Unintended bias can surprise you. (It can sneak in from anywhere, negatively impacting business decisions and the organization’s reputation.) Businesses using data ethics principles can consistently demonstrate fairness in their decision-making.
Data Privacy Compliance: Complying with existing data privacy regulations (GDPR, CCPA) is an extension of data ethics. These should not be difficult to comply with when a data ethics framework is already in place.

Data Ethics and Synergy

Currently, the word “synergy” can be defined as a combination of things that produces a result “greater than the sum of its parts.”

In the 1930s, an anthropologist named Ruth Benedict discovered a system of behaviors in some cultures that supported multiple positive effects from a single deed, or action. She used the word synergy to describe the behaviors in these cultures.

Good examples of “modern” synergy are app development teams that have worked together for a while and use Agile and DevOps philosophies. In this case, the team works synergistically to complete the project. These philosophies support workplace synergy with good communication, multiple skill sets, and positive relationships.

Within the app development team, a single action will have multiple positive consequences.

The individuals in these teams are comfortable communicating with each other and sharing ideas. While there might be one cranky member of the team (no one is perfect), the value of that person’s skills outweighs the impact of their crankiness. The synergistic behavior of these teams enhances productivity and speeds up the production of quality apps. The Agile principles used by these app teams are:

Data projects should be built around motivated individuals. Give them the support and environment they need, and allow them to complete the job without interference.
The most efficient way of communicating information to team members is with face-to-face conversations. Zoom resolves this issue for remote workers.
The team (or individuals) should have a focus on excellence and quality.
There should also be a focus on the work “not yet done,” and on “necessary steps.”
The DevOps philosophy adds using automation whenever possible as a way to minimize errors (automation also takes care of tedious tasks).
To take the process a little further, in terms of data ethics (and cultural synergy), two questions can be asked: Who will benefit from this project and who will be damaged by this project?

Applying synergy to data ethics, prior to the data gathering process, involves asking these questions:

Will anyone (besides the competition) be damaged by the collection of this information? (Often the answer will be “no one,” but the question is worth asking).
Who can benefit from this collection of data? (Probably your company. But can anyone else, besides your competitors, benefit from sharing the gathered data?)
Partners
Customers
Government agencies
Nonprofits

After the fact, there is no reason this “sharing” cannot be advertised for promotional reasons.

Can the data be used for multiple projects? (Consider the value of data that can be used to resolve two to six issues).
In assigning the data gathering project, the question should be asked, “Who is capable, and has a genuine interest in this project?” Which is not the same as, “Who has the most experience?” Experience counts, but so does enthusiasm. If an outside contractor is being hired for the project, these same questions apply.
Are they focusing on the work not yet done, and necessary steps?

