Click to learn more about author John Murray.
Now that privacy-enhancing technologies (PETs) have become a subject of dinner table conversations, our research team continues to field questions on these complex topics which can be difficult to explain. As part of this series, I will attempt to explain another PET, K-anonymization, in a first-grade context:
Since it’s Friday, your teacher has decided to let your class play Heads Up Seven Up. For those unaware, everyone starts by putting their heads down, closing their eyes, and putting their thumbs up. The teacher picks who gets to be it, who then chooses six other people by walking around and touching their thumbs. Then it and the six people they picked walk to the front of the class, and everyone else puts their heads up. The class then tries to guess who was it from the seven people at the front.
Last time you played, Kyle was winning way more than usual. Eventually, you noticed that Kyle would always sneak a peek at your shoes and pants when you walked by. Cheater! Fortunately, all of your other friends noticed as well, and together you have devised a plan. You noticed that most kids wear similar shoes and pants, but a few kids have rare combinations of shoes and pants. If you tell everyone wearing a rare combination to play another game, everyone left would have a common combination. Some kids don’t like this plan since they all want to play. So the class (minus Kyle) agrees to only wear clothing that at least six other kids are wearing. Still, some kids just can’t change their clothes enough to stop being unique, so they unfortunately have to sit this game out. This makes Kyle’s peeking useless, since each shoes and pants combination is no longer tied to just one person. You call this plan Kyle-anonymization.
For everyone over the age of five, K-anonymization is a useful technique for hiding in a crowd: it makes it difficult to use indirect identifiers to single out an individual’s record. The K in K-anonymization is the minimum number of times any combination of values must appear in the dataset to avoid being redacted. Say you have a dataset of results from the Heads Up Seven Up games. You track data on the person who was it, such as their shoe color, their pants color, and whether the class guessed them correctly. If you K-anonymize shoe and pants color with a K of seven and the combination of blue shoes and black pants appears fewer than seven times, those colors are redacted. If that combination appears seven or more times, it appears unchanged. In both cases, however, whether the class guessed correctly always shows up.
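For readers who prefer code, the redaction rule described above can be sketched in a few lines of Python. This is only a minimal illustration, not a production implementation: the function name `k_anonymize`, the field names `shoes`, `pants`, and `guessed`, the `*` placeholder, and the sample data are all made up for this example, and it handles only the simple redaction case described here.

```python
from collections import Counter

REDACTED = "*"  # hypothetical placeholder for a redacted value


def k_anonymize(records, indirect_identifiers, k):
    """Redact indirect identifiers whose combination appears fewer than k times."""
    # Count how often each combination of indirect identifiers occurs.
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    anonymized = []
    for record in records:
        combo = tuple(record[field] for field in indirect_identifiers)
        row = dict(record)  # copy so the original record is left untouched
        if counts[combo] < k:
            for field in indirect_identifiers:
                row[field] = REDACTED
        anonymized.append(row)
    return anonymized


# Made-up game results: seven rounds with a common outfit, one with a rare outfit.
games = [
    {"shoes": "blue", "pants": "black", "guessed": i % 2 == 0} for i in range(7)
] + [{"shoes": "red", "pants": "green", "guessed": True}]

anonymized = k_anonymize(games, ["shoes", "pants"], k=7)
for row in anonymized:
    print(row)
```

With a K of seven, the seven blue shoes and black pants rows come through unchanged, the lone red shoes and green pants row has both colors redacted, and the `guessed` column is kept in every row.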
Our research team is continuously tracking the latest mathematical techniques for privacy-enhancing technologies. You can reach out to us with questions whether you’re a data scientist, compliance professional, passer-by, or anyone in Kyle’s class.