Database management and the right to be forgotten

Digital transformation has given businesses greater access to consumer data than ever before. For example, companies that encourage loyalty program sign-ups and newsletter subscriptions, and that manage their databases well, can build a 360-degree view of how customers interact with their brands. They can use insights from this data to guide product improvements, personalize service, and create better customer experiences overall.

Collecting consumer data looks like a win-win, but there is controversy surrounding it: Who owns the personally identifiable information (PII)?

The General Data Protection Regulation (GDPR) has made it clear that in the EU, PII belongs to the consumer. The GDPR allows consumers to control how their data is used and, if they wish, to have it erased, giving them “the right to be forgotten”. Great for consumers, but a potential nightmare for DBAs.

How to respect the right to be forgotten

At Postgres Vision 2020, Dr. Michael Stonebraker, an MIT professor and the original creator of Postgres, shared his insights on how to handle consumers’ right to be forgotten.

“In my opinion, the easiest way to deal with the right to be forgotten is to think of it as a database design issue,” says Stonebraker. PII can be found anywhere in a business, and finding and deleting it can be a challenge. “And the minute I add something, I change where you have to look to remove things. So if you just force a clean entity relationship (ER) schema, you can make this problem much easier.”

He suggests using an ER design tool to build a diagram of your data and automatically map it to a set of third normal form (3NF) tables in the database. For example, a company’s system might have an employee entity that holds employee names, salaries, and ages; a department entity that holds department names and floors; and a “works_in” relationship that records that an employee can work in one or more departments. Each of these is mapped to a corresponding table.
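As a rough illustration of that mapping, the sketch below builds the three tables in an in-memory SQLite database using Python’s standard sqlite3 module. The column names, types, and sample rows are assumptions made for the example, and the helper names build_demo_db and departments_of are hypothetical; the talk does not prescribe a particular layout or tool.

```python
import sqlite3

# Hypothetical 3NF mapping of the employee / department / works_in example.
# Column names and types are illustrative assumptions.
SCHEMA = """
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,    -- surrogate key, no business meaning
    name   TEXT NOT NULL,
    salary REAL NOT NULL,
    age    INTEGER NOT NULL
);
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,   -- surrogate key
    name    TEXT NOT NULL UNIQUE,
    floor   INTEGER NOT NULL
);
CREATE TABLE works_in (
    emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
    dept_id INTEGER NOT NULL REFERENCES department(dept_id),
    PRIMARY KEY (emp_id, dept_id)  -- one row per employee/department pair
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO employee VALUES (1, 'Ann Example', 50000.0, 41)")
    conn.executemany(
        "INSERT INTO department VALUES (?, ?, ?)",
        [(10, "Sales", 2), (20, "Support", 3)],
    )
    conn.executemany("INSERT INTO works_in VALUES (?, ?)", [(1, 10), (1, 20)])
    conn.commit()
    return conn


def departments_of(conn: sqlite3.Connection, emp_id: int) -> list:
    """Return the names of the departments an employee works in."""
    rows = conn.execute(
        "SELECT d.name FROM works_in AS w "
        "JOIN department AS d ON d.dept_id = w.dept_id "
        "WHERE w.emp_id = ? ORDER BY d.name",
        (emp_id,),
    )
    return [row[0] for row in rows]
```

Because the relationship lives only in works_in, every link between a person and a department can be found by looking in one place.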

For the right to be forgotten, he recommends enforcing surrogate keys for entities, disallowing user access to those keys, and disallowing materialized views or copies. You can then delete an entity by deleting its row from the entity table. This gets rid of all the personal data: it is now inaccessible because the mapping to surrogate keys is gone.
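A minimal sketch of that deletion, again using Python’s sqlite3 module, might look like the following. The simplified two-table schema, the ON DELETE CASCADE choice, and the names make_erasure_demo and forget_employee are assumptions made for illustration. The talk only requires that the PII live in the entity table and that everything else refer to it by surrogate key.

```python
import sqlite3


def make_erasure_demo() -> sqlite3.Connection:
    """Build a simplified schema: PII lives only in the employee table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for the cascade to fire in SQLite
    conn.executescript("""
        CREATE TABLE employee (
            emp_id INTEGER PRIMARY KEY,  -- surrogate key, never shown to users
            name   TEXT NOT NULL,
            salary REAL,
            age    INTEGER
        );
        CREATE TABLE works_in (
            emp_id  INTEGER NOT NULL
                    REFERENCES employee(emp_id) ON DELETE CASCADE,
            dept_id INTEGER NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        [(1, "Ann Example", 50000.0, 41), (2, "Bob Sample", 42000.0, 35)],
    )
    conn.executemany(
        "INSERT INTO works_in VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)]
    )
    conn.commit()
    return conn


def forget_employee(conn: sqlite3.Connection, emp_id: int) -> int:
    """Delete one entity row; return how many employee rows were removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM employee WHERE emp_id = ?", (emp_id,))
    return cur.rowcount
```

Calling forget_employee(conn, 1) removes the one row that holds the name, salary, and age, and the cascade removes the rows that referred to it. This covers only the live tables. It does nothing about logs, backups, or copies made outside the schema.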

“As long as you have a clean schema, deleting PII data is simple,” he says.

Database management obstacles on the road to compliance

Stonebraker points out, however, that “real-world database administrators often build poor schemas for performance reasons.” They may want to speed up queries, but that can make the right to be forgotten difficult to honor. He comments that clean schemas are always a good idea, and that complying with the right-to-be-forgotten provisions of the GDPR or other consumer data protection regulations could be a way to impose them on your users.

He adds that when applications, and subsequently schemas, need to change, you can lose a clean schema in an effort to make application maintenance easier. Again, that makes the right to be forgotten much more difficult to grant.

Even if you can remove the PII from your databases, Stonebraker notes that, as any Postgres expert knows, the data is still in the log. He says that to truly remove all PII, you would need to update the log. “It’s a really dangerous thing to do. I don’t recommend it at all,” he says. “You have to trust the sysadmins not to leak the log.”

Also, he points out, you probably have offsite copies for disaster recovery, so you need to consider how you’re going to handle the removal of PII from onsite and offsite disaster recovery data.

The most dangerous problem related to the right to be forgotten

He says these are not even the most dangerous issues for database management when it comes to the right to be forgotten. Stonebraker explains that many business units store data and share it when needed to facilitate their work. For example, a customer data unit may share PII with a supplier data unit, which writes the information to its own database. There is now a fresh copy of the PII.

“This is a typical tactic in the legacy, siloed world of data, where copies of data reside all over the enterprise,” he says.

Prohibiting apps from reading PII is probably not practical, so you need to log every time they read or write it. This requires an application sandbox. Then, if you need to delete someone’s personal information, you have to look for it everywhere.
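To make the logging idea concrete, here is a small Python sketch of a store that records every read and write of PII together with the application that made it. The class AuditedPiiStore, its methods, and the application names are hypothetical. It also assumes that every application is forced to go through this one access point, which is the part a real sandbox would have to enforce and which this sketch does not attempt.

```python
class AuditedPiiStore:
    """Hypothetical in-memory PII store that logs every read and write."""

    def __init__(self):
        self._records = {}    # subject_id -> dict of PII fields
        self.access_log = []  # list of (app, action, subject_id) tuples

    def write(self, app, subject_id, pii):
        """Store PII for a subject and record which app wrote it."""
        self.access_log.append((app, "write", subject_id))
        self._records[subject_id] = dict(pii)

    def read(self, app, subject_id):
        """Return a copy of a subject's PII (or None) and record the read."""
        self.access_log.append((app, "read", subject_id))
        record = self._records.get(subject_id)
        return dict(record) if record is not None else None

    def apps_that_touched(self, subject_id):
        """List the apps that have read or written this subject's PII."""
        return sorted(
            {app for app, _action, sid in self.access_log if sid == subject_id}
        )
```

When an erasure request arrives, apps_that_touched gives a starting list of the systems that may now hold their own copy of that person’s data.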

“It’s more complicated than you think,” he says. You need to consider transformations, such as John Smith being stored as J. Smith. A user can write data to a lookup table and then copy the data from it. Moreover, if data can be displayed on the screen, nothing prevents a user from writing it down and sending it to someone by e-mail. “At some level you have to trust users, otherwise you can’t let them access the data,” he says.
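The John Smith and J. Smith case can be illustrated with a deliberately crude matching rule in Python: treat two names as possibly the same person when the last name and the first initial agree. The functions name_key and may_be_same_person are hypothetical and are not something proposed in the talk. Real record matching needs far more than this.

```python
import re


def name_key(full_name):
    """Reduce a name to (last name, first initial), lowercased."""
    # Keep runs of letters only, so "J." becomes "j".
    parts = re.findall(r"[^\W\d_]+", full_name.lower())
    if not parts:
        return None
    return (parts[-1], parts[0][0])


def may_be_same_person(name_a, name_b):
    """True when both names share a last name and a first initial."""
    key_a, key_b = name_key(name_a), name_key(name_b)
    return key_a is not None and key_a == key_b
```

This rule does pair “John Smith” with “J. Smith”, but it also pairs “John Smith” with “Jane Smith”, and it misses reordered forms such as “Smith, John”. Even a simple transformation makes erasure a search problem with both false positives and false negatives.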

It’s not just a GDPR issue

Since the EU began enforcing the GDPR in 2018, regulated businesses have been looking for the most effective and efficient ways to comply. Lawmakers elsewhere are also passing regulations, such as the California Consumer Privacy Act (CCPA), that grant individuals the right to have their PII erased, which makes this a more widespread database management challenge.

If you or your customers are not currently affected by these laws, you likely will be in the future. The debate over who owns PII keeps coming down on the side of the consumer. If your application uses consumer data, it’s time to consider how you will adapt your database management to comply.

Maria H. Underwood