One of the complaints people sometimes have about regulations is that they get in the way of doing business, even when those regulations are meant to put more power in the hands of government to protect the general public. It’s not often (enough) that regulations literally put that power in the hands of consumers themselves.

The European privacy regulations are amazing things. If these laws existed in the USA, you could sign up for a credit card or some other service and not suddenly get come-ons for magazine subscriptions or piles of junk mail. Those laws state, in a nutshell, that anybody gathering data on European nationals must follow certain rules:

  1. The data elements to be gathered must be specified up front
  2. Only those elements can be gathered
  3. The data can only be used for a purpose that must also be stated up front
  4. The data can only be accessed by people who must also be specified up front
  5. Once the purpose has been fulfilled, the data must be destroyed

If you say, “Well, we’re an American company, we don’t care about data regulations governing European companies,” you’re not thinking that extra ten feet. In many cases, non-American companies that want to do business with American organizations still end up caring about Sarbanes-Oxley, HIPAA, and other American regulations, because otherwise their American partners tell them, “Our auditors won’t let us work with you.” It works that way in reverse as well.

These laws have been around a while. There are other aspects to EU laws, such as “the data of certain nationals must reside in servers within the borders of the country of those nationals, and only DBAs from that country can manage it.” I once traveled to Germany to argue in front of German officials that data on German nationals could be managed by DBAs in another country but only actually accessed by Germans. I clearly demonstrated Segregation of Duties, encryption, and masking that would guarantee that this scenario could be securely carried out. A panel of German lawyers and IT guys agreed with me. And then the government said, “That’s great, but it ain’t gonna happen.” Just because.

So those are the mandated best practices. Their evolution is GDPR, the General Data Protection Regulation. GDPR ups the ante on the privacy laws by declaring that citizens own their own data and control how it can be acted upon. The data includes information on customers, employees, partners, contractors, and so on. It can also include genetic/biological data, social media, and pretty much anything personally identifiable.

European companies are taking GDPR, which officially takes effect in May 2018, very seriously. The fines are massive: up to €20 million or 4% of worldwide annual turnover, whichever is higher. If you simply process data on behalf of the company that actually holds it, and there’s a breach, there’s no safe harbor. Everybody is liable.

On the admin side, GDPR calls for Data Protection Officers (DPOs), who are responsible for deploying encryption, pseudonymization (look that one up), Segregation of Duties, and other back-end security coverage. (Interestingly enough, encryption is strongly recommended but not mandated, which to me is daffy.)
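Pseudonymization comes up repeatedly, so here is a minimal sketch of one common way to do it. It uses a keyed hash (HMAC-SHA256), but GDPR doesn't prescribe any particular method. Identifying fields are swapped for stable tokens, so records can still be joined. Only whoever holds the key, which is kept separately from the data, can recompute the token for a known person and re-link the records. The key value and the helper names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key. In practice it lives in a vault, held separately from
# the pseudonymized data and from the people who work with it.
PSEUDONYM_KEY = b"replace-with-a-secret-from-your-vault"


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace an identifier with a stable keyed token (HMAC-SHA256).

    The same input always yields the same token, so tables can still be
    joined on it. Without the key, nobody can recompute the token for a
    known person and re-link the records.
    """
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, fields: set) -> dict:
    """Return a copy of the record with the listed identifying fields tokenized."""
    return {k: pseudonymize(v) if k in fields else v for k, v in record.items()}


customer = {"email": "John.Smith@example.com", "country": "DE", "balance": 120.5}
print(pseudonymize_record(customer, {"email"}))
```

Normalizing before hashing means “John.Smith@…” and “john.smith@…” land on the same token. Whether that is desirable is a policy call.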

On the other side, GDPR gives citizens control over their own data. Let’s start with this: you can’t control all of your data. You aren’t allowed to wipe your financial records and hop the border, for example. So the process starts with data classification, in which an institution figures out which parts of the data it holds are eligible to be governed by end users. Then it must allow those users to:

  1. Review that data
  2. Rectify the data when it’s got boo-boos
  3. Request copies of the data for download (e.g., PDF, Excel)
  4. Govern the data of their children / dependents
  5. Approve (or not) the use or processing of that data (e.g., for inclusion in analysis or polling)
  6. Request deletion of that data

That last bit is a doozy, because data hosts are required to get rid of that data even on backups. Good luck enforcing that one.
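The review and download rights (items 1 and 3 in the list above) are the easiest to picture. Here is a rough sketch, assuming the silos are stand-in Python dictionaries keyed by a subject ID. It gathers everything held about one person, grouped by source system, and hands it back as JSON or as a CSV that opens in Excel. The silo names and fields are made up for illustration, and PDF output is left out.

```python
import csv
import io
import json

# Hypothetical silos keyed by subject ID; in real life these are databases,
# directories, and application stores.
SILOS = {
    "crm": {"u42": {"name": "John Smith", "email": "john@example.com"}},
    "billing": {"u42": {"plan": "gold", "card_last4": "1234"}},
    "marketing": {},
}


def collect_subject_data(subject_id: str) -> dict:
    """Gather everything held about one subject, grouped by source system."""
    return {name: rows[subject_id] for name, rows in SILOS.items() if subject_id in rows}


def export_json(subject_id: str) -> str:
    """Machine-readable copy for a 'download my data' button."""
    return json.dumps(collect_subject_data(subject_id), indent=2, sort_keys=True)


def export_csv(subject_id: str) -> str:
    """Flatten to source/field/value rows, which open cleanly in Excel."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["source", "field", "value"])
    for source, fields in sorted(collect_subject_data(subject_id).items()):
        for field_name, value in sorted(fields.items()):
            writer.writerow([source, field_name, value])
    return buf.getvalue()


print(export_csv("u42"))
```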

Big consulting firms are pushing the value of their GDPR practices. Their “solution” is to spend many, many months measuring the drapes, looking under the carpet, and finding any sensitive data that might be eligible for GDPR governance, then suggesting remediation. This is as cheap as you might imagine. The various software companies offering their own “solutions” are typically selling data-finding tools, as well as scanners that will attempt to find the holes through which sensitive data might slip. In fact, existing data scanning tools are being re-marketed as GDPR tools. “Hey, look, I can sell my screwdriver as a chisel!”

My own organization has designed a more comprehensive approach to solving the problem of GDPR. And no kidding, it’s a problem, because this is not a trivial regulation to comply with. We’ve tried to take into account all the administrative and end-user requirements and satisfy them with the best possible processes and user experiences, and that adds up to a substantial footprint.

Data classification is still where you start. You can’t protect it until you know what it is and where it’s at. I may have GDPR-sensitive data in my directories, databases, app databases, and elsewhere. Gotta find it.
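To show roughly what the data-finding tools do under the hood, here is a deliberately crude sketch of pattern-based classification. It samples a column’s values and tags the column if most of them look like email addresses, phone numbers, or IBANs. The patterns and the 80% threshold are assumptions for illustration. Real classifiers also look at column names, checksums, and reference dictionaries. This phone pattern will also happily flag things like dates.

```python
import re

# Deliberately simple, hypothetical patterns.
PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
    "iban": re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$"),
}


def classify_column(values, threshold=0.8):
    """Return the categories matching at least `threshold` of non-empty values."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return set()
    found = set()
    for category, pattern in PATTERNS.items():
        hits = sum(1 for v in sample if pattern.match(v))
        if hits / len(sample) >= threshold:
            found.add(category)
    return found


def classify_table(table):
    """Map each column of a {column: [values]} table to its detected categories."""
    return {col: classify_column(vals) for col, vals in table.items()}


customers = {
    "contact": ["ann@example.com", "bob@example.org", ""],
    "tel": ["+49 30 1234567", "(555) 010-9999"],
    "notes": ["called twice", "prefers email"],
}
print(classify_table(customers))
```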

Then as a DPO I need to apply the required (and just plain recommended) security layers to that data.

The fun is just starting. I need to aggregate, centralize, or otherwise serve up those data elements in a form that can be reviewed, rectified, and downloaded by those people the data pertains to. Maybe I give them access to the various silos. Ouch. Because maybe next year I have more data, in more silos, meaning more access vectors.

Or … I can pull that data into a lake, such as Hadoop. Or … I create a bi-directional, virtual layer of that data. When it’s acted upon, any changes get bubbled back to the source. Or … I use magic fairy dust. In fact, fairy dust is my preferred method, but my supplier offers no bulk discounts.
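The bi-directional virtual layer is the least obvious of those options, so here is a minimal sketch of the idea, assuming in-memory dictionaries stand in for the source systems. Each exposed field is mapped to the system that owns it. Reads go through to that system, and a user’s correction is written straight back to it rather than to a copy. The class name, the field map, and the sources are all hypothetical.

```python
class VirtualSubjectView:
    """Hypothetical read/write facade over several source systems.

    Nothing is copied into a central store: each exposed field is owned by
    one source, reads go to that source, and corrections are written back
    to it.
    """

    def __init__(self, sources: dict, field_owner: dict):
        self.sources = sources          # {source_name: {subject_id: {field: value}}}
        self.field_owner = field_owner  # {field: source_name}

    def read(self, subject_id: str) -> dict:
        view = {}
        for field_name, source in self.field_owner.items():
            record = self.sources[source].get(subject_id, {})
            if field_name in record:
                view[field_name] = record[field_name]
        return view

    def rectify(self, subject_id: str, field_name: str, new_value) -> None:
        """Push a user's correction back to the source that owns the field."""
        if field_name not in self.field_owner:
            raise KeyError(f"{field_name!r} is not exposed for self-service correction")
        record = self.sources[self.field_owner[field_name]].get(subject_id)
        if record is None:
            raise KeyError(f"no record for {subject_id!r} in the owning source")
        record[field_name] = new_value


crm = {"u42": {"name": "Jon Smith", "email": "john@example.com"}}
hr = {"u42": {"home_city": "Berlin"}}
view = VirtualSubjectView(
    {"crm": crm, "hr": hr},
    {"name": "crm", "email": "crm", "home_city": "hr"},
)
view.rectify("u42", "name", "John Smith")  # the user fixes a typo
print(crm["u42"]["name"])  # John Smith (the source itself was updated)
```

New silos next year mean a new entry in the field map, not a new access vector for users.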

For when the auditors come calling, for knowing when to pull in resources, or simply to have a good grasp of my compliance posture, I need a dashboard of that posture. Which databases are encrypted? Which databases have been pseudonymized (again, look it up)? Which databases have been masked (redacted)? How many users have provided consent or requested deletion? Where do I stand?
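Here is a sketch of the roll-up behind such a dashboard. It assumes each database’s controls are already known as simple yes/no flags and that the consent and deletion counts come from elsewhere. The names and sample values are hypothetical. The point is that each of the questions above reduces to a count plus a list of gaps.

```python
from dataclasses import dataclass


@dataclass
class DatabaseStatus:
    name: str
    encrypted: bool
    pseudonymized: bool
    masked: bool


def posture_summary(databases, consent_count, open_deletion_requests):
    """Roll per-database controls up into counts plus the list of gaps."""
    total = len(databases)
    summary = {"databases": total}
    for control in ("encrypted", "pseudonymized", "masked"):
        gaps = sorted(db.name for db in databases if not getattr(db, control))
        summary[control] = f"{total - len(gaps)}/{total}"
        summary[f"{control}_gaps"] = gaps
    summary["users_consented"] = consent_count
    summary["open_deletion_requests"] = open_deletion_requests
    return summary


# Sample inventory, made up for illustration.
fleet = [
    DatabaseStatus("crm", encrypted=True, pseudonymized=False, masked=True),
    DatabaseStatus("billing", encrypted=True, pseudonymized=True, masked=True),
    DatabaseStatus("legacy_hr", encrypted=False, pseudonymized=False, masked=False),
]
report = posture_summary(fleet, consent_count=1830, open_deletion_requests=12)
for key, value in report.items():
    print(f"{key:>24}: {value}")
```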

Next, I must provide a viable interface for users. This is also not trivial. If those citizen data owners are customers, they likely already have an account, meaning a user id, meaning a way to authenticate to my site. For new users, I can (and should) display my policy for data management, and give them the ability to opt in. If they don’t like my policy, they may have to take their business elsewhere.

But if I hold the data of people who do not automatically have an account, I need to let them register, to create such an account. And then I have to let them claim their data. This might mean identity-proofing, i.e., requiring them to provide enough evidence to attach their name to that data. Your name is John Smith? Well, are you that John Smith, or another John Smith? Which John Smith are you, and which John Smith’s data do you now control?
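Identity-proofing rules are a policy decision, but a minimal sketch helps show their shape. It assumes a hypothetical rule: a claim is accepted only when the name matches and at least two other attributes (date of birth, postcode, customer number) also match. A claim that fits more than one record goes to a human rather than being guessed. The field names and thresholds are illustrative only.

```python
# Hypothetical policy: the name must match, plus at least two of these.
SECONDARY_ATTRIBUTES = ("date_of_birth", "postcode", "customer_number")
MIN_SECONDARY_MATCHES = 2


def normalize(value) -> str:
    """Case- and whitespace-insensitive comparison form."""
    return " ".join(str(value).lower().split())


def proof_claim(claim: dict, records: dict):
    """Decide which stored record, if any, a registering user may claim.

    Returns ("match", record_id), ("ambiguous", [record_ids]) for manual
    review, or ("no_match", None).
    """
    if not claim.get("name"):
        return "no_match", None
    matches = []
    for record_id, record in records.items():
        if normalize(record.get("name", "")) != normalize(claim["name"]):
            continue
        score = sum(
            1
            for attr in SECONDARY_ATTRIBUTES
            if attr in claim and attr in record
            and normalize(claim[attr]) == normalize(record[attr])
        )
        if score >= MIN_SECONDARY_MATCHES:
            matches.append(record_id)
    if len(matches) == 1:
        return "match", matches[0]
    if matches:
        return "ambiguous", sorted(matches)
    return "no_match", None


records = {
    "r1": {"name": "John Smith", "date_of_birth": "1970-01-02", "postcode": "10115"},
    "r2": {"name": "John Smith", "date_of_birth": "1985-06-30", "postcode": "80331"},
}
claim = {"name": "john  smith", "date_of_birth": "1985-06-30", "postcode": "80331"}
print(proof_claim(claim, records))  # ('match', 'r2')
```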

Finally, the interface must allow users to view, correct, and download their data. It must allow them to provide consent for processing. This also gets hairy: “You can use my financial data, but you can’t touch my social media data.”
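Granular consent like that boils down to a default-deny lookup. Here is a minimal sketch, assuming consent is recorded per subject, per data category, and per purpose. The categories and purposes are hypothetical. Anything without a recorded grant is filtered out before processing.

```python
from enum import Enum


class Category(Enum):
    FINANCIAL = "financial"
    SOCIAL_MEDIA = "social_media"


class Purpose(Enum):
    ANALYTICS = "analytics"
    MARKETING = "marketing"


class ConsentRegistry:
    """Consent recorded per subject, per data category, per purpose."""

    def __init__(self):
        self._grants = set()  # {(subject_id, Category, Purpose)}

    def grant(self, subject_id, category, purpose):
        self._grants.add((subject_id, category, purpose))

    def withdraw(self, subject_id, category, purpose):
        self._grants.discard((subject_id, category, purpose))

    def allowed(self, subject_id, category, purpose) -> bool:
        # Default deny: no recorded grant means no processing.
        return (subject_id, category, purpose) in self._grants


def consented_view(registry, subject_id, data_by_category, purpose):
    """Drop every category the subject hasn't consented to for this purpose."""
    return {
        category: data
        for category, data in data_by_category.items()
        if registry.allowed(subject_id, category, purpose)
    }


registry = ConsentRegistry()
registry.grant("u42", Category.FINANCIAL, Purpose.ANALYTICS)
data = {Category.FINANCIAL: {"balance": 120.5}, Category.SOCIAL_MEDIA: {"handle": "@jsmith"}}
print(consented_view(registry, "u42", data, Purpose.ANALYTICS))
```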

Don’t forget about deletion or, as they say, the right to be forgotten. Receive the requests and act on them. The more you can automate, the better. You could set up a workflow for request approvals, but think in advance about the possible volume. Seriously, automation is a good thing.
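Here is a minimal sketch of that automation under a hypothetical retention policy in which financial data must be kept for legal reasons. Deletable categories go straight to an automated job. Anything touching retained data also goes to review, so the requester can be told what was kept and why. The policy set and data shapes are assumptions, and actually purging backups is out of scope here.

```python
from dataclasses import dataclass, field

# Hypothetical policy: categories that must be kept for legal reasons.
LEGALLY_RETAINED = {"financial"}


@dataclass
class DeletionRequest:
    subject_id: str
    categories_held: set


@dataclass
class TriageResult:
    auto_delete: list = field(default_factory=list)   # (subject_id, categories)
    needs_review: list = field(default_factory=list)  # (subject_id, retained categories)


def triage(requests):
    """Split each request into what a job can delete and what a human must see."""
    result = TriageResult()
    for req in requests:
        deletable = req.categories_held - LEGALLY_RETAINED
        retained = req.categories_held & LEGALLY_RETAINED
        if deletable:
            result.auto_delete.append((req.subject_id, sorted(deletable)))
        if retained:
            result.needs_review.append((req.subject_id, sorted(retained)))
    return result


batch = [
    DeletionRequest("u1", {"contact", "marketing"}),
    DeletionRequest("u2", {"contact", "financial"}),
]
outcome = triage(batch)
print(outcome.auto_delete)   # [('u1', ['contact', 'marketing']), ('u2', ['contact'])]
print(outcome.needs_review)  # [('u2', ['financial'])]
```

Only the exceptions reach a person, which is what keeps the volume manageable.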

Clearly, GDPR is not a simple thing. It requires process, policy, and some moving pieces. You say, “Nuts, why bother? I’ll never be able to do all that.” But it’s like any other major project with a lot of pieces. You pick your critical apps, your critical data sources, and you put those online first. Get them compliant. Pick your targets. You’ll learn your lessons, streamline your process, and eventually cover everything that would otherwise get you in hot water with the auditors. And it’s fine to get some consulting help to point you in the right direction, but don’t pay millions to somebody who’s going to say, “Look, there’s your data, and you should secure it this way.” Get some guidance, maybe, but then own it.

The users own the data. But as the host, you own the process, and the liability. And oh, the clock’s ticking. Just a little more than a year to go. Get moving. And bring a sandwich.