Fogger: Open-source tool for GDPR-friendly data masking


As a software developer, you like to focus on software development. Unfortunately, nowadays, you also need to struggle with a bunch of data privacy-related stuff – even in the staging environment. It would be nice to automate some tasks with a free tool for data masking, right? Well, here it is.

Problems with masking sensitive data

When a new application is being developed, we developers need a set of data to work with. Usually, this comes from fixtures – randomly generated data that tries to mimic the real world. But then the application is deployed to production, and it turns out that real-life data isn’t so pure and simple.

Real-life users can create things that no developer has ever dreamt of. We need a copy of this data in our development environment.

But that’s not all! We have a new kid on the block, a hot topic (at least here, in Europe): the General Data Protection Regulation, or GDPR. Now, you cannot simply take the data, put it on your development machine and play with it. You need to make sure that no sensitive information (like names, emails or credit card numbers) is exposed.

There is a plethora of tools that can help you with that – usually tailored to big corporate projects – but the most common solution among startups and SMEs is a custom-made export script. Such a script masks the data in the database, replacing sensitive information with safe, randomly generated substitutions. But developing it is problematic: it takes time and effort (and therefore money), and it’s cumbersome, boring and prone to errors. And whenever the schema changes, the script needs to be updated.

Improve data masking with Fogger

We’ve struggled with writing export scripts at The Software House for quite a while. But, finally, we’ve said: “no more, let’s prepare a generic solution”. A tool that would be able to mask any schema with just a little configuration. And that’s how Fogger was born.

How does it work? What makes it so neat?

Fogger starts by analysing your database schema and prepares a boilerplate configuration file for you.

This file is basically a list of all the tables and columns, each with a masking strategy definition. The strategy definitions are blank at first – you need to fill in the desired masking strategy next to each column containing sensitive data. For example, a faker strategy using the safeEmail method would replace all the emails with random ones (using example.com or similar domains).

What’s more, Fogger will read metadata from column comments. So, for example, if you put fogger::faker{method: "safeEmail"} in a column’s comment during development, the boilerplate will already have the strategy filled in. This way, you can define how your data should be masked right from the beginning of the development process, long before the time comes to actually mask it.

The available masking strategies are starify, hashify and faker. The last one is especially great, as it uses the powerful fzaninotto/Faker library with all its methods.
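To make the idea of a masking strategy more concrete, here is a minimal Python sketch of what starify-like and hashify-like functions could look like. This is an illustration only, not Fogger's code: the function signatures, the default length of ten asterisks and the choice of MD5 are all assumptions, and the faker strategy is left out because it depends on an external library.

```python
import hashlib


def starify(value: str, length: int = 10) -> str:
    """Replace the value with a run of asterisks (the length is an assumption)."""
    return "*" * length


def hashify(value: str) -> str:
    """Replace the value with a hex digest of itself (MD5 picked for illustration)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical registry mapping strategy names to functions.
STRATEGIES = {"starify": starify, "hashify": hashify}


def mask(value, strategy_name):
    """Apply the named strategy, or keep the value when no strategy is set."""
    if strategy_name is None:
        return value
    return STRATEGIES[strategy_name](value)


masked_name = mask("Jane Doe", "starify")
masked_token = mask("s3cr3t-token", "hashify")
untouched = mask("public value", None)
```

A column with a blank strategy passes through unchanged, which mirrors the boilerplate configuration where only sensitive columns get a strategy filled in.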

Fogger masks the data in a consistent manner. For example, when a random value is saved in place of a real email address, it’s kept in the cache for future reference. Therefore, if Fogger comes across the same email again during masking (be it in the same table or not), it’ll replace it with the same substitution. And when a column being masked is part of a foreign key constraint, all the other columns that are part of the constraint will be masked too.
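The consistency described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a plain in-memory dictionary stands in for the cache, a counter stands in for random value generation so the result is easy to follow, and the class and table names are hypothetical.

```python
import itertools


class ConsistentMasker:
    """Hand out one substitute per real value and reuse it on every repeat."""

    def __init__(self):
        self._cache = {}                    # real value -> substitute
        self._counter = itertools.count(1)  # stand-in for random generation

    def mask_email(self, real_email: str) -> str:
        # Generate a substitute only the first time a value is seen.
        if real_email not in self._cache:
            self._cache[real_email] = f"user{next(self._counter)}@example.com"
        return self._cache[real_email]


masker = ConsistentMasker()

users = [{"email": "jane@corp.test"}, {"email": "bob@corp.test"}]
orders = [{"contact": "jane@corp.test"}]

masked_users = [{"email": masker.mask_email(u["email"])} for u in users]
masked_orders = [{"contact": masker.mask_email(o["contact"])} for o in orders]
```

Because both tables go through the same cache, Jane's address in the orders table ends up with the same substitute as in the users table, so the two rows can still be matched after masking.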


Subsetting and excluding tables

In addition to data masking, you can define subsetting strategies for tables. If, for example, a table has millions of records and you’re interested in only a few thousand rows, you can subset the table with one of the available strategies: head, tail or range. Head and tail will give you records from the beginning or the end of the table, respectively. Range will let you filter the table by the values of any column (e.g. a date column, to get only rows from October to December).
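The three subsetting strategies can be illustrated on a list of rows in Python. This is only a sketch of the idea, assuming rows are dictionaries already sorted in table order; the function names, parameters and sample data are hypothetical and do not reflect Fogger's configuration format.

```python
from datetime import date


def head(rows, limit):
    """Keep the first `limit` rows of the table."""
    return rows[:limit]


def tail(rows, limit):
    """Keep the last `limit` rows of the table."""
    return rows[-limit:] if limit > 0 else []


def in_range(rows, column, low, high):
    """Keep rows whose `column` value lies between `low` and `high`, inclusive."""
    return [row for row in rows if low <= row[column] <= high]


rows = [
    {"id": 1, "created": date(2018, 8, 15)},
    {"id": 2, "created": date(2018, 10, 3)},
    {"id": 3, "created": date(2018, 11, 20)},
    {"id": 4, "created": date(2019, 1, 5)},
]

first_two = head(rows, 2)
last_one = tail(rows, 1)
autumn = in_range(rows, "created", date(2018, 10, 1), date(2018, 12, 31))
```

Here `autumn` keeps only the rows created from October to December, which matches the date example in the paragraph above.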

Last but not least, you can exclude whole tables. If your database contains tables with data that you don’t need – for example, log tables – you can exclude them. Fogger will copy the table’s schema, but not the data.

Usually, when subsetting and excluding, you can easily corrupt your database by removing entries that are referred to in other tables through foreign key constraints. But don’t worry – Fogger will refine the database at the end and put the constraints back in place, so the resulting database will be clean and consistent.
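To show why this clean-up step matters, here is a small Python sketch of one plausible way to restore consistency after subsetting: dropping child rows whose foreign key points at a parent row that is no longer there. How Fogger actually refines the database is not described here, so treat the approach, the function name and the sample tables as assumptions.

```python
def drop_orphans(child_rows, fk_column, parent_rows, pk_column="id"):
    """Keep only child rows whose foreign key is NULL or matches a parent row."""
    parent_keys = {row[pk_column] for row in parent_rows}
    return [
        row
        for row in child_rows
        if row[fk_column] is None or row[fk_column] in parent_keys
    ]


# After subsetting, only users 1 and 2 remain in the parent table.
users = [{"id": 1}, {"id": 2}]
orders = [
    {"id": 10, "user_id": 1},
    {"id": 11, "user_id": 3},     # refers to a user that was subsetted away
    {"id": 12, "user_id": None},  # nullable foreign key, nothing to violate
]

consistent_orders = drop_orphans(orders, "user_id", users)
```

Once the orphaned order is gone, a foreign key constraint from orders.user_id to users.id can be put back in place without errors.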

Fogger is not a standalone, do-everything tool. It needs to be run with access to Redis, which provides both the cache mentioned earlier and queueing (on Redis lists). The queueing is necessary to process chunks of a database using workers. It also enables horizontal scaling, as you can run multiple workers in parallel.
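The chunk-and-worker pattern can be sketched with Python's standard library alone. Note the substitution: an in-process `queue.Queue` and threads stand in for Redis lists and separate worker processes, and the masking itself is reduced to replacing each value with asterisks. All names and the chunk size are hypothetical.

```python
import queue
import threading


def chunk_ranges(total_rows, chunk_size):
    """Split the row indexes into (start, end) chunks, end exclusive."""
    return [
        (start, min(start + chunk_size, total_rows))
        for start in range(0, total_rows, chunk_size)
    ]


def worker(tasks, rows, results, lock):
    """Take chunks off the queue until it is empty and mask their rows."""
    while True:
        try:
            start, end = tasks.get_nowait()
        except queue.Empty:
            return  # every chunk was queued up front, so empty means done
        masked = {i: "*" * len(rows[i]) for i in range(start, end)}
        with lock:
            results.update(masked)


def mask_in_parallel(rows, chunk_size=2, workers=3):
    tasks = queue.Queue()
    for chunk in chunk_ranges(len(rows), chunk_size):
        tasks.put(chunk)

    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, rows, results, lock))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Reassemble the masked rows in their original order.
    return [results[i] for i in range(len(rows))]


masked = mask_in_parallel(["ab", "cde", "f", "gh", "ijk"])
```

Since each worker only pulls the next chunk from a shared queue, adding more workers needs no extra coordination, which is what makes scaling out straightforward.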

Fogger is designed to run as a Docker container. You can easily integrate it with your infrastructure or run it separately, providing it with access to a database (or a dump of it).

Now, it’s your turn to start data masking with Fogger!

We know that masking sensitive data in databases is a real problem, experienced by many of you out there. Because of that, we’ve decided to share Fogger with the world as an open-source project. This way, everyone can benefit from it – including you. And we hope that you can help us make it better by contributing to Fogger on GitHub. You can also check out our other open-source projects such as Kakunin or Babelsheet.

Do you feel that Fogger would be a perfect fit for your project, but you need even more features? At The Software House, we have a team of very talented PHP developers who will gladly help you. To receive a free consultation, all you need to do is fill in the contact form. We’re waiting for your message!
