06 December 2018

Fogger: Open-source GDPR-friendly data masking tool

Tomasz Surowiec

Senior PHP Developer

As a software developer, you like to focus on software development. Unfortunately, nowadays you also have to deal with a lot of data privacy work, even in the staging environment. It would be nice to automate some of it with a free data masking tool, right? There are plenty of dynamic and static data masking techniques, and you can set a variety of masking rules in production environments to protect real data from unauthorized access. Whatever masking tools or test data approaches you choose across your environments, keeping data secure should be one of your top priorities. We did our best to build a tool that keeps your databases masked and well. Here is the story of our search for the best data masking tool.

Problems with masking sensitive data

When a new application is being developed, we developers need a set of production-like data to work with. Usually, this is done through fixtures – randomly generated data that tries to mimic the real world. But then the application is deployed to the production environment and it turns out that real-life data isn’t so pure and simple. It would be nice to work with the real deal, but sensitive data cannot be exposed.

Real-life users can create things that no developer has ever dreamt of. We need a copy of this data in our development environment.

But that’s not all! We have a new kid on the block, a hot topic (at least here, in Europe): the General Data Protection Regulation, or GDPR. Now, you cannot simply take the data, put it on your development machine and play with it. You need to make sure that no sensitive information (like names, emails or credit card numbers) is compromised. Security is now more important than ever.

There is a plethora of tools that can help you with that, usually tailored to big corporate projects, but the most common solution among startups and SMEs is a custom-made export script. Such a script masks the data in the database, replacing sensitive information with safe, randomly generated substitutions. But developing it is problematic: it takes time and effort (and therefore money), and it’s cumbersome, boring and prone to errors. And whenever the schema changes, the script needs to be updated. There must be a better way to go about data masking and data privacy in general: a more flexible and adjustable data tool.

Improve data masking with Fogger

Identifying sensitive data is one thing; masking real data is another. At The Software House, we struggled with writing export scripts for data security and data masking for quite a while. Finally, we said: “No more, let’s prepare a generic solution”, a data tool that would be able to mask any schema with just a little configuration. And that’s how Fogger was born.

How does a tool like this work? What makes it so neat?

Fogger starts by analysing your database schema and prepares a configuration file for you that looks like this:

This is basically a list of all the tables and columns with masking strategy definitions. As you can see, the strategies are blank for now – you need to fill in the desired masking strategy next to each column containing sensitive data. For example, this line would replace all the emails with random ones (using example.com or similar domains):

What’s more, Fogger reads metadata from column comments. So, for example, if you put fogger::faker{method: "safeEmail"} in a column’s comment during development, the generated boilerplate will already have the strategy filled in. This way, you can decide how your data should be masked from the very beginning of the development process, long before the time comes to actually mask it.

The available masking strategies are starify, hashify, and faker. The last one is especially great, as it uses the powerful fzaninotto/Faker library with all its methods.
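To make the three strategy names more concrete, here is a minimal Python sketch of what each one does conceptually. It is not Fogger's code (Fogger is written in PHP). The function names, the default length of 10, the use of MD5 and the tiny stand-in email generator are all assumptions made for illustration.

```python
import hashlib
import random


def starify(value: str, length: int = 10) -> str:
    """Replace the value with a fixed-length run of asterisks (length is an assumed default)."""
    return "*" * length


def hashify(value: str) -> str:
    """Replace the value with a hex digest of itself (MD5 is used here only for brevity)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def fake_safe_email(rng: random.Random) -> str:
    """Hypothetical stand-in for a Faker-style generator: a random address on a reserved domain."""
    user = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))
    domain = rng.choice(["example.com", "example.org", "example.net"])
    return f"{user}@{domain}"


row = {"name": "Jane Doe", "token": "s3cr3t", "email": "jane@corp.test"}
rng = random.Random(42)

masked = {
    "name": starify(row["name"]),
    "token": hashify(row["token"]),
    "email": fake_safe_email(rng),
}
print(masked)
```

The stars carry no information at all, the hash is deterministic for a given input, and the generated value looks realistic while pointing to a domain that can never belong to a real user.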

Fogger masks data in a consistent manner. For example, when a random value is saved in place of a real email address, it’s kept in a cache for future reference. If Fogger later finds the same email again during masking (in the same table or a different one), it will replace it with the same substitution. And when a masked column is part of a foreign key constraint, all the other columns that are part of that constraint will be masked too.
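The consistency idea described above can be sketched in a few lines of Python. This is an illustration only. The class name ConsistentMasker and the numbered substitute addresses are hypothetical, and a plain dictionary stands in for the cache.

```python
import itertools


class ConsistentMasker:
    """Hands out one substitute per distinct original value and remembers it."""

    def __init__(self):
        self._cache = {}                   # original value -> substitute
        self._counter = itertools.count(1)

    def mask_email(self, original: str) -> str:
        # Generate a substitute only the first time a value is seen.
        if original not in self._cache:
            self._cache[original] = f"user{next(self._counter)}@example.com"
        return self._cache[original]


masker = ConsistentMasker()

users = [{"id": 1, "email": "alice@corp.test"}, {"id": 2, "email": "bob@corp.test"}]
newsletter = [{"email": "alice@corp.test"}]

masked_users = [{**u, "email": masker.mask_email(u["email"])} for u in users]
masked_newsletter = [{"email": masker.mask_email(n["email"])} for n in newsletter]

print(masked_users)
print(masked_newsletter)
```

Because both tables go through the same cache, the address that appears in users and in newsletter ends up with the same substitute, so joins on that value still work after masking.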

Subsetting and excluding tables

In addition to data masking, you can define subsetting strategies for tables. If, for example, a table has millions of records and you’re interested in only a few thousand rows, you can subset the table with one of the available strategies: head, tail, and range. Head and tail give you records from the beginning or the end of the table, respectively. Range lets you filter the table by the values of any column (e.g. a date column, to get only rows from October to December).
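The three subsetting strategies can be illustrated on an in-memory list of rows. This is a sketch under assumptions, not Fogger's implementation: the function and parameter names are made up, and the rows are assumed to be already sorted by primary key.

```python
from datetime import date


def head(rows, length):
    """Keep the first `length` rows."""
    return rows[:length]


def tail(rows, length):
    """Keep the last `length` rows."""
    return rows[-length:] if length > 0 else []


def in_range(rows, column, minimum, maximum):
    """Keep rows whose `column` value lies between minimum and maximum (inclusive)."""
    return [row for row in rows if minimum <= row[column] <= maximum]


orders = [
    {"id": 1, "created": date(2018, 8, 15)},
    {"id": 2, "created": date(2018, 10, 3)},
    {"id": 3, "created": date(2018, 11, 20)},
    {"id": 4, "created": date(2019, 1, 5)},
]

first_two = head(orders, 2)
last_one = tail(orders, 1)
autumn = in_range(orders, "created", date(2018, 10, 1), date(2018, 12, 31))

print([o["id"] for o in first_two], [o["id"] for o in last_one], [o["id"] for o in autumn])
```

On a real database the same selections would be expressed as queries rather than list slices, but the effect on the resulting row set is the same.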

Last but not least, you can exclude whole tables. If your database contains tables with data that you don’t need – for example, log tables – you can exclude them. Fogger will copy the table’s schema, but not the data.

Usually, when subsetting and excluding, you can easily corrupt your database by removing entries that are referred to in other tables through foreign key constraints. But don’t worry – Fogger will refine the database at the end and put the constraints back in place, so the resulting database will be clean and consistent.
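To show why a refining step matters, here is a small Python sketch of one way to restore referential integrity after subsetting. The article does not say how Fogger does this, so the approach here is an assumption: rows pointing at a missing parent are dropped, or have the reference set to None when the column is nullable. The function and parameter names are hypothetical.

```python
def refine(children, parents, fk, parent_key="id", nullable=False):
    """Return child rows that no longer reference parents missing from the subset."""
    valid = {parent[parent_key] for parent in parents}
    refined = []
    for row in children:
        if row[fk] is None or row[fk] in valid:
            refined.append(row)                 # reference is fine, keep as-is
        elif nullable:
            refined.append({**row, fk: None})   # keep the row, clear the dangling reference
        # otherwise the orphaned row is dropped
    return refined


# After subsetting, only users 1 and 2 were copied; comment 11 points at the missing user 3.
users = [{"id": 1}, {"id": 2}]
comments = [
    {"id": 10, "user_id": 1},
    {"id": 11, "user_id": 3},
    {"id": 12, "user_id": None},
]

strict = refine(comments, users, "user_id")
lenient = refine(comments, users, "user_id", nullable=True)

print(strict)
print(lenient)
```

Either way, once no child row points at a missing parent, the foreign key constraints can be re-created without errors.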

Fogger is not a standalone, do-everything tool. It needs access to Redis, which provides both the cache mentioned earlier and queueing (on Redis lists). The queueing is necessary to process chunks of a database using workers. It also enables horizontal scaling, as you can run multiple workers in parallel.
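The chunk-and-worker pattern described above can be sketched with Python's standard library. This is only an illustration: an in-process queue.Queue stands in for a Redis list, threads stand in for separate worker processes, and the chunk size and masking logic are made up.

```python
import queue
import threading

CHUNK_SIZE = 3
row_ids = list(range(1, 11))

# Producer: split the table into chunks and push each chunk onto the queue.
tasks = queue.Queue()
for start in range(0, len(row_ids), CHUNK_SIZE):
    tasks.put(row_ids[start:start + CHUNK_SIZE])

processed = []
lock = threading.Lock()


def worker():
    """Pull chunks until the queue is empty, masking each row in the chunk."""
    while True:
        try:
            chunk = tasks.get_nowait()
        except queue.Empty:
            return
        masked = [{"id": row_id, "email": f"user{row_id}@example.com"} for row_id in chunk]
        with lock:
            processed.extend(masked)


threads = [threading.Thread(target=worker) for _ in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted(row["id"] for row in processed))
```

Each chunk is taken by exactly one worker, so adding more workers shortens the total run time without any row being processed twice.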

Fogger is designed to run as a Docker container. You can easily integrate it with your infrastructure or run it separately, providing it with access to a database (or a dump of it).

Data masking tools can help. Now, it’s your turn to start data masking with Fogger!

There are plenty of data masking tools available. Some offer a free trial period, while others require buying a license (at various price points). We know that masking sensitive data in databases is a real problem, experienced by many of you out there. Because of that, we’ve decided to share Fogger with the world as an open-source project. This way, everyone can benefit from it – including you. And we hope that you can help us make it better by contributing to Fogger on GitHub. You can also check out our other open-source projects such as Kakunin or Babelsheet.

Do you feel that Fogger would be a perfect data security solution for masking sensitive data in your project, but you need even more data masking features? At The Software House, we have a team of very talented PHP and Node.js developers who will gladly help you with data security issues. To receive a free consultation, all you need to do is fill in the contact form. We’re waiting for your message!

Tomasz Surowiec

Senior PHP Developer

The true Senior PHP Developer, programming since 1879 or so. The older, the geekier. Besides his beloved Symfony framework, he's also interested in crypto.