Obfuscate your production database

To catch errors as early as possible, you should test with data that resembles what you'll encounter in the production environment as closely as possible. If you test with only 15 records while the production table will eventually hold thousands, you'll probably spend more time on rework and bug fixing than necessary.

The best place to find relevant test data is the production database of similar systems the company is already running. The downside is that organizations tend to be protective of this sort of data.

For testing purposes, the actual meaning of the data is not important. We're more interested in aspects like volume, distribution, special characters and so on.

To obfuscate a data set, I've created a small tool that removes the “meaning” of the data while preserving its other aspects. It randomly replaces characters in fields, but instead of picking just any random character, it replaces a digit with another digit, a vowel with another vowel and a consonant with another consonant.
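A minimal C# sketch of that substitution, assuming five character sets: digits, lower-case vowels, upper-case vowels, lower-case consonants and upper-case consonants. Counting 'y' as a consonant and keeping case and punctuation are assumptions that happen to fit the example output below. The class and method names are hypothetical; the original tool's set definitions are not shown in this post.

```csharp
using System;
using System.Linq;

// Hypothetical helper: swaps each character for a random member of its own class.
public static class ClassPreservingSubstitution
{
    const string LowerVowels = "aeiou";

    // Assumption: 'y' is treated as a consonant.
    static readonly string LowerConsonants =
        new string("abcdefghijklmnopqrstuvwxyz".Where(c => !LowerVowels.Contains(c)).ToArray());

    // One string per character class. A character is only ever replaced by a
    // character from its own string, so digit/vowel/consonant and case survive.
    public static readonly string[] CharSets =
    {
        "0123456789",
        LowerVowels,
        LowerVowels.ToUpperInvariant(),
        LowerConsonants,
        LowerConsonants.ToUpperInvariant(),
    };

    public static char Substitute(char c, Random rng)
    {
        foreach (string charSet in CharSets)
        {
            if (charSet.IndexOf(c) >= 0)
                return charSet[rng.Next(charSet.Length)];
        }
        // Spaces, hyphens, apostrophes etc. belong to no set and are kept as-is.
        return c;
    }

    // Applies the substitution to every character of a field value.
    public static string ObfuscateValue(string value, Random rng) =>
        new string(value.Select(c => Substitute(c, rng)).ToArray());
}
```

Because the sets are disjoint, each character has exactly one class to draw from; the random pick may occasionally return the original character.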

When applied to the authors table in the pubs sample database, this list:

409-56-7008 Bennet Abraham
648-92-1872 Blotchet-Halls Reginald
238-95-7766 Carson Cheryl
722-51-5454 DeFrance Michel
712-45-1867 del Castillo Innes
427-17-2319 Dull Ann
213-46-8915 Green Marjorie
527-72-3246 Greene Morningstar
472-27-2349 Gringlesby Burt
846-92-7186 Hunter Sheryl
756-30-7391 Karsen Livia
486-29-1786 Locksley Charlene
724-80-9391 MacFeather Stearns
893-72-1158 McBadden Heather
267-41-2394 O'Leary Michael
807-91-6654 Panteley Sylvia
998-72-3567 Ringer Albert
899-46-2035 Ringer Anne
341-22-1782 Smith Meander
274-80-9391 Straight Dean
724-08-9931 Stringer Dirk
172-32-1176 White Johnson
672-71-3249 Yokomoto Akiko

... is changed into:

409-56-7008 Binnet Abrehin
648-92-1872 Ggitched-Kuysq Xugifast
238-95-7766 Zusbib Rgezym
722-51-5454 DeHzujre Mogzem
712-45-1867 sel Cesforvi Amsej
427-17-2319 Nuvq Adm
213-46-8915 Vreax Cuppevie
527-72-3246 Gpeome Meryegfknax
472-27-2349 Mbofhmospy Godt
846-92-7186 Podnef Sserhl
756-30-7391 Zawsan Covaa
486-29-1786 Diykxzeh Zhilloza
724-80-9391 MevViithed Wkoermx
893-72-1158 MdBibqun Houmhar
267-41-2394 E'Loirf Wowdeul
807-91-6654 Pinmaney Xylvia
998-72-3567 Rowrer Albirx
899-46-2035 Tekgun Ahnu
341-22-1782 Mmith Deokric
274-80-9391 Stweungt Muan
724-08-9931 Ptsiggis Ximk
172-32-1176 Whibi Joxnsow
672-71-3249 Wobucoba Aqola
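Comparing the two lists, every value keeps its length, its punctuation, its letter case and its pattern of digits, vowels and consonants (the IDs even come through unchanged here). A hypothetical helper like the one below, not part of the original tool, can check that by reducing each value to a "shape" string: 9 for a digit, v/V for a vowel, c/C for a consonant, and any other character kept literally.

```csharp
using System;
using System.Linq;

// Hypothetical check: reduces a value to its "shape" so original and
// obfuscated data can be compared on the aspects that should be preserved.
public static class ValueShape
{
    public static string Of(string value) =>
        new string(value.Select(ShapeOf).ToArray());

    static char ShapeOf(char c)
    {
        if (c >= '0' && c <= '9') return '9';

        char lower = char.ToLowerInvariant(c);
        if (lower >= 'a' && lower <= 'z')
        {
            // Assumption: only a, e, i, o, u are vowels; 'y' counts as a consonant.
            char symbol = "aeiou".IndexOf(lower) >= 0 ? 'v' : 'c';
            return char.IsUpper(c) ? char.ToUpperInvariant(symbol) : symbol;
        }

        // Spaces, hyphens, apostrophes and other characters are kept literally.
        return c;
    }
}
```

For example, `ValueShape.Of("Bennet Abraham")` and `ValueShape.Of("Binnet Abrehin")` both return `"Cvccvc Vccvcvc"`.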

The text is still legible, but it is very hard to relate it back to the source. You can control how much the result differs from the original with an “opacity” setting: by replacing only a percentage of the characters, the original values are blended with the random data. The higher the opacity, the more original characters are kept.

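  // Requires: using System.Collections;
  // randomGenerator (a System.Random) and CharSets (a collection of ArrayLists,
  // one per character class: digits, vowels, consonants) are fields of the class.
  private char GetRandomCharacter(char CharacterIn, int Opacity)
  {
   // Draw 0..99, so the character is replaced with probability (100 - Opacity) %:
   // Opacity 100 keeps every original character, Opacity 0 redraws them all.
   int SkipCharacterDraw = randomGenerator.Next(100);
   if (SkipCharacterDraw >= Opacity)
   {
    foreach (ArrayList alCharSet in CharSets)
    {
     // Replace only within the character's own class.
     if (alCharSet.IndexOf(CharacterIn) >= 0)
     {
      int positie = randomGenerator.Next(alCharSet.Count);
      return (char)alCharSet[positie];
     }
    }
   }

   // Characters outside every set (spaces, hyphens, apostrophes) are kept.
   return CharacterIn;
  }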
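GetRandomCharacter relies on class fields that aren't shown. A self-contained sketch of the surrounding per-field loop might look like this; the FieldObfuscator class, the seeded Random and the literal character sets are assumptions, not the original project's code. Opacity is read as the percentage of characters that keep their original value.

```csharp
using System;
using System.Linq;

// Hypothetical, self-contained field obfuscator with an opacity setting.
public class FieldObfuscator
{
    // Assumed character classes; 'y' is counted as a consonant.
    static readonly string[] CharSets =
    {
        "0123456789", "aeiou", "AEIOU",
        "bcdfghjklmnpqrstvwxyz", "BCDFGHJKLMNPQRSTVWXYZ",
    };

    readonly Random randomGenerator;

    // A fixed seed makes a run reproducible.
    public FieldObfuscator(int seed) => randomGenerator = new Random(seed);

    // opacity: percentage (0..100) of characters that keep their original value.
    public string Obfuscate(string value, int opacity)
    {
        if (opacity < 0 || opacity > 100)
            throw new ArgumentOutOfRangeException(nameof(opacity));

        char[] result = value.ToCharArray();
        for (int i = 0; i < result.Length; i++)
        {
            // Next(100) yields 0..99, so a character is kept with probability opacity / 100.
            if (randomGenerator.Next(100) < opacity) continue;

            string charSet = CharSets.FirstOrDefault(s => s.IndexOf(result[i]) >= 0);
            if (charSet != null)
                result[i] = charSet[randomGenerator.Next(charSet.Length)];
        }
        return new string(result);
    }
}
```

With an opacity of 100 the input comes back unchanged; with 0 every digit and letter is redrawn from its own class. Because a redraw can land on the original character, the share of characters that actually change is slightly below (100 - opacity) percent.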

 Drop me an email if you want to receive the entire project.
