Synthetic data generation

The BizDataX Designer can generate synthetic data to cover scenarios where production data is not available. This page describes how to do that.

Table of contents
Prerequisites
Handler configuration
Default values
Mandatory fields handling
Masking rules modification
Result
Insert settings
Synthetic Data Generation Demo Video

Prerequisites

  • A BizDataX Package (Visual Studio project) has been created
  • The appropriate NuGet packages are installed
  • Data from BizDataX Portal has been imported and the Visual Studio project has been built

Handler configuration

To insert new records into a table, replace the default .Handle.WithBulk() part of the Handler expression with .WithInsert().NewItemsCount(100).WithBulk(), where 100 is the number of new records. Make this change in every table masking activity in every Masking engine in the Package.xaml file.
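The rewrite described above is a plain text substitution, so it can be sketched as a small string helper. This is only an illustration in Python, not part of BizDataX: the function name to_insert_handler and the sample expression repository.CustomerHandler.WithBulk() are hypothetical, and the sketch assumes the handler expression ends in .WithBulk().

```python
BULK_SUFFIX = ".WithBulk()"
INSERT_CALL = ".WithInsert()"


def to_insert_handler(expression: str, new_items: int) -> str:
    """Return the handler expression rewritten to insert `new_items` records.

    Hypothetical helper: it only manipulates text and knows nothing about
    the real BizDataX handler types.
    """
    if new_items <= 0:
        raise ValueError("new_items must be a positive number")
    if INSERT_CALL in expression:
        # Already configured for inserting; leave it untouched.
        return expression
    if not expression.endswith(BULK_SUFFIX):
        raise ValueError("expected an expression ending in .WithBulk()")
    head = expression[: -len(BULK_SUFFIX)]
    return f"{head}{INSERT_CALL}.NewItemsCount({new_items}){BULK_SUFFIX}"


if __name__ == "__main__":
    # The sample expression is an assumption made for illustration only.
    print(to_insert_handler("repository.CustomerHandler.WithBulk()", 100))
```

Applying the helper twice leaves the expression unchanged, which mirrors the fact that each table masking activity needs the change exactly once.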

Default values

How synthetic data is generated for a database field that has a default value depends on how the field was imported (Write, Skip, or Read):

  1. If the import case is 'skip', the field is filled with the default value.
  2. If the import case is 'read', the field is filled with the default value.
  3. If the import case is 'write', the field is filled with the value defined in the BDX package, or with NULL if no value is defined. An exception is thrown if the field is defined as NOT NULL and no value is defined in the BDX package.
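The three cases above can be summarised as a small decision function. The following Python sketch only models the rules as stated on this page; the names resolve_generated_value and MissingValueError are hypothetical and do not correspond to any BizDataX API, and None stands in for NULL.

```python
class MissingValueError(Exception):
    """Raised when a NOT NULL field has no value defined in the package."""


def resolve_generated_value(import_case, db_default, package_value=None,
                            not_null=False):
    """Model which value a newly generated record gets for one field.

    import_case   -- 'skip', 'read' or 'write' (how the field was imported)
    db_default    -- the default value defined for the field in the database
    package_value -- the value defined in the package, or None if undefined
    not_null      -- True if the field is defined as NOT NULL
    """
    case = import_case.lower()
    if case in ("skip", "read"):
        # Cases 1 and 2: the database default is used.
        return db_default
    if case == "write":
        # Case 3: the package value wins; an undefined value becomes NULL,
        # which is rejected for NOT NULL fields.
        if package_value is None and not_null:
            raise MissingValueError("NOT NULL field has no value defined")
        return package_value
    raise ValueError(f"unknown import case: {import_case!r}")


if __name__ == "__main__":
    print(resolve_generated_value("skip", "N/A"))
    print(resolve_generated_value("write", "N/A", package_value="Smith"))
    print(resolve_generated_value("write", "N/A"))
```

The last call prints None because the field is nullable and no value is defined; with not_null=True the same call raises MissingValueError instead.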

Mandatory fields handling

Start the data masking by selecting Debug -> Start Without Debugging (or CTRL+F5).

A console window will pop up showing diagnostic output and, within seconds, a dialog will appear (Figure 1) with error messages about mandatory fields. For the Customer records, the BirthDate property is required and no rule has been set to specify its value. Consequently, entity validation fails and no records are created in the database.

Figure 1: Mandatory field error

To resolve the error, add a rule that sets a value for the mandatory field:

  • Add the Masking block activity from the BizDataX Masking Control Flow group in the Toolbox to the Customer masking table masking activity. Rename the Masking block by setting its DisplayName property to Mandatory fields masking.
  • Add the Generate date in range masking activity from the BizDataX Generators group to the Mandatory fields masking block. Set the Property to BirthDate and specify any From and To dates.

Masking rules modification

All masking rules have a setting called SkipDefaultValues in the Input: Filter category in the Properties window. When SkipDefaultValues is true, a field is masked only if its original value isn't empty. When generating synthetic data, all records are initially empty, so this setting must be set to false on all masking rules.

Figure 2: Properties window
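To see why SkipDefaultValues has to be false for synthetic data, the filter can be modelled in a few lines. This Python sketch is a simplified illustration of the behaviour described above, not BizDataX code; the function apply_rule and its parameters are hypothetical, and "empty" is assumed here to mean None or an empty string.

```python
def apply_rule(original, generate, skip_default_values):
    """Model a masking rule with a SkipDefaultValues-style filter.

    original            -- the current field value (None or "" means empty)
    generate            -- a zero-argument callable producing the new value
    skip_default_values -- when True, empty values are left untouched
    """
    is_empty = original is None or original == ""
    if skip_default_values and is_empty:
        # The rule is skipped, so the field stays empty.
        return original
    return generate()


if __name__ == "__main__":
    def make_name():
        return "Generated Name"

    # A newly inserted record starts out empty.
    print(apply_rule(None, make_name, skip_default_values=True))
    print(apply_rule(None, make_name, skip_default_values=False))
```

With the filter enabled, the empty field of a new record is never filled, so the first call prints None; only the second call produces a generated value.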

Result

Start the data masking by selecting Debug -> Start Without Debugging (or CTRL+F5) and check the results in the database. New records, like those shown in Figure 3, should be present.

Figure 3: Generated records

Caution: Synthetic data generation as shown will not mask existing records. It only inserts new records into a table according to the specified masking rules. For a more advanced configuration of the handler, see the next section.

Insert settings

To use different settings while generating synthetic data, see the Fluent Handler API.

Synthetic Data Generation Demo Video