Distribution

Distribution and Distribution branch masking activities are used in tandem to provide more control over the distribution of different masking algorithms across database records.

They let you use percentages or ratios to determine how many records are masked in each way.

Table of contents
Usage
Example
Properties

Usage

Distribution and Distribution branch activities are used when you want precise control over how different masking techniques are spread across your database. For example, you could apply a different data masking technique to each half of your data records, or use different masking types in a certain ratio. The Distribution masking activity acts as a container for a variable number of distribution branches, which define how the different masking techniques are distributed across your database.

To use distribution branches, you have to enter a weight for each branch. Each weight is compared against the weights of the other distribution branches to determine what share of the records that branch masks. Weights can represent percentages (if they sum to a hundred), ratios, or any other way of comparing masking techniques in terms of records masked. If only one distribution branch is used, it determines how the entire data source is masked, since there is no other branch to compare it with.
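The weight comparison described above amounts to dividing each weight by the sum of all weights. The following is a minimal sketch of that arithmetic only, assuming integer weights; the function name `branch_shares` is hypothetical and is not part of the product.

```python
from fractions import Fraction


def branch_shares(weights):
    """Return each branch's share of the records as an exact fraction.

    Assumption: weights are positive integers, and a branch's share is
    its weight divided by the sum of all branch weights.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [Fraction(w, total) for w in weights]


# Two branches with equal weights each cover half of the records.
print(branch_shares([50, 50]))  # [Fraction(1, 2), Fraction(1, 2)]

# A single branch covers everything, whatever its weight.
print(branch_shares([7]))  # [Fraction(1, 1)]
```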

If we want to define two masking techniques and apply each of them to half of our data, we create two distribution branches and enter the same weight for both. Whatever value we enter, each weight is half of the total sum, so each branch affects half of our data. If we want a fifth of our data to be masked differently from the rest, we can use any equivalent pair of weights (20:80, 1:4, 2:8, etc.), whichever we find most logical and readable. For more complex weight calculations, custom code or package parameters can be used to keep the package readable.
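To see why 20:80, 1:4 and 2:8 are interchangeable, it helps to reduce each pair to the share of the first branch. This is an illustrative sketch of the arithmetic, not product code; `first_branch_share` is a hypothetical helper.

```python
from fractions import Fraction


def first_branch_share(first, rest):
    """Share of records handled by the first of two branches."""
    return Fraction(first, first + rest)


# The three weight pairs mentioned in the text.
pairs = [(20, 80), (1, 4), (2, 8)]

# Collecting the shares in a set shows that all pairs reduce to one value.
shares = {first_branch_share(a, b) for a, b in pairs}
print(shares)  # {Fraction(1, 5)}
```

Because only the proportion matters, the choice between these pairs is purely a matter of readability.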

Example

Let's assume we have a database in which most customers come from the US, but about 15% come from Canada and about 10% from Mexico. We want our masked data to reflect that, and we can achieve it with the Distribution masking activity. We start by selecting the Distribution masking activity from the Toolbox and placing it in the Customer masking activity. Since we have three different ways to mask our data, we place three distribution branches inside the Distribution masking activity.

Since we described the problem using percentages, we can use them in the solution as well. We enter 10 as the weight of the first distribution branch (Mexico), 15 as the weight of the second distribution branch (Canada), and 75, the rest of our data, as the weight of the last distribution branch (US). The weights sum to a hundred, so each weight reads directly as a percentage. As a last step, we place a Pick first name from list activity beneath each distribution branch, select the FirstName property that we want to mask, and use picklists to choose first names from the country that corresponds to each branch's weight.
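The effect of the three branches can be approximated outside the product with a weighted random choice. The sketch below is a simulation under stated assumptions: the documentation does not say whether branches are assigned randomly or deterministically, so the random assignment here is an assumption, and the picklist contents and the name `mask_first_names` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical picklists; the real ones are configured in the product.
PICKLISTS = {
    "Mexico": ["Sofia", "Mateo", "Valentina"],
    "Canada": ["Liam", "Olivia", "Noah"],
    "US": ["James", "Emma", "Ava"],
}

# Branch weights from the example: 10, 15 and 75.
WEIGHTS = {"Mexico": 10, "Canada": 15, "US": 75}


def mask_first_names(count, seed=0):
    """Assign each record to a branch by weight, then pick a name.

    Assumption: branch assignment is an independent weighted random
    draw per record; the product may work differently.
    """
    rng = random.Random(seed)
    branches = list(WEIGHTS)
    chosen = rng.choices(
        branches, weights=[WEIGHTS[b] for b in branches], k=count
    )
    return [(branch, rng.choice(PICKLISTS[branch])) for branch in chosen]


masked = mask_first_names(10_000)
counts = Counter(branch for branch, _ in masked)

# Observed shares should be close to 0.10, 0.15 and 0.75.
for branch in WEIGHTS:
    print(branch, counts[branch] / len(masked))
```

With a random draw the observed shares only approximate the weights, and the approximation improves as the number of records grows.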

Figure 1: Distributing first name masking

Properties

| Activity | Property group | Property name | Description | Example |
| --- | --- | --- | --- | --- |
| Distribution | Input properties | ItemsCount | The number of items to mask. | 1000000 |
| Distribution | Misc | DisplayName | Display name of the activity in the workflow. | Distribution |
| Distribution | Misc | Result | Contains the masking definition object. It's a part of the masking infrastructure and should be ignored. | - |
| Distribution branch | Input properties | Weight | Weight of the distribution branch. | 75 |
| Distribution branch | Misc | DisplayName | Display name of the activity in the workflow. | Distribution branch |
| Distribution branch | Misc | Result | Contains the masking definition object. It's a part of the masking infrastructure and should be ignored. | - |