This document will present two examples to illustrate the discovery rule creation process.
To create a discovery rule you must enter the required parameters which define what type of data will be searched for and where.
|Table of contents|
|Discovery rules list|
|Creating a rule "Cities"|
|Creating a rule "First names"|
All rules currently contained in the project can be viewed on the rules list, the landing page of the rules functionality. It contains a table detailing every rule in the project. Information necessary for quickly identifying rules are present such as the rule name, fundamental data name, and what type of discoverer the rule uses, and on which scope of the data. The number of discovery findings of each status (sensitive, not sensitive, and unknown) are also presented to provide a glance at the results of the rule.
Figure 1: Discovery rules list
If there are no rules, the discovery rules list is empty. To create a new rule, click on the plus button in the top right corner.
To view the findings of the rule, select its View findings option on the right. Other options available are:
Use checkboxes to:
To start the discovery process, select one or more rules and select option Run discovery on the right. You can view previous discovery executions by clicking the book button in the top right corner of the page which will take you to the Discovery history page.
To create a rule, you must first enter the name of the rule. Make sure to name them in such a way that they are easily distinguishable from each other, especially if you're creating more detailed rules, multiple rules in the same environment, etc. In this case, we want to create only one rule which will be used to search for city data, so there is no need for a more detailed name.
Fundamental data name is a basic type of data in database columns and if not entered, will be by default be the same as the rule name.
Figure 2: "Cities" rule
Next, we need to choose a BizDataX discoverer. Each discoverer searches for specific types of data, so we need to choose a discoverer appropriate for our rule. Discoverers are selected from a list and are grouped by categories. In this case, we choose the
City discoverer. The description of the discoverer can be seen below the name of the discoverer to help differentiate between similar discoverers.
The last piece of information we need to give to the rule is target data, i.e. where we want to search for specific data. We have a choice of choosing an entire environment, a specific schema of that environment, or a specific table of a specific schema. Here we want to search for data in a particular table so we choose the environment and schema where we know the table is located.
Furthermore, we need to enter the desired sample size. The discovery process only analyzes a part of the data in the database since analyzing every piece of data could drastically increase the duration of the sensitive data discovery without much gain. Here you have an option to increase or decrease the amount of data that is being analyzed. A larger sample size results in a better quality of data and thus more precise sensitive data discovery results, but it also increases the duration of the process. Smaller sample size means we get the results faster but at the cost of data quality. The default sample size of
10000 records has proven to provide good quality of data in a reasonable amount of time.
Note: Performance and quality of data can also be affected by the data source analysis settings. The default options in both of these settings are the recommended settings which should work well with most tables.
Once all the information is set, select the Create rule button in the bottom right of the page to create the rule or edit an existing one. You can cancel the creation/edit of a rule at any time by clicking the Cancel button.
In this example, we will change some of the settings from the ones used during the creation of the Cities rule to demonstrate additional options available to some discoverers. We enter an appropriate name and fundamental data name for our new rule and leave the sample size as it is.
Figure 3: "First names" rule
In this example, we want to search for data not in a specific table, but in all tables of a specific schema. We can do that by selecting the environment that contains the schema. We do not enter a table name in the target data which will result in a discovery process that searches all tables of the selected schema. If we wanted to search for sensitive data in the entire environment, the schema name would also be omitted.
Lastly, we need to choose an appropriate discoverer. When dealing with first names, we can choose between several discoverers, where each has its own unique use. In this case, we want to search for everything that looks like a first name (regardless of whether it also looks like the last name or another type of sensitive data as well), so we choose the
First name discoverer. This discoverer provides us with additional options that appear next to the discoverer input field as value. The description below it says we can enter a country code so sensitive data discovery only searches for the first names of a specific country. In this example, we want to only look for first names from the United States, so we enter
US, the country code for the United States, in the value field. With all information filled, we can create the rule.