Different settings are available depending on the type of analyzer selected.
Multiple analysis settings are specified using
|Analysis setting||Description||Supported databases|
|sample||Defines the sampling algorithm used to select data for the sensitive data discovery process. The following values are available:
- top - The first N records are selected from each table. Excellent performance. Use
- sample - Records are selected using probability-based sampling. Good performance. Use
- random - Random records are selected from each table. Low performance, good quality of data. Use
|DB2 - top, sample, random (default).
Informix - top (default).
SQL Server - top, sample, random (default).
Oracle - top, sample, random (default).
Sybase - top (default).
PostgreSQL - top (default), random.|
|schemas||Only the listed schemas are included in the import. Provide the list of schemas separated by commas (example:
|includeSystemSchemas||Defines whether the data source import includes system schemas and tables. The following values are available:
- true - System schemas and tables are imported. Use
- false - System schemas and tables are not imported. Use
|DB2, Oracle, PostgreSQL|
|countRecords||Defines whether the data source import retrieves the number of records in each table. The following values are available:
- (blank) - The record count is not retrieved. Omit the countRecords declaration. This is the default setting.
- stat - Reads the database information schema statistics to estimate the number of records. Excellent performance, but the saved number of records may differ from the actual number of records. Use
- count - Reads the exact count of records from the database. This is a database-intensive operation. Use
|DB2, Informix, SQL Server, Oracle, Sybase, PostgreSQL|
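The difference between the stat and count values can be illustrated with a minimal sketch. It uses SQLite from the Python standard library purely as a stand-in database; the table name `customers` and the helper functions `exact_count` and `estimated_count` are hypothetical and are not part of BizDataX, which may read statistics differently for each supported database.

```python
import sqlite3


def exact_count(conn, table):
    """'count'-style: ask the database to count every row (always accurate)."""
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def estimated_count(conn, table):
    """'stat'-style: read the row count saved by the last ANALYZE run."""
    row = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE tbl = ? AND idx IS NOT NULL",
        (table,),
    ).fetchone()
    # The first integer in the stat column is the number of rows in the index.
    return int(row[0].split()[0]) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE INDEX idx_customers_id ON customers (id)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"name{i}") for i in range(100)],
)
conn.commit()
conn.execute("ANALYZE")  # statistics now describe the first 100 rows

# Add more rows without refreshing the statistics.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"name{i}") for i in range(100, 150)],
)
conn.commit()

stale_estimate = estimated_count(conn, "customers")  # read from old statistics
true_count = exact_count(conn, "customers")          # full count of the table

conn.execute("ANALYZE")  # refresh the statistics
fresh_estimate = estimated_count(conn, "customers")

print(stale_estimate, true_count, fresh_estimate)
```

In this sketch the statistics-based estimate lags behind the real count until the statistics are refreshed, which is why a statistics-based count is fast but only as accurate as the last statistics update.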
Some additional specifics are explained below:
Before using this setting, it is recommended to update the database statistics. Instructions for updating database statistics can be found at the following links:
In addition to the sample setting, there is also a sample size, which is used in discovery rules. The sample size, which can be set when creating discovery rules, also affects performance and the quality of the data. The default options for both settings are recommended and should work well with most database tables.
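The trade-off between the three sampling algorithms and the sample size can be sketched in a few lines of Python. This is only an illustration over an in-memory list: the function names, the `email` column, and the reading of "sample" as independent per-row selection are assumptions, not the BizDataX implementation.

```python
import random


def select_top(rows, sample_size):
    """'top'-style: take the first N rows. Cheap, but biased toward the table start."""
    return rows[:sample_size]


def select_sample(rows, sample_size, rng):
    """'sample'-style (assumed): keep each row with probability N / total.

    The result size is only approximately N.
    """
    if not rows:
        return []
    probability = min(1.0, sample_size / len(rows))
    return [row for row in rows if rng.random() < probability]


def select_random(rows, sample_size, rng):
    """'random'-style: draw exactly N distinct rows uniformly from the whole table."""
    return rng.sample(rows, min(sample_size, len(rows)))


# Hypothetical table: only the last 100 of 1000 rows contain an email address.
rows = [
    {"id": i, "email": f"user{i}@example.com" if i >= 900 else None}
    for i in range(1000)
]
rng = random.Random(42)  # fixed seed so repeated runs select the same rows

top_rows = select_top(rows, 100)
sample_rows = select_sample(rows, 100, rng)
random_rows = select_random(rows, 100, rng)

for name, selected in [("top", top_rows), ("sample", sample_rows), ("random", random_rows)]:
    hits = sum(1 for row in selected if row["email"] is not None)
    print(f"{name}: {len(selected)} rows selected, {hits} with an email value")
```

Because the first-N selection never reaches the rows at the end of the table, it finds no email values here, while the other two approaches draw from the whole table at a higher cost. A larger sample size improves the chance of seeing rare values with any of the algorithms, but increases the amount of data read.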
BizDataX Documentation © Built by Ekobit. All rights reserved.