Asset Report: Data Quality Scorecard

This report shows data quality (DQ) scores for the supported Field Health monitors that have been configured.

Example output of generated report

The report contains one entry for each monitor that was enabled during the selected time period. For example, if you enabled a Not Null monitor on a column, the report shows the DQ score for that monitor over the selected period.

The output shown in the UI is limited to a preview of 25 rows. If the generated report has more than 25 rows, download it as a csv to see the entire output.

The output csv will include the following columns:

report_generation_date: The date the report was run on the platform.
data_owner: The owner currently assigned in the Data Catalog.
qual_id: The fully qualified name of the backend object inside the Catalog.
source, database, collection, dataset, column: The names that identify the Catalog object and where it originates.
Tags: Any tags added to the object. Multiple tags are separated by ";".
Dimension: The dimension associated with the monitor type.

Dimensions include: Validity, Completeness, Accuracy, Uniqueness, Timeliness, Consistency, Granularity, Others.

Monitor_name & monitor_description: The name and description that the user added to the monitor (currently supported on Data Contracts only).
Monitor_id: Unique identifier for the added monitor.
Monitor_mode: Shows whether the monitor ran on a schedule or on demand to collect the metrics.
Test_type: See supported test types below.
Filter_mode: How the monitor is configured to scan: incrementally (timestamp, sql expression) or over the entire table (all records). This matters for calculating the DQ score per monitor accurately, because monitors should be grouped by filter_mode (see more info below).
Agg_error_row_count: The aggregated count of rows with erroneous records (for incremental scans).
Agg_total_row_count: The aggregated total count of rows scanned during the selected time period (for incremental scans).
Agg_dq_score: The DQ score based on the ratio of agg_error_row_count to agg_total_row_count.
Latest_error_row_count: The count of rows with erroneous records (for all records scans).
Latest_total_row_count: The total count of rows scanned (for all records scans).
Latest_dq_score: The DQ score based on the ratio of latest_error_row_count to latest_total_row_count.
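
As a rough illustration of working with the downloaded csv, the sketch below reads a report with Python's standard csv module and splits the Tags column on ";". The sample rows, the qual_id and Monitor_id values, the lower-cased header names and the helper name load_scorecard are all assumptions made for illustration; check them against the header row of your own export.

```python
import csv
import io

# Hypothetical sample of an exported report; a real export has more columns
# and its identifiers will look different.
SAMPLE_CSV = """qual_id,Tags,Dimension,Monitor_id
src.db.sales.orders.order_id,pii;finance,Uniqueness,m-001
src.db.sales.orders.amount,,Completeness,m-002
"""

def load_scorecard(handle):
    """Read report rows, lower-casing header names and splitting tags on ';'."""
    rows = []
    for raw in csv.DictReader(handle):
        # Normalise header case so "Tags" and "tags" are treated the same.
        row = {
            key.strip().lower(): (value or "").strip()
            for key, value in raw.items()
            if key is not None
        }
        # An empty Tags cell means the object has no tags.
        row["tags"] = [t.strip() for t in row.get("tags", "").split(";") if t.strip()]
        rows.append(row)
    return rows

rows = load_scorecard(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["qual_id"], row["tags"])
```

To read a real export, pass an open file handle instead of the in-memory sample, for example open(path, newline="").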

Where the configured filter_mode is timestamp or sql expression, it is recommended to take agg_dq_score as the DQ score for the monitor, because it is the average DQ score across all incremental scans in the selected time period.

Where filter_mode is all records and the entire table is scanned, take latest_dq_score instead, because it reflects the metrics from the last scan performed.
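
The recommendation above can be sketched as a small helper that picks the score column from each row's filter_mode. The exact strings stored in filter_mode ("timestamp", "sql expression", "all records"), the lower-case column names, the sample values and the function name effective_dq_score are assumptions based on the descriptions in this article, not confirmed export values.

```python
# Assumed spellings of the filter_mode values; adjust to match your export.
INCREMENTAL_MODES = {"timestamp", "sql expression"}
FULL_SCAN_MODES = {"all records"}

def effective_dq_score(row):
    """Return the score recommended for the row's filter_mode, or None if blank."""
    mode = row["filter_mode"].strip().lower()
    if mode in INCREMENTAL_MODES:
        column = "agg_dq_score"      # aggregated over incremental scans
    elif mode in FULL_SCAN_MODES:
        column = "latest_dq_score"   # taken from the last full-table scan
    else:
        raise ValueError(f"unrecognised filter_mode: {row['filter_mode']!r}")
    value = row.get(column)
    return float(value) if value not in ("", None) else None

# Hypothetical rows, shaped like csv.DictReader output with lower-cased headers.
sample_rows = [
    {"monitor_id": "m-001", "filter_mode": "timestamp",
     "agg_dq_score": "98.5", "latest_dq_score": ""},
    {"monitor_id": "m-002", "filter_mode": "all records",
     "agg_dq_score": "", "latest_dq_score": "91.0"},
]
for row in sample_rows:
    print(row["monitor_id"], effective_dq_score(row))
```

Raising on an unknown filter_mode, rather than silently falling back to one column, makes it obvious when the export uses a spelling the sketch did not anticipate.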

How to generate the report

To generate the report, users must at minimum select the Data Source, start date and end date. If no filters are added, the output .csv includes all the monitors that have been enabled in the data source.

Users can also add filters to the report to narrow down the output. These filters include:

Add Schemas or Add Tables: Limit the output of the csv to the selected schemas or tables only.
Quality dimensions: Limit the output to only the test types with the selected Quality Dimensions.
Add an asset owner: Limit the output to only objects where this user is designated as the asset owner.
Filter by tags: Limit the output of the csv to only objects with the selected tags.
Filter by classifications: Limit the output of the csv to only objects that have classifications added to them.

Supported Monitors with default Quality Dimension

Test Type

Dimension

Current Feature Availability

Is Unique

Uniqueness