Asset Report: Data Quality Scorecard
This report shows data quality (DQ) scores for the supported Field Health monitors that you have configured.
The report lists each monitor that was enabled during the selected time period. For example, if you enabled a Not Null monitor on a column, the report shows that monitor's DQ score for the selected time period.
The output shown in the UI is limited to a preview of 25 rows. If the generated report has more than 25 rows, download it as a CSV to see the entire output.
The output CSV includes the following columns:
report_generation_date: The date the report was run on the platform.
data_owner: The owner currently assigned in the Data Catalog.
qual_id: The fully qualified name of the backend object inside the Catalog.
source, database, collection, dataset, column: Name of the Catalog object and where it originates from.
Tags: Any tags added to the object. Multiple tags are separated by ";".
Dimension: The dimension associated with the monitor type.
Monitor_name & monitor_description: The name and description that the user added to the monitor (currently supported on Data Contracts only).
Monitor_id: Unique identifier for the added monitor.
Monitor_mode: Whether the monitor was run on a schedule or on demand to collect the metrics.
Test_type: See supported test types below.
Filter_mode: How the monitor is configured to scan, either incrementally (timestamp, sql expression) or across the entire table (all records). Grouping monitors by filter_mode is important for calculating the DQ score per monitor accurately (see more info below).
Agg_error_row_count: The aggregated count of rows with erroneous records (for incremental scans).
Agg_total_row_count: The aggregated total count of rows that were scanned in the selected time period (for incremental scans).
Agg_dq_score: The DQ score based on the ratio of agg_error_row_count to agg_total_row_count.
Latest_error_row_count: The count of rows with erroneous records (for all records scans).
Latest_total_row_count: The total count of rows that were scanned (for all records scans).
Latest_dq_score: The DQ score based on the ratio of latest_error_row_count to latest_total_row_count.
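As a rough illustration of working with the downloaded file, the sketch below reads scorecard rows and splits the ";"-separated tags into a list. It assumes the CSV has a header row; the function name `load_scorecard` and the sample values are hypothetical, and headers are lowercased because the exact column casing in the export is not confirmed here.

```python
import csv
import io


def load_scorecard(csv_file):
    """Read scorecard rows, lowercasing headers and splitting the tags column.

    Assumption: the export has a header row and tags are separated by ";".
    """
    rows = []
    for raw in csv.DictReader(csv_file):
        # Normalize header casing and treat missing values as empty strings.
        row = {
            key.strip().lower(): (value or "").strip()
            for key, value in raw.items()
            if key is not None
        }
        # Turn "a;b" into ["a", "b"]; an empty cell becomes an empty list.
        row["tags"] = [t.strip() for t in row.get("tags", "").split(";") if t.strip()]
        rows.append(row)
    return rows


# Made-up sample data for illustration only.
sample = io.StringIO(
    "Monitor_id,Tags,Test_type\n"
    "m-001,pii;finance,Not Null\n"
    "m-002,,Is Unique\n"
)
rows = load_scorecard(sample)
print(rows[0]["tags"])  # ['pii', 'finance']
print(rows[1]["tags"])  # []
```

For a real export, pass an open file instead of the in-memory sample, for example `open(path, newline="")`.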
Where the configured filter_mode = timestamp / sql expression, it is recommended to take agg_dq_score as the DQ score for the monitor, because it is the average DQ score across all incremental scans performed in the time period.
Where filter_mode = all records, and the entire table is scanned, take latest_dq_score instead, because it reflects the metrics from the last scan performed.
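The recommendation above can be sketched as a small helper that picks the score column based on filter_mode. This is a minimal sketch, not the platform's own logic: the function name `effective_dq_score` is hypothetical, the exact filter_mode strings in the export ("timestamp", "sql expression", "all records") are assumed from the descriptions above, and the score values are made up.

```python
# Assumed filter_mode values, taken from the column descriptions.
INCREMENTAL_MODES = {"timestamp", "sql expression"}
FULL_SCAN_MODES = {"all records"}


def effective_dq_score(row):
    """Return the score column recommended for the row's filter_mode."""
    mode = row["filter_mode"].strip().lower()
    if mode in INCREMENTAL_MODES:
        # Incremental scans: use the score aggregated over the time period.
        return row["agg_dq_score"]
    if mode in FULL_SCAN_MODES:
        # Full-table scans: use the score from the most recent scan.
        return row["latest_dq_score"]
    raise ValueError(f"Unrecognized filter_mode: {row['filter_mode']!r}")


# Made-up rows for illustration only; scores are kept as raw CSV strings.
incremental_row = {"filter_mode": "timestamp", "agg_dq_score": "0.98", "latest_dq_score": ""}
full_scan_row = {"filter_mode": "all records", "agg_dq_score": "", "latest_dq_score": "0.91"}

print(effective_dq_score(incremental_row))  # 0.98
print(effective_dq_score(full_scan_row))    # 0.91
```

Scores are returned exactly as they appear in the row, so the helper makes no assumption about how the export formats them.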
By default, users need to select the Data Source, start date and end date to generate the report. Without any filters, the output .csv includes all the monitors that have been enabled in the data source.
Users can also add filters to the report to narrow down the output. These filters include:
Add Schemas or Add Tables: Limit the output of the csv to the selected schemas or tables only.
Quality dimensions: Limit the output to only the test types with the selected Quality Dimensions.
Add an asset owner: Limit the output to only objects where this user is designated as the asset owner.
Filter by tags: Limit the output of the csv to only objects with the selected tags.
Filter by classifications: Limit the output of the csv to only objects that have classifications added to them.
Monitors that are enabled on your account but not listed in the table below are not included in the report for DQ scoring.
Test Type | Dimension | Current Feature Availability |
---|---|---|
Is Unique | Uniqueness | Config & Data Mesh > Data Contract |
Regex | Accuracy | Config & Data Mesh > Data Contract |
Is Email | Validity | Config & Data Mesh > Data Contract |
Is UUID | Validity | Config & Data Mesh > Data Contract |
Not Null | Completeness | Config & Data Mesh > Data Contract |
Value is | Validity | Data Mesh > Data Contract |
Value in | Accuracy | Data Mesh > Data Contract |
Date in the past | Validity | Data Mesh > Data Contract |
Date in the future | Validity | Data Mesh > Data Contract |
Range | Validity | Data Mesh > Data Contract |