Quality

The Quality tab provides a detailed view of data quality across various dimensions, supported by different test types. Here’s what you can expect:

Quality Dimensions and Test Types

Our platform defines the quality dimensions listed below. Four of them (Accuracy, Completeness, Uniqueness, and Validity) are associated by default with specific test types:

Accuracy: Measures how close the data values are to the true values. Tests include “Regex” and “Value in.”
Completeness: Measures the extent to which all required data elements are present. Tests include “Not Null.”
Uniqueness: Checks each data record to ensure it is unique within the dataset. Tests include “Is Unique.”
Validity: Ensures data conforms to acceptable standards, such as ranges and formats. Tests include “Is Email” and “Is UUID.”
Timeliness: Measures how up-to-date the data is.
Consistency: Measures reliability and uniformity of data within datasets.
Granularity: Measures level of detail or the degree of aggregation present.
Others: Any other tests not within the other categories.

You can customize which dimension each supported monitor is associated with. A supported monitor is one that can output a Data Quality score.

The Timeliness dimension will soon be introduced for Freshness monitors, so that you can customize the scoring criteria for Timeliness. Watch for it in our upcoming releases.

The Data Quality (DQ) score is calculated daily using the formula:

$\text{DQ Score} = 1 - \left( \frac{\text{Error Rows}}{\text{Total Rows}} \right)$

Where:

Error Rows: Total number of rows that failed the data quality check
Total Rows: Total number of rows scanned by the monitor

For example, if a “Not Null” test finds 10 null rows out of 1000 total rows, the score would be:

$\text{DQ Score} = 1 - \left( \frac{10}{1000} \right) = 0.99 = 99\%$
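The calculation above can be sketched in a few lines of Python. This is only an illustration of the formula: the function name `dq_score` is hypothetical and not part of the platform, and rejecting a scan with zero rows is an assumption made for this sketch.

```python
def dq_score(error_rows: int, total_rows: int) -> float:
    """Return the DQ score as a fraction between 0 and 1.

    error_rows: rows that failed the data quality check
    total_rows: rows scanned by the monitor
    """
    if total_rows <= 0:
        # Assumption: a scan with no rows has no meaningful score.
        raise ValueError("total_rows must be positive")
    return 1 - error_rows / total_rows


# The "Not Null" example: 10 null rows out of 1000 scanned rows.
score = dq_score(error_rows=10, total_rows=1000)
print(f"{score:.2%}")  # 99.00%
```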

Per Dimension (Health Score)

Each DQ dimension (e.g., Accuracy, Completeness, Validity) groups multiple monitors.

To calculate the score for a dimension:

Sum up error rows across all monitors under the dimension.
Sum up total scanned rows across those monitors.
Apply the same DQ score formula:

$\text{Dimension Score} = 1 - \left( \frac{\sum \text{Error Rows}}{\sum \text{Total Rows}} \right)$

This gives a weighted average, ensuring larger scans influence the score more than small ones.
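To see why pooling rows weights larger scans more heavily, here is a minimal Python sketch. The function name `dimension_score` and the monitor figures are hypothetical. Returning `None` when no rows were scanned is an assumption of this sketch, not documented platform behavior.

```python
from typing import Iterable, Optional, Tuple


def dimension_score(monitors: Iterable[Tuple[int, int]]) -> Optional[float]:
    """Pool (error_rows, total_rows) pairs from all monitors in one dimension."""
    monitors = list(monitors)
    error_rows = sum(errors for errors, _ in monitors)
    total_rows = sum(total for _, total in monitors)
    if total_rows == 0:
        return None  # assumption: no scanned rows means no score
    return 1 - error_rows / total_rows


# Hypothetical dimension with two monitors: a large scan and a small one.
completeness = [(10, 1000), (5, 100)]

pooled = dimension_score(completeness)
simple_mean = sum(1 - e / t for e, t in completeness) / len(completeness)

print(round(pooled, 4))       # 0.9864
print(round(simple_mean, 4))  # 0.97
```

The pooled score (about 98.64%) sits closer to the large monitor's 99% than the unweighted mean of the two monitor scores (97%) does.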

Overall Data Health Score

The final DQ Health Score (shown on top of the dashboard) is:

$\text{Overall Data Health Score} = \frac{\sum \text{Dimension Scores}}{\text{No. of Dimensions}}$

Dimensions without any scanned rows are excluded from the average.
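The averaging and exclusion rule can be illustrated with the following Python sketch. The function name `overall_health_score` and the sample scores are hypothetical, and using `None` to mark a dimension without scanned rows is an assumption made for the example.

```python
from typing import Dict, Optional


def overall_health_score(dimension_scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Average the dimension scores, skipping dimensions with no scanned rows."""
    scored = [s for s in dimension_scores.values() if s is not None]
    if not scored:
        return None  # assumption: nothing scanned, so nothing to average
    return sum(scored) / len(scored)


# Hypothetical scores; Validity has no scanned rows and is excluded.
scores = {
    "Accuracy": 0.99,
    "Completeness": 0.97,
    "Uniqueness": 1.0,
    "Validity": None,
}

print(round(overall_health_score(scores), 4))  # 0.9867
```

Here the average is taken over three dimensions rather than four, because Validity contributes no score.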

Only certain monitor types can produce a health score. To find out which monitors generate health scores, read this article.

Overall Data Health Score

The Data Health Score represents the average score for all dimensions over the selected time period. The scores are color-coded for easy interpretation:

• Green (> 98%): Excellent health

• Yellow (95% - 98%): At risk

• Red (< 95%): Poor health
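As a rough illustration, the color bands can be expressed as a small Python function. The name `health_color` is hypothetical. Treating scores of exactly 95% and exactly 98% as Yellow follows the ranges as written above, but the platform's exact boundary handling is an assumption here.

```python
def health_color(score: float) -> str:
    """Map a health score (fraction between 0 and 1) to its color band."""
    if score > 0.98:
        return "Green"   # excellent health
    if score >= 0.95:
        return "Yellow"  # at risk (assumed to include exactly 95% and 98%)
    return "Red"         # poor health


for value in (0.99, 0.98, 0.95, 0.90):
    print(f"{value:.0%} -> {health_color(value)}")
```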

Custom Date Range and Filters

The custom date range supports up to six months, allowing for in-depth analysis over a quarter or longer. The Quality dashboard also includes various filters to help you narrow down your data view, such as:

• Domains

• Data sources

• Data Owners

• Monitor mode (Scheduled, On-demand)

• Row creation preferences (filter for 'All Records' scan only)

• Tags

• Classifications

Source/Domain Summary

The Source/Domain Summary in the Quality tab provides results based on selected domains and shows scores for key quality metrics. This helps you gain a deeper understanding of your data’s health across different data sources and domains, making it easier to pinpoint areas for improvement.


Last updated 8 days ago