Quality

The Quality tab provides a detailed view of data quality across various dimensions, supported by different test types. Here’s what you can expect:

Quality Dimensions and Test Types

Our platform currently supports four quality dimensions, each associated with specific test types:

  • Accuracy: Measures how close the data values are to the true values. Tests include “Regex” and “Value in.”

  • Completeness: Measures the extent to which all required data elements are present. Tests include “Not Null.”

  • Uniqueness: Checks each data record to ensure it is unique within the dataset. Tests include “Is Unique.”

  • Validity: Ensures data conforms to acceptable standards, such as ranges and formats. Tests include “Is Email” and “Is UUID.”
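
To make the test types above more concrete, here is a minimal Python sketch of what each check might look like for a single value or column. The function names, the email pattern, and the strict UUID comparison are all assumptions made for illustration; they are not the platform's actual implementation.

```python
import re
import uuid
from collections import Counter

# Hypothetical helpers that mirror the documented test types.
# The simple email pattern below is an assumption, not the platform's rule.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def matches_regex(value, pattern):
    """Accuracy ("Regex"): the whole value matches the pattern."""
    return isinstance(value, str) and re.fullmatch(pattern, value) is not None


def value_in(value, allowed):
    """Accuracy ("Value in"): the value belongs to an allowed set."""
    return value in allowed


def not_null(value):
    """Completeness ("Not Null"): the value is present."""
    return value is not None


def is_unique(values):
    """Uniqueness ("Is Unique"): per-row flags, True if the value occurs once."""
    counts = Counter(values)
    return [counts[v] == 1 for v in values]


def is_email(value):
    """Validity ("Is Email"): the value looks like an email address."""
    return isinstance(value, str) and EMAIL_PATTERN.match(value) is not None


def is_uuid(value):
    """Validity ("Is UUID"): the value is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False
```

In this sketch a row fails a test when the corresponding function returns False, for example `is_uuid("not-a-uuid")` or `not_null(None)`.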

The Data Quality (DQ) score is calculated daily using the formula:

    DQ = (1 - R_{fm} / R_{tm}) × 100%

where R_{fm} is the count of failed rows and R_{tm} is the total count of rows in the scan. The health score for each dimension is the average of the scores of all monitors in that dimension over the selected time period.
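
As a worked example, the sketch below computes a per-monitor score and a dimension health score in Python. It assumes the DQ score is the percentage of scanned rows that did not fail, that is (1 - R_{fm} / R_{tm}) × 100. The function names and the sample scan counts are made up for illustration.

```python
from statistics import mean


def dq_score(failed_rows, total_rows):
    """Percentage of scanned rows that passed (assumed reading of the DQ formula)."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return (1 - failed_rows / total_rows) * 100


def dimension_health(monitor_scores):
    """Average of the monitor scores that belong to one dimension."""
    return mean(monitor_scores)


# Made-up scans for three monitors: (failed rows, total rows).
scans = [(5, 1000), (0, 250), (30, 600)]
scores = [dq_score(failed, total) for failed, total in scans]
health = dimension_health(scores)

print([round(s, 2) for s in scores])  # [99.5, 100.0, 95.0]
print(round(health, 2))               # 98.17
```

Note that this average weights every monitor equally, regardless of how many rows each one scanned.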

Data Health Score

The Data Health Score represents the average score for all dimensions over the selected time period. The scores are color-coded for easy interpretation:

  • Green (> 98%): Excellent health

  • Yellow (95% - 98%): At risk

  • Red (< 95%): Poor health
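
The averaging and color bands above can be sketched in Python as follows. The function names and sample dimension scores are hypothetical, and the handling of the exact boundaries (98% and 95% both fall in the yellow band) is an assumption read from the ranges listed above.

```python
from statistics import mean


def data_health_score(dimension_scores):
    """Average of the per-dimension scores (given in percent)."""
    return mean(dimension_scores.values())


def health_band(score):
    """Map a score in percent to its color band."""
    if score > 98:
        return "green"   # excellent health
    if score >= 95:
        return "yellow"  # at risk; assumed to include exactly 95 and 98
    return "red"         # poor health


# Made-up dimension scores for the selected time period.
dimension_scores = {
    "Accuracy": 99.2,
    "Completeness": 97.5,
    "Uniqueness": 100.0,
    "Validity": 96.1,
}
overall = data_health_score(dimension_scores)
band = health_band(overall)
print(round(overall, 1), band)  # 98.2 green
```

In this example two dimensions are individually in the yellow band, yet the overall score is green, so it is worth checking the per-dimension scores as well as the average.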

Custom Date Range and Filters

The custom date range supports up to six months, allowing for in-depth analysis across one or more quarters. The Quality dashboard also includes filters to help you narrow down your data view:

  • Domains

  • Data sources

  • Data owners

  • Monitor mode

  • Row creation preferences

  • Tags

  • Classifications

Source/Domain Summary

The Source/Domain Summary in the Quality tab provides results based on the selected domains and shows scores for seven key quality metrics. This helps you understand your data’s health across different data sources and domains, making it easier to pinpoint areas for improvement.

You can also slice these scores by data source or by domain, which gives a more granular view of data quality across different segments.
