Quality

The Quality tab provides a detailed view of data quality across various dimensions, supported by different test types. Here’s what you can expect:

Quality Dimensions and Test Types

Our platform currently supports four quality dimensions, each associated with specific test types:

  • Accuracy: Measures how close the data values are to the true values. Tests include “Regex” and “Value in”.

  • Completeness: Measures the extent to which all required data elements are present. Tests include “Not Null”.

  • Uniqueness: Checks each data record to ensure it is unique within the dataset. Tests include “Is Unique”.

  • Validity: Ensures data conforms to acceptable standards, such as ranges and formats. Tests include “Is Email” and “Is UUID”.
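To make the dimension-to-test mapping concrete, here is a minimal Python sketch of what checks of this kind might look like. It is an illustration only, not the platform's implementation: the function names, the simplified email pattern, and the definition of a pass rate as the percentage of passing values are all assumptions.

```python
import re
import uuid
from collections import Counter

# Dimension -> test types, as listed above.
DIMENSION_TESTS = {
    "Accuracy": ["Regex", "Value in"],
    "Completeness": ["Not Null"],
    "Uniqueness": ["Is Unique"],
    "Validity": ["Is Email", "Is UUID"],
}

# Hypothetical, deliberately simple email pattern (assumption).
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def not_null(value):
    return value is not None


def matches_regex(pattern):
    # Returns a check that passes when the whole string matches the pattern.
    return lambda v: isinstance(v, str) and re.fullmatch(pattern, v) is not None


def value_in(allowed):
    allowed = set(allowed)
    return lambda v: v in allowed


def is_email(value):
    return isinstance(value, str) and _EMAIL_RE.fullmatch(value) is not None


def is_uuid(value):
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def pass_rate(values, check):
    """Percentage of values that pass a per-value check (None if empty)."""
    values = list(values)
    if not values:
        return None
    return 100.0 * sum(1 for v in values if check(v)) / len(values)


def unique_rate(values):
    """Percentage of values that occur exactly once (None if empty)."""
    values = list(values)
    if not values:
        return None
    counts = Counter(values)
    return 100.0 * sum(1 for v in values if counts[v] == 1) / len(values)
```

For example, `pass_rate(column, is_email)` would give the share of values in `column` that look like email addresses, and `unique_rate(column)` the share that are not duplicated.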

The Data Quality (DQ) score is calculated daily using the formula:

The health score for each dimension is the average score of all monitors in that dimension over the selected time period.
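The averaging for a single dimension can be sketched as follows. This is a rough illustration under stated assumptions: each monitor is assumed to produce one DQ score per day on a 0 to 100 scale, each monitor is averaged over the period first, and the monitors are then averaged with equal weight. The function name and input shape are hypothetical, and the platform may weight monitors differently.

```python
from statistics import mean


def dimension_health_score(monitor_scores):
    """Average a dimension's monitors over the selected period.

    monitor_scores maps a monitor name to its list of daily DQ scores
    (0-100) within the period. Monitors with no scores are skipped.
    Returns None when there is nothing to average.
    """
    per_monitor = [mean(scores) for scores in monitor_scores.values() if scores]
    if not per_monitor:
        return None
    return mean(per_monitor)


# Hypothetical monitors for the Completeness dimension.
completeness = {
    "orders.customer_id not null": [100, 90],
    "orders.email not null": [80],
}
print(dimension_health_score(completeness))
```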

Data Health Score

The Data Health Score represents the average score for all dimensions over the selected time period. The scores are color-coded for easy interpretation:

  • Green (> 98%): Excellent health

  • Yellow (95% - 98%): At risk

  • Red (< 95%): Poor health
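The averaging and color bands above can be expressed as a short Python sketch. The function names are hypothetical, and treating exactly 95% and exactly 98% as Yellow is an interpretation of the ranges listed, not confirmed platform behavior.

```python
def data_health_score(dimension_scores):
    """Average the per-dimension scores (0-100); None if none are available."""
    scores = [s for s in dimension_scores.values() if s is not None]
    if not scores:
        return None
    return sum(scores) / len(scores)


def health_band(score):
    """Map a score to its color band using the thresholds listed above."""
    if score > 98:
        return "Green"   # excellent health
    if score >= 95:
        return "Yellow"  # at risk (assumes 95 and 98 are inclusive)
    return "Red"         # poor health


# Hypothetical dimension scores for one selected period.
scores = {"Accuracy": 100, "Completeness": 96, "Uniqueness": 99, "Validity": 97}
overall = data_health_score(scores)
print(overall, health_band(overall))
```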

Custom Date Range and Filters

The custom date range supports up to six months, allowing in-depth analysis over a quarter or longer. The Quality dashboard also includes various filters to help you narrow down your data view, such as:

  • Domains

  • Data sources

  • Data Owners

  • Monitor mode (Scheduled, On-demand)

  • Row creation preferences (filter for 'All Records' scan only)

  • Tags

  • Classifications
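If you build date ranges programmatically before applying them in the dashboard, a check like the following can confirm that a range stays within the six-month limit. This is a sketch only: how the platform counts six months is not specified here, so the code assumes calendar months with the end date inclusive, and the helper names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted forward by whole calendar months, clamping the day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_valid_custom_range(start, end, max_months=6):
    """True when start <= end and end is at most max_months after start."""
    return start <= end <= add_months(start, max_months)


print(is_valid_custom_range(date(2024, 1, 1), date(2024, 7, 1)))
print(is_valid_custom_range(date(2024, 1, 1), date(2024, 7, 2)))
```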

Source/Domain Summary

The Source/Domain Summary in the Quality tab provides results based on selected domains and shows scores for key quality metrics. This helps you gain a deeper understanding of your data’s health across different data sources and domains, making it easier to pinpoint areas for improvement.
