This document is designed to help you align your business data quality requirements with the technical settings in our platform.
By understanding how Thresholds impact your Data Quality (DQ) Score, you can ensure that incidents are only triggered when data truly falls below your standards.
1. The Relationship: DQ Score vs. Thresholds
To set up your tests correctly, you first need to decide on your Target Data Quality Score.
DQ Score: Represents the percentage of "clean" or "successful" records.
Threshold: Represents the percentage of "errored" or "failed" records you are willing to tolerate.
The Formula
The system calculates your quality using this formula:
Simple Rule of Thumb: Your Max Threshold is simply the "Error Budget" for your data. If you want a 98% DQ Score, you set a Max Threshold of 2%.
2. Standard Tests (Null, Unique, Email, Regex, UUID)
For standard tests, the system looks for a "limit" on failures. You generally leave the Min as null and only define the Max.
If your Business Goal is...
Set Max Threshold to...
Outcome
Zero Tolerance (e.g., Primary Keys)
0
Incident triggers if even one record is null or duplicate.
High Consistency (e.g., Customer Emails)
1
Incident triggers if more than 1% of data is invalid.
General Quality (e.g., Optional Fields)
5
Incident triggers if more than 5% of data is invalid.
3. Advanced Configuration: Custom SQL Tests
Custom SQL tests allow for more nuanced governance because you can define both a Min and a Max. This allows you to create "Allowable Ranges."
Case A: Simple Ceiling (Alert on "Too Much Error")
Use this when you only care if the error rate gets too high.
Min: null
Max: 1
Governance View: "I expect at least 99% quality. If errors exceed 1%, alert me."
Case B: Allowable Range (Alert on "Out of Bounds")
Use this for metrics that should always stay within a specific bracket (e.g., volume fluctuations or expected error rates).
Min: 1
Max: 5
Governance View: "It is normal for this data to have between 1% and 5% errors. If the error rate drops below 1% or spikes above 5%, something is wrong with the pipeline."
4. Quick Setup Summary
Test Type
Your DQ Goal
Set Min
Set Max
Resulting Logic
Any Standard
100%
null
0
Error > 0% = Incident
Any Standard
99%
null
1
Error > 1% = Incident
Custom SQL
95% to 99%
1
5
Error < 1% OR > 5% = Incident
Custom SQL
Min 95%
5
null
Error < 5% = Incident
Best Practices for Governance
Start Strict: Set thresholds at 0 or 1 for critical tables.
Adjust for Noise: If a test triggers too many "false alarms" for non-critical data, increase the Max threshold to align with what the business actually considers an "issue."
Review Regularly: Data evolves. Review your DQ Scores monthly to see if your thresholds still reflect the reality of your data pipeline.