Setting Up Your Data Quality Thresholds

This document is designed to help you align your business data quality requirements with the technical settings in our platform.

By understanding how Thresholds impact your Data Quality (DQ) Score, you can ensure that incidents are only triggered when data truly falls below your standards.

1. The Relationship: DQ Score vs. Thresholds

To set up your tests correctly, you first need to decide on your Target Data Quality Score.

DQ Score: Represents the percentage of "clean" or "successful" records.
Threshold: Represents the percentage of "errored" or "failed" records you are willing to tolerate.

The Formula

The system calculates your quality using this formula:

Simple Rule of Thumb: Your Max Threshold is simply the "Error Budget" for your data. If you want a 98% DQ Score, you set a Max Threshold of 2%.

2. Standard Tests (Null, Unique, Email, Regex, UUID)

For standard tests, the system looks for a "limit" on failures. You generally leave the Min as null and only define the Max.

If your Business Goal is...

Set Max Threshold to...

Outcome

Zero Tolerance (e.g., Primary Keys)

0

Incident triggers if even one record is null or duplicate.

High Consistency (e.g., Customer Emails)

1

Incident triggers if more than 1% of data is invalid.

General Quality (e.g., Optional Fields)

5

Incident triggers if more than 5% of data is invalid.

3. Advanced Configuration: Custom SQL Tests

Custom SQL tests allow for more nuanced governance because you can define both a Min and a Max. This allows you to create "Allowable Ranges."

Case A: Simple Ceiling (Alert on "Too Much Error")

Use this when you only care if the error rate gets too high.

Min: null
Max: 1
Governance View: "I expect at least 99% quality. If errors exceed 1%, alert me."

Case B: Allowable Range (Alert on "Out of Bounds")

Use this for metrics that should always stay within a specific bracket (e.g., volume fluctuations or expected error rates).

Min: 1
Max: 5
Governance View: "It is normal for this data to have between 1% and 5% errors. If the error rate drops below 1% or spikes above 5%, something is wrong with the pipeline."

4. Quick Setup Summary

Test Type

Your DQ Goal

Set Min

Set Max

Resulting Logic

Any Standard

100%

null

0

Error > 0% = Incident

Any Standard

99%

null

1

Error > 1% = Incident

Custom SQL

95% to 99%

1

5

Error < 1% OR > 5% = Incident

Custom SQL

Min 95%

5

null

Error < 5% = Incident

Best Practices for Governance

Start Strict: Set thresholds at 0 or 1 for critical tables.
Adjust for Noise: If a test triggers too many "false alarms" for non-critical data, increase the Max threshold to align with what the business actually considers an "issue."
Review Regularly: Data evolves. Review your DQ Scores monthly to see if your thresholds still reflect the reality of your data pipeline.

PreviousCustom Scheduling For Monitors NextHow to set up monitors

Last updated 1 month ago

hashtag1. The Relationship: DQ Score vs. Thresholds

hashtagThe Formula

hashtag2. Standard Tests (Null, Unique, Email, Regex, UUID)

hashtag3. Advanced Configuration: Custom SQL Tests

hashtagCase A: Simple Ceiling (Alert on "Too Much Error")

hashtagCase B: Allowable Range (Alert on "Out of Bounds")

hashtag4. Quick Setup Summary

hashtagBest Practices for Governance

1. The Relationship: DQ Score vs. Thresholds

The Formula

2. Standard Tests (Null, Unique, Email, Regex, UUID)

3. Advanced Configuration: Custom SQL Tests

Case A: Simple Ceiling (Alert on "Too Much Error")

Case B: Allowable Range (Alert on "Out of Bounds")

4. Quick Setup Summary

Best Practices for Governance