Setting Up Your Data Quality Thresholds
The four threshold types available in Decube monitors, which test types support each, and how thresholds relate to your DQ score.
A threshold defines the acceptable range for a monitored metric. When a scan result falls outside the threshold, Decube opens an incident. Choosing the right threshold type for your test determines both what you're measuring and how sensitive the monitor will be.
The four threshold types
Absolute
Compares the raw row count of failing records against a min/max bound. Use Absolute when you need to reason about a specific number of records — for example, "alert if more than 100 rows have null values."
Compatible tests: Null, Unique, Email, UUID, Regex Match
| Bounds | Value | Behavior |
| --- | --- | --- |
| Max only | 0 | Zero failing records permitted; any failure triggers an incident |
| Max only | 50 | Up to 50 failing records accepted; incident triggers above that |
| Min and Max | 10 / 50 | Incident triggers if failing row count is below 10 or above 50 |
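As a rough illustration of the Min/Max logic described above, the following Python sketch treats both bounds as inclusive limits, so a count equal to Max is still accepted. The function name `breaches_threshold` and its parameters are hypothetical and are not part of Decube's API.

```python
from typing import Optional


def breaches_threshold(
    value: float,
    min_bound: Optional[float] = None,
    max_bound: Optional[float] = None,
) -> bool:
    """Return True when value falls outside the configured bounds.

    Hypothetical helper: a bound left as None is not checked.
    """
    if min_bound is not None and value < min_bound:
        return True
    if max_bound is not None and value > max_bound:
        return True
    return False


# The three Absolute configurations listed above, applied to failing row counts.
zero_tolerance = breaches_threshold(1, max_bound=0)            # one failing row, Max 0
at_the_limit = breaches_threshold(50, max_bound=50)            # exactly 50, Max 50
too_few = breaches_threshold(5, min_bound=10, max_bound=50)    # 5 rows, Min 10 / Max 50
```

Under these assumptions the first and third calls would open an incident, while the second would not, because 50 failing rows is still within "up to 50".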
Percentage
Compares the percentage of failing records against a min/max bound. Values must be between 0 and 100. Use Percentage when you care about the proportion of bad data relative to the total, rather than the raw count — for example, "alert if more than 2% of emails are invalid."
Compatible tests: Null, Unique, Email, UUID, Regex Match
| Bounds | Value | Behavior |
| --- | --- | --- |
| Max only | 0 | Zero tolerance; any failure triggers an incident |
| Max only | 2 | Up to 2% failures accepted |
| Min and Max | 1 / 5 | Incident triggers if failure rate is below 1% or above 5% |
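The difference from an Absolute threshold is that the failing count is first divided by the total row count. A minimal Python sketch of that step follows; the function names are hypothetical, and treating an empty scan as a 0% failure rate is an assumption rather than documented Decube behavior.

```python
from typing import Optional


def failure_percentage(failing_rows: int, total_rows: int) -> float:
    """Share of failing rows as a percentage in the range 0-100."""
    if total_rows == 0:
        return 0.0  # assumption: an empty scan counts as no failures
    return 100.0 * failing_rows / total_rows


def percentage_breached(
    failing_rows: int,
    total_rows: int,
    min_pct: Optional[float] = None,
    max_pct: Optional[float] = None,
) -> bool:
    """Return True when the failure rate falls outside the Min/Max bounds."""
    for bound in (min_pct, max_pct):
        if bound is not None and not 0 <= bound <= 100:
            raise ValueError("percentage bounds must be between 0 and 100")
    rate = failure_percentage(failing_rows, total_rows)
    below_min = min_pct is not None and rate < min_pct
    above_max = max_pct is not None and rate > max_pct
    return below_min or above_max


# "Alert if more than 2% of emails are invalid"
three_percent = percentage_breached(30, 1000, max_pct=2)   # 3% invalid
two_percent = percentage_breached(20, 1000, max_pct=2)     # exactly 2% invalid
```

The same 100 failing rows would be 10% of a 1,000-row table but only 0.01% of a 1,000,000-row table, which is why Percentage suits tables whose volume varies.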
Positive Range
Compares a numeric metric value against a min/max bound, where bound values must be zero or positive. Use Positive Range for statistical metrics that cannot go negative — for example, "alert if the average order value drops below 50 or rises above 500."
Compatible tests: Average, Min, Max, String Length
| Bounds | Value | Behavior |
| --- | --- | --- |
| Min only | 100 | Alert if the metric drops below 100 |
| Max only | 500 | Alert if the metric exceeds 500 |
| Min and Max | 100 / 500 | Alert if the metric falls outside the 100–500 range |
Any Range
Compares a numeric metric value against a min/max bound, where bound values can be negative. Use Any Range when the monitored column can legitimately hold negative values. For example, a profit/loss column might be expected to stay between -1000 and +5000.
Compatible tests: Average, Min, Max
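The only difference between Positive Range and Any Range is which bound values are allowed. The sketch below shows one way such a check could look; the type names `"positive_range"` and `"any_range"` and the function `validate_bounds` are hypothetical, and the rule that Min may not exceed Max is an assumption.

```python
from typing import Optional


def validate_bounds(
    threshold_type: str,
    min_bound: Optional[float] = None,
    max_bound: Optional[float] = None,
) -> None:
    """Raise ValueError if the bounds are not valid for the threshold type."""
    if threshold_type not in ("positive_range", "any_range"):
        raise ValueError(f"unknown threshold type: {threshold_type}")
    if min_bound is None and max_bound is None:
        raise ValueError("set at least one of min_bound or max_bound")
    if threshold_type == "positive_range":
        for bound in (min_bound, max_bound):
            if bound is not None and bound < 0:
                raise ValueError("Positive Range bounds must be zero or positive")
    # Assumption: when both bounds are given, Min must not exceed Max.
    if min_bound is not None and max_bound is not None and min_bound > max_bound:
        raise ValueError("min_bound must not exceed max_bound")


# Average order value between 50 and 500: valid for Positive Range.
validate_bounds("positive_range", min_bound=50, max_bound=500)

# Profit/loss between -1000 and +5000: needs Any Range.
validate_bounds("any_range", min_bound=-1000, max_bound=5000)
```

Passing the profit/loss bounds with `"positive_range"` would raise a `ValueError` in this sketch, because the lower bound is negative.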
Auto (Smart Training)
When Smart Training is enabled, the ML model learns an expected confidence interval from historical data and sets the effective threshold bounds dynamically. You do not set bounds manually — the model determines what's normal for each scan window.
Auto is only available on Scheduled monitors and requires a timestamp-based row filtering mode.
Compatible tests: Null, Unique, Email, UUID, Regex Match, Average, Min, Max, String Length
See How Anomaly Detection Works for how the model builds and adjusts its confidence interval.
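To make the idea of learned bounds concrete, here is a deliberately simple Python sketch that derives a band from past scan results as the mean plus or minus a few standard deviations. This is a stand-in technique chosen for illustration only; it is not Decube's Smart Training model, and the function name, the `z` parameter, and the sample values are all hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def learned_bounds(history: List[float], z: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) as mean +/- z sample standard deviations.

    Illustrative stand-in only, not the model Decube uses.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    center = mean(history)
    spread = stdev(history)
    return center - z * spread, center + z * spread


# Made-up null percentages from five earlier scan windows.
history = [2.0, 2.5, 1.8, 2.2, 2.1]
lower, upper = learned_bounds(history)

# A new scan result is flagged when it falls outside the learned band.
latest = 6.0
is_anomaly = not (lower <= latest <= upper)
```

The point of the sketch is only that the bounds come from the data rather than from values you enter, so they shift as new scan results arrive.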
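Threshold type compatibility by test
| Test | Absolute | Percentage | Positive Range | Any Range | Auto |
| --- | --- | --- | --- | --- | --- |
| Null | ✓ | ✓ | — | — | ✓ |
| Unique | ✓ | ✓ | — | — | ✓ |
| Email | ✓ | ✓ | — | — | ✓ |
| UUID | ✓ | ✓ | — | — | ✓ |
| Regex Match | ✓ | ✓ | — | — | ✓ |
| Average | — | — | ✓ | ✓ | ✓ |
| Min | — | — | ✓ | ✓ | ✓ |
| Max | — | — | ✓ | ✓ | ✓ |
| String Length | — | — | ✓ | — | ✓ |
| Cardinality | — | — | — | — | Rolling window (no manual threshold) |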
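If you generate monitor configurations programmatically, the compatibility rules above can be kept as a small lookup and checked before a monitor is created. The sketch below is one possible encoding in Python; the `COMPATIBILITY` mapping and the `supports` function are hypothetical names, not part of any Decube SDK.

```python
# Threshold types each test supports, transcribed from the compatibility list above.
FAILING_RECORD_TYPES = {"Absolute", "Percentage", "Auto"}
NUMERIC_METRIC_TYPES = {"Positive Range", "Any Range", "Auto"}

COMPATIBILITY = {
    "Null": FAILING_RECORD_TYPES,
    "Unique": FAILING_RECORD_TYPES,
    "Email": FAILING_RECORD_TYPES,
    "UUID": FAILING_RECORD_TYPES,
    "Regex Match": FAILING_RECORD_TYPES,
    "Average": NUMERIC_METRIC_TYPES,
    "Min": NUMERIC_METRIC_TYPES,
    "Max": NUMERIC_METRIC_TYPES,
    "String Length": {"Positive Range", "Auto"},
    # Cardinality uses a rolling window and takes no manual threshold.
    "Cardinality": set(),
}


def supports(test: str, threshold_type: str) -> bool:
    """Return True if the test type accepts the given threshold type."""
    if test not in COMPATIBILITY:
        raise KeyError(f"unknown test type: {test}")
    return threshold_type in COMPATIBILITY[test]


null_with_percentage = supports("Null", "Percentage")
length_with_any_range = supports("String Length", "Any Range")
```

A check like this catches mismatches such as String Length with Any Range before they reach the monitor setup.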
Thresholds and your DQ score
Your DQ score represents the percentage of records that passed their monitors in a given period. Thresholds determine when a monitor counts a record as failed, so they directly control what your DQ score measures.
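The formula is:

$$\text{DQ score (\%)} = \frac{\text{records that passed}}{\text{total records scanned}} \times 100$$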
Simple rule of thumb: your Max threshold is the error budget for your data. If you want a 98% DQ score, set a Max Percentage threshold of 2%.
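The error-budget arithmetic can be worked through in a few lines of Python. This sketch takes the definition above (the share of records that passed) at face value; both function names are hypothetical.

```python
def dq_score(passed_records: int, total_records: int) -> float:
    """Percentage of records that passed their monitors."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return 100.0 * passed_records / total_records


def max_percentage_for_target(target_score: float) -> float:
    """Largest Max Percentage threshold consistent with a target DQ score."""
    if not 0 <= target_score <= 100:
        raise ValueError("target_score must be between 0 and 100")
    return 100.0 - target_score


# 980 of 1,000 records passed.
score = dq_score(980, 1000)

# Error budget for a 98% target.
budget = max_percentage_for_target(98)
```

Here `score` is 98.0 and `budget` is 2.0: a 98% target leaves room for at most 2% of records to fail before the monitor should raise an incident.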
| Scenario | Threshold type | Bounds | Behavior |
| --- | --- | --- | --- |
| Zero tolerance (e.g., primary keys) | Absolute | Max 0 | Any null or duplicate triggers an incident |
| High consistency (e.g., customer emails) | Percentage | Max 1 | Incident triggers when more than 1% are invalid |
| General quality (e.g., optional fields) | Percentage | Max 5 | Incident triggers when more than 5% are invalid |
| Volume within expected range | Positive Range | Min 1000 / Max 10000 | Incident if row count falls outside the band |
Tuning thresholds over time
Start strict and adjust based on what you observe:
- Start with a Max of 0 or 1 for business-critical columns. It's easier to loosen a threshold than to explain why a gap went undetected.
- Raise the threshold if false positives accumulate on non-critical fields, so that the setting matches what the business genuinely considers a problem.
- Review thresholds monthly as data volumes and patterns evolve.