Setting Up Your Data Quality Thresholds

This page covers the four manual threshold types available in Decube monitors and the Auto option, which test types support each, and how thresholds relate to your DQ score.

A threshold defines the acceptable range for a monitored metric. When a scan result falls outside the threshold, Decube opens an incident. Choosing the right threshold type for your test determines both what you're measuring and how sensitive the monitor will be.
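As a rough illustration of the rule above, a threshold check can be thought of as a simple bounds test. This is a minimal sketch, not Decube's implementation: the function name `breaches_threshold` and its parameters are hypothetical, and treating a value equal to a bound as acceptable is an assumption.

```python
from typing import Optional


def breaches_threshold(value: float,
                       min_bound: Optional[float] = None,
                       max_bound: Optional[float] = None) -> bool:
    """Return True when value falls outside the acceptable range.

    Hypothetical helper: bounds are treated as inclusive, which is an
    assumption rather than documented Decube behavior.
    """
    if min_bound is not None and value < min_bound:
        return True
    if max_bound is not None and value > max_bound:
        return True
    return False


# Max only = 0: any failing record is a breach.
print(breaches_threshold(3, max_bound=0))                  # True
# Min and Max = 10 / 50: 25 failing rows is within range.
print(breaches_threshold(25, min_bound=10, max_bound=50))  # False
```

Leaving a bound as `None` models the "Max only" and "Min only" settings, where only one side of the range is checked.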

The four threshold types

Absolute

Compares the raw row count of failing records against a min/max bound. Use Absolute when you need to reason about a specific number of records — for example, "alert if more than 100 rows have null values."

Compatible tests: Null, Unique, Email, UUID, Regex Match

| Setting | Example value | Effect |
| --- | --- | --- |
| Max only | 0 | Zero failing records permitted; any failure triggers an incident |
| Max only | 50 | Up to 50 failing records accepted; incident triggers above that |
| Min and Max | 10 / 50 | Incident triggers if failing row count is below 10 or above 50 |


Percentage

Compares the percentage of failing records against a min/max bound. Values must be between 0 and 100. Use Percentage when you care about the proportion of bad data relative to the total, rather than the raw count — for example, "alert if more than 2% of emails are invalid."

Compatible tests: Null, Unique, Email, UUID, Regex Match

| Setting | Example value | Effect |
| --- | --- | --- |
| Max only | 0 | Zero tolerance: any failure triggers an incident |
| Max only | 2 | Up to 2% failures accepted |
| Min and Max | 1 / 5 | Incident triggers if failure rate is below 1% or above 5% |
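To make the difference between a raw count and a proportion concrete, the sketch below computes a failure rate and compares it against percentage bounds. The function names are hypothetical, and the handling of an empty scan (treated as 0% failures) is an assumption, not documented Decube behavior.

```python
from typing import Optional


def failure_rate_pct(failing_rows: int, total_rows: int) -> float:
    """Return the share of rows that failed, as a percentage (0 to 100)."""
    if total_rows <= 0:
        # Assumption: treat an empty scan as 0% failures.
        return 0.0
    return 100.0 * failing_rows / total_rows


def percentage_breached(failing_rows: int, total_rows: int,
                        max_pct: Optional[float] = None,
                        min_pct: Optional[float] = None) -> bool:
    """Return True when the failure rate falls outside the percentage bounds."""
    for bound in (min_pct, max_pct):
        if bound is not None and not 0 <= bound <= 100:
            raise ValueError("percentage bounds must be between 0 and 100")
    rate = failure_rate_pct(failing_rows, total_rows)
    if min_pct is not None and rate < min_pct:
        return True
    return max_pct is not None and rate > max_pct


# 30 invalid emails out of 1,000 rows is a 3% failure rate.
print(failure_rate_pct(30, 1000))                # 3.0
print(percentage_breached(30, 1000, max_pct=2))  # True: 3% exceeds 2%
print(percentage_breached(30, 1000, max_pct=5))  # False
```

The same 30 failing rows would pass an Absolute threshold of Max 50, which is why the choice of threshold type matters as much as the number you enter.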


Positive Range

Compares a numeric metric value against a min/max bound, where bound values must be zero or positive. Use Positive Range for statistical metrics that cannot go negative — for example, "alert if the average order value drops below 50 or rises above 500."

Compatible tests: Average, Min, Max, String Length

| Setting | Example value | Effect |
| --- | --- | --- |
| Min only | 100 | Alert if the metric drops below 100 |
| Max only | 500 | Alert if the metric exceeds 500 |
| Min and Max | 100 / 500 | Alert if the metric falls outside the 100–500 range |


Any Range

Compares a numeric metric value against a min/max bound, where bound values can be negative. Use Any Range when the monitored column can legitimately hold negative values, for example a profit/loss column where values between -1000 and +5000 are expected.

Compatible tests: Average, Min, Max
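The only difference between Positive Range and Any Range is which bound values are allowed. The sketch below expresses that rule as a validation step. The type labels `"positive_range"` and `"any_range"` and the function `validate_bounds` are hypothetical, and requiring at least one bound and min not exceeding max are assumptions.

```python
from typing import Optional


def validate_bounds(threshold_type: str,
                    min_bound: Optional[float] = None,
                    max_bound: Optional[float] = None) -> None:
    """Raise ValueError if the bounds are not valid for the threshold type.

    The type names "positive_range" and "any_range" are hypothetical labels.
    """
    if threshold_type not in ("positive_range", "any_range"):
        raise ValueError(f"unknown threshold type: {threshold_type}")
    if min_bound is None and max_bound is None:
        # Assumption: a range threshold needs at least one bound.
        raise ValueError("set at least one of min_bound or max_bound")
    if threshold_type == "positive_range":
        for bound in (min_bound, max_bound):
            if bound is not None and bound < 0:
                raise ValueError("Positive Range bounds must be zero or positive")
    if min_bound is not None and max_bound is not None and min_bound > max_bound:
        raise ValueError("min_bound must not exceed max_bound")


validate_bounds("any_range", min_bound=-1000, max_bound=5000)    # accepted
validate_bounds("positive_range", min_bound=100, max_bound=500)  # accepted
try:
    validate_bounds("positive_range", min_bound=-1000, max_bound=5000)
except ValueError as err:
    print(err)  # Positive Range bounds must be zero or positive
```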


Auto (Smart Training)

When Smart Training is enabled, the ML model learns an expected confidence interval from historical data and sets the effective threshold bounds dynamically. You do not set bounds manually — the model determines what's normal for each scan window.

Auto is only available on Scheduled monitors and requires a timestamp-based row filtering mode.

Compatible tests: Null, Unique, Email, UUID, Regex Match, Average, Min, Max, String Length

See How Anomaly Detection Works for how the model builds and adjusts its confidence interval.


Threshold type compatibility by test

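| Test type | Absolute | Percentage | Positive Range | Any Range | Auto |
| --- | --- | --- | --- | --- | --- |
| Null | Yes | Yes | No | No | Yes |
| Unique | Yes | Yes | No | No | Yes |
| Email | Yes | Yes | No | No | Yes |
| UUID | Yes | Yes | No | No | Yes |
| Regex Match | Yes | Yes | No | No | Yes |
| Average | No | No | Yes | Yes | Yes |
| Min | No | No | Yes | Yes | Yes |
| Max | No | No | Yes | Yes | Yes |
| String Length | No | No | Yes | No | Yes |

Cardinality: Rolling window (no manual threshold)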


Thresholds and your DQ score

Your DQ score represents the percentage of records that passed their monitors in a given period. Thresholds determine when a monitor counts a record as failed, so they directly control what your DQ score measures.

The formula is:

DQ score = (records that passed their monitors / total records checked) × 100

Simple rule of thumb: your Max threshold is the error budget for your data. If you want a 98% DQ score, set a Max Percentage threshold of 2%.

| Business goal | Threshold type | Setting | Outcome |
| --- | --- | --- | --- |
| Zero tolerance (e.g., primary keys) | Absolute | Max 0 | Any null or duplicate triggers an incident |
| High consistency (e.g., customer emails) | Percentage | Max 1 | Incident triggers when more than 1% are invalid |
| General quality (e.g., optional fields) | Percentage | Max 5 | Incident triggers when more than 5% are invalid |
| Volume within expected range | Positive Range | Min 1000 / Max 10000 | Incident if row count falls outside the band |
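The error-budget rule of thumb above is simple arithmetic, sketched below. This assumes the DQ score is the share of passed records over total records, as described in this section; the function names are hypothetical and the real calculation in Decube may differ.

```python
def dq_score(passed_records: int, total_records: int) -> float:
    """Percentage of records that passed (assumed definition of the DQ score)."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return 100.0 * passed_records / total_records


def max_pct_for_target(target_score: float) -> float:
    """Max Percentage threshold that matches a target DQ score (the error budget)."""
    if not 0 <= target_score <= 100:
        raise ValueError("target_score must be between 0 and 100")
    return 100.0 - target_score


print(dq_score(980, 1000))     # 98.0
print(max_pct_for_target(98))  # 2.0
```

In other words, a 98% target leaves a 2% budget for failing records, which is the value to enter as the Max Percentage threshold.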

Tuning thresholds over time

Start strict and adjust based on what you observe:

  1. Start with a Max of 0 or 1 for business-critical columns. It's easier to loosen a threshold than to explain why a gap went undetected.

  2. Raise the threshold if false positives accumulate on non-critical fields — align the setting with what the business genuinely considers a problem.

  3. Review monthly as data volumes and patterns evolve.
