# Setting Up Your Data Quality Thresholds

{% embed url="<https://www.loom.com/share/cd16da42dfc949e48a50d4a537120a2a>" %}

A threshold defines the acceptable range for a monitored metric. When a scan result falls outside the threshold, Decube opens an incident. Choosing the right threshold type for your test determines both what you're measuring and how sensitive the monitor will be.

## The five threshold types

### Absolute

Compares the raw **row count** of failing records against a min/max bound. Use Absolute when you need to reason about a specific number of records — for example, "alert if more than 100 rows have null values."

**Compatible tests:** Null, Unique, Email, UUID, Regex Match

| Setting     | Example value | Effect                                                            |
| ----------- | ------------- | ----------------------------------------------------------------- |
| Max only    | `0`           | Zero failing records permitted — any failure triggers an incident |
| Max only    | `50`          | Up to 50 failing records accepted; incident triggers above that   |
| Min and Max | `10` / `50`   | Incident triggers if failing row count is below 10 or above 50    |
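
The boundary behaviour in the table above can be sketched in a few lines of Python. This is an illustration only, not Decube's implementation: the function name `breaches_absolute` and its parameters are hypothetical, and it assumes both bounds are inclusive, as the "up to 50 accepted" wording suggests.

```python
from typing import Optional


def breaches_absolute(
    failing_rows: int,
    min_rows: Optional[int] = None,
    max_rows: Optional[int] = None,
) -> bool:
    """Return True when the failing-row count falls outside the bounds.

    Hypothetical helper: bounds are treated as inclusive, so a count equal
    to a bound is still acceptable.
    """
    if failing_rows < 0:
        raise ValueError("failing_rows cannot be negative")
    if min_rows is not None and failing_rows < min_rows:
        return True
    if max_rows is not None and failing_rows > max_rows:
        return True
    return False


# The three rows of the table above
print(breaches_absolute(1, max_rows=0))                # True
print(breaches_absolute(50, max_rows=50))              # False
print(breaches_absolute(5, min_rows=10, max_rows=50))  # True
```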

***

### Percentage

Compares the **percentage** of failing records against a min/max bound. Values must be between 0 and 100. Use Percentage when you care about the proportion of bad data relative to the total, rather than the raw count — for example, "alert if more than 2% of emails are invalid."

**Compatible tests:** Null, Unique, Email, UUID, Regex Match

| Setting     | Example value | Effect                                                    |
| ----------- | ------------- | --------------------------------------------------------- |
| Max only    | `0`           | Zero tolerance — any failure triggers an incident         |
| Max only    | `2`           | Up to 2% failures accepted                                |
| Min and Max | `1` / `5`     | Incident triggers if failure rate is below 1% or above 5% |
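
As a rough sketch of the arithmetic, the check converts the failing-row count into a percentage of the total and compares it with the bounds. The helper names below are hypothetical, the bounds are assumed to be inclusive, and rejecting an empty scan is a choice made for this example rather than documented Decube behaviour.

```python
from typing import Optional


def failure_percentage(failing_rows: int, total_rows: int) -> float:
    """Share of scanned rows that failed the test, on a 0-100 scale."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if not 0 <= failing_rows <= total_rows:
        raise ValueError("failing_rows must be between 0 and total_rows")
    return 100.0 * failing_rows / total_rows


def breaches_percentage(
    failing_rows: int,
    total_rows: int,
    min_pct: Optional[float] = None,
    max_pct: Optional[float] = None,
) -> bool:
    """Return True when the failure rate falls outside the bounds."""
    for bound in (min_pct, max_pct):
        # Percentage bounds must stay within 0-100, as stated above.
        if bound is not None and not 0 <= bound <= 100:
            raise ValueError("percentage bounds must be between 0 and 100")
    pct = failure_percentage(failing_rows, total_rows)
    if min_pct is not None and pct < min_pct:
        return True
    if max_pct is not None and pct > max_pct:
        return True
    return False


# 20 invalid emails out of 1,000 is exactly 2%, which a Max of 2 accepts.
print(breaches_percentage(20, 1000, max_pct=2))  # False
print(breaches_percentage(21, 1000, max_pct=2))  # True
```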

***

### Positive Range

Compares a **numeric metric value** against a min/max bound, where bound values must be zero or positive. Use Positive Range for statistical metrics that cannot go negative — for example, "alert if the average order value drops below 50 or rises above 500."

**Compatible tests:** Average, Min, Max, String Length

| Setting     | Example value | Effect                                              |
| ----------- | ------------- | --------------------------------------------------- |
| Min only    | `100`         | Alert if the metric drops below 100                 |
| Max only    | `500`         | Alert if the metric exceeds 500                     |
| Min and Max | `100` / `500` | Alert if the metric falls outside the 100–500 range |

***

### Any Range

Compares a **numeric metric value** against a min/max bound, where bound values can be negative. Use Any Range when the monitored column can legitimately hold negative values, for example a profit/loss column where values between -1000 and +5000 are expected.

**Compatible tests:** Average, Min, Max
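
The only difference between Positive Range and Any Range is which bound values are accepted. A minimal sketch of that distinction follows; the function `breaches_range` and its `allow_negative_bounds` flag are hypothetical names chosen for this example, not part of Decube's API.

```python
from typing import Optional


def breaches_range(
    value: float,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
    *,
    allow_negative_bounds: bool,
) -> bool:
    """Return True when a metric value falls outside the configured range.

    allow_negative_bounds=False models Positive Range,
    allow_negative_bounds=True models Any Range.
    """
    if not allow_negative_bounds:
        for bound in (min_value, max_value):
            if bound is not None and bound < 0:
                raise ValueError("Positive Range bounds must be zero or positive")
    if min_value is not None and max_value is not None and min_value > max_value:
        raise ValueError("min_value cannot exceed max_value")
    below = min_value is not None and value < min_value
    above = max_value is not None and value > max_value
    return below or above


# Any Range: a profit/loss average expected between -1000 and +5000
print(breaches_range(-1200, -1000, 5000, allow_negative_bounds=True))  # True

# Positive Range: an average order value expected between 50 and 500
print(breaches_range(250, 50, 500, allow_negative_bounds=False))       # False
```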

***

### Auto (Smart Training)

When Smart Training is enabled, the ML model learns an expected confidence interval from historical data and sets the effective threshold bounds dynamically. You do not set bounds manually — the model determines what's normal for each scan window.

Auto is only available on Scheduled monitors and requires a timestamp-based row filtering mode.

**Compatible tests:** Null, Unique, Email, UUID, Regex Match, Average, Min, Max, String Length

See [How Anomaly Detection Works](/data-quality/anomaly-detection-explained.md) for how the model builds and adjusts its confidence interval.
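
To build intuition for what "learning bounds from history" means, here is a deliberately simple stand-in. It is not Decube's model, whose details are covered on the linked page. It assumes the bounds are the historical mean plus or minus a multiple of the sample standard deviation, and the names `learned_bounds`, `is_anomalous`, and `width` are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def learned_bounds(history: Sequence[float], width: float = 3.0) -> Tuple[float, float]:
    """Derive lower and upper bounds from past scan results.

    Illustrative assumption: mean +/- width * sample standard deviation.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    center = mean(history)
    spread = stdev(history)
    return center - width * spread, center + width * spread


def is_anomalous(value: float, history: Sequence[float], width: float = 3.0) -> bool:
    """Return True when a new scan result falls outside the learned bounds."""
    lower, upper = learned_bounds(history, width)
    return value < lower or value > upper


# Null counts from five earlier scans (made-up numbers for illustration)
past_null_counts = [100, 102, 98, 101, 99]
print(is_anomalous(101, past_null_counts))  # False
print(is_anomalous(120, past_null_counts))  # True
```

The point of the sketch is that the bounds move with the data: as new scans are added to the history, the acceptable band shifts without anyone editing a threshold.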

***

## Threshold type compatibility by test

| Test type     | Absolute | Percentage | Positive Range | Any Range | Auto                                 |
| ------------- | -------- | ---------- | -------------- | --------- | ------------------------------------ |
| Null          | ✓        | ✓          | —              | —         | ✓                                    |
| Unique        | ✓        | ✓          | —              | —         | ✓                                    |
| Email         | ✓        | ✓          | —              | —         | ✓                                    |
| UUID          | ✓        | ✓          | —              | —         | ✓                                    |
| Regex Match   | ✓        | ✓          | —              | —         | ✓                                    |
| Average       | —        | —          | ✓              | ✓         | ✓                                    |
| Min           | —        | —          | ✓              | ✓         | ✓                                    |
| Max           | —        | —          | ✓              | ✓         | ✓                                    |
| String Length | —        | —          | ✓              | —         | ✓                                    |
| Cardinality   | —        | —          | —              | —         | Rolling window (no manual threshold) |
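
If you generate monitor configurations programmatically, the matrix above can be encoded as a lookup so that an invalid pairing is caught before it is submitted. The sketch below covers only the four manually configured threshold types. The names `MANUAL_THRESHOLDS` and `allowed_manual_thresholds` are hypothetical and not part of any Decube client.

```python
# Manual threshold types per test, transcribed from the table above.
COUNT_BASED = frozenset({"Absolute", "Percentage"})
RANGE_BASED = frozenset({"Positive Range", "Any Range"})

MANUAL_THRESHOLDS = {
    "Null": COUNT_BASED,
    "Unique": COUNT_BASED,
    "Email": COUNT_BASED,
    "UUID": COUNT_BASED,
    "Regex Match": COUNT_BASED,
    "Average": RANGE_BASED,
    "Min": RANGE_BASED,
    "Max": RANGE_BASED,
    "String Length": frozenset({"Positive Range"}),
    # Cardinality uses a rolling window and has no manual threshold.
    "Cardinality": frozenset(),
}


def allowed_manual_thresholds(test_type: str) -> frozenset:
    """Return the manual threshold types that the table lists for a test."""
    if test_type not in MANUAL_THRESHOLDS:
        raise ValueError(f"unknown test type: {test_type}")
    return MANUAL_THRESHOLDS[test_type]


print("Any Range" in allowed_manual_thresholds("String Length"))  # False
print("Percentage" in allowed_manual_thresholds("Email"))         # True
```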

***

## Thresholds and your DQ score

Your **DQ score** represents the percentage of records that passed their monitors in a given period. Thresholds determine when a monitor counts a record as failed, so they directly control what your DQ score measures.

The formula is:

<figure><img src="/files/UrNiIGIyOQW2N5ygI1Wx" alt="DQ score formula"><figcaption></figcaption></figure>

**Simple rule of thumb:** your Max threshold is the error budget for your data. If you want a 98% DQ score, set a Max Percentage threshold of 2%.

| Business goal                            | Threshold type | Setting                  | Outcome                                         |
| ---------------------------------------- | -------------- | ------------------------ | ----------------------------------------------- |
| Zero tolerance (e.g., primary keys)      | Absolute       | Max `0`                  | Any null or duplicate triggers an incident      |
| High consistency (e.g., customer emails) | Percentage     | Max `1`                  | Incident triggers when more than 1% are invalid |
| General quality (e.g., optional fields)  | Percentage     | Max `5`                  | Incident triggers when more than 5% are invalid |
| Volume within expected range             | Positive Range | Min `1000` / Max `10000` | Incident if row count falls outside the band    |
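
The error-budget rule of thumb is simple arithmetic, sketched here under the assumption that the DQ score is the number of passed records divided by the total, expressed as a percentage, as the prose above describes. The function names are hypothetical.

```python
def dq_score(passed_records: int, total_records: int) -> float:
    """Percentage of records that passed, on a 0-100 scale (assumed definition)."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    if not 0 <= passed_records <= total_records:
        raise ValueError("passed_records must be between 0 and total_records")
    return 100.0 * passed_records / total_records


def max_percentage_for_target(target_score: float) -> float:
    """Max Percentage threshold that leaves room for the target DQ score."""
    if not 0 <= target_score <= 100:
        raise ValueError("target_score must be between 0 and 100")
    return 100.0 - target_score


# A 98% target leaves a 2% error budget.
print(max_percentage_for_target(98))  # 2.0

# 980 passing records out of 1,000 sits exactly on that target.
print(dq_score(980, 1000))            # 98.0
```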

### Tuning thresholds over time

Start strict and adjust based on what you observe:

1. **Start at zero or one** for business-critical columns. It's easier to loosen a threshold than to explain why a gap went undetected.
2. **Raise the threshold if false positives accumulate** on non-critical fields — align the setting with what the business genuinely considers a problem.
3. **Review monthly** as data volumes and patterns evolve.

{% content-ref url="/pages/PZ2RI1yYFjLNn2k7GoMG" %}
[Monitor Configuration Reference](/data-quality/monitor-configuration-settings/configuration-reference.md)
{% endcontent-ref %}

