# Setting Up Your Data Quality Thresholds

By understanding how **Thresholds** impact your **Data Quality (DQ) Score**, you can ensure that incidents are triggered only when data truly falls below your standards.

{% embed url="https://www.loom.com/share/cd16da42dfc949e48a50d4a537120a2a" %}

### 1. The Relationship: DQ Score vs. Thresholds

To set up your tests correctly, you first need to decide on your **Target Data Quality Score**.

* **DQ Score**: Represents the percentage of "clean" or "successful" records.
* **Threshold**: Represents the percentage of "errored" or "failed" records you are willing to tolerate.

#### The Formula

The system calculates your quality using this formula:

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2FPVeK2aSebOBzFNUC9LVD%2Fimage.png?alt=media&#x26;token=7a6efd8a-049e-491b-b58f-d31d1921c698" alt=""><figcaption></figcaption></figure>

**Simple Rule of Thumb**: Your **Max Threshold** is simply the "Error Budget" for your data: Max Threshold = 100% − Target DQ Score. If you want a 98% DQ Score, set a Max Threshold of 2%.
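The rule of thumb above can be sketched in a few lines. This is an illustrative calculation only; the function names are not part of the product's API.

```python
def dq_score(total_records: int, errored_records: int) -> float:
    """Percentage of clean records out of all records scanned."""
    return (total_records - errored_records) / total_records * 100

def max_threshold_for(target_dq_score: float) -> float:
    """The 'Error Budget': everything the target score does not cover."""
    return 100 - target_dq_score

# A 98% target leaves a 2% error budget.
print(max_threshold_for(98))   # 2
# 20 bad rows out of 1,000 scanned is exactly a 98.0 DQ Score.
print(dq_score(1000, 20))      # 98.0
```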

### 2. Standard Tests (Null, Unique, Email, Regex, UUID)

For standard tests, the system only needs an upper limit on failures: leave the **Min** as `null` and define only the **Max**.

| If your Business Goal is...                  | Set Max Threshold to... | Outcome                                                        |
| -------------------------------------------- | ----------------------- | -------------------------------------------------------------- |
| **Zero Tolerance** (e.g., Primary Keys)      | `0`                     | Incident triggers if even **one** record is null or duplicate. |
| **High Consistency** (e.g., Customer Emails) | `1`                     | Incident triggers if more than **1%** of data is invalid.      |
| **General Quality** (e.g., Optional Fields)  | `5`                     | Incident triggers if more than **5%** of data is invalid.      |

### 3. Advanced Configuration: Custom SQL Tests

Custom SQL tests allow for more nuanced governance because you can define both a **Min** and a **Max**. This allows you to create "Allowable Ranges."

#### Case A: Simple Ceiling (Alert on "Too Much Error")

Use this when you only care if the error rate gets too high.

* **Min**: `null`
* **Max**: `1`
* **Governance View**: "I expect at least 99% quality. If errors exceed 1%, alert me."

#### Case B: Allowable Range (Alert on "Out of Bounds")

Use this for metrics that should always stay within a specific bracket (e.g., volume fluctuations or expected error rates).

* **Min**: `1`
* **Max**: `5`
* **Governance View**: "It is normal for this data to have between 1% and 5% errors. If the error rate drops below 1% or spikes above 5%, something is wrong with the pipeline."
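Case B's "allowable range" check can be sketched as a two-sided bounds test. This is a hypothetical helper for illustration, not the product's implementation:

```python
def out_of_bounds(error_pct: float, min_pct: float, max_pct: float) -> bool:
    """Alert when the error rate leaves the [min, max] bracket."""
    return error_pct < min_pct or error_pct > max_pct

print(out_of_bounds(0.5, 1, 5))  # True: suspiciously clean, pipeline may be dropping rows
print(out_of_bounds(3.0, 1, 5))  # False: within the expected 1-5% band
print(out_of_bounds(7.2, 1, 5))  # True: error spike
```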

### 4. Quick Setup Summary

| Test Type        | Your DQ Goal | Set Min | Set Max | Resulting Logic               |
| ---------------- | ------------ | ------- | ------- | ----------------------------- |
| **Any Standard** | 100%         | `null`  | `0`     | Error > 0% = Incident         |
| **Any Standard** | 99%          | `null`  | `1`     | Error > 1% = Incident         |
| **Custom SQL**   | 95% to 99%   | `1`     | `5`     | Error < 1% OR > 5% = Incident |
| **Custom SQL**   | Min 95%      | `null`  | `5`     | Error > 5% = Incident         |
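Every row of the summary table follows one pattern: a `null` Min or Max simply disables that side of the check. A sketch of that combined logic, assuming `None` stands in for `null` (the function name is illustrative):

```python
from typing import Optional

def should_trigger_incident(error_pct: float,
                            min_pct: Optional[float],
                            max_pct: Optional[float]) -> bool:
    """Incident fires if the error rate crosses either defined bound."""
    if min_pct is not None and error_pct < min_pct:
        return True
    if max_pct is not None and error_pct > max_pct:
        return True
    return False

print(should_trigger_incident(0.5, None, 0))  # True: any error breaks a 100% goal
print(should_trigger_incident(0.5, None, 1))  # False: under the 1% budget
print(should_trigger_incident(6.0, 1, 5))     # True: above the allowable range
```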

#### Best Practices for Governance

1. **Start Strict**: Set thresholds at `0` or `1` for critical tables.
2. **Adjust for Noise**: If a test triggers too many "false alarms" for non-critical data, increase the **Max** threshold to align with what the business actually considers an "issue."
3. **Review Regularly**: Data evolves. Review your DQ Scores monthly to see if your thresholds still reflect the reality of your data pipeline.
