How Anomaly Detection Works

How Decube detects anomalies in your data — what a monitor is, how the ML model trains, and what determines whether an alert fires.

Understanding how Decube's anomaly detection operates helps you configure monitors that behave predictably and interpret incidents accurately.

What a monitor is

A monitor represents one test applied to one asset. Each monitor produces its own independent incident stream — if you apply three different tests to the same table column, you have three monitors, each of which can open and close incidents independently.

How the ML model learns

For monitors that use Smart Training, Decube runs an ML model against the asset's historical data to learn the normal range for a given metric. The model builds a confidence interval — an expected upper and lower bound — for each scan point. When a new scan falls outside that interval, Decube opens an incident.

The confidence interval widens or narrows based on the Sensitivity setting you choose. See Sensitivity below.
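As a rough illustration of the check described above, the sketch below compares one scan value against an expected lower and upper bound. The names `ConfidenceInterval` and `is_anomalous` are hypothetical and not part of Decube's API. Treating values exactly on a bound as normal is also an assumption; how Decube computes the bounds is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceInterval:
    """Expected bounds for one scan point (hypothetical structure)."""
    lower: float
    upper: float


def is_anomalous(value: float, interval: ConfidenceInterval) -> bool:
    """Return True when the scanned value falls outside the expected bounds.

    Assumption: a value exactly on a bound counts as normal.
    """
    return value < interval.lower or value > interval.upper


# Example: a row-count metric expected to stay between 950 and 1050.
expected = ConfidenceInterval(lower=950.0, upper=1050.0)
print(is_anomalous(1000.0, expected))  # inside the interval
print(is_anomalous(1200.0, expected))  # outside the interval, so an incident would open
```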

Historical lookback by scan frequency

When a new monitor is created (or retrained), the model collects historical data to train on. The amount of history collected depends on the scan frequency you configure:

| Scan frequency | Historical lookback |
| --- | --- |
| Hourly | 7 days |
| Every 6 hours | 30 days |
| Every 12 hours | 60 days |
| Daily | 192 days |
| Weekly | 395 days |

Monitors scan on a schedule after training is complete. During the training period, the monitor is visible in All Monitors but does not produce incidents.
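To get a feel for how much history each setting implies, the sketch below converts the lookback windows listed above into an upper bound on the number of scan points. It assumes one data point per scan interval, which may not match how Decube actually collects history. The dictionary keys and the function name `max_training_points` are hypothetical.

```python
from datetime import timedelta

# (scan interval, historical lookback) per scan frequency, taken from the table above.
LOOKBACK = {
    "hourly": (timedelta(hours=1), timedelta(days=7)),
    "every_6_hours": (timedelta(hours=6), timedelta(days=30)),
    "every_12_hours": (timedelta(hours=12), timedelta(days=60)),
    "daily": (timedelta(days=1), timedelta(days=192)),
    "weekly": (timedelta(weeks=1), timedelta(days=395)),
}


def max_training_points(frequency: str) -> int:
    """Upper bound on scan points in the lookback window.

    Assumption: exactly one data point per scan interval.
    """
    interval, lookback = LOOKBACK[frequency]
    # Dividing two timedeltas with // yields a whole number of intervals.
    return lookback // interval


for name in LOOKBACK:
    print(f"{name}: up to {max_training_points(name)} points")
```

Under that assumption, a weekly monitor trains on far fewer points than an hourly one, even though its lookback window is much longer.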

Sparse data and silent skipping

The ML model requires a minimum amount of valid signal before it can produce a reliable confidence interval. If a scan finds fewer than 5 valid data points in the last 30 observations, the model silently skips the test for that scan. It does not evaluate the collected metrics until the minimum is met.

This is intentional behaviour: firing an alert on insufficient data would produce unreliable signals. However, it means a monitor on a low-volume or infrequently updated table may appear inactive. If your monitors are not producing incidents on a table you expect to have anomalies, check whether the table has enough scan history to meet the threshold.
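The sparse-data rule can be sketched as a simple gate in front of the test. The function name `has_enough_signal` is hypothetical. Treating a missing value (`None`) as the only kind of invalid data point is also an assumption, because what Decube counts as "valid" is not specified here.

```python
from typing import Optional, Sequence

MIN_VALID_POINTS = 5  # minimum valid data points, from the rule above
WINDOW = 30           # number of most recent observations considered


def has_enough_signal(observations: Sequence[Optional[float]]) -> bool:
    """Return True when the recent window holds enough valid points to run the test.

    Assumption: an observation is valid when it is not None.
    """
    recent = observations[-WINDOW:]
    valid_count = sum(1 for value in recent if value is not None)
    return valid_count >= MIN_VALID_POINTS


sparse = [None] * 26 + [10.0, 11.0, 9.5, 10.2]        # 4 valid points in 30
dense = [None] * 25 + [10.0, 11.0, 9.5, 10.2, 10.8]   # 5 valid points in 30

print(has_enough_signal(sparse))  # test is skipped
print(has_enough_signal(dense))   # test can run
```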

Sensitivity

The Sensitivity setting controls how wide or narrow the model's confidence interval is, which in turn controls how easy it is for a data point to fall outside it and trigger an incident.

The scale runs from –5 to +5:

| Value | Effect |
| --- | --- |
| –5 | Widest confidence interval: least sensitive, fewest incidents |
| 0 | Default: balanced sensitivity |
| +5 | Narrowest confidence interval: most sensitive, most incidents |
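The direction of the scale can be illustrated with a toy scaling function. The linear rule below, where each step changes the interval's half-width by 10% of its default, is purely an assumption for illustration; Decube's actual mapping from Sensitivity to interval width is not documented here. The name `interval_half_width` is hypothetical.

```python
def interval_half_width(base_half_width: float, sensitivity: int) -> float:
    """Scale the interval's half-width by the Sensitivity setting.

    Hypothetical rule: each step changes the width by 10% of the base,
    so -5 gives 1.5x the default width and +5 gives 0.5x.
    """
    if not -5 <= sensitivity <= 5:
        raise ValueError("sensitivity must be between -5 and +5")
    return base_half_width * (1.0 - 0.1 * sensitivity)


# Higher sensitivity gives a narrower interval, so more values fall outside it.
for setting in (-5, 0, 5):
    print(setting, interval_half_width(100.0, setting))
```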

You set Sensitivity via the feedback mechanism on an incident. For details, see https://github.com/DecubeIO/decube-docs/blob/public/incident-model-feedback.md

Monitors that do not use the ML model

Not all monitor types use the ML model. Monitors with a fixed threshold configuration (Absolute, Percentage, Positive Range, or Any Range) compare each scan result directly against the bounds you define — no training period, no confidence interval.

For these monitors, the sparse-data rule and training timeout do not apply.

