Set Up Freshness Monitors

Set up Freshness monitors to detect when data stops arriving on its expected schedule.

A Freshness monitor learns when your data normally arrives and alerts you when it doesn't show up as expected. Unlike a simple staleness check, the scheduled Freshness monitor builds a probability model from historical arrival patterns — so it knows not to alert on weekends if your data never arrives on weekends.

Freshness vs Volume: which to use

Goal
Use

Detect that data arrived (or didn't)

Freshness

Detect that the right amount of data arrived

Volume

Table receives data on a known schedule

Freshness

Table grows by a predictable row count per period

Volume

If you need both signals, create one monitor of each type on the same table.

How the scheduled Freshness monitor works

The scheduled Freshness monitor uses an ML model trained on your table's historical write timestamps. The model learns the expected arrival windows for your data and flags a run as anomalous when the probability of data having arrived drops below 50%.

This means:

  • If your pipeline never runs on weekends, the model accounts for this and does not raise weekend incidents.

  • If your data normally arrives between 08:00 and 10:00, a scan at 07:00 that finds no new data will not trigger an alert — the model knows it is too early.

Each incident for a Freshness monitor includes the time_since_last_write metric, which shows exactly how long ago the table last received a write. Use this to triage whether a delay is minor or critical.

How the on-demand Freshness monitor works

The on-demand Freshness monitor performs a simpler presence check: it queries whether any new rows exist since the lookback period you specify at run time. It does not use the ML model or learn arrival patterns — it returns a pass or fail based purely on whether rows are present.

Before you begin

  • You need at least one data source connected and a table available under that source.

  • To use Smart Training, the table must have a timestamp column (or you must provide an SQL expression that produces one). Smart Training is not available in On-Demand mode.


Step 1: Set up

  1. In the Data Quality module, go to the Config tab and select Create.

  2. Select the Freshness monitor card.

  3. In the Create a New Monitor form, select your Source (Schema is optional) and Dataset.

  4. Choose Monitor mode: Scheduled or On-Demand.

  5. Optionally enable Grouped By — select a column to group on and click Validate to confirm the column is valid.

  6. Click Proceed to Monitor Setup.

Grouped By creates one sub-monitor per distinct value in the group column. The Grouped-By Monitors page explains the 100-distinct-values limit and other constraints.


Step 2: Configure — Scheduled monitor

Complete the required fields in the Configure form:

Field
Description

Monitor Name

A descriptive name for this monitor. You can create multiple Freshness monitors on the same table.

Monitor Description

Optional.

Row Creation

How Decube identifies new rows: Timestamp (select a timestamp column), SQL Expression (provide an expression that produces a timestamp), or All Records.

Smart Training

Toggle on to train the model on historical data. Requires Timestamp or SQL Expression row creation. Enabling Smart Training also makes the Lookback Period selectable.

Frequency

How often the monitor scans. See Custom Scheduling for Monitors.

Incident Level

Severity assigned to incidents this monitor opens.

SQL Expression

Use an SQL Expression when your table stores timestamps in a non-standard format (string, Unix epoch, or split date/time columns). The expression must produce a valid timestamp in your data source's SQL dialect.

Format
BigQuery example
PostgreSQL example

String → timestamp

CAST(your_col AS DATETIME)

your_col::timestamp

Unix seconds → timestamp

TIMESTAMP_SECONDS(your_col)

TO_TIMESTAMP(your_col)

Separate date + time columns

PARSE_DATETIME('%F %T', CONCAT(date_col, ' ', time_col))

(date_col || ' ' || time_col)::timestamp

Validating the expression before saving is required.

Notifications

Turn on Notify default channel to route incidents to a specific email or Slack channel. Click Submit to create the monitor.


Step 2: Configure — On-Demand monitor

On-Demand monitors do not use Smart Training, Auto Threshold, frequency scheduling, or Grouped By.

Field
Description

Monitor Name

A descriptive name.

Monitor Description

Optional.

Row Creation

Timestamp or SQL Expression only — All Records is not available in On-Demand mode.

Lookback Period

The time window to check for new rows when the monitor runs.

Incident Level

Severity assigned to incidents this monitor opens.

To finish:

  • Click Save to create the monitor without running it immediately.

  • Click Save and Run to create and run the monitor straight away.

After creation, you can run the monitor again from All Monitors by clicking the ellipsis (︙) and selecting View Monitor, then Run once.


Set Up Volume MonitorsHow Anomaly Detection WorksRetraining Monitors

Last updated