
Set Up Freshness & Volume Monitors

Here's how to set up Scheduled and On-Demand Freshness & Volume monitors



Our Freshness monitors observe the time elapsed since the last update or insertion in a table, learning from the update frequency and alerting if delays arise. Similarly, Volume monitors gauge the number of rows added to a table. If the row count differs significantly from expected patterns based on past data, an incident is triggered.

The process to set up monitoring for Freshness and Volume monitors (both Scheduled and On Demand) is alike. To illustrate, let's walk through setting up monitoring using Freshness as our example.

Begin by selecting the "Freshness" card from the Create subtab on the Config page.

The set-up pop-up for scheduled monitors is the same for both Volume and Freshness monitors.

Once selected, you’ll be redirected to the “Create a New Monitor” form.

  • The “Create a New Monitor” form consists of two steps:

    1. Set up

    2. Configure

The form fields will become available as you select the mandatory options.

Step 1: Set up

  • Select the Source; Schema selection is optional

  • Select the Dataset

  • Choose Monitor mode: Scheduled or On-Demand

  • Enable “Grouped By” (if applicable) by toggling the switch.

  • Select the column for grouping and click Validate.

  • A success message (“Column is valid to be grouped by”) confirms validation.

  • Click “Proceed to Monitor Setup” to move to the next step.

Step 2: Configure: Scheduled Monitor

  • Once you proceed to setup, you’ll reach the “Configure” page, where you can review your previous selections.

In the “Configure” popup, users must complete the required fields to save their preferences and set up their Freshness/Volume monitor. These required fields include:

  • You can create multiple tests for each test type per column/table

  • You can add a test name to differentiate the monitors you create

  • Monitor Name

  • Monitor Description (optional)

  • Row Creation: Select the row creation basis from the given options:

    • Timestamp (select a timestamp column from the dropdown)

    • SQL Expression (validating the expression is required when this option is chosen)

    • All Records

    • Enable Smart Training (optional): Train your monitor on historical data to reduce the training period

SQL Expression

SQL Expressions allow you to convert non-standard timestamp formats into a timestamp when setting up scheduled or on-demand monitors. This lets you use the incremental monitor scanning functionality based on a given timestamp column.

When using SQL Expression:

  • Validating your query is compulsory.

  • Ensure your query is written in the SQL dialect of your connected data source (e.g., BigQuery, PostgreSQL, Snowflake).

  • The SQL Expression is typically applied in the WHERE clause to filter records (see the sketch below).
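To make the WHERE-clause usage concrete, here is a minimal sketch (hypothetical table and column names, PostgreSQL dialect assumed; the exact query decube runs is not documented here) of an expression acting as the timestamp basis for an incremental scan:

    -- Hypothetical incremental-scan filter: the SQL Expression supplies the timestamp.
    SELECT COUNT(*) AS new_rows
    FROM your_table
    WHERE your_string_column::timestamp > TIMESTAMP '2024-01-01 00:00:00';  -- placeholder for the last scanned point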

Common Use Cases & Examples

  • Converting a String Column to a Timestamp

    • Some datasets store timestamps as strings, requiring conversion.

For example:

  • BigQuery: CAST(your_string_column AS DATETIME)

  • PostgreSQL: your_string_column::timestamp

  • Converting Unix Timestamp to a Datetime

    • If timestamps are stored as Unix epoch time (seconds or milliseconds), conversion is required.

For example:

  • BigQuery: TIMESTAMP_SECONDS(your_unix_column)

  • PostgreSQL: TO_TIMESTAMP(your_unix_column)

  • Combining Separate Date and Time Columns

    • Some databases store date and time in separate columns, requiring concatenation.

For example:

  • Use date_column || ' ' || time_column to concatenate the two columns.
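The examples above assume epoch seconds and leave the concatenated value as a string. A few hedged variations follow, using the same hypothetical column names; verify each function against your source's SQL dialect before validating:

    TIMESTAMP_MILLIS(your_unix_column)              -- BigQuery, when the epoch is stored in milliseconds
    TO_TIMESTAMP(your_unix_column / 1000.0)         -- PostgreSQL, milliseconds converted to seconds
    (date_column || ' ' || time_column)::timestamp  -- PostgreSQL, concatenation plus an explicit cast
    DATETIME(date_column, time_column)              -- BigQuery, builds a DATETIME from separate date and time columns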

When Should You Use SQL Expressions?

  • Use SQL Expressions when:

    • Your timestamps are stored as strings instead of actual timestamp values.

    • Your dataset uses Unix timestamps instead of standard date/time formats.

    • You have separate date and time columns that need to be combined.

  • If your data already contains a proper timestamp column, you can simply select it without needing an SQL Expression.

Smart Training

Smart Training automatically learns patterns from historical data to enhance row creation and threshold settings. It helps users define dynamic monitoring parameters by analyzing past trends, improving anomaly detection and data quality insights.

Smart Training is applicable for both Row Creation and Threshold settings.

To activate Smart Training in Row Creation:

  • Users should first select the timestamp column.

  • If SQL Expression is chosen for row creation, users are required to validate the SQL Expression.

When Smart Training is activated for Row Creation, it unlocks the Lookback Period option; the Lookback Period is selectable only while Smart Training is enabled.

Get Notified/Custom Alert

  • To set custom alerts, first turn on the "Notify default channel" toggle. This lets users specify their preferred alert channels, such as email or Slack.

    • Select the desired alert channels from the dropdown.

    • Enter the email address or Slack channel name in the provided field.

  • Finally, specify the Incident Level.

  • Click on Submit to create your monitor successfully.

  • Once the monitor is created, you will be redirected to the ALL MONITORS tab.

Step 2: Configure: On-demand Monitor

Note the key differences for On-Demand monitors:

i. The "Frequency" field does not apply to On-Demand monitors and is therefore omitted.

ii. The "Enable Smart Training" and "Auto Threshold" options are not available when setting up an On-Demand monitor.

iii. "Grouped By" is not available in On-Demand monitor mode.

  • Select On-Demand as the Monitor Mode and click “Proceed to Monitor Setup”.

  • Within the 'Configure' popup, users must complete the required fields to save their preferences and successfully set up their Freshness/Volume Monitor. The required fields include:

  • You can create multiple tests for each test type per column/table

  • You can add a test name to differentiate the monitors you create

  • Monitor Name

  • Monitor Description (optional)

  • Row Creation: Select the row creation basis from the given options:

    • Timestamp (select a timestamp column from the dropdown)

    • SQL Expression (validating the expression is required when this option is chosen)

  • Lookback Period

  • Incident Levels

Custom Notifications: Custom alerts can be configured in the same way as for scheduled monitors; see the Get Notified/Custom Alert section above.

Finalizing On-Demand Monitor Setup

  • Upon completing the required fields to create an On-Demand monitor for Freshness/Volume, users can finish the setup by choosing one of the following confirmation buttons, depending on their preferred use case:

    • Save: Creates the monitor without running it. This suits users who want to set up an On-Demand monitor without running the monitor scan immediately after creation.

    • Save and run: Runs the On-Demand monitor immediately upon creation. To run the monitor again later, users can navigate to All Monitors.

  • After selecting either option, you will be redirected to the ALL MONITORS tab.

Modify Monitoring

To modify an existing monitor:

  1. Click the ellipsis (︙) and select View Monitor.

  2. Click Run once to run the monitor manually.

For a detailed understanding of monitor modes, check out Available Monitor Modes.

For a detailed understanding of Grouped By, check out Grouped-By Monitors.

To learn more about the Frequency setting, see Custom Scheduling For Monitors.

When working with Google BigQuery, refer to the Google BigQuery documentation for further details.

Go to All Monitors.