Incidents Overview

Build trust in your data with proactive monitoring and intelligent incident management.

Transform Reactive Fire-Fighting into Proactive Data Trust

Decube's Data Quality module empowers your team to shift from reactive incident response to proactive data trust. Our ML-powered monitoring system detects anomalies before they impact downstream systems, AI/ML models, and business decisions.

Why Data Quality Monitoring Matters

  • 🚨 Early Detection: Catch data issues before they cascade downstream

  • 🤖 AI-Ready Data: Ensure clean inputs for accurate ML models and AI systems

  • 📊 Business Confidence: Make decisions based on trusted, validated data

  • ⚡ Faster Resolution: Automated alerts enable immediate response to quality issues


Getting Started with Data Quality

🚀 Quick Setup (15 minutes)

  1. Enable Asset Monitoring - Start monitoring your critical tables

  2. Set Up Alert Notifications - Get notified when issues occur

  3. Review Incidents - Learn to manage quality issues


Monitor Types & Capabilities

🔍 Available Monitor Types

Core Monitoring:

  • Freshness - Track data update delays

  • Volume - Detect unexpected row count changes

  • Field Health - Validate column-level quality (nulls, uniqueness, ranges, patterns)

  • Schema Drift - Catch table structure changes

Advanced Monitoring:

  • Custom SQL - Write custom validation logic for specific business rules (see the sketch after this list)

  • Job Failure - Monitor ETL pipeline job execution

  • Grouped-By - Segment monitoring by dimension values
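
For instance, a Custom SQL monitor evaluates a query you write and raises an incident when the result violates your expectation. The sketch below is illustrative only; the table, the rule, and the zero-violations convention are assumptions, not Decube's actual API.

```python
# Minimal sketch of the kind of check a Custom SQL monitor runs: the
# query counts rows that violate a business rule, and any non-zero
# result would raise an incident. Table and rule are illustrative.
import sqlite3

RULE_SQL = """
SELECT COUNT(*)
FROM orders
WHERE total_amount < 0   -- order totals should never be negative
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total_amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 120.0), (2, -5.0), (3, 89.9)])

violations = conn.execute(RULE_SQL).fetchone()[0]
if violations > 0:
    print(f"Rule failed: {violations} violating row(s)")  # -> 1 violating row
```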

🎯 Monitor Modes

  • Scheduled: Continuous monitoring with configurable frequency

  • On-Demand: Manual execution for ad-hoc validation
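
To make the two modes concrete, here is a minimal sketch of how a monitor configuration might be modeled. The field names and cron-style frequency are assumptions for illustration; in Decube these options are set in the UI.

```python
# Hypothetical model of the two monitor modes. Field names are
# illustrative; in Decube these options are configured in the UI.
from dataclasses import dataclass
from typing import Optional

@dataclass
class MonitorConfig:
    name: str
    mode: str                        # "scheduled" or "on_demand"
    frequency: Optional[str] = None  # cron expression, used only when scheduled

# A scheduled monitor runs continuously at a configurable frequency.
scheduled = MonitorConfig("orders_freshness", "scheduled", "0 6 * * *")

# An on-demand monitor runs only when triggered manually.
on_demand = MonitorConfig("adhoc_volume_check", "on_demand")
```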

Understanding Incidents

Data Quality incidents are automatically triggered when monitors detect anomalies or threshold violations. Each incident provides detailed context to help you understand and resolve data quality issues quickly.
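
Conceptually, the trigger is a comparison of a monitored metric against an expected range. The simplified sketch below hard-codes the bounds for illustration; Decube's ML-powered monitors learn the expected range from historical data instead.

```python
# Simplified sketch of how a threshold violation raises an incident.
# Real ML-powered monitors derive the expected range from history;
# the bounds here are hard-coded for illustration.
from datetime import datetime, timezone

def evaluate_metric(metric: str, value: float,
                    lower: float, upper: float):
    """Return an incident record if the value falls outside the bounds."""
    if lower <= value <= upper:
        return None  # healthy: no incident raised
    return {
        "metric": metric,
        "observed": value,
        "expected_range": (lower, upper),
        "status": "open",
        "raised_at": datetime.now(timezone.utc).isoformat(),
    }

# Example: a row-count drop below the expected range opens an incident.
incident = evaluate_metric("daily_row_count", 1_200, lower=9_000, upper=12_000)
print(incident)
```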

Incident Categories

Our monitoring system categorizes incidents into six main types:

| Type | Purpose | Use Case |
| --- | --- | --- |
| Freshness | Data update delays | Critical for real-time dashboards and daily reports |
| Volume | Unexpected row count changes | Detect missing data loads or data pipeline issues |
| Field Health | Column-level data quality | Validate nulls, uniqueness, ranges, and patterns |
| Schema Drift | Table structure changes | Prevent downstream application failures |
| Custom SQL | Business rule violations | Monitor complex business logic and data relationships |
| Job Failure | ETL pipeline failures | Ensure data transformation processes complete successfully |
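
To make one category concrete: a Freshness monitor can be thought of as comparing a table's last update time against a maximum allowed delay. A minimal sketch, with the six-hour SLA assumed for illustration:

```python
# Hypothetical freshness check: the table must have been updated within
# an assumed six-hour SLA, otherwise a Freshness incident would open.
from datetime import datetime, timedelta, timezone

MAX_DELAY = timedelta(hours=6)  # assumed SLA, not a Decube default

def is_fresh(last_updated_at: datetime) -> bool:
    """Return True if the table was updated within the allowed delay."""
    return datetime.now(timezone.utc) - last_updated_at <= MAX_DELAY

print(is_fresh(datetime.now(timezone.utc) - timedelta(hours=8)))  # False
```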

Incident Management Workflow

Incident Overview

The incident dashboard provides a consolidated view of all data quality issues across your organization.

Incident Overview Dashboard

Incident Details & Investigation

Selecting any incident from the Data Quality module opens the Incident Details page, which gives you a deeper understanding of the chosen incident along with its historical trend.

Key Features:

  • 📋 Assignee Management: Assign incidents to team members for accountability

  • 📈 Historical Trends: View patterns and frequency of similar incidents

  • 🔍 Root Cause Analysis: Access detailed metrics and context

  • 📝 Audit Trail: Track all actions and changes in the incident lifecycle

Example of a Custom SQL incident — Add Assignee functionality, with actions tracked in Audit History

On the Incident Details page, you can add an Assignee to the selected incident. Each of these actions is logged and can be reviewed in the Audit History section at the bottom right.
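
Conceptually, each assignment is simply another entry in the incident's audit trail. The sketch below illustrates that bookkeeping; the data model and field names are assumptions, not Decube's actual schema.

```python
# Sketch of assignee changes being recorded in an audit trail.
# The incident structure and field names are illustrative.
from datetime import datetime, timezone

incident = {"id": "INC-42", "assignee": None, "audit_history": []}

def assign(incident: dict, user: str, actor: str) -> None:
    """Assign the incident and log the action for later review."""
    incident["assignee"] = user
    incident["audit_history"].append({
        "action": "assignee_changed",
        "by": actor,
        "to": user,
        "at": datetime.now(timezone.utc).isoformat(),
    })

assign(incident, user="[email protected]", actor="[email protected]")
```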

Incident Status Management

When a monitor detects an issue, it raises an open incident. You can then either:

  • Close an incident when resolved

  • Mute it for a specified time period to prevent alert fatigue

Muting ensures you don't get duplicate alerts when another incident is triggered on the same table/column. Incidents are automatically unmuted after the time period you have set.
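
The muting behaviour amounts to suppressing alerts for a table or column until an expiry time, then resuming automatically. A simplified sketch, with all names assumed for illustration:

```python
# Simplified sketch of mute-with-expiry: alerts for a muted target are
# suppressed until the mute window lapses, then resume automatically.
from datetime import datetime, timedelta, timezone

muted_until: dict = {}  # target (table/column) -> mute expiry time

def mute(target: str, hours: int) -> None:
    muted_until[target] = datetime.now(timezone.utc) + timedelta(hours=hours)

def should_alert(target: str) -> bool:
    """Suppress duplicate alerts while the mute window is active."""
    expiry = muted_until.get(target)
    if expiry and datetime.now(timezone.utc) < expiry:
        return False               # still muted: skip the alert
    muted_until.pop(target, None)  # expired: auto-unmute
    return True

mute("analytics.orders.total_amount", hours=24)
print(should_alert("analytics.orders.total_amount"))  # False while muted
```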

Example of a Custom SQL incident that is currently open, shown in the right panel

To view open, closed, or muted incidents, click 'Apply Filters' on the Incidents Overview page and select the appropriate checkboxes under 'Incident Status'.

Filter incidents by status

Advanced Investigation Tools

Historical Analysis

The History tab shows a list of past test runs, including the metrics from successful scans. This is a quick way to identify which scans failed and the values that caused those failures.

Use Cases:

  • 📊 Pattern Recognition: Identify recurring issues and trends

  • ⚖️ Threshold Validation: Verify if alert thresholds are appropriate

  • 🔄 Root Cause Analysis: Understand what changed to cause failures

Example of a History section for a Field Health incident.

Impact Assessment

Based on downstream lineage, Decube generates a list of impacted assets: the downstream tables, jobs, or dashboards that may be affected by the incident. You can export this list as a CSV and send it to the owners designated in the Catalog.
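
Under the hood, impact assessment is a walk over the downstream lineage graph followed by an export. The toy sketch below illustrates the idea; the lineage map, owner lookup, and file name are placeholders, not Decube's internals.

```python
# Toy sketch of impact assessment: walk the downstream lineage graph from
# the incident's asset, then export the impacted assets as CSV.
# The lineage map and owner lookup are illustrative placeholders.
import csv

LINEAGE = {  # asset -> direct downstream assets
    "raw.orders": ["dbt.stg_orders"],
    "dbt.stg_orders": ["dbt.fct_orders", "dash.revenue"],
    "dbt.fct_orders": ["dash.exec_summary"],
}
OWNERS = {"dash.revenue": "[email protected]", "dash.exec_summary": "[email protected]"}

def downstream(asset: str) -> list:
    """Collect every asset reachable downstream of the given one."""
    seen, stack = [], [asset]
    while stack:
        for child in LINEAGE.get(stack.pop(), []):
            if child not in seen:
                seen.append(child)
                stack.append(child)
    return seen

impacted = downstream("raw.orders")
with open("impacted_assets.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["asset", "owner"])
    for a in impacted:
        writer.writerow([a, OWNERS.get(a, "unassigned")])
```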

Key Benefits:

  • 🎯 Targeted Communication: Know exactly who to notify about data issues

  • 📈 Business Impact: Understand which reports and dashboards may be affected

  • ⚡ Faster Resolution: Prioritize fixes based on downstream impact

Example of the list of assets impacted by the Custom SQL incident

Advanced Features & Configuration

🛠️ Setup & Configuration

Essential Setup Pages:

  • Enable asset monitoring

  • Config Settings

Monitor Configuration:

  • Custom Scheduling For Monitors

📊 Reporting & Analytics

  • Asset Report: Data Quality Scorecard

  • Quality

🔧 Advanced Topics

  • Incident model feedback

Best Practices & Tips

🎯 Getting the Most from Data Quality Monitoring

Start Small, Scale Gradually:

  1. Begin with critical tables - Focus on business-critical data assets first

  2. Use default settings - Start with out-of-the-box configurations

  3. Monitor feedback - Adjust thresholds based on false positive rates

  4. Expand coverage - Gradually add more tables and custom rules

Alerting Strategy:

  • ⚠️ High Priority: Real-time alerts for business-critical data

  • 📧 Medium Priority: Daily digest for operational monitoring

  • 📋 Low Priority: Weekly reports for trend analysis

Team Collaboration:

  • 👤 Assign owners to critical data assets and monitors

  • 📝 Document context in incident descriptions and monitor names

  • 🔄 Regular reviews of monitor effectiveness and threshold accuracy

💡 Pro Tips

  • Use Custom SQL monitors for complex business rule validation

  • Set up Grouped-By monitoring for dimension-based quality checks

  • Leverage Smart Training to reduce false positives

  • Export incident impact lists to communicate with stakeholders


Need Help? Contact our support team at [email protected] for help with data quality monitoring setup and optimization.
