Frequently Asked Questions (FAQ)

Find quick answers to common questions about our platform, features, and support. If you can’t find what you’re looking for, feel free to reach out to our support team.

Data Quality

Why isn’t my monitor triggering any Incidents?

Check if any tests are assigned to the monitor. Monitors need at least one failing test to trigger an incident.

Which monitors provide an incident preview when generating incidents?

  • Only Scheduled Field Health monitors provide an incident preview.

  • On-demand monitors do not have an incident preview, as they are designed for one-time runs based on user configuration.

    • Decube does not store any raw data for previews. Each preview is a one-time query against your data.

    • When setting up an On-demand test, the user sets a lookback period of x hours or days. For example, with a lookback period of 1 hour, the monitor checks the data from the past hour and raises an incident if it finds any issues.

    • As time passes, the incident preview for an On-demand test may no longer accurately represent the data that triggered the incident.
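
To picture the lookback behaviour described above, here is a minimal sketch that selects only the rows whose timestamp falls inside the lookback window, measured back from the run time. The function name `rows_in_lookback` and the row layout are hypothetical and are not part of Decube. The sketch only illustrates why the same query, run later, can return different data.

```python
from datetime import datetime, timedelta

def rows_in_lookback(rows, run_time, lookback):
    """Return rows whose 'ts' lies in [run_time - lookback, run_time]."""
    window_start = run_time - lookback
    return [r for r in rows if window_start <= r["ts"] <= run_time]

# Hypothetical sample rows; 'ts' stands in for a timestamp column.
rows = [
    {"id": 1, "ts": datetime(2024, 1, 1, 9, 30)},
    {"id": 2, "ts": datetime(2024, 1, 1, 10, 45)},
]

run_time = datetime(2024, 1, 1, 11, 0)
checked = rows_in_lookback(rows, run_time, timedelta(hours=1))
print([r["id"] for r in checked])  # [2]
```

Running the same check two hours later with the same 1-hour lookback would return no rows, which is why a preview generated after the fact may not match what the monitor originally saw.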


Catalog

How often does Decube refresh the data catalog?

Decube refreshes the Catalog on an hourly basis. Throughout the hour, we perform automated metadata ingestion on your data sources to keep your Catalog updated.

Why are some of my columns not appearing in the Field Statistics?

The current version of our profiler does not support the profiling of date, time and datetime columns. If this is something you wish to see in the future, please let us know via the Live chat.


Config

What are the differences between enabled and disabled Smart Training?

When Smart Training is enabled (e.g., for Row Creation):

  • The monitor immediately scans historical data based on available records.

  • It uses the confidence level to determine an appropriate threshold.

  • This allows the test to be created and run quickly, often without needing to wait for multiple executions.

When Smart Training is disabled:

  • The monitor must run a few times to collect enough data to establish a baseline.

  • This can take several days, as it doesn’t scan historical data retrospectively.

  • During this period, the test may be marked as "Skipped" until enough data is available.

We recommend enabling Smart Training to accelerate the learning process and reduce the time needed to generate meaningful test results.
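
The source does not describe how the confidence level is turned into a threshold. As a purely illustrative sketch, the example below assumes the historical values are roughly normally distributed and derives a two-sided band around their mean. The function `threshold_band` and the sample row counts are hypothetical and should not be read as Decube's actual method.

```python
from statistics import NormalDist, mean, stdev

def threshold_band(history, confidence=0.95):
    """Return (lower, upper) bounds around the historical mean.

    Assumes roughly normal data; an illustrative assumption only.
    """
    if len(history) < 2:
        return None  # not enough history yet, so no threshold can be set
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-score
    mu, sigma = mean(history), stdev(history)
    return (mu - z * sigma, mu + z * sigma)

daily_row_counts = [1000, 1020, 980, 1010, 990]  # hypothetical history
print(threshold_band(daily_row_counts))
print(threshold_band([1000]))  # None
```

With historical data available up front, a band can be computed immediately. With only one observation, the function returns `None`, which loosely mirrors the "Skipped" state while a baseline is still being collected.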

Why is the test status marked as Skipped if Smart Training is not enabled?

Even though the monitor is running, the test status may appear as "Skipped" because no historical data has been collected yet. Without Smart Training enabled, the system requires a few days to gather enough historical data before it can generate and execute the test. Until sufficient data is available, the test cannot be performed, so Decube marks it as "Skipped".

What does the Freshness Monitor actually do?

The Freshness Monitor is designed to track how frequently a table is updated, not just when it was first created.

Does the Freshness Monitor only check based on day-1 of the row creation column? What if our table is updated multiple times a day?

If your table is updated multiple times a day (e.g., hourly), the Freshness Monitor compares the last scan time with the most recent update timestamp of the table. This allows you to detect whether data is being refreshed at the expected frequency.

Can we check same-day freshness using a scheduled monitor?

Yes, you can monitor same-day freshness by configuring the monitor to run on a schedule (e.g., hourly or every few hours). The monitor validates whether the table has been updated within the defined freshness threshold.
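
The freshness check described above can be pictured as a simple comparison between the scan time and the table's most recent update. This is a minimal sketch, not Decube's implementation. The function `is_fresh` and its parameters are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta

def is_fresh(last_updated, scan_time, max_age):
    """True if the table was updated within max_age before scan_time."""
    return scan_time - last_updated <= max_age

scan_time = datetime(2024, 1, 1, 12, 0)
threshold = timedelta(hours=1)  # hypothetical freshness threshold

print(is_fresh(datetime(2024, 1, 1, 11, 30), scan_time, threshold))  # True
print(is_fresh(datetime(2024, 1, 1, 9, 0), scan_time, threshold))    # False
```

Running such a check on an hourly schedule is what makes same-day staleness visible: a table last updated at 09:00 fails a 1-hour threshold at the 12:00 scan.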

How do the monitors for Schema Drift and Job Failures work?

Schema Drift

  • Decube begins by ingesting metadata on a scheduled basis from connected data sources.

  • It then compares the latest schema from the current scan with the schema from the previous scan.

  • If any structural changes are detected — such as columns being added, removed, renamed, or data types being changed — a Schema Drift incident is automatically created.
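
The comparison between the previous and current scan can be sketched as a diff of two column-to-type mappings. This is an illustration under assumptions: the function `diff_schemas` and the dictionary layout are hypothetical, and the sketch reports a renamed column as one removal plus one addition, since recognising renames needs extra heuristics that are not covered here.

```python
def diff_schemas(previous, current):
    """Compare two {column_name: data_type} mappings."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    type_changed = sorted(
        col for col in set(previous) & set(current)
        if previous[col] != current[col]
    )
    return {"added": added, "removed": removed, "type_changed": type_changed}

# Hypothetical schemas from two consecutive metadata scans.
previous = {"id": "int", "email": "varchar", "amount": "int"}
current = {"id": "int", "email_address": "varchar", "amount": "decimal"}

changes = diff_schemas(previous, current)
print(changes)
# {'added': ['email_address'], 'removed': ['email'], 'type_changed': ['amount']}

has_drift = any(changes.values())  # True when any structural change exists
print(has_drift)  # True
```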

Job Execution

  • Decube monitors data job executions by collecting metadata that includes the run results, that is, whether each job has passed or failed.

  • For example, Decube ingests the dbt Core manifest stored in S3, then uses this manifest along with job metadata to build data lineage. During each scheduled metadata ingestion (e.g., hourly), Decube reads the job run status to check for any failures.

  • This enables teams to proactively monitor their pipelines and get notified when a job fails, helping to maintain the reliability of data operations.
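
Reading job run statuses during each ingestion can be sketched as a filter over run records. The record layout, the status values, and the function `failed_jobs` below are hypothetical placeholders, not the actual dbt or Decube metadata format.

```python
def failed_jobs(runs):
    """Return the names of jobs whose reported status is a failure."""
    failure_statuses = {"error", "fail"}  # assumed status labels
    return [run["job"] for run in runs if run["status"] in failure_statuses]

# Hypothetical run metadata collected during one scheduled ingestion.
runs = [
    {"job": "stg_orders", "status": "success"},
    {"job": "fct_revenue", "status": "error"},
]

print(failed_jobs(runs))  # ['fct_revenue']
```

Any non-empty result would be the point at which a notification or incident is raised.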

In the Data Catalog, the default sorting order is Relevance. How is this calculated? Can it be changed to A-Z?

The Relevance sorting is based on several search factors, including how well the search term matches metadata fields such as the asset’s name, description, and more. This is powered by the Elasticsearch engine, a widely used industry-standard search technology.

We default to Relevance to ensure more accurate and useful search results. If we defaulted to A-Z, the top results would be sorted alphabetically, not by how closely they match your search query — making it harder to find what you're actually looking for.

Currently, it's not possible to change the default sorting order, but you can manually sort by A-Z after performing a search.
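
To illustrate the difference between Relevance and A-Z sorting, the sketch below builds an Elasticsearch-style request body as a plain Python dictionary. The `multi_match` query and the `sort` clause are standard Elasticsearch query DSL, but the field names (`name`, `description`, `name.keyword`), the boost on `name`, and the function `build_search_body` are assumptions for illustration. The query Decube actually sends is not documented in this FAQ.

```python
def build_search_body(term, sort_az=False):
    """Build a search request body; results rank by score unless sort_az."""
    body = {
        "query": {
            "multi_match": {
                "query": term,
                # "name^3" weights matches on the name above the description.
                "fields": ["name^3", "description"],
            }
        }
    }
    if sort_az:
        # An explicit sort replaces the default relevance-score ordering.
        body["sort"] = [{"name.keyword": "asc"}]
    return body

print(build_search_body("orders"))
print(build_search_body("orders", sort_az=True))
```

Without a `sort` clause, Elasticsearch orders hits by relevance score, so the closest matches appear first. Adding the alphabetical sort keeps the same matches but ignores how closely each one fits the search term.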


Data Governance

🧐 Is PII data masked in Decube?

Yes. Once you have applied a classification to PII data, it is masked in Decube when a user views the Preview through the Catalog.
