Set Up Field Health Monitors

Follow these steps to configure field health monitors for specific field tests. Monitors can run in either Scheduled or On-Demand mode.

NOTE: On-Demand monitors are not available for the "Cardinality" test.

Enabling Field Health Monitoring

To enable field health monitoring:

  1. Navigate to the Config landing page.

  2. Select the “Field Health” card.

[Image: Selecting the Field Health card]

You can also enable field health monitoring from the "Monitors" tab of the "Asset Details" section in the Data Catalog module.

For more details, see Catalog: Add/Modify Monitor.

After you select the card, you are redirected to the “Create a New Monitor” form.

  • The “Create a New Monitor” form consists of two steps: Setup and Configure.

The form fields will become available as you select the mandatory options.

Step 1: Setup

  • Choose the test type from the dropdown.

    • You can choose which test types to run. The following test types are available:

      1. Null: Monitors null values with configurable thresholds (Absolute/Percentage/Auto modes).

      2. Not Null: Verifies that all values in a column are non-null, ensuring that critical fields contain valid data.

      3. Null%: Legacy test. Use "Null" with Percentage mode instead.

      4. Unique: Monitors duplicate values with configurable thresholds (Absolute/Percentage/Auto modes).

      5. Is Unique: Legacy test. Use "Unique" with Percentage mode instead.

      6. Unique%: Legacy test. Use "Unique" with Percentage mode instead.

      7. Average: Computes the average of the numeric values in a column, with range thresholds or auto-learning.

      8. Min: Identifies the minimum value in a column, with range thresholds or auto-learning.

      9. Max: Identifies the maximum value in a column, with range thresholds or auto-learning.

      10. Cardinality: Measures the number of distinct values in a column, which helps you understand data variability.

      11. String Length: Validates string lengths with range thresholds or auto-learning.

      12. Email: Monitors invalid email addresses with configurable thresholds (Absolute/Percentage/Auto modes).

      13. Is Email: Legacy test. Use "Email" with an appropriate threshold mode instead.

      14. UUID: Monitors invalid UUIDs with configurable thresholds (Absolute/Percentage/Auto modes).

      15. Is UUID: Legacy test. Use "UUID" with an appropriate threshold mode instead.

      16. Regex Match: Validates values against a user-defined pattern with configurable thresholds (Absolute/Percentage/Auto modes).

  • Important note for Synapse/SQL Server users:

When setting up a Regex Match test for data quality monitoring, note that Synapse/SQL Server requires string-matching patterns instead of standard regex syntax.

• Ensure that the pattern you enter aligns with the string-matching syntax supported by Synapse/SQL Server.
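The note above says Synapse/SQL Server expects string-matching patterns rather than standard regex. Assuming this refers to T-SQL LIKE-style wildcards (% for any run of characters, _ for one character, [...] for a character set), the Python sketch below translates such a pattern into a regular expression, which can help you check what a pattern will accept before entering it. like_to_regex is a hypothetical helper, not part of the product.

```python
import re


def like_to_regex(like_pattern):
    """Translate a T-SQL LIKE-style pattern into a compiled Python regex.

    Hypothetical helper. Covers % (any run of characters), _ (exactly one
    character) and [...] / [^...] character sets. ESCAPE clauses and
    collation-dependent case sensitivity are deliberately not modelled.
    """
    parts = []
    i = 0
    while i < len(like_pattern):
        ch = like_pattern[i]
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        elif ch == "[":
            end = like_pattern.find("]", i + 1)
            if end > i + 1:  # non-empty set such as [0-9] or [^a-z]
                body = like_pattern[i + 1:end]
                # Escape characters that are special inside a regex class.
                body = body.replace("\\", "\\\\").replace("[", "\\[")
                parts.append("[" + body + "]")
                i = end + 1
                continue
            parts.append(re.escape(ch))  # unmatched or empty bracket: literal
        else:
            parts.append(re.escape(ch))
        i += 1
    # DOTALL lets the wildcards also match newlines (assumed to mirror LIKE).
    return re.compile("".join(parts), re.DOTALL)


five_digits = like_to_regex("[0-9][0-9][0-9][0-9][0-9]")
print(bool(five_digits.fullmatch("12345")))  # True
print(bool(five_digits.fullmatch("1234a")))  # False
```

For example, LIKE patterns have no repetition counts, so a regex such as ^[0-9]{5}$ has to be spelled out as [0-9][0-9][0-9][0-9][0-9].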

If the preset field tests do not cover the check you need, you can create a test with a custom SQL script instead. For details, see Custom SQL Monitor.
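For reference, the metric behind each preset test listed earlier can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names, the simplified email and UUID patterns, and skipping null values in every test except Null are illustrative choices, not the product's documented implementation.

```python
import re
from statistics import mean

# Hypothetical sketch: the product's actual metric definitions are not documented.
# Simplified patterns, assumed for illustration (the email check is not RFC 5322).
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def null_count(values):
    """Null: how many values are missing."""
    return sum(1 for v in values if v is None)


def duplicate_count(values):
    """Unique: how many non-null values repeat an earlier value."""
    non_null = [v for v in values if v is not None]
    return len(non_null) - len(set(non_null))


def cardinality(values):
    """Cardinality: how many distinct non-null values appear."""
    return len({v for v in values if v is not None})


def numeric_stats(values):
    """Average / Min / Max over the non-null numeric values."""
    nums = [v for v in values if v is not None]
    return {"average": mean(nums), "min": min(nums), "max": max(nums)}


def string_length_range(values):
    """String Length: (shortest, longest) length among non-null strings."""
    lengths = [len(v) for v in values if v is not None]
    return min(lengths), max(lengths)


def invalid_count(values, pattern):
    """Email / UUID / Regex Match: non-null values that do not fully match."""
    return sum(1 for v in values if v is not None and not pattern.fullmatch(v))


if __name__ == "__main__":
    emails = ["a@x.com", "bad", None, "a@x.com"]
    print(null_count(emails), duplicate_count(emails), cardinality(emails))  # 1 1 2
    print(invalid_count(emails, EMAIL_RE))  # 1
```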

  • Select the data source. Filtering by schema is optional.

  • Select the dataset and the column you want to monitor.

  • Choose the monitor mode: Scheduled or On-Demand.

For details on monitor modes, see Available Monitor Modes.

  • Enable “Grouped By” (if applicable) by toggling the switch.

  • Select the column for grouping and click Validate.

  • A success message (“Column is valid to be grouped by”) confirms validation.

[Image: Grouped By disabled and enabled]

  • Click “Proceed to Monitor Setup” to move to the next step.
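Conceptually, enabling Grouped By evaluates the test separately for each value of the grouping column instead of once over the whole column. The sketch below assumes that behavior and shows it for a null-percentage check; the function name, the dict-based row format, and the sample columns are hypothetical.

```python
from collections import defaultdict


def null_pct_by_group(rows, column, group_by):
    """Null percentage of `column`, computed separately per `group_by` value.

    Hypothetical sketch: rows are dicts, and each group gets its own metric,
    which could then be compared against the monitor's threshold.
    """
    totals = defaultdict(int)
    nulls = defaultdict(int)
    for row in rows:
        key = row[group_by]
        totals[key] += 1
        if row[column] is None:
            nulls[key] += 1
    return {key: 100.0 * nulls[key] / totals[key] for key in totals}


rows = [
    {"region": "EU", "email": None},
    {"region": "EU", "email": "a@x.com"},
    {"region": "US", "email": "b@x.com"},
]
print(null_pct_by_group(rows, "email", "region"))  # {'EU': 50.0, 'US': 0.0}
```

Measured over the whole column, this sample has about 33.3% nulls; grouping shows that all missing values sit in the EU group.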

[Image: Choosing the monitor mode]

Step 2: Configure: Scheduled Monitor

  • After you click “Proceed to Monitor Setup”, the “Configure” page opens, where you can review your previous selections.

[Image: Overview of previous selections]

In the "Configure" popup, complete the following fields to save your preferences and set up the field health monitor. Fields marked optional can be left blank.

  • You can create multiple tests of each test type per column/table.

  • Test name: add a test name to distinguish the monitors you create.

  • Monitor Name

  • Monitor Description (optional)

  • Row Creation: choose how rows are selected from the following options:

    • Timestamp (select a timestamp column from the dropdown)

    • SQL Expression (the expression must be validated)

    • All Records

    • Enable Smart Training (optional): train your monitor on historical data to shorten the training period.

  • Frequency (learn more in Custom Scheduling For Monitors)

  • Threshold Mode: Select the appropriate threshold type based on your test:

    • Absolute (Row Count): Set thresholds based on number of rows (e.g., "fail when > 100 invalid rows")

      • Available for: Null, Unique, Email, UUID, Regex Match

      • Input: Non-negative integer values for min/max bounds

    • Percentage: Set thresholds based on percentage (e.g., "fail when > 5% nulls")

      • Available for: Null, Unique, Email, UUID, Regex Match, Null%, Unique%

      • Input: integer values from 0 to 100 for min/max bounds

    • Auto (Machine Learning): Let the system learn acceptable patterns from historical data

      • Only available for scheduled monitors

      • Available for: Null, Unique, Email, UUID, Regex Match, Average, Min, Max, String Length, Null%, Unique%

      • No manual threshold input required

    • Range: Set numeric range thresholds for statistical tests

      • Available for: Average, Min, Max, String Length

      • Input: Min/max bounds (at least one required)

  • Set Threshold: Configure min/max values based on selected threshold mode

    • At least one bound (min or max) is required

    • For percentage mode: values must be 0-100

    • For absolute mode: values must be non-negative integers

    • For range mode: constraints depend on the test type

  • Quality Dimension (optional): for details, refer to supported test types
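The threshold rules above can be summarized as a small validator plus a breach check. This Python sketch encodes only the rules stated in this document (which tests each mode supports, at least one bound, non-negative integers for Absolute, 0 to 100 for Percentage, Auto only for scheduled monitors); the function names and string mode keys are hypothetical, and the test-specific constraints for Range mode are not modelled.

```python
# Hypothetical sketch of the threshold rules listed above; not the product's code.
ALLOWED_TESTS = {
    "absolute": {"Null", "Unique", "Email", "UUID", "Regex Match"},
    "percentage": {"Null", "Unique", "Email", "UUID", "Regex Match", "Null%", "Unique%"},
    "auto": {"Null", "Unique", "Email", "UUID", "Regex Match",
             "Average", "Min", "Max", "String Length", "Null%", "Unique%"},
    "range": {"Average", "Min", "Max", "String Length"},
}


def _is_int(value):
    return isinstance(value, int) and not isinstance(value, bool)


def threshold_errors(test, mode, min_value=None, max_value=None, scheduled=True):
    """Return a list of problems with a threshold setting (empty list = valid)."""
    errors = []
    if test not in ALLOWED_TESTS.get(mode, set()):
        errors.append(f"{mode} mode is not available for the {test} test")
    if mode == "auto":
        if not scheduled:
            errors.append("Auto mode is only available for scheduled monitors")
        return errors  # Auto needs no manual bounds
    bounds = [b for b in (min_value, max_value) if b is not None]
    if not bounds:
        errors.append("at least one bound (min or max) is required")
    for b in bounds:
        if mode == "absolute" and not (_is_int(b) and b >= 0):
            errors.append(f"absolute bounds must be non-negative integers: {b!r}")
        if mode == "percentage" and not (_is_int(b) and 0 <= b <= 100):
            errors.append(f"percentage bounds must be integers from 0 to 100: {b!r}")
    return errors


def breaches(observed, min_value=None, max_value=None):
    """True when the observed metric falls outside the configured bounds."""
    too_low = min_value is not None and observed < min_value
    too_high = max_value is not None and observed > max_value
    return too_low or too_high


# "Fail when > 5% nulls": 3 nulls in 40 rows is 7.5%, which breaches max=5.
print(threshold_errors("Null", "percentage", max_value=5))  # []
print(breaches(100 * 3 / 40, max_value=5))                  # True
```

The same observed count is compared directly in Absolute mode and as a share of rows in Percentage mode, which is why the two modes accept different input ranges.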

When using an SQL Expression for Row Creation, you must validate the expression. Write it in the SQL dialect of your linked data source, for example:

Google BigQuery - CAST(your_string_column AS DATETIME)

PostgreSQL - your_timestamp_column::timestamp

For Google BigQuery, refer to the BigQuery documentation here for further details.

Smart Training requires a Row Creation option to be selected.

To enable Smart Training in Row Creation:

  • First, select the timestamp column.

  • If you choose SQL Expression for Row Creation, you must validate the SQL expression.
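The Smart Training prerequisites can be expressed as a short check. In this Python sketch the function name and mode/option strings are hypothetical; it encodes the rules stated in this document (scheduled monitors only, a timestamp column selected, or a validated SQL expression) and, as a labeled assumption, treats All Records as ineligible because the document does not say whether it supports Smart Training.

```python
# Hypothetical sketch of the Smart Training prerequisites described above.
def smart_training_errors(monitor_mode, row_creation, timestamp_column=None,
                          sql_expression_validated=False):
    """Return reasons Smart Training cannot be enabled (empty list = allowed)."""
    errors = []
    if monitor_mode != "scheduled":
        errors.append("Smart Training is only available for scheduled monitors")
    if row_creation == "timestamp":
        if not timestamp_column:
            errors.append("select a timestamp column first")
    elif row_creation == "sql_expression":
        if not sql_expression_validated:
            errors.append("validate the SQL expression first")
    else:
        # Assumption: training on history needs a time-based row selection,
        # so "All Records" or no selection is treated as ineligible here.
        errors.append("choose Timestamp or SQL Expression for Row Creation")
    return errors


print(smart_training_errors("scheduled", "timestamp", timestamp_column="created_at"))  # []
print(smart_training_errors("scheduled", "sql_expression"))
# ['validate the SQL expression first']
```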

Regex Match Test: When configuring a Regex Match test, you specify the regex pattern in a separate field of the test configuration, not in the threshold. The threshold controls how many non-matching values trigger an incident.
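To illustrate the separation between pattern and threshold, this Python sketch takes the pattern and the threshold as independent inputs and compares the number (or percentage) of non-matching values against the threshold. Several details are assumptions rather than documented behavior: a value passes only if the whole value matches, nulls are skipped, and an incident fires when the observed value exceeds max_allowed. The function name and sample codes are hypothetical.

```python
import re


def regex_match_result(values, pattern, mode="absolute", max_allowed=0):
    """Evaluate a Regex Match test: the pattern and the threshold are separate inputs.

    Hypothetical sketch. Assumptions: a value passes only if the whole value
    matches, nulls are skipped, and an incident fires above max_allowed.
    """
    regex = re.compile(pattern)
    checked = [v for v in values if v is not None]
    non_matching = sum(1 for v in checked if not regex.fullmatch(v))
    if mode == "percentage":
        observed = 100.0 * non_matching / len(checked) if checked else 0.0
    else:  # absolute: a plain row count
        observed = non_matching
    return {"non_matching": non_matching, "observed": observed,
            "incident": observed > max_allowed}


codes = ["AB-123", "AB-999", "XY12", None]
# Absolute, max 0: one non-matching row ("XY12") raises an incident.
print(regex_match_result(codes, r"[A-Z]{2}-\d{3}"))
# Percentage, max 50: 1 of 3 checked values (about 33%) stays under the threshold.
print(regex_match_result(codes, r"[A-Z]{2}-\d{3}", mode="percentage", max_allowed=50))
```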

[Image: Overview of setting up the frequency]
[Image: Overview of monitor setup with supported quality dimensions]

  • To set custom alerts, first turn on the "Notify default channel" toggle. This lets you specify the alert channels you want, such as email addresses or Slack channels.

    • Select the desired alert channels from the dropdown.

    • Enter the email address or Slack channel name in the field.

  • Finally, specify the Incident Level.

[Image: Setting up notifications/custom alerts]

  • Click Submit to create the monitor.

  • After the monitor is created, you are redirected to the ALL MONITORS tab.
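The alert settings described above can be modeled as a small data structure with a validation step. In this Python sketch, AlertSettings, alert_errors, the (kind, target) channel tuples, the simplified email check, and the incident level name "High" are all hypothetical; the only rules taken from this document are that custom alerts need the "Notify default channel" toggle, that channels are email addresses or Slack channels, and that an incident level is specified.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Simplified email check, assumed for illustration only.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


@dataclass
class AlertSettings:
    """Hypothetical model of the alert section of the Configure form."""
    notify_default_channel: bool = False
    custom_channels: List[Tuple[str, str]] = field(default_factory=list)  # (kind, target)
    incident_level: Optional[str] = None


def alert_errors(settings):
    """Return problems with the alert settings (empty list = valid)."""
    errors = []
    if settings.custom_channels and not settings.notify_default_channel:
        errors.append('turn on "Notify default channel" before adding custom alerts')
    for kind, target in settings.custom_channels:
        if kind == "email":
            if not EMAIL_RE.fullmatch(target):
                errors.append(f"invalid email address: {target!r}")
        elif kind == "slack":
            if not target.strip():
                errors.append("Slack channel name must not be empty")
        else:
            errors.append(f"unsupported channel type: {kind!r}")
    if not settings.incident_level:
        errors.append("specify an incident level")
    return errors


# "High" is a placeholder incident level name, not a documented value.
ok = AlertSettings(True, [("email", "ops@example.com"), ("slack", "#dq-alerts")], "High")
print(alert_errors(ok))  # []
```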

Step 2: Configure: On-demand Monitor

Note: Key differences from Scheduled monitors:

i. The "Frequency" field does not apply to On-Demand monitors and is omitted.

ii. The "Enable Smart Training" and "Auto" threshold options are not available for On-Demand monitors.

iii. Grouped By is not available in On-Demand mode.
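Taken together with the earlier note about the Cardinality test, these differences can be checked before saving an On-Demand monitor. This Python sketch uses a plain dict with hypothetical keys (test, frequency, smart_training, threshold_mode, group_by); it only encodes the restrictions stated in this document.

```python
# Hypothetical sketch of the On-Demand restrictions listed in this document.
def on_demand_errors(config):
    """Return settings that are not allowed for an On-Demand monitor."""
    errors = []
    if config.get("test") == "Cardinality":
        errors.append("the Cardinality test is not available On-Demand")
    if config.get("frequency") is not None:
        errors.append("Frequency does not apply to On-Demand monitors")
    if config.get("smart_training"):
        errors.append("Smart Training is not available On-Demand")
    if config.get("threshold_mode") == "auto":
        errors.append("Auto threshold is not available On-Demand")
    if config.get("group_by"):
        errors.append("Grouped By is not available On-Demand")
    return errors


print(on_demand_errors({"test": "Null", "threshold_mode": "percentage"}))  # []
print(on_demand_errors({"test": "Cardinality", "threshold_mode": "auto"}))  # two errors
```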

  • Select On-Demand as the Monitor Mode and click “Proceed to Monitor Setup”.

[Image: Choosing the monitor mode]

In the "Configure" popup, complete the following fields to save your preferences and set up the field health monitor:

  • You can create multiple tests of each test type per column/table.

  • Test name: add a test name to distinguish the monitors you create.

  • Monitor Name

  • Monitor Description (optional)

  • Row Creation: choose how rows are selected from the following options:

    • Timestamp (select a timestamp column from the dropdown)

    • SQL Expression (the expression must be validated)

    • All Records

  • Lookback Period

  • Threshold Mode: Select the appropriate threshold type:

    • Absolute (Row Count): For Null, Unique, Email, UUID, Regex Match

    • Percentage: For Null, Unique, Email, UUID, Regex Match, Null%

    • Range: For Average, Min, Max, String Length

    • Note: Auto threshold is not available for on-demand monitors

  • Set Threshold: Configure min/max values (at least one required)

  • Quality Dimension (optional): for details, refer to supported test types

  • Incident Level

[Image: On-Demand monitor configuration]

Custom Notifications: You can configure custom alerts the same way as for Scheduled monitors.

[Image: Get notified/custom alert]

Finalizing On-Demand Monitor Setup

  • After completing the required fields, finish the setup by choosing one of the following buttons, depending on your use case:

    • Save: Creates the monitor without running it. Use this option to set up an On-Demand monitor now and run the scan later.

    • Save and run: Creates the monitor and runs it immediately. To run the monitor again later, go to the ALL MONITORS tab.

    • After you choose either option, you are redirected to the ALL MONITORS tab.

Modifying a Monitor

To modify an existing monitor from the ALL MONITORS tab:

  1. Click the ellipsis (︙) next to the monitor and select View Monitor.

  2. Click Run once to run the monitor manually.

[Image: Modifying a monitor from ALL MONITORS]
