Set Up Field Health Monitors
Here's how you set up monitors for specific field tests, both On Demand and Scheduled.
NOTE: On Demand monitors are not available for the "Cardinality" test.
You can enable monitoring for field health by selecting the "Field Health" card in the Config landing page.
You can also activate field health monitoring within the "Asset Details" section of the Data Catalog Module. This can be done via the "Monitors" tab by selecting the "Configure Monitors" option.
After selecting the "Field Health" card, you must first choose a data source. You can then pick a Schema (optional), followed by your primary table.
After you choose your primary table, its columns are loaded so that you can set up monitoring (both Scheduled and On Demand). The page also lists the active and inactive Scheduled and On Demand monitors that were previously configured for the chosen column.
On the list of tables loaded from the selected data source, users are able to set up/modify both On-Demand and Scheduled monitors.
For Scheduled Monitors:
i. For tables that have never had Scheduled monitoring configured, a "Scheduled Monitoring" button is displayed when you initiate setup for the first time.
ii. For tables previously set up for Scheduled monitoring, the "Scheduled Monitoring" button changes to a "Modify Scheduled" state with a green indicator.
For On-Demand Monitors:
i. For tables that have never had On-Demand monitoring configured, an "On-demand monitoring" button is displayed when you initiate setup for the first time.
ii. For tables previously set up for On-Demand monitoring, the "On-demand monitoring" button changes to a "Modify On Demand" state.
You can choose which test types to run. The system provides the following test types:
Null%: Measures the percentage of null values in a column to identify data gaps or incomplete records.
Not Null: Verifies that all values in a column are non-null, ensuring that critical fields contain valid data.
Unique%: Calculates the percentage of unique values in a column, helping detect redundancy or duplication.
Unique: Confirms whether all values in a column are unique, ensuring data consistency where required.
Average: Computes the average of numerical values in a column, useful for analyzing trends or outliers.
Min: Identifies the minimum value in a column to detect anomalies or validate data boundaries.
Max: Identifies the maximum value in a column to verify data limits or detect errors.
Cardinality: Measures the number of distinct values in a column, useful for understanding data variability or relationships.
String Length: Validates the length of strings in a column to ensure compliance with formatting rules or constraints.
Is Email: Checks if the values in a column conform to a valid email format, ensuring proper data integrity for email fields.
Is UUID: Verifies whether the values in a column match a valid UUID (Universally Unique Identifier) format.
Matches Regex: Validates values in a column against a user-defined regular expression to ensure they match specific patterns or rules.
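To make the test definitions above concrete, here is a minimal Python sketch that computes several of them over an in-memory list of column values. It is not the product's implementation. The function and key names are hypothetical, the email and UUID patterns are deliberately simple, and the exact definitions (for example, Unique% as distinct non-null values divided by total rows) are assumptions for illustration.

```python
import re
from statistics import mean

# Simplified patterns, assumed for illustration only.
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"
UUID_PATTERN = r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"


def field_health(values):
    """Compute illustrative field health metrics for one column."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return {
        "null_pct": 100.0 * (total - len(non_null)) / total if total else 0.0,
        "not_null": len(non_null) == total,
        # Assumption: distinct non-null values over total rows.
        "unique_pct": 100.0 * len(distinct) / total if total else 0.0,
        # Assumption: nulls are ignored when checking uniqueness.
        "unique": len(distinct) == len(non_null),
        "cardinality": len(distinct),
        "average": mean(numeric) if numeric else None,
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
    }


def matches_all(values, pattern):
    """Return True if every non-null value fully matches the pattern."""
    return all(
        re.fullmatch(pattern, v) is not None
        for v in values
        if v is not None
    )


metrics = field_health([10, 20, 20, None])
print(metrics["null_pct"], metrics["cardinality"], metrics["unique"])
print(matches_all(["a@b.co", None], EMAIL_PATTERN))
```

The same `matches_all` helper covers the Is Email, Is UUID, and Matches Regex style of check, differing only in the pattern passed in.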
Important Note for Synapse/SQL Server Users:
When setting up the Matches Regex threshold for data quality monitoring:
• Synapse/SQL Server requires string matching patterns instead of standard regex syntax.
• Ensure that the value entered in the "Set Threshold" field follows the string pattern syntax supported by Synapse/SQL Server.
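To illustrate how a string pattern differs from a regular expression, the sketch below translates a pattern into an equivalent Python regex. It assumes that "string matching patterns" refers to the T-SQL LIKE wildcards (`%`, `_`, `[...]`, `[^...]`). This page does not confirm that, so treat the mapping as an assumption. The helper name `like_to_regex` is hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern into a compiled Python regex.

    Assumed wildcard set: % (any string), _ (any single character),
    [...] and [^...] (character sets and ranges).
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                # Unclosed bracket: treat it as a literal character.
                parts.append(re.escape(ch))
            else:
                # Sets and ranges share their syntax with regex classes.
                parts.append(pattern[i:end + 1])
                i = end
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


# The regex ^[A-Z]{2}-\d{3}.*$ has no direct LIKE form with quantifiers,
# so each repetition is spelled out in the string pattern instead.
code_pattern = like_to_regex("[A-Z][A-Z]-[0-9][0-9][0-9]%")
print(bool(code_pattern.fullmatch("AB-123xyz")))
print(bool(code_pattern.fullmatch("A-123")))
```

Note that this Python translation is case-sensitive, whereas case sensitivity in SQL Server depends on the collation in use.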
If the preset field tests are not sufficient for the specific test you require, you can also create a test via a custom SQL script.
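As a rough illustration of what a custom SQL test might look like, the sketch below runs a query that counts rule-violating rows and compares the count to a limit. This page does not describe the shape the product expects for custom SQL, so this is an assumption. The table, columns, and the `run_custom_test` helper are all hypothetical, and an in-memory SQLite database stands in for a real data source.

```python
import sqlite3

# In-memory stand-in for a real data source (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, currency TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "USD"), (2, -5.0, "USD"), (3, 7.5, None)],
)

# Custom rule: amounts must be non-negative and currency must be present.
CUSTOM_TEST_SQL = (
    "SELECT COUNT(*) FROM orders "
    "WHERE amount < 0 OR currency IS NULL"
)


def run_custom_test(connection, sql, max_violations=0):
    """Run a counting query and compare the result to an allowed maximum."""
    violations = connection.execute(sql).fetchone()[0]
    return {"violations": violations, "passed": violations <= max_violations}


result = run_custom_test(conn, CUSTOM_TEST_SQL)
print(result)  # {'violations': 2, 'passed': False}
```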
When you configure monitoring for the first time on the table asset you want to monitor, a popup appears after you click the "Scheduled monitoring" option on your selected column.
Within the "Set Up Monitoring" popup, users must complete the necessary fields to save their preferences and successfully set up their field health monitor. These required fields include:
Row Creation
Timestamp (Select a timestamp column from the dropdown)
Validation for SQL Expression (when SQL Expression is chosen)
Frequency (Learn more about Custom Scheduling below)
Threshold (if Enable Machine Learning is deactivated)
Incident Levels.
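The conditional requirements in this list can be summarized as a small check. The sketch below is only an illustration of the rules as listed above. The dictionary keys and the `missing_scheduled_fields` helper are hypothetical names, not part of the product.

```python
def missing_scheduled_fields(config):
    """Return the names of required fields that are still missing.

    All keys are hypothetical stand-ins for the popup's fields.
    """
    always_required = ("row_creation", "timestamp", "frequency", "incident_levels")
    missing = [key for key in always_required if not config.get(key)]

    # SQL Expression must be validated when it is chosen for row creation.
    if config.get("row_creation") == "SQL Expression" and not config.get(
        "sql_expression_validated"
    ):
        missing.append("sql_expression_validated")

    # A manual threshold is needed only when machine learning is turned off.
    if not config.get("enable_machine_learning") and config.get("threshold") is None:
        missing.append("threshold")

    return missing


draft = {
    "row_creation": "SQL Expression",
    "timestamp": "created_at",
    "frequency": "daily",
    "incident_levels": ["high"],
    "enable_machine_learning": False,
}
print(missing_scheduled_fields(draft))
# ['sql_expression_validated', 'threshold']
```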
To set custom alerts, you must first turn on the "Notify" toggle. Once it is on, users can specify their desired alert channels, either email or Slack channels.
When using SQL Expression, validating your query is compulsory. Ensure that your query is written in the dialect compatible with your linked data source, as illustrated below:
Google BigQuery - CAST(your_string_column AS DATETIME)
PostgreSQL - your_timestamp_column::timestamp
When working with Google BigQuery, you can review the provided documentation for further details here.
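The two dialect examples above can be captured as templates keyed by data source. The sketch below is a convenience illustration only. The `row_creation_expression` helper and the dialect keys are hypothetical, and it covers only the two dialects shown on this page.

```python
import re

# Templates taken from the two examples on this page.
ROW_CREATION_TEMPLATES = {
    "bigquery": "CAST({column} AS DATETIME)",
    "postgresql": "{column}::timestamp",
}

# Accept only simple unquoted identifiers to keep the sketch safe.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def row_creation_expression(dialect, column):
    """Build a timestamp expression for the given dialect and column."""
    if not _IDENTIFIER.fullmatch(column):
        raise ValueError(f"unsupported column name: {column!r}")
    try:
        template = ROW_CREATION_TEMPLATES[dialect.lower()]
    except KeyError:
        raise ValueError(f"no template for dialect: {dialect!r}") from None
    return template.format(column=column)


print(row_creation_expression("BigQuery", "created_at"))
# CAST(created_at AS DATETIME)
print(row_creation_expression("postgresql", "created_at"))
# created_at::timestamp
```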
Smart Training requires Row Creation to be selected.
To activate Smart Training in Row Creation:
Users should initially select the timestamp.
If SQL Expression is chosen for row creation, users are required to validate the SQL Expression.
Upon completing the required fields, users can click "Save Preferences." The newly configured monitor is then displayed in the list of table properties for the chosen table.
The Set Up Monitoring popup for On Demand field test monitoring differs slightly from the one used for Scheduled monitoring. Users must complete the following required fields to create an On Demand field health monitor.
Row Creation (the 'All Records' option is not applicable)
Timestamp (Select a timestamp column from the dropdown)
Validation for SQL Expression (when SQL Expression is chosen)
Lookback Period
Threshold
Incident Levels.
Note:
i. The "Frequency" field is not relevant when configuring On Demand monitors and is therefore ignored.
ii. The "Smart Training" and "Auto Threshold" options are ignored when setting up any On Demand monitor.
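The On Demand rules above (required fields, plus the options that do not apply) can be sketched as a single clean-up step. This is only an illustration under assumed names. The dictionary keys and the `prepare_on_demand` helper are hypothetical and do not reflect the product's internals.

```python
# Options that this page says do not apply to On Demand monitors.
IGNORED_FOR_ON_DEMAND = ("frequency", "smart_training", "auto_threshold")

REQUIRED_FOR_ON_DEMAND = (
    "row_creation",
    "timestamp",
    "lookback_period",
    "threshold",
    "incident_levels",
)


def prepare_on_demand(config):
    """Drop ignored options and report problems with the remaining fields."""
    cleaned = {k: v for k, v in config.items() if k not in IGNORED_FOR_ON_DEMAND}
    problems = [
        f"missing: {key}"
        for key in REQUIRED_FOR_ON_DEMAND
        if cleaned.get(key) in (None, "", [])
    ]
    if cleaned.get("row_creation") == "All Records":
        problems.append("row_creation: 'All Records' is not applicable")
    if cleaned.get("row_creation") == "SQL Expression" and not cleaned.get(
        "sql_expression_validated"
    ):
        problems.append("missing: sql_expression_validated")
    return cleaned, problems


cleaned, problems = prepare_on_demand(
    {
        "row_creation": "All Records",
        "timestamp": "created_at",
        "lookback_period": "7d",
        "threshold": 5,
        "incident_levels": ["high"],
        "frequency": "daily",
    }
)
print(problems)  # ["row_creation: 'All Records' is not applicable"]
```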
Upon completing the required fields to create an On Demand monitor for field health, users can complete the set up process by choosing either of the following confirmation buttons based on their preferred use case:
Save Preferences: This option is for users who wish to set up an On Demand monitor without running the monitor scan immediately after creation.
Save & Run: This option is for users who wish to run the On Demand monitor immediately upon creation. To run the monitor again later, navigate to the All Monitor's page and select "Run manually" from the ellipsis (︙) menu.
On the Set Up Monitoring popup for both On Demand and Scheduled monitors, you can also toggle the notification settings and custom alerts for each monitor mode.
The newly created Field Health monitor (both Scheduled and On-Demand) appears in the list of loaded columns once you select a primary table. It can also be found on the "All Monitor's Tab".
This provides a streamlined approach for users who prefer not to reselect the data source, test type, schema, and table when revisiting their recently created On-Demand or Scheduled monitor.
To modify a Scheduled or On Demand field health monitor, click "Modify Scheduled" or "Modify On Demand".
Monitors can also be modified from the All Monitor's Tab.