Set Up Grouped-by Monitors

Here's how you set up Grouped-by Monitors

Grouped-by monitors let data teams define specific segments within a table, such as rows grouped by the values in a dimension column, and then apply monitors to those segments. Teams can therefore track not only the overall row count of a table but also the row counts of its individual logical subdivisions.
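As a rough illustration of the idea, the sketch below counts rows per value of a dimension column alongside the overall row count. The sample rows, the `country` column, and the function name are made up for illustration; this is not how the product computes its metrics.

```python
from collections import Counter

# Hypothetical sample rows; "country" plays the role of the dimension column.
rows = [
    {"order_id": 1, "country": "MY"},
    {"order_id": 2, "country": "SG"},
    {"order_id": 3, "country": "MY"},
    {"order_id": 4, "country": "ID"},
]


def row_counts_by_group(rows, group_column):
    """Return the total row count and the row count per distinct group value."""
    per_group = Counter(row[group_column] for row in rows)
    return len(rows), dict(per_group)


total, per_group = row_counts_by_group(rows, "country")
print(total)      # 4
print(per_group)  # {'MY': 2, 'SG': 1, 'ID': 1}
```

A table-level monitor would only see the total, while a Grouped-by monitor tracks each segment separately.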

The configuration process for Grouped-by monitors differs from the setup routine for other monitor types. Here is how to begin.

Setting Up Grouped-by monitoring

To configure Grouped-by monitors, choose the Grouped-by card on the config landing page, then select your data source. You can optionally pick a specific schema before choosing the table. Once a table is selected, select the column you want from that table, and then select its column fields.

After selecting the column fields, click "Proceed" to open the "Set Up Monitoring" popup, where the configuration itself begins.

Set Up Monitoring Pop Up

The setup process in Grouped-by monitoring varies by test type: some test types require thresholds, while others do not.

Threshold Required Tests | Non-Threshold Required Tests

Null%

Freshness

Unique%

Volume

Average

Not Null

Min/Max

Uniqueness

String Length

Cardinality

Is Email

Is UUID

Matches Regex

Non-Threshold Required Tests

For test types that require thresholds, the setup process mirrors that of setting up Field Health Monitors.

When you're setting up monitoring for a table asset for the first time, clicking on "Set up monitoring" will trigger a Set Up Monitoring popup for most configurations. However, when dealing with Volume/Freshness in Grouped-by monitoring, the setup process unfolds directly on the main monitoring setup page. This distinction arises because Grouped-by volume/freshness configuration is directly tied to the specific table and column you've chosen.

Let's walk through the setup process for non-threshold-required tests within Grouped-by monitoring, using Freshness as our example.

The setup process for the other non-threshold-required tests is identical to the one for Freshness/Volume.

Within the "Set Up Monitoring" form for Grouped-by Freshness, you must complete the required fields to save your preferences and set up your Grouped-by monitor. The required fields are:

  • Row Creation

  • Timestamp (Select Timestamp from the dropdown column)

  • Validation for SQL Expression (when SQL Expression is chosen)

  • Frequency

  • Incident Levels.

The collapsible section shows a confirmation of the Grouped-by columns you have chosen. In this section you can also select, deselect, and fetch values for the fields you previously selected.

When using SQL Expression, you must validate your query. Write the query in the dialect of your linked data source, for example:

Google BigQuery - CAST(your_string_column AS DATETIME)

PostgreSQL - your_timestamp_column::timestamp

When working with Google BigQuery, you can review the provided documentation for further details here.
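To make the dialect difference concrete, here is a minimal sketch that renders the two expressions shown above for a given column. The function name and the dialect keys (`"bigquery"`, `"postgresql"`) are hypothetical labels chosen for this example; the product itself only asks you to type the expression and validate it.

```python
# Templates copied from the two documented examples; other dialects are not covered.
TIMESTAMP_TEMPLATES = {
    "bigquery": "CAST({column} AS DATETIME)",
    "postgresql": "{column}::timestamp",
}


def timestamp_expression(dialect, column):
    """Return a timestamp-casting SQL expression for the given dialect."""
    try:
        template = TIMESTAMP_TEMPLATES[dialect]
    except KeyError:
        raise ValueError(f"no example expression for dialect: {dialect!r}")
    return template.format(column=column)


print(timestamp_expression("bigquery", "created_at"))    # CAST(created_at AS DATETIME)
print(timestamp_expression("postgresql", "created_at"))  # created_at::timestamp
```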

Smart Training is applicable for both Row Creation and Threshold settings.

To activate Smart Training in Row Creation:

  • Users should initially select the timestamp.

  • If SQL Expression is chosen for row creation, users are required to validate the SQL Expression.

Activating Smart Training for Row Creation enables the Lookback Period option. The Lookback Period is selectable only when Smart Training is enabled.
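The dependency between these settings can be sketched as a small state model. The class and attribute names below are hypothetical and only mirror the rules described above: a timestamp must be selected, any SQL Expression must be validated, and the Lookback Period depends on Smart Training.

```python
from dataclasses import dataclass


@dataclass
class RowCreationForm:
    """Hypothetical model of the Row Creation part of the setup form."""

    timestamp_selected: bool = False
    uses_sql_expression: bool = False
    sql_expression_validated: bool = False
    smart_training_enabled: bool = False

    def can_enable_smart_training(self) -> bool:
        # A timestamp must be selected first.
        if not self.timestamp_selected:
            return False
        # If a SQL Expression is used, it must have been validated.
        if self.uses_sql_expression and not self.sql_expression_validated:
            return False
        return True

    def enable_smart_training(self) -> None:
        if not self.can_enable_smart_training():
            raise ValueError("select a timestamp and validate any SQL Expression first")
        self.smart_training_enabled = True

    def lookback_period_selectable(self) -> bool:
        # The Lookback Period is selectable only while Smart Training is enabled.
        return self.smart_training_enabled


form = RowCreationForm(timestamp_selected=True, uses_sql_expression=True)
print(form.can_enable_smart_training())   # False: SQL Expression not validated yet
form.sql_expression_validated = True
form.enable_smart_training()
print(form.lookback_period_selectable())  # True
```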

After completing the required fields, click "Save Preferences" to save the newly configured monitor.

Modify Monitoring (Non-Threshold tests)

After you save your monitoring preferences, the form switches to a modification state: the "Save preferences" button is replaced by "Update changes", and a new "Delete Monitors" option appears. This change indicates that your Grouped-by freshness monitor has been created successfully.

When setting up Grouped-by monitors, each combination of table, column, column fields, and test type can be configured only once. If you select the same combination again, the system takes you into the modification workflow, as demonstrated here.
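One way to picture this rule is a lookup keyed by the full combination. The sketch below is an assumption-laden illustration: the function names and the in-memory set are hypothetical, and treating the column fields as order-independent (via `frozenset`) is a guess rather than documented behavior.

```python
# Hypothetical in-memory record of combinations that already have a monitor.
configured = set()


def monitor_key(table, column, column_fields, test_type):
    # frozenset makes the order of column fields irrelevant (an assumption).
    return (table, column, frozenset(column_fields), test_type)


def workflow_for(table, column, column_fields, test_type):
    """Return 'modify' if this combination already exists, else 'create'."""
    key = monitor_key(table, column, column_fields, test_type)
    return "modify" if key in configured else "create"


def save_monitor(table, column, column_fields, test_type):
    configured.add(monitor_key(table, column, column_fields, test_type))


selection = ("orders", "country", ["MY", "SG"], "Freshness")
print(workflow_for(*selection))  # create
save_monitor(*selection)
print(workflow_for(*selection))  # modify
```

Changing any one element of the combination, for example the test type, yields a new key and therefore a fresh setup flow.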

Threshold Required Tests

The procedure for configuring threshold-required tests closely follows the setup process used for field health monitoring.

To demonstrate the setup for Grouped-by threshold-required tests, let's take the "Null%" test type as our example.

Set Up Monitoring Pop Up

When you configure monitoring for the first time on a table asset, clicking "Set up monitoring" opens a popup.

Within the "Set Up Monitoring" popup, users must complete the necessary fields to save their preferences and successfully set up their Grouped-by monitor. These required fields include:

  • Row Creation

  • Timestamp (Select Timestamp from the dropdown column)

  • Validation for SQL Expression (when SQL Expression is chosen)

  • Frequency

  • Threshold (if Auto-Thresholding is deactivated)

  • Incident Levels.
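The required-field rules in the list above can be sketched as a simple check. All key names in the `form` dictionary are hypothetical; the sketch only mirrors the list, including the conditions that SQL Expression must be validated when chosen and that a threshold is needed only when Auto-Thresholding is deactivated.

```python
def missing_required_fields(form):
    """Return the names of required fields that are still unset in the form."""
    always_required = ["row_creation", "timestamp", "frequency", "incident_levels"]
    missing = [name for name in always_required if not form.get(name)]

    # A SQL Expression must be validated when it is chosen for row creation.
    if form.get("row_creation") == "sql_expression" and not form.get("sql_expression_validated"):
        missing.append("sql_expression_validated")

    # A manual threshold is required only when Auto-Thresholding is off.
    if not form.get("auto_thresholding") and form.get("threshold") is None:
        missing.append("threshold")

    return missing


form = {
    "row_creation": "sql_expression",
    "timestamp": "created_at",
    "frequency": "daily",
    "incident_levels": ["warning"],
    "auto_thresholding": False,
}
print(missing_required_fields(form))  # ['sql_expression_validated', 'threshold']
```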

When using SQL Expression, you must validate your query. Write the query in the dialect of your linked data source, for example:

Google BigQuery - CAST(your_string_column AS DATETIME)

PostgreSQL - your_timestamp_column::timestamp

When working with Google BigQuery, you can review the provided documentation for further details here.

Smart Training is applicable for both Row Creation and Threshold settings.

To activate Smart Training in Row Creation:

  • Users should initially select the timestamp.

  • If SQL Expression is chosen for row creation, users are required to validate the SQL Expression.

Activating Smart Training for Row Creation enables the Lookback Period option. The Lookback Period is selectable only when Smart Training is enabled.

Upon completing the required fields, users can click on "Save Preferences." Subsequently, the freshly configured monitor will be displayed in the list of table properties for the chosen table.

List of tables and columns loaded from the selected Grouped-by table

The newly created Grouped-by Null% monitor will appear in the list of table properties for the chosen test type and is also accessible in the "All Monitors" tab. This gives users a quick way to access their new monitor without reselecting the data source, test type, or table.
