Configure a profile run

Control which columns, rows, and date range are included when you generate a profile

When you generate a profile, you can configure exactly what gets profiled before the job starts. This lets you scope the profile to a specific set of columns, restrict rows to a date range, and control how the data is sampled.

Configuring a profile run is useful when:

  • You want to profile a large table without scanning the full dataset

  • You only care about recent data, such as rows added in the last 30 days

  • You want to reduce query cost or execution time by scoping to specific columns

  • You need to compare profiles of the same table across different time windows

Run options

When you click Generate a new profile, you have three options:

Quick Run — Runs the profiler immediately with system-recommended defaults.

Run with latest settings — Re-runs the profiler using the most recent configuration saved for that asset. If no previous configuration exists, it falls back to Quick Run.

Configure profile run — Opens the configuration modal where you can customize column selection, date/time filters, and sampling strategy before starting the job.

On assets with no previous runs, the option menu shows Quick Run and Configure profile run only.
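
The run options and the fallback rule above can be summarized as a small decision function. This is an illustrative sketch only: `resolve_run_config`, `available_options`, `QUICK_RUN_DEFAULTS`, and the option and key names are hypothetical and are not part of the product.

```python
from typing import List, Optional

# Hypothetical stand-in for the system-recommended defaults.
QUICK_RUN_DEFAULTS = {"columns": "all", "date_filter": None, "sampling": "auto"}


def resolve_run_config(option: str,
                       saved: Optional[dict] = None,
                       custom: Optional[dict] = None) -> dict:
    """Return the configuration a run would use for the chosen menu option."""
    if option == "quick_run":
        return dict(QUICK_RUN_DEFAULTS)
    if option == "latest_settings":
        # With no saved configuration, behave like Quick Run.
        return dict(saved) if saved is not None else dict(QUICK_RUN_DEFAULTS)
    if option == "configure":
        if custom is None:
            raise ValueError("a custom configuration is required")
        return dict(custom)
    raise ValueError(f"unknown run option: {option}")


def available_options(has_previous_run: bool) -> List[str]:
    """Menu entries shown for an asset; the saved-settings entry needs a prior run."""
    options = ["quick_run", "configure"]
    if has_previous_run:
        options.insert(1, "latest_settings")
    return options
```

For example, `resolve_run_config("latest_settings", saved=None)` returns a copy of the Quick Run defaults in this sketch.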

Configuration modal

Selecting Configure profile run opens a modal with three sections: column selection, date/time filter, and sampling strategy.

Column selection

By default, all columns are included in the profile. Switch to Selected Columns to choose a specific subset.

When you select Selected Columns, a searchable dropdown appears where you can pick one or more columns to include. This is useful for wide tables where you only need statistics on a few columns, or when you want to avoid the cost of profiling columns that are not relevant to your analysis.

Column selection controls which columns appear in the profiling output. It does not affect which rows are scanned.

Date/time filter

Toggle Filter data by date/time to restrict the profile to rows within a specific date range. When enabled, you must select a timestamp column and define the range.

Timestamp column — Choose a date or datetime column from the asset. Only date and datetime columns are shown.

Date range type — Choose between Absolute and Relative:

  • Absolute — Specify a fixed start and end date. When you re-run with saved settings, the same fixed dates are used each time. Use this when you want to profile a specific historical window.

  • Relative — Define a rolling window such as "Last 7 Days". The window is recalculated from the current time each time the profile runs. Use this when you want the profile to always reflect recent data.

For absolute ranges, quick-select options are available to populate the date fields automatically, such as Last 7 days or Last 30 days.
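
The difference between the two range types on a re-run can be sketched as follows. This is a minimal illustration, not the product's implementation: `resolve_window` and its parameters are hypothetical, and details such as time zone handling or whether boundaries snap to midnight are assumptions, since the behavior is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def resolve_window(range_type: str,
                   start: Optional[datetime] = None,
                   end: Optional[datetime] = None,
                   last_days: Optional[int] = None,
                   now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return the (start, end) bounds a run would filter on."""
    if range_type == "absolute":
        if start is None or end is None or start > end:
            raise ValueError("absolute ranges need start <= end")
        return start, end  # identical on every re-run
    if range_type == "relative":
        if last_days is None or last_days <= 0:
            raise ValueError("relative ranges need a positive day count")
        # Assumption: the window ends at the moment the run starts (UTC).
        now = now or datetime.now(timezone.utc)
        return now - timedelta(days=last_days), now  # recomputed per run
    raise ValueError(f"unknown range type: {range_type}")
```

Calling `resolve_window("relative", last_days=7)` on two different days yields two different windows, while an absolute range returns the same bounds every time.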

If the asset has no date or datetime columns, the date/time filter toggle is disabled.

Sampling strategy

The sampling options available depend on whether the date/time filter is active.

Without a date/time filter

  • Auto (default) — The system selects the optimal sample size based on the total row count.

  • Sampling Percentage — Sample a percentage of the total dataset. Enter a value between 1 and 100.

  • Target Row Count — Sample approximately a specified number of rows. The final count may vary slightly depending on the database.

  • Full — Profile the entire dataset without sampling. Because this can be slow and resource-intensive on large tables, you must acknowledge the performance impact before running.

With a date/time filter active

When a date/time filter is applied, percentage-based and auto sampling are not available. The options switch to:

  • All matching rows (default) — Profile every row that falls within the date range. Because a broad range can match a large number of rows, you must acknowledge the performance impact before running.

  • Limit row count — Cap the number of rows returned from the filtered dataset. The rows returned are not guaranteed to be random — ordering depends on the database.

Not all sampling options are available for every source. See Sampling strategy support by source for details.
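
The rules in this section can be expressed as a small validation routine. The sketch below is illustrative only: the strategy names, `allowed_strategies`, and `validate_sampling` are hypothetical labels for the options listed above, and the minimum row count of 1 is an assumption.

```python
from typing import Optional

# Strategy names are illustrative labels for the options listed above.
WITHOUT_FILTER = ("auto", "sampling_percentage", "target_row_count", "full")
WITH_FILTER = ("all_matching_rows", "limit_row_count")
NEEDS_ACKNOWLEDGEMENT = {"full", "all_matching_rows"}


def allowed_strategies(date_filter_active: bool) -> tuple:
    """Return the strategies offered; the first entry is the default."""
    return WITH_FILTER if date_filter_active else WITHOUT_FILTER


def validate_sampling(strategy: str,
                      date_filter_active: bool,
                      value: Optional[float] = None,
                      acknowledged: bool = False) -> None:
    """Raise ValueError if the choice conflicts with the rules above."""
    if strategy not in allowed_strategies(date_filter_active):
        raise ValueError(f"{strategy!r} is not available in this mode")
    if strategy == "sampling_percentage":
        if value is None or not 1 <= value <= 100:
            raise ValueError("percentage must be between 1 and 100")
    if strategy in ("target_row_count", "limit_row_count"):
        # Assumption: a row count below 1 is rejected.
        if value is None or value < 1:
            raise ValueError("row count must be at least 1")
    if strategy in NEEDS_ACKNOWLEDGEMENT and not acknowledged:
        raise ValueError("performance impact must be acknowledged")
```

In this sketch, `validate_sampling("auto", date_filter_active=True)` raises an error because auto sampling is not offered once a date/time filter is applied.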

Viewing run settings on results

After a profile completes, the settings used for that run are displayed directly below the Generated profile heading in the results panel. The summary shows the columns profiled, the sampling strategy, and the date filter, if one was applied.

FAQ

What happens if I save a configuration with specific columns and one of those columns is later removed from the table?

If you use Run with latest settings and the saved configuration references columns that no longer exist, the profile job may fail. Review your saved settings and use Configure profile run to update the column selection.

Will Target Row Count return exactly the number of rows I entered?

Not always. The system uses random sampling methods that work at the block level, so the result is an approximation. The final row count may be slightly higher or lower than the value you entered.
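
To see why block-level sampling yields an approximate count, consider the simulation below. It is a toy model under stated assumptions, not the mechanism any particular database uses: `block_sample_count`, the fixed block size, and the keep-or-skip rule per block are all hypothetical.

```python
import random
from typing import Optional


def block_sample_count(total_rows: int,
                       rows_per_block: int,
                       target_rows: int,
                       seed: Optional[int] = None) -> int:
    """Simulate block-level sampling: each block is kept or skipped whole."""
    if total_rows <= 0 or rows_per_block <= 0:
        raise ValueError("total_rows and rows_per_block must be positive")
    rng = random.Random(seed)
    # Fraction of blocks to keep so the expected row count matches the target.
    fraction = min(1.0, target_rows / total_rows)
    sampled = 0
    for block_start in range(0, total_rows, rows_per_block):
        block_size = min(rows_per_block, total_rows - block_start)
        if rng.random() < fraction:
            sampled += block_size  # the whole block is included
    return sampled
```

Because rows are included a block at a time, repeated calls such as `block_sample_count(1_000_000, 1_000, 100_000)` land near 100,000 but rarely on it exactly.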

Does Target Row Count return the most recent rows?

No. Target Row Count uses random sampling to return a representative subset of the full table. To restrict the profile to recent records, use the date/time filter.

When using Limit row count with a date filter, are the rows randomly selected?

It depends on the database. Some sources return rows in a random order, while others return rows in insertion or physical order, so do not treat the result as a random sample.
