Profiler source support and limitations
Understand profiler support, metric availability, and source limitations
Profiler behavior varies by source. This page explains how Decube calculates row counts, which metric types are available, and which source-specific limitations can affect profiling results.
How Decube profiles a table
When you generate a profile, Decube:
Retrieves the table row count.
Determines how to sample rows from the table based on the source and table size.
Calculates profiling metrics from that sample.
Stops the run if the profiling job exceeds the execution limit.
Profiling jobs have a 30-minute execution limit.
Very wide tables can also hit profiling limits at around 250 columns. This is a rough guideline because numeric columns generate more metrics than non-numeric columns.
Row count modes
The table row count shown in Profiler can be exact or estimated, depending on the data source.
Exact row count
An exact row count is calculated directly from the table at the time the profile runs. This gives the most accurate result, but it can be more expensive on large tables.
Estimated row count
An estimated row count is retrieved from source metadata instead of scanning the full table. This is faster, but it depends on the source's metadata being up to date.
If the source metadata is stale, the reported total row count can be lower than the sampling row count.
Metric availability by data type
Profiler always shows metric availability based on both the source and the column type.
Common metrics
These metrics are available for supported column types across most sources:
Null count
Null percentage
Unique count
Unique percentage
Numeric metrics
Numeric columns can also include:
Minimum value
Maximum value
Mean
Median
Standard deviation
Zero count
Zero percentage
Text metrics
Text columns can also include:
Minimum length
Maximum length
Mean length
Empty string count
Empty string percentage
Boolean metrics
Boolean columns can also include:
True count
True percentage
False count
False percentage
Date and time metrics
Date, time, and datetime columns can include:
Minimum value
Maximum value
Source support and limitations
PostgreSQL
Row count mode: Estimated
If PostgreSQL has not refreshed table statistics, the row count can return
-1Views are supported, but they always return a row count of
0
Redshift
Row count mode: Estimated
If Redshift has not refreshed table statistics, the row count can return
-1Views are supported, but they always return a row count of
0MEDIANis not supported for numeric metricsUnique count and unique percentage are not supported because they require expensive queries.
This may change in a future update where you can select the columns to be profiled to avoid full table query.
BigQuery
Row count mode: Exact
Views always return a row count of
0
Microsoft SQL Server
Row count mode: Estimated
TEXT,NTEXT, andIMAGEcolumns do not support count-based common metricsProfiling is supported on SQL Server 2016 and later.
Azure Synapse
Row count mode: Exact
BITis supported as a boolean typeMEDIANis not supported for numeric metricsGeneral wide-table guidance is around 250 columns, but Azure Synapse supports up to 150 columns because of engine limits
Azure Synapse also requires additional permissions for profiling. For setup steps, see Azure Synapse.
Databricks
Row count mode: Exact
Interval data types are not supported for profiling
AWS Glue
Row count mode: Exact
Row count is always reported as
0Decube does not run row count queries against Glue sources because files can be too large.
MySQL
Row count mode: Estimated
Boolean is not supported as a native data type
MEDIANis not supported for numeric metrics
SingleStore
SingleStore inherits the same profiling behavior as MySQL.
Oracle
Row count mode: Estimated
num_rowsis only populated after Oracle statistics are collected withDBMS_STATSCLOBandNCLOBmap to string types but are not supported for profilingBoolean is not supported
Snowflake
Row count mode: Exact
Unique count and unique percentage are not supported because they require expensive queries.
This may change in a future update where you can select the columns to be profiled to avoid full table query.
Profiling speed depends on warehouse size, table size, and column count
Large profiling jobs can time out before completion
FAQ
Why is the total row count lower than the sampling row count?
This usually means the source uses an estimated row count and the metadata has not been refreshed recently.
Why do some metrics show as not supported?
Metric support depends on the source engine, the column data type, and the cost of calculating the metric on that source.
Why did my profile run time out?
Large tables, wide schemas, and source-specific engine limits can increase query time. As a rough guideline, tables with around 250 columns can hit profiling limits, especially when many columns are numeric and generate more metrics. If the run exceeds the profiling execution limit, Decube stops the job.
Last updated