Adding Databricks to your decube connections helps your team find relevant datasets, understand their quality via incident monitoring, and apply governance policies via our data catalog.
The Databricks connection supports connecting to the Unity Catalog or the legacy Hive metastore.
Supported Capabilities

| Capability | Supported |
| --- | --- |
| Metadata Extraction | ✅ |
| Metadata Types Collected | Schema, Table, Column |
| Data Profiling | ✅ |
| Data Preview | ✅ |
| Data Quality | ✅ |
| Configurable Collection | ❌ |
| External Table | ❌ |
| View Table | ✅ |
| Stored Procedure | ❌ |
Data Quality Support

| Capability | Supported |
| --- | --- |
| Freshness | ✅ |
| Volume | ✅ |
| Field Health | ✅ |
| Custom SQL | ✅ |
| Schema Drift | ✅ |
| Job Failure | ❌ |
Lineage Support

| Capability | Supported |
| --- | --- |
| View Table Lineage | ✅ |
| External Table Lineage | ❌ |
| SQL Query Lineage | ❌ |
| Foreign Key Lineage | ❌ |
| Stored Procedure Lineage | ❌ |
Connection Requirements
Adding a Databricks connection on decube.
From the Add New Connection page, find the Databricks connector logo to go to the connection page.
1. Getting the Workspace URL
Go to your workspace and copy the URL from the browser bar.
It should look like `<some_values>.cloud.databricks.com`. This entire URL is the Workspace URL to be added to decube's form.
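Browsers often include extra path segments or query parameters (such as `?o=<workspace_id>`) when you copy the URL. A small sketch of how you might strip those down to the bare hostname before pasting it into the form (the helper name is illustrative, not part of decube or Databricks):

```python
from urllib.parse import urlparse

def workspace_hostname(workspace_url: str) -> str:
    """Reduce a pasted browser URL to the bare Databricks workspace hostname."""
    # Add a scheme if the user pasted a bare hostname, so urlparse fills netloc.
    if "//" not in workspace_url:
        workspace_url = "https://" + workspace_url
    return urlparse(workspace_url).netloc

# workspace_hostname("https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/?o=123")
# -> "dbc-a1b2c3d4-e5f6.cloud.databricks.com"
```

The same value should match the Server hostname shown later under the SQL warehouse's Connection Details.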
2. Getting a Personal Access Token
The full documentation from Databricks can be found here.
Navigate to your Users Settings page.
Navigate to the Access Tokens tab and click Generate New Token.
Give your token a name and specify the lifetime of the token. We suggest not specifying a lifetime to ensure uninterrupted service.
Once you click Generate, note the token down somewhere safe, as you cannot retrieve it again. This is the Access Token for decube's form.
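decube only needs the token pasted into its form, but if you also use the same token in your own scripts, a common pattern is to keep it in an environment variable rather than in source code. `DATABRICKS_TOKEN` is the variable name conventionally used by Databricks tooling, assumed here for illustration:

```python
import os
from typing import Optional

def auth_header(token: Optional[str] = None) -> dict:
    """Build the Bearer header used by Databricks REST calls.

    Falls back to the DATABRICKS_TOKEN environment variable when no token
    is passed, so the secret never has to be hard-coded or committed.
    """
    token = token or os.environ["DATABRICKS_TOKEN"]
    return {"Authorization": f"Bearer {token}"}
```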
3. Getting the SQL Warehouse HTTP Path
Go to your SQL workspace and navigate to the SQL Warehouses section.
Either create a new SQL warehouse (recommended) or choose an existing one to use with decube, then navigate to the Connection Details tab. You will see both the Server hostname, which should match your Workspace URL, and the HTTP Path of the SQL warehouse.
We recommend creating a serverless Databricks SQL Warehouse. Other solutions may cause failure during metadata ingestion or data monitoring due to warehouse warm-up time.
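HTTP Paths copied from the Connection Details tab typically look like `/sql/1.0/warehouses/<warehouse_id>`. A quick sanity check like the sketch below (the regex is an assumption based on that common format, including the older `endpoints` variant) can catch truncated copy-pastes, or pasting the hostname instead of the path, before the value goes into decube's form:

```python
import re

# Expected shape of a SQL warehouse HTTP Path, e.g.
# /sql/1.0/warehouses/1234567890abcdef (older paths used "endpoints").
WAREHOUSE_PATH = re.compile(r"^/sql/1\.0/(warehouses|endpoints)/[0-9a-f]+$")

def looks_like_http_path(path: str) -> bool:
    """Return True if the string resembles a SQL warehouse HTTP Path."""
    return bool(WAREHOUSE_PATH.match(path))
```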
If you have any issues or questions about connecting your data sources, please reach out to us via our live chat for support.
FAQ
If I have multiple catalogs in my Databricks, how will that be handled?
The Catalog is the first layer of the object hierarchy, used to organize your data assets in the Unity Catalog. Read more about the Unity Catalog here.
When you add the connection, decube ingests all Databricks catalogs into a single data source.