# Databricks

{% hint style="info" %}
The Databricks connection supports both Unity Catalog and the legacy Hive metastore.
{% endhint %}

## Supported Capabilities

{% tabs %}
{% tab title="Supported Capabilities" %}
**General**

* **Metadata** — metadata extraction and display of asset information (tables, columns, schemas). Types collected: Schema, Table, Column, Data Job, Data Run, Data Task
* **Profiling** — data profiling on the Profiler tab
* **Preview** — sample data preview
* **Data Quality** — data quality monitoring and observability
* **View Table** — view tables, which are virtual tables based on SQL queries

**Data Quality Monitors**

* Freshness
* Volume
* Field Health
* Custom SQL
* Schema Drift
* Job Failure

**Lineage**

* **View Table Lineage** — tracks virtual tables (views) and their data dependencies
* **Databricks Job Lineage** — tracks data movement and transformations through Databricks pipeline jobs
  {% endtab %}

{% tab title="Not Supported" %}
**General**

* Configurable Collection
* External Table

**Lineage**

* External Table Lineage
* SQL Query Lineage
* Foreign Key Lineage
* Stored Procedure Lineage
  {% endtab %}
  {% endtabs %}

## Connection Requirements

## Adding a Databricks connection on decube

From the Add New Connection page, find the Databricks connector logo to go to the connection page.

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-5eb38b09721d3bc598e7538b1a20956e695f2551%2FSCR-20251024-lscx.png?alt=media" alt=""><figcaption></figcaption></figure>

## Required Information

1. Workspace URL - [jump to section](#getting-the-workspace-url)
2. Personal Access Token - [jump to section](#getting-a-personal-access-token)
3. SQL Warehouse HTTP Path - [jump to section](#getting-the-sql-warehouse-http-path)

### 1. Getting the Workspace URL

Go to your workspace and copy the URL from the browser address bar.

It should look like `<some_values>.cloud.databricks.com`. The entire URL highlighted in the screenshot is the **Workspace URL** to enter in decube's form.

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-d49f8a888e7e5ce4cc79154290accc44c20ce15c%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>
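A URL copied from the browser often carries a path, query string, or fragment after the host. As a minimal sketch, the hypothetical helper below (`workspace_host` is not part of decube or Databricks) reduces a pasted address to its hostname using only the Python standard library. The example hostname is made up, and whether decube's form expects the `https://` prefix is not stated here, so use the screenshot above as the reference for the exact value.

```python
from urllib.parse import urlparse


def workspace_host(raw: str) -> str:
    """Return the lowercase hostname from a pasted workspace address."""
    raw = raw.strip()
    # urlparse only recognises the host when a scheme is present.
    if "://" not in raw:
        raw = "https://" + raw
    host = urlparse(raw).hostname
    if not host:
        raise ValueError(f"no hostname found in {raw!r}")
    return host


# Hypothetical address, as it might be copied from the browser bar.
pasted = "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/?o=1234567890#workspace"
print(workspace_host(pasted))  # dbc-a1b2c3d4-e5f6.cloud.databricks.com
```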

### 2. Getting a Personal Access Token

The full documentation from Databricks can be found [here](https://docs.databricks.com/dev-tools/auth.html).

1. Navigate to your User Settings page.

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-162dcb9116e3b8d7a3a9d61a9be8855556654e25%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>

2. Open the `Access Tokens` tab and click `Generate New Token`.

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-180832cc3362e417cc66ce32c35645fee42bcd9a%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>

3. Give your token a name and specify the lifetime of the token. We suggest not specifying a lifetime to ensure uninterrupted service.
4. Once you click `Generate`, note the token down somewhere safe, as you cannot retrieve it again. This is the **Access Token** for decube's form.
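Before pasting the token into decube, it can be useful to confirm that Databricks accepts it. The sketch below is one possible check, not decube's own validation: it assumes the token is sent as a `Bearer` header and that the Databricks REST endpoint `GET /api/2.0/sql/warehouses` is reachable for your user. The helper names, the example hostname, and the `dapi-EXAMPLE` token are all hypothetical.

```python
import urllib.error
import urllib.request


def build_warehouse_list_request(workspace_host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request listing SQL warehouses."""
    url = f"https://{workspace_host}/api/2.0/sql/warehouses"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def token_is_accepted(workspace_host: str, token: str, timeout: float = 10.0) -> bool:
    """Send the request; treat 401/403 as a rejected token (an assumption)."""
    request = build_warehouse_list_request(workspace_host, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return False
        raise


# Building the request needs no network access; the values are placeholders.
request = build_warehouse_list_request("example.cloud.databricks.com", "dapi-EXAMPLE")
print(request.get_method(), request.full_url)
```

Calling `token_is_accepted` with your real hostname and token performs the actual network check. Avoid committing the token to source control; reading it from an environment variable is a common alternative.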

### 3. Getting the SQL Warehouse HTTP Path

1. Go to your `SQL` workspace as shown, then navigate to the `SQL Warehouses` section.

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-612ab893281eeb8dd27ddf173019dbb6a56634b2%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>

2. Either create a new SQL warehouse (recommended) or choose an existing one to use with decube, then open its `Connection Details` tab. The tab shows both the Server hostname, which should match your Workspace URL, and the HTTP Path of the SQL warehouse.

{% hint style="warning" %}
We recommend creating a serverless Databricks SQL Warehouse. Other warehouse types may cause failures during metadata ingestion or data monitoring because of warehouse warm-up time.
{% endhint %}

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-5d30c167a0b39e7adc6584c811ba65ab9f3a88a4%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>
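The two values on the `Connection Details` tab can be sanity-checked before they go into decube's form. The sketch below assumes the HTTP Path has the shape `/sql/1.0/warehouses/<id>` (with `endpoints` accepted as an older variant); that pattern, the helper names, and the example values are illustrative assumptions rather than anything decube enforces.

```python
import re
from urllib.parse import urlparse

# Assumed shape of a SQL warehouse HTTP path; adjust if your workspace differs.
HTTP_PATH_RE = re.compile(r"^/sql/1\.0/(?:warehouses|endpoints)/([0-9A-Za-z]+)$")


def warehouse_id(http_path: str) -> str:
    """Extract the warehouse ID from an HTTP path, or raise ValueError."""
    match = HTTP_PATH_RE.match(http_path.strip())
    if match is None:
        raise ValueError(f"unexpected HTTP path: {http_path!r}")
    return match.group(1)


def hostname_matches(workspace_url: str, server_hostname: str) -> bool:
    """Check that the Server hostname equals the host of the Workspace URL."""
    url = workspace_url.strip()
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to find the host
    return urlparse(url).hostname == server_hostname.strip().lower()


# Hypothetical values, as they might appear on the Connection Details tab.
print(warehouse_id("/sql/1.0/warehouses/1234567890abcdef"))  # 1234567890abcdef
print(hostname_matches("https://example.cloud.databricks.com/?o=1",
                       "example.cloud.databricks.com"))  # True
```

A mismatch between the two hostnames usually means the HTTP Path was copied from a warehouse in a different workspace than the one whose URL you collected in step 1.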

{% hint style="warning" %}
If you have any issues or questions about connecting your data sources, please reach out to us via live chat for support.
{% endhint %}

### FAQ

#### If I have multiple catalogs in my Databricks, how will that be handled?

{% hint style="info" %}
The Catalog is the first layer of the object hierarchy, used to organize your data assets in the Unity Catalog. Read more about the Unity Catalog [here](https://docs.databricks.com/en/data-governance/unity-catalog/index.html).
{% endhint %}

When you add the connection, decube ingests all Databricks catalogs into a single data source.
