# Databricks

{% hint style="info" %}
The Databricks connection supports both Unity Catalog and the legacy Hive metastore.
{% endhint %}

## Supported Capabilities

{% tabs %}
{% tab title="Supported Capabilities" %}
**General**

* **Metadata** — metadata extraction and display of asset information (tables, columns, schemas). Types collected: Schema, Table, Column, Data Job, Data Run, Data Task
* **Profiling** — data profiling on the Profiler tab
* **Preview** — sample data preview
* **Data Quality** — data quality monitoring and observability
* **View Table** — view tables, which are virtual tables based on SQL queries

**Data Quality Monitors**

* Freshness
* Volume
* Field Health
* Custom SQL
* Schema Drift
* Job Failure

**Lineage**

* **View Table Lineage** — tracks virtual tables (views) and their data dependencies
* **Databricks Job Lineage** — tracks data movement and transformations through Databricks pipeline jobs
  {% endtab %}

{% tab title="Not Supported" %}
**General**

* Configurable Collection
* External Table

**Lineage**

* External Table Lineage
* SQL Query Lineage
* Foreign Key Lineage
* Stored Procedure Lineage
  {% endtab %}
  {% endtabs %}

## Connection Requirements

## Adding a Databricks connection on decube

From the Add New Connection page, select the Databricks connector logo to open the connection page.

<figure><img src="/files/zppAlndJ2iPo8YdiUDq4" alt=""><figcaption></figcaption></figure>

## Required Information

1. Workspace URL - [jump to section](#getting-the-workspace-url)
2. Personal Access Token - [jump to section](#getting-a-personal-access-token)
3. SQL Warehouse HTTP Path - [jump to section](#getting-the-sql-warehouse-http-path)

### 1. Getting the Workspace URL

Go to your workspace and copy the URL from the browser's address bar.

It should look like `<some_values>.cloud.databricks.com`. The entire URL highlighted in the screenshot is the **Workspace URL** to enter in decube's form.

<figure><img src="/files/208Pkp3Apo3lBysRrFEZ" alt=""><figcaption></figcaption></figure>
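As a rough illustration, the value copied from the browser bar can be reduced to its host and compared against the pattern above. This is only a sketch: `workspace_host` and `looks_like_workspace_url` are hypothetical helper names, and the `.cloud.databricks.com` suffix check reflects only the pattern shown on this page. Whether decube's form expects the scheme is not stated here, so the helpers inspect the value without changing what you paste into the form.

```python
from urllib.parse import urlsplit


def workspace_host(browser_url: str) -> str:
    """Return the bare host from a URL copied out of the browser bar."""
    text = browser_url.strip()
    # urlsplit only fills in the host when a scheme is present, so add one if missing.
    if "://" not in text:
        text = "https://" + text
    host = urlsplit(text).hostname
    if not host:
        raise ValueError(f"no host found in {browser_url!r}")
    return host


def looks_like_workspace_url(browser_url: str) -> bool:
    """Check the host against the pattern shown on this page (an assumption)."""
    return workspace_host(browser_url).endswith(".cloud.databricks.com")
```

A value that fails this check is not necessarily wrong, but it is worth comparing against the screenshot before submitting the form.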

### 2. Getting a Personal Access Token

The full documentation from Databricks can be found [here](https://docs.databricks.com/dev-tools/auth.html).

1. Navigate to your User Settings page.

<figure><img src="/files/2oxl8XqGajSNcEjWKZBg" alt=""><figcaption></figcaption></figure>

2. Open the `Access Tokens` tab and click `Generate New Token`.

<figure><img src="/files/DTOnJvFcOGlMV1n4P2xs" alt=""><figcaption></figcaption></figure>

3. Give your token a name and, optionally, a lifetime. We suggest leaving the lifetime unset to ensure uninterrupted service.
4. Once you click `Generate`, note the token down somewhere safe, as you cannot retrieve it again. This is the **Access Token** for decube's form.
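Before pasting the token into the form, it can be helpful to confirm that Databricks accepts it. The sketch below builds a request to the Databricks SCIM "current user" endpoint using only the standard library. The helper names `build_token_check` and `check_token` are hypothetical, and nothing here describes how decube itself validates the token.

```python
import urllib.request


def build_token_check(host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request asking Databricks who owns the token."""
    url = f"https://{host}/api/2.0/preview/scim/v2/Me"
    # Personal access tokens are sent as a Bearer token in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def check_token(host: str, token: str) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for rejected tokens (for example 401 or 403).
    """
    with urllib.request.urlopen(build_token_check(host, token), timeout=30) as response:
        return response.status
```

Calling `check_token(host, token)` with the Workspace URL host and the new token should return `200` if the token is valid. Avoid hard-coding the token in scripts, and read it from a secret store or environment variable instead.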

### 3. Getting the SQL Warehouse HTTP Path

1. Go to your `SQL` workspace as shown, then navigate to the `SQL Warehouses` section.

<figure><img src="/files/BXdTsFAaFC6qQF0BXWVN" alt=""><figcaption></figcaption></figure>

2. Either create a new SQL warehouse (recommended) or choose an existing one to use with decube, then open the `Connection Details` tab. You will see both the Server hostname, which should match your Workspace URL, and the HTTP Path of the SQL warehouse.

{% hint style="warning" %}
We recommend creating a serverless Databricks SQL Warehouse. Other warehouse types may cause failures during metadata ingestion or data monitoring due to warehouse warm-up time.
{% endhint %}

<figure><img src="/files/IZBvTruO7rjj0ldIr4Zg" alt=""><figcaption></figcaption></figure>
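To check that the three values work together before adding the connection, you could run a trivial query yourself. The sketch below assumes the open-source Databricks SQL Connector for Python (`pip install databricks-sql-connector`). The helper names `connection_kwargs` and `smoke_test` are hypothetical, and the leading-slash check on the HTTP Path is an assumption based on how the `Connection Details` tab typically displays it.

```python
def connection_kwargs(server_hostname: str, http_path: str, access_token: str) -> dict:
    """Collect the three form values and reject obviously malformed ones."""
    if "://" in server_hostname or "/" in server_hostname:
        raise ValueError("server_hostname should be a bare host, without scheme or path")
    # Assumption: the HTTP Path is copied as displayed, starting with '/'.
    if not http_path.startswith("/"):
        raise ValueError("http_path should start with '/'")
    if not access_token:
        raise ValueError("access_token is empty")
    return {
        "server_hostname": server_hostname,
        "http_path": http_path,
        "access_token": access_token,
    }


def smoke_test(server_hostname: str, http_path: str, access_token: str):
    """Run SELECT 1 on the warehouse and return the fetched rows."""
    # Imported lazily so connection_kwargs can be used without the connector installed.
    from databricks import sql

    kwargs = connection_kwargs(server_hostname, http_path, access_token)
    with sql.connect(**kwargs) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchall()
```

If `smoke_test` takes a long time on its first call, the warehouse is probably starting up, which is the warm-up delay the warning above refers to.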

{% hint style="warning" %}
If you have any issues or questions about connecting your data sources, please reach out to us via our live chat for support.
{% endhint %}

### FAQ

#### If I have multiple catalogs in my Databricks, how will that be handled?

{% hint style="info" %}
The Catalog is the first layer of the object hierarchy, used to organize your data assets in the Unity Catalog. Read more about the Unity Catalog [here](https://docs.databricks.com/en/data-governance/unity-catalog/index.html).
{% endhint %}

When you add the connection, decube ingests all Databricks catalogs into a single data source.
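To preview which catalogs exist from the point of view of your token, you can list them with the `SHOW CATALOGS` SQL statement. The sketch below is a minimal example with a hypothetical helper name, `list_catalogs`. It accepts any DB-API style cursor, such as one opened with the Databricks SQL Connector for Python. Whether decube's ingestion is limited to exactly this set is an assumption, not something this page states.

```python
def list_catalogs(cursor) -> list[str]:
    """Return the catalog names visible to the credentials behind `cursor`."""
    cursor.execute("SHOW CATALOGS")
    # Each result row carries the catalog name in its first column.
    return sorted(row[0] for row in cursor.fetchall())
```

Pass in a cursor connected to the SQL warehouse you configured for decube, so that the result reflects the same credentials.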

