Databricks

Adding Databricks to your decube connections helps your team find relevant datasets, understand their quality through incident monitoring, and apply governance policies via our data catalog.


The Databricks connection supports connecting to Unity Catalog or the legacy Hive metastore.

You will need to generate an access token and create a service principal to connect to Databricks.

Supported Capabilities

Data Quality

| Capability | Supported |
| --- | --- |
| Freshness | ✅ |
| Volume | ✅ |
| Schema Drift | ✅ |
| Field Health | ✅ |
| Custom SQL | ✅ |
| Job Failure | ❌ |

Catalog

| Capability | Supported |
| --- | --- |
| Data Profiling | ✅ |
| Data Preview | ✅ |

Data Recon

| Capability | Supported |
| --- | --- |
| Add Recon | ❌ |

Adding a Databricks connection on decube

From the My Account page, select the Databricks tile to be brought to the Databricks connection form.

Required Information

  1. Workspace URL
  2. Personal Access Token
  3. SQL Warehouse HTTP Path
  4. Catalog to scan

1. Getting the Workspace URL

Go to your workspace and copy the URL from the browser bar.

It should look like <some_values>.cloud.databricks.com. The entire URL highlighted in the screenshot is the Workspace URL to be added to decube's form.
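
If you copy the full address from the browser bar (which may also include a path or query string), only the scheme and hostname portion should be needed. A minimal sketch for trimming it, using only Python's standard library; the example address is hypothetical:

```python
from urllib.parse import urlparse

# Hypothetical example of a full address copied from the browser bar.
browser_url = "https://<some_values>.cloud.databricks.com/sql/warehouses?o=1234567890"

# Keep only the scheme and hostname -- this is the Workspace URL for decube's form.
parsed = urlparse(browser_url)
workspace_url = f"{parsed.scheme}://{parsed.netloc}"
print(workspace_url)  # https://<some_values>.cloud.databricks.com
```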

2. Getting a Personal Access Token

  1. Navigate to your User Settings page.

  2. Click on Generate New Token after navigating to the Access Tokens tab.

  3. Give your token a name and specify the lifetime of the token. We suggest not specifying a lifetime to ensure uninterrupted service.

  4. Once you click Generate, make sure to note down the token somewhere, as you cannot retrieve it again. This is the Access Token for decube's form.
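
Before pasting the token into decube's form, you may want to check that it authenticates against your workspace. A minimal sketch, assuming the `requests` library and the Databricks SCIM `Me` endpoint; the Workspace URL and token values are placeholders:

```python
import requests

# Placeholders -- substitute your Workspace URL and the token generated above.
WORKSPACE_URL = "https://<some_values>.cloud.databricks.com"
ACCESS_TOKEN = "dapi..."

# The SCIM "Me" endpoint returns the identity the token authenticates as.
response = requests.get(
    f"{WORKSPACE_URL}/api/2.0/preview/scim/v2/Me",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print(response.json()["userName"])
```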

3. Getting the SQL Warehouse HTTP Path

  1. Go to your SQL Workspace as shown, then navigate to the SQL Warehouses section.

  2. Either create a new SQL Warehouse (recommended) or choose an existing one to be used with decube, then open its Connection Details tab. You will see both the Server hostname, which should match your Workspace URL, and the HTTP Path of the SQL Warehouse.

We recommend creating a serverless Databricks SQL Warehouse. Other warehouse types may cause failures during metadata ingestion or data monitoring due to warehouse warm-up time.
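
To confirm that the hostname, HTTP Path, and access token work together before submitting the form, you can open a connection with the `databricks-sql-connector` Python package. A minimal sketch; all connection values are placeholders:

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholders -- use the Server hostname, HTTP Path, and access token gathered above.
with sql.connect(
    server_hostname="<some_values>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse_id>",
    access_token="dapi...",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())  # a single row back confirms the warehouse is reachable
```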

If you have any issues or questions related to connecting your data sources, please reach out to us via our live chat for support.

4. Catalog to scan

This is the name of the first layer of Unity Catalog's three-level namespace that will be ingested into the decube Catalog. Add the correct name to the connection form to ensure that the schemas and tables in the selected catalog are ingested as a source.

The Catalog is the first layer of the object hierarchy, used to organize your data assets in Unity Catalog. Read more about Unity Catalog here. The full documentation from Databricks can be found here.
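
The value you enter here is the catalog part of Databricks' `catalog.schema.table` references. As an illustration, a minimal sketch that lists the schemas decube would scan, reusing the connector from the previous section; the connection values are placeholders and `my_catalog` is a hypothetical catalog name:

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholder connection values; "my_catalog" is a hypothetical catalog name.
with sql.connect(
    server_hostname="<some_values>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse_id>",
    access_token="dapi...",
) as connection:
    with connection.cursor() as cursor:
        # The schemas (and the tables inside them) in this catalog are what decube ingests.
        cursor.execute("SHOW SCHEMAS IN my_catalog")
        for row in cursor.fetchall():
            print(row)
```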
