
dbt Core

Connect your decube platform to dbt Core to see all data jobs in the Catalog and view end-to-end lineage.


Supported Capabilities

Data Quality

  Capability      Supported
  Freshness       ❌
  Volume          ❌
  Schema Drift    ❌
  Field Health    ❌
  Custom SQL      ❌
  Job Failure     ✅

Catalog

  Capability      Supported
  Data Profiling  ❌
  Data Preview    ❌

Data Recon

  Capability      Supported
  Add Recon       ❌

This documentation covers how to add a data source connection to dbt Core, the open-source framework for dbt. If you would like to connect to your dbt Cloud instance instead, please check out the documentation for the dbt Cloud version.

Important: Our system does not parse or collect metadata from past dbt runs.

In order for metadata to be collected and properly ingested into our system, the dbt data job must be re-run so that same-day data is available.

Integrating dbt Core with Decube involves reading files from an AWS S3 bucket, which shares similarities with how AWS S3 itself connects to the platform.

A summary of the steps to set up dbt Core:

  1. Set up an S3 bucket following the same procedure outlined in our documentation for AWS S3.

  2. Define folder partitions (details are provided in the following section).

  3. Upload the necessary files to those partitions.

Following these steps, the metadata collector will connect to the S3 bucket and retrieve the data.

Minimum Requirement

Currently, only S3 storage is supported for dbt Core under the "Storage Type" dropdown.

To connect your dbt Core to decube, we will need the following information.

Choose an authentication method:

a. AWS Identity:

  • Select AWS Identity

  • Customer AWS Role ARN

  • Path

  • Region

  • Storage Type

  • Data source name

b. AWS Access Key:

  • Access Key ID

  • Secret Access Key

  • Path

  • Region

  • Storage Type

  • Data source name

where 'Path' follows one of these formats: s3://some-bucket or s3://some-bucket/path-to-dbt-core. The path spec above will be created during the Upload project files step.

Connection Options:

a. AWS Roles

This section walks you through creating a Customer AWS Role within your AWS account that has the right set of permissions to access your data sources.

  • Step 1: Go to your AWS Account → IAM Module → Roles.

  • Step 2: Click on Create role.

  • Step 3: Choose Custom trust policy.

  • Step 4: Specify the following as the trust policy, replacing <DECUBE-AWS-IDENTITY-ARN> and <EXTERNAL-ID> with values from Generating a Decube AWS Identity.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "<DECUBE-AWS-IDENTITY-ARN>"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<EXTERNAL-ID>"
                }
            }
        }
    ]
}

  • Step 5: Click Next to proceed to attaching a policy.

  • Step 6: Click on Attach Policies, then Create Policy, and choose the JSON editor. Input the following policy and press Next, enter a policy name of your choice, and press Create Policy.

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "VisualEditor0",
			"Effect": "Allow",
			"Action": [
				"s3:GetObject",
				"s3:ListBucket",
				"s3:ListAllMyBuckets"
			],
			"Resource": [
				"arn:aws:s3:::{bucket-name}",
				"arn:aws:s3:::{bucket-name}/*"
			]
		}
	]
}
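
If you prefer to script this instead of using the console, here is a minimal sketch using the AWS CLI. It assumes the trust policy and the S3 policy above are saved locally as trust-policy.json and decube-s3-policy.json; the role and policy names are placeholders you can change.

#!/bin/bash

# Create the role with the custom trust policy (Steps 2-5)
aws iam create-role \
    --role-name decube-dbt-core-role \
    --assume-role-policy-document file://trust-policy.json

# Attach the S3 read permissions as an inline policy (Step 6 uses a
# managed policy in the console; inline is used here for brevity)
aws iam put-role-policy \
    --role-name decube-dbt-core-role \
    --policy-name decube-dbt-core-s3-access \
    --policy-document file://decube-s3-policy.json

# The role ARN returned by create-role is the "Customer AWS Role ARN"
# requested in the decube connection form.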

b. Retrieving Access Keys from AWS

  • Step 1: Log in to the AWS Console and proceed to IAM > Users > Create User.

  • Extra Step: Click on Attach Policies, then Create Policy, and choose the JSON editor. Input the following policy and press Next, enter a policy name of your choice, and press Create Policy.

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "VisualEditor0",
			"Effect": "Allow",
			"Action": [
				"s3:GetObject",
				"s3:ListBucket",
				"s3:ListAllMyBuckets"
			],
			"Resource": [
				"arn:aws:s3:::{bucket-name}",
				"arn:aws:s3:::{bucket-name}/*"
			]
		}
	]
}

  • Step 2: Search for the policy you just created, select it, and press Next.

  • Step 3: Review and create the user.

  • Step 4: Navigate to the newly created user and click on Create access key.

  • Step 5: Choose Application running outside AWS.

  • Step 6: Save the provided access key and secret access key. You will not be able to retrieve these keys again.
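
For reference, a rough AWS CLI equivalent of these console steps might look like the sketch below. The user name decube-s3-datalake and the policy file decube-s3-policy.json (the S3 policy shown above) are assumptions; adjust them to your setup.

#!/bin/bash

# Create the IAM user (Step 1)
aws iam create-user --user-name decube-s3-datalake

# Attach the S3 read permissions as an inline policy
# (the console flow creates and attaches a managed policy instead)
aws iam put-user-policy \
    --user-name decube-s3-datalake \
    --policy-name decube-s3-read-access \
    --policy-document file://decube-s3-policy.json

# Create the access key (Steps 4-6); store the output securely,
# as the secret access key cannot be retrieved again later
aws iam create-access-key --user-name decube-s3-datalake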

AWS KMS

If the bucket intended to be connected to Decube is encrypted using a customer managed KMS key, you will need to add the AWS IAM user created above to the key policy statement.

  1. Login to AWS Console and proceed to AWS KMS > Customer-managed keys.

  2. Find the key that was used to encrypt the AWS S3 bucket.

  3. On the Key policy tab, click on Edit.

  4. Assuming the user created is decube-s3-datalake:

a. If there is no existing policy attached to the key, use the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Allow decube to use key",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<AWSAccountID>:user/{decube-s3-datalake}"
                ]
            },
            "Action": "kms:Decrypt",
            "Resource": "*"
        }
    ]
}

b. If there is an existing policy, append this section to the Statement array:

{
    "Statement": [
        {
            "Sid": "Allow decube to use key",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<AWSAccountID>:user/{decube-s3-datalake}"
                ]
            },
            "Action": "kms:Decrypt",
            "Resource": "*"
        }
    ]
}

  5. Save changes.
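
If you manage key policies outside the console, the same change can be applied with the AWS CLI, roughly as sketched below; the key ID placeholder and the key-policy.json file name are assumptions.

#!/bin/bash

# Download the current key policy (the default policy name is "default")
aws kms get-key-policy \
    --key-id <your-kms-key-id> \
    --policy-name default \
    --query Policy \
    --output text > key-policy.json

# Edit key-policy.json to append the "Allow decube to use key" statement
# shown above, then upload the updated policy
aws kms put-key-policy \
    --key-id <your-kms-key-id> \
    --policy-name default \
    --policy file://key-policy.json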

Folder partition

  • Decube supports ingesting information from multiple dbt projects. You would need to structure the bucket using a format that we define based on the current date.

Given that base_path for a single project uses the following format:

  • base_path = "${year}/${month}/${day}", where:

    • year = $(date +%Y)

    • month = $(date +%B)

    • day = $(date +%d)

  • Example of a folder partition on your S3 - s3://your-bucket/${base_path}

    • Where the full path of the folder could be s3://your-bucket/2024/May/01/
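
For example, a minimal bash sketch of how the partition path is derived (your-bucket is a placeholder):

# Evaluated on 1 May 2024, this prints s3://your-bucket/2024/May/01/
year=$(date +%Y)     # e.g. 2024
month=$(date +%B)    # e.g. May
day=$(date +%d)      # e.g. 01
base_path="${year}/${month}/${day}"
echo "s3://your-bucket/${base_path}/"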

After setting up the format based on the current-date partition, you can proceed to define your own structure.

decube currently supports reading a bucket structure up to two levels deep. You can decide how you want to upload project files into separate directories.

Assuming the run takes place on the 1st of May 2024, all of the following are valid bucket paths (see the examples below):

  • project_a/2024/May/01 - Example 1

  • level1/2024/May/01 - Example 1

  • dev/project_a/2024/May/01 - Example 2

  • level1/level2/2024/May/01 - Example 2

  • the_project/2024/May/01 - Example 3

  • 2024/May/01 - Example 4

Example 1 - Multiple Projects

  • project_a

    • year=2024

      • month=May

        • day=01

          • [location of project files]

  • project_b

    • Same as project_a

  • project_c

    • Same as project_a

Example 2 - Multiple Projects with Environments

  • dev

    • project_a

      • year=2024

        • month=May

          • day=01

            • [location of project files]

    • project_b

      • Same as project_a

    • project_c

      • Same as project_a

  • prod

    • project_a_prod

    • project_b_prod

    • …

Example 3 - Single Project

  • project_a

    • year=2024

      • month=May

        • day=01

          • [location of project files]

Example 4 - No Project

  • year=2024

    • month=May

      • day=01

        • [location of project files]

Upload project files

You would need to upload specific files from the target/ directory into the bucket after your dbt command has concluded:

  • manifest.json, which is generated by any command that parses your project. This contains a full representation of your dbt project's resources (models, tests, macros, etc.), including all node configurations and resource properties. Here is an example of a command that generates the file:

    • dbt run --full-refresh

  • run_results.json, which is generated by a few commands such as build, compile, and run, to name a few (you can refer to the dbt documentation). This contains information about a completed invocation of dbt, including timing and status info for each node (model, test, etc.) that was executed. Here is an example of a command that generates the file:

    • dbt build

  • catalog.json, which is only produced by docs generate and is optional. This is required if you want to acquire column metadata. It contains information from your data warehouse about the tables and views produced and defined by the resources in your project. The command can be run like so:

    • dbt docs generate

To ensure the collector runs successfully, you will need to upload the files in the following manner:

  • (in pairs) manifest.json and run_results.json, or

  • (in triplets) manifest.json, run_results.json, and catalog.json.
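
As a quick sanity check, one way to produce all three artifacts in a single dbt invocation and confirm they exist before uploading (the commands are standard dbt CLI; the ls check is just illustrative):

# Produce the artifacts in target/
dbt build              # writes manifest.json and run_results.json
dbt docs generate      # optional; writes catalog.json for column metadata

# Confirm the files exist before uploading them to S3
ls target/manifest.json target/run_results.json target/catalog.json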

Additional Notes

For uploading the project files, you may choose to do the following:

  • Only upload the latest project files to the specified bucket, so that there is only one set of manifest.json and run_results.json in that folder partition at any time.

    • Caution: If you do it this way, you may lose information about runs that happened before the latest project files were processed.

  • Retain a series of project files based on the timestamp of when it was run. For example, for each run append a timestamp after the filename:

    • Do: manifest_20240503142827.json

    • Do not: 20240503142827_manifest.json

    • The timestamped filename in this example was generated using the following command:

      • Using timestamp=$(date +%Y%m%d%H%M%S) to create manifest_${timestamp}.json

Note: To ensure that each project is successfully collected by our metadata collector, we recommend uploading the manifest.json and run_results.json in the same folder. If you want to include column metadata, make sure you include catalog.json as well.

Sample Script

Here is a sample script for uploading the project files:

#!/bin/bash

# Project name
project_name=some_project

# Generate timestamp
export TZ=UTC
timestamp=$(date +%Y%m%d%H%M%S)

# Generate date-based directory structure
year=$(date +%Y)
month=$(date +%B)
day=$(date +%d)

# Define the base path for S3
base_path="${project_name}/${year}/${month}/${day}"

# Copy project files to S3 with the new structured path
aws s3 cp /path/to/target/manifest.json s3://some-bucket/${base_path}/manifest_${timestamp}.json
aws s3 cp /path/to/target/run_results.json s3://some-bucket/${base_path}/run_results_${timestamp}.json
aws s3 cp /path/to/target/catalog.json s3://some-bucket/${base_path}/catalog_${timestamp}.json

You may modify and integrate this into your existing workflows.
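
For example, assuming you save the script above as upload_dbt_artifacts.sh, a run in your existing workflow could chain the dbt commands and the upload:

# Hypothetical wrapper: build, generate docs, then upload the artifacts
dbt build && dbt docs generate && ./upload_dbt_artifacts.sh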

Connecting DBT Core with Decube

After following the above steps, you may start ingesting the metadata from your DBT Core bucket into decube by navigating to My Account > Data Sources Tab > Connect A New Data Source > DBT Core.

where 'Path' follows one of these formats: s3://some-bucket or s3://some-bucket/path-to-dbt-core.

Please provide the required credentials and click "Test this connection" to verify their validity. Afterward, assign a name to your data source and select "Connect This Data Source" to establish the connection between dbt Core and Decube.

Additional configuration for lineage

Please be aware that in order for the lineage to connect successfully and accurately, you will need to configure the source tables in your dbt project.

Once you have connected your dbt Core, you will then need to map the connection sources to the data sources on the decube platform. Refer to this documentation on how to do that.
