Decube
Try for free
  • 🚀Overview
    • Welcome to decube
    • Getting started
      • How to connect data sources
    • Security and Compliance
    • Data Policy
    • Changelog
    • Public Roadmap
  • Support
  • 🔌Data Warehouses
    • Snowflake
    • Redshift
    • Google Bigquery
    • Databricks
    • Azure Synapse
  • 🔌Relational Databases
    • PostgreSQL
    • MySQL
    • SingleStore
    • Microsoft SQL Server
    • Oracle
  • 🔌Transformation Tools
    • dbt (Cloud Version)
    • dbt Core
    • Fivetran
    • Airflow
    • AWS Glue
    • Azure Data Factory
    • Apache Spark
      • Apache Spark in Azure Synapse
    • OpenLineage (BETA)
    • Additional configurations
  • 🔌Business Intelligence
    • Tableau
    • Looker
    • PowerBI
  • 🔌Data Lake
    • AWS S3
    • Azure Data Lake Storage (ADLS)
      • Azure Function for Metadata
    • Google Cloud Storage (GCS)
  • 🔌Ticketing and Collaboration
    • ServiceNow
    • Jira
  • 🔒Security and Connectivity
    • Enabling VPC Access
    • IP Whitelisting
    • SSH Tunneling
    • AWS Identities
  • ✅Data Quality
    • Incidents Overview
    • Incident model feedback
    • Enable asset monitoring
    • Available Monitor Types
    • Available Monitor Modes
    • Catalog: Add/Modify Monitor
    • Set Up Freshness & Volume Monitors
    • Set Up Field Health Monitors
    • Set Up Custom SQL Monitors
    • Grouped-by Monitors
    • Modify Schema Drift Monitors
    • Modify Job Failure Monitors (Data Job)
    • Custom Scheduling For Monitors
    • Config Settings
  • 📖Catalog
    • Overview of Asset Types
    • Assets Catalog
    • Asset Overview
    • Automated Lineage
      • Lineage Relationship
      • Supported Data Sources and Lineage Types
    • Add lineage relationships manually
    • Add tags and classifications to fields
    • Field Statistcs
    • Preview sample data
  • 📚Glossary
    • Glossary, Category and Terms
    • Adding a new glossary
    • Adding Terms and Linked Assets
  • Moving Terms to Glossary/Category
  • AI Copilot
    • Copilot's Autocomplete
  • 🤝Collaboration
    • Ask Questions
    • Rate an asset
  • 🌐Data Mesh [BETA]
    • Overview on Data Mesh [BETA]
    • Creating and Managing Domains/Sub-domains
    • Adding members to Domain/Sub-domain
    • Linking Entities to Domains/Sub-domains
    • Adding Data Products to Domains/Subdomains
    • Creating a draft Data Asset
    • Adding a Data Contract - Default Settings
    • Adding a Data Contract - Freshness Test
    • Adding a Data Contract - Column Tests
    • Publishing the Data Asset
  • 🏛️Governance
    • Governance module
    • Classification Policies
    • Auto-classify data assets
  • ☑️Approval Workflow
    • What are Change Requests?
    • Initiate a change request
    • What are Access Requests?
    • Initiate an Access Request
  • 📋Reports
    • Overview of Reports
    • Supported sources for Reports
    • Asset Report: Data Quality Scorecard
  • 📊Dashboard
    • Dashboard Overview
    • Incidents
    • Quality
  • ⏰Alert Notifications
    • Get alerts on email
    • Connect your Slack channels
    • Connect to Microsoft Teams
    • Webhooks integration
  • 🏛️Manage Access
    • User Management - Overview
    • Invite users
    • Deactivate or re-activate users
    • Revoke a user invite
  • 🔐Group-based Access Controls
    • Groups Management - Overview
    • Create Groups & Assign Policies
    • Source-based Policies
    • Administrative-based Policies
    • Module-based Policies
    • What is the "Owners" group?
  • 🗄️Org Settings
    • Multi-factor authentication
    • Single Sign-On (SSO) with Microsoft
    • Single Sign-On (SSO) with JumpCloud
  • ❓Support
    • Supported Features by Integration
    • Frequently Asked Questions
    • Supported Browsers and System Requirements
  • Public API (BETA)
    • Overview
      • Data API
        • Glossary
        • Lineage
        • ACL
          • Group
      • Control API
        • Users
    • API Keys
Powered by GitBook
On this page
  • Overview of Lineage Types
  • Foreign Key
  • ETL Lineage
  • SQL Lineage
  • BI Lineage
  • View Lineage
  • External Lineage (S3 to Snowflake)
  1. Catalog
  2. Automated Lineage

Lineage Relationship

In this document, we explore various types of data lineage and their significance in understanding data relationships.

Overview of Lineage Types

Each data source supports different types of lineage, ensuring transparency in data transformations, relationships, and reporting:

  • Foreign Key Lineage – Tracks relationships between tables via primary and foreign keys.

  • ETL Lineage – Shows how data moves and transforms through ETL processes.

  • SQL Lineage – Tracks data movement through SQL queries (SELECT, JOIN, INSERT, etc.).

  • BI Lineage – Traces data from sources to reports, dashboards, and BI tools.

  • View Lineage – Tracks virtual tables (views) and their data dependencies.

  • External Lineage – Tracks data movement across external platforms (e.g., AWS S3 to Snowflake).

Foreign Key

What it is:

A foreign key is a column in one table that refers to the primary key of another table, establishing a relationship between them. This lineage type visualizes table relationships in relational databases, ensuring referential integrity.

Example:

Consider a scenario where a sale is linked to a credit card transaction. The id column in the credit_card table serves as a foreign key for the payment_card_id column in the sale table. This relationship allows us to track transactions based on the payment method used.

Why it matters:

• Helps in understanding one-to-many or many-to-many table relationships.

• Ensures data consistency and integrity across tables.

ETL Lineage

What it is:

ETL (Extract, Transform, Load) lineage tracks how data is processed in ETL pipelines. It ensures transparency in data movement and transformation—helping users understand how raw data evolves into its final state.

Example:

In dbt (Data Build Tool), transformation jobs are frequently used to reshape data within a data warehouse. ETL lineage helps visualize these changes, showing the step-by-step evolution of data.

Why it matters:

• Provides a clear transformation path for debugging and optimization.

• Ensures trust in analytics and reporting by making data workflows transparent.

SQL Lineage

What it is:

SQL lineage maps the movement of data between tables and columns as SQL queries execute. It tracks data transformations performed by SQL commands such as SELECT, JOIN, INSERT, UPDATE, and DELETE.

Example:

If a report is generated based on a JOIN operation between multiple tables, SQL lineage will track which columns were used, modified, or derived in the process.

Why it matters:

• Helps in tracing data changes for auditing and compliance.

• Improves query optimization and debugging.

BI Lineage

What it is:

BI (Business Intelligence) lineage tracks how data moves from raw sources to reports, dashboards, and visualizations in a BI tool. It provides insights into data transformations within BI platforms.

Example:

If data from Snowflake (a data warehouse) is used in Looker Studio (a BI tool), BI lineage helps track the source-to-report journey, ensuring transparency in insights.

Why it matters:

• Helps users understand the accuracy of reports.

• Improves troubleshooting of incorrect metrics.

View Lineage

What it is:

A view is a virtual table created using SQL queries. View lineage is similar to SQL lineage but focuses specifically on data transformations within views.

Example:

If a database view is built using a SELECT query that joins multiple tables, view lineage tracks the origin of each field displayed in that view.

Why it matters:

• Ensures data consistency in virtual tables.

• Helps in understanding dependencies between views and base tables.

External Lineage (S3 to Snowflake)

What it is:

External lineage refers to data flow between different platforms and environments. This is crucial for tracking external data ingestion, ensuring an end-to-end view of data movement.

Example:

If data is ingested from AWS S3 into Snowflake, external lineage will capture:

✔ The source location in S3

✔ How data is processed before entering Snowflake

✔ Any transformations performed during ingestion

Why it matters:

• Helps in governance and security by tracking external data sources.

• Ensures completeness of data flow visualization.

PreviousAutomated LineageNextSupported Data Sources and Lineage Types

Last updated 3 months ago

📖