# OpenLineage (BETA)

## Supported Capabilities

{% tabs %}
{% tab title="Supported Capabilities" %}
**General**

* **Metadata** — metadata extraction and display of asset information (tables, columns, schemas). Types collected: Schema, Virtual Table, Virtual Column, Data Job, Data Task, Data Run

**Data Quality Monitors**

* Job Failure
{% endtab %}

{% tab title="Not Supported" %}
**General**

* Profiling
* Preview
* Data Quality
* Configurable Collection
* External Table
* View Table
* Stored Procedure

**Data Quality Monitors**

* Freshness
* Volume
* Field Health
* Custom SQL
* Schema Drift
{% endtab %}
{% endtabs %}

## Connection Requirements

* Step 1: Go to **My Account** and click on the **Integrations** tab
* Step 2: Go to the **Connect a new data source** section
* Step 3: Click on the **OpenLineage** icon.
* Step 4: Enter a **name** for the data source and click **Submit**.

Exclusion filters can be added to keep specific tables and lineage paths from being ingested. [See more here](#exclusion-filters).

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2FXlumVp0M5D6kBXbCL3n5%2Fimage.png?alt=media&#x26;token=2e10ea98-a743-4e49-b1cf-29a9191e7185" alt=""><figcaption></figcaption></figure>

Step 5: A **Webhook UUID** and an **API Key** will be provided. **Copy** them into your connector’s configuration settings.

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2FYqKVjlZUNWSr8fXP4YPw%2Fimage.png?alt=media&#x26;token=478bab66-3bcc-467a-b81d-a46a430bade3" alt=""><figcaption></figcaption></figure>

## Webhook Endpoint

The payload must be submitted to the following endpoint:

```
https://integrations.<region>.decube.io/integrations/openlineage/webhook/<webhook-uuid>
```

## Submitting Payload to OpenLineage Webhook

If you're using any of the following tools, follow the respective documentation on the OpenLineage website.

| Tool         | Documentation                                                                                                                    |
| ------------ | -------------------------------------------------------------------------------------------------------------------------------- |
| Airflow      | [openlineage.io/docs/integrations/airflow/usage](https://openlineage.io/docs/integrations/airflow/usage)                         |
| Apache Spark | [openlineage.io/docs/integrations/spark/configuration/usage](https://openlineage.io/docs/integrations/spark/configuration/usage) |
| Apache Flink | [openlineage.io/docs/integrations/flink/configuration](https://openlineage.io/docs/integrations/flink/configuration)             |

## Custom Integration

If you want to create your own integration for your tools, follow these steps:

* Submit the webhook payload to the endpoint above.
* Authenticate with a Bearer token by passing your **API Key** in the `Authorization` header.

Example request:

```
curl -X POST \
   -H "Authorization: Bearer <api-key>" \
   -H "Content-Type: application/json" \
   --data '{}' \
   https://integrations.<region>.decube.io/integrations/openlineage/webhook/<webhook-uuid>
```
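The same request can also be made from code. The sketch below uses only the Python standard library and is an illustrative example, not an official client: the `RunEvent` body is a minimal OpenLineage event, and the region, webhook UUID, and API key are placeholders you must replace with the values from your data source settings.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib import request


def build_run_event(job_namespace: str, job_name: str, event_type: str = "COMPLETE") -> dict:
    """Build a minimal OpenLineage RunEvent payload."""
    return {
        "eventType": event_type,  # START, COMPLETE, FAIL, or ABORT
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [],
        "outputs": [],
        # Identifies the tool producing the event; use your own URL here.
        "producer": "https://example.com/my-custom-integration",
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json",
    }


def submit_event(event: dict, region: str, webhook_uuid: str, api_key: str) -> int:
    """POST the event to the decube OpenLineage webhook; returns the HTTP status code."""
    url = f"https://integrations.{region}.decube.io/integrations/openlineage/webhook/{webhook_uuid}"
    req = request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status
```

Richer events (with `inputs`/`outputs` datasets and facets) follow the same shape; see the OpenLineage specification for the full schema.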

## Exclusion Filters

Exclusion filters let you exclude specific tables and lineage paths from being ingested by Decube. This is useful when your OpenLineage jobs produce metadata for staging tables, temporary paths, or other assets you do not want tracked in your catalog.

You configure exclusion filters directly in the Decube UI on your OpenLineage data source settings page. Each filter field expects a Python-compatible regular expression.

### Supported Filter Types

{% tabs %}
{% tab title="ADLS Gen2 Path" %}
Matches tables with an ADLS Gen2 URI in the format:

```
abfss://<container-name>@<service-name>.dfs.core.windows.net/<path>
```

| Field          | Description                        |
| -------------- | ---------------------------------- |
| container-name | The ADLS container name            |
| service-name   | The ADLS storage account name      |
| path           | The file path within the container |

**Example** — exclude everything under the `discard/` path in the `decube` container across all storage accounts:

| Field          | Value        |
| -------------- | ------------ |
| container-name | `decube`     |
| service-name   | `.*`         |
| path           | `discard/.*` |

This excludes `abfss://decube@test.dfs.core.windows.net/discard/test/file` but not `abfss://decube@test.dfs.core.windows.net/nodiscard/test/file`.
{% endtab %}

{% tab title="Snowflake" %}
Matches Snowflake tables by their fully qualified identifier.

| Field              | Description                                                           |
| ------------------ | --------------------------------------------------------------------- |
| account-identifier | Your Snowflake account in `<organization-name>-<account-name>` format |
| database           | The database name                                                     |
| schema             | The schema name                                                       |
| table              | The table name                                                        |

**Example** — exclude all tables in the `TEST` schema of `WORKDATABASE` on the `decube-test` account:

| Field              | Value          |
| ------------------ | -------------- |
| account-identifier | `decube-test`  |
| database           | `WORKDATABASE` |
| schema             | `TEST`         |
| table              | `.*`           |

This excludes `snowflake://decube-test/WORKDATABASE.TEST.TABLE` but not `snowflake://decube-test/PROD.TEST.TABLE`.
{% endtab %}

{% tab title="S3" %}
Matches tables stored in Amazon S3.

| Field       | Description                             |
| ----------- | --------------------------------------- |
| bucket-name | The S3 bucket name                      |
| object-key  | The object key (path) within the bucket |

**Example** — exclude all CSV files under `raw/data/` in the `decube-datalake` bucket:

| Field       | Value              |
| ----------- | ------------------ |
| bucket-name | `decube-datalake`  |
| object-key  | `raw/data/.*\.csv` |

This excludes `s3://decube-datalake/raw/data/report.csv` but not `s3://decube-datalake/raw/data/report.json` or `s3://decube-datalake/bronze/report.csv`.
{% endtab %}

{% tab title="Generic Regex" %}
Use the generic regex filter when the table format does not match any of the specific filter types above. The regex is matched against the full table identifier regardless of type.

| Field | Description                                              |
| ----- | -------------------------------------------------------- |
| regex | A regular expression matched against the full table path |

**Example** — exclude any table whose path contains `/ignored/`:

| Field | Value           |
| ----- | --------------- |
| regex | `.*/ignored/.*` |
{% endtab %}
{% endtabs %}

{% hint style="info" %}
You can add multiple exclusion filters of different types on the same data source. Each filter is evaluated independently — a table is excluded if it matches any filter.
{% endhint %}
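To sanity-check a pattern before saving it, you can reproduce the matching locally. The sketch below assumes each field is anchored as a full match (Python `re.fullmatch`) against the corresponding URI component; the connector's exact matching semantics are an assumption here, and the filter values are taken from the ADLS Gen2 example above.

```python
import re

# Field patterns from the ADLS Gen2 example: exclude everything under
# discard/ in the `decube` container across all storage accounts.
FILTER = {
    "container-name": "decube",
    "service-name": ".*",
    "path": "discard/.*",
}

# Split an abfss:// URI into its container, service, and path components.
ADLS_URI = re.compile(
    r"abfss://(?P<container>[^@]+)@(?P<service>[^.]+)\.dfs\.core\.windows\.net/(?P<path>.*)"
)


def is_excluded(uri: str, flt: dict) -> bool:
    """Return True if every field pattern fully matches its URI component."""
    m = ADLS_URI.match(uri)
    if not m:
        return False
    return (
        re.fullmatch(flt["container-name"], m.group("container")) is not None
        and re.fullmatch(flt["service-name"], m.group("service")) is not None
        and re.fullmatch(flt["path"], m.group("path")) is not None
    )


print(is_excluded("abfss://decube@test.dfs.core.windows.net/discard/test/file", FILTER))    # True
print(is_excluded("abfss://decube@test.dfs.core.windows.net/nodiscard/test/file", FILTER))  # False
```

The second URI is not excluded because `nodiscard/test/file` does not fully match `discard/.*`; with an unanchored search it would match, which is why anchoring matters when writing these patterns.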
