AWS Glue
To connect your AWS Glue to decube, we will need the following information:
An IAM user created for decube with the AWSGlueServiceRole policy attached
IAM user's Access Key
IAM user's Secret Key
Glue Region
Log in to the AWS Console and go to IAM > Users > Create user
Select Attach policies directly, then search for and select AWSGlueServiceRole
Review and create your user
Navigate to the newly created user and click on Create access key
Choose Application running outside AWS
Save the provided access key and secret access key. You will not be able to retrieve these keys again.
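The console steps above can also be scripted. The following is a minimal sketch, assuming the boto3 SDK and an IAM client whose credentials are allowed to manage users. The user name decube-glue is a hypothetical placeholder, and the policy ARN is the AWS managed policy path for AWSGlueServiceRole.

```python
# ARN of the AWS managed policy named AWSGlueServiceRole.
GLUE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def create_glue_user(iam_client, user_name):
    """Create an IAM user, attach the Glue policy, and return a new key pair.

    iam_client is expected to behave like boto3.client("iam").
    """
    iam_client.create_user(UserName=user_name)
    iam_client.attach_user_policy(UserName=user_name, PolicyArn=GLUE_POLICY_ARN)
    key = iam_client.create_access_key(UserName=user_name)["AccessKey"]
    # The secret is returned only by this call, so store it securely right away.
    return {
        "access_key": key["AccessKeyId"],
        "secret_key": key["SecretAccessKey"],
    }
```

With boto3 installed, a call might look like create_glue_user(boto3.client("iam"), "decube-glue"). As in the console flow, the secret key cannot be retrieved again later.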
This section is applicable if you intend to view lineages from your AWS Glue jobs. OpenLineage is an open framework for data lineage collection and analysis. At its core is an extensible specification that systems can use to interoperate with lineage metadata.
Follow the steps below to enable OpenLineage on AWS Glue:
In the Job details tab, navigate to Advanced properties → Libraries → Dependent Jars path
Use the jar URL directly from the Maven Central page for openlineage-spark
Ensure you select the version for Scala 2.12, as Glue Spark is compiled with Scala 2.12, and version 2.13 won't be compatible.
On the page, for the specific OpenLineage version for Scala 2.12, copy the URL of the jar file from the Files row and use it in Glue.
Alternatively, upload the jar to an S3 bucket and use its URL. The URL should use the s3 scheme: s3://<your bucket>/path/to/openlineage-spark_2.12-<version>.jar
In the same Job details tab, add a new property under Job parameters with the key --conf:
Use the format param1=value1 --conf param2=value2 ... --conf paramN=valueN for the value.
Make sure every parameter except the first has an extra --conf in front of it.
Example: spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener --conf spark.openlineage.transport.type=http --conf spark.openlineage.transport.url=https://integrations.decube.io --conf spark.openlineage.transport.endpoint=/integrations/apache_spark/webhook/<webhook-uuid> --conf spark.openlineage.transport.auth.type=api_key --conf spark.openlineage.transport.auth.apiKey=<webhook-key>
Add the --user-jars-first parameter and set its value to true
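The job parameter values described in these steps are easy to get wrong by hand, so a small helper can assemble and check them. This is a sketch under stated assumptions: the function names are hypothetical, the --conf key name is inferred from the value format described above, and the jar check only looks at the URL text (scheme, Scala 2.12 artifact name, .jar suffix) without contacting S3 or Maven Central.

```python
LISTENER = "io.openlineage.spark.agent.OpenLineageSparkListener"


def build_conf_value(settings):
    """Join Spark settings into one value: the first pair is bare,
    and every later pair is prefixed with --conf."""
    pairs = [f"{key}={value}" for key, value in settings.items()]
    return " --conf ".join(pairs)


def check_jar_url(url):
    """Return a list of problems found in a Dependent JARs path entry."""
    problems = []
    if not url.startswith(("s3://", "https://")):
        problems.append("expected an s3:// or https:// URL")
    if "openlineage-spark_2.12" not in url:
        problems.append("jar name should contain openlineage-spark_2.12")
    if not url.endswith(".jar"):
        problems.append("URL should point at a .jar file")
    return problems


def lineage_job_parameters(webhook_uuid, webhook_key):
    """Build the two job parameters from the steps above as a dict."""
    settings = {
        "spark.extraListeners": LISTENER,
        "spark.openlineage.transport.type": "http",
        "spark.openlineage.transport.url": "https://integrations.decube.io",
        "spark.openlineage.transport.endpoint":
            f"/integrations/apache_spark/webhook/{webhook_uuid}",
        "spark.openlineage.transport.auth.type": "api_key",
        "spark.openlineage.transport.auth.apiKey": webhook_key,
    }
    return {
        "--conf": build_conf_value(settings),
        "--user-jars-first": "true",
    }
```

Calling check_jar_url on a path built for Scala 2.13 returns a non-empty list, which flags the incompatibility noted earlier before the job is run. The dict returned by lineage_job_parameters maps each job parameter key to the value to enter in the Job details tab.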
To confirm that OpenLineage registration has been successful, check the logs for the following entry:
If you see this log message, it indicates that OpenLineage has been correctly registered with your AWS Glue job.
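For long job logs, the check can be done with a short script instead of by eye. This is a minimal sketch that assumes the registration entry mentions the listener class name configured in spark.extraListeners; the exact wording of the log entry is not reproduced here, and the function name is hypothetical.

```python
LISTENER = "io.openlineage.spark.agent.OpenLineageSparkListener"


def find_listener_lines(log_text, listener=LISTENER):
    """Return every log line that mentions the OpenLineage listener class."""
    return [line for line in log_text.splitlines() if listener in line]
```

Given the text of a downloaded job log, an empty result suggests the listener was never loaded, which usually points back to the jar path or the spark.extraListeners setting.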
Enter the access key, secret key, and region in the connection form, then test the connection. If the test succeeds, you can add a name and connect the data source.
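Before filling in the form, the key pair and region can be checked locally. The sketch below assumes the boto3 SDK and treats a successful Glue get_databases call as evidence that the credentials work; this is an illustrative pre-check and not necessarily what decube's own connection test does. The function name is hypothetical.

```python
def can_list_glue_databases(glue_client):
    """Return True if the client can list Glue databases, else False.

    glue_client is expected to behave like boto3.client("glue", ...).
    """
    try:
        glue_client.get_databases(MaxResults=1)
    except Exception as exc:  # credential, permission, or region problems
        print(f"Glue access check failed: {exc}")
        return False
    return True
```

With boto3 installed, the client could be created as boto3.client("glue", region_name=..., aws_access_key_id=..., aws_secret_access_key=...) using the same three values that go into the connection form.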