Apache Iceberg (BETA)

This document provides a step-by-step process for connecting Decube to Apache Iceberg.

This connector is in beta. For issues or feedback, please email us at support@decube.io.

The Apache Iceberg connector depends on the catalog type being added. The following catalog types are currently supported:

  • AWS Glue

AWS Glue Integration

Connecting Decube to Apache Iceberg in AWS Glue

This section outlines the steps required to integrate Decube with an Apache Iceberg instance in the AWS Glue Catalog.

Ensure that Apache Iceberg is properly set up in your AWS Glue instance before proceeding with this integration.

The process is divided into two sections:

  • Configuring AWS Credentials using AWS IAM

  • Setting up the connection on the Decube platform

Minimum Requirements

To configure the connection using AWS credentials, you’ll need the following:

  • AWS IAM User

Required access permissions:

Action Permissions

  • AWS Glue

    • glue:GetDatabases

    • glue:GetTables

    • glue:GetTable

  • AWS S3

    • s3:GetObject

  • AWS Key Management Service

    • Note: Required only if table metadata is encrypted; otherwise optional.

    • kms:Decrypt

Resource Permissions

  • AWS Glue

    • arn:aws:glue:<region>:<account-id>:catalog

    • arn:aws:glue:<region>:<account-id>:database/<iceberg-database>

    • arn:aws:glue:<region>:<account-id>:table/*/*

  • AWS S3

    • arn:aws:s3:::<iceberg-bucket-name>

    • arn:aws:s3:::<iceberg-bucket-name>/*

  • AWS Key Management Service

    • Note: Required only if table metadata is encrypted; otherwise optional.

    • arn:aws:kms:<region>:<account-id>:key/<key-id>
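
The action and resource lists above can be assembled into a single policy document programmatically. The following is a minimal sketch, not an official Decube tool: the function name build_decube_iceberg_policy and the example values (region, account ID, database, bucket) are hypothetical placeholders. It adds the KMS entries only when a key ID is supplied, mirroring the "required only if table metadata is encrypted" note.

```python
import json


def build_decube_iceberg_policy(region, account_id, database, bucket, kms_key_id=None):
    """Return an IAM policy document (as a dict) with the permissions listed above."""
    actions = ["glue:GetDatabases", "glue:GetTables", "glue:GetTable", "s3:GetObject"]
    resources = [
        f"arn:aws:glue:{region}:{account_id}:catalog",
        f"arn:aws:glue:{region}:{account_id}:database/{database}",
        f"arn:aws:glue:{region}:{account_id}:table/*/*",
        f"arn:aws:s3:::{bucket}",
        f"arn:aws:s3:::{bucket}/*",
    ]
    # KMS access is only added when a key ID is given (encrypted table metadata).
    if kms_key_id is not None:
        actions.append("kms:Decrypt")
        resources.append(f"arn:aws:kms:{region}:{account_id}:key/{kms_key_id}")
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": resources}],
    }


if __name__ == "__main__":
    # Hypothetical example values; replace with your own.
    policy = build_decube_iceberg_policy(
        "us-east-1", "123456789012", "analytics", "my-iceberg-bucket"
    )
    print(json.dumps(policy, indent=4))
```

Because the document is serialized with json.dumps, the output is valid JSON and can be pasted directly into the IAM JSON Editor.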

Configuring AWS Credentials using AWS IAM

  1. Log in to the AWS Console and navigate to IAM > Users > Create User.

  2. Click Attach Policies, then Create Policy. In the JSON Editor, enter the policy below and click Next. Then name the policy and click Create Policy.

Note: You can also split this into separate policies, e.g. one for Glue and another for S3. Make sure that every policy is attached to the IAM user in the next step.

Note: The kms:Decrypt action and the KMS key ARN are only required if table metadata is encrypted. If it is not, remove both lines, along with the trailing comma on the line before each.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabases",
                "glue:GetTables",
                "glue:GetTable",
                "s3:GetObject",
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:glue:<region>:<account-id>:catalog",
                "arn:aws:glue:<region>:<account-id>:database/<iceberg-database>",
                "arn:aws:glue:<region>:<account-id>:table/*/*",
                "arn:aws:s3:::<iceberg-bucket-name>",
                "arn:aws:s3:::<iceberg-bucket-name>/*",
                "arn:aws:kms:<region>:<account-id>:key/<key-id>"
            ]
        }
    ]
}

  3. After creating the policy, search for it, select it, and click Next.

  4. Review the details and click Create user.

  5. Once the user is created, generate an access key by selecting Create access key.

  6. Choose Application running outside AWS as the use case and continue to the next screen.

  7. Save the access key ID and secret access key. The secret access key cannot be retrieved again after this step.
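
For teams that prefer scripting over the console, the steps above can be approximated with the AWS SDK for Python (boto3). This is a hedged sketch rather than a documented Decube procedure: the function name create_decube_user, the file name decube-iceberg-policy.json, and the user and policy names are hypothetical. The function takes the IAM client as a parameter so it can be exercised without real AWS credentials.

```python
import json


def create_decube_user(iam, user_name, policy_name, policy_document):
    """Create an IAM user, attach a new managed policy, and return its access key pair.

    `iam` is expected to behave like a boto3 IAM client.
    """
    iam.create_user(UserName=user_name)
    policy = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )
    iam.attach_user_policy(UserName=user_name, PolicyArn=policy["Policy"]["Arn"])
    key = iam.create_access_key(UserName=user_name)["AccessKey"]
    # The secret access key is only returned by this call; store it securely.
    return key["AccessKeyId"], key["SecretAccessKey"]


if __name__ == "__main__":
    import boto3  # third-party dependency: pip install boto3

    # Hypothetical file containing the policy JSON from step 2.
    with open("decube-iceberg-policy.json") as fh:
        document = json.load(fh)

    access_key_id, secret_access_key = create_decube_user(
        boto3.client("iam"), "decube-iceberg", "DecubeIcebergReadOnly", document
    )
    print("Access key ID:", access_key_id)
```

The script must itself run with credentials that are allowed to create IAM users, policies, and access keys.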

Configuring the Decube platform

On the Decube connector page, navigate to Connect a new data source and select the Apache Iceberg connector.

For AWS Glue, you will need the following:

  • AWS Access Key ID

  • AWS Secret Access Key

  • AWS Region
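
Before entering these values in Decube, it can be useful to confirm that the credentials can actually read the Glue Catalog. The sketch below is an optional check that is independent of Decube, using boto3. It assumes that Iceberg tables registered in Glue carry a table_type parameter set to ICEBERG, which is the convention used by Iceberg's Glue catalog integration; verify this against your own tables. The function name list_iceberg_tables and the placeholder values are hypothetical.

```python
def list_iceberg_tables(glue, database):
    """Return names of tables in `database` whose Glue parameters mark them as Iceberg.

    `glue` is expected to behave like a boto3 Glue client.
    """
    names = []
    kwargs = {"DatabaseName": database}
    while True:
        page = glue.get_tables(**kwargs)
        for table in page.get("TableList", []):
            params = table.get("Parameters") or {}
            # Assumption: Iceberg tables are tagged with table_type=ICEBERG.
            if params.get("table_type", "").upper() == "ICEBERG":
                names.append(table["Name"])
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token  # fetch the next page of results


if __name__ == "__main__":
    import boto3  # third-party dependency: pip install boto3

    # Placeholders: substitute the values you will enter in Decube.
    session = boto3.Session(
        aws_access_key_id="<access-key-id>",
        aws_secret_access_key="<secret-access-key>",
        region_name="<region>",
    )
    print(list_iceberg_tables(session.client("glue"), "<iceberg-database>"))
```

An access-denied error here usually points to a missing action or resource in the IAM policy rather than a problem on the Decube side.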

Steps to connect Iceberg database:

  1. Click Test this connection to verify that Decube can connect to your Iceberg instance.

  2. Once the connection is successfully validated, additional configuration options will appear.

  3. Provide a name for the data source, then click Connect this data source to begin onboarding your metadata into Decube.
