# Redshift

## Supported Capabilities

{% tabs %}
{% tab title="Supported Capabilities" %}
**General**

* **Metadata** — metadata extraction and display of asset information (tables, columns, schemas). Types collected: Schema, Table, Column, View, Data Job, Data Run, Data Task
* **Sync Objects Descriptions** — syncs object descriptions from Redshift to the Catalog
* **Profiling** — data profiling on the Profiler tab
* **Preview** — sample data preview
* **Data Quality** — data quality monitoring and observability
* **Configurable Collection** — selective ingestion of schemas/workspaces in Data Source Management
* **View Table** — view tables, which are virtual tables based on SQL queries
* **Stored Procedure** — stored procedures (precompiled SQL; listed as Data Jobs in Metadata)

**Data Quality Monitors**

* Freshness
* Volume
* Field Health
* Custom SQL
* Schema Drift
* Job Failure

**Lineage**

* **View Table Lineage** — tracks virtual tables (views) and their data dependencies
* **SQL Query Lineage** — maps data movement through SQL queries (SELECT, JOIN, INSERT, etc.)
* **Foreign Key Lineage** — tracks relationships between tables via primary and foreign keys
* **Stored Procedure Lineage** — tracks data flow through stored procedures as they execute
  {% endtab %}

{% tab title="Not Supported" %}
**General**

* External Table

**Lineage**

* External Table Lineage
  {% endtab %}
  {% endtabs %}

## Connection Requirements

### Allowing Access

To allow our connector to access your Redshift instance, you will need to either:

1. Allow public access
2. Connect through a SSH Bastion

### Allowing Public Access

You can still limit who can connect to your Redshift instance through security-group inbound rules when you enable public access.

Go to Actions > Modify publicly accessible setting

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-74ac39461a6abc28ace672b2756de87c586593cf%2Fimage.png?alt=media" alt=""><figcaption><p>Ref 1.1</p></figcaption></figure>

Check `Turn on Publicly accessible` and select an Elastic IP address

<div align="center"><figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-27682223570c75bfca545bd1a06a87bf664ae575%2Fimage.png?alt=media" alt=""><figcaption><p>Ref 1.2</p></figcaption></figure></div>

Navigate to the `Properties` tab

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-cda73bb94e645f15e486a537d7ee97247dc66b65%2Fimage.png?alt=media" alt=""><figcaption><p>Ref 1.3</p></figcaption></figure>

Scroll down to the `Network and security settings` and click through to your security group

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-15a6b4917b6a0663fb1d9ce2babd2564f3c54f2b%2Fimage.png?alt=media" alt=""><figcaption><p>Ref 1.4</p></figcaption></figure>

Navigate to the `Inbound rules` tab and click `Edit inbound rules`

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-f4c62269fcf2703a49fbbbcf4bc9a3dabe1a2eb8%2Fimage.png?alt=media" alt=""><figcaption><p>Ref 1.5</p></figcaption></figure>

Click `Add rule` and in `Type` choose Redshift and in the `Source` section, add all of Decube's collector IPs.

{% hint style="info" %}
You will need to post-fix the IP with /32 to limit it to only that IP. I.e. `xxx.xxx.xxx.xxx/32`
{% endhint %}

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-34533aa20cab8d3d54f319befb2bbbd53189a563%2Fimage.png?alt=media" alt=""><figcaption><p>Ref 1.5</p></figcaption></figure>

{% hint style="warning" %}
Be careful with modifying inbound rule policies. It can affect connectivity within your own VPC if you remove existing rules.
{% endhint %}

### SSH Bastion

You can also use a SSH Bastion if enabling public access is not an option. Setting up a Bastion host is out of the scope of this guide but you can refer to [SSH Tunneling](https://docs.decube.io/how-to-connect-data-sources/enabling-vpc-access/ssh-tunneling) guide for more information.

Once you have setup a Bastion host, modify your Redshift security group inbound rule (refer to Ref 1.5) to allow source connection from your Bastion host's private IP address instead.

### Connection Details

Connecting to decube is as easy as providing us with credentials to your Redshift database. At a minimum, we require

* `username`
* `password`
* `host address`
* `host port`
* `database name`

<figure><img src="https://1779874722-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTw0qpCVzfrIXqS4FEg4T%2Fuploads%2Fgit-blob-67f1db079271c4264c3eea52ee6ca361f42e1420%2Fimage.png?alt=media" alt=""><figcaption><p>Amazon Redshift</p></figcaption></figure>

The `source name` will be for you to differentiate and recognize particular sources within the decube application.

We strongly encourage you to create a decube **read-only** user for this credential purpose, which you can follow [these steps](#custom-user-for-decube).

### **Security Concerns**

### Custom User for decube

We highly recommend that you create a Read-Only user for decube. We have prepared a script that you may run on your Redshift database to create this user.

1. Create a New User for decube

```
CREATE USER decube PASSWORD 'a_new_password';
```

2\. Add access to SYSLOG to build lineage and ingest Stored Procedures.

```
ALTER USER decube WITH SYSLOG ACCESS UNRESTRICTED;
```

3. Add access to information\_schema.

```
GRANT USAGE ON SCHEMA information_schema TO decube;
GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO decube;
```

4\. You may need to run this per schema that you have based on the default behavior of the schema.

```
GRANT USAGE ON SCHEMA <schema_name> TO decube;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO decube;
```
