LAB: Loading Data into a Redshift Cluster
Introduction
In this lab, you work as a Database Administrator managing your company's Redshift cluster. The Development team has requested a way to import data into the cluster from either an S3 bucket or a DynamoDB table, and has provided a small sample of the data to be imported.
Solution
Log in to the live AWS environment using the credentials provided. Use an incognito or private browser window to ensure you're using the lab account rather than your own.
Make sure you're in the N. Virginia (us-east-1) region throughout the lab.
Open a terminal session and log in to the provided EC2 instance via SSH using the credentials listed on the lab page:
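The SSH command follows the usual pattern; the public IP below is a placeholder, not the lab's actual value (use the address and credentials shown on the lab page):

```shell
# Placeholder IP -- substitute the public IP listed on the lab page.
ssh cloud_user@<EC2_PUBLIC_IP>
```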
Configure AWS CLI
In the AWS console, click the cloud_user username in the upper right corner.
Select My Security Credentials.
Under Access keys for CLI, SDK, & API access, click Create access key.
Save Access Key and Secret Access Key to a scratch pad or download the CSV file.
In the terminal, configure the AWS CLI client:
Paste the Access Key into the first prompt.
In the second prompt, paste the Secret Access Key.
For Default region name, enter us-east-1.
For Default output format, press Enter to leave it as None.
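The configuration steps above correspond to a single interactive command; the prompts it produces are shown as comments:

```shell
aws configure
# AWS Access Key ID [None]:     <paste your Access Key>
# AWS Secret Access Key [None]: <paste your Secret Access Key>
# Default region name [None]:   us-east-1
# Default output format [None]: <press Enter to leave it as None>
```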
Prepare the Source Data
Download CSV-formatted data:
Download JSON-formatted data:
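The actual download URLs are supplied on the lab page; as a sketch with placeholder URLs (the URLs and the JSON file name here are assumptions, not the real values):

```shell
# Placeholder URLs -- use the links provided on the lab page instead.
wget <CSV_DATA_URL> -O redshift-data.csv
wget <JSON_DATA_URL> -O redshift-data.json
```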
Create the S3 bucket and give it a globally unique name (e.g., with today's date and the current time appended):
Load the data into the bucket, replacing <BUCKET_NAME> with your bucket name:
Confirm the redshift-data.csv file is listed:
Create the DynamoDB table:
Confirm the DynamoDB table was created:
Import the JSON data into the DynamoDB table (NOTE: "UnprocessedItems": {} will appear in the output if successful.):
Confirm it was imported:
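The steps above can be sketched with the AWS CLI. The bucket name, DynamoDB table name, key schema, and JSON file name below are assumptions, not values confirmed by the lab:

```shell
# Create a uniquely named bucket (the name here is only an example).
aws s3 mb s3://redshift-import-$(date +%Y%m%d%H%M)

# Copy the CSV into the bucket, replacing <BUCKET_NAME> with your bucket name.
aws s3 cp redshift-data.csv s3://<BUCKET_NAME>/
aws s3 ls s3://<BUCKET_NAME>/          # confirm redshift-data.csv is listed

# Create the DynamoDB table (table name and key schema are assumptions).
aws dynamodb create-table \
  --table-name users \
  --attribute-definitions AttributeName=username,AttributeType=S \
  --key-schema AttributeName=username,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

aws dynamodb list-tables               # confirm the table was created

# Batch-write the JSON items (the file must be in batch-write-item
# request format); "UnprocessedItems": {} in the output indicates success.
aws dynamodb batch-write-item --request-items file://redshift-data.json

aws dynamodb scan --table-name users   # confirm the items were imported
```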
Create IAM Role
In the AWS Management Console, navigate to IAM.
In the side menu, click Roles.
Click Create role.
Scroll down and select EC2 as the service.
Click the first EC2 use case.
Click Next: Permissions.
On the Attach permissions policies page, in the Filter policies box, search for and select each of the following managed policies:
AmazonS3ReadOnlyAccess
AmazonDynamoDBReadOnlyAccess
Click Next.
For Role name, enter "redshift-import".
Click Create role.
On the IAM > Roles page, select the newly created redshift-import role.
On the redshift-import Summary page, click the Trust relationships tab.
Click Edit trust relationship.
Change the "Service" line to:
Click Update Policy.
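After the edit, the trust policy should allow Redshift, rather than EC2, to assume the role; the document looks like this (redshift.amazonaws.com replaces the original ec2.amazonaws.com):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "redshift.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```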
Load the Data into the Redshift Cluster
Navigate to Redshift and access the dashboard.
Select the listed cluster.
Click Actions > Manage IAM roles.
In Available IAM roles, open the dropdown menu and select redshift-import.
Click Associate IAM role.
Click Save changes.
In the terminal, list the Redshift clusters:
In the "Endpoint" section, copy the listed Address within the quotation marks.
Paste the endpoint name into the PGHOST environment variable:
List the IAM roles:
Copy the value of the "Arn" for the redshift-import role, and paste it into a text file.
List the S3 buckets:
Copy the bucket name and paste it into a text file, if you haven't already.
Echo $PGHOST:
Connect to the cluster:
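The terminal steps above can be sketched as follows; the cluster endpoint is a placeholder, and the database user and name are assumptions (5439 is Redshift's default port):

```shell
aws redshift describe-clusters             # find "Endpoint" -> "Address"

export PGHOST=<CLUSTER_ENDPOINT_ADDRESS>   # paste the Address value

aws iam list-roles                         # copy the "Arn" of redshift-import
aws s3 ls                                  # copy your bucket name

echo $PGHOST                               # verify the variable is set

# Connect with psql; user and database names here are assumptions.
psql -U awsuser -d dev -p 5439
```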
At the prompt, enter the password:
Create the table:
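The lab doesn't reproduce the column list here, so the schema below is purely illustrative; only the table name, users_import, is taken from later steps in the lab:

```sql
-- Illustrative schema only; match the columns to the sample data's fields.
CREATE TABLE users_import (
    username  VARCHAR(50),
    firstname VARCHAR(50),
    lastname  VARCHAR(50),
    email     VARCHAR(100)
);
```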
Import the data from the S3 bucket, replacing <BUCKET_NAME> with your bucket name and <IAM_ROLE_ARN> with the ARN you copied:
Query the table to verify rows were inserted:
Clear the users_import table:
Confirm table status:
Import the data from DynamoDB, replacing <IAM_ROLE_ARN> with the ARN you copied:
Query the table to ensure rows were inserted:
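The import and verification steps above map to these psql commands; the DynamoDB table name and the READRATIO value are assumptions (READRATIO is required when copying from DynamoDB):

```sql
-- Import from S3 (CSV format).
COPY users_import
FROM 's3://<BUCKET_NAME>/redshift-data.csv'
IAM_ROLE '<IAM_ROLE_ARN>'
CSV;

-- Verify rows were inserted.
SELECT COUNT(*) FROM users_import;

-- Clear the table, then confirm it is empty.
TRUNCATE users_import;
SELECT COUNT(*) FROM users_import;

-- Import from DynamoDB (table name is an assumption).
COPY users_import
FROM 'dynamodb://users'
IAM_ROLE '<IAM_ROLE_ARN>'
READRATIO 50;

-- Verify rows were inserted.
SELECT COUNT(*) FROM users_import;
```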