Amazon Athena, Amazon S3, and VPC Flow Logs

In this hands-on project, you will use Amazon Athena, Amazon S3, and VPC Flow Logs to build a log analytics platform that you can search with SQL-like queries.

Successfully complete this lab by achieving the following learning objectives.

1. Create the Amazon S3 Bucket

Create a new S3 bucket that is prefixed with csaa-hol-
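
If you would rather script this step than use the console, the bucket could be created along these lines. This is a minimal sketch, not part of the lab: the random suffix, the `make_bucket_name` and `create_bucket` helper names, and the `us-east-1` default region are all assumptions. The boto3 call needs valid AWS credentials, so it is kept in a separate function that is not run automatically.

```python
import uuid

BUCKET_PREFIX = "csaa-hol-"  # prefix required by the lab


def make_bucket_name(suffix=None):
    """Return a bucket name that starts with the lab prefix.

    The random 12-character suffix is an assumption of this sketch;
    the lab only requires the prefix.
    """
    suffix = suffix or uuid.uuid4().hex[:12]
    name = f"{BUCKET_PREFIX}{suffix}".lower()
    # S3 bucket names must be between 3 and 63 characters long.
    if not 3 <= len(name) <= 63:
        raise ValueError(f"invalid bucket name length: {name!r}")
    return name


def create_bucket(name, region="us-east-1"):
    """Create the bucket with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so make_bucket_name works without it

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 does not accept an explicit LocationConstraint.
        return s3.create_bucket(Bucket=name)
    return s3.create_bucket(
        Bucket=name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )


if __name__ == "__main__":
    # Only prints a candidate name; call create_bucket(...) yourself.
    print(make_bucket_name())
```

Bucket names are globally unique, which is why a random suffix is appended here instead of a fixed one.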

2. Create the VPC Flow Log and Generate Records

Create a brand new VPC Flow Log for the entire VPC.
Name the flow logs vpc-to-s3
Set the filter to All
Set a 1-minute aggregation interval
Configure the flow log to be sent to your new Amazon S3 bucket
Use the AWS default format
Use the Parquet log file format
Enable Hive-compatible S3 prefixes
Partition logs every 1 hour
Browse to the DNS entry for OurApplicationLoadBalancer using HTTP and refresh a couple of times to generate traffic
Wait a few minutes and refresh the S3 objects list until you start seeing objects generated before you move on
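
For reference, the flow log settings listed above map onto the EC2 `create_flow_logs` API roughly as follows. This is a sketch under stated assumptions: the helper names, the placeholder VPC ID, and the placeholder bucket name are hypothetical, and the lab itself uses the console. The request is built as a plain dictionary so you can inspect it before sending anything to AWS.

```python
def build_flow_log_request(vpc_id, bucket_name):
    """Build keyword arguments for EC2 create_flow_logs using the lab settings."""
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",         # log the entire VPC
        "TrafficType": "ALL",          # filter: All
        "MaxAggregationInterval": 60,  # seconds, i.e. 1 minute
        "LogDestinationType": "s3",
        "LogDestination": f"arn:aws:s3:::{bucket_name}",
        # LogFormat is omitted so that the AWS default format is used.
        "DestinationOptions": {
            "FileFormat": "parquet",
            "HiveCompatiblePartitions": True,
            "PerHourPartition": True,
        },
        "TagSpecifications": [
            {
                "ResourceType": "vpc-flow-log",
                "Tags": [{"Key": "Name", "Value": "vpc-to-s3"}],
            }
        ],
    }


def create_flow_log(vpc_id, bucket_name, region="us-east-1"):
    """Send the request with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without it

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_flow_logs(**build_flow_log_request(vpc_id, bucket_name))


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute your own.
    print(build_flow_log_request("vpc-EXAMPLE", "csaa-hol-example"))
```

Keeping the request builder separate from the API call makes it easy to check each value against the list above before creating the flow log.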

3. Set up Amazon Athena

Change your Athena settings to save query results to your new Amazon S3 bucket, appending /queries/ to the end of the bucket name (example: s3://csaa-hol-213432inh32i4/queries/)
Create a new database called vpc_flow_logs_db within the AwsDataCatalog data source. SQL Query - Create Database
Within the new database, create a new table called vpc_flow_logs. SQL Query - Create Table
After the table is created successfully, update and repartition the table using MSCK. SQL Query - Repair Table and Update Partitions
Run a SQL query of your choosing!
If you receive a DDL error when attempting to run the MSCK command, follow the Fixing Errors steps and then repeat the Athena setup steps.
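
The Athena steps above could look roughly like the sketch below. These statements are not the lab's linked queries: the column list and types assume the AWS default flow log format, the partition columns assume Hive-compatible prefixes with hourly partitions, and the sample query, helper names, and default region are illustrative choices. Check them against the lab's own SQL before relying on them.

```python
DATABASE = "vpc_flow_logs_db"
TABLE = "vpc_flow_logs"


def build_statements(bucket_name):
    """Return the Athena statements for the lab, keyed by step."""
    # Flow logs are delivered under the AWSLogs/ prefix of the bucket.
    location = f"s3://{bucket_name}/AWSLogs/"
    create_table = f"""CREATE EXTERNAL TABLE IF NOT EXISTS {TABLE} (
  version int,
  account_id string,
  interface_id string,
  srcaddr string,
  dstaddr string,
  srcport int,
  dstport int,
  protocol bigint,
  packets bigint,
  bytes bigint,
  start bigint,
  `end` bigint,
  action string,
  log_status string
)
PARTITIONED BY (
  `aws-account-id` string,
  `aws-service` string,
  `aws-region` string,
  `year` string,
  `month` string,
  `day` string,
  `hour` string
)
STORED AS PARQUET
LOCATION '{location}'"""
    sample_query = f"""SELECT srcaddr, dstaddr, dstport, action, SUM(bytes) AS total_bytes
FROM {TABLE}
GROUP BY srcaddr, dstaddr, dstport, action
ORDER BY total_bytes DESC
LIMIT 10"""
    return {
        "create_database": f"CREATE DATABASE IF NOT EXISTS {DATABASE}",
        "create_table": create_table,
        "repair_table": f"MSCK REPAIR TABLE {TABLE}",
        "sample_query": sample_query,
    }


def run_query(sql, bucket_name, database=None, region="us-east-1"):
    """Start an Athena query with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so build_statements works without it

    athena = boto3.client("athena", region_name=region)
    kwargs = {
        "QueryString": sql,
        # Results go to the /queries/ prefix, as configured in the lab.
        "ResultConfiguration": {"OutputLocation": f"s3://{bucket_name}/queries/"},
    }
    if database:
        kwargs["QueryExecutionContext"] = {"Database": database}
    return athena.start_query_execution(**kwargs)["QueryExecutionId"]


if __name__ == "__main__":
    # Placeholder bucket name for illustration only.
    for step, sql in build_statements("csaa-hol-example").items():
        print(f"-- {step}\n{sql};\n")
```

You can paste the printed statements into the Athena query editor one at a time, or pass each to `run_query` with `database=DATABASE` for every step after the database has been created. The MSCK step is what registers the hourly partitions, so rerun it if new data does not show up in query results.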
