Amazon Athena, Amazon S3, and VPC Flow Logs
In this hands-on project, you will use Amazon Athena, Amazon S3, and VPC Flow Logs to deploy an analytics platform that you can search with SQL-like queries.
Successfully complete this lab by achieving the following learning objectives.
1. Create the Amazon S3 Bucket
- Create a new S3 bucket whose name is prefixed with csaa-hol-
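Because S3 bucket names must be globally unique, the csaa-hol- prefix needs a unique suffix. The following is a minimal sketch of one way to generate and sanity-check a candidate name. The helper names and the random hex suffix are assumptions for illustration, not part of the lab, and the check covers only a subset of the S3 naming rules.

```python
import re
import secrets

PREFIX = "csaa-hol-"  # prefix required by the lab


def make_bucket_name(prefix: str = PREFIX, suffix_len: int = 12) -> str:
    """Return the prefix plus a random lowercase hex suffix (hypothetical helper)."""
    # token_hex(n) returns 2*n hex characters, all valid in bucket names.
    suffix = secrets.token_hex(suffix_len // 2)
    return f"{prefix}{suffix}"


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 bucket naming rules:
    3-63 characters; lowercase letters, digits, dots, and hyphens only;
    must begin and end with a letter or digit."""
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name) is not None


name = make_bucket_name()
print(name, is_valid_bucket_name(name))
```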
2. Create the VPC Flow Log and Generate Records
- Create a brand new VPC Flow Log for the entire VPC. 
- Name the flow log vpc-to-s3
- Set the filter to All
- Set a 1 minute aggregation interval
- Configure the flow log to be sent to your new Amazon S3 bucket 
- Use the AWS default format 
- Use the Parquet log file format
- Enable Hive-compatible S3 prefixes
- Partition logs every 1 hour
- Browse to the DNS entry for OurApplicationLoadBalancer using HTTP and refresh a couple of times to generate traffic
- Wait a few minutes and refresh the S3 objects list until you start seeing objects generated before you move on 
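The lab expects you to configure the flow log in the console, but the same settings can be written down as request parameters. The sketch below builds a parameter dictionary shaped for the EC2 CreateFlowLogs API as exposed by boto3's create_flow_logs. The function name, VPC ID, and bucket name are placeholders assumed for illustration. Omitting LogFormat leaves the AWS default format in place.

```python
def flow_log_request(vpc_id: str, bucket_name: str) -> dict:
    """Build CreateFlowLogs parameters matching the lab settings (hypothetical helper)."""
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",              # flow log for the entire VPC
        "TrafficType": "ALL",               # filter: All
        "LogDestinationType": "s3",
        "LogDestination": f"arn:aws:s3:::{bucket_name}",
        "MaxAggregationInterval": 60,       # seconds, i.e. 1 minute
        "DestinationOptions": {
            "FileFormat": "parquet",
            "HiveCompatiblePartitions": True,
            "PerHourPartition": True,       # partition logs every 1 hour
        },
        "TagSpecifications": [
            {
                "ResourceType": "vpc-flow-log",
                "Tags": [{"Key": "Name", "Value": "vpc-to-s3"}],
            }
        ],
    }


# Placeholder identifiers, assumed for illustration only.
params = flow_log_request("vpc-0123456789abcdef0", "csaa-hol-213432inh32i4")
print(params["LogDestination"])
```

With boto3 installed and credentials configured, these parameters could be passed as boto3.client("ec2").create_flow_logs(**params). That call is not made here, so the sketch runs without AWS access.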
3. Set up Amazon Athena
- Change your Athena settings to save query results to your new Amazon S3 bucket and append /queries/ to the end of the bucket name (Example: s3://csaa-hol-213432inh32i4/queries/)
- Create a new database called vpc_flow_logs_db within the AwsDataCatalog data source. SQL Query - Create Database
- Within the new database, create a new table called vpc_flow_logs. SQL Query - Create Table
- After the table is created successfully, update and repartition the table using MSCK REPAIR TABLE. SQL Query - Repair Table and Update Partitions
- Run a SQL query of your choosing! 
- If you receive a DDL error when attempting to run the MSCK command, execute the Fixing Errors steps and restart the Athena process. 
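The Athena steps above can be sketched as the statements below, assembled in Python so the bucket name is substituted consistently. This is an illustrative sketch, not the lab's linked queries: the column list assumes the AWS default flow log format, the partition columns assume Hive-compatible prefixes with hourly partitions, and the function names and sample query are assumptions. Run the statements with vpc_flow_logs_db selected in the Athena query editor.

```python
DATABASE = "vpc_flow_logs_db"
TABLE = "vpc_flow_logs"


def results_location(bucket: str) -> str:
    """Query result location: the bucket name with /queries/ appended."""
    return f"s3://{bucket}/queries/"


def athena_statements(bucket: str) -> dict:
    """Return the lab's Athena statements as strings (hypothetical helper)."""
    create_table = f"""CREATE EXTERNAL TABLE IF NOT EXISTS {TABLE} (
  version int, account_id string, interface_id string,
  srcaddr string, dstaddr string, srcport int, dstport int,
  protocol bigint, packets bigint, bytes bigint,
  start bigint, `end` bigint, action string, log_status string
)
PARTITIONED BY (
  `aws-account-id` string, `aws-service` string, `aws-region` string,
  `year` string, `month` string, `day` string, `hour` string
)
STORED AS PARQUET
LOCATION 's3://{bucket}/AWSLogs/'"""
    return {
        "create_database": f"CREATE DATABASE IF NOT EXISTS {DATABASE}",
        "create_table": create_table,
        # Loads the Hive-style partitions found under LOCATION.
        "repair": f"MSCK REPAIR TABLE {TABLE}",
        # Example query: the ten busiest source/destination pairs by action.
        "sample": (
            f"SELECT srcaddr, dstaddr, action, COUNT(*) AS flows FROM {TABLE} "
            "GROUP BY srcaddr, dstaddr, action ORDER BY flows DESC LIMIT 10"
        ),
    }


bucket = "csaa-hol-213432inh32i4"  # example bucket name from the lab text
print(results_location(bucket))
for label, sql in athena_statements(bucket).items():
    print(f"-- {label}\n{sql};\n")
```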