A GuardDuty finding tells you what happened: an instance communicated with a known command-and-control (C&C) address, or a credential was used from an unusual IP. CloudTrail analysis supplies the rest: who made the call, from where, against which resource, at what time, and with what result. Without the ability to query CloudTrail efficiently, every security incident becomes a slow manual log scan that delays containment.
Two query interfaces dominate CloudTrail analysis: Amazon Athena for organizations that want flexibility and control, and CloudTrail Lake for those that want managed infrastructure with faster time-to-query. This guide covers both approaches, the most valuable queries for security investigation, and the retention architecture that ensures your logs are available when you need them.
Athena Setup for CloudTrail Logs
Setting up Athena for CloudTrail requires creating a table in the AWS Glue Data Catalog that points to your CloudTrail S3 bucket. The CloudTrail console can generate the CREATE TABLE statement for you (under Event history → Create Athena table). Treat that generated table as a starting point: check whether it is partitioned and, if it is not, define partition projection on it so that Athena does not scan every log file. This is critical for cost control, since Athena charges per byte scanned.
Configure partition projection with the following fields: account ID, region, year, month, and day. With these in place, a query that specifies a date range causes Athena to scan only the matching partitions. A query scoped to a specific account, region, and date range typically scans gigabytes rather than terabytes, a difference of two to three orders of magnitude in both cost and query time.
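As a rough sketch of that configuration, the Python below renders a TBLPROPERTIES clause for partition projection. The partition column names (account, region, year, month, day), the bucket name, and the year range are assumptions chosen for illustration; the projection.* and storage.location.template keys are Athena's partition projection table properties.

```python
def projection_properties(bucket, account_ids, regions, first_year, last_year):
    """Build Athena partition projection properties for a CloudTrail table.

    Column names (account, region, year, month, day) are hypothetical and
    must match the PARTITIONED BY clause of your own table.
    """
    return {
        "projection.enabled": "true",
        "projection.account.type": "enum",
        "projection.account.values": ",".join(account_ids),
        "projection.region.type": "enum",
        "projection.region.values": ",".join(regions),
        "projection.year.type": "integer",
        "projection.year.range": f"{first_year},{last_year}",
        "projection.month.type": "integer",
        "projection.month.range": "1,12",
        "projection.month.digits": "2",  # zero-pad to match the S3 layout
        "projection.day.type": "integer",
        "projection.day.range": "1,31",
        "projection.day.digits": "2",
        # ${...} placeholders are filled in by Athena, not by Python.
        "storage.location.template": (
            f"s3://{bucket}/AWSLogs/${{account}}/CloudTrail/"
            "${region}/${year}/${month}/${day}"
        ),
    }


def render_tblproperties(props):
    """Render the properties as a TBLPROPERTIES clause for a CREATE TABLE."""
    lines = [f"  '{key}' = '{value}'" for key, value in props.items()]
    return "TBLPROPERTIES (\n" + ",\n".join(lines) + "\n)"


props = projection_properties(
    "my-trail-bucket", ["111122223333"], ["us-east-1", "eu-west-1"], 2023, 2030
)
print(render_tblproperties(props))
```

Enumerating accounts and regions explicitly keeps the projected partition space small; the cost is that a new account or region has to be added to the table properties before its logs become queryable.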
Store CloudTrail logs in a dedicated S3 bucket with a consistent prefix structure: AWSLogs/{accountId}/CloudTrail/{region}/{year}/{month}/{day}/. This structure aligns with the partition projection configuration and ensures queries can efficiently locate relevant files. Enable S3 Intelligent-Tiering on the bucket to automatically optimize storage costs for logs that transition from frequent to infrequent access.
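To spot-check that log files exist where a query expects them, it can help to generate the daily prefixes for an investigation window. The helper below is a minimal sketch; the function name is hypothetical and the account ID is a placeholder.

```python
from datetime import date, timedelta


def cloudtrail_prefixes(account_id: str, region: str, start: date, end: date) -> list[str]:
    """List the daily S3 key prefixes for one account and region (inclusive range)."""
    if start > end:
        raise ValueError("start must not be after end")
    prefixes = []
    current = start
    while current <= end:
        # Matches AWSLogs/{accountId}/CloudTrail/{region}/{year}/{month}/{day}/
        prefixes.append(
            f"AWSLogs/{account_id}/CloudTrail/{region}/{current:%Y/%m/%d}/"
        )
        current += timedelta(days=1)
    return prefixes


for prefix in cloudtrail_prefixes("111122223333", "us-east-1", date(2024, 2, 28), date(2024, 3, 1)):
    print(prefix)
```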
Essential Investigation Queries
The following queries cover the most common security investigation scenarios. All assume a CloudTrail table named cloudtrail_logs with standard partition fields.
All API calls by a specific principal in a time window: Filter by useridentity.arn (for IAM roles/users) or useridentity.accountid (for cross-account investigation) and scope to the investigation window. This is the starting query for any credential compromise investigation — establish the complete blast radius of what the compromised identity did.
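A minimal sketch of this query, built in Python so the date range expands into explicit partition predicates. It assumes the cloudtrail_logs table has string-typed partition columns named year, month, and day; the helper names are hypothetical.

```python
from datetime import date, timedelta


def sql_quote(value: str) -> str:
    """Quote a string literal for SQL by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def day_partitions(start: date, end: date) -> str:
    """Expand an inclusive date range into one predicate per daily partition."""
    if start > end:
        raise ValueError("start must not be after end")
    clauses = []
    current = start
    while current <= end:
        clauses.append(
            f"(year = '{current:%Y}' AND month = '{current:%m}' AND day = '{current:%d}')"
        )
        current += timedelta(days=1)
    return "(" + " OR ".join(clauses) + ")"


def calls_by_principal(arn: str, start: date, end: date) -> str:
    """Build the Athena query listing every call made by one principal."""
    return (
        "SELECT eventtime, eventsource, eventname, awsregion, sourceipaddress, errorcode\n"
        "FROM cloudtrail_logs\n"
        f"WHERE useridentity.arn = {sql_quote(arn)}\n"
        f"  AND {day_partitions(start, end)}\n"
        "ORDER BY eventtime"
    )


print(calls_by_principal("arn:aws:iam::111122223333:user/alice", date(2024, 1, 31), date(2024, 2, 1)))
```

For assumed roles, useridentity.arn holds the session ARN (arn:aws:sts::...:assumed-role/...), while the role itself appears in useridentity.sessioncontext.sessionissuer.arn, so a role-level investigation would filter on that field instead.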
All actions on a specific resource: Filter by requestparameters containing the resource ARN or ID. This is valuable for investigating unauthorized access to a specific S3 bucket, EC2 instance, or database. Combine with errorcode IS NULL to find only successful operations, or include error codes to see failed access attempts.
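One way to express this, sketched in Python for a single day of logs. It assumes requestparameters is stored as a JSON string (as in the standard Athena table definition) and that year, month, and day are string partition columns; strpos is used instead of LIKE so that characters such as _ in a resource ID are not treated as wildcards.

```python
from datetime import date


def actions_on_resource(resource_id: str, day: date, successful_only: bool = False) -> str:
    """Build an Athena query for calls whose request parameters mention a resource."""
    needle = "'" + resource_id.replace("'", "''") + "'"
    query = (
        "SELECT eventtime, eventname, useridentity.arn AS principal, "
        "sourceipaddress, errorcode\n"
        "FROM cloudtrail_logs\n"
        f"WHERE year = '{day:%Y}' AND month = '{day:%m}' AND day = '{day:%d}'\n"
        f"  AND strpos(requestparameters, {needle}) > 0\n"
    )
    if successful_only:
        # Successful calls carry no error code.
        query += "  AND errorcode IS NULL\n"
    return query + "ORDER BY eventtime"


print(actions_on_resource("my-sensitive-bucket", date(2024, 1, 15), successful_only=True))
```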
New IAM entities created in a time window: Filter for eventname IN ('CreateUser', 'CreateRole', 'CreateGroup') in the investigation period. Attackers who establish persistence typically create new IAM entities — this query finds those entries quickly.
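A sketch of the corresponding query builder follows. The event list is the one named above and is passed as a parameter so it can be extended; the partition column names are assumptions.

```python
from datetime import date

PERSISTENCE_EVENTS = ("CreateUser", "CreateRole", "CreateGroup")


def new_iam_entities(day: date, event_names=PERSISTENCE_EVENTS) -> str:
    """Build an Athena query for successful IAM entity creation on one day."""
    names = ", ".join(f"'{name}'" for name in event_names)
    return (
        "SELECT eventtime, eventname, useridentity.arn AS creator, "
        "sourceipaddress, requestparameters\n"
        "FROM cloudtrail_logs\n"
        f"WHERE year = '{day:%Y}' AND month = '{day:%m}' AND day = '{day:%d}'\n"
        "  AND eventsource = 'iam.amazonaws.com'\n"
        f"  AND eventname IN ({names})\n"
        "  AND errorcode IS NULL\n"
        "ORDER BY eventtime"
    )


print(new_iam_entities(date(2024, 1, 15)))
```

IAM is a global service whose events are recorded in us-east-1, so adding a filter on any other region partition would hide these rows.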
Cross-account activity from a specific source account: Filter useridentity.accountid = '{compromised_account_id}' and exclude events recorded in that same account (recipientaccountid <> '{compromised_account_id}'). This reveals which cross-account roles the compromised identity used and which resources it accessed in other accounts.
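Sketched in Python, the filter might look like the following. It assumes the table exposes the recipientaccountid column from the standard CloudTrail table definition and string partition columns year, month, and day; the function name is hypothetical.

```python
from datetime import date


def cross_account_activity(source_account: str, day: date) -> str:
    """Summarise calls made by one account's identities in other accounts."""
    if not (source_account.isdigit() and len(source_account) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return (
        "SELECT recipientaccountid, eventsource, eventname, count(*) AS calls\n"
        "FROM cloudtrail_logs\n"
        f"WHERE year = '{day:%Y}' AND month = '{day:%m}' AND day = '{day:%d}'\n"
        f"  AND useridentity.accountid = '{source_account}'\n"
        f"  AND recipientaccountid <> '{source_account}'\n"
        "GROUP BY recipientaccountid, eventsource, eventname\n"
        "ORDER BY calls DESC"
    )


print(cross_account_activity("111122223333", date(2024, 1, 15)))
```

These events are logged by the accounts that received the calls, so the query should not restrict the account partition to the compromised account.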
All console logins in a period: Filter eventname = 'ConsoleLogin' and examine source IPs, user agents, and the success or failure status recorded in responseelements. Brute-force attacks appear as clusters of failed ConsoleLogin events followed by a success.
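The failure-then-success pattern can be checked mechanically once the login rows are fetched. The sketch below is a simple heuristic, not a complete detector: it assumes each row is a dict with eventtime, sourceipaddress, and an outcome value of 'Success' or 'Failure' (extracted from responseelements), and the threshold of three failures is arbitrary.

```python
from collections import defaultdict

# Assumed query shape; responseelements is treated as a JSON string column.
CONSOLE_LOGIN_QUERY = (
    "SELECT eventtime, sourceipaddress, useragent, "
    "json_extract_scalar(responseelements, '$.ConsoleLogin') AS outcome\n"
    "FROM cloudtrail_logs\n"
    "WHERE eventname = 'ConsoleLogin'"
)


def failures_then_success(rows, min_failures=3):
    """Return {source_ip: failed_attempts} for IPs whose failure streak ended in a success."""
    streak = defaultdict(int)
    flagged = {}
    # ISO 8601 timestamps sort correctly as strings.
    for row in sorted(rows, key=lambda r: r["eventtime"]):
        ip = row["sourceipaddress"]
        if row["outcome"] == "Failure":
            streak[ip] += 1
        elif row["outcome"] == "Success":
            if streak[ip] >= min_failures:
                flagged[ip] = streak[ip]
            streak[ip] = 0
    return flagged


sample = [
    {"eventtime": "2024-01-15T10:00:00Z", "sourceipaddress": "203.0.113.7", "outcome": "Failure"},
    {"eventtime": "2024-01-15T10:00:05Z", "sourceipaddress": "203.0.113.7", "outcome": "Failure"},
    {"eventtime": "2024-01-15T10:00:09Z", "sourceipaddress": "203.0.113.7", "outcome": "Failure"},
    {"eventtime": "2024-01-15T10:00:30Z", "sourceipaddress": "203.0.113.7", "outcome": "Success"},
    {"eventtime": "2024-01-15T10:01:00Z", "sourceipaddress": "198.51.100.2", "outcome": "Success"},
]
print(failures_then_success(sample))
```

Grouping by source IP misses attacks spread across many addresses; grouping by the targeted user name is a reasonable variation.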
CloudTrail Lake
CloudTrail Lake stores CloudTrail events in a managed, queryable data store without requiring S3 delivery, Glue tables, or partition management. You query it using SQL through the CloudTrail Lake console or API. The tradeoff: CloudTrail Lake charges per GB of data scanned (queries) and per GB of data ingested, making it potentially more expensive than Athena for high-volume investigation scenarios but significantly faster to set up.
CloudTrail Lake event data stores can be configured to retain data for 7 years, which aligns with most compliance retention requirements. Lake query federation works in the opposite direction from S3 delivery: it registers an event data store in the AWS Glue Data Catalog so that you can query it from Athena. To query existing S3-delivered CloudTrail logs through the CloudTrail Lake interface, copy those trail events into an event data store.
For incident response teams that need to investigate quickly without infrastructure setup, creating an event data store in the CloudTrail console (under Lake) gives query access to your management events with no S3, Glue, or partition configuration. Create it before you need it: a new event data store collects events from the point of creation onward unless you copy earlier trail events into it.
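For comparison with the Athena queries, here is a sketch of the principal query in CloudTrail Lake SQL, where the event data store ID takes the place of the table name and field names use CloudTrail's camelCase. The ID and ARN are placeholders and the helper name is hypothetical.

```python
import uuid
from datetime import datetime


def lake_calls_by_principal(event_data_store_id: str, arn: str,
                            start: datetime, end: datetime) -> str:
    """Build a CloudTrail Lake query for one principal's calls in a time window."""
    # Event data store IDs are UUIDs; this raises ValueError for anything else.
    eds_id = str(uuid.UUID(event_data_store_id))
    principal = arn.replace("'", "''")
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        "SELECT eventTime, eventSource, eventName, sourceIPAddress, errorCode\n"
        f"FROM {eds_id}\n"
        f"WHERE userIdentity.arn = '{principal}'\n"
        f"  AND eventTime > '{start.strftime(fmt)}'\n"
        f"  AND eventTime < '{end.strftime(fmt)}'\n"
        "ORDER BY eventTime"
    )


print(lake_calls_by_principal(
    "a1b2c3d4-5678-90ab-cdef-111122223333",
    "arn:aws:iam::111122223333:user/alice",
    datetime(2024, 1, 14, 0, 0, 0),
    datetime(2024, 1, 16, 0, 0, 0),
))
```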
Forensic Investigation Workflow
When investigating a security incident, work through CloudTrail in a structured sequence. Start with the initial indicator (GuardDuty finding, billing anomaly, abuse report) to establish the investigation anchor. Query all API calls by the identified principal in the 48 hours before and after the indicator. From that call list, identify resource modifications — new IAM entities, modified security groups, new resource launches. For each modified resource, query its full history to determine its current state and any subsequent access. Finally, query for any new principals (users, roles) created during the investigation period that might represent attacker persistence mechanisms.
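The 48-hour window in the second step translates directly into the daily partitions a query needs to cover. A minimal sketch, with hypothetical function names:

```python
from datetime import datetime, timedelta, timezone


def investigation_window(indicator_time: datetime, hours: int = 48):
    """Return (start, end) spanning `hours` before and after the indicator."""
    margin = timedelta(hours=hours)
    return indicator_time - margin, indicator_time + margin


def partition_days(start: datetime, end: datetime):
    """List every calendar day touched by the window, for partition filters."""
    days = []
    day = start.date()
    while day <= end.date():
        days.append(day)
        day += timedelta(days=1)
    return days


indicator = datetime(2024, 3, 10, 14, 30, tzinfo=timezone.utc)
start, end = investigation_window(indicator)
print(start.isoformat(), end.isoformat())
print([d.isoformat() for d in partition_days(start, end)])
```

CloudTrail timestamps are in UTC, so the indicator time should be converted to UTC before the window is computed.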
Related Reading
- CloudTrail best practices — log configuration and retention architecture
- Cloud forensics with CloudTrail — post-incident investigation procedures
- AWS incident response plan — building runbooks for cloud security events
FAQ
How much does Athena query cost for CloudTrail investigation?
Athena charges $5 per TB of data scanned in most regions. With proper partition projection, a query scoped to one account, one region, and one week of data typically scans 100 MB to 10 GB depending on API call volume, costing $0.0005 to $0.05. Investigation queries across multiple accounts or time periods scan proportionally more data. CloudTrail already delivers log files gzip-compressed (the JSON compresses at roughly 10:1), and Athena bills on the compressed bytes it reads, so keep the logs compressed if you copy or reprocess them.
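The arithmetic behind those figures, as a small sketch. It assumes decimal units (1 TB = 10^12 bytes) to match the numbers above, the $5 per TB rate, and a 10 MB per-query billing minimum; check current pricing for your region before relying on it.

```python
def athena_query_cost(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Estimate the cost in USD of one Athena query from bytes scanned."""
    min_billable_bytes = 10 * 10**6  # assumed 10 MB minimum per query
    billable = max(bytes_scanned, min_billable_bytes)
    return billable / 10**12 * usd_per_tb


print(f"100 MB: ${athena_query_cost(100 * 10**6):.4f}")
print(f"10 GB:  ${athena_query_cost(10 * 10**9):.4f}")
```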
What fields are most important in CloudTrail log analysis?
The most investigation-relevant fields are: eventtime (when), eventname (what API call), useridentity (who — includes ARN, type, session context), sourceipaddress (from where), requestparameters (what resources were targeted), responseelements (what was created or returned), and errorcode/errormessage (success or failure). Focus investigation queries on these fields before expanding to others.
Can CloudTrail logs be deleted by an attacker?
Yes, if the S3 bucket lacks proper protections. An attacker with delete permissions on the CloudTrail bucket can remove log files. Protect against this by using S3 Object Lock in Compliance mode (which prevents deletion even by the root account within the retention period), placing the logging bucket in a separate dedicated logging account, and restricting write access to the CloudTrail service principal while granting investigators read-only access.
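As an illustration of the Object Lock setting, the sketch below builds the default-retention configuration in the shape the S3 API expects. The 365-day retention is a placeholder, not a recommendation, and the function name is hypothetical.

```python
def object_lock_configuration(retention_days: int) -> dict:
    """Build an S3 Object Lock configuration with Compliance-mode default retention."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {
                # COMPLIANCE cannot be shortened or removed once applied.
                "Mode": "COMPLIANCE",
                "Days": retention_days,
            }
        },
    }


config = object_lock_configuration(365)
print(config)
```

With boto3 this dictionary would be passed as the ObjectLockConfiguration argument of the S3 client's put_object_lock_configuration call; the bucket needs versioning enabled. Because Compliance mode cannot be undone within the retention period, test the retention length on a non-production bucket first.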
Written by Vigilare Engineering
Platform Team