What AWS services does Vigilare monitor?

Vigilare monitors 7+ AWS services: Billing (cost anomalies and usage spikes), IAM (policy drift, root account usage, missing MFA), GuardDuty (threat findings), SES (sending reputation), CloudTrail (suspicious API patterns), Service Quotas (limit utilisation), and Account Health (AWS notifications). Coverage is continuously expanded.

How does Vigilare access my AWS account?

Vigilare uses a read-only cross-account IAM role that you provision using our open-source Terraform module. The role grants only the minimum permissions required for monitoring — no write access, no credential storage, and no data leaves your account except what is needed to display findings in your dashboard.

How quickly will I be alerted?

Depending on your plan, Vigilare scans every 1, 5, or 15 minutes. Alerts are delivered in real time via email, Slack, or PagerDuty the moment a threshold is crossed. The typical end-to-end latency from event to alert is under 90 seconds.

Can I monitor multiple AWS accounts?

Yes. The Solo plan covers 1 account, Team supports up to 10, Agency up to 50, and Enterprise is unlimited. All accounts appear in a single dashboard with per-account risk scores and consolidated alert feeds.

Is my AWS data secure?

All data is encrypted at rest with AES-256 and in transit with TLS 1.2+. Vigilare's control plane runs on AWS infrastructure in dedicated, isolated tenants. We never store raw AWS credentials. The platform is SOC 2 Type II compliant and we provide reports on request for Enterprise customers.

Is there a free trial?

Yes — all paid plans include a 14-day free trial. No credit card is required to get started. You can connect your first AWS account and see findings within 5 minutes of signing up.

What happens if I exceed my account limit?

We'll notify you as you approach your account limit and give you the option to upgrade. We won't stop monitoring your existing accounts without notice.

Can I cancel at any time?

Yes. There are no long-term contracts. You can cancel your subscription at any time from the billing settings page. Your access continues until the end of the current billing period.

AWS Serverless Monitoring: Observability for Lambda-Centric Architectures

Lambda architectures replace persistent server processes with thousands of short-lived function invocations. This model creates observability challenges that aren't present in traditional server-based deployments: a single "request" might trigger dozens of function invocations across multiple services; errors in one function propagate in ways that are hard to trace without distributed tracing; cold starts affect latency in ways that don't appear in average metrics.

Effective serverless monitoring requires instrumenting at multiple layers: Lambda-native metrics for function health, distributed tracing for end-to-end visibility, and application-level metrics emitted from function code for business logic observability.

Lambda Native CloudWatch Metrics

Lambda automatically publishes metrics to CloudWatch under the AWS/Lambda namespace without any configuration. The key metrics to monitor:

Invocations: Total count of function calls. Use this for traffic baselines and to detect unexpected usage spikes or drops. A sudden drop to zero invocations for a function that's normally busy is as notable as a spike: it could indicate that an upstream trigger stopped working.

Errors: Count of invocations that resulted in function execution errors (exceptions, memory exceeded, timeout). This doesn't include throttles (separate metric) or initialization errors. Monitor error rate as a percentage of invocations rather than absolute count, and set alarms at 1-5% error rate for production functions.
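As a rough sketch of the rate-based alarm described above, the helper below builds the keyword arguments for CloudWatch's put_metric_alarm call using metric math over the Errors and Invocations metrics. The function name error_rate_alarm, the alarm naming scheme, the 5-minute period, and the three evaluation periods are illustrative assumptions, not recommendations from AWS.

```python
def error_rate_alarm(function_name: str, threshold_pct: float = 1.0) -> dict:
    """Build put_metric_alarm keyword arguments for a Lambda error-rate alarm.

    Hypothetical helper: names and periods are assumptions for illustration.
    """

    def stat(metric_name: str) -> dict:
        # Sum of one AWS/Lambda metric per 5-minute period for this function.
        return {
            "Metric": {
                "Namespace": "AWS/Lambda",
                "MetricName": metric_name,
                "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
            },
            "Period": 300,
            "Stat": "Sum",
        }

    return {
        "AlarmName": f"{function_name}-error-rate",
        "Metrics": [
            {"Id": "errors", "MetricStat": stat("Errors"), "ReturnData": False},
            {"Id": "invocations", "MetricStat": stat("Invocations"), "ReturnData": False},
            {
                # Metric math: errors as a percentage of invocations.
                "Id": "error_rate",
                "Expression": "100 * errors / invocations",
                "Label": "Error rate (%)",
                "ReturnData": True,
            },
        ],
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_pct,
        "EvaluationPeriods": 3,
        # Periods without traffic produce no rate; do not treat them as failures.
        "TreatMissingData": "notBreaching",
    }
```

With boto3 installed and credentials configured, the result could be passed along as boto3.client("cloudwatch").put_metric_alarm(**error_rate_alarm("checkout-handler", 2.0)), where "checkout-handler" is a made-up function name.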

Throttles: Count of invocations rejected because the concurrency limit was reached. Any non-zero throttle count on a production function warrants investigation — it means requests were rejected, not just delayed. Create an alarm for throttles > 0 on business-critical functions.

Duration: Function execution time in milliseconds. View it with percentile statistics (p50, p95, p99) rather than the average alone: p95 and p99 matter more than the average for understanding the tail latency experience. Invocations that time out are counted as errors and leave a "Task timed out" message in CloudWatch Logs. If p99 duration is approaching the configured timeout, increase the timeout or optimize the function.
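To illustrate why percentiles matter more than the average, here is a small sketch that computes nearest-rank percentiles over a list of durations and compares p99 with the configured timeout. The sample data (97 fast invocations, 3 slow ones) and the 3000 ms timeout are invented for the example; in practice CloudWatch computes these statistics for you.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def duration_report(durations_ms, timeout_ms):
    """Summarize durations and show how close p99 is to the function timeout."""
    p99 = percentile(durations_ms, 99)
    return {
        "avg_ms": sum(durations_ms) / len(durations_ms),
        "p50_ms": percentile(durations_ms, 50),
        "p95_ms": percentile(durations_ms, 95),
        "p99_ms": p99,
        "p99_share_of_timeout": p99 / timeout_ms,
    }


# Invented sample: 97 invocations at 120 ms and 3 at 2800 ms, with a 3000 ms timeout.
durations = [120.0] * 97 + [2800.0] * 3
report = duration_report(durations, timeout_ms=3000)
print(report)
```

In this sample the average is about 200 ms and p95 is still 120 ms, while p99 is 2800 ms, more than 90% of the timeout. An alarm on the average alone would miss how close the slowest invocations are to timing out.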

ConcurrentExecutions: Simultaneous in-progress invocations. Compare to your reserved and account concurrency limits. Sustained high concurrency approaching limits suggests you need to request a quota increase or optimize function performance to reduce duration.

AWS X-Ray for Distributed Tracing

X-Ray provides distributed tracing across Lambda functions and AWS services. When enabled, X-Ray traces each request through all participating components — API Gateway, Lambda, DynamoDB, SQS, external HTTP calls — and visualizes the trace as a service map and timeline. This makes it immediately visible which component is responsible for latency or which downstream service call is failing.

Enable active tracing in the Lambda function configuration (the execution role needs the xray:PutTraceSegments and xray:PutTelemetryRecords permissions). Instrument your function code with the X-Ray SDK to create custom subsegments for database calls, external HTTP calls, and other operations you want to trace separately. Without custom subsegments, X-Ray traces only show the function invocation itself; with instrumentation, you see the breakdown of time spent in each operation within the function.

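X-Ray sampling controls cost. The default sampling rule records the first request each second plus 5% of any additional requests, so low-traffic functions are traced almost completely while busy functions are sampled at roughly 5%. For latency-sensitive functions, create custom sampling rules with a higher rate for the relevant services or URL paths. Sampling decisions are made when a request starts, so a rule cannot select requests by their eventual duration; to study slow requests (for example, duration > 500ms), sample the critical paths at a high rate and filter the recorded traces by duration afterwards.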
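The "first request per second plus 5% of additional requests" rule can be modeled in a few lines. The sketch below is a simplified stand-in written for illustration, not the X-Ray SDK's implementation; the class name ReservoirSampler and the helper expected_sampled are hypothetical.

```python
import random


class ReservoirSampler:
    """Simplified model of reservoir-plus-fixed-rate sampling (not the X-Ray SDK)."""

    def __init__(self, reservoir_per_second=1, fixed_rate=0.05, rng=None):
        self.reservoir_per_second = reservoir_per_second
        self.fixed_rate = fixed_rate
        self.rng = rng or random.Random()
        self._current_second = None
        self._used = 0

    def should_sample(self, now):
        """Decide at request start whether to trace a request arriving at time `now` (seconds)."""
        second = int(now)
        if second != self._current_second:
            # A new second begins: refill the reservoir.
            self._current_second = second
            self._used = 0
        if self._used < self.reservoir_per_second:
            self._used += 1
            return True
        # Reservoir exhausted: sample a fixed fraction of the remaining requests.
        return self.rng.random() < self.fixed_rate


def expected_sampled(requests_per_second, reservoir=1, rate=0.05):
    """Expected number of traced requests per second at a steady request rate."""
    extra = max(0, requests_per_second - reservoir)
    return min(requests_per_second, reservoir) + extra * rate


for rps in (1, 100, 1000):
    print(rps, "req/s ->", expected_sampled(rps), "traces/s expected")
```

Under this model a function receiving one request per second is traced on every request, while at 100 requests per second only about six traces per second are expected, which is why effective coverage depends heavily on traffic volume.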

Structured Logging for Application Observability

Lambda function stdout and stderr go to CloudWatch Logs. Unstructured log output (free-text log lines) is hard to query and correlate across invocations. Structured logging (JSON-formatted log output with consistent fields) enables CloudWatch Logs Insights queries that surface application-level metrics.

Emit a structured log line at the end of each invocation with: request ID, function duration, outcome (success/failure), any relevant business metrics (orders processed, records updated), and error details if failed. CloudWatch Logs Insights can then query these logs to calculate success rates, throughput, and error patterns without requiring custom metric instrumentation for every dimension you care about.
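A minimal sketch of such an end-of-invocation log line, using only the standard library, might look like the following. The field names (request_id, duration_ms, outcome, orders_processed) and the order-counting logic are assumptions chosen to mirror the list above, not a required schema.

```python
import json
import time


def log_invocation(request_id, started, outcome, metrics=None, error=None):
    """Print one JSON log line summarizing the invocation and return it."""
    record = {
        "request_id": request_id,
        "duration_ms": round((time.perf_counter() - started) * 1000, 2),
        "outcome": outcome,
    }
    record.update(metrics or {})  # business metrics, e.g. orders processed
    if error is not None:
        record["error_type"] = type(error).__name__
        record["error_message"] = str(error)
    line = json.dumps(record)
    print(line)  # stdout is delivered to CloudWatch Logs
    return line


def handler(event, context):
    started = time.perf_counter()
    request_id = getattr(context, "aws_request_id", None)
    try:
        # Hypothetical business logic: count the orders carried by the event.
        processed = len(event["orders"])
    except Exception as exc:
        log_invocation(request_id, started, "failure", error=exc)
        raise
    log_invocation(request_id, started, "success", {"orders_processed": processed})
    return {"orders_processed": processed}
```

Because every line is JSON with the same fields, a Logs Insights query along the lines of filter outcome = "failure" | stats count(*) by bin(5m) can chart failures over time without any custom metrics.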

Lambda Powertools (available for Python, TypeScript, Java, .NET) provides a structured logger, metrics, and tracer that implement AWS best practices for Lambda observability. It significantly reduces the boilerplate for implementing structured logging and custom metrics. The structured logging module automatically includes function name, cold start indicator, and request ID in every log line.

Cold Start Detection and Management

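Cold starts occur when Lambda needs to initialize a new execution environment: downloading code, initializing the runtime, and running initialization code outside the handler function. Cold starts typically add anywhere from about 100 ms to several seconds of latency on the affected invocation, depending on runtime (JVM-based runtimes tend to be slowest; Python and Node.js are usually faster), code size, and the amount of initialization work. VPC attachment used to add a significant delay because network interfaces (ENIs) were set up at invocation time, but since AWS changed Lambda's VPC networking in 2019 to create them when the function is configured, that penalty is largely gone.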

Lambda Powertools flags in logs and metrics whether each invocation was a cold start. Track cold start rate as a percentage of invocations and monitor cold start duration separately from warm invocation duration. If your p99 latency is dominated by cold starts, consider provisioned concurrency for latency-sensitive functions, code optimization to reduce initialization time, or architecture changes that reduce invocation frequency (batching events before invoking Lambda).
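If you are not using Powertools, the same cold start flag can be approximated by hand, because module-level code runs once per execution environment. The sketch below assumes that behavior; the names _state, handler, and cold_start_rate are illustrative.

```python
import json

# Module scope runs once per execution environment, so this flag is True
# only for the first invocation handled by a freshly initialized environment.
_state = {"cold_start": True}


def handler(event, context):
    is_cold = _state["cold_start"]
    _state["cold_start"] = False
    print(json.dumps({
        "request_id": getattr(context, "aws_request_id", None),
        "cold_start": is_cold,
    }))
    return {"cold_start": is_cold}


def cold_start_rate(records):
    """Percentage of parsed log records that were cold starts."""
    if not records:
        return 0.0
    cold = sum(1 for record in records if record.get("cold_start"))
    return 100.0 * cold / len(records)
```

Feeding the parsed log records into cold_start_rate (or running the equivalent aggregation in a log query) gives the cold start percentage to track alongside p99 latency.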

Alerting for Serverless Applications

Serverless alerting needs to account for the invocation-based execution model. Configure alarms on:

Error rate above threshold (not absolute error count — rate accounts for traffic variability)
Throttle count > 0 for critical functions
P99 duration above threshold for user-facing functions
Invocation count anomaly (unusual spikes or unexpected zero counts)
Dead letter queue depth > 0 (failed async invocations reaching the DLQ)

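The dead letter queue alarm is particularly important for event-driven architectures. Asynchronous invocations (for example from SNS, EventBridge, or S3) that still fail after Lambda's retries are sent to the function's DLQ, if one is configured, and need investigation. SQS-triggered functions work differently: messages that repeatedly fail processing move to the dead letter queue defined in the source queue's redrive policy. In both cases, a growing DLQ depth is an early signal of systematic failures that might not generate obvious user-facing errors.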
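Three of the alarms listed above (throttles, unexpected zero invocations, and DLQ depth) can be sketched as put_metric_alarm keyword arguments. The helper names, alarm names, and periods below are assumptions for illustration. The zero-invocation alarm treats missing data as breaching on the assumption that Lambda publishes no Invocations datapoint for periods in which the function was not called.

```python
def alarm(name, namespace, metric, dim_name, dim_value, *, stat, operator,
          threshold, period=300, missing="notBreaching"):
    """Keyword arguments for a single-metric CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": dim_name, "Value": dim_value}],
        "Statistic": stat,
        "Period": period,
        "EvaluationPeriods": 1,
        "ComparisonOperator": operator,
        "Threshold": threshold,
        "TreatMissingData": missing,
    }


def serverless_alarms(function_name, dlq_name):
    """Hypothetical alarm set for one critical function and its dead letter queue."""
    return [
        # Any throttle on a critical function is worth a notification.
        alarm(f"{function_name}-throttles", "AWS/Lambda", "Throttles",
              "FunctionName", function_name,
              stat="Sum", operator="GreaterThanThreshold", threshold=0, period=60),
        # No invocations in the period: a missing datapoint counts as breaching.
        alarm(f"{function_name}-no-invocations", "AWS/Lambda", "Invocations",
              "FunctionName", function_name,
              stat="Sum", operator="LessThanThreshold", threshold=1,
              missing="breaching"),
        # Any visible message in the DLQ means an event failed processing.
        alarm(f"{dlq_name}-depth", "AWS/SQS", "ApproximateNumberOfMessagesVisible",
              "QueueName", dlq_name,
              stat="Maximum", operator="GreaterThanThreshold", threshold=0),
    ]
```

Each dictionary could be passed to boto3.client("cloudwatch").put_metric_alarm(**params) once an AlarmActions entry (for example an SNS topic ARN of your own) has been added so that the alarm notifies someone.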

FAQ

How do I correlate logs across multiple Lambda functions in a request chain?

Propagate a correlation ID through all function invocations in a request chain. The first function in the chain creates or receives a request ID; each subsequent function receives it in the event payload or HTTP headers and includes it in its own log output. CloudWatch Logs Insights can then query for all log lines with the same correlation ID to reconstruct the full request journey. X-Ray trace IDs serve the same purpose for distributed traces.
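One way to sketch that propagation is shown below. The payload key correlation_id and the header name X-Correlation-Id are conventions assumed for this example; use whatever names your services already agree on.

```python
import json
import uuid


def get_correlation_id(event):
    """Reuse the caller's correlation ID if present, otherwise start a new chain."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    return (
        event.get("correlation_id")
        or headers.get("x-correlation-id")  # assumed header name
        or str(uuid.uuid4())
    )


def log(correlation_id, message, **fields):
    """Emit a JSON log line that always carries the correlation ID."""
    print(json.dumps({"correlation_id": correlation_id, "message": message, **fields}))


def build_downstream_event(correlation_id, payload):
    """Attach the correlation ID to the payload sent to the next function."""
    return {**payload, "correlation_id": correlation_id}


def handler(event, context):
    correlation_id = get_correlation_id(event)
    log(correlation_id, "order received")
    # The next function in the chain would receive this event.
    return build_downstream_event(correlation_id, {"order_id": event.get("order_id")})
```

A Logs Insights query such as fields @timestamp, @message | filter correlation_id = "<id>" | sort @timestamp asc, run across the relevant log groups, then lists every line for one request in order.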

Are Lambda function logs retained indefinitely?

By default, CloudWatch Log Groups for Lambda functions have no expiration (infinite retention), which means storage costs grow indefinitely. Configure log retention periods on all Lambda Log Groups to match your operational and compliance requirements. 30-90 days covers most operational needs; archive older logs to S3 with lower-cost storage classes if longer retention is required for compliance.
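A retention policy can be applied across all Lambda log groups with a short script. The sketch below takes a CloudWatch Logs client as a parameter (for example boto3.client("logs")) and uses its describe_log_groups paginator and put_retention_policy call; the function name apply_retention and the 30-day default are assumptions for illustration.

```python
def apply_retention(logs_client, retention_days=30, prefix="/aws/lambda/"):
    """Set a retention period on Lambda log groups that currently never expire.

    Returns the names of the log groups that were updated.
    """
    updated = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        for group in page["logGroups"]:
            # Groups without retentionInDays keep their logs indefinitely.
            if "retentionInDays" not in group:
                logs_client.put_retention_policy(
                    logGroupName=group["logGroupName"],
                    retentionInDays=retention_days,
                )
                updated.append(group["logGroupName"])
    return updated


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   print(apply_retention(boto3.client("logs"), retention_days=90))
```

Log groups that already have a retention period are left untouched, so the script is safe to run repeatedly, for example on a schedule to catch newly created functions.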

What's the best way to monitor hundreds of Lambda functions in a large application?

For large Lambda fleets, per-function dashboards don't scale. Use the multi-function view in CloudWatch Lambda Insights and CloudWatch ServiceLens for fleet-level dashboards that aggregate across all functions. Tag Lambda functions with application and environment tags, then build dashboards filtered by tag to see application-level health. Lambda Insights also provides enhanced function-level metrics (memory usage, initialization duration, CPU time) that complement the standard Lambda CloudWatch metrics.



Written by Vigilare Engineering

Platform Team