Monitoring Client AWS Accounts: Architecture and Alerting for MSPs

Vigilare Engineering

Platform Team · December 12, 2025 · 8 min read

Client AWS monitoring at scale is an operational challenge that standard AWS tooling doesn't fully solve out of the box. The native AWS monitoring tools — CloudWatch, GuardDuty, Config — are designed for single-account or single-organization use. Adapting them for multi-client MSP use requires additional architecture: aggregation pipelines that pull data from multiple accounts, isolation that keeps client data separate, and alerting that produces actionable notifications rather than noise.

This guide covers the architecture components and their tradeoffs. The goal is a monitoring setup that gives your operations team unified visibility without the need to log into individual client accounts for routine monitoring.

The Aggregation Layer

Centralized monitoring starts with getting data from client accounts to your monitoring infrastructure. Three primary channels:

CloudTrail log aggregation: Each client account's CloudTrail delivers logs to a central S3 bucket in your MSP log management account. Configure a bucket policy on that bucket that allows the CloudTrail service principal (cloudtrail.amazonaws.com) to write objects under each managed account's prefix. CloudTrail in each client account is then configured to use the centralized bucket as its delivery destination. This gives you API activity from all client accounts in one location, queryable with Athena without logging into individual accounts.
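As a rough illustration, the bucket policy for the central log bucket could be generated from the list of managed accounts. This is a minimal sketch: the function name, bucket name, and account IDs are hypothetical, and it assumes trails deliver to the default AWSLogs/<account-id>/ path with no custom prefix.

```python
import json


def cloudtrail_bucket_policy(bucket: str, account_ids: list) -> dict:
    """Build a bucket policy letting CloudTrail deliver logs for each listed account.

    Assumption: trails use the default AWSLogs/<account-id>/ delivery path.
    """
    principal = {"Service": "cloudtrail.amazonaws.com"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # CloudTrail checks the bucket ACL before delivering logs.
                "Sid": "CloudTrailAclCheck",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # One resource entry per managed account prefix.
                "Sid": "CloudTrailWrite",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:PutObject",
                "Resource": [
                    f"arn:aws:s3:::{bucket}/AWSLogs/{account_id}/*"
                    for account_id in account_ids
                ],
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name and account IDs.
    policy = cloudtrail_bucket_policy("msp-central-trail-logs", ["111111111111", "222222222222"])
    print(json.dumps(policy, indent=2))
```

In practice you would likely tighten the write statement further, for example with an aws:SourceArn condition naming each client trail, so that only the trails you manage can deliver to the bucket.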

CloudWatch cross-account metrics: AWS CloudWatch supports sharing metrics from a source account to a monitoring account. In each client account, configure a sharing policy granting your monitoring account permission to access CloudWatch data. In your monitoring account, add each client account as a linked source. You can then query and display metrics from all client accounts in a single CloudWatch dashboard.

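GuardDuty administrator account: Designate your MSP account as the GuardDuty administrator account for client accounts. A delegated administrator registered through AWS Organizations must belong to the same organization as its member accounts, so client accounts that live in their own organizations are typically associated by invitation instead: your account invites each client account, and the client accepts. Findings from all member accounts then appear in your GuardDuty console with the source account ID attached. This is the most operationally efficient way to aggregate findings, since all security findings are centralized with built-in account attribution.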

Data Isolation and Multi-Tenancy

Aggregating data from multiple clients in shared infrastructure requires isolation controls. A security incident affecting your MSP monitoring account must not expose one client's data to another. Implement:

Separate S3 prefixes per client: Partition aggregated CloudTrail logs in S3 by account ID. The standard CloudTrail delivery path already includes the account ID in the prefix. Use S3 bucket policies and IAM policies that restrict access to specific prefixes for any automation that processes client-specific logs.
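A per-client IAM policy for log-processing automation might look like the following. This is a sketch under assumptions: the helper names and bucket name are hypothetical, and the prefix layout assumes non-organization trails using the default AWSLogs/<account-id>/ path.

```python
def client_prefix(account_id: str) -> str:
    """Default CloudTrail delivery prefix for one account (assumed layout)."""
    return f"AWSLogs/{account_id}/"


def client_read_policy(bucket: str, account_id: str) -> dict:
    """IAM policy granting read access to a single client's log prefix only."""
    prefix = client_prefix(account_id)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object reads are limited to this client's prefix.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                # Listing is allowed only when the request targets the same prefix.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }
```

Attaching one such policy to each client-specific automation role means a compromised or misconfigured job for one client cannot read another client's logs.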

Separate CloudWatch log groups per client: When forwarding client CloudWatch Logs data to your monitoring account, use log group names that include the client's account ID. This makes per-client log queries straightforward and allows you to set separate retention policies per client without commingling data.

Tagging all aggregated data with account context: Any metric or finding stored in your monitoring account should be tagged with the source account ID and (optionally) a client identifier. This enables filtering and prevents confusion when a finding from client A appears in a view intended for client B.

Alert Routing and Suppression

Aggregating monitoring from dozens of clients creates alert volume that requires intelligent routing. Without proper suppression and routing, your operations team drowns in low-priority alerts and misses the important ones.

Build a routing layer between raw findings and your operations team. EventBridge in your monitoring account receives events from all client accounts and applies rules to classify and route them. To forward events, create a rule in each client account whose target is your central event bus, and attach a resource policy to the central bus that allows those accounts to put events on it. High-severity GuardDuty findings route to PagerDuty for immediate response. Config compliance drift routes to a Slack channel for daily review. Billing anomalies route to the responsible account manager for client communication.
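The classification step can be sketched as a small function, for example inside a Lambda target on the central bus. The destination names are hypothetical labels, the "msp.billing" source is an assumed custom event source for billing anomalies, and the threshold of 7.0 reflects GuardDuty's high-severity range.

```python
HIGH_SEVERITY = 7.0  # GuardDuty treats severity 7.0 and above as high


def route_event(event: dict) -> str:
    """Return a destination label for an EventBridge event (labels are hypothetical)."""
    source = event.get("source")
    detail_type = event.get("detail-type")
    detail = event.get("detail", {})

    if source == "aws.guardduty" and detail_type == "GuardDuty Finding":
        if float(detail.get("severity", 0)) >= HIGH_SEVERITY:
            return "pagerduty"
        return "triage-queue"  # lower-severity findings: destination is an assumption

    if source == "aws.config" and detail_type == "Config Rules Compliance Change":
        return "slack-daily-review"

    if source == "msp.billing":  # assumed custom source emitted by your own billing checks
        return "account-manager"

    return "triage-queue"
```

Keeping the routing decision in one pure function makes it easy to unit test against recorded events before changing who gets paged.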

Per-client suppression rules are critical. Each client has unique infrastructure patterns that generate false positives: a security scanning service that triggers reconnaissance findings, a deployment automation role that generates unusual API call findings, a batch job that drives billing spikes. Build a suppression rule system that allows you to define client-specific suppressions without affecting other clients' alerting.
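One way to keep suppressions scoped to a single client is to make the account ID a required part of every rule. The sketch below assumes findings have already been normalized into a dict with "account_id", "type", and an optional "principal" key; that shape, the class name, and the sample rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Suppression:
    account_id: str                  # the rule applies to this client account only
    finding_type_prefix: str         # e.g. "Recon:EC2/"
    principal: Optional[str] = None  # optionally restrict to one role or scanner
    reason: str = ""                 # why the suppression exists, for later audits


def is_suppressed(finding: dict, rules: list) -> bool:
    """Return True if any rule for the finding's own account matches it."""
    for rule in rules:
        if rule.account_id != finding["account_id"]:
            continue
        if not finding["type"].startswith(rule.finding_type_prefix):
            continue
        if rule.principal is not None and rule.principal != finding.get("principal"):
            continue
        return True
    return False


# Hypothetical rule: client 111111111111 runs a scheduled vulnerability scanner.
RULES = [Suppression("111111111111", "Recon:EC2/", reason="scheduled vulnerability scanner")]
```

Because every rule carries an account ID, a suppression written for one client's scanner can never silence the same finding type in another client's account.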

Client-Facing Dashboards

Most MSP clients want some visibility into their account health without managing AWS consoles themselves. Build client-facing dashboards that present key metrics in business terms:

  • Overall security posture score (aggregating findings by severity)
  • Recent findings and resolutions (with timestamps and disposition)
  • Compliance status against their applicable framework (SOC 2, ISO 27001, etc.)
  • Cost trends and month-over-month comparison
  • Availability metrics for key services

These dashboards can be built with Amazon QuickSight (connecting to aggregated data in S3 and CloudWatch), Grafana, or purpose-built tools. Keep the presentation focused on what clients care about — not raw AWS metrics, but translated business outcomes: "Your account has 0 critical security findings this month" and "AWS spending is on track for $X,XXX this month."
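A posture score of the kind listed above could be computed from open finding counts by severity. The weights and the 0 to 100 scale below are illustrative assumptions, not a standard; tune them to your own service tiers.

```python
# Illustrative penalty per open finding; these weights are an assumption.
SEVERITY_WEIGHTS = {"critical": 25, "high": 10, "medium": 3, "low": 1}


def posture_score(counts: dict) -> int:
    """Score from 0 to 100: start at 100 and subtract a weighted penalty per finding."""
    penalty = sum(SEVERITY_WEIGHTS.get(sev, 0) * n for sev, n in counts.items())
    return max(0, 100 - penalty)


def summary_line(counts: dict) -> str:
    """Render the critical-finding count in client-facing wording."""
    n = counts.get("critical", 0)
    noun = "finding" if n == 1 else "findings"
    return f"Your account has {n} critical security {noun} this month"
```

For example, an account with two high and three low findings would score 77 under these weights, and its summary line would report 0 critical findings.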

Incident Workflow for MSP Environments

Define the incident workflow from alert to resolution before you need it. The key steps: an alert fires in your monitoring system; the on-call engineer receives a PagerDuty notification; the engineer assumes a cross-account role in the relevant client account; the engineer investigates and remediates; the client is notified (per your SLA tier); and the incident is documented afterward.
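The role-assumption step is worth standardizing so that every session is traceable to an engineer and an incident. The sketch below only builds the parameters; it assumes they would be passed to the STS AssumeRole API (for example via boto3's sts client, as assume_role(**params)). The role name, engineer name, and incident ID format are hypothetical.

```python
from typing import Optional


def assume_role_params(
    account_id: str,
    role_name: str,
    engineer: str,
    incident_id: str,
    external_id: Optional[str] = None,
) -> dict:
    """Build STS AssumeRole parameters that tie the session to a person and an incident."""
    params = {
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
        # The session name shows up in the client's CloudTrail; STS caps it at 64 characters.
        "RoleSessionName": f"{engineer}-{incident_id}"[:64],
        "DurationSeconds": 3600,
    }
    if external_id:
        # Include an external ID when the client's trust policy requires one.
        params["ExternalId"] = external_id
    return params
```

Embedding the incident ID in the session name lets the client's own CloudTrail answer "who touched this account, and why" without consulting your ticketing system.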

Build runbooks for common incident types — security runbooks for GuardDuty finding types, operational runbooks for capacity limits, cost runbooks for billing anomalies. Runbooks reduce mean time to resolve and allow less senior engineers to handle incidents that would otherwise require escalation.

FAQ

How do you handle monitoring for clients who don't want their data leaving their account?

Some clients (particularly those with data residency requirements or contractual restrictions) may not allow aggregation of their CloudTrail or metrics to your MSP account. In these cases, deploy monitoring infrastructure in the client's account that sends alerts to your monitoring systems without moving the underlying data. EventBridge in the client account can send alert events (not raw log data) to your monitoring account, preserving alerting capability while keeping log data in the client's environment.
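For the alert-only approach, the forwarding rule in the client account can use an event pattern that matches findings rather than log data. The sketch below builds such a pattern for high-severity GuardDuty findings; the function name and default threshold are assumptions, and the resulting JSON would be supplied as the rule's event pattern, with your central event bus as the rule's target.

```python
import json


def alert_only_pattern(min_severity: float = 7) -> dict:
    """EventBridge event pattern matching GuardDuty findings at or above a severity."""
    return {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        "detail": {"severity": [{"numeric": [">=", min_severity]}]},
    }


if __name__ == "__main__":
    print(json.dumps(alert_only_pattern(), indent=2))
```

Only events that match the pattern leave the client account, so CloudTrail logs and raw metrics stay where the client requires them to be.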

What's a reasonable number of client accounts for one MSP operations engineer to monitor?

With good automation and centralized monitoring, one engineer can effectively monitor 20-50 accounts. The key variable is alert volume and incident rate — a portfolio of 40 well-architected accounts with low incident rates is manageable; 20 accounts with poor baseline configurations generating constant alerts is overwhelming. Baseline standardization and alert suppression tuning are the leverage points for increasing engineer-to-account ratios.

Should MSPs build their own monitoring tools or use products like Vigilare?

Building a full multi-tenant monitoring platform in-house requires substantial engineering investment — aggregation infrastructure, alert routing, client-facing dashboards, data isolation, and ongoing maintenance. Purpose-built MSP monitoring tools like Vigilare provide this infrastructure as a service, allowing MSP teams to focus on the client relationships and remediation work rather than monitoring platform development. The build-vs-buy calculus depends on your team's engineering capacity and the differentiation you want in your monitoring capability.
