AWS EC2 vCPU Limits: Managing the Instance Ceiling That Catches Teams Off Guard

Viktor B.

Co-founder & CEO · November 28, 2025 · 7 min read

EC2 vCPU limits are the most commonly encountered quota constraint for teams scaling AWS workloads. AWS replaced the earlier per-instance-count limits with a vCPU-based model: the limit applies to the total vCPUs of running instances within an instance family bucket, not to the number of instances. This makes the limit more predictable in some ways but creates new footguns: one c5.24xlarge (96 vCPUs) consumes the same quota as 12 c5.2xlarges (8 vCPUs each), so the mix of instance sizes matters as much as the count.
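To see why the size mix matters, sum vCPUs rather than counting instances. This Python sketch hard-codes default vCPU counts for a few C5 sizes (a hand-written table for illustration only; in practice you would look them up through the EC2 DescribeInstanceTypes API) and checks the arithmetic above.

```python
# Default vCPU counts for a few C5 sizes, hard-coded for illustration.
# A real tool should read VCpuInfo.DefaultVCpus from DescribeInstanceTypes.
C5_VCPUS = {
    "c5.large": 2,
    "c5.xlarge": 4,
    "c5.2xlarge": 8,
    "c5.24xlarge": 96,
}


def fleet_vcpus(fleet: dict[str, int], vcpus_by_type: dict[str, int] = C5_VCPUS) -> int:
    """Total vCPUs consumed by a fleet described as {instance_type: count}."""
    return sum(vcpus_by_type[itype] * count for itype, count in fleet.items())


one_big = fleet_vcpus({"c5.24xlarge": 1})
many_small = fleet_vcpus({"c5.2xlarge": 12})
print(one_big, many_small)  # 96 96
```

Both fleets draw 96 vCPUs from the same Standard bucket, even though one is a single instance and the other is twelve.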

The failure mode when limits are exceeded is a VcpuLimitExceeded error from the RunInstances API. In auto scaling scenarios, the Auto Scaling group fails to add capacity and records the error only in its scaling activity history, the application becomes overloaded, and nothing obviously points to a quota limit as the cause. The investigation usually starts with "why isn't auto scaling working?" before landing on the quota issue.
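When auto scaling stalls, the group's scaling activity history is the quickest place to confirm a quota problem. The sketch below assumes a boto3 Auto Scaling client (`boto3.client("autoscaling")`) is passed in, reads only the first page of `describe_scaling_activities`, and flags failed activities by matching either the error code or the phrase "vCPU limit" in `StatusMessage`. The exact message wording is an assumption, so adjust the pattern to what your own failures look like.

```python
import re

# Heuristic pattern: the error code or the wording of the vCPU-limit message.
# The exact StatusMessage text is an assumption; tune it against real failures.
VCPU_LIMIT_PATTERN = re.compile(r"VcpuLimitExceeded|vCPU limit", re.IGNORECASE)


def quota_failures(activities: list[dict]) -> list[dict]:
    """Return failed scaling activities whose message points at a vCPU quota."""
    return [
        activity
        for activity in activities
        if activity.get("StatusCode") == "Failed"
        and VCPU_LIMIT_PATTERN.search(activity.get("StatusMessage", ""))
    ]


def recent_quota_failures(autoscaling_client, group_name: str) -> list[dict]:
    """Check the first page of recent activities for one Auto Scaling group."""
    resp = autoscaling_client.describe_scaling_activities(AutoScalingGroupName=group_name)
    return quota_failures(resp["Activities"])
```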

How vCPU Limits Are Structured

EC2 instance types are grouped into families, and vCPU limits apply per family. The main buckets are:

  • Standard instances (A, C, D, H, I, M, R, T, Z families): The largest bucket, covering general-purpose, compute-optimized, memory-optimized, storage-optimized, and burstable instances. Default limit is typically 32 vCPUs in new accounts and increases with AWS account age and usage history.
  • High Memory instances (U family): Separate quota for the very large memory instances.
  • Accelerated computing instances (P, G, Inf families): Separate quotas per family, with lower default limits reflecting the specialized nature and higher cost of these instances. Inf instances use AWS Inferentia chips rather than GPUs but follow the same pattern.
  • Spot instances: Spot usage has its own vCPU quotas, one per instance family bucket, separate from On-Demand.
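To reason about which bucket a launch will count against, a rough classifier over the instance type's family prefix is often enough. This is a simplified sketch covering only the buckets listed above; the mapping is hand-written, and anything it does not recognize is reported as "Other" so you check Service Quotas rather than trust a guess.

```python
import re

# Simplified, hand-written mapping from family prefix to On-Demand quota bucket.
# Families not listed here fall through to "Other".
STANDARD_FAMILIES = set("acdhimrtz")
OTHER_BUCKETS = {"u": "High Memory", "p": "P", "g": "G", "inf": "Inf"}


def quota_bucket(instance_type: str) -> str:
    """Guess the vCPU quota bucket for an instance type such as 'c5.2xlarge'."""
    match = re.match(r"[a-z]+", instance_type.lower())
    if not match:
        return "Other"
    family = match.group(0)  # leading letters, e.g. "c", "inf", "u"
    if family in STANDARD_FAMILIES:
        return "Standard"
    return OTHER_BUCKETS.get(family, "Other")
```

For example, `quota_bucket("p4d.24xlarge")` returns `"P"` and `quota_bucket("u-6tb1.metal")` returns `"High Memory"`.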

Limits are per-region. The vCPU limit in us-east-1 is independent of the limit in eu-west-1. Multi-region architectures need quota planning for each region independently.
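Because each region has its own quota, it helps to read the applied values programmatically rather than checking the console region by region. This sketch takes a factory that returns a Service Quotas client per region (for example `lambda r: boto3.client("service-quotas", region_name=r)`) and calls `get_service_quota`. The quota code `L-1216C47A` is assumed to be the Running On-Demand Standard instances quota; confirm it with `list_service_quotas` in your account.

```python
# Assumed quota code for "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z)
# instances". Verify it in your account before relying on it.
STANDARD_ON_DEMAND_QUOTA = "L-1216C47A"


def quota_by_region(make_client, regions: list[str],
                    quota_code: str = STANDARD_ON_DEMAND_QUOTA) -> dict[str, float]:
    """Return {region: applied vCPU quota} for one EC2 quota code."""
    values: dict[str, float] = {}
    for region in regions:
        client = make_client(region)  # one Service Quotas client per region
        resp = client.get_service_quota(ServiceCode="ec2", QuotaCode=quota_code)
        values[region] = resp["Quota"]["Value"]
    return values
```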

Finding Your Current Limits and Utilization

Check current limits and utilization in three places:

Service Quotas console: Navigate to AWS Service Quotas → EC2. Search for "Running On-Demand" to find the vCPU quotas for each instance family. The console shows your current limit, whether an increase is in progress, and — for supported quotas — a link to the CloudWatch utilization metric.

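CloudWatch metrics: EC2 publishes vCPU usage per instance family bucket to the AWS/Usage namespace as the ResourceCount metric, with the dimensions Service=EC2, Type=Resource, Resource=vCPU, and a Class such as Standard/OnDemand or Standard/Spot. The metric is an absolute vCPU count, so to get a 0-100% utilization gauge, use a metric math expression that divides usage by the quota. The SERVICE_QUOTA() function returns the applied quota for a usage metric, so (m1 / SERVICE_QUOTA(m1)) * 100 works without hard-coding the limit.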

EC2 console dashboard: The EC2 service dashboard shows a "Limits" section with current usage and limits for key resources including vCPUs. This is the quickest way to check current state manually.

Setting Up Proactive Alerts

Create CloudWatch alarms at 70% utilization to trigger quota increase requests before you need more capacity. The alarm configuration:

Metric: AWS/Usage / ResourceCount
Dimensions: Service=EC2, Type=Resource, Resource=vCPU, Class=Standard/OnDemand
Statistic: Maximum
Threshold: [your quota * 0.7]
Period: 5 minutes
Evaluation Periods: 3
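The alarm configuration above maps directly onto CloudWatch `put_metric_alarm` arguments. This sketch only builds the keyword arguments, so you can review them before calling `boto3.client("cloudwatch").put_metric_alarm(**kwargs)`; the alarm name format and the SNS topic ARN are hypothetical placeholders.

```python
def vcpu_alarm_kwargs(quota_vcpus: float, sns_topic_arn: str,
                      usage_class: str = "Standard/OnDemand",
                      threshold_pct: float = 70) -> dict:
    """Keyword arguments for put_metric_alarm matching the configuration above."""
    return {
        # Hypothetical naming scheme, e.g. "ec2-vcpu-standard-ondemand-70pct".
        "AlarmName": f"ec2-vcpu-{usage_class.replace('/', '-').lower()}-{threshold_pct:g}pct",
        "Namespace": "AWS/Usage",
        "MetricName": "ResourceCount",
        "Dimensions": [
            {"Name": "Service", "Value": "EC2"},
            {"Name": "Type", "Value": "Resource"},
            {"Name": "Resource", "Value": "vCPU"},
            {"Name": "Class", "Value": usage_class},
        ],
        "Statistic": "Maximum",
        "Period": 300,                 # 5 minutes
        "EvaluationPeriods": 3,
        "Threshold": quota_vcpus * threshold_pct / 100,  # absolute vCPU count
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

Because the threshold is an absolute vCPU count, it has to be recomputed after every quota increase; alarming on a percentage expression avoids that.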

Wire the alarm to an SNS topic that notifies your infrastructure team. Include the current utilization value in the notification and a link to the Service Quotas console to streamline the increase request process. For teams with frequent scaling events, automate the increase request itself: a Lambda function subscribed to the same SNS topic can submit a Service Quotas request for the appropriate quota when the alarm fires.
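A minimal sketch of such a Lambda function, assuming it is subscribed to the alarm's SNS topic and runs in the AWS Lambda Python runtime (which bundles boto3). It requests twice the current applied value, in line with the 2x guidance in the next section; the quota code, the multiplier, and acting on every ALARM notification are assumptions to tune, for example by first checking whether a request is already open.

```python
import json

# Assumed quota code for "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z)
# instances"; confirm it with list_service_quotas before deploying.
STANDARD_ON_DEMAND_QUOTA = "L-1216C47A"


def alarm_is_firing(event: dict) -> bool:
    """True if the SNS-delivered CloudWatch alarm message reports the ALARM state."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message.get("NewStateValue") == "ALARM"


def request_increase(quotas_client, quota_code: str = STANDARD_ON_DEMAND_QUOTA,
                     multiplier: float = 2.0) -> dict:
    """Request `multiplier` times the currently applied quota value."""
    quota = quotas_client.get_service_quota(ServiceCode="ec2", QuotaCode=quota_code)
    desired = quota["Quota"]["Value"] * multiplier
    return quotas_client.request_service_quota_increase(
        ServiceCode="ec2", QuotaCode=quota_code, DesiredValue=desired
    )


def lambda_handler(event, context):
    """Entry point: act only on ALARM transitions, ignore OK and INSUFFICIENT_DATA."""
    if not alarm_is_firing(event):
        return {"requested": False}
    import boto3  # bundled with the Lambda runtime; imported here for local testing
    resp = request_increase(boto3.client("service-quotas"))
    return {"requested": True, "status": resp["RequestedQuota"]["Status"]}
```

Importing boto3 inside the handler is only there so the helper functions can be exercised locally without it installed.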

Requesting vCPU Limit Increases

Submit quota increase requests through Service Quotas rather than AWS Support. Service Quotas requests for common quotas (Standard instance vCPUs) are frequently auto-approved up to certain multipliers of the current limit. Requesting 2x your current limit for a quota you're actually using is almost always auto-approved. Requesting 10x requires human review and a business justification.
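Service Quotas keeps a history of increase requests per quota, so a script or an automated alarm handler can check whether a request is already open before filing another. This sketch assumes a boto3 Service Quotas client is passed in, treats `PENDING` and `CASE_OPENED` as the open states, and ignores pagination for brevity.

```python
# Request states treated as "still awaiting a decision" (an assumption).
OPEN_STATUSES = ("PENDING", "CASE_OPENED")


def open_requests(quotas_client, quota_code: str, service_code: str = "ec2") -> list[dict]:
    """Return increase requests for one quota that have not been decided yet."""
    found: list[dict] = []
    for status in OPEN_STATUSES:
        # First page only; follow NextToken for accounts with long histories.
        resp = quotas_client.list_requested_service_quota_change_history_by_quota(
            ServiceCode=service_code, QuotaCode=quota_code, Status=status
        )
        found.extend(resp.get("RequestedQuotas", []))
    return found
```

If `open_requests(...)` returns anything, wait for that decision instead of submitting a duplicate.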

For new accounts that need large quotas immediately, the fastest path is a Business or Enterprise support plan combined with a well-justified support case explaining the workload requirements. AWS is generally accommodating for legitimate scaling needs — the review process exists to catch abuse (crypto mining, botnet infrastructure) rather than to obstruct legitimate usage.

When planning a migration or a new production launch, submit quota increase requests two weeks in advance. Discovering a quota constraint only when it blocks the go-live is entirely avoidable, and it creates unnecessary time pressure on your team and on AWS Support.

GPU Instance Limits Deserve Special Attention

P-family quotas (covering P3, P4, and the other P instances used for ML training) are set very low by default, often 0 or 8 vCPUs for new accounts, and all P instance types draw on the same bucket. If you're planning any GPU workload, request an increase during your AWS account setup process, not when you're ready to run the first training job. GPU quota increases require business justification and take longer to approve than standard instance quotas.

The same applies to G-family instances used for inference and graphics workloads, and Inf instances (AWS Inferentia) for high-performance inference. Treat GPU quotas as a long-lead-time item in your capacity planning.
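A one-time audit at account setup can show which accelerated-instance quotas are below what your first workloads need. The sketch pages through `list_service_quotas` for EC2 and compares applied values against a plan you supply. The quota names in the example (such as "Running On-Demand P instances") are assumptions to confirm in the Service Quotas console, and a quota missing from the listing is treated as 0, which errs toward requesting an increase.

```python
def accelerated_quota_gaps(quotas_client, needed_vcpus: dict[str, float]) -> dict[str, float]:
    """Return {quota_name: shortfall_vcpus} for quotas below the planned need.

    quotas_client is expected to be boto3.client("service-quotas");
    needed_vcpus maps quota names to the vCPUs your first workloads need.
    """
    applied: dict[str, float] = {}
    paginator = quotas_client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="ec2"):
        for quota in page["Quotas"]:
            applied[quota["QuotaName"]] = quota["Value"]

    gaps: dict[str, float] = {}
    for name, needed in needed_vcpus.items():
        # Quotas absent from the listing are treated as 0 (conservative).
        current = applied.get(name, 0.0)
        if current < needed:
            gaps[name] = needed - current
    return gaps
```

For example, `accelerated_quota_gaps(client, {"Running On-Demand P instances": 96})` reports the full 96 vCPUs as a gap on an account whose P quota is still 0.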

Multi-Account EC2 Capacity Planning

Each account in an AWS Organization has independent quotas. A quota increase in your production account doesn't help your staging account. For organizations with workload isolation across accounts, plan quota increases for each account based on that account's expected peak workload. Don't assume that historical averages will hold during incidents — incident response often involves launching additional diagnostic or mitigation infrastructure that consumes EC2 capacity beyond normal operating levels.
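Per-account planning reduces to simple arithmetic: start from the account's expected peak, add headroom for incident response, and size the quota so that the total stays under the 70% alarm threshold. The 25% incident headroom below is an assumed default, not an AWS figure, and the function names are hypothetical.

```python
import math


def required_quota(expected_peak_vcpus: float, incident_headroom: float = 0.25,
                   alarm_threshold: float = 0.7) -> int:
    """Smallest quota keeping peak-plus-incident usage under the alarm threshold."""
    planned = expected_peak_vcpus * (1 + incident_headroom)
    return math.ceil(planned / alarm_threshold)


def accounts_needing_increase(plans: dict[str, tuple[float, float]],
                              **kwargs) -> dict[str, int]:
    """plans maps account_id -> (expected_peak_vcpus, current_quota).

    Returns {account_id: quota_to_request} for accounts whose quota is too low.
    """
    result: dict[str, int] = {}
    for account, (peak, quota) in plans.items():
        needed = required_quota(peak, **kwargs)
        if needed > quota:
            result[account] = needed
    return result
```

With an expected peak of 200 vCPUs, the 25% buffer gives 250, and 250 / 0.7 rounds up to a quota of 358.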

Track quota utilization centrally using AWS Service Quotas monitoring across all accounts. Vigilare aggregates service health data including quota utilization across your account inventory, giving you visibility into approaching limits before they cause scaling failures.
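However the per-account numbers are collected (a cross-account role, an aggregation tool, or exported CloudWatch data), the central view boils down to ranking buckets by utilization. The snapshot shape below is hypothetical; each entry is one account, region, and quota class.

```python
def accounts_near_limit(snapshots: list[dict], threshold_pct: float = 70.0) -> list[dict]:
    """Return snapshots at or above threshold, highest utilization first.

    Each snapshot is a dict with account_id, region, usage_class, usage, quota.
    """
    flagged = []
    for snap in snapshots:
        if snap["quota"] <= 0:
            continue  # skip zero-quota buckets to avoid dividing by zero
        pct = 100.0 * snap["usage"] / snap["quota"]
        if pct >= threshold_pct:
            flagged.append({**snap, "utilization_pct": round(pct, 1)})
    return sorted(flagged, key=lambda s: s["utilization_pct"], reverse=True)
```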

FAQ

Do Reserved Instances count against vCPU limits?

Yes. Instances covered by Reserved Instances consume On-Demand vCPU quota while they are running. The reservation provides a billing discount (and, for zonal Reserved Instances, a capacity reservation), but it doesn't exempt the instances from quota accounting. If you have 500 vCPUs of Reserved Instances running and a 1,000 vCPU On-Demand limit, you have 500 vCPUs of headroom for additional On-Demand instances; Spot Instances draw on their own, separate quota.

Do Spot Instances have separate vCPU limits?

Yes. Spot instance usage has its own vCPU quota, separate from On-Demand quotas. The Spot quota is typically more generous than On-Demand for the same instance family because spot capacity is inherently variable. Monitor both quotas independently if your workload uses a mix of On-Demand and Spot instances.
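Both FAQ answers above come down to counting running vCPUs per purchase market. This sketch takes the `Reservations` list returned by EC2 `describe_instances` plus a vCPU lookup table you provide, counts only instances in the `running` state, and, as a simplification, treats anything without `InstanceLifecycle == "spot"` as On-Demand, so RI-covered instances land in the On-Demand total.

```python
def vcpu_usage_by_market(reservation_groups: list[dict],
                         vcpus_by_type: dict[str, int]) -> dict[str, int]:
    """Sum running vCPUs, split into On-Demand (including RI-covered) and Spot.

    reservation_groups is the "Reservations" list from EC2 DescribeInstances;
    these are launch groupings and have nothing to do with Reserved Instances.
    """
    usage = {"on_demand": 0, "spot": 0}
    for group in reservation_groups:
        for inst in group["Instances"]:
            if inst["State"]["Name"] != "running":
                continue
            # Simplification: anything not marked as Spot counts as On-Demand.
            market = "spot" if inst.get("InstanceLifecycle") == "spot" else "on_demand"
            usage[market] += vcpus_by_type[inst["InstanceType"]]
    return usage


def headroom(quota_vcpus: float, used_vcpus: float) -> float:
    """vCPUs still available under a quota (never negative)."""
    return max(quota_vcpus - used_vcpus, 0)
```

With 500 vCPUs of RI-covered instances running against a 1,000 vCPU On-Demand quota, `headroom(1000, 500)` returns 500; Spot usage is compared against the Spot quota separately.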

What happens to running instances if AWS lowers my quota?

AWS does not lower quotas for running workloads in a way that would cause running instances to be terminated. Quota reductions (which are rare and typically involve expired trial credits or account-level policy changes) affect the ability to launch new instances. If you need clarity on quota changes affecting your account, contact AWS Support directly.
