AWS Auto Scaling with Auto Scaling Groups & Load Balancers

January 13, 2022 · Updated June 10, 2026

Combining an EC2 Auto Scaling Group (ASG) with an Application Load Balancer (ALB) gives you three things at once: elasticity (capacity follows demand), high availability (traffic is spread across healthy instances in multiple Availability Zones, and unhealthy ones are replaced automatically), and cost efficiency (you stop paying for idle servers when traffic drops). The load balancer answers requests at a single endpoint and routes them only to instances that pass health checks; the Auto Scaling Group decides how many instances exist and in which AZs.

The modern building blocks, as of 2026, are: a Launch Template (the reusable blueprint for new instances — Launch Configurations are legacy and should not be used for new work), an ALB with a Target Group, an Auto Scaling Group spanning multiple subnets/AZs, and one or more dynamic scaling policies (target tracking is the recommended default). This guide wires them together with AWS CLI snippets you can adapt to Terraform or CloudFormation.

Architecture: how the pieces fit together

Requests flow through the load balancer to a target group, which forwards them to instances the Auto Scaling Group manages across at least two AZs:

Client -> Route 53 (DNS)
       -> Application Load Balancer (public subnets, AZ-a + AZ-b)
           -> Listener (HTTPS:443, ACM certificate)
               -> Target Group (health check: HTTP /healthz)
                   -> EC2 Auto Scaling Group (private subnets, AZ-a + AZ-b)
                        min=2  desired=2  max=8

Key design rules that keep this resilient:

  • Span at least two Availability Zones. If one AZ has a problem, the ALB keeps serving from the other and the ASG launches replacements in a healthy AZ.
  • Keep instances in private subnets behind the ALB; only the ALB lives in public subnets. Use a NAT gateway for outbound traffic.
  • Use ELB health checks on the ASG, not just EC2 status checks, so a process that crashes (but leaves the instance "running") is detected and replaced.
  • Make instances stateless. Push sessions to ElastiCache/DynamoDB and uploads to S3 so any instance can serve any request and scaling in never loses data.
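
You can enforce the private-subnet rule at the network layer by letting instances accept traffic only from the ALB's security group. The sketch below reuses the placeholder security group IDs from Steps 1 and 2: sg-0alb1234 for the ALB and sg-0aaa1111 for the instances. It assumes the app listens on port 80.

```shell
# ALB security group: accept HTTPS from the internet.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0alb1234 \
  --protocol tcp --port 443 --cidr 0.0.0.0/0

# Instance security group: accept HTTP only from members of the ALB security group,
# so instances are reachable only through the load balancer.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0aaa1111 \
  --protocol tcp --port 80 --source-group sg-0alb1234
```

The rule references a security group rather than a CIDR range, so it keeps working as the ALB nodes' IP addresses change.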

Step 1 - Create a Launch Template

A Launch Template is the blueprint the ASG uses for every new instance: AMI, instance type, security groups, IAM instance profile, and user data. Templates support versioning (Launch Configurations do not), which is what makes versioned rollouts and rollbacks with instance refresh straightforward.

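# Launch template UserData must be base64-encoded; -w0 (GNU coreutils base64) disables line wrapping.
aws ec2 create-launch-template \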
  --launch-template-name web-app-lt \
  --version-description "v1 baseline" \
  --launch-template-data '{
    "ImageId": "ami-0abcdef1234567890",
    "InstanceType": "t3.small",
    "SecurityGroupIds": ["sg-0aaa1111"],
    "IamInstanceProfile": {"Name": "web-app-instance-role"},
    "TagSpecifications": [{
      "ResourceType": "instance",
      "Tags": [{"Key": "Name", "Value": "web-app"}]
    }],
    "UserData": "'$(base64 -w0 bootstrap.sh)'"
  }'

The UserData script (bootstrap.sh) should be idempotent and fast: pull the latest app artifact, start the service, and expose a /healthz endpoint. For reproducible images, bake dependencies into the AMI with EC2 Image Builder and keep user data thin.
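
A minimal bootstrap.sh along those lines could look like the sketch below. It makes several assumptions:

- The instance runs Amazon Linux 2023, which ships with the AWS CLI.
- nginx stands in for your app server.
- The artifact comes from a hypothetical S3 bucket, so the instance profile needs s3:GetObject on it.

The /healthz file only proves nginx is serving. A real application should answer /healthz itself.

```shell
#!/bin/bash
# bootstrap.sh: minimal user data sketch. Assumes Amazon Linux 2023, nginx as a
# stand-in web server, and a hypothetical artifact bucket (example-artifacts-bucket).
set -euo pipefail

# Installing an already-installed package is a no-op, so re-runs are safe.
dnf install -y nginx

# Fetch the current build (hypothetical key) and unpack it into nginx's web root.
aws s3 cp s3://example-artifacts-bucket/web-app/latest.tar.gz /tmp/web-app.tar.gz
tar -xzf /tmp/web-app.tar.gz -C /usr/share/nginx/html

# Placeholder health endpoint: GET /healthz returns 200 "ok" once nginx is serving.
echo ok > /usr/share/nginx/html/healthz

# Start nginx now and on every boot; enable --now is also safe to repeat.
systemctl enable --now nginx
```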

Step 2 - Create the ALB and Target Group

The Application Load Balancer terminates HTTP/HTTPS and routes on Layer 7 (host/path rules). The Target Group holds the instances and runs the health check that decides which targets receive traffic. (Use a Network Load Balancer instead only when you need Layer 4 throughput, static IPs, or extremely low latency. The Classic Load Balancer is legacy and should not be used for new deployments.)

# 1) Target group with an application-level health check
aws elbv2 create-target-group \
  --name web-app-tg \
  --protocol HTTP --port 80 \
  --vpc-id vpc-0abc123 \
  --target-type instance \
  --health-check-protocol HTTP \
  --health-check-path /healthz \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3 \
  --health-check-interval-seconds 15

# 2) Application Load Balancer across two public subnets (two AZs)
aws elbv2 create-load-balancer \
  --name web-app-alb \
  --type application \
  --subnets subnet-0aaa subnet-0bbb \
  --security-groups sg-0alb1234

# 3) HTTPS listener that forwards to the target group
aws elbv2 create-listener \
  --load-balancer-arn <ALB_ARN> \
  --protocol HTTPS --port 443 \
  --certificates CertificateArn=<ACM_CERT_ARN> \
  --default-actions Type=forward,TargetGroupArn=<TG_ARN>

Do not register instances manually. The Auto Scaling Group registers and deregisters targets for you as it scales, so the target group always reflects current capacity. Before instances exist, the listener will return 503 Service Unavailable — that is expected until Step 3 launches healthy targets.
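
Instead of pasting the ARNs by hand, you can look them up by the names used in this step. The sketch below does that. The describe-target-health call is most useful once Step 3 has launched instances.

```shell
# Resolve the ARNs from the names given in Step 2.
ALB_ARN=$(aws elbv2 describe-load-balancers --names web-app-alb \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text)
TG_ARN=$(aws elbv2 describe-target-groups --names web-app-tg \
  --query 'TargetGroups[0].TargetGroupArn' --output text)

# Block until the ALB has finished provisioning.
aws elbv2 wait load-balancer-available --load-balancer-arns "$ALB_ARN"

# List each registered target and its state (initial, healthy, unhealthy, draining, ...).
aws elbv2 describe-target-health --target-group-arn "$TG_ARN" \
  --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State]' --output table
```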

Step 3 - Create the Auto Scaling Group

The ASG ties the launch template to the subnets and the target group. Set min / desired / max capacity, list two or more subnets in different AZs, and choose ELB health checks so unhealthy targets (not just stopped instances) are replaced.

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-app-asg \
  --launch-template 'LaunchTemplateName=web-app-lt,Version=$Latest' \
  --min-size 2 --desired-capacity 2 --max-size 8 \
  --vpc-zone-identifier "subnet-0priv-a,subnet-0priv-b" \
  --target-group-arns <TG_ARN> \
  --health-check-type ELB \
  --health-check-grace-period 90 \
  --default-instance-warmup 90

Notes:

  • --health-check-type ELB makes the ASG trust the target group's /healthz result, not just the EC2 hypervisor check.
  • --health-check-grace-period gives a new instance time to boot and pass health checks before the ASG can mark it unhealthy.
  • --default-instance-warmup (the modern replacement for per-policy warm-up and scaling cooldowns) tells scaling policies to ignore a new instance's metrics until it is warmed up, preventing over-scaling.
  • For non-critical, fault-tolerant tiers, a mixed instances policy with Spot capacity can cut compute cost substantially. Spot prices vary by instance type, Region, and over time, and Spot instances can be reclaimed with a two-minute warning; check current pricing on aws.amazon.com.
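
The group from Step 3 could be switched to a mixed instances policy for that Spot setup, as sketched below. The instance types, the On-Demand base of 2, and the price-capacity-optimized allocation strategy are illustrative assumptions, not recommendations for your workload.

```shell
# Quoted 'EOF' keeps $Latest literal instead of expanding it as a shell variable.
cat > mixed-instances.json <<'EOF'
{
  "LaunchTemplate": {
    "LaunchTemplateSpecification": {
      "LaunchTemplateName": "web-app-lt",
      "Version": "$Latest"
    },
    "Overrides": [
      {"InstanceType": "t3.small"},
      {"InstanceType": "t3a.small"},
      {"InstanceType": "t2.small"}
    ]
  },
  "InstancesDistribution": {
    "OnDemandBaseCapacity": 2,
    "OnDemandPercentageAboveBaseCapacity": 0,
    "SpotAllocationStrategy": "price-capacity-optimized"
  }
}
EOF

# Keep 2 On-Demand instances as a floor; capacity above that is Spot,
# drawn from several interchangeable instance types.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name web-app-asg \
  --mixed-instances-policy file://mixed-instances.json
```

Listing several instance types gives the group more Spot capacity pools to draw from. That reduces the impact when any one pool is reclaimed.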

Step 4 - Add scaling policies

Capacity changes are driven by scaling policies. For almost every web/API workload, start with target tracking: you pick a metric and a target value, and AWS manages the CloudWatch alarms and the math to hold the metric near that value. Two metrics dominate:

  • Average CPU utilization (ASGAverageCPUUtilization) — simple, good first signal.
  • ALB request count per target (ALBRequestCountPerTarget) — usually the best proxy for real load on a web tier, because it scales on actual traffic rather than a lagging CPU number.

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-app-asg \
  --policy-name tt-cpu-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0
  }'
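
Here is some intuition for what the policy above aims for. AWS's guidance is that a target-tracking metric should change in proportion to the number of instances. Under that assumption, the capacity the policy converges on is roughly ceil(current * metric / target), clamped to min/max. The sketch below is a back-of-envelope illustration only. AWS does not publish its exact algorithm.

```shell
#!/usr/bin/env bash
# Rough estimate only: assumes the per-instance metric falls in proportion as
# instances are added. This is not AWS's actual target-tracking implementation.
estimate_capacity() {
  local current=$1 metric=$2 target=$3 min=$4 max=$5
  # Integer ceiling of current * metric / target.
  local needed=$(( (current * metric + target - 1) / target ))
  # Clamp to the group's min/max.
  if (( needed < min )); then needed=$min; fi
  if (( needed > max )); then needed=$max; fi
  echo "$needed"
}

# 2 instances at 80% CPU with a 50% target: ceil(160 / 50) = 4 instances.
estimate_capacity 2 80 50 2 8   # prints 4
```

For scale-in cases, such as 6 instances at 20% CPU giving an estimate of 3, expect the real policy to approach the figure more slowly. Target tracking removes capacity more cautiously than it adds it.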

To scale on traffic instead, swap in ALBRequestCountPerTarget and supply the ALB/target-group ResourceLabel. You can attach multiple target-tracking policies; the ASG scales out on whichever demands the most capacity and is conservative about scaling in.
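
The ResourceLabel combines the ALB and target-group identifiers in the form app/<alb-name>/<alb-id>/targetgroup/<tg-name>/<tg-id>. The sketch below derives it from the two ARNs and writes a target-tracking configuration file. The example ARNs are made up. The target of 1000 requests per target is an assumed value; replace it with one found by load testing.

```shell
#!/usr/bin/env bash
# Example ARNs (made up); substitute the ARNs of your own ALB and target group.
ALB_ARN="arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/web-app-alb/50dc6c495c0c9188"
TG_ARN="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-app-tg/943f017f100becff"

# Take the ALB ARN after "loadbalancer/" and the TG ARN after "targetgroup/",
# then join them: app/<alb-name>/<alb-id>/targetgroup/<tg-name>/<tg-id>.
resource_label() {
  local alb_arn=$1 tg_arn=$2
  echo "${alb_arn#*:loadbalancer/}/targetgroup/${tg_arn#*:targetgroup/}"
}

LABEL=$(resource_label "$ALB_ARN" "$TG_ARN")

# Write the target-tracking configuration (unquoted EOF so $LABEL expands).
cat > tt-requests.json <<EOF
{
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ALBRequestCountPerTarget",
    "ResourceLabel": "$LABEL"
  },
  "TargetValue": 1000.0
}
EOF
```

To apply it, run the same put-scaling-policy command used for CPU with a new policy name, passing --target-tracking-configuration file://tt-requests.json.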

Choosing a scaling policy type

  • Target tracking. How it decides: holds a metric (CPU, request count per target) at a target value; AWS manages the alarms. Best for: the default for web/API tiers; set-and-forget. Watch out for: pick a metric that truly tracks load; one runaway metric can over-scale.
  • Step scaling. How it decides: adds or removes N instances depending on how far a CloudWatch alarm threshold is breached. Best for: fine-grained control when target tracking is too coarse. Watch out for: more tuning; you own the alarm thresholds and step sizes.
  • Simple scaling. How it decides: one adjustment per alarm, then a cooldown. Best for: legacy or very simple cases. Watch out for: the cooldown blocks further action; superseded by target tracking and step scaling.
  • Scheduled scaling. How it decides: changes capacity at known times. Best for: predictable peaks (business hours, batch windows, sales). Watch out for: it is static, so pair it with dynamic scaling for surprises.
  • Predictive scaling. How it decides: uses machine learning to forecast load from history and provisions capacity ahead of it. Best for: cyclical daily/weekly traffic, especially when instances are slow to warm up. Watch out for: needs at least 24 hours of history; combine with dynamic scaling for accuracy.

A common production pattern: predictive scaling to pre-warm capacity ahead of the daily curve, target tracking to react to the unexpected, and scheduled scaling for known events such as a marketing launch.
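
The sketch below covers the scheduled and predictive pieces of that pattern. The schedule, time zone, capacities, and 50% CPU target are illustrative assumptions. Predictive scaling starts in ForecastOnly mode so you can compare its forecasts with real traffic before letting it act.

```shell
# Scheduled: raise the floor before weekday business hours, lower it in the evening.
# Recurrence uses cron fields: minute hour day-of-month month day-of-week.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name web-app-asg \
  --scheduled-action-name weekday-morning \
  --recurrence "0 8 * * 1-5" \
  --time-zone "America/New_York" \
  --min-size 4 --max-size 8

aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name web-app-asg \
  --scheduled-action-name weekday-evening \
  --recurrence "0 19 * * 1-5" \
  --time-zone "America/New_York" \
  --min-size 2 --max-size 8

# Predictive: forecast only, so it reports what it would do without changing capacity.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-app-asg \
  --policy-name predictive-cpu-50 \
  --policy-type PredictiveScaling \
  --predictive-scaling-configuration '{
    "MetricSpecifications": [{
      "TargetValue": 50,
      "PredefinedMetricPairSpecification": {
        "PredefinedMetricType": "ASGCPUUtilization"
      }
    }],
    "Mode": "ForecastOnly"
  }'
```

Once the forecasts match your traffic, you can switch Mode to ForecastAndScale.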

Rolling deploys with instance refresh

To ship a new AMI or launch-template version without downtime, use instance refresh. The ASG replaces instances in batches while keeping a configurable percentage healthy and in service, draining connections from the ALB target group as it goes.

aws autoscaling start-instance-refresh \
  --auto-scaling-group-name web-app-asg \
  --preferences '{
    "MinHealthyPercentage": 90,
    "InstanceWarmup": 90,
    "SkipMatching": true
  }'

SkipMatching avoids replacing instances that already run the desired configuration. For higher-stakes releases, drive instance refresh from your CI/CD pipeline (CodeDeploy or a GitHub Actions step) so rollbacks are a one-command reversion to the previous launch-template version.
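
The sketch below walks through that version-and-rollback flow. It assumes the group tracks $Latest, as in Step 3. The new AMI ID is a placeholder.

```shell
# Create version 2 from version 1, overriding only the AMI.
aws ec2 create-launch-template-version \
  --launch-template-name web-app-lt \
  --source-version 1 \
  --version-description "v2 new AMI" \
  --launch-template-data '{"ImageId": "ami-0fedcba9876543210"}'

# After starting the refresh, check each refresh's status and progress.
aws autoscaling describe-instance-refreshes \
  --auto-scaling-group-name web-app-asg \
  --query 'InstanceRefreshes[].[InstanceRefreshId,Status,PercentageComplete]' \
  --output table

# Roll back: pin the group to version 1, then refresh again to replace v2 instances.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name web-app-asg \
  --launch-template 'LaunchTemplateName=web-app-lt,Version=1'
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name web-app-asg
```

After a rollback, point the group back at Version=$Latest once a fixed version is published. Otherwise later template versions will not be picked up.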

Lifecycle hooks let you run actions during launch or terminate transitions — for example, registering with a service mesh on LAUNCHING, or flushing logs and deregistering gracefully on TERMINATING before the instance disappears.
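
The sketch below sets up a terminate-time hook. The hook name and 300-second timeout are assumptions. INSTANCE_ID is illustrative; in practice it comes from instance metadata or from the lifecycle event your handler receives.

```shell
# Pause terminating instances for up to 5 minutes so they can flush logs and deregister.
aws autoscaling put-lifecycle-hook \
  --auto-scaling-group-name web-app-asg \
  --lifecycle-hook-name drain-on-terminate \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 300 \
  --default-result CONTINUE

# Once cleanup is done, signal the ASG so termination proceeds without waiting out the timeout.
aws autoscaling complete-lifecycle-action \
  --auto-scaling-group-name web-app-asg \
  --lifecycle-hook-name drain-on-terminate \
  --lifecycle-action-result CONTINUE \
  --instance-id "$INSTANCE_ID"
```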

Observability and load testing

Scaling is only as good as the signals behind it. Watch these in CloudWatch:

  • ALB: RequestCountPerTarget, TargetResponseTime, HTTPCode_Target_5XX_Count, HealthyHostCount / UnHealthyHostCount.
  • ASG: GroupDesiredCapacity, GroupInServiceInstances, plus the CPU/request metrics your policies track.
  • Enable group metrics on the ASG and access logs on the ALB (to S3) for request-level debugging.
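
Each of those takes a single call, as sketched below. The S3 bucket name and prefix are hypothetical. The bucket's policy must allow Elastic Load Balancing to write access logs to it.

```shell
# Publish ASG group metrics (GroupDesiredCapacity, GroupInServiceInstances, ...) every minute.
aws autoscaling enable-metrics-collection \
  --auto-scaling-group-name web-app-asg \
  --granularity "1Minute"

# Turn on ALB access logs to a hypothetical bucket and prefix.
ALB_ARN=$(aws elbv2 describe-load-balancers --names web-app-alb \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text)
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn "$ALB_ARN" \
  --attributes Key=access_logs.s3.enabled,Value=true \
               Key=access_logs.s3.bucket,Value=example-alb-logs \
               Key=access_logs.s3.prefix,Value=web-app
```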

Validate the loop before you rely on it: generate load (k6, Locust, or hey) against the ALB DNS name and confirm the desired capacity climbs, new targets pass /healthz and join the target group, and capacity scales back in after the warm-up window. Wire HTTPCode_Target_5XX_Count and UnHealthyHostCount alarms to SNS so regressions page you instead of surprising users. For a deeper walkthrough of the load-balancer side, see Configuring and Testing a Load Balancer in AWS EC2.
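
The sketch below sets up the 5XX alarm and the load-test loop. Three values are illustrative assumptions:

- the SNS topic ARN,
- the hostname app.example.com, assumed to point at the ALB through Route 53,
- the threshold of 10 errors per minute.

```shell
# Hypothetical SNS topic that pages the on-call engineer.
SNS_TOPIC_ARN="arn:aws:sns:us-east-1:123456789012:ops-alerts"

# The LoadBalancer dimension is the part of the ALB ARN after "loadbalancer/" (app/<name>/<id>).
ALB_ARN=$(aws elbv2 describe-load-balancers --names web-app-alb \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text)
ALB_DIM="${ALB_ARN#*:loadbalancer/}"

# Alarm when targets return more than 10 5XX responses per minute for 3 minutes in a row.
aws cloudwatch put-metric-alarm \
  --alarm-name web-app-target-5xx \
  --namespace AWS/ApplicationELB \
  --metric-name HTTPCode_Target_5XX_Count \
  --dimensions Name=LoadBalancer,Value="$ALB_DIM" \
  --statistic Sum --period 60 --evaluation-periods 3 \
  --threshold 10 --comparison-operator GreaterThanThreshold \
  --treat-missing-data notBreaching \
  --alarm-actions "$SNS_TOPIC_ARN"

# Drive 5 minutes of load with 200 concurrent workers, then check that the ASG reacted.
hey -z 5m -c 200 https://app.example.com/
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names web-app-asg \
  --query 'AutoScalingGroups[0].[DesiredCapacity,length(Instances)]'
```

The sketch targets the Route 53 name rather than the raw ALB DNS name because the HTTPS listener's ACM certificate is issued for your domain.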

When to reach for serverless or containers instead

EC2 Auto Scaling is the right tool when you need full control of the instance, GPUs, specialized AMIs, or long-lived stateful processes. For many newer workloads, other compute models scale faster and cut operational overhead:

  • AWS Fargate (with ECS or EKS) — serverless containers; you scale tasks/pods, not VMs, and pay only for what runs. Great for stateless services that deploy as containers.
  • Karpenter (on EKS) — provisions right-sized nodes in seconds based on pending pods; often replaces the node-group ASG for Kubernetes clusters.
  • AWS Lambda — for spiky, event-driven, or bursty work, scaling is automatic and granular with no capacity to manage at all.

The honest trade-off: containers and serverless reduce instance toil but add their own learning curve and cold-start/limit considerations. EC2 ASGs remain a solid, well-understood choice for traditional web/app tiers and migrations that are not yet containerized — and you can mix models as you modernize.

How MicroPyramid helps

We have spent 12+ years and delivered 50+ projects building and operating systems on AWS for startups and enterprises across the US, UK, Australia, Singapore and Europe. We design multi-AZ Auto Scaling architectures, automate them with Terraform/CloudFormation, set sensible scaling policies and instance-refresh pipelines, and tune them for cost — including Graviton, Spot and right-sizing — so you pay for the capacity you actually use. Explore our AWS consulting services, cloud migration services and server maintenance services, or read our AWS cost and performance optimization guide.

Frequently Asked Questions

What is the difference between an Auto Scaling Group and a load balancer?

They solve different problems and work together. An Auto Scaling Group controls how many EC2 instances exist and replaces unhealthy ones, while an Application Load Balancer distributes incoming requests across whatever healthy instances currently exist. The ASG registers its instances into the load balancer's target group automatically, so as the group scales out or in, the load balancer's pool updates without manual work.

Should I use a Launch Template or a Launch Configuration?

Use a Launch Template. Launch Configurations are legacy, do not support versioning, and lack newer features such as mixed instances policies and the latest instance types. Launch Templates are required for capabilities like instance refresh and are the AWS-recommended default for all new Auto Scaling Groups in 2026.

Which scaling policy should I start with?

Start with target tracking on either average CPU utilization or, better for web tiers, ALB request count per target. It is the simplest to operate because AWS manages the underlying CloudWatch alarms. Add scheduled scaling for known peaks and predictive scaling once you have a stable daily or weekly pattern; reach for step scaling only when you need finer manual control.

How do I deploy a new version without downtime?

Publish a new Launch Template version with the updated AMI, then run an instance refresh with a MinHealthyPercentage (e.g. 90%) so the ASG replaces instances in batches while keeping the service available. Connection draining on the target group lets in-flight requests finish, and you can roll back by pointing the ASG at the previous launch-template version.
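
The draining window is a target group attribute, deregistration_delay.timeout_seconds, which defaults to 300 seconds. The sketch below shortens it to 60 seconds. That is an assumed value that suits short HTTP requests but not long-lived connections.

```shell
# Look up the target group created in Step 2.
TG_ARN=$(aws elbv2 describe-target-groups --names web-app-tg \
  --query 'TargetGroups[0].TargetGroupArn' --output text)

# Let in-flight requests finish for up to 60 seconds before a target is removed.
aws elbv2 modify-target-group-attributes \
  --target-group-arn "$TG_ARN" \
  --attributes Key=deregistration_delay.timeout_seconds,Value=60
```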

EC2 health check vs ELB health check - which should the ASG use?

Use the ELB health check type for any app behind a load balancer. EC2 status checks only confirm the hypervisor and instance are running; they will not catch an app process that has crashed or hung. The ELB check hits your application endpoint (for example /healthz), so the ASG replaces instances that are technically "running" but not actually serving traffic.

When should I use Fargate or EKS instead of EC2 Auto Scaling?

Choose Fargate or EKS when your workload is containerized and you want to scale tasks/pods instead of managing VMs and AMIs, or Lambda for spiky event-driven work. Stick with EC2 Auto Scaling Groups when you need OS-level control, specialized or GPU instances, long-lived stateful processes, or you are migrating an existing VM-based app that is not yet containerized.
