A production-grade AWS Lambda function is one that starts fast, costs little, fails safely, leaks nothing, and tells you exactly what happened when something goes wrong. Getting there is less about clever code and more about a handful of disciplines: keep the deployment package small and initialise expensive clients once, run on ARM64 with right-sized memory, make handlers idempotent with dead-letter queues for failures, grant the execution role only the permissions it actually uses, pull secrets from Secrets Manager or SSM instead of environment variables, and emit structured logs you can query.
This guide collects the practices we apply across serverless workloads at MicroPyramid, updated for 2026 runtimes and AWS features. Each section is something you can act on today.
Performance: tame cold starts
A cold start is the latency you pay when Lambda has to create a fresh execution environment, download your code, start the runtime, and run your initialisation code before the first invocation. Warm invocations skip all of that. The goal is to make cold starts rare, and cheap when they do happen.
The biggest levers, in order of impact:
- Initialise outside the handler. Code in the global scope runs once per execution environment, not once per invocation. Create your `boto3` clients, database connection pools, and config objects there so warm invocations reuse them. AWS also gives the init phase a burst of full CPU, so heavy setup is cheaper there than inside the handler.
- Keep the deployment package small. Less code to download and import means a faster cold start. Bundle only what you ship, prune unused dependencies, and move large shared dependencies into a Lambda layer rather than duplicating them in every function.
- Lazy-init what you rarely use. If a function only sometimes calls a heavy SDK or service, import and construct it on first use rather than at module load.
- Right-size memory. Memory and CPU scale together on Lambda, so adding memory often makes a function both faster and cheaper per invocation. Use AWS Lambda Power Tuning to find the sweet spot rather than guessing.
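The lazy-init rule above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: `HeavyReportClient`, `get_report_client`, and the `want_report` event field are hypothetical names standing in for whatever rarely used SDK or service your function calls.

```python
import functools


class HeavyReportClient:
    """Hypothetical stand-in for an SDK client that is expensive to construct."""

    instances = 0  # counts constructions so the reuse is observable

    def __init__(self):
        HeavyReportClient.instances += 1

    def render(self, order_id):
        return f"report-for-{order_id}"


@functools.lru_cache(maxsize=1)
def get_report_client():
    """Build the client on first use, then reuse it while the environment stays warm."""
    return HeavyReportClient()


def handler(event, context):
    # The common path never constructs the heavy client.
    if not event.get("want_report"):
        return {"statusCode": 200, "body": "ok"}
    # The rare path pays the construction cost once per execution environment.
    body = get_report_client().render(event["order_id"])
    return {"statusCode": 200, "body": body}
```

The trade-off is that the first invocation to need the client pays its setup cost inside the handler rather than during init, so this suits paths that are genuinely rare.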
When cold starts still hurt a latency-sensitive path, you have two managed options to reduce or remove them.
| Mitigation | How it works | Best for | Trade-off |
|---|---|---|---|
| Init outside handler + small package | One-time setup reused across warm invokes | Every function (free, always do it) | None — table stakes |
| Lambda SnapStart | AWS snapshots an initialised environment and restores from it; supported for Java, Python and .NET in 2026 | Predictable latency without paying for idle capacity | Snapshot caveats: randomness/connections must be re-initialised carefully |
| Provisioned Concurrency | Keeps N environments pre-warmed and ready | Strict P99 latency, predictable traffic spikes | You pay for the warm capacity whether used or not |
| On-demand only | No mitigation | Background/async work where cold starts don't matter | Variable first-call latency |
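For most synchronous APIs, start with SnapStart, which avoids paying for idle warm capacity (Java snapshots carry no extra charge; Python and .NET snapshots incur caching and restore charges), and reach for Provisioned Concurrency only when you need guaranteed sub-cold-start latency at known traffic levels.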
Cost: pay for work, not for waiting
Lambda bills on GB-seconds — allocated memory multiplied by execution duration — plus per-request charges. Two principles cut the bill the most.
- Run on ARM64 (Graviton). AWS Graviton-based ARM64 functions deliver up to roughly 34% better price-performance than x86 for most workloads, and the per-GB-second rate is lower. For pure Python and Node.js code there is usually nothing to change but the architecture flag. Make ARM64 your default and only fall back to x86 if a dependency ships no ARM wheel/binary.
- Never sleep inside a Lambda. Paying for a function to `time.sleep()` while it waits on a slow downstream call, a human approval, or a fixed delay is pure waste. Use Step Functions (including `Wait` states and the `.waitForTaskToken` callback pattern) to orchestrate long or multi-step flows, so you only pay Lambda for the actual compute, not the waiting.
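As a rough sketch of the second principle, the snippet below builds an Amazon States Language definition in which a `Wait` state holds the delay and a Lambda task runs only afterwards. The state names, the function ARN, and the 300-second delay are assumptions chosen for illustration.

```python
import json


def build_definition(process_fn_arn, wait_seconds):
    """Return a state machine definition that waits outside Lambda, then invokes it."""
    return {
        "Comment": "Delay is handled by Step Functions, not by a sleeping Lambda",
        "StartAt": "WaitForSettlement",
        "States": {
            # No Lambda is running (or billed) while this state waits.
            "WaitForSettlement": {
                "Type": "Wait",
                "Seconds": wait_seconds,
                "Next": "ProcessOrder",
            },
            # The function is invoked only when there is work to do.
            "ProcessOrder": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": process_fn_arn, "Payload.$": "$"},
                "End": True,
            },
        },
    }


# Hypothetical ARN, shown only to make the example concrete.
definition = build_definition(
    "arn:aws:lambda:us-east-1:111122223333:function:process-order", 300
)
print(json.dumps(definition, indent=2))
```

The printed JSON is what you would supply as the state machine definition through SAM, CDK, or Terraform.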
| Dimension | x86_64 | ARM64 (Graviton) |
|---|---|---|
| Price-performance | Baseline | Up to ~34% better for typical workloads |
| Per-GB-second cost | Higher | Lower |
| Compatibility | Universal | Needs ARM-compatible deps (most have them) |
| Recommended default | Only if a dep lacks ARM support | Yes — start here |
Beyond architecture, right-sizing memory is a cost lever too: a function with more memory can finish so much faster that total GB-seconds drop. Always measure both speed and cost together — that is exactly what AWS Lambda Power Tuning reports.
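The GB-second arithmetic is easy to check by hand. The sketch below is only an estimate: the per-GB-second and per-request rates are assumed illustrative values (verify them against the current pricing page for your region), the durations are hypothetical measurements, and the free tier is ignored.

```python
# Assumed illustrative rates in USD; check current regional pricing before relying on them.
X86_PER_GB_SECOND = 0.0000166667
ARM_PER_GB_SECOND = 0.0000133334
PER_MILLION_REQUESTS = 0.20


def monthly_cost(memory_mb, avg_duration_ms, invocations,
                 price_per_gb_second, price_per_million_requests=PER_MILLION_REQUESTS):
    """Estimate monthly cost: GB-seconds of compute plus per-request charges."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations
    request_charge = (invocations / 1_000_000) * price_per_million_requests
    return gb_seconds * price_per_gb_second + request_charge


# Hypothetical measurements: doubling memory more than halves the duration.
small = monthly_cost(512, 400, 10_000_000, ARM_PER_GB_SECOND)
large = monthly_cost(1024, 180, 10_000_000, ARM_PER_GB_SECOND)
print(f"512 MB @ 400 ms:  ${small:.2f}")
print(f"1024 MB @ 180 ms: ${large:.2f}")
```

With these assumed durations the larger configuration uses fewer GB-seconds (1.8 million against 2.0 million), so it is both faster and cheaper. Real numbers should come from measurement, for example from Lambda Power Tuning.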
Reliability: idempotency, retries, and dead letters
Lambda's invocation model means a single logical event can run your handler more than once. Asynchronous invocations and stream/queue triggers retry on failure, and SQS delivery is at-least-once. If your handler is not idempotent, retries can double-charge a card, send duplicate emails, or write a record twice.
- Make handlers idempotent. Use a natural idempotency key (order id, request id) and a store such as DynamoDB to record "already processed". Powertools for AWS Lambda ships an idempotency utility that does this for you with a decorator.
- Configure dead-letter queues (DLQs) or on-failure destinations. For async invocations, route exhausted-retry events to a DLQ or on-failure destination; for SQS-triggered functions, set the DLQ as a redrive policy on the source queue. Either way nothing is silently lost. Alarm on DLQ depth.
- Tune retries deliberately. Set maximum retry attempts and event age on async invokes instead of relying on the defaults (two retries and a six-hour maximum event age). Lambda does not retry synchronous, client-facing invocations (API Gateway, Lex) itself, so cap client-side retries there to avoid amplifying load or cost during an incident.
- Set realistic timeouts. The default is 3 seconds and the maximum is 15 minutes (900 seconds). Set the timeout just above your real P99 duration: too long wastes money on a stuck call, too short kills legitimate work. For anything regularly approaching 15 minutes, move it to Step Functions, Fargate, or ECS.
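One way to sketch the idempotency rule above is a conditional write that claims the key atomically, so two concurrent deliveries cannot both proceed. This is an illustration under stated assumptions, not the Powertools implementation: `claim_order`, `process_order`, and `do_work` are hypothetical names, `table` is assumed to be a boto3 DynamoDB `Table` resource keyed on `order_id`, and claims never expire here.

```python
def claim_order(table, order_id):
    """Return True if this call claimed the order, False if it was already claimed."""
    try:
        table.put_item(
            Item={"order_id": order_id, "status": "in_progress"},
            # The write succeeds only if no item with this key exists yet.
            ConditionExpression="attribute_not_exists(order_id)",
        )
        return True
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        return False


def process_order(table, order_id, do_work):
    """Run do_work at most once per order_id, even when the event is redelivered."""
    if not claim_order(table, order_id):
        return "duplicate"
    try:
        do_work(order_id)
    except Exception:
        # Release the claim so a retry can attempt the work again.
        table.delete_item(Key={"order_id": order_id})
        raise
    table.update_item(
        Key={"order_id": order_id},
        UpdateExpression="SET #s = :done",
        ExpressionAttributeNames={"#s": "status"},  # "status" is a reserved word
        ExpressionAttributeValues={":done": "processed"},
    )
    return "processed"
```

A production version would also expire stale `in_progress` claims (for instance when the function times out mid-work), which is part of what the Powertools idempotency utility handles for you.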
Security: least privilege by default
Every Lambda runs with an IAM execution role. The single most common mistake is attaching a broad policy (or worse, `*` actions on `*` resources). Scope the role to the exact actions and ARNs the function uses and nothing more. If you are new to writing scoped policies, our walkthrough of AWS IAM roles and policies covers the building blocks.
- Least-privilege execution role. Grant only the specific actions on the specific resource ARNs the function touches. Tools like IAM Access Analyzer can generate a tightened policy from observed CloudTrail activity.
- Keep secrets out of environment variables. Lambda environment variables are visible to anyone with `lambda:GetFunctionConfiguration` and are not a secret store. Read database passwords, API keys, and tokens from AWS Secrets Manager or SSM Parameter Store at runtime. The AWS Parameters and Secrets Lambda Extension serves both and caches values locally, so you are not hammering the API on every invoke.
- Use a VPC only when you need one. Attaching a function to a VPC is required to reach private resources (an RDS instance in private subnets, for example) but adds configuration and networking overhead. If your function only talks to public AWS APIs, leave it out of the VPC. When you do need private access, use VPC endpoints to reach AWS services without a NAT gateway.
- Encrypt and validate. Enable encryption in transit to downstream services and validate/parse every event payload — never trust the shape of an incoming event.
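For the secrets rule above, here is a minimal sketch of reading a parameter through the AWS Parameters and Secrets Lambda Extension's local HTTP endpoint using only the standard library. It assumes the extension layer is attached to the function and listening on its default port (2773); `build_parameter_request`, `get_parameter`, and the injectable `opener` argument are hypothetical helpers, the last one added so the function can be exercised without a running extension.

```python
import json
import os
import urllib.parse
import urllib.request

# The extension listens on localhost; 2773 is its default port.
EXTENSION_PORT = os.environ.get("PARAMETERS_SECRETS_EXTENSION_HTTP_PORT", "2773")


def build_parameter_request(name):
    """Build the local request for a decrypted SSM parameter."""
    query = urllib.parse.urlencode({"name": name, "withDecryption": "true"})
    url = f"http://localhost:{EXTENSION_PORT}/systemsmanager/parameters/get?{query}"
    # The extension authenticates callers with the function's session token.
    headers = {"X-Aws-Parameters-Secrets-Token": os.environ.get("AWS_SESSION_TOKEN", "")}
    return urllib.request.Request(url, headers=headers)


def get_parameter(name, opener=urllib.request.urlopen):
    """Return the parameter value; the extension caches it between calls."""
    with opener(build_parameter_request(name), timeout=2) as response:
        return json.loads(response.read())["Parameter"]["Value"]
```

Inside a handler this would be called as `get_parameter(os.environ["DB_PASSWORD_PARAM"])`. The execution role still needs `ssm:GetParameter` on that parameter, since the extension calls the API on the function's behalf.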
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOneBucketPrefix",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-app-uploads/incoming/*"
    },
    {
      "Sid": "WriteOneTable",
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem", "dynamodb:GetItem"],
      "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/Orders"
    },
    {
      "Sid": "ReadOneSecret",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:us-east-1:111122223333:secret:prod/orders/db-*"
    }
  ]
}
```

Observability: know what happened
You cannot fix what you cannot see. Treat observability as part of the function, not an afterthought.
- Log structured JSON, not free text. Emit one JSON object per log line with a correlation id, the operation, and key fields. CloudWatch Logs Insights can query JSON natively, so structured logs turn "grep the logs" into real queries and dashboards.
- Use Powertools for AWS Lambda. The Logger, Tracer, and Metrics utilities (available for Python, Node.js, Java, and .NET) give you structured logging, X-Ray tracing, and custom CloudWatch metrics with almost no boilerplate.
- Enable AWS X-Ray. Distributed tracing shows where time goes across API Gateway, Lambda, and downstream services — invaluable for diagnosing latency in event-driven systems.
- Alarm on the signals that matter. Errors, throttles, duration approaching timeout, concurrency near the account limit, and DLQ depth. Configure these as CloudWatch alarms before you go live, not after the first incident.
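The structured-logging rule above needs nothing beyond the standard library. The sketch below is one possible shape, assuming a named logger with its own handler; `JsonFormatter`, `build_logger`, and the field names (`correlation_id`, `operation`, `order_id`) are illustrative choices, and Powertools Logger offers the same idea with less code.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields are passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, default=str)


def build_logger(name="orders"):
    """Create (once) a logger that writes JSON lines to the stream handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.propagate = False  # avoid duplicate lines from the root logger
    return logger


logger = build_logger()
logger.info(
    "order processed",
    extra={"fields": {"correlation_id": "req-123", "operation": "process_order", "order_id": "A1"}},
)
```

Because every field is a top-level JSON key, a Logs Insights query can filter on it directly, for example `filter operation = "process_order" and level = "ERROR"`.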
Architecture: small, configurable, and codified
- Single-purpose functions. A function that does one thing is easier to reason about, secure with a tight IAM role, right-size, and observe. Resist the "one giant Lambda with an if/else router" pattern.
- Configuration via environment, not code. Inject bucket names, table names, and feature flags as environment variables (with secrets coming from Secrets Manager/SSM) so the same artifact promotes cleanly from dev to staging to prod.
- Infrastructure as code. Define functions, roles, triggers, and alarms with AWS SAM, AWS CDK, or Terraform. IaC makes deployments repeatable, reviewable, and reversible — and lets you version the whole stack, which is far more useful than versioning a function in isolation.
- Lambda layers for shared code. Put common dependencies and shared utilities in a layer to keep individual deployment packages small and consistent.
- Choose the right compute. Lambda is ideal for short, event-driven, spiky workloads. For long-running, steady-throughput, or container-heavy work, Fargate or ECS is usually cheaper and simpler.
| Use case | Best fit | Why |
|---|---|---|
| Short, event-driven, spiky traffic | Lambda | Scale to zero, pay per invocation |
| Steady, high-throughput service | Fargate / ECS | Cheaper at sustained load, no 15-min cap |
| Long-running or multi-step workflow | Step Functions + Lambda | Orchestrate without paying Lambda to wait |
| Container-heavy / custom runtime | Fargate / ECS or Lambda container image | Full control over the runtime image |
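The configuration-via-environment rule in this section can be sketched as a small typed loader that runs once at init and fails fast when a variable is missing. `Settings`, `load_settings`, and the `ENABLE_RECEIPTS` feature flag are hypothetical; `ORDERS_TABLE` and `DB_PASSWORD_PARAM` match the names used in the handler example in this guide.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    orders_table: str
    db_password_param: str  # the parameter *name*, never the secret itself
    enable_receipts: bool


def load_settings(env=os.environ):
    """Read configuration from the environment, failing fast on missing values."""
    required = ("ORDERS_TABLE", "DB_PASSWORD_PARAM")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return Settings(
        orders_table=env["ORDERS_TABLE"],
        db_password_param=env["DB_PASSWORD_PARAM"],
        enable_receipts=env.get("ENABLE_RECEIPTS", "false").lower() == "true",
    )
```

Calling `load_settings()` in the global scope means a misconfigured deployment fails during init, where it is obvious, rather than partway through handling an event.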
A handler that follows the rules
The example below puts the core practices together: clients and config initialised once outside the handler, secrets read (and cached) from SSM rather than baked into environment variables, structured JSON logging, input validation, and idempotency keyed on the event. It targets the Python 3.13 runtime on ARM64 — avoid retired runtimes such as Python 2.7, 3.6, or 3.8, and the long-deprecated boto library; use boto3.
```python
import json
import logging
import os

import boto3

# --- init once per execution environment (reused on warm invokes) ---
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Clients created here are reused across warm invocations.
ssm = boto3.client("ssm")
dynamodb = boto3.resource("dynamodb")

TABLE_NAME = os.environ["ORDERS_TABLE"]  # config from env
table = dynamodb.Table(TABLE_NAME)

# Read the secret once, not on every invocation.
_DB_PASSWORD = ssm.get_parameter(
    Name=os.environ["DB_PASSWORD_PARAM"], WithDecryption=True
)["Parameter"]["Value"]


def _log(level, message, **fields):
    """Emit one structured JSON log line."""
    logger.log(level, json.dumps({"message": message, **fields}))


def handler(event, context):
    order_id = event.get("order_id")
    if not order_id:
        _log(logging.WARNING, "missing order_id", request_id=context.aws_request_id)
        return {"statusCode": 400, "body": "order_id is required"}

    # Idempotency: skip if we have already processed this order.
    existing = table.get_item(Key={"order_id": order_id}).get("Item")
    if existing and existing.get("status") == "processed":
        _log(logging.INFO, "duplicate ignored", order_id=order_id)
        return {"statusCode": 200, "body": "already processed"}

    table.put_item(Item={"order_id": order_id, "status": "processed"})
    _log(logging.INFO, "order processed", order_id=order_id,
         request_id=context.aws_request_id)
    return {"statusCode": 200, "body": json.dumps({"order_id": order_id})}
```

Deploying that function as infrastructure-as-code keeps the runtime, architecture, memory, and role explicit and reviewable. Here is the equivalent AWS SAM template fragment. Note the ARM64 architecture and a modern runtime.
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  ProcessOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: python3.13
      Architectures:
        - arm64  # Graviton: better price-performance
      MemorySize: 512  # right-size with Lambda Power Tuning
      Timeout: 10  # just above real P99, not the 900s max
      Environment:
        Variables:
          ORDERS_TABLE: !Ref OrdersTable
          DB_PASSWORD_PARAM: /prod/orders/db-password
      Policies:  # least-privilege, scoped to this table
        - DynamoDBCrudPolicy:
            TableName: !Ref OrdersTable
        - SSMParameterReadPolicy:
            ParameterName: prod/orders/db-password
```

Apply these patterns consistently and your serverless estate becomes faster, cheaper, and far easier to operate. With 12+ years of AWS work across 50+ delivered projects, our team at MicroPyramid builds and hardens serverless systems end to end, from cold-start tuning and ARM64 migrations to least-privilege IAM and observability. If you want a second set of eyes on a Lambda workload, our AWS consulting services and cloud migration services cover exactly this.
Frequently Asked Questions
What is the single most effective way to reduce Lambda cold starts?
Initialise expensive objects — SDK clients, database connections, configuration — in the global scope outside the handler, and keep your deployment package small. This setup runs once per execution environment and is reused on every warm invocation, and the init phase even gets a CPU burst. When that is not enough for a latency-sensitive API, add Lambda SnapStart (supported for Python, Java, and .NET in 2026) or Provisioned Concurrency.
Should I use ARM64 (Graviton) or x86 for Lambda?
Default to ARM64. Graviton-based functions deliver up to roughly 34% better price-performance than x86 and bill at a lower per-GB-second rate, and for typical Python or Node.js code you usually only change the architecture flag. Fall back to x86 only when a dependency ships no ARM-compatible binary.
Which Lambda runtimes should I use in 2026?
Use current, supported runtimes — Python 3.13 or 3.12, and Node.js 22 or 20 are good defaults. Avoid retired runtimes such as Python 2.7, 3.6, and 3.8, which no longer receive security updates. Also drop the long-deprecated boto library in favour of boto3.
How do I make a Lambda function idempotent?
Derive an idempotency key from the event (an order id or request id), and record processed keys in a fast store like DynamoDB so repeated deliveries become no-ops. This matters because async invocations, SQS, and streams can deliver the same event more than once. Powertools for AWS Lambda provides an idempotency utility that handles this with a decorator.
Where should I store secrets for a Lambda function?
In AWS Secrets Manager or SSM Parameter Store, read at runtime. Never keep them in plain environment variables, which are visible to anyone who can read the function configuration. The AWS Parameters and Secrets Lambda Extension serves both services and caches values locally, so you get secure secrets without an API call on every invocation.
When should I use Lambda versus Fargate or ECS?
Use Lambda for short, event-driven, spiky workloads that benefit from scaling to zero and per-invocation billing. Move to Fargate or ECS for long-running or steady, high-throughput services where the 15-minute limit is a constraint or sustained compute is cheaper on containers. For long multi-step flows, orchestrate Lambdas with Step Functions instead of letting a function sleep while it waits.