Combining AWS Lambda, Amazon S3, and DynamoDB gives you a fully serverless, event-driven pipeline: a file lands in an S3 bucket, S3 fires an event that invokes a Lambda function, the function reads or processes the object, and writes a structured record into DynamoDB. There are no servers to manage, you pay only for what runs, and the whole thing scales automatically from zero to thousands of concurrent invocations.
Use this pattern when you need to react to objects as they arrive — ingesting uploads, parsing CSV or JSON files, generating thumbnails, extracting metadata, or building a searchable index of everything in a bucket. If your processing is steady-state and runs 24/7, a container or a long-lived service may be cheaper; but for spiky, event-triggered work, this S3 → Lambda → DynamoDB pattern is hard to beat.
This guide walks through a production-ready version end to end with modern Python 3.13 and boto3, including the least-privilege IAM policy, idempotency, error handling, and the pitfalls (recursive triggers, Decimal errors, timeouts) that trip people up. At MicroPyramid we have shipped serverless data pipelines on AWS for 12+ years across 50+ projects, and the practices below are the ones we actually deploy.
Architecture overview
The flow has three moving parts and one important security boundary:
- Amazon S3 stores the incoming objects and emits an event notification shortly after an object is created (s3:ObjectCreated:*).
- AWS Lambda is invoked by that event. The event payload tells your code which bucket and key changed; your handler reads the object and transforms it.
- Amazon DynamoDB receives the result via put_item. It is a fast, fully managed key-value/document store that scales with your traffic.
The security boundary is the Lambda execution role: a narrowly scoped IAM role that grants the function permission to read from that one bucket and write to that one table — nothing more. Get this right and the rest is straightforward.
┌──────────┐   ObjectCreated   ┌─────────────┐   put_item   ┌──────────────┐
│  Amazon  │ ────────────────► │ AWS Lambda  │ ───────────► │   DynamoDB   │
│    S3    │   event payload   │  (Python)   │    record    │    table     │
└──────────┘                   └─────────────┘              └──────────────┘
                                      │ logs
                                      ▼
                               CloudWatch Logs
Prerequisites
- An AWS account with permission to create Lambda functions, S3 buckets, DynamoDB tables, and IAM roles.
- The AWS CLI v2 configured with an IAM user or role (never the account root user, and never long-lived root access keys).
- Basic Python familiarity. Lambda ships boto3 in the runtime, so you usually do not need to package the AWS SDK yourself.
Security first: do not create access keys for the root account, and do not paste credentials into your function code. Lambda gets its permissions from its execution role automatically via temporary, rotated credentials — that is the entire point of using a role.
Create the S3 bucket and DynamoDB table
Create a bucket (names are globally unique) and a DynamoDB table with a simple primary key. We will key the table on file_key (the S3 object key) so each processed object maps to exactly one item — which also gives us natural idempotency.
# Region is set here; reuse the same region for every resource in this pipeline.
AWS_REGION=ap-south-1
BUCKET=my-ingest-bucket-2026
TABLE=ProcessedFiles
# Create the S3 bucket (in us-east-1, omit the --create-bucket-configuration line)
aws s3api create-bucket \
--bucket "$BUCKET" \
--region "$AWS_REGION" \
--create-bucket-configuration LocationConstraint="$AWS_REGION"
# Block all public access on the bucket (recommended default)
aws s3api put-public-access-block \
--bucket "$BUCKET" \
--public-access-block-configuration \
BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
# Create an on-demand (pay-per-request) DynamoDB table keyed on file_key
aws dynamodb create-table \
--table-name "$TABLE" \
--attribute-definitions AttributeName=file_key,AttributeType=S \
--key-schema AttributeName=file_key,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--region "$AWS_REGION"
PAY_PER_REQUEST (on-demand) billing means you pay per read/write request with no capacity planning, a good default for unpredictable, event-driven workloads.
The Lambda function
Every Lambda function has a handler entry point, conventionally lambda_handler(event, context). For an S3 trigger, event["Records"] is a list; a single invocation can carry more than one record, so always iterate. Each record exposes the bucket name and object key:
- Bucket: record["s3"]["bucket"]["name"]
- Key: record["s3"]["object"]["key"]
Critical gotcha: the object key in the event is URL-encoded. A file named my report.csv arrives as my+report.csv, and other special or non-ASCII characters arrive percent-encoded. Always decode the key with urllib.parse.unquote_plus before you call S3; otherwise any key with spaces or special characters fails with NoSuchKey (or AccessDenied, when the role lacks s3:ListBucket, as with the least-privilege policy later in this guide).
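To see what unquote_plus undoes, the short sketch below decodes a few keys in the form they can take in an S3 event. The keys are made-up examples; the point is that a space arrives as +, a literal + arrives as %2B, and non-ASCII characters arrive as percent-encoded UTF-8, while plain unquote leaves the + untouched.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical keys as they might appear in record["s3"]["object"]["key"],
# mapped to the real object names stored in the bucket.
ENCODED_TO_REAL = {
    "my+report.csv": "my report.csv",  # space arrives as "+"
    "q1%2Bq2.csv": "q1+q2.csv",        # a literal "+" arrives as "%2B"
    "caf%C3%A9.csv": "café.csv",       # non-ASCII arrives as percent-encoded UTF-8
}

for encoded, real in ENCODED_TO_REAL.items():
    decoded = unquote_plus(encoded)
    print(f"{encoded!r} -> {decoded!r}")
    assert decoded == real

# unquote (without _plus) is not enough: it leaves "+" as a literal plus.
assert unquote("my+report.csv") == "my+report.csv"
```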
The handler below reads the object body straight from the S3 response (no need to download to /tmp), counts the rows of a CSV as a trivial "transformation", and writes a record to DynamoDB using the resource-level Table API, which accepts native Python types directly.
import os
import csv
import codecs
import logging
from datetime import datetime, timezone
from decimal import Decimal
from urllib.parse import unquote_plus

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Reuse clients across invocations: they are created once when the
# execution environment is initialised, not on every request.
s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["TABLE_NAME"])


def lambda_handler(event, context):
    """Triggered by S3 ObjectCreated events. Reads each object and
    writes a metadata record to DynamoDB. Failures are re-raised so
    Lambda's asynchronous retries and on-failure destination apply."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # The key is URL-encoded in the event payload; always decode it.
        key = unquote_plus(record["s3"]["object"]["key"])

        # Skip "folder" placeholder objects.
        if key.endswith("/"):
            continue

        try:
            obj = s3.get_object(Bucket=bucket, Key=key)
            size_bytes = obj["ContentLength"]

            # Example transformation: count CSV rows by decoding the body
            # line by line instead of reading the whole object into memory.
            row_count = 0
            if key.lower().endswith(".csv"):
                lines = codecs.getreader("utf-8")(obj["Body"])
                row_count = sum(1 for _ in csv.reader(lines))

            # put_item is idempotent for our schema: re-processing the same
            # key simply overwrites the item, so retries are safe.
            table.put_item(
                Item={
                    "file_key": key,  # partition key (String)
                    "bucket": bucket,
                    "size_bytes": Decimal(size_bytes),  # numbers -> Decimal
                    "row_count": Decimal(row_count),
                    "processed_at": datetime.now(timezone.utc).isoformat(),
                }
            )
            logger.info("Processed s3://%s/%s (%d bytes)", bucket, key, size_bytes)
        except ClientError:
            logger.exception("Failed to process s3://%s/%s", bucket, key)
            # S3 invokes Lambda asynchronously and ignores the return value,
            # so re-raise: only a failed invocation triggers the retries.
            raise
A few things worth calling out:
- Clients are created at module scope, outside the handler, so they are reused across warm invocations instead of being rebuilt every time.
- Numbers go in as Decimal. The DynamoDB resource maps decimal.Decimal (and int) to the Number type; passing a raw Python float raises TypeError: Float types are not supported. Use Decimal types instead.
- The table name comes from an environment variable (TABLE_NAME), not a hardcoded string, so the same code deploys to dev, staging, and prod unchanged.
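Because the clients live at module scope, the handler is awkward to unit-test as-is. One option, sketched below under the assumption that you are willing to refactor, is to move the per-record work into a function that receives the S3 client and table as arguments. process_record, FakeS3 and FakeTable are hypothetical names, and the fakes implement only the two methods the code actually calls.

```python
import codecs
import csv
import io
from datetime import datetime, timezone
from decimal import Decimal
from urllib.parse import unquote_plus


def process_record(record, s3, table):
    """Hypothetical refactor of the handler's per-record work: the S3 client
    and DynamoDB table are injected so tests can pass in fakes."""
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])
    if key.endswith("/"):
        return None  # folder placeholder, nothing to do
    obj = s3.get_object(Bucket=bucket, Key=key)
    row_count = 0
    if key.lower().endswith(".csv"):
        lines = codecs.getreader("utf-8")(obj["Body"])
        row_count = sum(1 for _ in csv.reader(lines))
    item = {
        "file_key": key,
        "bucket": bucket,
        "size_bytes": Decimal(obj["ContentLength"]),
        "row_count": Decimal(row_count),
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }
    table.put_item(Item=item)
    return item


class FakeS3:
    """Stand-in for the boto3 S3 client: serves bytes from a dict."""

    def __init__(self, objects):
        self.objects = objects

    def get_object(self, Bucket, Key):
        data = self.objects[(Bucket, Key)]
        return {"Body": io.BytesIO(data), "ContentLength": len(data)}


class FakeTable:
    """Stand-in for a DynamoDB Table: records every put_item call."""

    def __init__(self):
        self.items = []

    def put_item(self, Item):
        self.items.append(Item)


if __name__ == "__main__":
    fake_s3 = FakeS3({("my-ingest-bucket-2026", "my report.csv"): b"id,name\n1,a\n2,b\n"})
    fake_table = FakeTable()
    event_record = {"s3": {"bucket": {"name": "my-ingest-bucket-2026"},
                           "object": {"key": "my+report.csv"}}}
    result = process_record(event_record, fake_s3, fake_table)
    print(result["file_key"], result["row_count"])  # my report.csv 3
```

The handler itself then shrinks to a loop that calls process_record(record, s3, table) with the real module-level clients.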
Wiring the S3 trigger (event notification)
For S3 to invoke your function, two things must be in place: S3 needs permission to call Lambda, and the bucket needs an event notification configured.
If you set this up in the Lambda console by adding an S3 trigger, both steps happen for you. To do it explicitly:
# 1. Allow S3 to invoke this specific function (resource-based permission)
aws lambda add-permission \
--function-name s3-to-dynamodb \
--statement-id s3invoke \
--action lambda:InvokeFunction \
--principal s3.amazonaws.com \
--source-arn "arn:aws:s3:::$BUCKET" \
--source-account "$(aws sts get-caller-identity --query Account --output text)"
# 2. Tell the bucket to notify Lambda on object creation
aws s3api put-bucket-notification-configuration \
--bucket "$BUCKET" \
--notification-configuration '{
"LambdaFunctionConfigurations": [{
"LambdaFunctionArn": "arn:aws:lambda:'$AWS_REGION':ACCOUNT_ID:function:s3-to-dynamodb",
"Events": ["s3:ObjectCreated:*"]
}]
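}'
Note that put-bucket-notification-configuration replaces the bucket's entire notification configuration, so include any existing rules in the same call. You can also filter notifications by key prefix or suffix (for example, only uploads/ objects ending in .csv) inside the Filter block of the configuration, which keeps unrelated writes from invoking your function.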
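If you manage the notification from Python instead of the CLI, a sketch like the one below builds the same configuration with an optional prefix/suffix Filter. build_notification_config is a hypothetical helper; the dictionary shape follows the NotificationConfiguration parameter of boto3's put_bucket_notification_configuration, and the function ARN is a placeholder.

```python
import json


def build_notification_config(function_arn, prefix="", suffix=""):
    """Build a NotificationConfiguration dict that sends ObjectCreated events
    to one Lambda function, optionally filtered by key prefix and/or suffix."""
    lambda_config = {
        "LambdaFunctionArn": function_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})
    if rules:
        lambda_config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"LambdaFunctionConfigurations": [lambda_config]}


if __name__ == "__main__":
    arn = "arn:aws:lambda:ap-south-1:ACCOUNT_ID:function:s3-to-dynamodb"  # placeholder
    print(json.dumps(build_notification_config(arn, "uploads/", ".csv"), indent=2))
```

To apply it, pass the result as boto3.client("s3").put_bucket_notification_configuration(Bucket=BUCKET, NotificationConfiguration=...); the same replace-everything behaviour applies.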
The IAM execution role (least privilege)
This is the part people most often get wrong by reaching for AdministratorAccess or a wildcard Resource: "*". Don't. Scope the policy to exactly the actions and ARNs this function needs:
- s3:GetObject on objects inside the one bucket (arn:aws:s3:::my-ingest-bucket-2026/*).
- dynamodb:PutItem on the one table (arn:aws:dynamodb:REGION:ACCOUNT_ID:table/ProcessedFiles).
- CloudWatch Logs permissions so the function can write logs.
The function does not need s3:PutObject, dynamodb:Scan, list permissions, or access to any other resource — so it does not get them. Here is the policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ReadSourceObjects",
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-ingest-bucket-2026/*"
},
{
"Sid": "WriteProcessedRecords",
"Effect": "Allow",
"Action": "dynamodb:PutItem",
"Resource": "arn:aws:dynamodb:ap-south-1:ACCOUNT_ID:table/ProcessedFiles"
},
{
"Sid": "WriteLogs",
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "arn:aws:logs:ap-south-1:ACCOUNT_ID:log-group:/aws/lambda/s3-to-dynamodb:*"
}
]
}
Attach this policy to the role that the Lambda assumes (its trust policy allows lambda.amazonaws.com to assume it). If you are new to roles and policies, our walkthrough on AWS IAM roles and policies covers the trust-policy/permissions-policy split in detail. For larger estates, a structured cloud setup or migration usually starts with exactly this kind of per-function scoping.
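The trust policy mentioned above is not shown in JSON form, so here is a sketch that generates both documents from a few parameters, which keeps the ARNs consistent across environments. pipeline_role_policies is a hypothetical helper, the permissions statements mirror the policy above, and 123456789012 is a placeholder account ID.

```python
import json


def pipeline_role_policies(bucket, table, region, account_id, function_name):
    """Return (trust_policy, permissions_policy) for the Lambda execution role."""
    # Trust policy: who may assume the role (only the Lambda service).
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    # Permissions policy: what the role may do (same statements as above).
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "ReadSourceObjects", "Effect": "Allow",
             "Action": "s3:GetObject",
             "Resource": f"arn:aws:s3:::{bucket}/*"},
            {"Sid": "WriteProcessedRecords", "Effect": "Allow",
             "Action": "dynamodb:PutItem",
             "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"},
            {"Sid": "WriteLogs", "Effect": "Allow",
             "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
             "Resource": f"arn:aws:logs:{region}:{account_id}:log-group:/aws/lambda/{function_name}:*"},
        ],
    }
    return trust_policy, permissions_policy


if __name__ == "__main__":
    trust, permissions = pipeline_role_policies(
        "my-ingest-bucket-2026", "ProcessedFiles", "ap-south-1", "123456789012", "s3-to-dynamodb"
    )
    print(json.dumps(trust, indent=2))
    print(json.dumps(permissions, indent=2))
```

With boto3, the serialised documents go to iam.create_role(RoleName=..., AssumeRolePolicyDocument=json.dumps(trust)) and iam.put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=json.dumps(permissions)).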
Writing to DynamoDB: put_item and the Decimal rule
There are two ways to write to DynamoDB with boto3:
- Resource API (boto3.resource("dynamodb").Table(...)): accepts native Python types. This is what we used above and what you almost always want.
- Client API (boto3.client("dynamodb")): requires the verbose, low-level wire format like {"file_key": {"S": "report.csv"}}. Avoid it unless you need an operation the resource API does not expose.
The single most common error is the Decimal rule: DynamoDB stores all numbers as the Number type, and boto3's resource API refuses Python float to protect you from silent precision loss. Convert numbers to decimal.Decimal before writing, and if you build the item from parsed JSON, convert during deserialization:
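import json
from decimal import Decimal
# Parse a JSON object body and make it DynamoDB-safe in one step:
# parse_float=Decimal turns every JSON number with a fraction or exponent
# into a Decimal (JSON integers already come back as int, which is fine).
# body is the decoded object text; key and table are as in the handler.
payload = json.loads(body, parse_float=Decimal)
# payload must be a JSON object (dict). Spread it first so a "file_key"
# field inside the file cannot overwrite the partition key.
table.put_item(Item={**payload, "file_key": key})
# Going the other way (reading back), Decimals are JSON-unfriendly.
# Use a custom encoder when you need to serialise an item to JSON:
class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, Decimal):
            # int if it has no fractional part, else float
            return int(o) if o % 1 == 0 else float(o)
        return super().default(o)
# item is, for example, table.get_item(Key={"file_key": key})["Item"]
json.dumps(item, cls=DecimalEncoder)
Idempotency, retries, and error handling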
S3 event delivery is at-least-once: under rare conditions, the same object can trigger your function more than once. Design the write so that re-processing is harmless. Keying DynamoDB on the object key (as we did) makes put_item naturally idempotent: a duplicate event simply rewrites the same item. If you need stricter once-only semantics, add ConditionExpression="attribute_not_exists(file_key)"; the duplicate write then raises ConditionalCheckFailedException, which you catch and treat as "already processed" instead of overwriting. Be aware that a genuine re-upload of the same key is then skipped as well.
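A minimal sketch of that conditional write, assuming a boto3 Table object. put_once is a hypothetical helper; it inspects the error code on the exception's response attribute (which botocore's ClientError carries) rather than importing botocore, and re-raises anything that is not a duplicate so Lambda's retries still apply.

```python
def put_once(table, item):
    """Write item only if no item with the same file_key exists yet.

    Returns True if the item was written, False if the key was already
    processed. Any other error is re-raised so the invocation fails."""
    try:
        table.put_item(
            Item=item,
            ConditionExpression="attribute_not_exists(file_key)",
        )
        return True
    except Exception as exc:  # botocore.exceptions.ClientError in practice
        error_code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if error_code == "ConditionalCheckFailedException":
            return False
        raise
```

In the handler you would call put_once(table, {...}) in place of table.put_item(Item={...}) and log a duplicate when it returns False.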
For failures, lean on the platform rather than swallowing exceptions:
- Asynchronous retries: S3-invoked Lambdas run asynchronously. AWS automatically retries a failed invocation twice (three attempts total) over a few minutes.
- Dead-letter queue (DLQ) / on-failure destination: Configure an SQS queue or SNS topic as the function's on-failure destination so events that still fail after retries are captured for inspection or replay — never silently dropped.
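- Partial batch failures: The {"batchItemFailures": [...]} response only applies to event source mappings that poll a queue or stream (SQS, Kinesis, DynamoDB Streams). S3 invokes Lambda asynchronously and ignores the function's return value, so the only way to signal failure is to raise an exception, which fails and retries the whole invocation. If you need per-record retries, send the S3 notifications to an SQS queue and let Lambda consume it with ReportBatchItemFailures enabled.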
Avoid the tempting "log the filename and move on" approach: catching every exception and returning normally marks the invocation as successful, so nothing is retried and the failure stays hidden. Log the error, then let it propagate (or re-raise it) so the retry and on-failure machinery can do its job.
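The on-failure destination can be set with the Lambda API's put_function_event_invoke_config. The sketch below only builds its keyword arguments and checks the documented ranges (0 to 2 retries, 60 to 21600 seconds of event age); on_failure_invoke_config is a hypothetical helper and the queue ARN is a placeholder. The execution role also needs permission to send to the destination (for an SQS queue, sqs:SendMessage on that queue).

```python
def on_failure_invoke_config(function_name, destination_arn,
                             max_retry_attempts=2, max_event_age_seconds=21600):
    """Keyword arguments for the Lambda client's put_function_event_invoke_config().

    The defaults match Lambda's own: 2 retries, events kept for up to 6 hours."""
    if not 0 <= max_retry_attempts <= 2:
        raise ValueError("max_retry_attempts must be between 0 and 2")
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("max_event_age_seconds must be between 60 and 21600")
    return {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retry_attempts,
        "MaximumEventAgeInSeconds": max_event_age_seconds,
        "DestinationConfig": {"OnFailure": {"Destination": destination_arn}},
    }


if __name__ == "__main__":
    # Placeholder queue ARN; apply with
    # boto3.client("lambda").put_function_event_invoke_config(**kwargs)
    kwargs = on_failure_invoke_config(
        "s3-to-dynamodb", "arn:aws:sqs:ap-south-1:ACCOUNT_ID:s3-to-dynamodb-failures"
    )
    print(kwargs)
```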
Testing and CloudWatch logs
Test with a realistic S3 event before going live. In the Lambda console, the s3-put sample test event mirrors the real payload shape; just edit the bucket name and key. From the CLI:
aws lambda invoke \
--function-name s3-to-dynamodb \
--payload fileb://test-s3-event.json \
response.json
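The test-s3-event.json payload can be generated rather than hand-edited. The sketch below builds a minimal ObjectCreated:Put event containing the fields the handler reads; make_s3_put_event is a hypothetical helper, and a real notification carries more fields (eventTime, requestParameters, eTag and so on). It URL-encodes the key the way the event does, so the handler's unquote_plus call is exercised.

```python
import json
from urllib.parse import quote_plus


def make_s3_put_event(bucket, key, size=0, region="ap-south-1"):
    """Build a minimal S3 ObjectCreated:Put event for local testing."""
    return {
        "Records": [
            {
                "eventVersion": "2.1",
                "eventSource": "aws:s3",
                "awsRegion": region,
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "s3SchemaVersion": "1.0",
                    "bucket": {"name": bucket, "arn": f"arn:aws:s3:::{bucket}"},
                    # Keep "/" readable; spaces become "+", as in real events.
                    "object": {"key": quote_plus(key, safe="/"), "size": size},
                },
            }
        ]
    }


if __name__ == "__main__":
    event = make_s3_put_event("my-ingest-bucket-2026", "uploads/my report.csv", size=42)
    with open("test-s3-event.json", "w", encoding="utf-8") as fh:
        json.dump(event, fh, indent=2)
```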
Everything your function prints or logs goes to Amazon CloudWatch Logs under /aws/lambda/<function-name>. Use the logging module (as above) rather than bare print so you get levels and timestamps. To follow logs live during a test:
aws logs tail /aws/lambda/s3-to-dynamodb --follow
For a faster local edit-deploy-test loop, the AWS SAM CLI (sam local invoke) can run the handler against a sample event on your machine without a deploy.
S3 event notifications vs EventBridge
S3 can deliver events two ways. The built-in notification is simplest; EventBridge is more powerful when you outgrow it.
| Aspect | S3 event notifications | Amazon EventBridge (S3 events) |
|---|---|---|
| Setup | Configure directly on the bucket | Enable EventBridge on the bucket, add a rule |
| Targets per event | One destination per event type | Many targets (fan-out) from one event |
| Filtering | Prefix / suffix only | Rich content-based pattern matching |
| Event sources | S3 only | 200+ AWS services, custom events |
| Replay / archive | No | Yes (event archive & replay) |
| Best for | A single, simple S3 → Lambda hop | Routing, fan-out, multi-consumer pipelines |
Start with native notifications for a single Lambda. Move to EventBridge when you need multiple consumers, content-based routing, or replay.
DynamoDB vs RDS for this pipeline
| Aspect | DynamoDB | RDS (PostgreSQL/MySQL) |
|---|---|---|
| Model | Key-value / document | Relational (SQL, joins) |
| Scaling | Automatic, serverless (on-demand) | Provisioned instance; manual scaling |
| Lambda fit | Excellent — no connection pool, scales with Lambda | Connections can exhaust under burst; use RDS Proxy |
| Best for | High-volume writes, simple access by key | Complex queries, joins, transactions |
For "write one record per file, look it up by key" — the workload here — DynamoDB is the natural fit. Reach for RDS when you need ad-hoc relational queries across the data.
Common pitfalls
- Recursive triggers. If your function writes back into the same bucket that triggers it, the new object fires another event — an infinite loop that can run up a real bill fast. Avoid it: write outputs to a different bucket, or scope the trigger with a prefix/suffix filter so processed objects do not re-trigger. AWS now has a built-in recursive-loop detection that can stop runaway Lambda↔S3 loops, but design to avoid them in the first place.
- The Decimal error. "Float types are not supported. Use Decimal types instead." means a float reached put_item: convert numbers to decimal.Decimal (and parse JSON with parse_float=Decimal).
- Forgetting to URL-decode the key. Spaces and special characters in filenames cause NoSuchKey unless you unquote_plus the key.
- Timeouts. The Lambda default timeout is 3 seconds; the maximum is 15 minutes (900 seconds). Set the timeout to cover your slowest realistic object, and raise memory (which also raises CPU) for large files. For anything that may exceed 15 minutes, hand off to AWS Step Functions or AWS Batch.
- Cold starts on large packages. Keep deployment packages lean; the bundled boto3 means you rarely need to ship the SDK yourself.
- Over-broad IAM. A wildcard Resource: "*" is the most common security finding. Scope to the exact bucket and table ARNs.
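When outputs must land in the same bucket, a cheap guard inside the handler adds a second line of defence behind the prefix/suffix filter. The sketch assumes a hypothetical layout where the function writes derived objects under processed/; if an event for such a key still arrives, the handler returns without writing anything, so no further event is produced and the loop stops there. OUTPUT_PREFIX, output_key_for and should_process are illustrative names.

```python
OUTPUT_PREFIX = "processed/"  # hypothetical: where this function writes its outputs


def output_key_for(source_key, output_prefix=OUTPUT_PREFIX):
    """Map a source key to the key of the derived object (hypothetical layout)."""
    return output_prefix + source_key


def should_process(key, output_prefix=OUTPUT_PREFIX):
    """Skip folder markers and anything the pipeline itself wrote, so a write
    back into the trigger bucket cannot start another round of processing."""
    if key.endswith("/"):
        return False
    return not key.startswith(output_prefix)
```

In the handler, call should_process(key) right after decoding the key and continue to the next record when it returns False.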
Frequently Asked Questions
How does Lambda get triggered by S3?
You configure an event notification on the S3 bucket for s3:ObjectCreated:* and point it at your Lambda function, plus a resource-based permission allowing S3 to invoke the function. When a matching object is created, S3 invokes Lambda asynchronously and passes a JSON event in event["Records"] containing the bucket name and object key. Your handler iterates that list and processes each record.
How do I give Lambda access to S3 and DynamoDB safely?
Through the function's IAM execution role, scoped to least privilege — never long-lived access keys in code, and never Resource: "*". Grant s3:GetObject on the specific bucket ARN (arn:aws:s3:::your-bucket/*) and dynamodb:PutItem on the specific table ARN, plus CloudWatch Logs write permissions. Lambda automatically uses temporary, rotated credentials from that role, so no secrets ever live in your code.
Why am I getting a "Float types are not supported" Decimal error?
DynamoDB stores every number as its Number type, and boto3's resource-level API rejects Python float to avoid silent precision loss. Convert numbers to decimal.Decimal before calling put_item — e.g. Decimal(str(value)) — and when parsing JSON, use json.loads(body, parse_float=Decimal) so all numeric values become Decimal automatically.
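For items built from Python data rather than parsed JSON, a small recursive converter applies the Decimal(str(value)) rule everywhere. The str() step matters: Decimal(0.1) captures the float's full binary expansion, while Decimal(str(0.1)) gives Decimal("0.1"). to_dynamodb_numbers is a hypothetical helper name.

```python
from decimal import Decimal


def to_dynamodb_numbers(value):
    """Recursively convert floats to Decimal via str(), leaving other types alone.

    Tuples become lists, which DynamoDB stores as its List type."""
    if isinstance(value, float):
        return Decimal(str(value))
    if isinstance(value, dict):
        return {k: to_dynamodb_numbers(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_dynamodb_numbers(v) for v in value]
    return value


# Decimal(0.1) keeps binary noise; going through str() does not.
assert Decimal(0.1) != Decimal("0.1")
assert to_dynamodb_numbers(0.1) == Decimal("0.1")
```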
How do I avoid recursive S3 triggers?
A recursive loop happens when a function triggered by a bucket writes a new object back into the same bucket, which fires another event. Avoid it by writing outputs to a different bucket, or by adding a prefix/suffix filter to the notification so processed files do not re-trigger the function. AWS also provides built-in recursive-loop detection as a safety net, but you should design the data flow to prevent loops rather than rely on it.
Should I use S3 event notifications or EventBridge?
Use S3 event notifications for a simple, single S3 → Lambda hop — they are built in and require no extra services. Switch to Amazon EventBridge when you need to fan out one event to multiple targets, do content-based filtering beyond prefix/suffix, or archive and replay events. EventBridge is more flexible; native notifications are simpler. Start simple and graduate when the requirements demand it.
Do I need to download the S3 file to disk in Lambda?
Usually no. s3.get_object() returns a streaming body you can read directly in memory, which is faster and avoids filling Lambda's limited /tmp space. Only download to /tmp (default 512 MB, configurable up to 10 GB ephemeral storage) when a tool requires a real file path, such as some image or media libraries. For objects larger than memory, stream and process in chunks.
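A minimal sketch of chunked processing, using a SHA-256 digest as a stand-in for whatever per-chunk work you need. It relies only on read(n), which both botocore's StreamingBody (s3.get_object(...)["Body"]) and io.BytesIO provide, so memory use stays around one chunk regardless of object size. sha256_of_stream and the 1 MiB default chunk size are illustrative choices.

```python
import hashlib
import io


def sha256_of_stream(body, chunk_size=1024 * 1024):
    """Hash a file-like object chunk by chunk without holding it all in memory."""
    digest = hashlib.sha256()
    while chunk := body.read(chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Local stand-in for s3.get_object(Bucket=..., Key=...)["Body"].
    print(sha256_of_stream(io.BytesIO(b"example object body"), chunk_size=4))
```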
How big a file can this pattern handle?
Lambda can run up to 15 minutes with up to 10 GB of memory and 10 GB of /tmp ephemeral storage, so it comfortably handles most uploads and data files. For objects that need longer processing, or multi-gigabyte transforms, stream the object in chunks or hand the work to AWS Step Functions, AWS Batch, or AWS Glue. If you are designing a high-throughput pipeline and want a second opinion on the architecture, our AWS consulting team can help.
Related reading
- Easy and fast way to implement an AWS Lambda service — a gentle first Lambda.
- AWS Lambda best practices — performance, packaging, and cost tips.
- Paginating S3 objects using boto3 — for batch-processing an existing bucket, not just new uploads.
Event-driven serverless pipelines are a core part of how we build on AWS at MicroPyramid. If you want this pattern hardened for production — observability, DLQs, idempotency, and tight IAM — that is exactly the kind of work our AWS consulting services cover.