Listing the objects in an S3 bucket sounds trivial until your bucket grows past a thousand keys. Amazon S3's list_objects_v2 API returns at most 1000 keys per call, no matter how many objects actually match. List a bucket holding 50,000 objects and a single call hands you the first 1000 plus a flag that says "there's more". If you don't paginate, you silently process a fraction of your data.
This guide is a complete, 2026-accurate reference for listing and paginating S3 objects with boto3 (the AWS SDK for Python). We cover the manual ContinuationToken loop (so you understand what's happening), the idiomatic boto3 Paginator (what you should actually use), folder-style listing with Prefix and Delimiter, controlling result size with PaginationConfig, client-side filtering with JMESPath, the resource API alternative, and async listing for very large buckets.
Set up an S3 client (use IAM roles, never hardcoded keys)
Create a client from the default credential chain. On EC2, ECS, or Lambda this resolves to the IAM role attached to the compute; locally it uses your AWS SSO session or ~/.aws/credentials. Never paste access keys into source code: hardcoded keys leak through git history and are a common cause of compromised AWS accounts.
import boto3
# The client resolves credentials automatically from the default chain, roughly:
# environment variables -> shared config/credentials files (including AWS SSO profiles)
# -> container or instance IAM role (ECS/EC2). Lambda supplies its role via environment variables.
# Do NOT pass aws_access_key_id / aws_secret_access_key in code.
s3 = boto3.client("s3", region_name="us-east-1")
The problem: why list_objects_v2 stops at 1000
A bare call returns a response dict whose Contents list holds up to 1000 objects. Two response fields tell you whether the listing is complete:
- IsTruncated: True when there are more keys than were returned.
- NextContinuationToken: an opaque cursor you pass back to fetch the next batch.
MaxKeys lets you lower the page size (e.g. for testing) but cannot raise it above 1000 — that ceiling is enforced by S3, not boto3.
response = s3.list_objects_v2(Bucket="my-bucket", Prefix="logs/")
print(len(response.get("Contents", []))) # 1000 at most, even with 50k objects
print(response["IsTruncated"]) # True -> there is more to fetch
print(response.get("NextContinuationToken")) # opaque cursor for the next page
Option 1: The manual ContinuationToken loop (educational)
This is the mechanism every higher-level helper wraps. You call list_objects_v2, collect Contents, and while IsTruncated is true, feed NextContinuationToken back in as ContinuationToken. Worth understanding once — but you rarely need to write it by hand.
import boto3
s3 = boto3.client("s3")
keys = []
continuation_token = None
while True:
    kwargs = {"Bucket": "my-bucket", "Prefix": "logs/"}
    if continuation_token:
        kwargs["ContinuationToken"] = continuation_token
    response = s3.list_objects_v2(**kwargs)
    # .get(..., []) guards an empty bucket/prefix: no "Contents" key is returned.
    for obj in response.get("Contents", []):
        keys.append(obj["Key"])
    if response.get("IsTruncated"):
        continuation_token = response["NextContinuationToken"]
    else:
        break
print(f"Found {len(keys)} objects")
Option 2: The boto3 Paginator (the right way)
boto3 ships paginators that own the token bookkeeping for you. Call get_paginator("list_objects_v2"), then iterate paginate(...). Each iteration yields one page (one underlying API call); you just read Contents. This is the idiomatic, recommended approach — less code, no token bugs.
import boto3
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
page_iterator = paginator.paginate(Bucket="my-bucket", Prefix="logs/")
keys = []
for page in page_iterator:
    for obj in page.get("Contents", []):  # guard the empty case here too
        keys.append(obj["Key"])
print(f"Found {len(keys)} objects")
Handling an empty bucket or prefix
When nothing matches, S3 omits the Contents key entirely — page["Contents"] raises KeyError. Always read it with page.get("Contents", []) (as above). The paginator still yields one empty page, so your loop runs safely and keys stays [].
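A small helper can keep that guard in one place. The sketch below is an illustration only: iter_keys is a hypothetical name, and the pages are hand-written dicts standing in for what a paginator would yield, so it runs without AWS access.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_keys(pages: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield object keys from list_objects_v2-style pages.

    Pages without a "Contents" entry (nothing matched) are skipped safely.
    """
    for page in pages:
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Hand-written stand-ins for paginator output (assumed shapes, not real responses).
empty_listing = [{"KeyCount": 0, "IsTruncated": False}]
two_pages = [
    {"Contents": [{"Key": "logs/a.txt"}, {"Key": "logs/b.txt"}]},
    {"Contents": [{"Key": "logs/c.txt"}]},
]

print(list(iter_keys(empty_listing)))  # []
print(list(iter_keys(two_pages)))      # ['logs/a.txt', 'logs/b.txt', 'logs/c.txt']
```

With a real client, the same helper would take the paginator output directly, for example iter_keys(paginator.paginate(Bucket="my-bucket", Prefix="logs/")).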
Folder-style listing with Prefix and Delimiter
S3 has no real folders — keys are flat strings like images/2026/cat.jpg. To list one "level" as if it were a directory, combine Prefix with Delimiter="/". S3 then collapses everything under each sub-path into CommonPrefixes (your "sub-folders"), while Contents holds only the files directly at that level.
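To make the grouping rule concrete, here is a local model of it in plain Python. It is a sketch of the behaviour described above, not an S3 call: group_by_delimiter and the sample keys are hypothetical, and the real service applies the same idea server-side and per page.

```python
from typing import Iterable, List, Tuple


def group_by_delimiter(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Local model of Prefix + Delimiter grouping (not an S3 call).

    Returns (common_prefixes, contents) for the given flat key list.
    """
    common_prefixes: List[str] = []
    contents: List[str] = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            # No delimiter after the prefix: a "file" directly at this level.
            contents.append(key)
        else:
            # Everything up to and including the first delimiter is rolled up.
            rolled = prefix + rest[: cut + len(delimiter)]
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
    return common_prefixes, contents


keys = [
    "images/logo.png",
    "images/2025/dog.jpg",
    "images/2026/cat.jpg",
    "images/2026/bird.jpg",
    "videos/intro.mp4",
]
dirs, files = group_by_delimiter(keys, prefix="images/")
print(dirs)   # ['images/2025/', 'images/2026/']
print(files)  # ['images/logo.png']
```

Note how the two keys under images/2026/ collapse into a single "sub-folder" entry, and how nothing outside the prefix appears at all.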
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
pages = paginator.paginate(Bucket="my-bucket", Prefix="images/", Delimiter="/")
for page in pages:
    # "Sub-folders" one level under images/ e.g. images/2026/
    for cp in page.get("CommonPrefixes", []):
        print("dir: ", cp["Prefix"])
    # Files that live directly in images/ (not in a sub-path)
    for obj in page.get("Contents", []):
        print("file:", obj["Key"])
Control the result set with PaginationConfig
paginate() accepts a PaginationConfig dict to bound the work without manual slicing:
- MaxItems: total number of items to return across all pages, then stop.
- PageSize: keys requested per API call (capped at 1000 by S3).
- StartingToken: resume from a token captured in a previous run.
This is how you cap LIST traffic. S3 bills LIST requests per 1,000 requests, and each request returns at most 1,000 keys, so blindly walking a huge bucket on every run adds up. Fetch only what you need.
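A quick back-of-the-envelope calculation shows how page size drives request volume. The helper names below are hypothetical, and the default price of 0.005 USD per 1,000 LIST requests is an assumption for illustration; check the current S3 pricing page for your region and storage class.

```python
import math


def list_request_count(total_keys: int, page_size: int = 1000) -> int:
    """Number of LIST calls needed to walk total_keys at the given page size."""
    if not 1 <= page_size <= 1000:
        raise ValueError("page_size must be between 1 and 1000")
    # Even an empty listing costs one request.
    return max(1, math.ceil(total_keys / page_size))


def list_cost_usd(total_keys: int, page_size: int = 1000,
                  price_per_1000_requests: float = 0.005) -> float:
    """Rough LIST cost; the default price is an assumption, not a quoted rate."""
    requests = list_request_count(total_keys, page_size)
    return requests / 1000 * price_per_1000_requests


print(list_request_count(50_000))       # 50 requests at 1000 keys each
print(list_request_count(50_000, 100))  # 500 requests at 100 keys each
print(list_request_count(10_000_000))   # 10000 requests
```

A single walk is cheap; the cost comes from repeating it, and from shrinking PageSize, which multiplies the number of requests for the same keys.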
paginator = s3.get_paginator("list_objects_v2")
page_iterator = paginator.paginate(
    Bucket="my-bucket",
    Prefix="logs/",
    PaginationConfig={
        "MaxItems": 5000,  # stop after at most 5000 keys
        "PageSize": 1000,  # keys per API call (must be <= 1000)
        "StartingToken": None,  # resume point from a prior run, if any
    },
)
for page in page_iterator:
    for obj in page.get("Contents", []):
        print(obj["Key"])
# Once the iterator is consumed, resume_token is set if MaxItems cut the listing
# short (otherwise it is None). Pass it as StartingToken on the next run.
token = page_iterator.resume_token
Filter efficiently with JMESPath .search()
A paginator's .paginate(...) result has a .search() method that runs a JMESPath expression across every page and yields matching items lazily — no nested loops, no intermediate lists. It's the cleanest way to project or filter fields. (The filtering happens client-side as pages arrive; list_objects_v2 itself only filters by Prefix server-side.)
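Roughly, .search() behaves like the generator below: it evaluates the expression once per page, flattens list results, and passes any other result through unchanged. This is a sketch of that behaviour as we understand it, with a plain Python callable (select, a hypothetical stand-in) taking the place of the compiled JMESPath expression so it runs without boto3.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


def search_pages(
    pages: Iterable[Dict[str, Any]],
    select: Callable[[Dict[str, Any]], Any],
) -> Iterator[Any]:
    """Apply `select` to each page; flatten list results, pass others through."""
    for page in pages:
        result = select(page)
        if isinstance(result, list):
            yield from result
        else:
            yield result


def non_empty_keys(page: Dict[str, Any]) -> Any:
    """Stand-in for "Contents[?Size > `0`][].Key": None when Contents is missing."""
    contents = page.get("Contents")
    if contents is None:
        return None
    return [obj["Key"] for obj in contents if obj["Size"] > 0]


# Hand-written pages (assumed shapes) instead of real API responses.
listing = [
    {"Contents": [{"Key": "logs/a", "Size": 10}, {"Key": "logs/empty", "Size": 0}]},
    {"Contents": [{"Key": "logs/b", "Size": 5}]},
]
empty_listing = [{"KeyCount": 0}]  # nothing matched: no "Contents" key

print(list(search_pages(listing, non_empty_keys)))        # ['logs/a', 'logs/b']
print(list(search_pages(empty_listing, non_empty_keys)))  # [None]
```

Two practical consequences follow. Expressions see one page at a time, so sorting or indexing such as [-1] gives a per-page answer, not a global one. And an empty listing can surface a single None item, so skip None values where that matters.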
paginator = s3.get_paginator("list_objects_v2")
pages = paginator.paginate(Bucket="my-bucket", Prefix="logs/")
# Keys of non-empty objects, flattened across all pages, streamed lazily.
non_empty_keys = pages.search("Contents[?Size > `0`][].Key")
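for key in non_empty_keys:
    print(key)
# Other handy expressions:
# "Contents[?ends_with(Key, '.json')].Key" -> only .json files
# Caveat: expressions run once per page, and boto3 returns LastModified as a
# datetime, which JMESPath's sort_by() rejects. To find the most recently
# modified object, collect the items and use max(..., key=...) in Python.
Alternative: the resource API Bucket.objects.filter()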
The higher-level resource interface paginates transparently behind a lazy collection. bucket.objects.filter(Prefix=...) yields ObjectSummary objects you can iterate directly, with attribute access (obj.key, obj.size) instead of dict keys, and you never see a token. It's the most concise option, but the client paginator gives you finer control, and AWS has stated that it does not plan to add new features to the boto3 resource interface, so prefer the client for new code.
import boto3
s3 = boto3.resource("s3")
bucket = s3.Bucket("my-bucket")
# .filter() handles pagination internally; iterate as a normal generator.
for obj in bucket.objects.filter(Prefix="logs/"):
    print(obj.key, obj.size)
# .all() lists everything; .limit(n) caps it; both still paginate under the hood.
first_100 = list(bucket.objects.filter(Prefix="logs/").limit(100))
Listing huge buckets asynchronously
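For buckets with millions of objects, a single listing is inherently sequential: each page needs the token from the previous one. Concurrency pays off when you list many prefixes at once, or when a blocking loop would stall an async application. aioboto3, a third-party library that exposes the boto3 API with async/await, lets you paginate without blocking, which is useful inside async web apps or fan-out jobs.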
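The fan-out part is independent of the SDK, so it can be sketched on its own. In the example below, list_many and fake_list_prefix are hypothetical names: list_prefix is any async callable that returns the keys under one prefix, and the fake version stands in for a real lister so the sketch runs without AWS access.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def list_many(
    list_prefix: Callable[[str], Awaitable[List[str]]],
    prefixes: List[str],
    max_concurrency: int = 8,
) -> Dict[str, List[str]]:
    """Run `list_prefix` for every prefix, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prefix: str) -> List[str]:
        async with semaphore:
            return await list_prefix(prefix)

    # gather() preserves input order, so results line up with prefixes.
    results = await asyncio.gather(*(bounded(p) for p in prefixes))
    return dict(zip(prefixes, results))


async def fake_list_prefix(prefix: str) -> List[str]:
    """Stand-in lister so the sketch runs offline; swap in a real one."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return [f"{prefix}part-{i}.json" for i in range(2)]


result = asyncio.run(list_many(fake_list_prefix, ["logs/2025/", "logs/2026/"]))
print(result)
```

The semaphore bounds how many listings run at once, which keeps a large prefix list from opening hundreds of simultaneous requests. A real list_prefix would collect keys from an async paginator instead of returning canned values.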
import asyncio
import aioboto3
async def list_keys(bucket, prefix=""):
    session = aioboto3.Session()
    async with session.client("s3") as s3:
        paginator = s3.get_paginator("list_objects_v2")
        async for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                print(obj["Key"])
asyncio.run(list_keys("my-bucket", "logs/"))
Which approach should you use?
| Approach | Pagination | Style | Best for |
|---|---|---|---|
| Manual ContinuationToken loop | You manage it | Dict access | Learning the API; custom checkpoint/resume logic |
| get_paginator("list_objects_v2") | Automatic | Dict access | The default choice for almost everything |
| Paginator + JMESPath .search() | Automatic | Expression | Filtering/projecting fields with minimal code |
| Resource Bucket.objects.filter() | Automatic | Attribute access | Pythonic iteration over object attributes |
Rule of thumb: reach for the Paginator by default, add JMESPath .search() when you need to filter, and keep the manual loop only when you need bespoke resume/checkpoint behaviour.
Related reading
- How to mount an S3 bucket on local disk: when a filesystem mount beats the SDK, and when it doesn't.
- Using AWS Lambda with S3 and DynamoDB: event-driven processing of the keys you list.
- Amazon AWS IAM roles and policies: the least-privilege s3:ListBucket setup that powers these calls.
- CORS with Amazon S3 and CloudFront: serving the objects you've catalogued to the browser.
We've shipped AWS-backed Python systems for startups and enterprises for 12+ years across 50+ projects. If you're wiring up large-scale S3 workflows, our AWS consulting services and cloud migration services teams can help you do it cost-efficiently and securely.
Frequently Asked Questions
Why does list_objects_v2 only return 1000 keys?
The 1000-key cap is a hard limit enforced by the Amazon S3 service itself, not by boto3. Every list_objects_v2 (and the older list_objects) call returns at most 1000 objects in Contents and sets IsTruncated to True when more exist. You fetch the rest by passing NextContinuationToken back as ContinuationToken, or by letting a paginator do it for you. MaxKeys can only lower this number, never raise it.
Should I use the Paginator or a manual ContinuationToken loop?
Use the Paginator. s3.get_paginator("list_objects_v2") handles the ContinuationToken bookkeeping for you, eliminating a common class of bugs where you forget to check IsTruncated and silently process only the first 1000 keys. Write the manual loop only when you need custom behaviour the paginator doesn't give you directly, such as persisting a checkpoint after every page so an interrupted run can resume where it stopped.
How do I list only one "folder" in S3?
S3 keys are flat strings, but you can emulate a single directory level by passing both Prefix (the folder path) and Delimiter="/". S3 then returns immediate sub-folders in CommonPrefixes and the files at that level in Contents, instead of recursively returning every key beneath the prefix. Drop the delimiter to list everything under the prefix recursively.
How do I filter S3 results efficiently?
list_objects_v2 only filters server-side by Prefix, so narrow your prefix as much as possible first to reduce the number of keys returned (and LIST requests billed). For anything beyond a prefix — size, extension, last-modified — use the paginator's JMESPath .search() method, e.g. an expression like Contents[?ends_with(Key, '.json')].Key, which filters lazily across pages with minimal Python.
What's the difference between list_objects_v2 and the resource API?
list_objects_v2 is a low-level client call returning raw response dicts; you read page["Contents"] and manage pagination via tokens or a paginator. The resource API's bucket.objects.filter() is a higher-level abstraction that paginates transparently and yields ObjectSummary objects with attribute access (obj.key, obj.size). Use the resource API for concise, Pythonic iteration; use the client paginator for finer control.
How do I handle buckets with millions of objects?
Always paginate (never a single call), narrow with Prefix, and use PaginationConfig with MaxItems/PageSize plus StartingToken to checkpoint and resume long runs. Remember that LIST requests are billed per 1,000 requests, so listing a massive bucket repeatedly adds up; cache or store results when you can. For high concurrency, list multiple prefixes in parallel with aioboto3, or offload very large inventories to S3 Inventory reports instead of live LIST calls.
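The checkpoint-and-resume idea can be sketched with the raw ContinuationToken loop, saving the next token after every page. This is one possible approach under stated assumptions, not the paginator's StartingToken mechanism: list_with_checkpoint, FakeS3, and the JSON checkpoint file are all hypothetical, and FakeS3 uses a plain index as its token where real S3 tokens are opaque strings.

```python
import json
import os
import tempfile
from typing import Any, Iterator, List, Optional


def list_with_checkpoint(s3: Any, bucket: str, prefix: str,
                         checkpoint_path: str) -> Iterator[List[str]]:
    """Yield one batch of keys per page, saving the next token after each page."""
    token: Optional[str] = None
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            token = json.load(fh).get("token")

    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        if token:
            kwargs["ContinuationToken"] = token
        response = s3.list_objects_v2(**kwargs)
        yield [obj["Key"] for obj in response.get("Contents", [])]

        if not response.get("IsTruncated"):
            # Finished: remove the checkpoint so the next run starts fresh.
            if os.path.exists(checkpoint_path):
                os.remove(checkpoint_path)
            return
        # Only advance the checkpoint once the caller has asked for more.
        token = response["NextContinuationToken"]
        with open(checkpoint_path, "w") as fh:
            json.dump({"token": token}, fh)


class FakeS3:
    """In-memory stand-in for the client (demo only); token = start index."""

    def __init__(self, keys, page_size=2):
        self.keys, self.page_size = sorted(keys), page_size

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        matching = [k for k in self.keys if k.startswith(Prefix)]
        start = int(ContinuationToken or 0)
        chunk = matching[start:start + self.page_size]
        end = start + len(chunk)
        page = {"Contents": [{"Key": k} for k in chunk],
                "IsTruncated": end < len(matching)}
        if page["IsTruncated"]:
            page["NextContinuationToken"] = str(end)
        return page


path = os.path.join(tempfile.mkdtemp(), "listing.checkpoint.json")
fake = FakeS3([f"logs/{i}.txt" for i in range(5)])

run = list_with_checkpoint(fake, "my-bucket", "logs/", path)
first = next(run)   # page 1
second = next(run)  # asking for page 2 saves the token that points at it
run.close()         # simulate an interruption while handling page 2

resumed = [k for batch in list_with_checkpoint(fake, "my-bucket", "logs/", path)
           for k in batch]
print(first, second, resumed)
```

Because the checkpoint advances only when the caller requests the next page, an interrupted run re-lists the page it was working on. That gives at-least-once delivery, so downstream processing should tolerate seeing a key twice.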