Advanced MongoDB queries combine rich query operators (comparison, logical, element, array, and evaluation) inside find() with the aggregation pipeline ($match, $group, $lookup, $unwind, $facet, $setWindowFields, and more) to filter, join, reshape, and compute over documents on the server. On MongoDB 7.x/8.x the aggregation framework is the modern, fully supported way to run complex reads and analytics — mapReduce is deprecated and should be avoided in new code.
This guide shows the operators and pipeline stages you actually need in production, with runnable examples in both mongosh (the MongoDB shell) and PyMongo (Python), plus the indexing rules that keep them fast.
Key takeaways
find()is for filtering and projection; the aggregation pipeline is for joins, grouping, reshaping, and analytics.- Group query operators by job: comparison (
$gt,$in, ...), logical ($and,$or, ...), element ($exists,$type), evaluation ($regex,$expr,$jsonSchema), and array ($all,$elemMatch,$size). - Query nested fields and arrays with dot notation (
"address.city","items.sku"); use$elemMatchwhen several conditions must match the same array element. - Design compound indexes with the ESR rule — Equality, Sort, Range — and validate every query with
explain("executionStats"). - Prefer covered queries (served entirely from an index) and avoid
$whereand unanchored$regex, which force collection scans.
Sample data
The examples below use two collections — users and orders — in a shop database. A user and an order document look like this:
// users collection
{
_id: ObjectId("6500a1..."),
name: "Asha Rao",
email: "asha@example.com",
age: 29,
status: "active",
city: "Hyderabad",
tags: ["mongodb", "python"],
address: { country: "IN", city: "Hyderabad", pin: "500081" },
scores: [82, 91, 77]
}
// orders collection
{
_id: ObjectId("6500b2..."),
userId: ObjectId("6500a1..."),
status: "shipped",
total: 250,
items: [
{ sku: "BOOK-01", qty: 1, price: 200 },
{ sku: "PEN-09", qty: 5, price: 10 }
],
placedAt: ISODate("2026-05-02T10:15:00Z")
}Query operators at a glance
Inside a find() filter, operators start with $ and sit inside the field's value object — for example { age: { $gte: 18 } }. A bare { age: 18 } is just shorthand for { age: { $eq: 18 } }. These are the operators you will reach for most:
| Group | Operators | What they do |
|---|---|---|
| Comparison | $eq $ne $gt $gte $lt $lte $in $nin |
Match by equality, range, or set membership |
| Logical | $and $or $nor $not |
Combine or negate conditions |
| Element | $exists $type |
Match on field presence or BSON type |
| Evaluation | $regex $expr $mod $jsonSchema $text |
Pattern match, compare fields, modulo, schema validation |
| Array | $all $elemMatch $size |
Match array contents, a single element, or length |
Comparison and logical operators
Comparison operators filter by value; logical operators combine those conditions. Remember that multiple keys in one filter object are an implicit AND — you only need an explicit $and when you repeat the same operator on the same field.
// mongosh — comparison
db.users.find({ age: { $gte: 18, $lt: 30 } }) // ages 18..29
db.users.find({ status: { $in: ["active", "trial"] } }) // set membership
db.users.find({ status: { $ne: "banned" } }) // not equal
// logical: active users who are either under 25 OR in Hyderabad
db.users.find({
status: "active",
$or: [{ age: { $lt: 25 } }, { city: "Hyderabad" }]
})
// $nor: neither banned nor under-age
db.users.find({ $nor: [{ status: "banned" }, { age: { $lt: 18 } }] })# PyMongo — the same queries (operators are identical; use Python dicts)
from pymongo import MongoClient
db = MongoClient("mongodb://localhost:27017").shop
db.users.find({"age": {"$gte": 18, "$lt": 30}})
db.users.find({"status": {"$in": ["active", "trial"]}})
db.users.find({
"status": "active",
"$or": [{"age": {"$lt": 25}}, {"city": "Hyderabad"}],
})Element and evaluation operators
Use $exists and $type to handle MongoDB's flexible schema, and the evaluation operators for pattern matching, field-to-field comparison, and validation.
$exists— match documents where a field is present (or absent).$type— match by BSON type ("string","int","array","date", ...).$regex— pattern-match strings; anchor with^so an index can be used.$expr— use aggregation expressions insidefind(), e.g. compare two fields of the same document.$mod— match whenfield % divisor == remainder.$jsonSchema— validate documents against a JSON Schema (also used as a collection validator).
// field presence and BSON type
db.users.find({ phone: { $exists: false } })
db.users.find({ age: { $type: "int" } })
// anchored, case-insensitive regex — ^ lets an index help
db.users.find({ name: { $regex: /^asha/i } })
// $expr: compare two fields in the SAME document
db.orders.find({ $expr: { $gt: ["$total", "$minSpend"] } })
// $mod: orders whose total is an even number
db.orders.find({ total: { $mod: [2, 0] } })
// $jsonSchema: documents that do not satisfy the email rule
db.users.find({
$jsonSchema: {
required: ["email"],
properties: { email: { bsonType: "string", pattern: "@" } }
}
})Querying arrays and embedded documents
Reach into embedded documents and arrays with dot notation, and quote the path: "address.city", "items.sku".
{ "address.city": "Hyderabad" }— match a field inside an embedded document.{ tags: "mongodb" }— matches when thetagsarray contains that value.$all— the array must contain all listed values (order-independent).$size— the array has an exact length.$elemMatch— several conditions must match the same array element. Without it,{ "items.qty": { $gte: 5 }, "items.price": { $gt: 100 } }may match across different elements.
// mongosh
db.users.find({ "address.city": "Hyderabad" }) // embedded field
db.users.find({ tags: { $all: ["mongodb", "python"] } }) // contains both
db.users.find({ tags: { $size: 2 } }) // exactly 2 tags
// $elemMatch: one order item with qty >= 5 AND price > 100
db.orders.find({
items: { $elemMatch: { qty: { $gte: 5 }, price: { $gt: 100 } } }
})Projections, sorting, paging, and cursors
A projection controls which fields come back: include with 1 or exclude with 0 (you cannot mix include and exclude, except to drop _id). For arrays, project with $slice or the positional $elemMatch projection.
find() returns a cursor — a server-side pointer that streams results in batches and fetches nothing until you iterate it. Chain .sort(), .skip(), and .limit() on the cursor (MongoDB applies sort, then skip, then limit). For large result sets prefer range-based (keyset) paging (_id > lastSeen) over large skip() values, which still walk the skipped documents. Use .collation() for locale-aware, case-insensitive sorting and matching.
// projection: include name + email, drop _id
db.users.find({ status: "active" }, { _id: 0, name: 1, email: 1 })
// $slice: return only the first 2 scores
db.users.find({}, { scores: { $slice: 2 } })
// positional projection: only the matching tag element
db.users.find({ tags: "mongodb" }, { "tags.$": 1 })
// sort -> skip -> limit (page 3, 20 per page), newest age first
db.users.find().sort({ age: -1 }).skip(40).limit(20)
// case-insensitive sort via collation (strength: 2)
db.users.find().sort({ name: 1 }).collation({ locale: "en", strength: 2 })
// keyset paging — faster than a large skip()
db.users.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(20)# PyMongo — projection, sort, paging, and cursor iteration
cursor = (
db.users.find({"status": "active"}, {"_id": 0, "name": 1, "email": 1})
.sort("age", -1)
.skip(40)
.limit(20)
)
for doc in cursor: # the cursor streams batches lazily
print(doc["name"])
# count without pulling documents
db.users.count_documents({"status": "active"})The aggregation pipeline
For anything beyond filtering — joins, grouping, computed fields, analytics — use the aggregation pipeline: an ordered array of stages, each transforming the stream of documents and passing it to the next. This is the modern, supported path on MongoDB 7.x/8.x; mapReduce is deprecated and the legacy group()/distinct() helpers are far less capable (see group vs aggregation framework vs mapReduce in MongoDB).
| Stage | Purpose |
|---|---|
$match |
Filter documents (place early; uses indexes) |
$project / $addFields / $set / $unset |
Reshape, add, or drop fields |
$group |
Aggregate with $sum, $avg, $min, $max, $push |
$sort / $limit / $skip |
Order and paginate |
$lookup |
Left-outer join to another collection |
$unwind |
Flatten an array into one document per element |
$facet |
Run several sub-pipelines in one pass |
$bucket / $bucketAuto |
Group values into ranges (histograms) |
$setWindowFields |
Window functions (running totals, ranks) |
$merge / $out |
Write results back to a collection |
Put $match and $sort as early as possible so they can use indexes before the pipeline materializes documents.
// Revenue per status for orders placed in 2026, highest first
db.orders.aggregate([
{ $match: { placedAt: { $gte: ISODate("2026-01-01") } } },
{ $group: {
_id: "$status",
revenue: { $sum: "$total" },
orders: { $sum: 1 },
avgTotal:{ $avg: "$total" }
} },
{ $sort: { revenue: -1 } }
])Joins with $lookup and flattening with $unwind
$lookup performs a left-outer join, placing matches into an array field; $unwind then flattens an array so each element becomes its own document — useful before grouping by joined or nested fields.
// Join each order to its user, then keep order + customer name
db.orders.aggregate([
{ $lookup: {
from: "users",
localField: "userId",
foreignField: "_id",
as: "user"
} },
{ $unwind: "$user" }, // user[] -> single user doc
{ $project: { _id: 0, total: 1, status: 1, customer: "$user.name" } }
])
// Total quantity sold per SKU (unwind the items array first)
db.orders.aggregate([
{ $unwind: "$items" },
{ $group: { _id: "$items.sku", qtySold: { $sum: "$items.qty" } } },
{ $sort: { qtySold: -1 } }
])Facets, buckets, window functions, and writing results
$facetruns multiple sub-pipelines over the same input in a single query — ideal for dashboards (a page of results, a total count, and price buckets at once).$bucket/$bucketAutobuild histograms by grouping values into ranges.$setWindowFieldsadds window functions (running totals, moving averages,$rank,$denseRank) without collapsing rows the way$groupdoes.$mergeand$outpersist pipeline output to a collection — the basis of on-demand materialized views.
// $facet: a page of orders + a total count + price bands, in one pass
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $facet: {
page: [{ $sort: { placedAt: -1 } }, { $limit: 20 }],
total: [{ $count: "count" }],
byBand: [{ $bucket: {
groupBy: "$total",
boundaries: [0, 100, 500, 1000],
default: "1000+",
output: { n: { $sum: 1 } }
} }]
} }
])
// $setWindowFields: running revenue per user, ordered by date
db.orders.aggregate([
{ $setWindowFields: {
partitionBy: "$userId",
sortBy: { placedAt: 1 },
output: {
runningTotal: {
$sum: "$total",
window: { documents: ["unbounded", "current"] }
}
}
} }
])# PyMongo — aggregation pipeline (revenue per status)
from datetime import datetime
pipeline = [
{"$match": {"placedAt": {"$gte": datetime(2026, 1, 1)}}},
{"$group": {
"_id": "$status",
"revenue": {"$sum": "$total"},
"orders": {"$sum": 1},
}},
{"$sort": {"revenue": -1}},
]
for row in db.orders.aggregate(pipeline):
print(row["_id"], row["revenue"])Performance: indexes, explain, and the ESR rule
Advanced queries are only as fast as their indexes. Three rules cover most cases:
- Index for your query shape. Build a compound index ordered by the ESR rule — Equality, Sort, Range: equality (exact-match) fields first, then the field you
sorton, then range fields ($gt/$lt) last. - Measure, don't guess. Run
explain("executionStats")and read the winning plan:IXSCAN(good) vsCOLLSCAN(a full scan),totalDocsExaminedvsnReturned, and whether an in-memorySORTstage appears. - Aim for covered queries. When an index contains every field the query needs (filter + projection), MongoDB serves it from the index alone and reports
totalDocsExamined: 0.
Avoid the slow paths: $where (runs arbitrary JavaScript per document, no index), unanchored $regex (/foo/ scans, while /^foo/ can use an index), and negation operators like $ne/$nin, which are rarely selective. For real search, use a text index or Atlas Search instead of regex — see full-text search in MongoDB.
// Compound index following ESR: equality (status), sort (placedAt), range (total)
db.orders.createIndex({ status: 1, placedAt: -1, total: 1 })
// Inspect the query plan
db.orders.find({ status: "shipped", total: { $gt: 100 } })
.sort({ placedAt: -1 })
.explain("executionStats")
// Covered query: the index has every field, projection excludes _id
db.users.createIndex({ status: 1, email: 1 })
db.users.find({ status: "active" }, { _id: 0, status: 1, email: 1 })
.explain("executionStats") // expect totalDocsExamined: 0Related reading
- MongoDB CRUD operations with Python (PyMongo) — the insert, update, and delete foundations behind these queries.
- group vs aggregation framework vs mapReduce in MongoDB — why the pipeline replaced the legacy helpers.
- Full-text search in MongoDB — text indexes and Atlas Search beyond
$regex.
Modernizing a slow MongoDB workload or moving data between engines? Our database migration services team tunes schemas, indexes, and aggregation pipelines for production scale.
Frequently Asked Questions
What is the difference between find() and the aggregation pipeline in MongoDB?
find() filters documents and projects fields — it is the fastest path for simple reads. The aggregation pipeline passes documents through ordered stages ($match, $group, $lookup, $unwind, and more) to join, group, compute, and reshape data on the server. Use find() for filtering and projection; use aggregation for joins, analytics, and transformations.
When should I use $elemMatch in a MongoDB query?
Use $elemMatch when two or more conditions must match the same element of an array. A plain filter like { "items.qty": { $gte: 5 }, "items.price": { $gt: 100 } } can be satisfied by different array elements, whereas { items: { $elemMatch: { qty: { $gte: 5 }, price: { $gt: 100 } } } } requires a single element to satisfy both conditions.
What is the ESR rule for MongoDB indexes?
ESR stands for Equality, Sort, Range — the recommended field order for a compound index. Put fields used in exact-match (equality) conditions first, then the field you sort on, then range fields ($gt, $lt) last. This lets one index serve both the filter and the sort, avoiding a separate in-memory sort step.
Is mapReduce still supported in MongoDB?
mapReduce still runs but is deprecated and is slower and harder to maintain than the aggregation pipeline. For all new analytics and transformation work on MongoDB 7.x/8.x, use the aggregation pipeline ($group, $facet, $setWindowFields, $merge), which is optimized and actively developed.
How do I query nested fields and arrays in MongoDB?
Use dot notation with a quoted path: { "address.city": "Hyderabad" } matches an embedded field, and { "items.sku": "BOOK-01" } matches inside an array of sub-documents. A bare value against an array field ({ tags: "mongodb" }) matches when the array contains that value; use $all, $size, or $elemMatch for richer array conditions.
How can I tell whether my MongoDB query uses an index?
Run the query with .explain("executionStats") and read the winning plan. IXSCAN means an index is used; COLLSCAN means a full collection scan. Compare totalDocsExamined to nReturned — close numbers are healthy. A covered query reports totalDocsExamined: 0 because it is served entirely from the index.