MONGODB GROUP() VS MAPREDUCE VS AGGREGATION FRAMEWORK
2022-07-21
The group() command, the Aggregation Framework, and MapReduce are the three aggregation features of MongoDB.
group():
Performs simple aggregation operations on the documents of a collection. It is similar to GROUP BY in MySQL.
Output format: Returns the result set inline.
Sharding: Not supported in a sharded environment.
Limitations:
Will not group into a result set with more than 20,000 keys (as of MongoDB 2.2; earlier versions were limited to 10,000 keys).
Results must fit within the limitations of a BSON document (currently 16MB).
Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
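As a rough illustration of how group() is driven, the sketch below builds the kind of argument document the command accepts (key, initial, reduce) and applies it to an in-memory array, so it runs without a server. The sample orders, the field names, and the runGroup helper are all hypothetical stand-ins; only the key/initial/reduce shape comes from group() itself.

```javascript
// Hypothetical sample documents; in the mongo shell these would live in a collection.
const orders = [
  { cust_id: "A", amount: 50 },
  { cust_id: "B", amount: 20 },
  { cust_id: "A", amount: 30 },
];

// Argument document in the shape group() expects: key, initial, reduce.
const groupSpec = {
  key: { cust_id: 1 },             // fields to group by
  initial: { total: 0, count: 0 }, // starting accumulator for each group
  reduce: function (doc, acc) {    // called once per document, mutates the accumulator
    acc.total += doc.amount;
    acc.count += 1;
  },
};

// Stand-in for the server side (an assumption for illustration, not MongoDB code):
// it buckets documents by the key fields and feeds each one to spec.reduce.
function runGroup(docs, spec) {
  const keyFields = Object.keys(spec.key);
  const groups = new Map();
  for (const doc of docs) {
    const keyDoc = {};
    for (const field of keyFields) keyDoc[field] = doc[field];
    const id = JSON.stringify(keyDoc);
    if (!groups.has(id)) {
      // Each group starts from its own copy of the initial accumulator.
      groups.set(id, Object.assign({}, keyDoc, spec.initial));
    }
    spec.reduce(doc, groups.get(id));
  }
  // Like group(), the whole result comes back inline as one array.
  return Array.from(groups.values());
}

const grouped = runGroup(orders, groupSpec);
console.log(grouped);
```

In a shell that still ships the command, the same document would be passed as db.orders.group(groupSpec), where orders is a hypothetical collection name.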
MapReduce():
Can be used for incremental aggregation over large collections.
There have been significant improvements in Map/Reduce in MongoDB version 2.4. The SpiderMonkey JavaScript engine has been replaced by the V8 JavaScript engine, and there is no longer a global JavaScript lock, which means that multiple Map/Reduce threads can run concurrently.
Output format: MapReduce provides inline, new collection, merge, replace, and reduce output options.
Sharding: Supports both sharded and non-sharded collections as input and output. When writing sharded output, if the output collection does not exist, MapReduce creates it and shards it on the _id field.
Limitations:
Inline MapReduce output is returned as a result document rather than a collection, so find(), sort(), and limit() cannot be run on it.
A single emit can hold at most half of MongoDB's maximum BSON document size, that is, 8MB of the 16MB limit.
The Map/Reduce engine is still considerably slower than the Aggregation Framework, for two main reasons: (1) The JavaScript engine is interpreted, while the Aggregation Framework runs compiled C++ code. (2) The JavaScript engine still requires every document being examined to be converted from BSON to JSON; if you are saving the output in a collection, the result set must then be converted from JSON back to BSON.
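To make the map, reduce, and incremental "reduce" output ideas above more concrete, here is a minimal sketch that runs in plain JavaScript. The map and reduce functions have the shape MongoDB expects, but emit, runMapReduce, and the sample purchases are local stand-ins invented for illustration; the real work happens inside the server.

```javascript
// Hypothetical sample documents standing in for a collection.
const purchases = [
  { cust_id: "A", amount: 50 },
  { cust_id: "B", amount: 20 },
  { cust_id: "A", amount: 30 },
];

// In MongoDB, emit() is supplied by the server; this is a local stand-in.
let emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// map runs once per document, with the document bound to `this`.
function mapFn() {
  emit(this.cust_id, this.amount);
}

// reduce folds all values emitted for one key; its output must be usable
// as input again, because it can be re-applied to earlier results.
function reduceFn(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Stand-in for the server (an assumption, not MongoDB code). `previous`
// models the "reduce" output option: existing totals are passed back
// through reduceFn together with the new ones.
function runMapReduce(docs, map, reduce, previous = {}) {
  emitted = [];
  for (const doc of docs) map.call(doc);
  const byKey = new Map();
  for (const { key, value } of emitted) {
    if (!byKey.has(key)) byKey.set(key, []);
    byKey.get(key).push(value);
  }
  const out = Object.assign({}, previous);
  for (const [key, values] of byKey) {
    // A key with a single emitted value is not passed through reduce.
    const reduced = values.length === 1 ? values[0] : reduce(key, values);
    out[key] = key in out ? reduce(key, [out[key], reduced]) : reduced;
  }
  return out;
}

// First run over the existing data, then an incremental run over new data only.
const firstRun = runMapReduce(purchases, mapFn, reduceFn);
const newPurchases = [{ cust_id: "A", amount: 5 }, { cust_id: "C", amount: 7 }];
const merged = runMapReduce(newPurchases, mapFn, reduceFn, firstRun);
console.log(firstRun, merged);
```

Against a real server, the incremental step would look roughly like db.purchases.mapReduce(mapFn, reduceFn, { out: { reduce: "purchase_totals" } }), where both collection names are hypothetical.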
Aggregation Framework:
New feature in the MongoDB 2.2.0 production release
Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as $match, $project, $sort, $group, $limit, $skip, $unwind and $geoNear.
Output format: Returns the result set inline.
Sharding: Supports both sharded and non-sharded input collections. When operating on a sharded collection, it pushes all operations up to the first $group or $sort to every shard; the remaining operations, starting from that first $group or $sort, run as a second pipeline on the combined shard results.
Designed with specific goals of improving performance and usability.
Pipeline operators can be repeated as needed.
The Aggregation Framework can be around 10 times faster than MapReduce.
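The pipeline idea, and the way a pipeline is divided on a sharded collection, can be sketched as follows. The pipeline itself is ordinary data (an array of stage documents) that could be handed to aggregate(); the splitForShards helper is purely illustrative and assumes that the boundary stage appears in both halves, which is how the sharding description above reads. The field names and the status value are hypothetical.

```javascript
// A pipeline is an ordered array of stage documents; each stage
// transforms the stream of documents produced by the previous one.
const pipeline = [
  { $match: { status: "A" } },                                   // filter early
  { $project: { cust_id: 1, amount: 1 } },                       // keep only needed fields
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },   // one document per customer
  { $sort: { total: -1 } },                                      // largest totals first
  { $limit: 5 },                                                 // top five
];

// Illustrative helper (an assumption, not a MongoDB API): find the first
// $group or $sort and split the pipeline there. Everything up to and
// including that stage goes to the shards; that stage and everything
// after it forms the second pipeline run on the combined shard results.
function splitForShards(stages) {
  const isBoundary = (stage) => "$group" in stage || "$sort" in stage;
  const idx = stages.findIndex(isBoundary);
  if (idx === -1) {
    // No $group or $sort: this sketch treats the whole pipeline as shard-side.
    return { shardPart: stages.slice(), mergePart: [] };
  }
  return {
    shardPart: stages.slice(0, idx + 1),
    mergePart: stages.slice(idx),
  };
}

const { shardPart, mergePart } = splitForShards(pipeline);
console.log(shardPart.map((s) => Object.keys(s)[0]));
console.log(mergePart.map((s) => Object.keys(s)[0]));
```

In the shell the pipeline would be run as db.orders.aggregate(pipeline), with orders as a hypothetical collection name; the split itself is done by the server and is never written by hand.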
Limitations:
If any single aggregation operation consumes more than 10 percent of system RAM, the operation produces an error.
Output from the pipeline cannot exceed the BSON document size limit.
The aggregation pipeline cannot operate on values of the following types: Symbol, MinKey, MaxKey, DBRef, Code, CodeWScope.