group() vs Aggregation Framework vs MapReduce in MongoDB


The group() command, the Aggregation Framework, and MapReduce are the three aggregation features of MongoDB.

group():

  • Performs simple aggregation operations on the documents of a collection.
  • Similar to GROUP BY in MySQL.

Output format: Returns the result set inline. Sharding: Not supported in sharded environments. Limitations:

  • Will not group into a result set with more than 20,000 keys (as of MongoDB 2.2; in earlier versions the limit was 10,000 keys).
  • Results must fit within the limitations of a BSON document (currently 16MB).
  • Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
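
To make the behaviour concrete, here is a minimal pure-Python sketch of what group() computes: documents are bucketed by a key, each bucket starts from a copy of an initial accumulator, and a reduce function folds every document into its bucket. The names `group_like`, `add_amount`, `MAX_KEYS`, and the sample documents are illustrative assumptions. This is an emulation of the semantics described above, not the MongoDB API.

```python
import copy

MAX_KEYS = 20000  # key limit quoted above for MongoDB 2.2 (assumed constant for this sketch)


def group_like(docs, key, initial, reduce_fn, max_keys=MAX_KEYS):
    """Emulate group(): bucket docs by `key` and fold each doc into its bucket."""
    buckets = {}
    for doc in docs:
        k = doc.get(key)
        if k not in buckets:
            if len(buckets) >= max_keys:
                raise ValueError("result would exceed %d keys" % max_keys)
            # Every new key starts from its own copy of the initial accumulator.
            acc = copy.deepcopy(initial)
            acc[key] = k
            buckets[k] = acc
        # The reduce function mutates the accumulator in place.
        reduce_fn(doc, buckets[k])
    # The whole result set is returned inline, as a single list.
    return list(buckets.values())


def add_amount(doc, acc):
    """Hypothetical reduce function: sum amounts and count documents."""
    acc["total"] += doc["amount"]
    acc["count"] += 1


orders = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
]
result = group_like(orders, "customer", {"total": 0, "count": 0}, add_amount)
```

Because the result is built in memory and returned in one piece, both the key limit and the document size limit above apply to the result as a whole.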

MapReduce():

  • Can be used for incremental aggregation over large collections.
  • There have been significant improvements in Map/Reduce in MongoDB version 2.4. The SpiderMonkey JavaScript engine has been replaced by the V8 JavaScript engine, and there is no longer a global JavaScript lock, which means that multiple Map/Reduce threads can run concurrently.

Output format: MapReduce provides inline, new collection, merge, replace, and reduce output options. Sharding: Supports both sharded and unsharded collections as input and output. If the output collection does not exist, MapReduce creates it and shards it on the _id field. Limitations:

  • With inline output the result is not stored in a collection, so find(), sort(), and limit() cannot be run on it.
  • A single emit can hold at most half of MongoDB's maximum BSON document size (the maximum is 16MB).
  • The Map/Reduce engine is still considerably slower than the Aggregation Framework, for two main reasons: (1) the JavaScript engine is interpreted, while the Aggregation Framework runs compiled C++ code; (2) the JavaScript engine still requires every document being examined to be converted from BSON to JSON, and if the output is saved in a collection, the result set must then be converted from JSON back to BSON.
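
The map, emit, and reduce flow, and the incremental "reduce" output option, can be sketched in plain Python. This is a simplified model under stated assumptions, not MongoDB's implementation: `map_reduce`, `map_clicks`, `sum_values`, and the sample batches are hypothetical, and a dictionary stands in for the output collection.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn, previous=None):
    """Emulate map -> group by key -> reduce, optionally folding into earlier output."""
    emitted = defaultdict(list)
    for doc in docs:
        # map_fn yields (key, value) pairs, standing in for emit(key, value).
        for key, value in map_fn(doc):
            emitted[key].append(value)

    output = dict(previous or {})
    for key, values in emitted.items():
        # A key with a single emitted value is passed through without reducing.
        reduced = reduce_fn(key, values) if len(values) > 1 else values[0]
        if key in output:
            # "reduce" output mode: re-reduce the new value with the stored one.
            reduced = reduce_fn(key, [output[key], reduced])
        output[key] = reduced
    return output


def map_clicks(doc):
    yield doc["page"], doc["clicks"]


def sum_values(key, values):
    return sum(values)


batch1 = [
    {"page": "/home", "clicks": 3},
    {"page": "/docs", "clicks": 1},
    {"page": "/home", "clicks": 2},
]
batch2 = [{"page": "/home", "clicks": 4}, {"page": "/blog", "clicks": 6}]

first = map_reduce(batch1, map_clicks, sum_values)
second = map_reduce(batch2, map_clicks, sum_values, previous=first)
```

The second call only processes the new batch and merges it into the earlier totals, which is the idea behind incremental aggregation over large collections. It works here because the reduce function returns the same kind of value it receives, so its output can be reduced again.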

Aggregation Framework:

  • New feature in the MongoDB 2.2.0 production release
  • Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as $match, $project, $sort, $group, $limit, $skip, $unwind, and $geoNear.
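
As a rough illustration, a pipeline can be written in Python as a list of stage documents and passed to PyMongo's `aggregate()`. The field names (`status`, `customer`, `amount`) and the helper `top_customers` are hypothetical, and the sketch assumes PyMongo 3 or later, where `aggregate()` returns a cursor.

```python
# Hypothetical field names, chosen only for illustration.
pipeline = [
    {"$match": {"status": "paid"}},           # keep only matching documents
    {"$group": {"_id": "$customer",           # one output document per customer
                "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},                 # largest totals first
    {"$limit": 5},                            # keep the top five
]


def top_customers(collection):
    """Run the pipeline on a PyMongo-style collection and return a list of results."""
    # Assumes aggregate() returns an iterable cursor (PyMongo 3+).
    return list(collection.aggregate(pipeline))
```

Each stage receives the documents produced by the stage before it, so placing $match first reduces the work done by $group and $sort.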

Output format: Returns the result set inline. Sharding: Supports both sharded and unsharded input collections. When operating on a sharded collection, the pipeline is split in two: all operations up to the first $group or $sort are pushed to every shard, and the remaining operations, starting from that first $group or $sort, run as a second pipeline on the combined shard results.
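
The pipeline split can be modelled with a small Python helper. This is a simplified sketch, not MongoDB's planner: `split_for_shards` is a hypothetical name, and the code assumes one particular reading of the rule above, namely that the first $group or $sort stage is included in the part sent to the shards and is also the first stage of the second pipeline.

```python
def split_for_shards(pipeline):
    """Split a pipeline at its first $group or $sort stage.

    Returns (shard_part, merge_part). Assumption: the first $group/$sort
    appears in both parts, once per shard and once on the combined results.
    """
    for i, stage in enumerate(pipeline):
        if "$group" in stage or "$sort" in stage:
            return pipeline[:i + 1], pipeline[i:]
    # No $group or $sort: in this model every stage runs on the shards.
    return pipeline, []


example = [
    {"$match": {"status": "paid"}},
    {"$project": {"customer": 1, "amount": 1}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
    {"$limit": 5},
]
shard_part, merge_part = split_for_shards(example)
```

Under this model, filtering stages such as $match are most useful before the first $group or $sort, because that is the only portion that runs in parallel on the shards.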

  • Designed with specific goals of improving performance and usability.
  • Pipeline operators can be repeated as needed.
  • The Aggregation Framework is reported to be around 10 times faster than MapReduce, although the actual difference depends on the workload.

Limitations:

  • If any single aggregation operation consumes more than 10 percent of system RAM, the operation produces an error.
  • Output from the pipeline cannot exceed the BSON document size limit.
  • The aggregation pipeline cannot operate on values of the following types: Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.
SENIOR DEVELOPER at MICROPYRAMID
