group() vs Aggregation Framework vs MapReduce in MongoDB


The group() command, the Aggregation Framework and MapReduce are collectively the aggregation features of MongoDB.

group():

Performs simple aggregation operations on the documents in a collection and is similar to GROUP BY in MySQL. Output format: returns the result set inline. Sharding: not supported on sharded collections. Limitations:

  • Will not group into a result set with more than 20,000 keys (from MongoDB 2.2; earlier versions were limited to 10,000 keys).
  • Results must fit within the limitations of a BSON document (currently 16MB).
  • Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
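On the server, group() ran JavaScript: you supplied a key, an initial accumulator, a reduce(curr, result) function and an optional cond filter. The sketch below is a pure-Python model of those semantics, useful for reasoning about what the command returns. It does not talk to MongoDB, and the helper name `group`, its parameters and the sample `orders` data are hypothetical.

```python
import copy


def group(docs, key_fields, initial, reduce_fn, cond=None, max_keys=20000):
    """Hypothetical in-memory model of the legacy group() command.

    key_fields -- field names to group by, like SQL GROUP BY
    initial    -- starting accumulator, deep-copied for every new group
    reduce_fn  -- reduce_fn(curr, result) folds one document into its group
    cond       -- optional predicate applied before grouping
    max_keys   -- mirrors the 20,000-key limit of MongoDB 2.2+
    """
    groups = {}
    for doc in docs:
        if cond is not None and not cond(doc):
            continue
        key = tuple(doc.get(field) for field in key_fields)
        if key not in groups:
            if len(groups) >= max_keys:
                raise ValueError("group() cannot return more than %d keys" % max_keys)
            result = {field: doc.get(field) for field in key_fields}
            result.update(copy.deepcopy(initial))
            groups[key] = result
        reduce_fn(doc, groups[key])
    # The whole result is returned inline as a single list.
    return list(groups.values())


orders = [
    {"cust": "a", "amount": 10},
    {"cust": "b", "amount": 5},
    {"cust": "a", "amount": 7},
]


def add_amount(curr, result):
    # Fold the current document's amount into the group's running total.
    result["total"] += curr["amount"]


print(group(orders, ["cust"], {"total": 0}, add_amount))
# [{'cust': 'a', 'total': 17}, {'cust': 'b', 'total': 5}]
```

Because every group lives in one in-memory result that is returned in a single response, the key limit and the 16MB document limit above follow naturally from this shape.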

MapReduce():

  • Can be used for incremental aggregation over large collections.
  • There have been significant improvements in Map/Reduce in MongoDB version 2.4. The SpiderMonkey JavaScript engine has been replaced by the V8 JavaScript engine, and there is no longer a global JavaScript lock, which means that multiple Map/Reduce threads can run concurrently.
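The contract behind MapReduce can be sketched in plain Python: the map function emits (key, value) pairs, and reduce(key, values) must return a value shaped like the emitted ones, because MongoDB may call reduce more than once for the same key and feed earlier reduce output back in. This is an in-memory model, not MongoDB's engine. The helper `map_reduce` and its `chunk_size` parameter (which forces a re-reduce in this model) are hypothetical; `finalize_fn` mirrors the real command's optional finalize step.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn, finalize_fn=None, chunk_size=2):
    """Hypothetical in-memory model of the map/reduce contract.

    map_fn(doc) yields (key, value) pairs, playing the role of emit().
    reduce_fn(key, values) must return a value shaped like the emitted ones.
    chunk_size splits each key's values so reduce is applied to partial results.
    """
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            emitted[key].append(value)

    results = {}
    for key, values in emitted.items():
        if len(values) == 1:
            # A key with a single emitted value skips reduce entirely.
            reduced = values[0]
        else:
            # Reduce each chunk, then reduce the partial results again.
            partials = [reduce_fn(key, values[i:i + chunk_size])
                        for i in range(0, len(values), chunk_size)]
            reduced = reduce_fn(key, partials) if len(partials) > 1 else partials[0]
        results[key] = finalize_fn(key, reduced) if finalize_fn else reduced
    return results


orders = [
    {"cust": "a", "amount": 10},
    {"cust": "a", "amount": 20},
    {"cust": "a", "amount": 60},
    {"cust": "b", "amount": 5},
]


def map_order(doc):
    # Emit a count and a total so partial results can be combined safely.
    yield doc["cust"], {"count": 1, "total": doc["amount"]}


def reduce_orders(key, values):
    return {"count": sum(v["count"] for v in values),
            "total": sum(v["total"] for v in values)}


def avg(key, value):
    # Divide only once, after all reducing is done.
    return value["total"] / value["count"]


print(map_reduce(orders, map_order, reduce_orders, finalize_fn=avg))
# {'a': 30.0, 'b': 5.0}
```

Keeping count and total separate and dividing only in finalize is what makes the reducer safe to re-run: averaging inside reduce would average partial averages (here (15 + 60) / 2 = 37.5 instead of 30).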

Output format: MapReduce can return its results inline or write them to a new collection, and when the output collection already exists it can replace, merge or reduce into it. Sharding: it supports both sharded and non-sharded collections as input and output. If sharded output is requested and the output collection does not exist, MapReduce creates it and shards it on the _id field. Limitations:
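A common pattern for incremental aggregation uses the reduce output action: run MapReduce only over the new documents and fold those results into the existing output collection. The sketch below models the replace, merge and reduce actions on plain dicts keyed by _id. `write_output` and the sample data are hypothetical, and inline output (which writes nothing) is not modeled.

```python
def write_output(existing, new_results, action, reduce_fn=None):
    """Hypothetical model of MapReduce output actions on an existing collection.

    existing and new_results are dicts mapping _id -> value.
    """
    if action == "replace":
        # Drop the old contents and keep only the new results.
        return dict(new_results)
    if action == "merge":
        # New results overwrite documents that share the same _id.
        merged = dict(existing)
        merged.update(new_results)
        return merged
    if action == "reduce":
        if reduce_fn is None:
            raise ValueError("the reduce action needs a reduce function")
        # Matching _ids are combined with the reduce function.
        merged = dict(existing)
        for key, value in new_results.items():
            merged[key] = reduce_fn(key, [merged[key], value]) if key in merged else value
        return merged
    raise ValueError("unknown action: %r" % action)


previous = {"a": 90, "b": 5}   # output collection from an earlier run
new_run = {"a": 10, "c": 3}    # results computed from new documents only


def add(key, values):
    return sum(values)


print(write_output(previous, new_run, "replace"))      # {'a': 10, 'c': 3}
print(write_output(previous, new_run, "merge"))        # {'a': 10, 'b': 5, 'c': 3}
print(write_output(previous, new_run, "reduce", add))  # {'a': 100, 'b': 5, 'c': 3}
```

Only the reduce action keeps totals correct across runs, which is why the reduce function has to accept its own earlier output as input.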

  • With inline output, the results come back in the command response rather than in a collection, so you cannot run find(), sort() or limit() on them.
  • A single emit can only hold half of MongoDB's maximum BSON document size (i.e. 8MB of the 16MB limit).
  • The Map/Reduce engine is still considerably slower than the aggregation framework, for two main reasons: (1) the map and reduce functions run as JavaScript inside an embedded engine, while the Aggregation Framework runs native C++ code; (2) every document examined must be converted from BSON into JavaScript objects, and if you save the output in a collection, the result set must then be converted back to BSON.

Aggregation Framework:

  • New feature in the MongoDB 2.2.0 production release
  • Uses a "pipeline" approach where documents are transformed as they pass through a series of pipeline operators such as $match, $project, $sort, $group, $limit, $skip, $unwind and $geoNear.
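To make the pipeline idea concrete, the sketch below evaluates a few stages in memory over a list of dicts, with each stage's output feeding the next. The pipeline is written in the same list-of-stages form you would pass to PyMongo's `collection.aggregate()`, but the evaluator `run_pipeline` is a hypothetical and heavily simplified model ($match does equality only, $group supports only $sum); it is not how the server executes stages.

```python
def _value(doc, expr):
    # "$field" refers to a field of the document; anything else is a literal.
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])
    return expr


def _group(docs, spec):
    # Only the $sum accumulator is modeled in this sketch.
    groups = {}
    for doc in docs:
        key = _value(doc, spec["_id"])
        if key not in groups:
            groups[key] = {"_id": key, **{name: 0 for name in spec if name != "_id"}}
        for name, accumulator in spec.items():
            if name == "_id":
                continue
            [(op, expr)] = accumulator.items()
            if op != "$sum":
                raise NotImplementedError(op)
            groups[key][name] += _value(doc, expr)
    return list(groups.values())


def run_pipeline(docs, pipeline):
    """Hypothetical in-memory evaluator: each stage's output feeds the next."""
    for stage in pipeline:
        [(op, spec)] = stage.items()
        if op == "$match":
            # Equality matching only.
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            docs = _group(docs, spec)
        elif op == "$sort":
            # Sort by the last key first so the stable sort respects key priority.
            for field, direction in reversed(list(spec.items())):
                docs = sorted(docs, key=lambda d, f=field: d[f], reverse=direction == -1)
        elif op == "$skip":
            docs = docs[spec:]
        elif op == "$limit":
            docs = docs[:spec]
        else:
            raise NotImplementedError(op)
    return docs


orders = [
    {"cust": "a", "status": "paid", "amount": 10},
    {"cust": "b", "status": "paid", "amount": 5},
    {"cust": "a", "status": "paid", "amount": 7},
    {"cust": "c", "status": "open", "amount": 50},
]
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$cust", "total": {"$sum": "$amount"}, "n": {"$sum": 1}}},
    {"$sort": {"total": -1}},
    {"$limit": 1},
]
print(run_pipeline(orders, pipeline))
# [{'_id': 'a', 'total': 17, 'n': 2}]
```

Since stages are just list entries, repeating an operator (for example two $match stages) needs no special handling.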

Output format: returns the result set inline. Sharding: supports both sharded and non-sharded input collections. When operating on a sharded collection, it pushes all operators up to and including the first $group or $sort to every shard; a second pipeline, made up of that first $group or $sort and all remaining operators, then runs on the results received from the shards.
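The split point for sharded input can be sketched as a function over the stage list. It assumes that the first $group or $sort belongs to both halves: each shard runs it over its own data, and the second pipeline runs it again to combine the partial results. `split_for_shards` is a hypothetical helper; it only locates the split and does not model how the server rewrites $group accumulators so that partial shard results combine correctly.

```python
def split_for_shards(pipeline):
    """Hypothetical sketch of where a pipeline splits on a sharded collection.

    Returns (shard_part, merge_part): shard_part runs on every shard and
    merge_part runs once over the combined shard results.
    """
    for i, stage in enumerate(pipeline):
        if "$group" in stage or "$sort" in stage:
            # The first $group/$sort appears in both halves.
            return pipeline[:i + 1], pipeline[i:]
    # No $group or $sort: the whole pipeline can run on the shards.
    return list(pipeline), []


pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$cust", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]
shard_part, merge_part = split_for_shards(pipeline)
print([next(iter(stage)) for stage in shard_part])  # ['$match', '$group']
print([next(iter(stage)) for stage in merge_part])  # ['$group', '$sort']
```

Placing $match early matters for this reason: everything before the split is filtered on the shards, so less data travels to the second pipeline.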

  • Designed with specific goals of improving performance and usability.
  • Pipeline operators can be repeated as needed.
  • The Aggregation Framework is often quoted as about 10 times faster than MapReduce, although the real speed-up depends on the workload.

Limitations:

  • If any single aggregation operation consumes more than 10 percent of system RAM, the operation produces an error.
  • Output from the pipeline cannot exceed the BSON document size limit.
  • The aggregation pipeline cannot operate on values of the following types: Symbol, MinKey, MaxKey, DBRef, Code and CodeWScope.
SENIOR DEVELOPER at MICROPYRAMID
