MongoDB Aggregation pipeline operation

Aggregation operations in MongoDB group values from multiple documents and perform operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation:

  • Aggregation Pipeline
  • Map-Reduce Function
  • Single Purpose Aggregation methods

This blog covers the usage of the Aggregation Pipeline in detail.

Aggregation Pipeline in MongoDB

The Aggregation Pipeline is a framework in which documents pass through a multi-stage pipeline. Each stage transforms the documents it receives, and the final stage returns the aggregated results.

Points to Consider for MongoDB Aggregation Pipeline

  1. An Aggregation Pipeline consists of stages
  2. Each pipeline stage performs an operation on the documents it receives
  3. A pipeline stage does not have to produce one output document per input document; it can generate new documents or filter documents out
  4. Most pipeline stages can appear multiple times in a pipeline
  5. The pipeline stages $out, $merge and $geoNear cannot appear multiple times
  6. Aggregation pipelines can be run on sharded collections
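
As a small illustration of a stage appearing more than once, the sketch below uses $match twice: once on the raw documents and once on the grouped output. The collection name CustomerOrders, the fields status, Custid and Amount, and the threshold of 100 are assumptions chosen for illustration, not part of any fixed schema.

```javascript
// Sketch only: the collection and field names (CustomerOrders, status,
// Custid, Amount) and the threshold 100 are assumptions for illustration.
const repeatedStagePipeline = [
  // First $match: filter the raw order documents.
  { $match: { status: "InProgress" } },
  // $group: produce one output document per customer.
  { $group: { _id: "$Custid", total: { $sum: "$Amount" } } },
  // Second $match: filter the grouped documents by their computed total.
  { $match: { total: { $gt: 100 } } },
];

// The call below runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.CustomerOrders.aggregate(repeatedStagePipeline).toArray());
}
```

The second $match can only run after $group because the total field does not exist before that stage.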

Customer Order Document Aggregation Example

Let's consider the CustomerOrders collection: we need to apply the $match and $group stages to its documents to return the aggregated results.

Requirement: 

  1. Match the customer orders with status = "InProgress"
  2. Group the matched documents by Customer Id and calculate the total amount for each customer

[Figure: aggregation pipeline example]

We can apply the following two pipeline stages to perform the aggregation:

  1. First Stage: Perform $match to filter the customer orders based on the status value
  2. Second Stage: Perform $group on Custid to calculate the total order amount for each customer

Aggregation Syntax:

The db.collection.aggregate() and db.aggregate() methods run the stages of an aggregation pipeline:

db.collection.aggregate( [ { <stage> }, ... ] )

CustomerOrders Aggregation MongoDB Query:

db.CustomerOrders.aggregate([
  { $match: { status: "InProgress" } },
  { $group: { _id: "$Custid", total: { $sum: "$Amount" } } }
])
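
To make the effect of the two stages concrete, here is a minimal plain-JavaScript sketch that mimics the query above on an in-memory array. The sample documents are invented for illustration, and the code only imitates what $match and $group compute; it is not how MongoDB executes the pipeline.

```javascript
// Hypothetical sample documents; the field names follow the query above.
const customerOrders = [
  { Custid: "C1", Amount: 100, status: "InProgress" },
  { Custid: "C1", Amount: 50, status: "InProgress" },
  { Custid: "C2", Amount: 75, status: "Completed" },
  { Custid: "C2", Amount: 25, status: "InProgress" },
];

// Stage 1 ($match): keep only the orders whose status is "InProgress".
const matched = customerOrders.filter((order) => order.status === "InProgress");

// Stage 2 ($group): sum Amount per Custid.
const totals = new Map();
for (const order of matched) {
  totals.set(order.Custid, (totals.get(order.Custid) || 0) + order.Amount);
}
const result = [...totals].map(([custId, total]) => ({ _id: custId, total }));

console.log(result);
// → [ { _id: 'C1', total: 150 }, { _id: 'C2', total: 25 } ]
```

The completed order for C2 is dropped by the filter step, so only 25 is counted for that customer. Note that a real $group stage does not guarantee the order of its output documents.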

MongoDB Aggregation Pipeline Expressions

Expressions in aggregation operations are evaluated at run time to produce values. Expressions can be nested and can include:

  • Field Paths: access fields in the input documents using $ as a prefix. For instance, if customer.name is a field in the customerDetails document, it can be accessed using $customer.name
  • Literals: MongoDB parses string literals that start with $ as a path to a field, and numeric/boolean literals in expression objects as projection flags. Use the $literal expression to avoid this parsing
  • System Variables: "$<field>" is equivalent to "$$CURRENT.<field>", where CURRENT is a system variable that defaults to the root of the current object
  • Expression Objects: map field names to expressions, in the form { <field1>: <expression1>, ... }
  • Expression Operators: operator expressions are similar to functions that take arguments, usually as an array: { <operator>: [ <argument1>, <argument2>, ... ] }
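
The expression kinds listed above can be sketched in a single $project stage. This example assumes a collection named customerDetails whose documents contain the customer.name field mentioned above; the output field names on the left-hand side are made up for illustration.

```javascript
// Sketch only: the output field names are made up, and customerDetails is
// assumed to be a collection whose documents contain customer.name.
const expressionStage = {
  $project: {
    // Field path: reads the nested field customer.name.
    viaFieldPath: "$customer.name",
    // System variable: the same value, written through $$CURRENT.
    viaCurrent: "$$CURRENT.customer.name",
    // Literal: $literal stops MongoDB from parsing the string as a field path.
    literalText: { $literal: "$customer.name" },
    // Expression object: maps field names to expressions.
    asObject: { name: "$customer.name" },
    // Expression operator: takes its arguments as an array.
    label: { $concat: ["Customer: ", "$customer.name"] },
  },
};

// The call below runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.customerDetails.aggregate([expressionStage]).toArray());
}
```

In the output, viaFieldPath and viaCurrent should carry the same value, while literalText holds the text "$customer.name" itself.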

MongoDB Aggregation Pipeline Behavior

The aggregate command operates on a single collection, logically passing all of its documents into the pipeline. MongoDB uses the query planner to determine whether indexes can be used to improve pipeline performance. The following pipeline stages can use an index while scanning the documents in the collection.

$match: can use an index to filter documents if it occurs at the beginning of a pipeline

$sort: can use an index if it is not preceded by a $project, $unwind, or $group stage

$group: can use an index to find the first document in each group if all of the following criteria are met:

  • the $group stage is preceded by a $sort stage that sorts on the field to group by
  • there is an index on the grouped field that matches the sort order, and
  • the only accumulator used in the $group stage is $first

$geoNear: can use a geospatial index; it must appear as the first stage in an aggregation pipeline
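
The $group criteria above can be sketched as follows. The OrderDate field and the index definition are assumptions for illustration; Custid and CustomerOrders follow the query shown earlier. Whether the index is actually used depends on the server version and the query planner.

```javascript
// Sketch only: OrderDate is a hypothetical field; Custid follows the
// CustomerOrders query shown earlier.
const firstOrderIndex = { Custid: 1, OrderDate: 1 };

const firstOrderPipeline = [
  // Sort on the grouped field first, in the same order as the index.
  { $sort: { Custid: 1, OrderDate: 1 } },
  // $first is the only accumulator used in the $group stage.
  { $group: { _id: "$Custid", firstOrderDate: { $first: "$OrderDate" } } },
];

// The calls below run only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.CustomerOrders.createIndex(firstOrderIndex);
  printjson(db.CustomerOrders.aggregate(firstOrderPipeline).toArray());
}
```

Adding any other accumulator, such as $sum, to the $group stage would break the third criterion.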

MongoDB Aggregation Pipeline Early Filtering

When only a subset of the documents in the collection is required, the $match, $limit and $skip stages can restrict the documents that enter the pipeline.

These stages reduce the number of documents the later stages must process when they are used at the beginning of the pipeline.

When a $match stage is followed by a $sort stage at the start of the pipeline, the optimizer treats the pair as a single query with a sort operation, so an index can be used in this case as well.
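
The $match-then-$sort case described above might look like the sketch below. It reuses the status and Amount fields from the CustomerOrders example; the limit of 10 and the suggested compound index are assumptions for illustration.

```javascript
// Sketch only: reuses the status and Amount fields from the CustomerOrders
// example. A compound index such as { status: 1, Amount: -1 } is an
// assumption; with it, the leading $match and $sort could be served together.
const earlyFilterPipeline = [
  // Filter first so that later stages see fewer documents.
  { $match: { status: "InProgress" } },
  // Sort the remaining orders by amount, largest first.
  { $sort: { Amount: -1 } },
  // Keep only the first ten documents.
  { $limit: 10 },
];

// The call below runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.CustomerOrders.aggregate(earlyFilterPipeline).toArray());
}
```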

MongoDB Aggregation Pipeline Limitations

The following limitations apply to aggregation operations run with the aggregate command:

  • Result size restrictions: The aggregate command either returns a cursor or stores the results in a collection. Each document in the result set is subject to the BSON document size limit (16 MB); a document that exceeds this limit causes an error.
  • Memory restrictions: Each pipeline stage can use a maximum of 100 MB of RAM, and exceeding this limit returns an error. To process large data sets, use the allowDiskUse option so that pipeline stages can write data to temporary files.
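
The allowDiskUse option is passed in the second argument of db.collection.aggregate(). The sketch below assumes the CustomerOrders collection and fields used earlier; the pipeline itself is only a stand-in for a memory-heavy operation.

```javascript
// Sketch only: the pipeline is a stand-in for a memory-heavy aggregation on
// the CustomerOrders collection used earlier.
const largeGroupPipeline = [
  { $group: { _id: "$Custid", total: { $sum: "$Amount" } } },
  { $sort: { total: -1 } },
];

// Options document: lets stages spill to temporary files when they would
// otherwise exceed the per-stage memory limit.
const aggregateOptions = { allowDiskUse: true };

// The call below runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.CustomerOrders.aggregate(largeGroupPipeline, aggregateOptions).toArray());
}
```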

MongoDB Aggregation Pipeline Optimization

Aggregation Pipeline optimization helps improve overall pipeline performance. Aggregation operations pass through an optimization phase in which the MongoDB optimizer reshapes the pipeline. To see how the optimizer transforms a particular pipeline, include the explain option in the db.collection.aggregate() method.
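
As a rough sketch of inspecting the optimizer's work, the pipeline below is deliberately written with $sort before $match. It assumes the CustomerOrders collection and fields from the earlier example. The explain output would typically show the $match moved ahead of the $sort, though the exact output format varies by server version.

```javascript
// Sketch only: deliberately written with $sort before $match, so that the
// explain output can show how the optimizer reorders the stages.
const explainedPipeline = [
  { $sort: { Amount: -1 } },
  { $match: { status: "InProgress" } },
];

// The call below runs only inside mongosh, where the global `db` exists.
// With explain: true, the method returns the plan instead of the results.
if (typeof db !== "undefined") {
  printjson(db.CustomerOrders.aggregate(explainedPipeline, { explain: true }));
}
```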

Refer to the blog below for MongoDB Aggregation Pipeline optimization examples:

MongoDB Aggregation Pipeline Optimization Examples