MongoDB Aggregation Pipeline Optimization

The Aggregation Pipeline Optimization helps in improving the overall pipeline performance. The Aggregation operations passes through the optimization phase where the MongoDB optimizer transforms the aggregation pipeline using the explain option and db.collection.aggregate() method.

Aggregation Pipeline Optimization Types

The below given are the optimization types available for Aggregation Pipeline:

  • Project Optimization
  • Pipeline Sequence Optimization
  • Pipeline Coalescence Optimization

Aggregation Pipeline – Project Optimization

The Project Optimization approach allows to determine if the subset of fields in the documents can help in achieving the required results and thus reduces the field data to be passed for the aggregation pipeline

Aggregation Pipeline –  Pipeline Sequence Optimization

The Pipeline sequence optimization allows to perform optimization on the aggregation pipeline which includes projection stage or sequence followed by $match operation. The below given are the possible pipeline sequence optimization types :

($project or $unset or $addFields or $set) + $match Sequence Optimization
  • The aggregation pipeline which includes projection stage ($project or $unset or $addFields or $set )  followed by $match operation
  • The aggregation pipeline could have multiple projection stages
  • There could be multiple $match stage including match filters
  • MongoDB performs optimization by executing $match filters before the projection stage which does not depend

 Consider the Aggregation Pipeline Stages with Sequence Optimization. The $match filter is applied at the end on the required fields from the projection stage . In such cases when the data volume is large , more processing time is consumed and $match filter is applied on the complete document data.

{ $addFields: {
maxSalary: { $max: "$salary" },
minSalary: { $min: "$salary" }
} },
{ $project: {
_id: 1, name: 1, times: 1, maxSalary: 90000, minSalary: 10000,
avgSalary: { $avg: ["$maxSalary", "$minSalary"] }
} },
{ $match: {
name: "Mohit Sharma",
maxSalary: { $lt: 90000},
minSalary: { $gt: 20000 },
avgSalary: { $gt: 45000 }
} }

Now, lets apply the $match filters on the fields which are not dependent on the Projection stage field data to apply sequence optimization 

{ $match: { name: "Mohit Sharma"} },
{ $addFields: {
maxSalary: { $max: "$salary" },
minSalary: { $min: "$salary" }
} },
{ $match: { maxSalary: { $lt: 90000 }, minSalary: { $gt: 20000 } } },
{ $project: {
_id: 1, name: 1, times: 1, maxSalary: 90000, minSalary: 10000,
avgSalary: { $avg: ["$maxSalary", "$minSalary"] }
} },
{ $match: { avgSalary: { $gt:45000 } } }

Multiple $ match filters are applied to filter the stage data based on Name, filter the stage data based on minSalary and maxSalary  and then $match filter which applicable on projection stage data.  

Here , each $match filter is reducing the document which are not applicable based on $match filter and thus improves the overall performance for the aggregation pipeline.

$sort + $match Sequence Optimization

If a sequence with $sort is followed by a $match, the $match moves before the $sort to minimize the No. of objects to sort

Consider the example of MongoDB Aggregation pipeline stage with $sort

{ $sort: { salary : -1 } },
{ $match: { status: 'Active' } }
After the optimization phase
{ $match: { status: 'Active' } },
{ $sort: { salary : -1 } }
$redact + $match Sequence Optimization
  • If the MongoDB aggregation pipeline has the $redact stage followed by $match stage, then it allows the aggregation move the portion of the $match stage before the $redact stage.
  • If  $match stage is added at the start of a pipeline, then aggregation can use an index or can query the collection to filter the documents in the pipeline

Consider the example of MongoDB Aggregation pipeline with below given stage

{ $redact: { $cond: { if: { $eq: [ "$Role", 5 ] }, then: "MANAGER", else: "DEVELOPER" } } },
{ $match: { department: "Finance", location: { $eq: "USA" } } }

In the optimization phase, the MongoDB optimizer transforms it to :


{ $match: { department: "Finance"} },
{ $redact: { $cond: { if: { $eq: [ "$Role", 5 ] }, then: "MANAGER", else: "DEVELOPER" } } },
{ $match: { department: "Finance", location: { $eq: "USA" } } }
$project/$unset + $skip Sequence Optimization

In this case of sequence optimization if a sequence with $project or $unset followed by $skip, then $skip moves before the $project

Consider the example of MongoDB aggregation pipeline with below given stages:

{ $sort: { salary : -1 } },
{ $project: { status: "Active" , name: 1 } },
{ $skip: 5 }

In the optimization phase, the MongoDB optimizer transforms it to :

{ $sort: { salary : -1 } },
{ $skip: 5 },
{ $project: { status: "Active" , name: 1 } }

 

Aggregation Pipeline –  Pipeline Coalescence Optimization

In this optimization scenario, the coalescence occurs after any sequence ordering optimization by placing a pipeline stage before its predecessor.


{ $sort : { age : -1 } },
{ $project : { age : 1, status : 1, name : 1 } },
{ $limit: 5 }

The optimizer calescence the $limit with $sort 

{
"$sort" : {
"sortKey" : {
"id" : -1,
},
"limit" : NumberLong(5)
}
},
{ "$project" : {
"id" : 1,
"status" : "OPEN",
"name" : 1
}
}
$sort + $limit Coalescence

Consider an scenario where a $sort precedes a $limit, then optimizer can coalesce the $limit into the $sort if no intervening stages (e.g. $unwind, $group) modify the number of documents

$limit + $limit Coalescence

Consider an scenario where a $limit is followed by $limit, then both $limit stages can coalesce into a single $limit considering the smaller amount from the two $limit

{ $limit: 5},
{ $limit: 12 }

The optimizer coalesce into a single $limit with the smaller $limit value

{ $limit: 5}
$skip + $skip Coalescence

Consider an scenario where a $skip is followed by $skip, then both $skip stages can coalesce into a single $skip 

{ $skip: 1 },
{ $skip: 7 }

The optimizer coalesce into a single $skip where the skip amount 8 is the sum of the two initial limits 1 and 7

{ $skip: 8}
$match + $match Coalescence

Consider an scenario where a $match is followed by $match, then both $match stages can coalesce into a single $match with an $and

{ $match: { course: "Data Science" } },
{ $match: { status: "Enrolled" } }

The optimizer coalesce into a single $match with an $and

{ $match: { $and: [ { course: "Data Science" }, { "status" : "Enrolled" } ] } }
$lookup + $unwind Coalescence

Consider an example where a $unwind is followed by $lookup and operates on the as field of the $lookup, then optimizer coalesce the $unwind into the $lookup stage to avoid large document creation

{
$lookup: {
from: "FacultyCollectionData",
as: "resultingArray",
localField: "StudentCourse",
foreignField: "AssignedFaculty"
}
},
{ $unwind: "$resultingArray"}

The optimizer coalesce the $unwind into the $lookup stage

{
$lookup: {
from: "FacultyCollectionData",
as: "resultingArray",
localField: "StudentCourse",
foreignField: "AssignedFaculty",
unwinding: { preserveNullAndEmptyArrays: false }
}
}
$sort + $skip + $limit Sequence

Consider a pipeline contains a sequence of $sort followed by a $skip followed by a $limit

{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }

Then optimizer will transforms it into:

{
"$sort" : {
"sortKey" : {
"age" : -1
},
"limit" : NumberLong(15)
//MongoDB increases the $limit amount with the $skip amount when uses reordering

}
},
{
"$skip" : NumberLong(10)
}

 

 

One thought on “MongoDB Aggregation Pipeline Optimization

Comments are closed.