MongoDB Aggregation Pipeline $bucket & $bucketAuto Stages

The $bucket and $bucketAuto stages of the MongoDB Aggregation Pipeline, used with the aggregate method, categorize the incoming documents of a collection into groups called buckets.

Points to Consider for $bucket Stage

  • $bucket categorizes the incoming documents of a collection into buckets
  • The groupBy expression is evaluated for each document, and the result decides which range in boundaries, and therefore which bucket, the document falls into
  • A default value must be specified when any document has a groupBy value outside of the boundaries
  • A default value must also be specified when any document has a groupBy value of a different BSON type than the values in boundaries
  • If the groupBy expression resolves to an array or a document, $bucket arranges the input documents into buckets using the comparison logic of $sort
  • $bucket only produces output documents for buckets that contain at least one input document
$bucket Syntax
{
  $bucket: {
    groupBy: <expression>,
    boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
    default: <literal>,
    output: {
      <output1>: { <$accumulator expression> },
      ...
      <outputN>: { <$accumulator expression> }
    }
  }
}
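
As an illustration of the syntax above, here is a minimal sketch of a $bucket pipeline. The collection name studentScores and the fields score and student are assumptions made for this example; they do not come from the original data. The guard around db lets the snippet load outside mongosh without errors.

```javascript
// Hypothetical pipeline: the collection (studentScores) and the fields
// (score, student) are assumed for illustration only.
const bucketPipeline = [
  {
    $bucket: {
      groupBy: "$score",              // value used to pick a bucket
      boundaries: [0, 40, 70, 100],   // buckets: [0,40), [40,70), [70,100)
      default: "other",               // _id of the bucket for everything else
      output: {
        count: { $sum: 1 },               // number of documents per bucket
        students: { $push: "$student" }   // student names collected per bucket
      }
    }
  }
];

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.studentScores.aggregate(bucketPipeline).forEach((doc) => printjson(doc));
}
```

With these boundaries a score of exactly 100 would land in the "other" bucket, because the upper bound of the last range is exclusive.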
$bucket document fields

The $bucket document contains the following fields:

Field (Type): Description

groupBy (expression): The expression by which the documents are grouped. Unless $bucket includes a default specification, each input document must resolve the groupBy expression to a value within one of the ranges specified by boundaries.

boundaries (array): An array of values, based on the groupBy expression, that specify the boundaries for each bucket. The values must be in ascending order and all of the same type. Each adjacent pair of values acts as the inclusive lower boundary and the exclusive upper boundary of a bucket.

Example of boundaries with lower and upper bounds:
An array of [ 4, 8, 15 ] creates two buckets:

[4, 8) with inclusive lower bound 4 and exclusive upper bound 8.
[8, 15) with inclusive lower bound 8 and exclusive upper bound 15.

default (literal): Optional. The _id of an additional bucket that contains all documents whose groupBy expression result does not fall into a bucket specified by boundaries. If it is omitted, every input document must resolve to a value inside one of the boundaries ranges, or the operation returns an error.

The default value must be less than the lowest boundaries value, or greater than or equal to the highest boundaries value. It can be of a different type than the entries in boundaries.

output (document): Optional. Specifies the fields to include in the output documents, in addition to the _id field, using accumulator expressions. If output is omitted, each bucket document contains a count field with the number of documents in the bucket. If output is specified, only the listed fields are returned, so count must be included explicitly if it is needed.

output: {
<outputfield1>: { <accumulator>: <expression1> },
...
<outputfieldN>: { <accumulator>: <expressionN> }
}
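
The boundary rules described above can be modelled in a few lines of plain JavaScript. This is only a simplified sketch for numeric values: the function name findBucket is hypothetical, and the model ignores BSON type comparison. It is not how MongoDB implements $bucket internally.

```javascript
// Simplified model of how $bucket picks a bucket for a numeric value.
// findBucket is a hypothetical helper; only numbers are handled.
function findBucket(value, boundaries, defaultId) {
  for (let i = 0; i < boundaries.length - 1; i++) {
    // The lower bound is inclusive, the upper bound is exclusive.
    if (value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i]; // $bucket uses the lower bound as the bucket _id
    }
  }
  if (defaultId === undefined) {
    // Without a default, $bucket reports an error for such a value.
    throw new Error("value " + value + " is outside the boundaries");
  }
  return defaultId;
}

const boundaries = [4, 8, 15];
console.log(findBucket(4, boundaries, "other"));  // 4
console.log(findBucket(8, boundaries, "other"));  // 8
console.log(findBucket(15, boundaries, "other")); // other
```

The last call shows why the highest boundary value itself belongs to the default bucket: 15 is the exclusive upper bound of [8, 15).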

$bucketAuto Stage:

The $bucketAuto stage is similar to $bucket in that it groups the incoming documents into buckets, but $bucketAuto automatically determines the bucket boundaries in an attempt to distribute the documents evenly across the specified number of buckets.

Points to Consider for $bucketAuto Stage

  • $bucketAuto groups the incoming documents into buckets
  • It automatically determines the bucket boundaries in an attempt to distribute the documents evenly across the specified number of buckets
  • The _id.min field indicates the inclusive lower bound of the bucket
  • The _id.max field indicates the exclusive upper bound of the bucket
  • The final bucket in the series has an inclusive upper bound
  • A count field contains the number of documents in the bucket; it is included by default when output is not specified

$bucketAuto Syntax

{
  $bucketAuto: {
    groupBy: <expression>,
    buckets: <number>,
    output: {
      <output1>: { <$accumulator expression> },
      ...
    },
    granularity: <string>
  }
}
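
A minimal sketch of a concrete $bucketAuto stage may help here. As before, the collection name studentScores and the field score are assumptions made for this example. The optional granularity field is left out.

```javascript
// Hypothetical pipeline: studentScores and score are assumed names.
const bucketAutoPipeline = [
  {
    $bucketAuto: {
      groupBy: "$score",   // value used to order and split the documents
      buckets: 3,          // requested number of buckets
      output: {
        count: { $sum: 1 },                // must be listed explicitly when output is given
        averageScore: { $avg: "$score" }   // average score inside each bucket
      }
    }
  }
];

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.studentScores.aggregate(bucketAutoPipeline).forEach((doc) => printjson(doc));
}
```

Each result document carries an _id of the form { min: ..., max: ... } followed by the fields named in output.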

The $bucketAuto document contains the following fields:

Field (Type): Description

groupBy (expression): The expression by which the incoming documents are grouped.

buckets (integer): A positive 32-bit integer that specifies the number of buckets into which the input documents are grouped.

output (document): Optional. Specifies the fields to include in the output documents, in addition to the _id field, using accumulator expressions. The default count field is not included when output is specified, so it must be added explicitly to the output document if it is needed.

output: {
<outputfield1>: { <accumulator>: <expression1> },
...
count: { $sum: 1 }
}

granularity (string): Optional. Specifies the preferred number series to use so that the calculated boundary edges end on preferred round numbers or their powers of 10. It is available only if all groupBy values are numeric and none of them is NaN.

Supported values for granularity are:

“R5”
“R10”
“R20”
“R40”
“R80”
“1-2-5”
“E6”
“E12”
“E24”
“E48”
“E96”
“E192”
“POWERSOF2”

If the groupBy expression refers to an array or a document, the values are arranged using the same ordering as in $sort before the bucket boundaries are determined.
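
To make the _id.min and _id.max bounds described above more concrete, here is a simplified model in plain JavaScript. It is a sketch under stated assumptions, not MongoDB's actual algorithm: the function name autoBuckets is hypothetical, only numbers are handled, granularity is ignored, and equal values are not kept together in one bucket.

```javascript
// Simplified model of $bucketAuto for numbers: sort the values, then cut
// them into `buckets` contiguous chunks of near-equal size.
// autoBuckets is a hypothetical helper, not part of MongoDB.
function autoBuckets(values, buckets) {
  const sorted = [...values].sort((a, b) => a - b);
  const result = [];
  let start = 0;
  for (let i = 0; i < buckets && start < sorted.length; i++) {
    // Divide what is left evenly among the buckets still to be filled.
    const remaining = sorted.length - start;
    const size = Math.ceil(remaining / (buckets - i));
    const end = start + size; // index one past the last value of this chunk
    const isLast = end >= sorted.length;
    result.push({
      _id: {
        min: sorted[start], // inclusive lower bound
        // Exclusive upper bound, except for the final bucket (inclusive).
        max: isLast ? sorted[sorted.length - 1] : sorted[end]
      },
      count: size
    });
    start = end;
  }
  return result;
}

console.log(JSON.stringify(autoBuckets([10, 20, 30, 40, 50, 60, 70], 3)));
```

In this model the max of one bucket equals the min of the next, which mirrors the exclusive upper bound and inclusive lower bound of adjacent buckets.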

 


MongoDB Aggregation Pipeline $addFields Stage

The $addFields stage is one of the stages available in the MongoDB Aggregation Pipeline. It adds new fields to documents. The output documents contain all existing fields from the input documents plus the fields added by the $addFields stage.

Points to Consider for $addFields Stage:

  • $addFields appends new fields to existing documents
  • An aggregation operation can include one or more $addFields stages
  • $addFields can add fields to embedded documents (including documents in arrays) using dot notation
  • $addFields can add an element to an existing array field using $concatArrays

$addFields Syntax: 

{ $addFields: { <newField1>: <expression1>, <newField2>: <expression2>,... } }
Aggregation Pipeline with $addFields example

Consider a collection named studentMarks with the following documents.

studentMarks
{
_id: 1,
subject: "Computer Science",
student: "Mohit Sharma",
assignment: [ 14, 17 ],
test: [ 18, 12 ],
extraCredit: 15
}
{
_id: 2,
subject: "Computer Science",
student: "Rohan Kapoor",
assignment: [ 18,16 ],
test: [ 14,16 ],
extraCredit: 14
}

We need to add three new fields to the output documents: assignmentTotal, testTotal and totalMarks.

db.studentMarks.aggregate( [
  {
    $addFields: {
      assignmentTotal: { $sum: "$assignment" },
      testTotal: { $sum: "$test" }
    }
  },
  {
    $addFields: {
      totalMarks: { $add: [ "$assignmentTotal", "$testTotal", "$extraCredit" ] }
    }
  }
] )

The operation returns the output documents, which include the three new fields:

{
"_id": 1,
"subject": "Computer Science",
"student": "Mohit Sharma",
"assignment": [ 14, 17 ],
"test": [ 18, 12 ],
"extraCredit": 15,
"assignmentTotal" : 31,
"testTotal" : 30,
"totalMarks" : 76
}
{
"_id": 2,
"subject": "Computer Science",
"student": "Rohan Kapoor",
"assignment": [ 18,16 ],
"test": [ 14,16 ],
"extraCredit": 14,
"assignmentTotal" : 34,
"testTotal" : 30,
"totalMarks" : 78
}
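
The example above uses two $addFields stages because a field added by a stage is only visible to later stages, not to other expressions in the same stage. When the intermediate totals are not needed in the output, a single stage can nest the $sum expressions inside $add. The following is a sketch of that variant, with a plain JavaScript check of the same arithmetic for the first document; the variable names are chosen for illustration.

```javascript
// Single-stage variant: totalMarks is computed without the intermediate
// assignmentTotal and testTotal fields (they are not added to the output).
const singleStagePipeline = [
  {
    $addFields: {
      totalMarks: {
        $add: [ { $sum: "$assignment" }, { $sum: "$test" }, "$extraCredit" ]
      }
    }
  }
];

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.studentMarks.aggregate(singleStagePipeline).forEach((doc) => printjson(doc));
}

// Plain JavaScript check of the same arithmetic for the document with _id 1.
const firstDoc = { assignment: [14, 17], test: [18, 12], extraCredit: 15 };
const sumArray = (arr) => arr.reduce((acc, n) => acc + n, 0);
const expectedTotal =
  sumArray(firstDoc.assignment) + sumArray(firstDoc.test) + firstDoc.extraCredit;
console.log(expectedTotal); // 76
```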
Adding Fields to an Embedded Document

New fields can be added to embedded documents using dot notation. Consider a carModels collection with the following documents:

{ _id: 1,
model: "Ford",
specs: { capacity: 5, wheels: 4 , doors:4}
}

{ _id: 2,
model: "Toyota",
specs: { capacity: 5, wheels: 2 , doors: 2 }
}

Add the new field gear to the embedded specs document:

db.carModels.aggregate( [
{
$addFields: {
"specs.gear": "automatic"
}
}
] )

The aggregation operation includes the new field gear in the embedded specs document of each output document:

{ _id: 1,
model: "Ford",
specs: { capacity: 5, wheels: 4 , doors:4, gear: "automatic"}
}

{ _id: 2,
model: "Toyota",
specs: { capacity: 5, wheels: 2 , doors: 2 , gear: "automatic"}
}
Overwriting an existing field

If $addFields specifies a field that already exists, the value provided in $addFields replaces the existing value. Consider the following document in the studentBranch collection:

{ _id: 1, name: "Mohit Sharma", branch: "Computer Science" }

The $addFields stage sets branch to "Java Programming":

db.studentBranch.aggregate( [
  {
    $addFields: { "branch": "Java Programming" }
  }
] )

The aggregation operation replaces the branch value in the output document:

{ _id: 1, name: "Mohit Sharma", branch: "Java Programming" }
Adding an Element to an Array

To add an element to an existing array field, use $addFields together with $concatArrays.

$concatArrays concatenates its array arguments and returns the combined array:

{ $concatArrays: [ <array1>, <array2>, ... ] }

Consider the items collection with the following documents:

{ "_id" : 1, item: [ "icecream" ], type: [ "butterscotch", "strawberry" ] }
{ "_id" : 2, item: [ "shakes"] , type: ["apple", "banana" ] }

Add the new element "chocolate" to the type array of the document with _id 1:

db.items.aggregate( [
  { $match: { _id: 1 } },
  { $addFields: { type: { $concatArrays: [ "$type", [ "chocolate" ] ] } } }
] )

Because of the $match stage, the aggregation operation returns only the document with _id 1, with "chocolate" appended to type:

{ "_id" : 1, item: [ "icecream" ], type: [ "butterscotch", "strawberry" , "chocolate"] }
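
One caveat worth illustrating: $concatArrays returns null if any argument resolves to null or refers to a missing field. The sketch below guards against that with $ifNull, so that a document without a type array still ends up with [ "chocolate" ]. The small concatArrays function is a hypothetical plain JavaScript model of the null behaviour, included only for illustration.

```javascript
// Guarded variant: $ifNull substitutes an empty array when type is
// missing or null, so $concatArrays never receives a null argument.
const safeAppendPipeline = [
  {
    $addFields: {
      type: {
        $concatArrays: [ { $ifNull: [ "$type", [] ] }, [ "chocolate" ] ]
      }
    }
  }
];

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.items.aggregate(safeAppendPipeline).forEach((doc) => printjson(doc));
}

// Hypothetical plain JavaScript model of the null behaviour of $concatArrays.
function concatArrays(...arrays) {
  if (arrays.some((a) => a === null || a === undefined)) return null;
  return [].concat(...arrays);
}

console.log(JSON.stringify(concatArrays(["apple", "banana"], ["chocolate"])));
console.log(JSON.stringify(concatArrays(undefined, ["chocolate"]))); // null
```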

$set Stage

$set is an alias for $addFields, available from MongoDB 4.2. Both stages behave identically.

{ $set: { <newField1>: <expression1>, <newField2>: <expression2>,... } }

Points to Consider for $set Stage:

  • $set appends new fields to existing documents
  • An aggregation operation can include one or more $set stages
  • $set can add fields to embedded documents (including documents in arrays) using dot notation
  • $set can add an element to an existing array field using $concatArrays
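
Since $set is an alias, the earlier studentMarks pipeline can be written with $set in place of $addFields. The following is a minimal sketch of that rewrite; it assumes MongoDB 4.2 or later and the same studentMarks collection used earlier.

```javascript
// The studentMarks example rewritten with $set (requires MongoDB 4.2+).
const setPipeline = [
  {
    $set: {
      assignmentTotal: { $sum: "$assignment" },
      testTotal: { $sum: "$test" }
    }
  },
  {
    // The totals added by the first stage are visible in this later stage.
    $set: {
      totalMarks: { $add: [ "$assignmentTotal", "$testTotal", "$extraCredit" ] }
    }
  }
];

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.studentMarks.aggregate(setPipeline).forEach((doc) => printjson(doc));
}
```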