Basics of aggregation queries in MongoDB?

MongoDB provides a wonderful feature to combine many queries to get combined data. That is aggregations queries.

Why to use these aggregation queries ?

Basic CRUD operations can give all the data with specified filter from collection, but what if we want data which satisfies some conditions. We can get the data and then query these conditions in whatever language we are using in code. But this increase response time from DB as a lot of data need to be transferred over network and all that data further need to go through the required computation on local. Instead we can make aggregation queries to make all the computation in Mongo servers and fetch required data only. This way we can reduce the load on application server.

How to use aggregation queries?

Basically there are three types of aggregation queries

  • Pipeline Queries
  • Single Queries
  • MapReduce Queries

Pipeline queires are like a set of operations/stages arranged in a pipeline kind of mechanism. All the queries in the pipeline executes one after the other utilising the data generated from the previous operations/stages.

Single Queries are kind of pre-defined set of functions in MongoDB which we can directly use on a collection to get the data.

MapReduce is a way we can execute MapReduce technique on MongoDB collections.

Here I will focus on Pipeline queries. I will be using the MongoDB Compass tool for generating all these queries.

I have a database of student. This database has three collections, Students details collection students_db , student exam result with marks and ranks in two subjects subject_1_db . Collection students_db has headers name, age, gender and roll_num for id. Collection subject_1_db has student name, roll_num, marks, rank, gender but some students didn’t take the exam, some students didn’t fill the gender columns.

So lets say I want to get the count of all the students.

I can get this data from students_db collection. So my base collection is fixed.

Now I open my base collection and Navigate to aggregation section in Mongo Compass.

Here I add a stage and from the options that I can use in Mongo Queries I select count and select the field roll_num/name which give the count of the occurance of that particular field in the whole collection documents. This will be our students count. We can directly see the data on the right side of query. Here I have a total of 14 students.

Now I want to seperate all the students with age 12. It is similar to filter in mongo queries. Here we use something called match for that.

I add a stage, select match and enter “age : 12” key value pair in the parameters needed. Take care about the datatype of value you check for when entering data. We can see the data with on the right hand side.

Now I want to perform another operation on the data I got from previous query. I want to get only name and gender of all the data as I already know the age from my query.

I add another stage after my query and select project for give name and gender to be projected.

Why do I see _id even though I restricted to name and gender ?

Basically _id is something that is used to distinguish different documents in MongoDB, so if you don’t want to get _id in your data then you need to specifically mention it in your project statement. So now I add “_id : 0” to my project statement.

OK, enough with introductions, let’s go to exam marks collections.

Now I want to know all those students who scored more than 20 marks in subject_1.

OK, I select match and then give the condition that marks should be grater or equal to 20. But some data don’t have gender, and there is no age in the data, but we have this data in students_db collection.

Normally I would get the students_db data to some variable in code and then subject_1 data to another variable, check which data should be mapped to which one and finally make a combined data in the format I need with all the data needed.

MongoDB Aggregation tools make all this simple for us. We have something called lookup which is similar to JOIN in SQL Tables.

Now I select lookup, give the name of the collection which we should use for lookup, select local variable and foreign variable which should match and what should the data be named as. This will give an array of documents that matches the field specified.

Hey wait but all I need is just the age and correct gender data. Why am I getting all this ?

Because we didn’t specify what we want, basically no project statement/ operation is specified on the new collection that we are performing the lookup operation on.

How do I specify what exactly I want?

One way is to use project again on whatever data we get from that collection, but it is like adding another stage, we are getting extra data and then removing it. Instead we can do a pipeline as a operation in lookup, yup , a pipeline inside a pipeline. We use let key word to create a variable and store the data value we need to match with, then mention that we are using a pipeline with the keyword in parameters, value will be a list of all the instructions that we need to give to get the data.

We might need to add this manually as we don’t have select from a menu.

OK, why do you see ‘$’ a lot?

It is to specify that we are using the inbuilt functionality of the Mongo .

Why is the structure of match statement different ?

We are using a pipeline inside a pipeline, so the scope in which we apply the mongo filter checks need to change, for that we use expr .

I am able to see the data I need perfectly, nice.

How do I convert all this complex structure into query to use in my application?.

Mongo Compass provides a very good tool for this also. Just click on the convert button on right of save and you can see the aggregated query statement and the converted form of that query in the language that you select from the drop-down menu. We can also get the import statements and driver syntax from here easily.

We have many more things in Mongo Aggregations we can use to get the data perfectly structured. Please refer to this link to read more about Mongo Aggregation Queries.