jayjayswal
10/7/2019 - 10:31 AM

Cheat Code - aggregation

// To run an aggregation in MongoDB, use the aggregate() method.
db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)


// SQL equivalent: SELECT by_user, COUNT(*) FROM mycol GROUP BY by_user
db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
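// To see what $group with {$sum : 1} computes, here is a minimal plain-JavaScript sketch over an in-memory array.
// It only illustrates the semantics and is not how MongoDB runs the stage; sampleDocs and countByUser are
// hypothetical names, and the sample values are made up.

```javascript
// Hypothetical documents standing in for the mycol collection.
const sampleDocs = [
  { by_user: "alice", likes: 10 },
  { by_user: "alice", likes: 30 },
  { by_user: "bob", likes: 5 },
];

// Sketch of {$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}:
// one output document per distinct by_user, counting its input documents.
function countByUser(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.by_user, (counts.get(doc.by_user) || 0) + 1);
  }
  return Array.from(counts, ([_id, num_tutorial]) => ({ _id, num_tutorial }));
}

console.log(countByUser(sampleDocs));
```

// Unlike this sketch, MongoDB does not guarantee the order of the documents that $group returns.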


// Sum of likes per user.
db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
// Average of likes per user.
db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
// Minimum likes value per user.
db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
// Maximum likes value per user.
db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
// Appends each value to an array in the resulting document.
db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
// Appends each value to an array in the resulting document, but skips duplicates.
db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
// Takes the value from the first document in each group.
// Typically this only makes sense after a previously applied $sort stage.
db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
// Takes the value from the last document in each group.
// Typically this only makes sense after a previously applied $sort stage.
db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])
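// A sketch of $first and $last combined with a preceding $sort stage, as the comments above suggest.
// It assumes the same mycol documents with by_user, url and likes fields; sortedGroupPipeline,
// most_liked_url and least_liked_url are names chosen here for illustration.

```javascript
// Sort by likes descending first, so that within each group $first sees the
// most-liked document and $last sees the least-liked one.
const sortedGroupPipeline = [
  { $sort: { likes: -1 } },
  {
    $group: {
      _id: "$by_user",
      most_liked_url: { $first: "$url" },
      least_liked_url: { $last: "$url" },
    },
  },
];

// Only runs inside a MongoDB shell, where the db object exists.
if (typeof db !== "undefined") {
  printjson(db.mycol.aggregate(sortedGroupPipeline).toArray());
}
```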
Pipeline Concept
In a UNIX shell, a pipeline runs a command on some input 
and passes its output as the input to the next command, and so on. 
MongoDB supports the same concept in its aggregation framework. 
A pipeline is a sequence of stages; each stage takes a set of documents as input and 
produces a resulting set of documents 
(or the final resulting JSON document at the end of the pipeline). 
That output then becomes the input of the next stage, and so on.
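The chaining described above can be sketched in plain JavaScript, with each stage modelled as a function from an array of documents to a new array. This is an illustration only, not MongoDB's implementation; matchStage, sortStage, limitStage, runPipeline and the sample data are all hypothetical.

```javascript
// Each "stage" takes an array of documents and returns a new array.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const sortStage = (field, direction) => (docs) =>
  [...docs].sort((a, b) => direction * (a[field] - b[field]));
const limitStage = (n) => (docs) => docs.slice(0, n);

// Run the stages in order, feeding each stage's output into the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Made-up input documents.
const pipelineInput = [
  { by_user: "alice", likes: 10 },
  { by_user: "bob", likes: 5 },
  { by_user: "carol", likes: 30 },
  { by_user: "dave", likes: 20 },
];

const pipelineOutput = runPipeline(pipelineInput, [
  matchStage((doc) => doc.likes >= 10), // like $match
  sortStage("likes", -1),               // like {$sort : {likes : -1}}
  limitStage(2),                        // like {$limit : 2}
]);

console.log(pipelineOutput);
```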

The aggregation framework provides the following stages, among others −
$project − Selects specific fields from the documents passed to it.
$match − Filters documents, which reduces the number of documents passed to the next stage.
$group − Performs the actual aggregation, as shown above.
$sort − Sorts the documents.
$skip − Skips a given number of documents from the start of the input.
$limit − Passes on only the given number of documents, starting from the current position.
$unwind − Expands an array field: each input document yields one output document per array element. Arrays effectively pre-join data, and $unwind undoes that to give individual documents again, so this stage usually increases the number of documents for the next stage.
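A sketch of one pipeline that uses each of the stages listed above. It assumes the mycol documents also carry a tags array, which is a hypothetical field not present in the earlier examples; stagePipeline, tag and total_likes are likewise names chosen for illustration.

```javascript
const stagePipeline = [
  { $match: { likes: { $gte: 10 } } },  // keep documents with at least 10 likes
  { $unwind: "$tags" },                 // one document per element of the (hypothetical) tags array
  { $group: { _id: "$tags", total_likes: { $sum: "$likes" } } }, // total likes per tag
  { $sort: { total_likes: -1 } },       // highest totals first
  { $skip: 1 },                         // skip the top result
  { $limit: 5 },                        // keep the next five
  { $project: { _id: 0, tag: "$_id", total_likes: 1 } }, // rename _id to tag
];

// Only runs inside a MongoDB shell, where the db object exists.
if (typeof db !== "undefined") {
  printjson(db.mycol.aggregate(stagePipeline).toArray());
}
```

Placing $match first is a common choice because it reduces the number of documents that the later stages have to process.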
// As per the MongoDB documentation, map-reduce is a data processing paradigm for condensing
// large volumes of data into useful aggregated results. MongoDB uses the mapReduce command
// for map-reduce operations. It is generally used for processing large data sets.

db.collection.mapReduce(
   function() { emit(key, value); },                  // map function
   function(key, values) { return reduceFunction; },  // reduce function
   {
      out: collection,
      query: document,
      sort: document,
      limit: number
   }
)

// map is a JavaScript function that maps a value to a key and emits the key-value pair
// reduce is a JavaScript function that combines all the values emitted for the same key into a single result
// out specifies the location of the map-reduce result
// query specifies the optional selection criteria for the input documents
// sort specifies the optional sort criteria for the input documents
// limit specifies the optional maximum number of documents passed to the map function
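// The roles of map, reduce, query and limit can be sketched in plain JavaScript over an in-memory array.
// This is only an illustration, not how MongoDB implements mapReduce. mapReduceSketch and samplePosts are
// hypothetical, and emit is passed in as an argument here, whereas in MongoDB it is available inside the map function.

```javascript
// In-memory sketch: filter (query), cap the input (limit), map, group by key, reduce.
function mapReduceSketch(docs, mapFn, reduceFn, options = {}) {
  const { query = () => true, limit = Infinity } = options;
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // As in MongoDB, the current document is bound to `this` inside the map function.
  docs.filter(query).slice(0, limit).forEach((doc) => mapFn.call(doc, emit));
  // MongoDB does not call reduce for a key with a single value; the sketch mirrors that.
  return Array.from(emitted, ([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

// Made-up sample data.
const samplePosts = [
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "anna", status: "disabled" },
  { user_name: "anna", status: "active" },
];

const sketchTotals = mapReduceSketch(
  samplePosts,
  function (emit) { emit(this.user_name, 1); },
  (key, values) => values.reduce((sum, v) => sum + v, 0),
  { query: (doc) => doc.status === "active" }
);

console.log(sketchTotals);
```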


// Example document in the posts collection:
{
   "post_text": "tutorialspoint is an awesome website for tutorials",
   "user_name": "mark",
   "status":"active"
}

// Now run mapReduce on the posts collection: select all active posts, group them by user_name,
// and count the number of posts by each user.
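db.posts.mapReduce(
   function() { emit(this.user_name, 1); },               // map: one count per post, keyed by user_name
   function(key, values) { return Array.sum(values); },   // reduce: add up the counts for each user
   {
      query: { status: "active" },
      out: "post_total"
   }
)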

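// Same operation; chaining .find() on the result lists the documents written to post_total.
db.posts.mapReduce(
   function() { emit(this.user_name, 1); },
   function(key, values) { return Array.sum(values); },
   {
      query: { status: "active" },
      out: "post_total"
   }
).find()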
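
// The same per-user count can also be written with the aggregation framework from the first part of this sheet.
// mapReduce is deprecated as of MongoDB 5.0, so this form is usually the one to prefer on newer servers.
// A sketch assuming the posts collection described above; postTotalPipeline is a name chosen here.

```javascript
const postTotalPipeline = [
  { $match: { status: "active" } },                       // same filter as query
  { $group: { _id: "$user_name", value: { $sum: 1 } } },  // count posts per user
  { $out: "post_total" },                                 // write results to post_total, like out
];

// Only runs inside a MongoDB shell, where the db object exists.
if (typeof db !== "undefined") {
  db.posts.aggregate(postTotalPipeline).toArray(); // run the pipeline; $out returns no documents
  printjson(db.post_total.find().toArray());
}
```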