indexes

It’s more important to have the index in memory than the data.

To create an index:

db.students.ensureIndex({ student_id: 1 })

To create a multi-key index:

db.students.ensureIndex({ student_id: 1, 'class': 1 })

Inserts that are legit:

db.students.insert({student_id: 1, class: ['a', 'b']})
db.students.insert({student_id: [1, 2], class: 'science'})

Inserts that are not legit with a multi-key index:

db.students.insert({student_id: [1, 2], class: ['a', 'b']})

Creating a unique index

db.students.ensureIndex({student_id:1}, {unique: 1})

removing duplicates when creating unique index: use dropDups

db.students.ensureIndex({student_id:1}, {unique: true, dropDups: true})

Sparse Indexes

used when key is missing from document.

{a:1, b:2, c:5}
{a:2, b:3, c:6}
{a:3, b:4} // c: null
{a:4, b:5} // c: null

sparse index will only create index entries for documents that contain key that is indexed. In the above example, the bottom two document would not be added to the sparse index. So they would never be returned by queries that use the sparse index.

To create a sparse index:

db.things.ensureIndex({c:1}, {unique: true, sparse: true})

Create indexes in the background

By default creating indexes happens in the foreground, but blocks writers. Mongo has a per database lock.

You can run the index creation in the background, but this is slower. You can run one background job at a time.

db.things.ensureIndex({c:1}, {background: true})

Index Stats

db.students.stats()
db.students.totalIndexSize()
db.students.getIndexes()

index cardinality

Providing Hints

Hint help mongo use a specific index for a given query.

db.students.find({a:1,b:2,c:3}).hint({c:1}) // use c index
db.students.find({a:1,b:2,c:3}).hint({$natural: 1}) // no index

efficiency of index use

$lt, $gt, don’t work well * regex only use index when anchored to start. e.g. /^hello/

geospatial indexes

2D model:

db.places.ensureIndex({location: '2d', type: 1})

db.places.find({location: { $near: [x, y] }}).limit(20)

Geospatial Spherical

2D sphere index

db.places.ensureIndex({'location':'2dsphere'})
  
// longitude: -130, latitude: 39
db.stores.find({ 
  location: { 
    $near:{ 
      $geometry: { 
        type: "Point",  
        coordinates: [-130, 39]
      }, 
      $maxDistance:1000000 
    } 
  } 
})

full text search index

Add a text index then use the $text operator.

db.sentences.ensureIndex({words: 'text'})
db.sentences.find({ $text: {$search: 'dog'} })

low slow queries

Automatically logs queries taking longer than 100ms to mongod log.

profiler

write entries to system.profile collection for queries that take longer than a specified time.

Level 0: off Level 1: log slow queries Level 2: log all queries

db.getProfilingLevel()
db.getProfilingStatus()
db.setProfilingLevel(1, 4) // level 1, 4 ms limit
  1. Indexes are critical
  2. Use explain
  3. Use Hints
  4. Use Profiler

mongotop

where is mongo spending it’s time.

mongostat

named after iostat. more systemy level info. idx miss % is the miss rate to the index in memory.

sharding

split large collection amongst servers. single mongos router. mongos routes to several mongod. shard key decides which server to route to.

inserts must have the shard key update, remove, find will broadcast to all shards unless the shardkey is in the query.

aggregation pipeline

To get a count of all the products by each manufacturer.

db.products.aggregate([
  {
    $group: {
      _id: '$manufacturer',
      products_count: { $sum: 1 }
    }
  }
  ])

add to set

add to set to aggregate a set of results from each document in the group by

db.zips.aggregate([
  {
    $group: {
      _id: "$city",
      postal_codes: { $addToSet: '$postal_code' }
    }
  }
  ])

push

similar to add to set. does not guarantee unique items like in a set.

db.products.aggregate([{
    $group: {
      _id: '$manufacturer',
      categories: { $push : '$category' }
    }
  }])

max

returns the max result for each group.

db.products.aggregate([{
    $group: {
      _id: '$manufacturer',
      max_price: { $max : '$price' }
    }
  }])

double grouping

run an aggregation stage more than once.

db.fun.find()
  { \_id: 0, a: 0, b: 0 c: 21 }
  { \_id: 1, a: 0, b: 0 c: 54 }
  { \_id: 2, a: 0, b: 1 c: 52 }
  { \_id: 3, a: 0, b: 1 c: 17 }
  { \_id: 4, a: 1, b: 0 c: 22 }
  { \_id: 5, a: 1, b: 0 c: 5 }
  { \_id: 6, a: 1, b: 1 c: 87 }
  { \_id: 7, a: 1, b: 1 c: 97 }

  db.fun.aggregate([{
      $group: { 
        _id: { a: '$a', b: '$b' },
        c: { $max: '$c' }
      }
    },
    {
      $group: { 
        _id: '$_id.a',
        c: { $min: '$c' }
      }
    }
  ])

$project

reshape a document through pipeline. remove keys, add keys, reshape the keys. use simple functions on keys.

used to clean documents, and eliminate or cherry pick certain keys. used to filter and drop a bunch of data from a query.

db.products.aggregate([
    {
      $project: {
        _id: 0,
        maker: { $toLower: '$manufacturer' },
        details: { 
          category: '$category', 
          price: { '$multiply' ['$price', 10] }
        },
        item:  '$name'
      }
    }
  ])

$match

performs filtering. n to 1 operation. see if each doc matches the criteria. if it does it will move to next stage in pipeline.

db.zips.aggregate([
  {
    $match: {
      state: 'CA'
    }
  },
  {
    $group: {
      _id: '$city',
      population: { $sum: '$pop' },
      zip_codes: { $addToSet: "$_id" }
    }
  }
])

$sort

disk and memory based sort. limit of 100MB in memory sort. before or after the grouping stage.

db.zips.aggregate([{
  $sort: {
    population: -1 //descending
  }
}])

$skip and $limit

need to sort first, then skip, then limit.

db.zips.aggregate([
    { $sort: { city: 1 } },
    { $skip: 10 },
    { $limit: 10 }
  ])

$first and $last

get first or last value in each group as the aggregate pipeline processes.

db.zips.aggregate([
{
  $group: { 
    _id: { state: '$state', city: '$state' },
    population: { $sum: '$pop' }
  }
},
{
  $sort: { "_id.state": 1, population: -1 }
},
{
  $group: {
    _id: '$_id.state',
    city: { $first: '$_id.city' },
    population: { $first: '$population' }
  }
},
{
  $sort: { "\_id": 1 }
}
])

$unwind

unwind to un-join data, then re-join in a different way.

{ a:1, b:2, c: ['apple', 'pear', 'orange'] }
  // unwind '$c'
  { a:1, b:2, c: 'apple'}
  { a:1, b:2, c: 'pear'}
  { a:1, b:2, c: 'orange'}

limitations

db.posts.aggregate([ { $project: { comments: 1 } }, { $unwind: ‘$comments’ }, { $group: { _id: ‘$author’, comment_count: { $sum: 1 } } }, { $limit: 1 } ])

comments powered by Disqus