Along my journey of learning MongoDB, I ran into many small difficulties, some of which I only learned how to “solve” after talking directly with someone from the MongoDB team. So here I am, writing down some of the things I wish I had known when I first got in touch with this database.

Indexes

One of the most important topics when dealing with a non-relational database is creating indexes that fit your needs without encumbering your server. Badly designed indexes can lead to poor database performance, cost you more than expected, and make you lose hours debugging a query that should work properly but doesn’t.

In MongoDB, compound indexes are always read from left to right. What does that mean? As a short answer, the index behaves like a tree (a B-tree, in fact): the further left a field is in the index definition, the closer it sits to the base of the tree.

Also, keep in mind that the fields you index should be the attributes with the most unique/distinct values (the most selective ones). This directly affects query performance.

If I create an index over an entity with 5 attributes, I should keep in mind that the leftmost field should be the most embracing one. Here’s an example:

class MyDocumentEntity {
  name: string;
  establishment: string;
  client: string;
  size: string;
  extension: string;
}

 


And here’s my index in MongoDB:

[Image: the compound index as shown in Atlas]
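Since the screenshot doesn’t reproduce here, this is a sketch of what that compound index could look like (field names come from the entity above; the collection name is an assumption):

```javascript
// Hypothetical compound index for MyDocumentEntity.
// Field order matters: MongoDB can use any left-to-right
// prefix of this index (name; name+establishment; and so on).
const indexSpec = {
  name: 1,
  establishment: 1,
  client: 1,
  size: 1,
  extension: 1,
};

// In mongosh you would create it with something like:
// db.myDocuments.createIndex(indexSpec);

console.log(Object.keys(indexSpec).join(", "));
```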

If I run a query like so:

{ name: 'any', establishment: 'any', client: 'any', size: 'any', extension: 'any' }

The result should come back in a matter of milliseconds. Even if I remove size and extension from the query, the server should still return results in milliseconds, because the remaining fields form a left-to-right prefix of the index. But if I remove just the size parameter, the index can only be used up to client, which leads to unexpected performance: Mongo uses the index up to the point where the query stops following the field order defined in the index, and the remaining fields have to be checked against each candidate document. As a second example, say my query is the following:

{ name: 'any', establishment: 'any', extension: 'any' }

When executing this query, Mongo will filter documents matching name and establishment in the blink of an eye, but for the extension filter it will run through every document that matched the first two fields and check whether the extension matches. With a small collection this may not look like a problem at first, but it grows badly with the data (imagine a collection with 8M+ documents and making Mongo scan half of them — I guess you get my point).
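To make the left-to-right rule concrete, here is a small illustrative helper (my own sketch, not a MongoDB API) that computes which index fields a given query can actually use:

```javascript
// Illustration of the left-to-right prefix rule for compound indexes.
// We walk the index fields from the left and stop at the first one
// that is missing from the query: only that prefix can use the index.
function usableIndexPrefix(indexFields, queryFields) {
  const usable = [];
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break;
    usable.push(field);
  }
  return usable;
}

const index = ["name", "establishment", "client", "size", "extension"];

// Query missing only `size`: the index helps up to `client`,
// and `extension` must be checked document by document.
console.log(usableIndexPrefix(index, ["name", "establishment", "client", "extension"]));

// Query missing `client`: only `name` and `establishment` are covered.
console.log(usableIndexPrefix(index, ["name", "establishment", "extension"]));
```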

Also, by designing one good compound index, you can often drop or adjust other indexes so that several queries share the same index without losing any performance.

Additionally, when I was talking to a MongoDB consultant, he explained that we need to watch the number of indexes being created, and then gave us these magical numbers:

  • below 10 indexes, you’ll be fine;

  • from 10 to 20 indexes, you can still create new ones, but take a look at the existing ones and try to group them into one;

  • above 20 indexes, you must delete some of them.

Keep in mind that if you create a lot of indexes, you are going to hurt the write operations of your application.

As a bonus tip on this topic: if you use MongoDB Atlas, you can check the “Performance Advisor” tab for suggestions such as index creations, index drops, and schema improvements.

Aggregation pipelines

Querying MongoDB seems very easy, and indeed it is. But if you are looking to perform more complex queries, you might need to use aggregation pipelines. Pipelines are easy to use, although they can sometimes be very tricky, so here are some of the things I’ve learned about building a nice aggregation pipeline with no problems.

One of the most used/common stages is $search. It allows us to perform textual, autocomplete, numeric, and many other types of searches. To do so, you use one of the search operators inside the $search stage, like so:

{
  $search: {
    "index": "collection",
    "text": {
      "query": "value-searched",
      "path": "field-to-search"
    }
  }
}

But if you need to search across many fields holding different types of values (e.g., search for some value in both a numeric and a textual field), you can use the compound operator, which lets us combine several searches in one take. For example, to perform a search combining a numeric, a textual, and an autocomplete search:

{
  "$search": {
    "index": "collection",
    "compound": {
      "filter": [
        {
          "text": {
            "query": "some-id",
            "path": "tenant"
          }
        }
      ],
      "should": [
        {
          "text": {
            "query": "banana",
            "path": {
              "wildcard": "*"
            }
          }
        },
        {
          "autocomplete": {
            "query": "banana",
            "path": "my-field"
          }
        },
        {
          "near": {
            "path": "value",
            "origin": 100,
            "pivot": 2
          }
        }
      ],
      "minimumShouldMatch": 1
    }
  }
}

Explaining what is happening in this query: compound accepts must, mustNot, should, and filter clauses. The one used here is should, which holds clauses you would prefer to match: matching them improves a document’s relevance score, but they are not mandatory (subject to minimumShouldMatch).

filter holds clauses that must match, but that do not affect the relevance score. It is extremely important for reducing the number of documents Mongo needs to run through and check against the search. A well-designed filter can drastically increase the performance of your query.

The minimumShouldMatch parameter lets you control the minimum number of should clauses that must match a document for it to be included in the results. This is useful for adjusting the precision of your search.
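As a sketch of how minimumShouldMatch behaves (my own illustration, not Atlas Search internals): a document is kept only if at least that many of the should clauses matched it.

```javascript
// Illustration of minimumShouldMatch semantics: with N `should`
// clauses, a document qualifies only if at least `minimumShouldMatch`
// of them matched it (additional matches just raise its score).
function passesShould(clauseResults, minimumShouldMatch) {
  const matched = clauseResults.filter(Boolean).length;
  return matched >= minimumShouldMatch;
}

// Three should clauses (text, autocomplete, near); only one matched,
// which is enough when minimumShouldMatch is 1.
console.log(passesShould([true, false, false], 1));
// No clause matched: the document is dropped.
console.log(passesShould([false, false, false], 1));
```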

And what’s the difference between a text search and an autocomplete search? Well, a text search focuses on matching whole words, while autocomplete focuses on returning results that partially match the given word.
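A rough analogy of that difference (just an illustration of the matching behavior, not how the Lucene analyzers actually work):

```javascript
// Rough analogy: a text search matches whole tokens, while
// autocomplete also matches prefixes of a token, which is why it
// works for partially typed words.
const tokens = ["banana", "bread"];

const textMatch = (q) => tokens.includes(q);
const autocompleteMatch = (q) => tokens.some((t) => t.startsWith(q));

console.log(textMatch("bana")); // false: not a whole token
console.log(autocompleteMatch("bana")); // true: prefix of "banana"
console.log(textMatch("banana")); // true: whole-word match
```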

For the numeric search, the near operator was used, which returns documents based on a calculated score. Sometimes it can lead to unexpected results: documents whose value is not even close to the searched number. To solve that, you might need to adjust your pipeline to only return documents whose score is equal to or higher than a threshold value (results with a score below it will be dropped). Here’s an example:

[
   {
      "$search":{
         "index":"your-collection",
         "compound":{
            "should":[
               {
                  "near":{
                     "path":"value",
                     "origin":5000,
                     "pivot":1
                  }
               }
            ]
         }
      }
   },
   {
      "$project":{
         "_id":1,
         "score":{
            "$meta":"searchScore"
         }
      }
   },
   {
      "$match":{
         "score":{
            "$gte":0.5
         }
      }
   }
]

Count documents

If you are building some table and just need to know how many documents are in your collection, try to use estimatedDocumentCount instead of the count operator. As described in the docs, it “does not take a query filter and instead uses metadata to return the count for a collection”.

From my understanding, estimatedDocumentCount just reads the number of documents from the collection metadata, which is updated regularly (it is not real-time).
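Here is a small sketch of how I would pick between the two counts (the stub collection is invented so the example runs without a server; the method names follow the Node.js driver):

```javascript
// Sketch: use the cheap metadata-based count when there is no filter,
// and fall back to a real filtered count otherwise.
async function countFor(collection, filter) {
  if (!filter || Object.keys(filter).length === 0) {
    // Reads collection metadata; fast, but not real-time accurate.
    return collection.estimatedDocumentCount();
  }
  // Has to examine matching documents (ideally via an index).
  return collection.countDocuments(filter);
}

// Stub standing in for a driver collection, so the sketch is runnable.
const stub = {
  estimatedDocumentCount: async () => 8000000,
  countDocuments: async () => 1234,
};

countFor(stub, {}).then((n) => console.log("estimated:", n));
countFor(stub, { client: "acme" }).then((n) => console.log("filtered:", n));
```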

Get unique values

As an addition to the previous topic, if you need the distinct values of a field in a collection, try the distinct operator. I was able to lower our server tier just by switching some of the application’s queries to it, so it really makes a difference. Here’s an example:

db.products.distinct("someField", { key: "value" }) // The second parameter is an optional filter the documents must match

Mongo Atlas

Atlas offers a lot of tools to help you in your daily development. Besides the “Performance Advisor” tab I mentioned earlier, there are two more tabs that are extremely useful:

  • Query Insights:

A series of graphs and tables for a better understanding of executed queries, with a visual highlight of outlier queries, including the specs for each query.

  • Real-Time:

If you are debugging slow behavior in your application, this page is useful because you can see in real time which queries are running and which ones are taking longer than expected.

 

Conclusion

In this article, we explored several aspects of MongoDB optimization: indexing, aggregation pipelines, counting documents, and the use of Atlas. By mastering and properly applying these strategies, you can greatly improve the efficiency and scalability of your MongoDB applications.

Also, remember that database optimization is an ongoing process, so continuously monitor your application’s performance, analyze query logs, and make adjustments as needed.

Well, that’s it! These were some of the things I wish I had known earlier, when I was using MongoDB for the first time.

Hope to help someone =)

Written by: Karl Harder / MongoDB Enthusiast and Developer
