CRUD Operations in MongoDB Using Python


MongoDB is a NoSQL document store database, developed and open sourced by 10gen. Development of the database server began in 2007 and it was made open source in 2009. When MongoDB started to spread, its biggest advantage was its schema-less structure: it stores JSON-like documents in a binary format called BSON.

Many developers say that MongoDB is very good for prototyping and great for small websites, because development is fast. At the same time, big portals such as Foursquare, SourceForge and The New York Times also use MongoDB as their data store (according to Wikipedia).

MongoDB has a huge fan club (at the time of writing, it is the fifth most widely adopted database engine, with a score of 246.5 on http://db-engines.com), and there are drivers for many languages; some of the most widely adopted ones are for Java, .NET, Node.js, C++, Ruby and of course Python.

In this article I'll present the CRUD operations using PyMongo, the official Python driver for MongoDB. In case you are not familiar with MongoDB, there is a very good interactive online tutorial at http://try.mongodb.org.

What are CRUD operations?

The acronym CRUD stands for Create, Read, Update and Delete. These operations are considered to be the four basic functions of a repository (a.k.a. data storage). Some people extend these basic functions with Search, in which case the acronym becomes SCRUD.

CRUD operations can be mapped directly to database operations:

  • Create matches insert
  • Read matches select
  • Update matches update
  • Delete matches delete
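
To make the mapping concrete, here is a minimal sketch that runs the four SQL statements against an in-memory SQLite database from Python's standard library. The projects table and its columns are assumptions chosen to resemble the model used later in this article; they have nothing to do with MongoDB itself.

```python
import sqlite3

# In-memory database; the table layout is an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, title TEXT, price REAL)")

# Create -> INSERT
conn.execute("INSERT INTO projects (id, title, price) VALUES (?, ?, ?)",
             (1, "Demo project", 250.0))

# Read -> SELECT
row_after_insert = conn.execute(
    "SELECT title, price FROM projects WHERE id = ?", (1,)).fetchone()

# Update -> UPDATE
conn.execute("UPDATE projects SET price = ? WHERE id = ?", (350.0, 1))
row_after_update = conn.execute(
    "SELECT title, price FROM projects WHERE id = ?", (1,)).fetchone()

# Delete -> DELETE
conn.execute("DELETE FROM projects WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM projects").fetchone()[0]

conn.close()
```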

CRUD operations with pymongo

Prerequisites

If you want to work with MongoDB, Python and PyMongo, you will have to install all three. Here are the links (default configuration and how to install is explained):

  • MongoDB installation (Ubuntu, MacOSX, Windows)
  • Python – Linux distributions and Mac OS X come with Python preinstalled; on Windows you will need to install it yourself (here is the download page for all versions)
  • PyMongo – you can find the installation page here

The Data Model

Python is a good choice for working with MongoDB, because Python's dictionary data structure maps naturally onto JSON-like documents, which is what MongoDB stores. In most cases no data conversion is needed when storing data in collections (MongoDB collections are the equivalent of tables in relational databases).
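
As a small illustration of why dictionaries fit document data well, the sketch below round-trips a nested dictionary through the standard json module. The field values are made up for the example. Keep in mind that MongoDB stores documents as BSON, which supports additional types (such as ObjectId and dates) that plain JSON does not.

```python
import json

# A nested dictionary shaped like a document; the values are illustrative only.
project_document = {
    "title": "Demo project",
    "price": 250,
    "tags": ["python", "mongodb"],
    "owner": {"name": "John Doe"},
}

# Serialize to JSON text and parse it back; the structure survives unchanged.
as_text = json.dumps(project_document, sort_keys=True)
restored = json.loads(as_text)

print(as_text)
print(restored == project_document)
```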

The data model which I will use is the Project class:

from bson.objectid import ObjectId

class Project(object):
    """A class for storing Project related information"""

    def __init__(self, project_id=None, title=None, description=None, price=0.0, assigned_to=None):        
        if project_id is None:
            self._id = ObjectId()
        else:
            self._id = project_id
        self.title = title
        self.description = description
        self.price = price
        self.assigned_to = assigned_to

    def get_as_json(self):
        """ Method returns the JSON representation of the Project object, which can be saved to MongoDB """
        return self.__dict__
    

    @staticmethod    
    def build_from_json(json_data):
        """ Method used to build Project objects from JSON data returned from MongoDB """
        if json_data is not None:
            try:                            
                return Project(json_data.get('_id', None),
                    json_data['title'],
                    json_data['description'],
                    json_data['price'],
                    json_data['assigned_to'])
            except KeyError as e:
                raise Exception("Key not found in json_data: {}".format(e))
        else:
            raise Exception("No data to create Project from!")

 

The class is very simple: the constructor assigns values to the instance attributes. The project as a model (and also as a class) has _id, title, description, price and assigned_to attributes. The _id field is special; MongoDB uses it to uniquely identify an entry (document) in a collection. If the _id field is not present in the document, MongoDB will create an _id and add it to the document. PyMongo has a Python implementation of this identifier type, called ObjectId. If invoked without any parameters, it generates a new unique identifier.

In Python, the attributes assigned to an instance are stored in an internal dictionary named __dict__:

    def get_as_json(self):
        """ Method returns the JSON representation of the Project object, which can be saved to MongoDB """
        return self.__dict__

So the method get_as_json(self) is a helper that returns the __dict__ attribute of the instance. Since PyMongo accepts Python dictionaries as documents, the result of get_as_json(self) can be stored directly in MongoDB.
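
One detail worth keeping in mind: __dict__ is the live attribute dictionary of the instance, not a copy. The sketch below uses a hypothetical Note class (it is not part of this article's code) to show that changing the returned dictionary changes the object as well, and that dict(...) gives an independent shallow copy when that is not wanted.

```python
class Note(object):
    """Hypothetical stand-in for Project, kept tiny for the example."""

    def __init__(self, title):
        self.title = title

    def get_as_json(self):
        # Returns the instance's own attribute dictionary (no copy is made).
        return self.__dict__

    def get_as_json_copy(self):
        # Returns a shallow copy, so callers cannot alter the instance through it.
        return dict(self.__dict__)


note = Note("first")

live = note.get_as_json()
live["title"] = "changed through the dictionary"
title_after_live_edit = note.title  # the attribute changed too

snapshot = note.get_as_json_copy()
snapshot["title"] = "changed only in the copy"
title_after_copy_edit = note.title  # unaffected by the edit to the copy
```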

There is a @staticmethod defined in the class. Python’s static methods are basically the same as static methods in other object-oriented programming languages: they can be invoked using the class name, without an instance. The build_from_json(json_data) method helps to create new instances of the Project class when loading data from MongoDB.
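
A short sketch of how such a factory method is called. It uses a hypothetical Point class rather than Project, so it runs without PyMongo installed. The classmethod variant is shown only as a common alternative; the code in this article does not use it.

```python
class Point(object):
    """Hypothetical example class with two factory methods."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    @staticmethod
    def build_from_json(json_data):
        # A static method receives no implicit first argument,
        # so the class to instantiate has to be named explicitly.
        return Point(json_data["x"], json_data["y"])

    @classmethod
    def build_from_pair(cls, pair):
        # A class method receives the class as cls, which keeps
        # the factory working for subclasses as well.
        return cls(pair[0], pair[1])


class Point3D(Point):
    pass


# Both factories are invoked on the class itself; no instance is needed.
p = Point.build_from_json({"x": 1, "y": 2})
q = Point3D.build_from_pair((3, 4))
r = Point3D.build_from_json({"x": 5, "y": 6})
```

Here q is a Point3D, while r is a plain Point, because the static method always names Point explicitly.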

Repository with CRUD operations

The class ProjectsRepository implements the CRUD operations using pymongo:

from pymongo import MongoClient
from bson.objectid import ObjectId
from project import Project

class ProjectsRepository(object):
    """ Repository implementing CRUD operations on projects collection in MongoDB """

    def __init__(self):
        # initializing the MongoClient, this helps to 
        # access the MongoDB databases and collections 
        self.client = MongoClient(host='localhost', port=27017)
        self.database = self.client['projects']


    def create(self, project):
        if project is not None:
            self.database.projects.insert(project.get_as_json())            
        else:
            raise Exception("Nothing to save, because project parameter is None")


    def read(self, project_id=None):
        if project_id is None:
            return self.database.projects.find({})
        else:
            return self.database.projects.find({"_id":project_id})


    def update(self, project):
        if project is not None:
            # the save() method updates the document if this has an _id property 
            # which appears in the collection, otherwise it saves the data
            # as a new document in the collection
            self.database.projects.save(project.get_as_json())            
        else:
            raise Exception("Nothing to update, because project parameter is None")


    def delete(self, project):
        if project is not None:
            self.database.projects.remove(project.get_as_json())            
        else:
            raise Exception("Nothing to delete, because project parameter is None")

The MongoClient class from the PyMongo API creates a connection to the MongoDB server and gives access to its data. In the constructor I create a new instance of the MongoClient class, passing in the host and port of the MongoDB server. Mine was installed locally, so I used host='localhost' and port=27017; these are the default values of the MongoClient class, but I added them here as an example of how they can be customized. The databases on the MongoDB server can be accessed the same way as dictionary values in Python, e.g. client['projects']. The create(self, project), update(self, project) and delete(self, project) methods receive a parameter of type Project.

The create(self, project) method

Uses the insert() method of PyMongo’s collection API, passing the JSON representation of the Project object as a parameter. If the project parameter is None, it raises an Exception. In traditional SQL, the projects.insert() call would be insert into projects(_id, title, description, price, assigned_to) values (…). The insert() method can raise an OperationFailure error if something goes wrong during the save.

    def create(self, project):
        if project is not None:
            self.database.projects.insert(project.get_as_json())            
        else:
            raise Exception("Nothing to save, because project parameter is None")

The read(self, project_id) method

Uses the find() method from the PyMongo API. It takes an optional project_id parameter: if one is given, it queries the collection for the project with that id; otherwise it returns all the documents in the collection. In SQL, projects.find({}) would be select * from projects. When a project_id is available, the SQL would be select * from projects where _id=project_id. In both cases find() returns a cursor, so the result is iterated over even when only one document is expected.

    def read(self, project_id=None):
        if project_id is None:
            return self.database.projects.find({})
        else:
            return self.database.projects.find({"_id":project_id})

The update(self, project) method

Uses the save() method from the PyMongo API. The save() method is special because its behavior depends on the data it receives. If the passed-in JSON contains an _id field, it looks up the document with that _id in the collection and replaces it with the passed-in data (inserting it if no such document exists). If the passed-in JSON does not have an _id, it inserts the data into the collection as a new document, which receives a new _id. The save() method can be matched with SQL’s update or insert operations, depending on the scenario.

    def update(self, project):
        if project is not None:
            # the save() method updates the document if this has an _id property 
            # which appears in the collection, otherwise it saves the data
            # as a new document in the collection
            self.database.projects.save(project.get_as_json())            
        else:
            raise Exception("Nothing to update, because project parameter is None")

The delete(self, project) method

Uses the remove() method from the PyMongo API. Please be careful when using this method: its effect cannot be undone. If remove() is invoked with an empty JSON document or without any parameter, all the documents in the collection will be deleted. In SQL, remove() is equivalent to delete from projects or, when a project_id is available, delete from projects where _id=project_id.

    def delete(self, project):
        if project is not None:
            self.database.projects.remove(project.get_as_json())            
        else:
            raise Exception("Nothing to delete, because project parameter is None")
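
A note for readers on newer PyMongo versions: insert(), save() and remove() were deprecated in PyMongo 3 and removed in PyMongo 4, where insert_one(), replace_one() and delete_one() are the usual replacements. The following is a minimal sketch of the same repository written against those method names; it is not the original code of this article. The repository receives the collection as a constructor argument, and InMemoryCollection is a hypothetical stand-in that imitates only the four calls used here, so the example runs without a MongoDB server. With a real server you would pass MongoClient('localhost', 27017)['projects']['projects'] instead.

```python
from types import SimpleNamespace


class ProjectsRepository(object):
    """CRUD repository using the newer PyMongo collection method names."""

    def __init__(self, collection):
        # Any object exposing insert_one, find, replace_one and delete_one works.
        self.collection = collection

    def create(self, document):
        return self.collection.insert_one(document).inserted_id

    def read(self, project_id=None):
        query = {} if project_id is None else {"_id": project_id}
        return list(self.collection.find(query))

    def update(self, document):
        # Replaces the whole stored document; upsert=True inserts it if it is
        # missing, which mirrors the behavior of the old save() method.
        return self.collection.replace_one(
            {"_id": document["_id"]}, document, upsert=True)

    def delete(self, project_id):
        # Deleting by _id avoids depending on the other fields matching exactly.
        return self.collection.delete_one({"_id": project_id}).deleted_count


class InMemoryCollection(object):
    """Hypothetical stand-in for a PyMongo collection (illustration only)."""

    def __init__(self):
        self._docs = {}

    def insert_one(self, document):
        self._docs[document["_id"]] = dict(document)
        return SimpleNamespace(inserted_id=document["_id"])

    def find(self, query):
        # Only supports the two filters used above: {} and {"_id": value}.
        if "_id" in query:
            doc = self._docs.get(query["_id"])
            return iter([dict(doc)] if doc is not None else [])
        return iter([dict(d) for d in self._docs.values()])

    def replace_one(self, query, replacement, upsert=False):
        existed = query["_id"] in self._docs
        if existed or upsert:
            self._docs[query["_id"]] = dict(replacement)
        return SimpleNamespace(matched_count=int(existed))

    def delete_one(self, query):
        removed = self._docs.pop(query["_id"], None)
        return SimpleNamespace(deleted_count=0 if removed is None else 1)


repository = ProjectsRepository(InMemoryCollection())
repository.create({"_id": 1, "title": "Demo project", "price": 250})
repository.update({"_id": 1, "title": "Demo project", "price": 350})
stored = repository.read(project_id=1)
deleted = repository.delete(1)
remaining = repository.read()
```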

I created a simple console application which demonstrates how the ProjectsRepository can be used. The code within main.py is executed only when it is launched as the main program; this can be checked using the following if statement:

if __name__ == '__main__':
    # the script is executed as main, do something here
    pass

Demo Code

The demo code is simple. I created five functions: four for testing the CRUD operations and one main() function which glues the steps together.

from projects_repository import ProjectsRepository
from project import Project


def load_all_items_from_database(repository):
    print("Loading all items from database:")
    projects = repository.read()
    at_least_one_item = False
    for p in projects:
        at_least_one_item = True
        tmp_project = Project.build_from_json(p)
        print("ID = {} | Title = {} | Price = {}".format(tmp_project._id,tmp_project.title, tmp_project.price))
    if not at_least_one_item:
        print("No items in the database")


def test_create(repository, new_project):
    print("\n\nSaving new_project to database")
    repository.create(new_project)
    print("new_project saved to database")
    print("Loading new_project from database")
    db_projects = repository.read(project_id=new_project._id)
    for p in db_projects:
        project_from_db = Project.build_from_json(p)
        print("new_project = {}".format(project_from_db.get_as_json()))


def test_update(repository, new_project):
    print("\n\nUpdating new_project in database")
    repository.update(new_project)
    print("new_project updated in database")
    print("Reloading new_project from database")
    db_projects = repository.read(project_id=new_project._id)
    for p in db_projects:
        project_from_db = Project.build_from_json(p)
        print("new_project = {}".format(project_from_db.get_as_json()))


def test_delete(repository, new_project):
    print("\n\nDeleting new_project to database")
    repository.delete(new_project)
    print("new_project deleted from database")
    print("Trying to reload new_project from database")
    db_projects = repository.read(project_id=new_project._id)
    found = False
    for p in db_projects:
        found = True
        project_from_db = Project.build_from_json(p)
        print("new_project = {}".format(project_from_db.get_as_json()))

    if not found:
        print("Item with id = {} was not found in the database".format(new_project._id))


def main():
    repository = ProjectsRepository()

    #display all items from DB
    load_all_items_from_database(repository)

    #create new_project and read back from database
    new_project = Project.build_from_json({"title":"Wordpress website for Freelancers", 
        "description":"This should be a very simple website, based on wordpress with functionalities for Freelancers", 
        "price":250, 
        "assigned_to":"John Doe"})
    test_create(repository, new_project)

    #update new_project and read back from database
    new_project.price = 350
    test_update(repository, new_project)

    #delete new_project and try to read back from database
    test_delete(repository, new_project)

if __name__ == '__main__':
    main()

main.py can be executed with python main.py, and the output should be similar to:

greg@earth ~/ $ python main.py 
Loading all items from database:
ID = 54953f0a8524880d021cd856 | Title = Wordpress website for Freelancers | Price = 250
ID = 54953f408524880d0db9a8a1 | Title = Wordpress website for Freelancers | Price = 250
ID = 54953f788524880d14d4e590 | Title = Wordpress website for Freelancers | Price = 250

Saving new_project to database
new_project saved to database
Loading new_project from database
new_project = {'price': 250, '_id': ObjectId('549546f68524880e86271de3'), 'assigned_to': u'John Doe', 'description': u'This should be a very simple website, based on wordpress with functionalities for Freelancers', 'title': u'Wordpress website for Freelancers'}

Updating new_project in database
new_project updated in database
Reloading new_project from database
new_project = {'price': 350, '_id': ObjectId('549546f68524880e86271de3'), 'assigned_to': u'John Doe', 'description': u'This should be a very simple website, based on wordpress with functionalities for Freelancers', 'title': u'Wordpress website for Freelancers'}

Deleting new_project to database
new_project deleted from database
Trying to reload new_project from database
Item with id = 549546f68524880e86271de3 was not found in the database

Working with PyMongo is very easy and very fast. Once you get familiar with the basic 6-7 methods of the API, you can build very robust and dynamic applications which can serve as a backend for web pages or desktop applications.

 

The code can be accessed on GitHub gists:

Published December 22, 2014

Greg Bogdan

Software Engineer, Blogger, Tech Enthusiast

I am a Software Engineer with over 7 years of experience in different domains (ERP, Financial Products and Alerting Systems). My main expertise is .NET, Java, Python and JavaScript. I like technical writing and have good experience in creating tutorials and how-to technical articles. I am passionate about technology and I love what I do and I always intend to 100% fulfill the project which I am ...
