Innovate anywhere, anytime withruncode.io Your cloud-based dev studio.
Django

Django - Database access optimization

2023-06-17

Django Queryset is generally lazy in nature. It will not hit the database until it  evaluates the query results.

Example:

queryset = User.objects.all() # It won't hit the database
print (queryset)  # Now, ORM turns the query into raw sql  and fetches results from the database

1.Caching and QuerySets

Generally Django stores the query results in the catche when it fetches the results for the first time.

Example: Get all users first names list  and emails list from the database.

first_names_list = [ user.first_name for user in User.objects.all()]
emails_list = [ user.email for user in User.objects.all()]

above code hits the database twice. To avoid the extra request to the database we can use the Django's database cache in the following way.

users_list = User.objects.all()    # No database activity
first_names_list =  [user.first_name for user in users_list]    # Hits the database and stores the results in the cache
emails_list = [user.email for user in users_list]    # uses the results from the cache.

other way is

users_list = User.objects.all()
db = dict(users_list.values_list("first_name", "email"))
first_names_list, emails_list = db.keys(), db.values()

Note: Querysets are not cached if query is not evaluated.

Eaxmple: If you want to take a subset/part of the query results

queryset = Users.objects.all()
first_five = queryset[:5]    # Hits the database
first_ten = queryset[:10]    # Hits the database

If you want to use the cache in the above situation you can do it in the following way

queryset = Users.objects.all()
bool(queryset)  # queryset is evaluated  and results are cached
first_five = queryset[:5]    #  uses the cache
first_ten = queryset[:10]    # uses the cache

2. Complex Database Queries with "Q" objects

Django Queries uses "AND" caluse to fetch results when using filter.

Example:

User.objects.filter(email__contains="adam", first_name="Adam")
# equivalent SQL Qurery is
SELECT * FROM user_table WHERE email LIKE '%adam%'  AND first_name=Adam;

If you want to filter the users  email starts with "an" or "sa"

Simple Example:

users_list = User.objects.all()
filtered_users = []
for user in users_list:
    if user.email.startswith("an")  or  user.email.startswith("sa"):
        filtered_users.append(user)

In above example it gets all the records from the table. Insted, we can fetch the records that we required by using the "Q" object.

Example:

from django.db.models import Q
users_list = User.objects.filter(Q(email__startswith="an") | Q(email__startswith="sa"))
# equivalent SQL Qurery is
SELECT * FROM user_table WHERE email LIKE 'an%' OR email LIKE 'sa%'

Q object usage:

~Q(email__startswith="an")  # email don't starts with "an" 
# SQL  equivalent
|  = OR
& = AND
~ = NO

We can use parenthesis with "Q" object

Example:

Model.objects.filter((Q(key1="value1") & ~Q(key2="value2")) | (~Q(key3="value"))

3. Create Multiple Objects at once with Bulk Create

Simple Example: To create 4 users

users_details = [
        ("Dian", "[email protected]"),
        ("Ravi", "[email protected]"),
        ("Santa", "[email protected]"),
        ("Shera", "[email protected]")
]
for first_name, email in users_details:
     User.objects.create(first_name=first_name, email=email)

Above example hits/request the database 4 times to create 4 users, but we can create 4 users with single database hit/request.

Example:

instance_list = [User(first_name=first_name, email=email) for first_name, email in users_details]
User.objects.bulk_create(instance_list)

4. Update Multiple Objects or a filtered Queryset at once with update

Let us consider Employee model as

from django.db import models

DESIGNATIONS = (("Designer", "Designer"),
                ("Developer", "Developer"),
                ("HR", "HR"))

class Employee(models.Model):
    name = models.CharField(max_length=30)
    email = models.EmailField(max_length=30)
    designation = models.CharField(max_length=30, choices=DESIGNATIONS)
    salary = models.DecimalField(default=20000)

After one year manager wants to increase the salary for Developers at an amount of 5000

SimpleExample:

developers = Employee.objects.filter(designation="Developer")
for developer in developers:
    developer.salary = developer.salary + 5000
    develope
r.save()

Above example hits the database for several times[i.e number of developers]. we can do this with a single database hit/request.

Example:

from django.db.models import F

amount = 5000
developers = Employee.objects.filter(designation="Developer")
developers.update(salary=F("salary")+amount)

5. Select only required fields in the query from the database to decrease querying time 

We use this when we need only the data[ie. fields values]  we dont get access to the model functions and relational objects. We can do this by using QuerySet.values() and QuerySet.values_list()

QuerySet.values() returns the list of dictionaries. Each dictionary represents an instance of object.

QuerySet.values_list() returns the list of tuples. Each tutple represents an instance of object. order of values in the tuple is id followed by the fields in the model.

from .models import Employee
# create 2 objects
Employee.objects.create(name="Ravi", email="[email protected]", designation="Developer", salary=30000)
Employee.objects.create(name="Santa", email="[email protected]", designation="Designer", salary=40000)

queryset = Employee.objects.all()    # you can also use filter 
# Example for  QuerySet.values()
users_dict_list = queryset.values()
# above line is equivalent to
users_dict_list = [
    {
        "id": 1,
        "name": "Ravi",
        "email": "[email protected]",
        "designation": "Developer",
        "salary": 30000
     },
    {
        "id": 2,
        "name": "Santa",
        "email": "[email protected]",
        "designation": "Designer",
        "salary": 40000
     },
]
# To  get only required fields data ie.  "name", "salary"
users_dict_list = queryset.values("name", "salary")
# above line is equivalent to
users_dict_list = [
    {
        "name": "Ravi",
        "salary": 30000
     },
    {
        "name": "Santa",
        "salary": 40000
     },
]
# Example for  QuerySet.values_list()
users_tuple_list = queryset.values_list()
# above line is equivalent to
users_tuple_list = [
     (1, "Ravi", "[email protected]", "Developer", 30000),
     (2, "Santa", "[email protected]", "Designer", 40000),
]
# To  get only required fields data ie.  "name", "salary"
users_tuple_list = queryset.values_list("name", "salary")
# above line is equivalent to
users_tuple_list = [
     ("Ravi", 30000),
     ("Santharao", 40000),
]
# We can also get list of values of a single field in the model by setting "flat=True"
users_names_list = queryset.values_list("name", flat=True)     # it works only for single field
# above line is equivalent to
users_names_list = ["Ravi", "Santa"]

Don't hit/request the database for related objects. Fetch all related objects in a single query using select_related and prefetch_related Let us consider the following models

from django.db import models

class Address(models.Model):
       city = models.CharField(max_length=100)
       state =  models.CharField(max_length=100)
       pin =  models.CharField(max_length=100)

class Person(models.Model):
       name = models.CharField(max_length=100)
       email = models.EmailField(max_length=100)
       present_address = models.ForeignKey(Address)
       previous_address = models.ForeignKey(Address)

class Book(models.Model):
       name = models.CharField(max_length=100)
       author = models.ForeignKey(Person)
       publishers = models.ManyToManyField(Person)

usage of select_related

# without select related
person = Person.objects.get(id=1)
present_address = person.present_address  # Hits the database.
previous_address = person.previous_address  # Hits the database.
# total database hits = 3
# with select related
person = Person.objects.select_related().get(id=1)
present_address = person.present_address # Doesn't hit the database.
previous_address = person.previous_address # Doesn't hit the database.
# total database hits = 1
# you can also select the specific related objects
person = Person.objects.select_related("present_address").get(id=1)
present_address = person.present_address # Doesn't hit the database.
previous_address = person.previous_address # Hits the database.

Limitaions of select_related

select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.

Usage of prefetch_related

# without prefetch related
book = Book.objects.get(id=1)
author = book.author # Hits the database.
publishers = book # Hits the database.
# total database hits = 3
# with prefetch related
book = Book.objects.prefetch_related().get(id=1)
author = book.author  # Doesn't hit the database.
publishers = book.publishers.all() # Doesn't hit the database.
# total database hits = 1
# you can also select the specific related objects
book = Book.objects.prefetch_related("publishers").get(id=1)
author = book.author  # Doesn't hit the database.
publishers = book.publishers.all() # Hits the database.

Advantage over select_related

prefetch_related does a separate lookup for each relationship, and does the ‘joining’ in Python. prefetch_related allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey

Limitaions of prefetch_related

prefetching of related objects referenced by a GenericForeignKey is only supported if the query is restricted to one ContentType.

6. use queryset.count() if you require only count but not queryset objects

count = Book.objects.filter(author_id=5).count()   # It returns the count(number of objects/records) only. So, operation is very fast.

7. use queryset.exists() if you want to know whether objects exists or not only

is_exists = Book.objects.filter(author_id=5).exists()   # It returns the boolean value(True/False) only. So, operation is very fast.

8. Provide index to fields in database models & provide default ordering

If one of the models accessed very freequently use the indexes to the database tables.

class Book(models.Model):
       name = models.CharField(max_length=100, db_index=True)
       author = models.ForeignKey(Person)
       published_on =  models.DateField()
       publishers = models.ManyToManyField(Person)

      class Meta:
           index_together = ["author", "published_on"]
           ordering = ['-published_on']