Django search with haystack and whoosh

Reading Time : ~ .

Whoosh

Haystack is a Django plugin to allow text search, while Whoosh is a pure Python search backend and it provides a library of classes and functions for indexing text and then searching the index. It allows you to develop custom search engines for your content.

First I installed Whoosh and Haystack using pip.

sudo pip install Whoosh

sudo pip install django-haystack
then in settings.py, I added haystack application, inproject application 
HAYSTACK_SITECONF = 'drumcoder.search_sites'

HAYSTACK_SEARCH_ENGINE = 'whoosh'

HAYSTACK_WHOOSH_PATH = '/home/user/web/drumcoder/index.whoosh'

INSTALLED_APPS = (

    #...

    'haystack',

    'blog',

    #...

)
 
The HAYSTACK_WHOOSH_PATH is the location at which the index files will be stored.
search_sites.py

Next, inside my Django site, at the same level as settings.py, I created a new search_sites.py:

import haystack
haystack.autodiscover()

This causes haystack to look for search_indexes.py inside each app on the Django site.

blog/search_indexes.py

This fill maps the Django model object to the search engine.

import datetime

from haystack.indexes import *

from haystack import site

from drumcoder.blog.models import BlogEntry

class BlogEntryIndex(SearchIndex):

    text = CharField(document=True, use_template=True)

   def get_queryset(self):

        """

        This is used when the entire index for model is updated, and should only include

        public entries

        """

        return BlogEntry.objects.filter(publication_date__lte=datetime.datetime.now(), status=BlogEntry.LIVE_STATUS)

site.register(BlogEntry, BlogEntryIndex)

This is a similar mechanism to the admin.py setup. The key thing here is the text attribute, which will contain the text to index. We have use_template=True here, so the tempalte contains text that we wish to index.

templates/search/indexes/blog/blogentry_text.txt

This file can be placed anywhere in the template directories for your site, but it needs to be inside a folder named after the application (we're dealing with BlogEntry objects, which are in the Blog app), and the model object (BlogEntry). So in our case, we will need a blogentry_text.txt template inside blog, inside indexes, inside search.

`-- templates

    |-- index.html

    `-- search

        |-- indexes

        |   `-- proj(app name)

        |       `-- new_text.txt

        `-- search.html

text file will contain the following items

{{ object.title }}

{{ object.body }}

Here we're going to create the text index to include the title and body attributes from the BlogEntry.

templates/search/search.html
 

Next, we need a template for the search page. Create the search.html template inside a search directory.

{% extends 'template.html' %}

{% block title %}Search{% endblock %}

{% block content %}

Search

  {% if query %}      
     Results     
       {% for result in page.object_list %}        
            {{ result.object.title }}     
       {% empty %}        
           No results found.     
       {% endfor %}   
  {% endif %}
   
 

urls.py

Finally in the code, we need to add the urls. At the top level urls.py add:

(r'^search/', include('haystack.urls')),

Create the index

The final thing to do is to create the index. Run ./manage.py rebuild_index to create the new search index.

Once that is done, go to /search/ on your server and have a play.

Indexing Links

Once you have the above working, it is trivial to add more fields to the index. For example, to add an index for the links application, we need two more files:

links/search_indexes.py

This maps the Link object in the same way as we did for BlogEntry above. As there is no status field or publication date on links, we don't have to worry about limiting the queryset.

import datetime

from haystack.indexes import *

from haystack import site

from drumcoder.links.models import Link

class LinkIndex(SearchIndex):

    text = CharField(document=True, use_template=True)

site.register(Link, LinkIndex)

templates/search/indexes/links/link_text.txt

Here we define the fields from Link we want to include in the search index.

{{ object.title }}

{{ object.description }}

{{ object.url }}

Updating the Index

There are a couple of ways to update the index. ./manage.py update_index will add new entries to the index. ./manage.py rebuild_index will recreate the index from scratch. An alternative approach that works fine for low volumes of change, like I have here, is to change the superclass of each Index to RealTimeSearchIndex instead of just SearchIndex:

import datetime

from haystack.indexes import *

from haystack import site

from drumcoder.blog.models import BlogEntry

class BlogEntryIndex(RealTimeSearchIndex):

    text = CharField(document=True, use_template=True)

    def get_queryset(self):

        """Used when the entire index for model is updated."""

        return BlogEntry.objects.filter(publication_date__lte=datetime.datetime.now(), status=BlogEntry.LIVE_STATUS)

site.register(BlogEntry, BlogEntryIndex)

This will automatically update the index when a BlogEntry is created or changed.

    By Posted On
SENIOR DEVELOPER at MICROPYRAMID

Need any Help in your Project?Let's Talk

Latest Comments
Related Articles
Autocomplete with Django-Haystack and Elasticsearch with single letter querying. Ashwin Kumar

Django's haystack provides autocomplete functionality. To do autocomplete effectively, the search backend(elasticsearch in this case) uses n-grams (essentially a small window passed over the string). ...

Continue Reading...
Celery Flower to monitor task queue Swetha Naretla

Celery is a task queue that is to built an asynchronous message passing system. It can be used as a bucket where programming tasks can ...

Continue Reading...
Integrate Django-Oscar-Accounts with Django-Oscar Shirisha Gaddi

This package uses double-entry bookkeeping where every transaction is recorded twice (once for the source and once for the destination). This ensures the books always ...

Continue Reading...

Subscribe To our news letter

Subscribe to our news letter to receive latest blog posts into your inbox. Please fill your email address in the below form.
*We don't provide your email contact details to any third parties