A Guide to Django full-text search with Sphinx and django-sphinx
Recently, I came upon an open-source SQL full-text search engine called Sphinx. More recently, I came upon the blog of django-sphinx developer David Cramer, and his excellent guide to setting up Sphinx and integrating it with Django.
David's post was an excellent starting point for getting a successful search set up on my latest Django/MySQL project. That said, I ran into a handful of issues that weren't covered in the original post, and in the spirit of the Internets (and hopefully w/David's blessing), here's everything I've learned about Sphinx and django-sphinx.
Requisites & Environments
I got it up and running on both my dev and prod machines. My dev is my mac laptop running Leopard (OSX 10.5), and prod is an Ubuntu 8.10 Intrepid Ibex install. I'm going to do my best to point out the idiosyncrasies between the two as I proceed here.
Hi! I'm a note for OSX users. See the icon? Just sayin'. Also, OSX doesn’t come with the libraries that are necessary for the build processes we're going to go through. Before proceeding, you should obtain Apple’s Xcode package and install it (download Xcode 3.0, or install it from your OSX installation discs).
You’ll also want to have enabled your OSX root account (You can read how to do that here).
Hi, I'm a note for Ubuntu users. Keep that in mind. Also, if you haven't already, you may need to install the mysql-devel package on your server (it's not always installed by default). Running sudo aptitude install libmysqlclient15-dev did the trick for me.
Please ping me (via comments) if you run into issues not covered in this guide. I'd like to keep adding to this as issues come up.
Install Sphinx
I went with Sphinx's latest stable release, which is 0.9.8.1 at the time of this writing. Keep in mind that Sphinx is a server that we're trying to install, configure, and run so that Django can make requests to it.
OSX doesn't come with wget. You can use curl instead, like this: curl -O http://www.sphinxsearch.com/downloads/sphinx-0.9.8.1.tar.gz
Once you've got the package, extract, configure, make, and make install it.
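For reference, here's a minimal Python sketch of the usual extract/configure/make/install sequence. The helper names (build_steps, run_build) are made up for illustration, the tarball name is taken from the curl URL above, and I'm assuming make install needs sudo on your machine.

```python
import subprocess

def build_steps(version="0.9.8.1", configure_args=()):
    """Return (command, working directory) pairs for a source build, in order."""
    src = "sphinx-%s" % version
    return [
        (["tar", "xzf", src + ".tar.gz"], "."),         # extract the tarball
        (["./configure"] + list(configure_args), src),  # check for libs such as MySQL
        (["make"], src),                                # compile
        (["sudo", "make", "install"], src),             # install (assumes sudo is needed)
    ]

def run_build(steps):
    """Run each command in its working directory; stop at the first failure."""
    for cmd, cwd in steps:
        subprocess.check_call(cmd, cwd=cwd)
```

Calling run_build(build_steps()) from the directory that holds the tarball would walk through all four steps; any extra configure flags go in configure_args.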
If all goes well with your configure, then you should be good to go to the next section: Install django-sphinx.
If not, you might see something like this:
This means that Sphinx wants to support MySQL, and it can't find the libs it needs.
Again, installing the mysql client dev package with sudo aptitude install libmysqlclient15-dev will probably get you what you need.
To get this up on OSX, I followed this short guide for installing Sphinx on OSX 10.5 leopard by Mr. Clinton R. Nixon from Viget Labs. It's really excellent, and Sphinx was up and running by the end of it. Check it out, and come back when you're ready.
Assuming your machine has the mysql dev package installed, and you're still getting a configuration error, you can specify a location for it to look for mysql stuffs. First I'm going to show you how to find out where your current mysql binary is located, and from there we can tell Sphinx what's up.
Check it out:
Thar she blows. Let's plug in the path up to bin/ to see if we strike gold:
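If it helps, the "path up to bin/" step can be sketched in Python like this. I'm assuming the configure option in question is --with-mysql, which is the one Sphinx's configure script accepts as far as I know; the binary path below is a hypothetical example, not output from my machine.

```python
import os

def mysql_prefix(mysql_binary):
    """Strip the trailing bin/mysql from the path to the mysql binary."""
    return os.path.dirname(os.path.dirname(mysql_binary))

def configure_flag(mysql_binary):
    """Build the --with-mysql flag to hand to ./configure."""
    return "--with-mysql=" + mysql_prefix(mysql_binary)

# Hypothetical location, the sort of thing `which mysql` might print.
print(configure_flag("/usr/local/mysql/bin/mysql"))
```

If `which mysql` hands you a symlink, resolve it first (os.path.realpath does this) so the prefix points at the real MySQL install.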
You should be all set at this point. If not, leave a comment below, or check out the Sphinx site for a more-detailed installation guide for 0.9.8.1 here.
Install django-sphinx
These notes are also found on the django-sphinx project page.
If you're one of those easy install people, then you can just sudo easy_install djangosphinx your way to glory.
The rest of us can grab a fresh svn checkout, and install it like so:
Configure Django to use Sphinx
The author of django-sphinx rightly notes that sphinxapi.py (which is installed with django-sphinx) needs to be on your PYTHONPATH. If things start going wonky, this should be among the things to check for.
Edit your settings.py
This tells django-sphinx which version of Sphinx it's talking to. There's a reference for these on the django-sphinx project page.
[edit] Also be sure to add 'djangosphinx' to your INSTALLED_APPS. (thanks, Tomas)
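A settings.py fragment along these lines should cover both points. The value 0x113 is what I recall the django-sphinx README listing for Sphinx 0.9.8, so treat it as an assumption and check the project page for your version; the other INSTALLED_APPS entries are placeholders.

```python
# settings.py (fragment)

# Sphinx API version that django-sphinx should speak.
# Assumption: 0x113 corresponds to Sphinx 0.9.8; verify on the project page.
SPHINX_API_VERSION = 0x113

INSTALLED_APPS = (
    'django.contrib.contenttypes',  # placeholder for whatever you already have
    'editorial',                    # the app used in this guide
    'djangosphinx',                 # needed so Django picks up django-sphinx
)
```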
Edit your models.py
I've been plowing ahead thinking you know what you want to index for searching. In my case, I have a Django app called editorial (that's part of a news web site). I want to index content contained in a model called Story. That content lives in fields like Title, Tease, Body, and so on. At this point, you should have similar information on hand to proceed.
At the top of the models.py that contains Story, I added this:
Then I added an attribute to my Story model called search and pointed it to the SphinxSearch class that I imported.
The SphinxSearch class requires a name for the index you want to create (I called mine editorial_stories), and accepts a dict of fields and weights. I jumped right in and weighted some of the fields from Story because weighting fields is badass. This is optional.
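To make the index name and weights concrete, here's a small sketch of the two values involved. The index name is the one from this guide; the field names and weight numbers are illustrative, and the keyword names in the closing comment are as I recall them from the django-sphinx README, so verify them against your checkout.

```python
# Index name from this guide; the weights are made up for illustration.
STORY_INDEX = 'editorial_stories'
STORY_WEIGHTS = {
    'title': 100,  # a match in the title counts most
    'tease': 80,
    'body': 50,
}

# In models.py these would be handed to the search attribute, roughly:
#     search = SphinxSearch(index=STORY_INDEX, weights=STORY_WEIGHTS)
```

Higher numbers push matches in that field further up the results, so the title outranks the body here.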
Save it, and let's move on.
Create a Sphinx configuration
This is where I got a little lost, so I'm going to step through this carefully (if verbosely).
The author of django-sphinx created a clever bit of functionality that lets you generate Sphinx configuration files based on how you set up your Django app/model. It may just have been me, but my checkout generated broken Sphinx config files that only got me 1/2 way to a proper setup. Here's how it went:
Which generated something like this:
This is very helpful. However, I found that trying to run Sphinx off of this configuration file resulted in failure:
You may also experience an unknown key name 'FROM' error. Just nudge the sql_query parameter so that it's all on one line.
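If the generated query is long, nudging it by hand gets tedious. Here's a minimal sketch that collapses a multi-line query onto one line; the query and table name are hypothetical stand-ins for whatever sphinx_generate_config gave you.

```python
def one_line_sql(sql):
    """Collapse newlines and runs of whitespace into single spaces."""
    return " ".join(sql.split())

# Hypothetical multi-line query of the kind a generated config might contain.
generated = """SELECT id, title, tease, body
    FROM editorial_story"""

print("sql_query = " + one_line_sql(generated))
```

This is fine for plain column lists. If your query contains quoted string literals with meaningful whitespace, edit it by hand instead.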
In spite of this setback, we can create a functional sphinx.conf by combining a sample conf from my Sphinx dir with the output of the sphinx_generate_config bit. I've already done it here: Sphinx config.
Check it out and replace the source and index items at the top with your output from sphinx_generate_config.
At this point, you can update all of the many paths referred to in the configuration file. I used /usr/local/sphinx as my base directory, and created log/ and data/ directories. Don't worry if the pid and log files don't already exist. As long as the directories are available, you should be fine on that front.
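Creating those directories is a one-off, but here's a small sketch in case you want to script it. The base path and the log/ and data/ names match what I used above; the function name is made up, and you may need elevated permissions to write under /usr/local.

```python
import os

def ensure_sphinx_dirs(base="/usr/local/sphinx", subdirs=("log", "data")):
    """Create base/log and base/data if they are missing; return their paths."""
    paths = [os.path.join(base, name) for name in subdirs]
    for path in paths:
        if not os.path.isdir(path):
            os.makedirs(path)  # also creates the base directory when needed
    return paths
```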
Also, here's a list of stopwords for your enjoyment. Save them to your sphinx dir as stopwords.txt and make sure your config reflects the location of this file.
If all of your paths are updated, you should be ready to run searchd again.
Yay! Now stop it with a searchd --stop command (don't worry, we'll start it again later). For now, you can optimize your sql_query to exclude fields that your users won't be searching. I changed mine like this:
Update: You may see an error like this:
If you do, it's because the paths to your index in your sphinx config are bogus. Make sure the location that you're pointing to exists, and that it's accessible to searchd/indexer.
You can also use the --config flag when running searchd to point to a specific config file. Like this: searchd --config /path/to/sphinx.conf
Create your index
Now that Sphinx is configured correctly (that is, searchd can run w/out error), you need to create the index that you've configured in the sphinx.conf.
To do so, we use an app that comes with Sphinx called indexer.
I just plug in my index name, and we're good to go.
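For the record, here's a sketch of how that indexer call is put together, written as a Python helper so it can be scripted later. The helper name is hypothetical; --config is indexer's option for pointing at a specific config file, and the path is whatever yours happens to be.

```python
import subprocess

def indexer_command(index_name, config_path=None):
    """Build the argument list for building one index with Sphinx's indexer."""
    cmd = ["indexer"]
    if config_path:
        cmd += ["--config", config_path]  # use a specific sphinx.conf
    cmd.append(index_name)
    return cmd

def build_index(index_name, config_path=None):
    """Run indexer and raise if it exits with an error."""
    subprocess.check_call(indexer_command(index_name, config_path))
```

For my setup that would be build_index("editorial_stories"), optionally with the path to sphinx.conf as the second argument.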
Test your search
Ok, fire up Sphinx again, and then open up the Django shell for your project.
The interactive Python shell I'm using is called IPython.
The samples below are also illustrated on David's original post.
Very excellent work, David.
Sample Django code
The very last part of implementing django-sphinx was getting it integrated into my Django app. Here are the different snippets of code I used to get on my feet:
I added this line to my app's urls.py. It takes a search query and funnels it to a view called search_results.
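Here's a sketch of the kind of pattern that line needs, using plain re so it can be tried outside Django. The regex and the group name term are my assumptions, not the exact pattern from my urls.py. It matches both the bare /search/ form and the /search/bills form (Django strips the leading slash before matching).

```python
import re

# Optional single path segment after search/, with an optional trailing slash.
SEARCH_URL = re.compile(r'^search/(?:(?P<term>[^/]+)/?)?$')

def url_term(path):
    """Return the search term embedded in the path, or None if there isn't one."""
    match = SEARCH_URL.match(path)
    return match.group('term') if match else None
```

In urls.py the same regex string would be paired with the search_results view, and the named group arrives as a keyword argument.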
Here's the view. I wrote it to accept queries as request vars or user-hacked searches. I did this because it will be accepting searches from forms throughout the site (/search/?query=searchterms), and I didn't want it to break if some enterprising URL hacker prodded it (/search/bills).
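The query-picking part of that view boils down to something like the sketch below. It's deliberately Django-free: get_params stands in for request.GET (any dict-like object), url_term stands in for the value captured from the URL, and the function name is made up.

```python
def resolve_query(get_params, url_term=None):
    """Pick the search term: ?query=... wins, then the URL segment, else ''."""
    term = get_params.get("query") or url_term or ""
    return term.strip()

# Form submission: /search/?query=searchterms
print(resolve_query({"query": "searchterms"}))
# URL hacker: /search/bills
print(resolve_query({}, "bills"))
```

An empty result means there is nothing to search for, so the view can skip the Sphinx call and render an empty results page.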
Also note that I'm populating the template context with search_meta. This isn't really necessary, but it's cool data, and for dev purposes, I wanted to dump out as much search and result metadata as I could.
Here's the search_results.html referred to by the view above (simplified for brevity).
I hope this was helpful to you. I'll update this post as I receive feedback.
On your way out, I'd like to recommend that you check out David's In-Depth django-sphinx tutorial.