Update Your DjangoSites Screenshots When You Want

You can now update your DjangoSites website screenshot by simply editing your listing and ticking the box that says "Re-do Screenshot".

To edit a website, first log in to DjangoSites using your username or OpenID. Once logged in, click 'My Sites' in the navigation bar, then click on the website you wish to have updated.

From the website detail page, click the 'Edit' link next to your username. Make any changes you wish to make, click the Re-do Screenshot checkbox, and click 'Save'.

It's important to note that when you edit a website, it becomes unverified, so it will not be publicly visible. I manually review all submissions and edits, so it might take me a few days to verify and publish your listing after you edit it.

This is something that has been on the books for a while, and I'd like to thank Richard Barran for prodding me into getting the change done.

While you're editing your listing, you can update your deployment details to help with our Django Deployment Survey. The results will be published shortly for all to see - thank you to those who have responded so far!

Django Site of the Week: The A.V. Club

For various reasons I haven't been able to post a Django Site of the Week for quite a while now, which is a little embarrassing. I've had this little interview sitting in my inbox for over a month, and I've only just been able to write it up and get it published.

Just about everybody online has come across The Onion at some stage. Whether you have interpreted a satirical news story as the truth, or just read it for a laugh, The Onion is a staple of online publishing.

A less-satirical yet still enjoyable feature of The Onion is The A.V. Club. Originally part of the printed Onion newspaper and then a PHP-powered Drupal based website, The A.V. Club is now running on Django - and the team who built it love Django to bits.

This week, I spoke to Tom Tobin, web developer at The Onion. We discussed how they built the A.V. Club website, how they manage with a million unique visitors a month, and what their plans are for the future of The Onion's online presence.

You can read the interview and leave your comments at the Django Site of the Week Website.

The Django Deployment Survey: How Are You Hosting Django Websites?

According to the official Django Documentation, there are around a half-dozen documented and supported ways of deploying Django in production environments. The recommended way these days is with Apache and mod_wsgi, but there are still a whole bunch of decisions you must make when it comes to making your Django project public.

One of the great things about running DjangoSites is that it has exposed me to a number of fantastic Django-powered websites. Each of these is unique in one way or another, but they are all running Django. How they use it, though, is anybody's guess.

So what are our peers doing? There are proponents for and against each of the different pieces of the Django Deployment puzzle.

  • What operating system should be used? BSD, Linux or Windows?
  • What database system should be used? Postgres, MySQL, or something else?
  • Which web server software? Apache, nginx, or one of the many python-powered web servers?
  • Which method should I use? mod_python, FastCGI, or something else?
  • Which version of Django? Do I stick to a release, or try and keep up with SVN?

To see what our Django-using peers are doing, I am starting a survey of Django website deployment methods. This will be an ongoing addition to DjangoSites.

The way it works is pretty straightforward. As of tonight, there are a number of new fields on the 'Submit a Site' and 'Edit' screens to let you select the way you have deployed your Django-powered website. I will not be publishing these details on a site-by-site basis, so you can rest easy knowing that I won't tell the world that your website is hosted on Windows with an Oracle database backend.

Once there is a material number of responses to those questions, I will publish the statistics in aggregate with some pretty colourful charts. These graphs will remain public and dynamic, updating as the data grows.

What will this give us? An interesting look at the way the wider public deploys their Django applications, and potentially longer-term trends showing the usage of particular deployment methods as time goes on.

Want to be involved? I'd love to hear your feedback and see as many sites as possible have their deployment details listed.

If your site is already listed at DjangoSites, you can simply log in and click 'My Sites' in the toolbar. Open each site that's listed, and click the 'Edit' link. Alternatively, if you have a whole bunch of websites you want to update, just email me your DjangoSites username and deployment details and I'll do the update for you.

Don't despair if your websites aren't listed at DjangoSites yet - you can simply sign up then submit your websites for free! Remember, the deployment details are optional so you can list your website without being included in the aggregated statistics if you prefer.

Lastly, I would like to thank Joshua Jonah of Symbiosis Marketing for planting the seed for this idea. I'm hoping it'll give an interesting insight into more of the 'behind the scenes' details of Django-powered websites.

Django Site of the Week: Deskography

Seeing how other people work is something that seems to be of interest to most developers. Whether it's because they want to become better workers themselves or because they're somewhat voyeuristic is open to debate - either way, Django-powered website Deskography is a well-designed social desk-sharing website. This week, I spoke to Gustaf Sjöberg of Distrop to find out why they chose Django to power Deskography, and what it's allowed them to do. You can read the interview over at the Django Site of the Week.

Full-text searching in Django with PostgreSQL and tsearch2

Recently, I have wanted to improve the searching ability in a number of my projects. I've often added a search box that simply does an icontains filter, which works fine for small sites but doesn't scale due to the inefficiency of matching text with basic SQL queries. The plus side of an icontains filter is that it works on all of Django's database backends (with the exception of SQLite, which works but in a case-sensitive way).
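To see why an icontains search struggles at scale, here is a rough in-Python sketch of what that filter makes the database do: lower-case every row and scan it for the substring, one row at a time. The function and sample data are hypothetical illustrations, not code from this post.

```python
def icontains_scan(rows, term):
    # In-Python equivalent of the sequential LIKE '%term%' scan the
    # database performs for an icontains filter: every row is examined,
    # so query time grows linearly with table size.
    needle = term.lower()
    return [row for row in rows if needle in row.lower()]
```

For example, `icontains_scan(["Django Hosting", "Flask Tips"], "hosting")` returns `["Django Hosting"]` — but only after touching every row, which is exactly the cost a full-text index avoids.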

All of my projects run on PostgreSQL, an open-source database package that runs beautifully everywhere I've tried it. If you are familiar with MySQL, Postgres shouldn't be a problem for you.

When you want to do full-text searching from within your Django projects, you have a few different choices available to you, mostly depending on which database backend you're using:

  • MySQL has built-in fulltext indexing which can be used in Django with some basic ORM extensions.
  • PostgreSQL prior to 8.3 has a 'contrib' module known as tsearch2, which allows full-text indexing within your SQL queries. This is what I am going to detail below.
  • PostgreSQL 8.3 and later has tsearch2 built-in, which is usable in Django if you follow these directions from Barry Pederson.
  • You can install Sphinx to index your database content (on either MySQL or Postgres), then use django-sphinx to get access to your Sphinx indexes within the Django ORM. Since this can work over a network, it may be a better option for massive datasets as you can run your database and index on separate machines.
  • Similar to Sphinx is solr, which can be used with a Django ORM addon as documented in this blog post. I believe solr works with both MySQL and PostgreSQL, but I haven't investigated it further.

I'm using PostgreSQL 8.1, and decided not to rely on external software to maintain my index due to memory limitations. Therefore my choice was simple: I utilised the tsearch2 contrib application for PostgreSQL.

Note: All of the instructions in this blog post relate to my setup on a basic Debian 4.0 system with a relatively default PostgreSQL configuration. PostgreSQL and Django were already installed and running, and I already had my models in place. My models also had data in them already as I was upgrading an existing website. YMMV.

The first thing I had to do was install the [postgresql-contrib-8.1](http://packages.debian.org/etch/postgresql-contrib-8.1) package, which provides a number of third party addons to PostgreSQL including type extensions for ISBNs, benchmarking, various full-text indexing tools, and XML manipulation tools. The only element out of these that we'll be using is tsearch2, described as "Searchable text data type supporting indexed access." The PostgreSQL contribs are similar to those in Django: If you don't actually use them, they'll just sit there taking up nothing but hard-drive space. This means the extra contribs that I am not using shouldn't cause any compatibility or runtime errors.

sudo apt-get install postgresql-contrib-8.1

Once the package was installed, Debian gave me a bunch of new files under /usr/share/postgresql/8.1/contrib/, including a file called tsearch2.sql. You can take a look at it if you like: it provides a number of new tables, functions and data types once it's imported into a database.

The next step is to do just that: Import the tsearch2.sql file into my database. In this case I'm working on the DjangoSites database, so within my djangosites folder I ran ./manage.py dbshell.

$ cd djangosites
$ ./manage.py dbshell
djangosites=> \i /usr/share/postgresql/8.1/contrib/tsearch2.sql
djangosites=> \q

Running this command as my restricted djangosites user threw up an error immediately, and because the script utilises transactions the rest of the changes were rolled back. To get around this I had to run the import as the postgres superuser. Alternatively, I could give the djangosites user a higher level of access.

$ psql -U postgres djangosites
djangosites=# \i /usr/share/postgresql/8.1/contrib/tsearch2.sql
... (a few hundred CREATE TABLE, CREATE FUNCTION, and INSERT commands)
djangosites=# \q

My djangosites database is now able to recognise the commands used by tsearch2, and a handful of new database tables have been created:

$ ./manage.py dbshell
djangosites=> \dt pg*
            List of relations
 Schema |     Name     | Type  |  Owner
 public | pg_ts_cfg    | table | postgres
 public | pg_ts_cfgmap | table | postgres
 public | pg_ts_dict   | table | postgres
 public | pg_ts_parser | table | postgres
(4 rows)
djangosites=> \q

I now need to create a new field on the table I wish to index and search to hold the full-text search keywords and index. This is pretty straightforward, and needs to be done once for each table you'll be searching on:

$ ./manage.py dbshell
djangosites=> ALTER TABLE websites_website ADD search_tsv TSVECTOR;
djangosites=> CREATE INDEX search_index ON websites_website USING gist(search_tsv);
djangosites=> \q

These commands added a new field called search_tsv to my websites_website table and created an index on that field. The type, TSVECTOR, is only available because we successfully imported the tsearch2.sql file earlier. The name search_tsv is arbitrary; I chose it so that when inspecting the schema later I'll be reminded it's a tsvector field and not for general use.
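If you're wondering what a tsvector actually holds, this toy Python function (an illustration I've added, not part of tsearch2) sketches the idea: each word is mapped to the positions at which it occurs, which is what makes matching and ranking fast. The real implementation also stems words and drops stop words.

```python
def tsvector_demo(text):
    # Conceptual sketch of a tsvector: lower-cased words mapped to their
    # positions in the text. tsearch2 additionally stems words and strips
    # stop words; this toy version only lower-cases and splits.
    positions = {}
    for i, word in enumerate(text.lower().split(), start=1):
        positions.setdefault(word, []).append(i)
    return positions
```

Calling `tsvector_demo("Ross blog ross")` gives `{"ross": [1, 3], "blog": [2]}`, which mirrors the `'ross':1,6 'blog':10,27` style of output you'll see from the real column shortly.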

In my example I want to include four fields in the search index: the website Title, URL, Description and Tags. Because my table is already populated with plenty of data, I need to update the index for each existing row. If you are working with a fresh database, this won't be needed.

This first example will build the search index out of a single column, if you only have one column to index:

$ ./manage.py dbshell
djangosites=> UPDATE websites_website SET search_tsv=to_tsvector('default', title);
djangosites=> \q

If, like me, you have multiple columns you want to match on, then you need to use the COALESCE command while combining them to ensure columns with no data are handled without errors:

$ ./manage.py dbshell
djangosites=> UPDATE websites_website SET search_tsv=to_tsvector('default', COALESCE(title, '') ||' '|| COALESCE(url, '') ||' '|| COALESCE(description, '') ||' '|| COALESCE(tags, ''));
djangosites=> VACUUM FULL ANALYZE;
djangosites=> \q

Running this command gave me an error, as my djangosites user did not have permission to read the pg_ts_* tables created during the import earlier. To fix this, I jumped into the console as the postgres user again and granted SELECT access to the relevant tables. I could immediately run the UPDATE command above after doing this.

$ psql -U postgres djangosites
djangosites=# GRANT SELECT ON pg_ts_cfg, pg_ts_cfgmap, pg_ts_dict, pg_ts_parser TO djangosites;
djangosites=# \q

The UPDATE command populated my search_tsv field with a mass of data taken from the columns I fed into it. If you run a SELECT on this column now you'll see that search data:

$ ./manage.py dbshell
djangosites=> SELECT title, search_tsv FROM websites_website WHERE slug = 'www-rossp-org';
         title          |                   search_tsv 
 Ross Poulton's Website | 'wed':20 'blog':10,27 'info':17 'ross':1,6 'howto':15,28 ....
(1 row)
djangosites=> \q

So we have built our initial search index and loaded it up with data from the existing database records. But what about when somebody comes along and inserts a new row, or modifies an existing row? We need to make sure the index is updated so that the search remains useful. To do this, we create a database trigger.

$ ./manage.py dbshell
djangosites=> CREATE TRIGGER websites_website_tsvectorupdate BEFORE INSERT OR UPDATE ON websites_website FOR EACH ROW EXECUTE PROCEDURE tsearch2('search_tsv', 'title', 'url', 'description', 'tags');
djangosites=> \q

You can now try inserting data into your table with a simple INSERT query, then SELECT that row to ensure the search_tsv column is populated. If you get an error on the insert stating ERROR: could not find tsearch config by locale, then you need to update the tsearch2 configuration to match the locale that your system is using.

$ psql -U postgres djangosites
djangosites=# SHOW lc_collate;
(1 row)

djangosites=# UPDATE pg_ts_cfg SET locale='en_US.UTF-8' WHERE ts_name='default';
djangosites=# \q

We've now reached a point where all existing data is indexed, and all new or updated records are indexed automatically. This data isn't very useful as it is, though. We can now make use of a handful of nifty commands to actually run our full-text search against our data set:

$ ./manage.py dbshell
djangosites=> SELECT title FROM websites_website WHERE search_tsv @@ to_tsquery('melbourne');
 FC Melbourne
 Ross Poulton's Website
 Jutda - Show The Way
(3 rows)

djangosites=> SELECT HEADLINE(title, q), RANK(search_tsv, q) AS rank FROM websites_website, to_tsquery('hosting') AS q WHERE search_tsv @@ q ORDER BY rank DESC;
                        headline                         |   rank
 Libre <b>Hosting</b>                                    | 0.0865452
 TopHost (<b>hosting</b> service provider)               | 0.0865452
 Fileducky - easily upload to multiple file <b>hosts</b> | 0.0865452
 Compare VPS <b>hosting</b> plans                        | 0.0827456
 Bitbucket.org - Mercurial <b>Hosting</b>                | 0.0827456
 Chide.it - We Make Internet Applications                | 0.0759909
 Kyron                                                   | 0.0759909
 ElasticHosts                                            | 0.0759909
 excess.org                                              | 0.0607927
 GetHotShots                                             | 0.0607927
(34 rows)

djangosites=> \q

As you can see, the HEADLINE() and RANK() commands come in quite handy. HEADLINE() will highlight the matching words with HTML <b> tags, and RANK() will provide a ranking that can be used to sort the items from most to least relevant.

Something interesting to note in the above hosting example is that a search for hosting also matched the words host and hosts. This fuzzy searching is configurable in tsearch2, but is beyond the scope of this blog post (and, to be honest, the defaults work perfectly for me on my English-language websites).
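The idea behind that fuzzy matching is stemming: reducing words to a common root before indexing and querying. This crude Python sketch (my own illustration; tsearch2 actually uses proper Snowball-style stemming dictionaries, not this rule) shows why 'hosting' and 'hosts' both match 'host':

```python
def crude_stem(word):
    # Toy suffix-stripper for illustration only. Strips a trailing
    # "ing" or "s" when enough of the word remains, so 'hosting',
    # 'hosts' and 'host' all reduce to the same root.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word
```

Because both the indexed text and the query terms are stemmed the same way, a query for any of the variants finds rows containing any of the others.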

The last piece of this puzzle is to make use of our search within our Django view. Assuming we have a variable called q with our search terms, we can use the extra() command in Django's ORM. Note that we turn the space-separated keyword list into a pipe-separated list, so all searches will OR words together by default:

keywords = "|".join(q.split())
results = Website.objects.select_related().extra(
    select={
        'snippet': "headline(websites_website.description, to_tsquery(%s))",
        'rank': "rank(search_tsv, %s)",
    },
    select_params=[keywords, keywords],
    where=["search_tsv @@ to_tsquery(%s)"],
    params=[keywords],
    order_by=['-rank'],
)
We're now left with results, a standard Django QuerySet for the Website model with two new fields: snippet contains a section of the description column with the search terms highlighted (avoiding XSS during template display is an exercise for the reader) and rank, a numeric ranking of the usefulness of each row returned. We've sorted it by the rank, so that the most likely search results are presented at the top of the list. You may want to add this code into a custom manager to provide quicker access to it, although it will work fine in a view as it is.

You can pass results into your template, or even into a generic view that makes use of pagination. Your website search will be more accurate so it'll be more useful to your visitors, and it'll place less load on your server. Enjoy!
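One caveat worth handling before passing user input to to_tsquery: characters like &, |, ! and parentheses are query operators, so raw search strings containing them will raise a database error. A small hedged helper along these lines (my own addition, not from the original setup) keeps only word characters before OR-ing the terms together:

```python
import re

def make_tsquery(raw):
    # Keep only word characters so to_tsquery operators (& | ! and
    # parentheses) in user input can't break the query, then join the
    # remaining words with '|' to OR them together.
    words = re.findall(r"\w+", raw)
    return "|".join(words)
```

For instance, `make_tsquery("django & (hosting!)")` yields `"django|hosting"`, which can then safely replace the plain `"|".join(q.split())` shown above.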


To get an idea of how efficient this method of searching is, take a look at these two examples. The first utilises the LIKE command, which Django defaults to for its icontains searches. The second utilises the tsearch2 index explained above. There are fewer than 2,000 rows in this table, and the difference in both query time and results returned is staggering.

$ ./manage.py dbshell
djangosites=> \timing
Timing is on.
djangosites=> SELECT title,url FROM websites_website WHERE title LIKE '%hosting%' OR description LIKE '%hosting%' OR url LIKE '%hosting%' OR tags LIKE '%hosting%';
(19 rows)

Time: 13.892 ms
djangosites=> SELECT title,url FROM websites_website WHERE search_tsv @@ to_tsquery('hosting');
(34 rows)

Time: 5.921 ms
djangosites=> \q

If I run each command twice in quick succession, the LIKE query drops to about 11ms and the tsearch2 query drops to 0.9ms. Given most searches don't happen twice in quick succession, I'm seeing a 50% improvement in query times. As the number of rows grows, I would expect the LIKE query to take longer whilst the tsearch2 query should stay relatively stable.


I made use of the following excellent resources whilst setting this up for myself and writing this blog post. They're strongly recommended as further reading:

  • Barry Pederson's blog post *PostgreSQL full text search with Django*, which assumes PostgreSQL 8.3 is being used, is very useful. The Django code in my blog post is from Barry's post.
  • The DevX article *Implementing Full Text Indexing with PostgreSQL* is a non-Django-specific article covering the installation of tsearch2, configuration and use in an SQL context.
  • The official tsearch2 documentation:
  • The tsearch-v2-intro document covers generalised use and explains how various parts of tsearch2 work under the covers
  • The tsearch2 guide goes into more detail and covers configuration of items such as the fuzzy searching, weighting the rankings, and more. It's essential reading if you're not working in English or if you have complex models to index.

Django Site of the Week: ForecastWatch

Eric Floehr is the man behind ForecastWatch and ForecastAdvisor, two Django-powered weather websites that aggregate and analyse weather forecasts to compare their accuracy on an ongoing basis. This week, I spoke to Eric about the history behind his sites, how he handles massive data sets, and his conversion from Ruby on Rails to Django. You can read all about it over at the Django Site of the Week.

This week's SOTW was delayed by a day due to our Australia Day celebrations on the 26th of January. I figured if they can delay an international tennis match for fireworks, I could delay this as well :)

Django Site of the Week: ShutterCal

ShutterCal is a photo sharing site with a difference - it encourages contributors to upload one photo every day to help people reach a goal: whether that may be to improve their photography, watch themselves evolve over a year, or remember people and places they visit. ShutterCal started in 2007, and this week I spoke to developer Dan Ferrante to find out why he chose Django, some of the challenges he's faced, and how he uses Django and other open-source software to make his life easier. You can read more over at the Django Site of the Week.

Django SOTW Moving to Mondays

To better fit with both my own way of doing things and with general web browsing patterns of the Django SOTW public, I'm moving the SOTW to Mondays rather than Saturdays.

You can expect new interviews to go live on Monday evenings (Australian EST) which means our US visitors should see them in their RSS feeds first thing Monday morning.

Also, I've finally succumbed to the forces and joined Twitter, so if that's your kinda thing then come and say hi!

Django Site of the Week: Represent

Represent is a new website prototype from the New York Times that provides New York residents with information about the whereabouts of their elected representatives. What's interesting about this website is that it's one of the first large-scale sites to implement GeoDjango for spatially-aware applications. This week, I spoke with Derek Willis to get some details on their implementation of a Django project at one of the world's most famous newspapers.

You can read the entire interview over at the Django site of the Week website.

Django Site of the Week: EveryBlock

The Django Site of the Week is back after a Christmas-induced break with an interview with Adrian Holovaty. Adrian is no stranger to Django, and his name is known throughout the community as one of the brains behind Django's birth and subsequent open-source release. His latest project EveryBlock is the evolution of an earlier mashup, chicagocrime.org, which won Adrian a number of awards. So what are the driving forces behind EveryBlock? I recently spoke with Adrian to find out.

You can read the interview now over at Django Site of the Week.


Want to see more? Check out the yearly archives below.