Tag Archive: django.

Self-Hosted Server Status Page with Uptime Robot, S3, and Upscuits

For quite a while I've had a public "Status" page online for WhisperGifts via Pingdom. It basically just shows uptime over the past few days, but given my site is relatively low-volume and not overly critical to my customers, the $10/month for Pingdom was actually one of my largest expenses after hosting.

So, I started looking for an alternative.

Today I re-deployed the WhisperGifts Status Page using a combination of Uptime Robot, Upscuits and Amazon S3.

In short, I have Uptime Robot checking the uptime of my site (including its subsites, such as the admin and user pages). The statistics are gathered and presented by Upscuits, which is entirely client-side JavaScript hosted on S3.

My basic todo list for next time:

  1. Sign up for Uptime Robot. I'd been using them for ages on their Free plan as a backup to Pingdom; this gives 5-minute checks. Their paid plan gives 1-minute resolution.
  2. Add your sites, and make sure they're being monitored correctly.
  3. On the Uptime Robot dashboard, click My Settings. Open the section labelled Monitor-Specific API Keys and search for your Monitor. Copy the API key to a text file for later; repeat this step for subsequent monitors you want to include on your status page.
  4. Download the latest Upscuits release to your PC.
  5. In the public folder of the upscuits package, rename config.example.js to config.js. Paste your API key(s) inside it.
  6. Create an S3 bucket named after your status hostname (e.g. status.mysite.com) and enable static website hosting. Set up your DNS to point to this bucket.
  7. Upload the contents of public/ to your S3 bucket.
  8. Visit your new status page and view your last 12 months of Uptime Robot statistics.
  9. Close your Pingdom account, saving $10 a month. Profit!
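
Step 7 can be scripted rather than done by hand. The following is a minimal sketch, not part of Upscuits: the function names `plan_uploads` and `upload_site` are made up for illustration, and `client` is assumed to be a boto3 S3 client (anything with a compatible `upload_file` method will do).

```python
import mimetypes
import os


def plan_uploads(public_dir):
    """Return (local_path, s3_key, content_type) for every file under public_dir."""
    plan = []
    for root, _dirs, files in os.walk(public_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            # S3 keys use forward slashes, relative to the bucket root.
            key = os.path.relpath(local_path, public_dir).replace(os.sep, "/")
            content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
            plan.append((local_path, key, content_type))
    return plan


def upload_site(client, bucket, public_dir):
    """Upload every planned file; `client` is assumed to be a boto3 S3 client."""
    for local_path, key, content_type in plan_uploads(public_dir):
        # Setting ContentType lets browsers render index.html instead of downloading it.
        client.upload_file(local_path, bucket, key, ExtraArgs={"ContentType": content_type})
```

With boto3 installed and credentials configured, usage would look something like `upload_site(boto3.client("s3"), "status.mysite.com", "public")`.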

For a small site like mine this has a couple of obvious benefits. It's free (or $4.50/month if you want higher resolution - still half the price of the most basic Pingdom plan); it uses a tiny amount of S3 storage which is as good as free, and doesn't involve running any server-side code. The included index.html is also easily customisable if you like, since it's just plain HTML (using the Bootstrap framework, by default). This is a big win over hosted solutions, IMO.

What's New With WhisperGifts

In the past couple of months I've done some fun and interesting stuff over at WhisperGifts, my side project that lets couples put their bridal gift registry online. It's all built with Django, so I thought I'd share a few neat things I've come across along the way.

Bookmarklet and image detection

For a while I've had a bookmarklet to add any item to your registry, which basically took the page title, selected text (or meta description), and URL and pre-filled the Add Item form. Recently, I updated it to also find the largest image on the page, and add that as the default image for the item you selected.

This makes it pretty quick to add an item from, for example, an Amazon page or a manufacturer's website. The code to do it was surprisingly simple. Apologies for the short variable names; this is taken directly from my bookmarklet code so brevity is a plus in that scenario.

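var ims=document.getElementsByTagName('img');
var imsMxD=0;
var im='';
for (var x=0; x<ims.length; x++) {
    var xDim=parseFloat(ims[x].width)*parseFloat(ims[x].height);
    // Keep the image with the largest area seen so far.
    if (xDim>imsMxD) { imsMxD=xDim; im=ims[x].src; }
}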

This will give me a variable, im, which contains the URL of the largest image on the page. I then pass this to the form to pre-populate the image field. It's a really easy way to find the largest image on the page (in this case, taken as the largest surface area) and then do something useful with it.

Updated HTML and Pricing

I rebuilt the WhisperGifts marketing site and dashboard using Bootstrap. The visual design is identical to the previous design, but it's now responsive. Making this change only took a few hours of effort, and saved me from trying to retrofit responsive utilities into my existing layout. It also gave me a chance to clean up my Django templates a bit.

At the same time, I changed the default template that my customers get to be much nicer than the black and white default I previously used. Pricing also got simplified; I took out both the cheap and expensive paid options and kept a single paid plan for $29. The pricing page is much simpler as a result.


For something a bit different, I installed and set up restless to get a REST API for some parts of the WhisperGifts site. For a while I've wanted to play around with some mobile client development, and a proper API will make that a bit easier. It's only just been announced and given the site's audience I am not expecting a huge amount of use, but it was an interesting project to undertake regardless.

This might also lead me to mess around with a single-page JavaScript app that can consume the API (specifically for the user dashboard side of things) but who knows when!

Weather lookups

If a WhisperGifts user has added the address of their wedding to their registry page, I use that address to do hyperlocal weather lookups using Forecast.io. I'm using python-forecastio, which makes the interesting part of this only a few lines of code:

import forecastio
from django.conf import settings

forecast_result = forecastio.load_forecast(settings.FORECASTIO_API_KEY, lat, lng, time=registry.weddingdate)
weather = forecast_result.currently()
w_temp = weather.d.get('temperature', None)
w_summ = weather.d.get('summary', None)

if w_temp is not None and w_summ is not None:
    registry.weather = "%s, %s&deg;" % (w_summ, w_temp) # Results in "Partly cloudy, 19°".

I can then show this on the couple's registry page and use it in reminder emails sent to guests. At the moment I do these lookups daily for any registry that's within the next month; so far Forecast.io has weather details for 30 days out for most locations which is rather amazing.
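
The daily job boils down to a date-window check plus a string format. Here is a minimal sketch in plain Python; `needs_weather_lookup`, `format_weather`, and the `window_days` parameter are illustrative names, not the actual WhisperGifts code.

```python
from datetime import date, timedelta


def needs_weather_lookup(wedding_date, today, window_days=30):
    """True when the wedding is today or within the next `window_days` days."""
    return today <= wedding_date <= today + timedelta(days=window_days)


def format_weather(summary, temperature):
    """Build the display string, or None when either value is missing."""
    if summary is None or temperature is None:
        return None
    return "%s, %s&deg;" % (summary, temperature)
```

Keeping the window check separate from the API call makes it easy to change how far ahead lookups start without touching the Forecast.io code.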

None of these are specific to the wedding business, but it's been a fun way to play around with a few pieces of tech that haven't really fit into any other project so far.

Djangosites Open Sourced

Back in 2008 I started djangosites.org as a listing of websites powered by Django. Prior to that, we relied on a wiki page to see who was using Django, so an image-based website felt like a big improvement.

Since day one I've promised to release the source code that I use for the site. It's relatively simple, so I never stressed much about making it a high priority - but I continue to be asked and politely berated for not getting it published.

Today that's changed. I think it's too late for me to say I've come good on my promise, but the Djangosites source code is now available on GitHub.

The README has more details, but in short this is a dump of the code currently running the site. I'll continue to use this repository as changes are made to the live site; however, I'm not actively working on djangosites at this point in time (other than reviewing & approving submissions).

There's a few pieces of this that might be useful for people new to Django, but otherwise this is really a collection of generic views. The useful bits might be:

Suggestions and pull requests are welcome, but I'm not actively soliciting changes. I should probably clean things up a bit given that this codebase hasn't changed materially since Django 0.96, other than slight refactors to allow upgrades to work - so I'm certainly not yet taking advantage of new functionality that's been made available in recent Django versions. Perhaps now that you can see how bad the code is, I'll have more of an incentive to fix it :)

The source code is available right now from GitHub under a mixed licence: the Python code is MIT-licenced, and the rest (HTML etc) is not open source but included in the repository for completeness and as an example.

Tracking CPC Results in Django

Like many startups, I use CPC ads to attract attention to WhisperGifts. I bid a set fee per click for particular search words on Google, and for ads shown to my target demographic on Facebook. I wanted to track an individual signup to their source CPC campaign, so put together a really quick bit of Django middleware to help me out.

All I do is stash the value of utm_campaign (set by Google or Facebook) into a signed cookie called wgcampaign.

class CampaignMiddleware(object):
    def process_response(self, request, response):
        # Only set the cookie on the first tagged visit, so the original
        # campaign is kept if the user later arrives via a different ad.
        if 'utm_campaign' in request.GET and 'wgcampaign' not in request.COOKIES:
            c = request.GET['utm_campaign']
            # Signed, HTTP-only cookie that lasts one year (31536000 seconds).
            response.set_signed_cookie('wgcampaign', c, max_age=31536000, httponly=True)
        return response

At signup time, I read out the cookie into the users table:

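from django.core.signing import BadSignature

try:
    campaign_code = request.get_signed_cookie('wgcampaign', None)
except BadSignature:  # tampered or unreadable cookie
    campaign_code = None

user.campaign_code = campaign_code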

Simple! I've now got the capability to track actual revenue by campaign source - a very handy tool to identify which campaigns might warrant higher or lower spending.
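
Once the campaign code is stored against each user, the revenue report is a simple group-and-sum. Here is a minimal sketch in plain Python, assuming each signup can be reduced to a `(campaign_code, revenue)` pair; the function name and the sample campaign names are invented for illustration.

```python
from collections import defaultdict


def revenue_by_campaign(signups):
    """Sum revenue per campaign code; signups without a code are grouped as 'unknown'."""
    totals = defaultdict(float)
    for campaign_code, revenue in signups:
        totals[campaign_code or "unknown"] += revenue
    return dict(totals)


# Hypothetical data: (campaign_code, revenue in dollars) per signup.
signups = [("google-registry", 29.0), ("facebook-brides", 29.0),
           ("google-registry", 29.0), (None, 29.0), ("facebook-brides", 0.0)]
report = revenue_by_campaign(signups)
```

In a real Django project the same grouping would more likely be done in the database with an aggregate query, but the idea is the same.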

I'm aware this isn't rocket science, but I figured it's worthwhile sharing - it makes more sense to me to track these things directly in my own database, than to try and match data from the AdWords panel with various analytics services.

Happy CPC bidding!

New Podcast: Django Roundup

Lincoln Loop are one of the earlier Django-based development shops, and their various employees contribute in many ways to the open-source community. One new addition they've just made is the launch of Django Round-Up, a podcast covering the news in the Django community.

This is a podcast hosted by @kennethlove and @bkonkle from @lincolnloop that highlights recent articles and projects in the Django community. We love talking about web development, so our podcast focuses on casual conversations as we cover the latest blog posts and project releases.

I was surprised to hear my name coming through my headphones, only a minute into their first episode - with a quick review of my recently-published django-readonly-site package.

As a result of their comments I've made some minor updates to address questions and suggestions from the podcast team.

I want to publicly thank them for including my item in their inaugural episode, and suggest that anybody in the Django community goes out and checks out this valuable new resource!


Occasionally I need to take WhisperGifts offline, but still show some parts of the site to users. This has included some system changes that require the site to be non-functional for a little while (such as doing a deployment with a bunch of backwards-incompatible changes, or large database migrations) and for server moves, whilst waiting for DNS changes to propagate.

To do this, I wrote a little library that I could toggle within my Django settings. I've just pulled it out of the WhisperGifts codebase, and django-readonly-site is now available on GitHub. I think it's pretty simple to use.

Install it with pip install django-readonly-site, add readonly to your Django project's settings.INSTALLED_APPS, and set settings.SITE_READ_ONLY = True. More options are available to keep parts of your site online; see the README for details.
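
As a settings fragment, that looks roughly like the following. Only the `readonly` app name and the `SITE_READ_ONLY` flag come from the instructions above; the other apps are placeholders for whatever your project already lists.

```python
# settings.py (fragment)
INSTALLED_APPS = [
    "django.contrib.auth",          # placeholder: your existing apps
    "django.contrib.contenttypes",  # placeholder: your existing apps
    "readonly",                     # django-readonly-site
]

# Set back to False (or remove) once the maintenance window is over.
SITE_READ_ONLY = True
```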

By keeping parts of your site online (such as the homepage, about us page, and in my case a customer's registry listing) you can provide a transitional experience to users, while the database-intensive and high-integrity parts of the site (such as signup, account management, and checkout) are taken offline with a polite "Sorry, we're temporarily unavailable" message.

Just after I had to quickly move to Rackspace after an outage with my previous web host, Rackspace announced that they now had a public cloud offering in Australia. For performance reasons, I'll be moving from DFW to SYD soon - and I will use django-readonly-site to try and minimise the perceived downtime for my users.

Your thoughts, suggestions, and pull requests are welcome on the GitHub Project Page.

Getting Paid in Django with Pin Payments

Payments in Australia are controlled by the so-called "big four" banks, and it's been difficult for a long time for startups to get merchant facilities to process credit cards online. Accounts cost hundreds of dollars per month, with high transaction costs and minimum transaction volumes thrown in.

We watched with teary eyes as companies such as Stripe launched and made it easy for developers to process credit cards, and kept struggling with the PayPal "API" with hope that one day we'd see Stripe in Australia.

I was excited, then, to see that Pin Payments have just publicly launched in Australia. They're currently offering a $9/month account fee, which isn't free but it's certainly ideal for small companies. After June 2013 this will jump to $50/month, so it seems to make sense to sign up now if you've got any medium-term plans.

Like Stripe, Pin offer a JavaScript Library for processing payments on your page without ever handling credit card details. This makes it much quicker and easier to implement.

Because I process payments in multiple places on WhisperGifts, I needed a reusable way to render the payment form and process the payment in a worker queue. The resulting code has been packaged up as django-pinpayments, which is still simple and in an Alpha state but it's a great starting point.

The code is released under the BSD licence, and I'd love to get your suggestions, comments, and GitHub Pull Requests.

A few things on the to-do list include providing Celery tasks out of the box, and providing tests & documentation. If you can help with any of these I'd be very happy :)

In the meantime, I'm about to go live with credit cards on WhisperGifts for the first time. I'll keep PayPal around as an alternative, but if demand is low I won't hesitate to turn it off. I can't wait.

The Definitive Answer, Explained

Yesterday I posted that Django was almost certainly suitable to use for your project. I've had some minor push-back, so I thought I'd explain a little.

When beginning a project, many businesses appear to spend an inordinate amount of time making technical decisions that are often outside their area of expertise. One such decision might be from a small business owner wanting to decide whether to build their shopping cart with Django, Rails, or PHP.

The hard truth is that for the most part, this decision doesn't matter. All three of the above can be used to successfully build exactly the sort of shopping cart that you want, no matter how bespoke.

An article I came across this week talked about the same theme but in a different context. It's by Forbes' Gene Marks, and is titled What Salesforce.com Won't Tell You. The message here is that no matter what CRM solution you implement, you'll get results if you implement it well - and that means getting the right people to build/design it, getting your staff on board, and making sure somebody owns the system.

This applies to your website project, too.

  1. Ensure somebody at your company owns the website and makes decisions based on outcomes rather than technology
  2. Ensure you're working with somebody competent to build your website. Don't fuss over whether they use PHP, Django, Rails, or otherwise: defer to their experience (it's what you're paying for, after all)
  3. Embrace what you build.

My feeling is that when you're building something bespoke, #2 is the most important: make sure you work with somebody you trust. If you aren't able to let go of some control and let them make technical decisions for you, then your project is already doomed.

Of course there are exceptions here. If you're a Rails developer, just build with Rails (unless you want to try Django). If you know for a fact that a particular technology can't work for you, then don't use it. But if you aren't at all technical, then don't try to make technical decisions that impact your business: please find somebody who can make that decision for you.

After all, you wouldn't expect a web developer to tell you how to lay out your retail store, would you?

The Definitive Answer To "Can I Use Django For This Project?"

Short: Yes.

Longer: Almost certainly. If you don't know any technical reason why Django isn't a good fit, then Django is probably a good fit.

WhisperGifts: The Tech That Let Me Launch

The WhisperGifts re-launch recently went very well! I promised a more technical follow-up with some details about what's new and what's changed, so if you want to know more about what makes WhisperGifts tick then you'll want to read on. Hint: It's a dash of Django, a pinch of Python, and a slathering of open-source software all around.

The primary technology behind WhisperGifts is Django, the "web framework for perfectionists with deadlines". My own deadline for this project was rather, ahem, flexible - the new design was a work in progress for 2 years (on-and-off, mostly off due to the birth of our baby) and the back-end re-write happened over a few months early in 2012.

Django allows us to iterate quickly. I find the language natural to use and the documentation is epic. There's a number of things that no framework does out of the box; I've tried to rely on open-source to fill as many gaps as possible rather than re-writing things from scratch like I did with the original WhisperGifts site - this is mostly because the open-source ecosystem around Django is now so much larger than it used to be.

As an example, originally I rolled my own authentication as the user management modules in early Django releases were rather inflexible. Building your own authentication is never a good idea, so I've migrated to using built-in Django logic. Similar changes are found throughout my codebase.

What I Use

Django, obviously. I use most of what comes with Django: The ORM and URL dispatcher, the included Admin, User and Cache apps, and more. Some might be interested to note that I don't use class-based views, simply because I don't see a need to change at this point.

Caching is done using Memcached and nested cache tags, as I've blogged about previously. I also use Django's site-wide caching middleware for anonymous users, which reduces load time for the marketing/static portions of the site.

Images are processed via the sorl-thumbnail package. I can generate thumbnails in any size on the fly. All of my images are stored locally - due to my current volume, the overhead of setting up a more formal CDN or even just using S3 isn't worthwhile.

Customer payments (for upgraded packages) are handled by PayPal. To interface with their IPN and to simplify the user-facing workflow as much as possible, I use django-paypal.

To track in-app metrics (such as number of signups, number of upgrades, number of new items) I use django-app-metrics and get a simple daily e-mail. I'm also testing out Mixpanel which, although it isn't free, lets me get much more detailed statistics for the same sorts of metrics. django-app-metrics even has a backend to automatically push data through to Mixpanel, so I might use that later on.

All e-mails are sent asynchronously (using django-celery) so they don't tie up front-end web serving. I deliver my e-mails via Postmark with the useful django-postmark library. All my outgoing e-mails include both HTML and plain-text components; I also embed a header image. In the geek world this is heavily frowned upon, but remember who my audience are: couples getting married and their wedding guests. Postmark makes these e-mails simple.

The front-end web server to all my websites is always nginx. It is small, easy to configure, does a wicked job of serving up any static files (both my own site static files and those customer images that have been uploaded) and integrates well with Django. To run Django behind nginx I always use gunicorn managed by supervisord.

My site-specific CSS and JavaScript files are hand-crafted during development then at runtime combined together effortlessly and minimised as much as possible using django-compressor.

To make sure that any gremlins are caught and dealt with, django-sentry catches any exceptions in my code and presents them in an interface that is incredibly useful: You can see which exceptions occur most often, what conditions trip them, and more.

In a similar vein, I use django-timelog to occasionally review how long my views take to render in a live environment; django-debug-toolbar gives me similar data during development.

Bringing it all together

For much of the above all I need to do is pip install django-compressor and add the relevant code to my settings.py and templates. Very little of what I've mentioned above has changed the way I develop or deploy; they simply make life easier. The fact that I can pick up these bits of software (most of which weren't available 3-4 years ago) and use them off-the-shelf with some very minimal setup just makes me love Django development even more.

I therefore owe a big "thank you" to the Django community.

Previously I've manually written code to do many of the things I've mentioned above (and bad code, at that, given its status as "helper code" rather than the main part of my projects). I owe a few people a beer or three.

DjangoSites Move Complete

For the past few years I've hosted all of my projects on a single RimuHosting VPS. It's old, running Debian 5.0, and maintenance and upgrades have become headachy. It isn't easy to run Python 2.7 on old Debian versions, and since I only started using virtualenv relatively recently, things were a real mess.

As such, it was time to upgrade the VPS. I made the decision to look around for other hosting options, and eventually settled on Prgmr as the performance-for-dollar ratio seemed much higher to me.

I've taken the opportunity to break things down into multiple smaller VPS units to try and keep maintenance somewhat simpler. I can pull a single site's server down without breaking any other sites, and I can upgrade components individually for a single site.

Prgmr's pricing makes this possible: Their 'base' is very low - you pay $4 for each VPS plus RAM.

Djangosites.org now sits on its own 512 MB Prgmr Debian 6.0 instance. With 128 MB taken up by memcached and a small portion going to nginx and the operating system, there's plenty of RAM left for my gunicorn worker processes.

So how did I move everything across?

Although it took me 3 days to finish, the actual effort involved was really quite low. I hope these tips help you with server migrations in the future!

  1. In advance, drop the TTL on your DNS records low. I went to 300 seconds, but on the wrong domain. Doh!
  2. Deploy the new server environment. I have a shell script that creates the relevant users, installs all the Debian packages I need, installs system-wide tools like pip, installs system-wide Python libraries I need, installs virtualenv, installs & configures PostgreSQL & Memcached, etc.
  3. Put your old app in Offline mode. For Djangosites, this means changing the 'accounts' and 'submit' URLs to point to flatpages instead of their views.
  4. Create a virtualenv on your new server
  5. Copy your (now not-being-written-to) database & code to your new server
  6. Configure nginx for the new app, spin up your app instances
  7. From your desktop, change your HOSTS file to let you access the new server directly as if it were live. Test your site.
  8. Revert your HOSTS file, and update DNS.
  9. Revert your DNS TTL.

Users with stale DNS will continue to see the read-only site on the old server, whilst those with sensible ISPs will see the updated site within a few minutes.

Upgrading at the same time

Because moving code is boring I decided to take the opportunity to upgrade everything to Django 1.4. Djangosites was, until this week, running on 1.3 but it was using a number of deprecated features that wouldn't port to 1.4. It was also using the Postgres 8.3 contrib package for tsearch, which has different syntax to Postgres 8.4's built-in tsearch.

Unsurprisingly, the codebase has shrunk as a result of this simplification. It's also let me clean out the cruft, simplifying things even further.

I also used the move to point DjangoSites at my Sentry instance - previously I just relied on the built-in Django error e-mails. Using Sentry made it easier to catch a few quirky bugs that were leftover from the move & upgrade.


DjangoSites now has its own VPS environment, standing alone from my other projects (which, incidentally, also have their own VPS environment). This means I can do maintenance on it without impacting any other projects.

I can also manage RAM with less fuss: I just ask Prgmr to increase the RAM for one VPS instance, rather than trying to juggle the way each application within a single server fights for memory.

At present, even on only a 512mb VPS, DjangoSites has plenty of headroom. With some basic Blitz benchmarking we're good for north of a million hits a day on a commodity hosting account... obviously we're nowhere near that at the moment, but it's good to know we can grow into this new t-shirt.

My monthly hosting costs have also dropped, but overall I've got allocations of more RAM, bandwidth, and disk space.


DjangoSites now has its own VPS environment, which means my operations time is going to increase. Any updates and patches to non-app code (such as PostgreSQL updates or OS security fixes) need to be done multiple times, not just once.

I also suck at remembering hostnames, and need to stop instinctively logging into a single host server. I'm sure I'll get over this pretty quickly, though.


I can't recommend Prgmr highly enough. Their prices are great but more importantly the VPS instances I've got are rock solid. They're fast (including the disk IO) and very responsive. My old RimuHosting VPS wasn't too bad, but the IO was certainly nowhere near as good.

DjangoSites Downtime - Server Move

Heads-up: DjangoSites.org will be down, at some point in the next week, while I move it to a new server. Expected downtime is approximately 1 hour to transfer files & database; if your ISP has DNS servers that don't observe TTL then you might not see updates for a while longer.

This is happening as I'm moving DjangoSites to its own VPS to make it easier to balance available resources between it and my other projects.

Key-based cache expiration with Django

Last week, the team over at 37Signals wrote up an article on their newly implemented Key-based cache expiration system and it hit me: It's such a simple idea with obvious benefits, why hadn't I implemented a similar caching mechanism before? Being a Django user, the Rails code didn't make much sense to me but the concept certainly did - so here's my take on it with a quick Django example.


I've just implemented this caching strategy for WhisperGifts for a re-launch that will go live in the next few weeks. We allow couples to publish an online gift list, then let people select items from that list. Pretty basic stuff, but rendering the gift list can require n+1 queries due to the way that my purchase data is kept. This hasn't been a big issue until now, when I've built new functionality and generally just extended things a bit.

The cache strategy is so simple it's taken longer to write up here than it did to alter my existing codebase.

My basic model is as follows:

  • Registry, the top-level "collection" of items for each wedding.
  • Item, of which there are many for each Registry
  • Buyer, of which there are 0-n for each Item

With that in mind, read on...

Model Setup

The first task to do is make sure your models have a last_modified timestamp. Mine already did, but just in case you need an example:

class Registry(models.Model):
    name = models.CharField(max_length=100)
    last_modified = models.DateTimeField(null=True, editable=False)

    def save(self, *args, **kwargs):
        from datetime import datetime
        self.last_modified = datetime.now()
        super(Registry, self).save(*args, **kwargs)

Next, make sure that whenever you save an Item or Buyer the last_modified change cascades upwards to the Registry.

class Item(models.Model):
    registry = models.ForeignKey(Registry)
    name = models.CharField(max_length=100)
    last_modified = models.DateTimeField(null=True, editable=False)

    def save(self, *args, **kwargs):
        from datetime import datetime
        self.last_modified = datetime.now()
        super(Item, self).save(*args, **kwargs)
        # Saving the parent refreshes its last_modified (see Registry.save),
        # so the change cascades upwards.
        self.registry.save()

Note: I'd highly recommend this gets taken care of in an abstract base class. The above is just a (bad) example.

Define your cache keys

At the top of models.py I defined a variable named DATA_SCHEMA_REVISION. For now I increment it whenever I make a change to my schema that should invalidate the entire cache - not that it's a regular occurrence.

On each model, now, define a cache_key property. For example, on my Item model:

@property
def cache_key(self):
    return 'myproject/%s/item-%s-%s' % (DATA_SCHEMA_REVISION, self.id, self.last_modified)

Again, as a good programmer you'll put this in your abstract model, won't you?
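
To make the "put it in a base class" advice concrete, here is a minimal sketch in plain Python with no Django dependency, so it can be run on its own. In a real project the shared class would be an abstract Django model and `touch()` would live in `save()`; the names `CacheKeyMixin`, `touch`, `cache_label`, and the value of `DATA_SCHEMA_REVISION` are all assumptions made for illustration.

```python
from datetime import datetime

DATA_SCHEMA_REVISION = 1  # assumed value; bump to invalidate every cached fragment


class CacheKeyMixin(object):
    """Shared last_modified / cache_key behaviour (in Django: an abstract model)."""

    cache_label = "object"  # overridden per class, e.g. "item"
    parent = None           # object whose timestamp should also be bumped

    def __init__(self, pk):
        self.id = pk
        self.last_modified = None

    def touch(self, now=None):
        """Record a change here and cascade it to the parent, if any."""
        now = now or datetime.now()
        self.last_modified = now
        if self.parent is not None:
            self.parent.touch(now)

    @property
    def cache_key(self):
        return "myproject/%s/%s-%s-%s" % (
            DATA_SCHEMA_REVISION, self.cache_label, self.id, self.last_modified)


class Registry(CacheKeyMixin):
    cache_label = "registry"


class Item(CacheKeyMixin):
    cache_label = "item"

    def __init__(self, pk, registry):
        super(Item, self).__init__(pk)
        self.parent = registry
```

Touching an `Item` changes both its own key and its registry's key, which is what makes the long cache timeouts safe.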

Configure Django caching.

This isn't hard. I use memcached. YMMV.

Update your templates.

My existing template was along these lines:

<b>{{ registry.name }}</b>
{% for item in registry.item_set.all %}
    {{ item.name }}
    {{ item.expensive_calculation_and_rendering }}
{% endfor %}

The new, improved, fully cached version:

{% load cache %}
{% cache 9999999 registry registry.cache_key %}
<b>{{ registry.name }}</b>
{% for item in registry.item_set.all %}
    {% cache 9999999 item item.cache_key %}
    {{ item.name }}
    {{ item.expensive_calculation_and_rendering }}
    {% endcache %}
{% endfor %}
{% endcache %}

Notice that we're caching this for a very long time - because the cache key will change whenever our data is changed. If an Item changes, that item's cache key (and the parent Registry cache key) will be changed due to the new timestamp, and they'll be re-rendered. However when we re-render the outer Registry cache fragment, it's primarily made up of still-valid cache components for the other Items.
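
The effect of the nested fragments can be simulated without Django at all. In this rough sketch a dict stands in for memcached and plain dicts stand in for the models; every name in it is made up for illustration. It shows that after one item changes, only that item takes the slow path again.

```python
fragment_cache = {}   # stands in for memcached
expensive_calls = []  # records which items were actually re-rendered


def render_item(item):
    key = "item-%s-%s" % (item["id"], item["last_modified"])
    if key not in fragment_cache:
        expensive_calls.append(item["id"])  # the slow path
        fragment_cache[key] = "<li>%s</li>" % item["name"]
    return fragment_cache[key]


def render_registry(registry):
    key = "registry-%s-%s" % (registry["id"], registry["last_modified"])
    if key not in fragment_cache:
        body = "".join(render_item(item) for item in registry["items"])
        fragment_cache[key] = "<b>%s</b><ul>%s</ul>" % (registry["name"], body)
    return fragment_cache[key]


registry = {"id": 1, "name": "Sam & Alex", "last_modified": 1,
            "items": [{"id": 10, "name": "Toaster", "last_modified": 1},
                      {"id": 11, "name": "Kettle", "last_modified": 1}]}

render_registry(registry)   # cold cache: both items rendered
render_registry(registry)   # warm cache: nothing rendered

# Changing one item bumps its timestamp and the registry's.
registry["items"][1].update(name="Teapot", last_modified=2)
registry["last_modified"] = 2
html = render_registry(registry)  # only item 11 is re-rendered
```

The stale entries are never deleted explicitly; they are simply never asked for again, and memcached evicts them when it needs the space.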

This is an incredibly simple way to keep your site nice and snappy. There's some fantastic discussion over on the original 37Signals blog post - I strongly recommend you read & understand that post as well as its comments, then consider my examples above as a bare minimum implementation in Django.

How quick is it?

A brief update, 10 minutes after this post was written. I have done some very rudimentary tests and come up with the following numbers. Note that these are NOT representative, however the difference is very clear. Keep in mind this is running in a development environment with debugging enabled and various other slowdowns not present in production!

Cold cache 17.34 seconds
Warm cache 1.11 seconds

This is the time taken for Django to process the request from start to finish, from the initial HTTP request to returning a complete HTML page. There's plenty of other gains to make but as a starting point this is not too bad at all!
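
For a sense of scale, the two timings above work out as follows (simple arithmetic on the quoted numbers, nothing more):

```python
cold_seconds = 17.34
warm_seconds = 1.11

speedup = cold_seconds / warm_seconds                              # roughly 15.6x
time_saved = (cold_seconds - warm_seconds) / cold_seconds * 100   # roughly 93.6%

print("%.1fx faster, %.1f%% less time" % (speedup, time_saved))
```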

Small open-source release: django-menu

Many moons ago on this blog I wrote about a simple menuing system for Django. For the sake of convenience, I've just packaged up that code (plus a few minor improvements) into a package named django-menu, which is also available via PyPI with pip install django-menu. Basic documentation is included in the package and in the git repository.

Please log any issues or suggested improvements via the GitHub issue tracker!


A while ago I released a helpdesk tool that I use to manage support requests, under the name of Jutda Helpdesk (named after my small consulting company). The project has received a slow but steady stream of patches and bug fixes, however it's always been a little tricky to manage with a single committer over at Google Code. To make life easier for everybody involved, I've renamed the project to django-helpdesk and shifted the source code and issue management to GitHub.

I thought I'd spend a few words talking about these two changes.

Firstly, the change in name. Because django-helpdesk was originally built for my own use internally at Jutda and WhisperGifts I released it under the name of "Jutda Helpdesk" when I opened the source up a few years back. There was a bit of a thought that I could release more products with a similar naming scheme, such as "Jutda Basket Weaver" and "Jutda Donut Maker". Those products never eventuated, leaving the only open-source product as "Jutda Helpdesk". This seems to have caused a bit of confusion, with people referring to the product simply as "Jutda" which does no good for either the helpdesk product or my business.

The next change was a move from Google Code to GitHub for the project site, including the source code management and issue tracking. The reason for this was twofold:

  1. For users to offer patches they had to log an issue with a patch and hope I could apply it sooner rather than later. By using GitHub, anybody can fork the project and begin making changes, meaning I don't need to be involved for people to share code changes.

  2. Every other Django project that I personally follow is on GitHub, and I prefer the git DVCS workflow over SVN. It seems most of the Django community is of a similar mindset, from what I can see.

So the project has been moved. I've migrated any issues that were open and that I feel need working on; most feature requests were culled out as I am not in a position to do custom development at the moment. Now that people can fork the project on GitHub I hope feature requests come in the format of pull requests or at least patches.

Lastly, I've made a few other distribution improvements. The project is now on PyPI so you can install it using 'pip install django-helpdesk'. The listing over at DjangoPackages has also been updated so that you can see the PyPI downloads and mark yourself as a user of the package.

I would love to hear any comments or feedback you've got via e-mail or on Twitter (where I'm @RossPoulton). Enjoy!

DjangoSites Deployment Statistics

Every person that submits a site to DjangoSites gets a chance to include details about how they deployed their website: what database they use, what version of Django they use, and so on.

The aggregated statistics are now online for all to see. The deployment details for individual websites are not visible.

You can see the stats in a basic form along with some pretty charts on the DjangoSites website by clicking the Deployment Stats link in the navigation bar.

I'd love to hear any feedback you've got. Please let me know your thoughts via e-mail or Twitter.

Gracious E-Mail Bounce Handling in Django with Postmark

Recently I've been on a quest to simplify the way I deliver my websites to my customers. Not that my customers know: the primary changes are relating to server monitoring, being proactive about a few things, and getting rid of elements that I don't understand.

One of the things I really didn't understand that well and wanted to offload to somebody else was my email delivery. I already use a company called Tuffmail to handle my mailboxes (that is, for my actual email boxes - not email sent from within my Django applications) and I'm extremely happy with their service. I use and trust Tuffmail because I know I can't keep on top of everything I need to know to run SMTP and IMAP services properly and securely.

The next step in my outsourcing process was to find a company to deliver email sent from within my Django applications. Until now I've been running a local Postfix server used for delivering mail but not receiving it, and for most emails that works fine. As websites grow, though, it becomes harder to manage what happens with bounced email and also to guarantee delivery - many email providers seem more than happy to mark emails from virtual servers as spam!

Getting Somebody Else to Deliver Your Mail

My shortlist for outsourced SMTP came down to two companies: Sendgrid and Postmark. Both of these services charge a small fee to become your SMTP server for outgoing email with a number of great features.

Sendgrid, in particular, has some fantastic features that you usually only see from mailing list providers such as the ability to track e-mail opens and links within emails. Both providers deal well with unsubscribe requests and let you easily categorise emails so you can get statistics based on deliverability of signup emails compared to purchase receipts.

My final choice ended up being Postmark, primarily for two reasons:

  1. Postmark allow you to control your SPF & DKIM settings per-sender, so email looks as if it comes from who you say it comes from. Sendgrid only allow this on more expensive plans, which are overkill for my needs - the end result is that some e-mail clients display the sender as "Sendgrid on behalf of XYZ".
  2. Postmark allow you to configure multiple 'servers', each with different API credentials, to separate email delivery for each of your apps or websites. I want a feature like this to be able to see different statistics for WhisperGifts as opposed to DjangoSites.

Configuring Postmark

Getting started with Postmark is easy. Just sign up and follow the instructions to create a server, a sender signature and then configure your Django installation to use them as an SMTP server (which needs to be enabled per-server within the Postmark system). If you're on Django 1.2 there is an EmailBackend by Paul Martorana which you can use and gain even more features than using plain ol' SMTP.
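For the plain SMTP route, the Django side is just a handful of settings. The fragment below is a sketch under assumptions: the setting names are Django's standard e-mail settings, but the host name, port and the use of the server token as both username and password are my understanding of Postmark's SMTP service and should be checked against the credentials page for your own server. POSTMARK_SERVER_TOKEN is a name made up for this example.

```python
# settings.py fragment: send Django's outgoing mail through Postmark's SMTP endpoint.
# Assumption: verify host, port and credentials against your Postmark server's
# settings page before relying on these values.
POSTMARK_SERVER_TOKEN = "your-server-api-token"  # placeholder, not a real token

EMAIL_HOST = "smtp.postmarkapp.com"
EMAIL_PORT = 587
EMAIL_USE_TLS = True
EMAIL_HOST_USER = POSTMARK_SERVER_TOKEN
EMAIL_HOST_PASSWORD = POSTMARK_SERVER_TOKEN
```

With these in place, anything sent through django.core.mail goes out via that server, so the rest of the application does not need to change.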

One of the real benefits, as far as I'm concerned, is managing your e-mail bounces. I've recently implemented this code over at DjangoSites to help deal with users who mistype their e-mail address. I currently get about 4-5 of these a week, and as the site gets more popular the volume of bounced email is growing.

Configuring Django to Handle Bounces via Postmark

I have created a Python file called postmarkBounces.py in my DjangoSites project folder. Its contents are as follows:

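from django.core.mail import send_mail
from django.http import HttpResponse
from django.contrib.auth.models import User
from django.conf import settings

# Use whichever JSON library is available, always under the name
# 'simplejson' so the rest of this module can call simplejson.loads().
try:
    import json as simplejson
except ImportError:
    try:
        import simplejson
    except ImportError:
        from django.utils import simplejson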

def postHandler(request):
    """
    Gets POST'd data from Postmark's BOUNCE handler. We put a message against
    the user letting them know their email bounced.

    Requires HTTP Basic authentication with a username set in settings.POSTMARK_BOUNCE_PASSWORD.

    Configure Postmark to use 'http://mypassword:blank@www.mysite.com/postmark_bounce/'

    Sample of the received JSON data in the HTTP POST body is:
        {
          "BouncedAt": "2010-06-03T21:00:19.0155096-04:00",
          "CanActivate": true,
          "Description": "Test bounce description",
          "Details": "Test bounce details",
          "DumpAvailable": true,
          "Email": "john@example.com",
          "ID": 42,
          "Inactive": true,
          "Name": "Hard bounce",
          "Tag": "Test",
          "Type": "HardBounce",
          "TypeCode": 1
        }
    """
    authorised = False
    if request.META.has_key('HTTP_AUTHORIZATION'):
        (authmeth, auth) = request.META['HTTP_AUTHORIZATION'].split(' ',1)
        auth = auth.strip().decode('base64')
        username, password = auth.split(':',1)
        if username == settings.POSTMARK_BOUNCE_PASSWORD:
            authorised = True

    if not authorised:
        response =  HttpResponse('Authorization Required', mimetype="text/plain")
        response['WWW-Authenticate'] = 'Basic realm="Bounce Handler"'
        response.status_code = 401
        return response

    if request.method != 'POST':
        response = HttpResponse('Data must be provided via HTTP POST', mimetype="text/plain")
        response.status_code = 405 # Bad Method
        return response

    # The data we're interested in comes in via the body of the HTTP POST,
    # rather than as post parameters. As such we need to read in the
    # raw_post_data of the POST rather than using request.POST.
    raw_json = request.raw_post_data
    data = simplejson.loads(raw_json)
    email_address = data.get('Email', None)
    bounce_type = data.get('Type', None)
    bounce_desc = data.get('Description', None)

    # We only write a message if the bounce is a 'Hard Bounce', this means
    # we aren't bothering the user for transient errors such as DNS lookup
    # failures, offline mail servers, or full mailboxes. Of course, if
    # these errors persist eventually we'll get a hard bounce and let the
    # user know.
    if email_address and bounce_type == 'HardBounce':
        try:
            user = User.objects.get(email=email_address)
            # Using messaging system from Django pre-1.2.
            user.message_set.create(message="An email to %s seems to have bounced back to us - the response we got said '%s'. Can you please check it by clicking 'Account Settings'?" % (user.email, bounce_desc))
        except (User.DoesNotExist, User.MultipleObjectsReturned):
            # We don't want to raise an error if the user doesn't exist
            # (eg we are getting a bounce for an email not directly
            # related to a user in our system) or if the address can't
            # be matched to exactly one user.
            pass

    # All Postmark needs to get is a response, any response. No templates or
    # fancy content needed here!
    return HttpResponse("OK")
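The two fiddly parts of the handler, unpacking the Basic authentication header and deciding whether a payload is a hard bounce, can be tried outside Django. This is a minimal sketch written for a current Python 3 interpreter rather than the Django 1.1 code above; parse_basic_auth and hard_bounce_email are hypothetical helper names, and the sample values come from the docstring.

```python
import base64
import json


def parse_basic_auth(header_value):
    """Return (username, password) from an 'Authorization' header value.

    Returns None when the header is not well-formed Basic authentication.
    """
    try:
        method, encoded = header_value.split(" ", 1)
        if method.lower() != "basic":
            return None
        decoded = base64.b64decode(encoded.strip()).decode("utf-8")
        username, password = decoded.split(":", 1)
    except (ValueError, UnicodeDecodeError):
        # Covers a missing space or colon, and invalid base64 input.
        return None
    return username, password


def hard_bounce_email(raw_body):
    """Return the bounced address for a hard bounce, otherwise None."""
    data = json.loads(raw_body)
    if data.get("Type") == "HardBounce":
        return data.get("Email")
    return None


# Build the header a client would send for http://mypassword:blank@...
header = "Basic " + base64.b64encode(b"mypassword:blank").decode("ascii")
print(parse_basic_auth(header))
print(hard_bounce_email('{"Type": "HardBounce", "Email": "john@example.com"}'))
```

Note that the shared secret travels in the username position, which is why the view compares the username, not the password, against the setting.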

You will note that this refers to a setting called POSTMARK_BOUNCE_PASSWORD. Configure a random password for it in your settings.py file.


Due to the way HTTP Basic authentication works (which we'll be using later) you should avoid punctuation in this password.
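One way to come up with a suitable value is to let Python pick one from letters and digits only. This is a small sketch, not the method I originally used; make_bounce_password is a hypothetical helper and the length of 32 is an arbitrary choice.

```python
import random
import string


def make_bounce_password(length=32):
    """Return a random password made only of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    rng = random.SystemRandom()  # draws from the operating system's randomness
    return "".join(rng.choice(alphabet) for _ in range(length))


# Print a line that can be pasted into settings.py.
print("POSTMARK_BOUNCE_PASSWORD = %r" % make_bounce_password())
```

Because the result contains no punctuation, it can be dropped into the bounce hook URL without any escaping.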

Lastly, set up urls.py to point a URL to our new bounce handler:

(r'postmark_bounce/', 'postmarkBounces.postHandler'),

Reset your Django application (if required - depending on your deployment method) and you're good to go.

Telling Postmark To Tell Us About Bounces

Over at Postmark, log in to your control panel and go to the 'Settings & API Credentials' page for your server. Note that each Postmark 'server' has its own settings screen, so bounces are handled differently per-application.

In the 'Bounce Hook' field, tell Postmark the full HTTP URL to your bounce handler, including the password you entered earlier:
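As a sketch of the URL's shape: the password goes in the username position, any filler value goes in the password position, and the path matches the route added to urls.py. The helper name and the www.mysite.com host below are placeholders for illustration only.

```python
def bounce_hook_url(password, host, path="/postmark_bounce/"):
    """Build a bounce hook URL carrying the shared secret as the username."""
    if not password.isalnum():
        # Letters and digits need no escaping inside the URL's credentials.
        raise ValueError("use letters and digits only in the bounce password")
    return "http://%s:blank@%s%s" % (password, host, path)


print(bounce_hook_url("mypassword", "www.mysite.com"))
# http://mypassword:blank@www.mysite.com/postmark_bounce/
```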


It should look something like the below screenshot:

Click the 'Check' button and you should see a message letting you know that the response code was 200 - all is OK! If you get an authorisation error (HTTP error 401) then you have entered an incorrect password. If you see an HTTP error 500, check out the e-mail Django should have sent to your server administrator for more details.

Show The User

The last piece of the puzzle is to alert the user when the email has bounced. Over at DjangoSites, I do this by using Django's messaging framework (it's a Django 1.1 installation - the syntax for this has changed in 1.2) as seen in the above code snippet. Note that I don't force users to confirm their email address before they log in - if your users have to do that, you'll need to be more creative with your notification mechanisms!

Add some snazzy CSS and you're good to go. This is what a DjangoSites user will see while browsing the site if we realise that emails to them have bounced:

This is pretty unobtrusive, and it is only shown to the user once. If you want to be stricter about enforcing a valid e-mail address, just modify the way you store the 'failed email' flag and report it to the end-user.

In Closing...

I moved DjangoSites to Postmark about a fortnight ago, and configured the e-mail bouncing in less than an hour last week. That's not a long time on which to base a solid comparison, but I can already report that I am happy with the service I've received from Postmark. My other Django sites including Jutda and WhisperGifts are already sending e-mail via the service, and the per-website reporting is wonderful.

If you want to do something more complex with your bounces, then I suggest you take a look at the Postmark Bounce Documentation which explains a full API to review bounces as needed.

Seeing fewer delivery errors come into my personal inbox as a result of the Postmark service is a welcome change. The fact that the process of alerting the user and fixing their account is now automated means I save a bit of time and as my websites grow the time saved will continue to grow. As a small business owner who continues to work a full-time "day job", I can only see that as a good thing.

Update Your DjangoSites Screenshots When You Want

You can now update your DjangoSites website screenshot by simply editing your listing and ticking the box that says "Re-do Screenshot".

To edit a website, first log in to DjangoSites using your username or OpenID. Once logged in, click 'My Sites' in the navigation bar, then click on the website you wish to have updated.

From the website detail page, click the 'Edit' link next to your username. Make any changes you wish to make, click the Re-do Screenshot checkbox, and click 'Save'.

It's important to note that when you edit a website, it becomes un-verified so it will not be publicly visible. I manually review all submissions and edits, so it might take me a few days to verify and publish your listing after you edit it.

This is something that has been on the books for a while, and I'd like to thank Richard Barran for prodding me into getting the change done.

While you're editing your listing, you can update your deployment details to help with our Django Deployment Survey. The results will be published shortly for all to see - thank you to those who have responded so far!

Django Site of the Week: The A.V. Club

For various reasons I haven't been able to post a Django Site of the Week for quite a while now, which is a little embarrassing. I've had this little interview sitting in my inbox for over a month, and I've only just been able to write it up and get it published.

Just about everybody online has come across The Onion at some stage. Whether you have interpreted a satirical news story as the truth, or just read it for a laugh, The Onion is a staple of online publishing.

A less-satirical yet still enjoyable feature of The Onion is The A.V. Club. Originally part of the printed Onion newspaper and then a PHP-powered Drupal based website, The A.V. Club is now running on Django - and the team who built it love Django to bits.

This week, I spoke to Tom Tobin, web developer at The Onion. We discussed how they built the A.V. Club website, how they manage with a million unique visitors a month, and what their plans are for the future of The Onion's online presence.

You can read the interview and leave your comments at the Django Site of the Week Website.

The Django Deployment Survey: How Are You Hosting Django Websites?

According to the official Django Documentation, there are around a half-dozen documented and supported ways of deploying Django in production environments. The recommended way these days is with Apache and mod_wsgi, but there are still a whole bunch of decisions that one must make when it comes to making their Django project public.

One of the great things about running DjangoSites is that it has exposed me to a number of fantastic Django-powered websites. Each of these is unique in one way or another, but they are all running Django. How they use it, though, is anybody's guess.

So what are our peers doing? There are proponents for and against each of the different pieces of the Django Deployment puzzle.

  • What operating system should be used? BSD, Linux or Windows?
  • What Database system should be used? Postgres, MySQL, or something else?
  • Which web server software? Apache, nginx, or one of the many python-powered web servers?
  • Which method should I use? mod_python, FastCGI, or something else?
  • Which version of Django? Do I stick to a release, or try and keep up with SVN?

To see what our Django-using peers are doing, I am starting a survey of Django website deployment methods. This will be an ongoing addition to DjangoSites.

The way it works is pretty straightforward. As of tonight, there are a number of new fields on the 'Submit a Site' and 'Edit' screens to let you select the way you have deployed your Django-powered website. I will not be publishing these details on a site-by-site basis, so you can rest easy knowing that I won't tell the world that your website is hosted on Windows with an Oracle database backend.

Once there are a material number of responses to those questions, I will publish the statistics in aggregate with some pretty colourful charts. These graphs will remain public (and dynamic - updating as the data grows) once the volume of submissions provides meaningful data.

What will this give us? An interesting look at the way that the wider public deploys their Django applications, and potentially longer-term trends showing the usage of particular deployment methods as time goes on.

Want to be involved? I'd love to hear your feedback and see as many sites as possible have their deployment details listed.

If your site is already listed at DjangoSites, you can simply log in and click 'My Sites' in the toolbar. Open each site that's listed, and click the 'Edit' link. Alternatively, if you have a whole bunch of websites you want to update, just email me your DjangoSites username and deployment details and I'll do the update for you.

Don't despair if your websites aren't listed at DjangoSites yet - you can simply sign up then submit your websites for free! Remember, the deployment details are optional so you can list your website without being included in the aggregated statistics if you prefer.

Lastly, I would like to thank Joshua Jonah of Symbiosis Marketing for planting the seed for this idea. I'm hoping it'll give an interesting insight into more of the 'behind the scenes' details of Django-powered websites.

Django Site of the Week: Deskography

Seeing how other people work is something that seems to be of interest to most developers. Whether it's because they want to become better workers themselves or because they're somewhat voyeuristic is open to debate - either way, Django-powered website Deskography is a well-designed social desk-sharing website. This week, I spoke to Gustaf Sjöberg of Distrop to find out why they chose Django to power Deskography, and what it's allowed them to do. You can read the interview over at the Django Site of the Week.

Full-text searching in Django with PostgreSQL and tsearch2

Recently, I have wanted to improve the searching ability in a number of my projects. I've often added a search box that simply does an icontains filter, which works fine for small sites but doesn't scale due to the inefficiency of matching text with basic SQL queries. The plus side of an icontains filter is that it works on all of Django's database backends (with the exception of SQLite, which works but in a case-sensitive way).

All of my projects run on PostgreSQL, an open-source database package that runs beautifully everywhere I've tried it. If you are familiar with MySQL, Postgres shouldn't be a problem for you.

When you want to do full-text searching from within your Django projects, you have a few different choices available to you, mostly depending on which database backend you're using:

  • MySQL has built-in fulltext indexing which can be used in Django with some basic ORM extensions.
  • PostgreSQL prior to 8.3 has a 'contrib' module known as tsearch2, which allows full-text indexing within your SQL queries. This is what I am going to detail below.
  • PostgreSQL 8.3 and later has tsearch2 built-in, which is usable in Django if you follow these directions from Barry Pederson
  • You can install Sphinx to index your database content (on either MySQL or Postgres), then use django-sphinx to get access to your Sphinx indexes within the Django ORM. Since this can work over a network, it may be a better option for massive datasets as you can run your database and index on separate machines.
  • Similar to Sphinx is solr, which can be used with a Django ORM addon as documented in this blog post. I believe solr works with both MySQL and PostgreSQL, but I haven't investigated it further.

I'm using PostgreSQL 8.1, and decided not to rely on external software to maintain my index due to memory limitations. Therefore my choice was simple: I utilised the tsearch2 contrib application for PostgreSQL.

Note: All of the instructions in this blog post relate to my setup on a basic Debian 4.0 system with a relatively default PostgreSQL configuration. PostgreSQL and Django were already installed and running, and I already had my models in place. My models also had data in them already as I was upgrading an existing website. YMMV.

The first thing I had to do was install the [postgresql-contrib-8.1](http://packages.debian.org/etch/postgresql-contrib-8.1) package, which provides a number of third party addons to PostgreSQL including type extensions for ISBNs, benchmarking, various full-text indexing tools, and XML manipulation tools. The only element out of these that we'll be using is tsearch2, described as "Searchable text data type supporting indexed access." The PostgreSQL contribs are similar to those in Django: If you don't actually use them, they'll just sit there taking up nothing but hard-drive space. This means the extra contribs that I am not using shouldn't cause any compatibility or runtime errors.

sudo apt-get install postgresql-contrib-8.1

Once the package is installed, on Debian I was given a bunch of new files under /usr/share/postgresql/8.1/contrib/, including a file called tsearch2.sql. You can take a look at it if you like: it provides a number of new tables, functions and data types once it's imported into a database.

The next step is to do just that: Import the tsearch2.sql file into my database. In this case I'm working on the DjangoSites database, so within my djangosites folder I ran ./manage.py dbshell.

$ cd djangosites
$ ./manage.py dbshell
djangosites=> \i /usr/share/postgresql/8.1/contrib/tsearch2.sql
djangosites=> \q

Running this command as my restricted djangosites user threw up an error immediately, and because the script utilises transactions the rest of the changes were rolled back. To get around this I had to run the import as the postgres superuser. Alternatively, I could give the djangosites user a higher level of access.

$ psql -U postgres djangosites
djangosites=# \i /usr/share/postgresql/8.1/contrib/tsearch2.sql
... (a few hundred CREATE TABLE, CREATE FUNCTION, and INSERT commands)
djangosites=# \q

My djangosites database is now able to recognise the commands used by tsearch2, and a handful of new database tables have been created:

$ ./manage.py dbshell
djangosites=> \dt pg*
            List of relations
 Schema |     Name     | Type  |  Owner
--------+--------------+-------+----------
 public | pg_ts_cfg    | table | postgres
 public | pg_ts_cfgmap | table | postgres
 public | pg_ts_dict   | table | postgres
 public | pg_ts_parser | table | postgres
(4 rows)
djangosites=> \q

I now need to create a new field on the table I wish to index and search to hold the full-text search keywords and index. This is pretty straightforward, and needs to be done once for each table you'll be searching on:

$ ./manage.py dbshell
djangosites=> ALTER TABLE websites_website ADD search_tsv TSVECTOR;
djangosites=> CREATE INDEX search_index ON websites_website USING gist(search_tsv);
djangosites=> \q

These two commands added a new field called search_tsv to my websites_website table and created an index on that field. The type, TSVECTOR, is only available if we successfully imported the tsearch2.sql file earlier. The name search_tsv is arbitrary; I chose it so that when inspecting the schema later I'll be reminded it's a tsvector field and not for general use.

In my example I want to include four fields in the search index: the website Title, URL, Description and Tags. Because my table is already populated with plenty of data, I need to update the index for each existing row. If you are working with a fresh database, this won't be needed.

This first example will build the search index out of a single column, if you only have one column to index:

$ ./manage.py dbshell
djangosites=> UPDATE websites_website SET search_tsv=to_tsvector('default', title);
djangosites=> \q

If, like me, you have multiple columns you want to match on, then you need to use the COALESCE command while combining them to ensure columns with no data are handled without errors:

$ ./manage.py dbshell
djangosites=> UPDATE websites_website SET search_tsv=to_tsvector('default', COALESCE(title, '') ||' '|| COALESCE(url, '') ||' '|| COALESCE(description, '') ||' '|| COALESCE(tags, ''));
djangosites=> VACUUM FULL ANALYZE;
djangosites=> \q

Running this command gave me an error, as my djangosites user did not have permission to read the pg_ts_* tables created during the import earlier. To fix this, I jumped into the console as the postgres user again and granted SELECT access to the relevant tables. I could immediately run the UPDATE command above after doing this.

$ psql -U postgres djangosites
djangosites=# GRANT SELECT ON pg_ts_cfg, pg_ts_cfgmap, pg_ts_dict, pg_ts_parser TO djangosites;
djangosites=# \q
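The COALESCE guard in the multi-column UPDATE matters because, in SQL, concatenating anything with NULL gives NULL, so a single empty column would wipe out the whole search string for that row. The sketch below shows the effect using Python's built-in sqlite3 module purely as a stand-in; that substitution is my assumption for the sake of a runnable example, and it says nothing about tsearch2 itself, only about the NULL rule that SQLite and PostgreSQL share.

```python
import sqlite3

# In-memory database with one row whose 'tags' column is NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE websites_website (title TEXT, tags TEXT)")
conn.execute("INSERT INTO websites_website VALUES ('FC Melbourne', NULL)")

plain, guarded = conn.execute(
    "SELECT title || ' ' || tags, "
    "COALESCE(title, '') || ' ' || COALESCE(tags, '') "
    "FROM websites_website"
).fetchone()

print(repr(plain))    # None: the NULL swallowed the title as well
print(repr(guarded))  # 'FC Melbourne ': the title survives
conn.close()
```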

The UPDATE command populated my search_tsv field with a mass of data taken from the columns I fed into it. If you run a SELECT on this column now you'll see that search data:

$ ./manage.py dbshell
djangosites=> SELECT title, search_tsv FROM websites_website WHERE slug = 'www-rossp-org';
         title          |                   search_tsv 
 Ross Poulton's Website | 'wed':20 'blog':10,27 'info':17 'ross':1,6 'howto':15,28 ....
(1 row)
djangosites=> \q

So we have built our initial search index and loaded it up with data from the existing database records. But what about when somebody comes along and inserts a new row, or modifies an existing row? We need to make sure the index is updated so that the search remains useful. To do this, we create a database trigger.

$ ./manage.py dbshell
djangosites=> CREATE TRIGGER websites_website_tsvectorupdate BEFORE INSERT OR UPDATE ON websites_website FOR EACH ROW EXECUTE PROCEDURE tsearch2('search_tsv', 'title', 'url', 'description', 'tags');
djangosites=> \q

You can now try inserting data into your table with a simple INSERT query, then SELECT that row to ensure the search_tsv column is populated. If you get an error on the insert stating ERROR: could not find tsearch config by locale, then you need to update the tsearch2 configuration to match the locale that your system is using.

$ psql -U postgres djangosites
djangosites=# SHOW lc_collate;
(1 row)

djangosites=# UPDATE pg_ts_cfg SET locale='en_US.UTF-8' WHERE ts_name='default';
djangosites=# \q

We've now reached a point where all existing data is indexed, and all new or updated records are indexed automatically. This data isn't very useful as it is, though. We can now make use of a handful of nifty commands to actually run our full-text search against our data set:

$ ./manage.py dbshell
djangosites=> SELECT title FROM websites_website WHERE search_tsv @@ to_tsquery('melbourne');
 FC Melbourne
 Ross Poulton's Website
 Jutda - Show The Way
(3 rows)

djangosites=> SELECT HEADLINE(title, q), RANK(search_tsv, q) AS rank FROM websites_website, to_tsquery('hosting') AS q WHERE search_tsv @@ q ORDER BY rank DESC;
                        headline                         |   rank
 Libre <b>Hosting</b>                                    | 0.0865452
 TopHost (<b>hosting</b> service provider)               | 0.0865452
 Fileducky - easily upload to multiple file <b>hosts</b> | 0.0865452
 Compare VPS <b>hosting</b> plans                        | 0.0827456
 Bitbucket.org - Mercurial <b>Hosting</b>                | 0.0827456
 Chide.it - We Make Internet Applications                | 0.0759909
 Kyron                                                   | 0.0759909
 ElasticHosts                                            | 0.0759909
 excess.org                                              | 0.0607927
 GetHotShots                                             | 0.0607927
(34 rows)

djangosites=> \q

As you can see, the HEADLINE() and RANK() commands come in quite handy. HEADLINE() will highlight the matching words with HTML <b> tags, and RANK() will provide a ranking that can be used to sort the items from most to least relevant.

Something interesting to note in the above hosting example is that a search for hosting also matched the words host and hosts. This fuzzy matching (tsearch2 reduces words to a common stem before comparing them) is configurable, but is beyond the scope of this blog post (and, to be honest, the defaults work perfectly for me in my English-based websites).

The last piece of this puzzle is to make use of our search within our Django view. Assuming we have a variable called q with our search terms, we can use the extra() command in Django's ORM. Note that we turn the space-separated keyword list into a pipe-separated list, so all searches will OR words together by default:

keywords = "|".join(q.split())
results = Website.objects.select_related().extra(
    select={
        'snippet': "headline(websites_website.description, to_tsquery(%s))",
        'rank': "rank(search_tsv, to_tsquery(%s))",
    },
    where=["search_tsv @@ to_tsquery(%s)"],
    params=[keywords],
    select_params=[keywords, keywords],
    order_by=('-rank',),
)

We're now left with results, a standard Django QuerySet for the Website model with two new fields: snippet contains a section of the description column with the search terms highlighted (avoiding XSS during template display is an exercise for the reader) and rank, a numeric ranking of the usefulness of each row returned. We've sorted it by the rank, so that the most likely search results are presented at the top of the list. You may want to add this code into a custom manager to provide quicker access to it, although it will work fine in a view as it is.
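The keyword handling is worth a closer look, because to_tsquery treats characters such as &, |, ! and parentheses as operators, and a stray one in user input makes the query fail. The sketch below is one possible way to build the pipe-separated string more defensively; build_or_query is a hypothetical helper, and keeping only word characters is my own assumption about what is safe enough for a simple search box.

```python
import re


def build_or_query(q):
    """Turn free-text input into a pipe-separated list of plain words.

    Anything that is not a letter, digit or underscore is treated as a
    separator, so tsquery operators typed by the user never reach SQL.
    """
    words = re.findall(r"\w+", q)
    return "|".join(words)


print(build_or_query("django   hosting"))       # django|hosting
print(build_or_query("cheap & (fast) hosting"))  # cheap|fast|hosting
```

An empty result means there is nothing to search for, so the view should skip the query in that case instead of passing an empty string to to_tsquery.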

You can pass results into your template, or even into a generic view that makes use of pagination. Your website search will be more accurate so it'll be more useful to your visitors, and it'll place less load on your server. Enjoy!


To get an idea of how efficient this method of searching is, take a look at these two examples. The first is utilising the LIKE command, which Django defaults to for its icontains searches. The second is utilising the tsearch2 index explained above. There are fewer than 2000 rows in this table, and the difference in both query time and results returned is staggering.

$ ./manage.py dbshell
djangosites=> \timing
Timing is on.
djangosites=> SELECT title,url FROM websites_website WHERE title LIKE '%hosting%' OR description LIKE '%hosting%' OR url LIKE '%hosting%' OR tags LIKE '%hosting%';
(19 rows)

Time: 13.892 ms
djangosites=> SELECT title,url FROM websites_website WHERE search_tsv @@ to_tsquery('hosting');
(34 rows)

Time: 5.921 ms
djangosites=> \q

If I run each command twice in quick succession, the LIKE query drops to about 11ms and the tsearch2 query drops to 0.9ms. Given most searches don't happen twice in quick succession, the first-run numbers are the fairer comparison: 13.9ms down to 5.9ms, which is better than a 50% improvement in query times. As the number of rows grows, I would expect the LIKE query to take longer whilst the tsearch2 query should stay relatively stable.
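The size of the improvement can be checked with a few lines of arithmetic, using only the timings quoted in this post. The helper name is made up for the example, and the repeat-run figures are the approximate ones given in the text.

```python
def reduction_percent(before_ms, after_ms):
    """Percentage of the original query time that was saved."""
    return (before_ms - after_ms) / before_ms * 100.0


first_run = reduction_percent(13.892, 5.921)  # LIKE vs tsearch2, first run
repeat_run = reduction_percent(11.0, 0.9)     # approximate repeat-run timings

print("first run:  %.1f%% less time, %.1fx faster" % (first_run, 13.892 / 5.921))
print("repeat run: %.1f%% less time, %.1fx faster" % (repeat_run, 11.0 / 0.9))
# first run:  57.4% less time, 2.3x faster
# repeat run: 91.8% less time, 12.2x faster
```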


I made use of the following excellent resources whilst setting this up for myself and writing this blog post. They're strongly recommended as further reading:

  • Barry Pederson's blog post *PostgreSQL full text search with Django*, which assumes PostgreSQL 8.3 is being used, is very useful. The Django code in my blog post is from Barry's post.
  • The DevX article *Implementing Full Text Indexing with PostgreSQL* is a non-Django-specific article covering the installation of tsearch2, configuration and use in an SQL context.
  • The official tsearch2 documentation:
      • The tsearch-v2-intro document covers generalised use and explains how various parts of tsearch2 work under the covers.
      • The tsearch2 guide goes into more detail and covers configuration of items such as the fuzzy searching, weighting the rankings, and more. It's essential reading if you're not working in English or if you have complex models to index.

Django Site of the Week: ForecastWatch

Eric Floehr is the man behind ForecastWatch and ForecastAdvisor, two Django-powered weather websites that aggregate and analyse weather forecasts to compare their accuracy on an ongoing basis. This week, I spoke to Eric about the history behind his sites, how he handles massive data sets, and his conversion from Ruby on Rails to Django. You can read all about it over at the Django Site of the Week.

This week's SOTW was delayed by a day due to our Australia Day celebrations on the 26th of January. I figured if they can delay an international tennis match for fireworks, I could delay this as well :)

Django Site of the Week: ShutterCal

ShutterCal is a photo sharing site with a difference - it encourages contributors to upload one photo every day to help people reach a goal: whether that may be to improve their photography, watch themselves evolve over a year, or remember people and places they visit. ShutterCal started in 2007, and this week I spoke to developer Dan Ferrante to find out why he chose Django, some of the challenges he's faced, and how he uses Django and other open-source software to make his life easier. You can read more over at the Django Site of the Week.

Django SOTW Moving to Mondays

To better fit with both my own way of doing things and with general web browsing patterns of the Django SOTW public, I'm moving the SOTW to Mondays rather than Saturdays.

You can expect new interviews to go live on Monday evenings (Australian EST) which means our US visitors should see them in their RSS feeds first thing Monday morning.

Also, I've finally succumbed to the forces and joined Twitter, so if that's your kinda thing then come and say hi!

