Consider Sphinx for your search needs

Intro

For most people building a website, search is not really a consideration, in the sense that the search functionality will simply query the database backend for the results it needs. That is what everyone does, right?

Certainly for most use cases, the power of modern database backends means that dedicated search software is not, and never will be, a requirement.

Everyone builds software intending for it to become popular, but unless your software is going to need to run complex queries against millions of rows, considerations like this are not necessary. That said, if you are unsure of the potential of your product, it may be worth considering this now because, as with all things, it will be significantly more difficult to integrate into a legacy project down the line.

What is Sphinx?

Sphinx is an open source full-text search server. It is extremely powerful, easy to set up, and has a well-documented, well-architected PHP API for you to use.

It is used by Craigslist as well as many other organizations, small and large.

Use case

There are many use cases for software such as Sphinx. The most apparent one, in my opinion, is the one I have used Sphinx to solve: querying large, complex data sets.

If, for example, you have a denormalized database architecture (for good reasons) and you need to build search functionality that queries many tables across millions of rows, Sphinx may well be a suitable answer. The only respect in which your architecture is lacking is its ability to be searched. What can you do?

An extremely complex MySQL query, for example, might take seconds or even tens of seconds. If you want to provide a good user experience, you cannot keep your user waiting that long.
Instead, you can index all of this data on a Sphinx server (running independently of your web and database servers) and query it quickly using the provided API.
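
To make that a little more concrete, here is a minimal sketch in Python rather than the PHP API mentioned above. It assumes searchd has been configured to accept SphinxQL over the MySQL protocol (conventionally port 9306) and that an index exists with the hypothetical name listings_idx. The helper names build_search and search_ids are mine, not part of Sphinx.

```python
def build_search(index, terms, limit=20, offset=0):
    """Return (sql, params) for a SphinxQL full-text query.

    The index name cannot be sent as a bound parameter, so it is
    validated here before being placed into the statement.
    """
    if not index.replace("_", "").isalnum():
        raise ValueError("index name must be letters, digits or underscores")
    sql = f"SELECT id FROM {index} WHERE MATCH(%s) LIMIT %s, %s"
    return sql, (terms, int(offset), int(limit))


def search_ids(conn, index, terms, limit=20, offset=0):
    """Run the query on a DB-API connection and return matching document ids.

    Assumption: conn comes from a MySQL client library pointed at searchd,
    e.g. pymysql.connect(host="127.0.0.1", port=9306).
    """
    sql, params = build_search(index, terms, limit, offset)
    with conn.cursor() as cur:
        cur.execute(sql, params)
        return [row[0] for row in cur.fetchall()]


# "listings_idx" is a hypothetical index name used for illustration.
sql, params = build_search("listings_idx", "red bicycle", limit=10)
```

Sphinx returns the matching document ids, so a typical follow-up is a cheap primary-key lookup against your own database to load the full rows for display.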

I implemented this such that 4 million rows could be queried in a negligible amount of time, where negligible means milliseconds.

Issues

The most apparent issue with indexing your data and searching the index is that the indexed data quickly becomes outdated. Fortunately, Sphinx supports delta indexes, which index only the data that has changed. You can, for example, run a full initial index and then run delta indexes every 15 minutes. You can adjust this based on your requirements: if your data changes regularly, you may want to run delta indexes more often.
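
The idea behind a delta index can be sketched in a few lines of Python. This is only an illustration of the row selection, not how Sphinx itself does it: in a real setup the equivalent query lives in the source definition in sphinx.conf and is run by the indexer tool. The documents table, the updated_at column and the rows_for_delta helper are all assumptions made for the example, and SQLite stands in for the real database.

```python
import sqlite3


def rows_for_delta(db, since):
    """Return rows modified after the last full index run.

    `since` is the watermark recorded when the main index was built.
    """
    cur = db.execute(
        "SELECT id, title FROM documents WHERE updated_at > ? ORDER BY id",
        (since,),
    )
    return cur.fetchall()


# In-memory stand-in for the real database (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER)"
)
db.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        (1, "old listing", 100),     # already covered by the main index
        (2, "edited listing", 250),  # changed since the full index
        (3, "new listing", 300),     # added since the full index
    ],
)

last_full_index = 200  # watermark stored at the end of the full index run
delta_rows = rows_for_delta(db, last_full_index)
```

Only rows 2 and 3 are picked up, so the delta run stays small no matter how large the main index is. Searches then need to consult both the main and the delta index until the next full rebuild.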

Conclusion

The above is an abstract look at Sphinx search based on my own implementation of it. To get down to the nitty-gritty, I suggest you take a look at the latest documentation. I highly recommend the product. It is well documented, well supported, and actively developed.


Thomas Clowes

I am a 28 year old software engineer from the United Kingdom. During the day I build multi platform applications. In my spare time I eat food and run marathons. Sometimes I write angry tweets.