Productive Rage

Dan's techie ramblings

The Full Text Indexer Post Round-up

This is a compilation of links to articles outlining some of the details of the Full Text Indexer project I put together, just so I could point a link to everything all in one place (like from the BitBucket ReadMe!)

I wrote about the basic building blocks of the Index Generator, went off on a few tangents about how using different key types could allow for searches over data with multi-lingual content (or support Product data that has different descriptions for different web sites, for example) and then came back round to illustrate how I've used the code for this blog's search functionality.

Along the journey, I got to learn a few new things, take advantage of other's research and have fun trying to improve the performance of some of the bottlenecks in the index generation process.

I also had a chance to revisit the basic immutable list structure that I used from the get-go in this project and improve its performance characteristics as well (again, taking a lot of inspiration from cleverer people who've tackled the same problems before me! :)

The code can be found in the Full Text Indexer BitBucket Repository. I've still got a few ideas I'm contemplating toying with - but I've also got other projects I want to investigate! So we'll just have to see what happens with this next..

Update (5th March 2013): I just can't seem to let this lie! :) I've added another post The Full Text Indexer - Automating Index Generation which demonstrates some new code that will examine your source data type and generate an index for you, all on its own! Easy! (Added to the list above).

Update (14th March 2013): And another! This time about support for structured querying, a way to combine terms with AND, OR, NOT operators. See The Full Text Indexer - Structured Queries. (Added to the list above).

Update (28th March 2013): Documenting an extension to the index data that allow for more performant consecutive term matching: The Full Text Indexer: Source Locations. Followed by a way to utilise this information for Search Term Highlighting with Source Locations. (Added to the list above).

Update (25th July 2013): Inspired by the "The 10 Megabyte Manifesto" and NeoCities, I've developed a way to consume search index data with JavaScript to enable a copy of this blog to be hosted where the searching is done entirely client-side. Read about it at The Full Text Indexer goes client-side! and see it in action live at productiverage.neocities.org! (Added to the list above).

Update (30th July 2013): A follow-up to the "The Full Text Indexer goes client-side" describing how the search index data can be compressed to take up less space on the host: JavaScript Compression (Putting my JSON Search Indexes on a diet). (Added to the list above).

Posted at 18:06