Just came across this nice-looking project: RediSearch. It adds a full-text search engine on top of Redis, capable of incremental indexing, fuzzy searching (for autocomplete) as well as numeric range and geo searching. It has a bunch of client libraries ready to go in various languages too. Looks to be an interesting alternative to heavier-weight frameworks like Elasticsearch or Solr.

The JRediSearch library is still in its early stages and not on Maven Central. It’s also a bit ropey to be honest; for example, the pom creates an assembly that includes junit (!) but leaves out a library that is required at runtime. So you have to clone the repo and install it, then install the Jedis snapshot that ships with it (which isn’t on Maven Central yet either) into your local Maven repo so that the JRediSearch library will work at runtime.

git clone https://github.com/RedisLabs/JRediSearch.git
cd JRediSearch
mvn install
mvn install:install-file -DgroupId=redis-clients -DartifactId=jedis -Dversion=3.0.0 -Dfile=lib/jedis-3.0.0.jar -Dpackaging=jar

Once you’re there, you can run up the RediSearch Docker image to have a play.

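To spin up the Docker container and be able to access it locally, you can simply attach it to your local host’s network (or, if you prefer, faff around with Redis config files; but for a fiddle it’s much easier to attach to the host). In host network mode the container’s ports are exposed directly on the host, so no -p port mapping is needed (Docker discards published ports in this mode anyway):

docker run --net=host redislabs/redisearch:latest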

You’ll see it all start up and you can check you can connect by running redis-cli (make sure you stop any local redis servers first!).

I grabbed a bunch of data from the Reuters Dataset and threw together a regex parser (ugh, SGML - but I’ll forgive it as it’s a 20 year old project). I parsed out the ID, Title and Body of 925 articles from the first file in the dataset (reut2_000.sgm).
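For a rough idea of what such a regex parser can look like, here is a minimal sketch. It is not necessarily what the linked code does: it assumes each article sits inside a <REUTERS ... NEWID="..."> element with <TITLE> and <BODY> children, uses NEWID as the ID, and skips articles missing either field. The Article class and parseArticles function are illustrative names, and the sample text is made up.

```kotlin
// Hypothetical model; field names match how the post uses them (id, title, body).
data class Article(val id: String, val title: String, val body: String)

// DOT_MATCHES_ALL lets ".*?" span the line breaks inside an article.
private val articleRegex =
    Regex("""<REUTERS[^>]*\bNEWID="(\d+)"[^>]*>(.*?)</REUTERS>""", RegexOption.DOT_MATCHES_ALL)
private val titleRegex = Regex("""<TITLE>(.*?)</TITLE>""", RegexOption.DOT_MATCHES_ALL)
private val bodyRegex = Regex("""<BODY>(.*?)</BODY>""", RegexOption.DOT_MATCHES_ALL)

// Extracts every article that has both a title and a body.
fun parseArticles(sgml: String): List<Article> =
    articleRegex.findAll(sgml).mapNotNull { match ->
        val id = match.groupValues[1]
        val inner = match.groupValues[2]
        val title = titleRegex.find(inner)?.groupValues?.get(1)?.trim()
        val body = bodyRegex.find(inner)?.groupValues?.get(1)?.trim()
        if (title != null && body != null) Article(id, title, body) else null
    }.toList()

fun main() {
    // Made-up sample in the assumed layout; the second article has no body.
    val sample = """
        <REUTERS TOPICS="YES" NEWID="1">
        <TEXT><TITLE>SAMPLE TITLE</TITLE>
        <BODY>Sample body text.</BODY></TEXT>
        </REUTERS>
        <REUTERS TOPICS="NO" NEWID="2">
        <TEXT><TITLE>TITLE ONLY</TITLE></TEXT>
        </REUTERS>
    """.trimIndent()
    parseArticles(sample).forEach { println(it) }
}
```

Skipping articles without a body is one plausible reason a file would yield fewer parsed articles than it contains, but that is a guess rather than something checked against the data.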

I then threw together this Kotlin class that delegates to the JRediSearch library:

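import io.redisearch.Query
import io.redisearch.Schema
import io.redisearch.SearchResult
import io.redisearch.client.Client

// Article (id, title, body) is the parsed Reuters article, defined elsewhere.
class JRediSearchKotlin(val articles: List<Article>) {
    // Creates the index, then adds every article to it; returns the number indexed.
    fun index(): Int {
        val client = getClient()

        // Title matches are weighted twice as heavily as body matches.
        val sc = Schema().addTextField("title", 2.0)
                         .addTextField("body", 1.0)

        // Throws if the index already exists.
        client.createIndex(sc, Client.IndexOptions.Default())

        articles.forEach { a ->
            client.addDocument(a.id, mapOf("title" to a.title, "body" to a.body))
        }

        return articles.size
    }

    // Returns at most the first five hits.
    fun query(query: String): SearchResult? = getClient().search(Query(query).limit(0, 5))

    fun drop() = getClient().dropIndex()

    // Client for the "reuters" index on the local Redis instance.
    private fun getClient() = Client("reuters", "localhost", 6379)
}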

It’s not doing a great deal, but you can see the index() method creates the index and then adds each document to it. As creating the index throws an exception if it already exists, this would need to be altered if we were going to do incremental indexing. The query() and drop() methods are one-liners that delegate to the library.
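One way to make index() safe to call repeatedly is to treat “index already exists” as a normal outcome rather than a failure. The sketch below shows the idea against a stand-in client so that it runs without Redis. IndexClient, FakeIndexClient, IndexAlreadyExistsException and ensureIndex are all made-up names; which exception JRediSearch actually raises in this case is something to check before adapting it.

```kotlin
// Hypothetical stand-in for the search client, so the sketch runs without Redis.
interface IndexClient {
    fun createIndex() // assumed to throw if the index already exists
    fun addDocument(id: String, fields: Map<String, String>)
}

// Made-up exception type; the real library's exception would go here instead.
class IndexAlreadyExistsException(message: String) : RuntimeException(message)

// In-memory fake that mimics the "throws if the index already exists" behaviour.
class FakeIndexClient : IndexClient {
    var created = false
        private set
    val docs = mutableMapOf<String, Map<String, String>>()

    override fun createIndex() {
        if (created) throw IndexAlreadyExistsException("Index already exists")
        created = true
    }

    override fun addDocument(id: String, fields: Map<String, String>) {
        check(created) { "index has not been created" }
        docs[id] = fields
    }
}

// Returns true if this call created the index, false if it was already there.
fun ensureIndex(client: IndexClient): Boolean =
    try {
        client.createIndex()
        true
    } catch (e: IndexAlreadyExistsException) {
        false // nothing to do, carry on and add documents
    }

fun main() {
    val client = FakeIndexClient()
    println(ensureIndex(client)) // first run creates the index
    client.addDocument("1", mapOf("title" to "first"))
    println(ensureIndex(client)) // second run reuses it instead of throwing
    client.addDocument("2", mapOf("title" to "second"))
    println(client.docs.keys)
}
```

Catching one specific exception type, rather than everything, keeps genuine failures (such as the server being down) visible.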

I tried timing the results to see how it does; it’s not particularly scientific at the moment - I just wanted to see what sort of time it was taking. We can use the funky Kotlin measureTimeMillis function to do just that:

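// Needs: import kotlin.system.measureTimeMillis
// jrs is an instance of the JRediSearchKotlin class above.
var searchResult: SearchResult? = null
val time = measureTimeMillis { searchResult = jrs.query("bilateral") }
println(searchResult!!.docs)
println("Took $time milliseconds")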

The term bilateral only appears in one document, so getting a query time of 255 milliseconds doesn’t seem especially great. Admittedly, my old Dell laptop is starting to struggle these days (even the rubber seal around the screen is trying to take a vacation from the rest of the machine) and this is by no means scientific; there are undoubtedly JVM warm-up issues, and the timing also includes creating a new client and making a network round trip to the Redis server. As it’s getting late, I’ve not had a chance to try adding more data, but it would be interesting to check how it scales as we add more from the Reuters dataset.
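To take some of the JVM warm-up noise out of a number like that, one option is to run the query a few times untimed and then look at the spread over several timed runs. This is only a rough sketch and no substitute for a proper benchmarking harness; the benchmark and median helpers are made-up names, and the workload in main() is a stand-in for the real query.

```kotlin
import kotlin.system.measureNanoTime

// Runs `block` untimed `warmups` times, then times it `runs` times.
// Returns the timed durations in milliseconds, sorted ascending.
fun <T> benchmark(warmups: Int = 5, runs: Int = 20, block: () -> T): List<Double> {
    repeat(warmups) { block() }
    return List(runs) { measureNanoTime { block() } / 1_000_000.0 }.sorted()
}

// Median of an already sorted, non-empty list.
fun median(sorted: List<Double>): Double {
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2
}

fun main() {
    // Stand-in workload; in this post it would be jrs.query("bilateral").
    val timings = benchmark { (1..100_000).map { it * 2 }.sum() }
    println("min=${timings.first()} ms, median=${median(timings)} ms, max=${timings.last()} ms")
}
```

Reporting the median alongside the minimum and maximum gives a better feel for typical latency than a single cold measurement.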

The Redis module says it also produces fuzzy search results suitable for autocompletion and is able to provide range filtering on numerical and geographical datasets. Well, I didn’t get around to trying that yet but at least I have the scaffolding to do so now. I think one of the interesting parts is also the incremental building of the index. There is a benchmark available on GitHub.

Feel free to check out this simple Kotlin code on GitLab. What do you think of this Redis module?