How to build simple caching with Redis

What is caching with Redis about?

Before going in-depth and explaining how we’ve built simple caching with Redis, let me explain what caching is about, for those of you who haven’t landed on this post for geeky reasons.

Let’s say you’ve just bought the latest Arctic Monkeys LP. You come back from the record store, put it on your turntable, spin it, and then put it back on your record shelves. A few albums later, you decide to listen to it one more time. Bummer: you need to get up from the couch, go back to the shelf, and dig through your collection to find it and play it again. Then bring it back. Alternatively, you can keep a small record stack beside your turntable with the most recently played records (for a week max), saving time on each play. That’s basically what caching is about.


In software development, that’s a common practice, and there are many ways to handle it. Until recently, we were using memcached to cache queries over the musical knowledge graph powering seevl.fm. Yet, after a while, accessing it took longer than executing the queries. Instead of tweaking the config, we decided to implement simple caching with Redis, especially as we were already using it for query indexing, and others have reported great performance using Redis as a cache.

Using Redis as a simple cache system

I’ve already written about why we love redis as a NoSQL store, as does Instagram. As a simple key-value store, it is natively well suited to cache and retrieve query results with SET and GET, using an MD5 (or any other hash) of the query as a key.

redis 127.0.0.1:6379> SET query_id query_result_string
OK
redis 127.0.0.1:6379> GET query_id
"query_result_string"

Obviously, for proper caching, an expiration date is needed, and that’s where Redis’ EXPIRE command is useful.

redis 127.0.0.1:6379> SET query_id query_result_string
OK
redis 127.0.0.1:6379> EXPIRE query_id 10
(integer) 1
redis 127.0.0.1:6379> GET query_id
"query_result_string"
(... wait 10 seconds ...)
redis 127.0.0.1:6379> GET query_id
(nil)

Those three commands are almost all you need to build a simple cache system with Redis. Nice, isn’t it?


Redis cache: the Python way

Since our back-end is built with Python, we’ve implemented the following RedisCache using redis.py.
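In outline, the class looks roughly like this (a minimal sketch: the cache| prefix, the default timeout, and the cPickle serialization follow the description below, while the exact signatures and values are assumptions):

import cPickle

class RedisCache(object):

    PREFIX = 'cache|'        # differentiates cache entries from other key-values
    DEFAULT_TIMEOUT = 3600   # assumed default expiration, in seconds

    def __init__(self, redis):
        self._redis = redis  # a StrictRedis instance

    def _key(self, key):
        return self.PREFIX + key

    def set(self, key, value, timeout=DEFAULT_TIMEOUT):
        key = self._key(key)
        # SET and EXPIRE are queued and run as a single MULTI/EXEC transaction
        self._redis.pipeline().set(key, cPickle.dumps(value)).expire(key, timeout).execute()

    def get(self, key):
        value = self._redis.get(self._key(key))
        return cPickle.loads(value) if value is not None else None

    def flush(self, prefix=''):
        # Remove every cached value whose key starts with the given prefix
        keys = self._redis.keys(self._key(prefix) + '*')
        if keys:
            self._redis.delete(*keys)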

The set and get methods are self-explanatory, and flush takes an additional prefix parameter, as explained below, on top of the default cache| one used to differentiate cache values from other key-values in the database.

But the most important part is the use of redis transactions (MULTI/EXEC) through redis.py’s pipeline:

  • First, no other operation can happen between setting the key-value pair and defining its expiration timeout;
  • Then, as the pipeline is executed atomically, the key and its expiration timeout are set together, or not at all if the transaction fails.
self._redis.pipeline().set(key, cPickle.dumps(value)).expire(key, timeout).execute()

Caching and flushing SPARQL queries results

To apply this to the SPARQL queries that power our platform, we generate an MD5 for every query string, and check / set the cache before calling our Virtuoso graph database. The full query result is stored as a value, and different timeouts are assigned depending on the kind of query: things like the rdf:type of an entity change only in exceptional cases, while social data (e.g. taste-based recommendations) are more dynamic.
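As a sketch of that check-then-set flow (the run_sparql helper and the exact timeout value are assumptions; only the MD5-keyed cache logic follows the description above):

import hashlib

def cached_sparql(query, cache, run_sparql, timeout=3600):
    # Key the cache on an MD5 hash of the query string
    key = hashlib.md5(query).hexdigest()
    result = cache.get(key)
    if result is None:
        # Cache miss: run the query against the graph database and store the result
        result = run_sparql(query)
        cache.set(key, result, timeout)
    return result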


seevl’s personal library and related recommendations

On the other hand, most of an entity’s properties (like the record label of an artist) might change when we extract data from Web sources in our ETL pipeline. To make the cache, and its flushing part, more efficient, the entity ID is used as an additional prefix. For example, the query below retrieves all of Blur’s genres (where <blur_entity_uri> stands for the URI identifying Blur in our graph).

SELECT DISTINCT ?object ?prefLabel WHERE {
  <blur_entity_uri> mo:genre ?object .
  OPTIONAL { ?object skos:prefLabel ?prefLabel . }
}

Instead of just using an MD5 for the key (cache|52f606b6c59793478754f466d4aaa3eb), we use the following:

cache|entity:9Uy2yWYs-sparql-52f606b6c59793478754f466d4aaa3eb
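For illustration, such a key could be assembled along these lines (the helper name and exact separators are assumptions; the cache| prefix itself is added by the cache, as described above):

import hashlib

def entity_query_key(entity_id, query):
    # e.g. 'entity:9Uy2yWYs-sparql-52f606b6c59793478754f466d4aaa3eb'
    return 'entity:%s-sparql-%s' % (entity_id, hashlib.md5(query).hexdigest())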

It’s then super easy to purge the cache only for queries related to this entity, in case they haven’t been removed already via a timeout expiration.

from redis import StrictRedis
from rediscache import RedisCache

redis = StrictRedis(host='localhost')
RedisCache(redis).flush('entity:9Uy2yWYs-sparql')

As per our flush method definition, the Redis DEL command is applied only to keys matching the cache|entity:9Uy2yWYs-sparql* pattern, effectively removing the cached values of all Blur-related queries (i.e. where Blur is a subject in the graph). That way, our cache is efficiently purged each time we crawl new data about Blur, without impacting other key-values.

Caching in action

Implementing a cache is standard for Web applications, and this is what helps us deliver a faster experience on seevl.fm. So, even though some queries may take a while to compute (imagine a huge record collection where you need a minute to find your favorite LP), we generally deliver results (such as artist fact-sheets) promptly.

If you want to see it in action, go to seevl.fm, browse artist pages, and enjoy hours of free music discovery!
And if you’re using redis for similar use-cases, let us know your experience in the comments below.



Co-founder of seevl. Love music, love data. Follow me on Twitter @terraces.

Posted in Engineering, Products
  • Phil

    My boss just implemented a redis cache to store some application specific metadata that was slowing down our site significantly. Glad to see someone else doing something similar with redis.py. Loved the writeup btw.

    • http://seevl.net/ Alexandre Passant

      Thanks – can you share your use-case? Do you have any tech blog about it?


  • Sunny

    Interesting article! Can you reveal some stats on the scale of your app, like the number of users?

    • http://seevl.net/ Alexandre Passant

      So far, we have about 5K signed-up users and a total of 80M data points, most of them music-related (e.g. artist labels or genres), which are stored in Redis for caching as explained in this post.

      Glad you enjoyed the post!

  • Roman

    You can use http://redis.io/commands/setex instead of the pipeline

    • http://seevl.net/ Alexandre Passant

      Thanks – I didn’t know this one, that makes things simpler indeed! I’ll update the post accordingly