Secondary indexes and write-cas in Elliptics

 

Write-cas stands for write-compare-and-swap, which means server performs your write only if your cookie matches what it has. We use data checksum as a cookie, but that can be changed if needed.

CAS semantic was introduced to heal a problem when multiple clients update the same key. In some cases you might want to update a complex structure and not to loose data made by others.

Since we do not have distributed locks, client performs read-modify-write loop without any lock being held, which ends up with the race against other clients. CAS allows to detect that data was changed after we read it last time and our modification will overwrite those chages.

In this case client gets error and performs read-modify-write loop again. There are nice function to use write-cas: with functor which just modifies data (read and write is hidden) and the one, and low-level method, which requires checksum and new data – if it fails, it is up to the client to read data again, apply his changes and write it again.

The first write-cas client is elliptics’ secondary indexes. You may attach ‘tags’ or ‘indexes’ with data to any key you want. This means that after secondary indexes update has completed, you will find your key under the ‘tag’ or ‘index’.

For example, tag can be a daily timestamp – in this case you will get list of all keys ‘changed’ at given day.

Of course you can find all keys which match multiple indexes – this is kind of ‘AND’ logical operation. Our API returns all keys found under/in all requested tags/indexes.

All indexes (as well as any keys) may live in different namespaces.

So far secondary indexes are implemented on pure write-cas semantic. And our rather simple realtime search engine <a href=”http://reverbrain.com/wookie” title=”Wookie – realtime search engine” target=”_blank”>Wookie</a> uses them to update its reverse indexes.
And that doesn’t scale :)

We found that having multiple clients competing to update the same popular index ends up with too many read-modify-write cycles, that the whole system becomes network bound. And we want to be CPU bould instead.

To fix this issue we will change secondary indexes to be invoked from <a href=”http://doc.reverbrain.com/elliptics:serverside” title=”Elliptics server-side engine” target=”_blank”>server-side</a> engine. All server-side operations are atomic by default, so this will eliminate races between multiple clients.

Stay tuned – I will write more about Wookie soon.
After all, we want to implement a baby (really-really baby) <a href=”http://www-03.ibm.com/innovation/us/watson/” title=”IBM Watson” target=”_blank”>Watson</a> with it :)

Comments are closed.