Grape is a realtime processing pipeline.
It has two major parts: a persistent queue and processing workers.
Rather than following in Storm’s footsteps, we dramatically changed Grape’s logic.
The main goals are data availability and data persistence. We created Grape for those who cannot afford to lose data.
The persistent queue uses a simple push/pop/ack API to store, retrieve, and acknowledge chunks of data stored in Elliptics. An object may live in the queue forever, until it is processed by the workers.
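To make the push/pop/ack cycle concrete, here is a minimal in-memory sketch in Python. It is not Grape's actual API: the class ToyAckQueue, its method signatures, and the requeue_unacked helper are hypothetical names chosen for illustration, and a plain dictionary stands in for Elliptics storage. The only property it tries to show is that a popped entry is not discarded until it is acknowledged.

```python
from collections import OrderedDict


class ToyAckQueue:
    """Hypothetical in-memory stand-in for a persistent push/pop/ack queue."""

    def __init__(self):
        self._next_id = 0
        self._pending = OrderedDict()  # pushed, not yet handed to a worker
        self._in_flight = {}           # handed to a worker, not yet acked

    def push(self, data):
        """Store a chunk and return its entry id."""
        entry_id = self._next_id
        self._next_id += 1
        self._pending[entry_id] = data
        return entry_id

    def pop(self):
        """Hand out the oldest pending chunk, or None if nothing is pending.

        The chunk stays in the queue (as in-flight) until ack() is called.
        """
        if not self._pending:
            return None
        entry_id, data = self._pending.popitem(last=False)
        self._in_flight[entry_id] = data
        return entry_id, data

    def ack(self, entry_id):
        """Confirm processing; only now is the chunk actually dropped."""
        del self._in_flight[entry_id]

    def requeue_unacked(self):
        """Assumed recovery step: put unacked chunks back at the front."""
        for entry_id in sorted(self._in_flight, reverse=True):
            self._pending[entry_id] = self._in_flight.pop(entry_id)
            self._pending.move_to_end(entry_id, last=False)

    def __len__(self):
        return len(self._pending) + len(self._in_flight)


queue = ToyAckQueue()
queue.push(b"event-1")
queue.push(b"event-2")

entry_id, data = queue.pop()   # a worker takes event-1 but crashes before ack
queue.requeue_unacked()        # the chunk is still there and is offered again
entry_id, data = queue.pop()
queue.ack(entry_id)            # processed successfully, now it can go away
```

In this sketch a worker that dies between pop and ack costs only a repeated delivery, not the data itself. How Grape itself detects and retries unacknowledged chunks is not described here, so the requeue step above is an assumption.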
Unlike Kafka, Grape does not lose your data if a ‘data-file’ has not been read for a long time or if its size overflows under constant write load.
Our queue can keep growing in distributed storage as long as the storage has space (which is usually considered unlimited), and processing workers can be run with a pull design rather than a push one.
Push messaging systems imply that the whole processing pipeline has to work at the same speed as the pushing process. If there are load spikes that the processing workers cannot handle, data will likely be lost. Any pipeline modification (like resharding Kafka topics) ends up stopping processing.
Pull systems do not suffer from this ‘must-have-the-same-performance’ issue: one may start new worker nodes on demand, and even if they cannot handle the current write spike, the data will be stored in the distributed persistent queue and processed later, once the workers catch up.
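The catch-up behaviour can be illustrated with a small simulation in Python. This is a toy model, not Grape code: the function simulate_pull, the per-tick arrival counts, and the fixed worker capacity are all assumptions made for the example, and an in-memory deque stands in for the distributed queue, which is assumed to be unbounded.

```python
from collections import deque


def simulate_pull(arrivals, worker_capacity):
    """Simulate workers pulling from an unbounded backlog.

    arrivals[t] is the number of chunks written during tick t;
    worker_capacity is how many chunks the workers can pull per tick.
    Returns the backlog size after each tick and the total processed.
    """
    backlog = deque()
    backlog_sizes = []
    processed = 0
    for tick, count in enumerate(arrivals):
        # Writers push at their own pace; nothing is rejected.
        backlog.extend((tick, i) for i in range(count))
        # Workers pull only as much as they can handle this tick.
        for _ in range(min(worker_capacity, len(backlog))):
            backlog.popleft()
            processed += 1
        backlog_sizes.append(len(backlog))
    return backlog_sizes, processed


# A write spike at tick 2 exceeds what the workers can do per tick.
arrivals = [2, 2, 10, 2, 0, 0, 0]
sizes, processed = simulate_pull(arrivals, worker_capacity=4)

print(sizes)      # [0, 0, 6, 4, 0, 0, 0]
print(processed)  # 16
```

The backlog grows during the spike and drains over the following ticks, and the number of processed chunks equals the number written. Starting more workers corresponds to raising worker_capacity, which only shortens the time needed to catch up.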