Setting up a client to access the Elliptics WebDAV server

We have installed a simple WebDAV server with an Elliptics backend for you to test: you can connect using your favorite client and see how the POSIX filesystem interface works.

For Linux we recommend wdfs and definitely not davfs2; the latter is very racy and basically doesn't work. On Windows, enabling network drive support with authentication requires installing a third-party WebDAV client (a wrapper, actually), since the default Windows WebDAV client doesn't support authentication; we recommend NetDrive.

Here is a Linux setup:

sudo wdfs http://webdav.reverbrain.com:8021/webdav /mnt/webdav/ -o uid=1000,gid=1001,umask=0022,username=test,password=test,allow_other

Here uid/gid should correspond to your local user; the username/password is test/test.
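
If you just want a quick smoke test without mounting anything, WebDAV is plain HTTP underneath, so a few lines of Python work too (the file name below is made up for illustration):

import requests

url = "http://webdav.reverbrain.com:8021/webdav/hello.txt"  # hypothetical test file name
auth = ("test", "test")

# Upload a small file via WebDAV PUT.
r = requests.put(url, data=b"hello from webdav", auth=auth)
r.raise_for_status()

# Read it back via plain GET.
r = requests.get(url, auth=auth)
print(r.status_code, r.content)

# Clean up.
requests.delete(url, auth=auth)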

Please note that this node can be (and will be) destroyed or reset at any moment.

POSIX filesystem interface for Elliptics distributed storage

We at Reverbrain create storages, from single-node search engines to multi-petabyte clouds spanning hundreds of servers. All of them are accessed mostly via an HTTP API, and sometimes through programming interfaces for Python, Go and C/C++.

But there is a huge number of clients who do not need the complexity of the HTTP API; instead they want a locally connected storage directory with virtually unlimited free space. They will store and load data using existing applications which operate on files. It is very convenient for a user to mount remote storage into a local directory and run, for example, a backup application which copies local files into this directory; the data ends up in several physically distributed copies spread around the world, without any additional steps from the user.

Our first incarnation of a POSIX interface for a simpler distributed storage was a network block device: it contained a connection layer and, depending on the block offset, selected a server to work with. That was a rather ugly way to work, since a block device doesn't know which object is being stored or read, or what locking should be performed. Locking was always either too coarse or too fine, and it ended up performing a lot of operations for a simple block transfer. It became obvious that locking has to be performed at a higher layer, namely in the filesystem. This distributed block device was named DST and it lived in the Linux kernel for a couple of years.

The second approach was to implement a filesystem. We created POHMELFS – Parallel Optimized Host Message Exchange Layered FileSystem – whose first commits were imported at the very beginning of January 2008. Development was not very active, and it became clear that the existing Linux VM stack doesn't really scale to billions of objects; we did not have enough resources to change that – a huge task both technically and politically. We implemented several features which were later found in Parallel NFS and Ceph. POHMELFS lived in the Linux kernel for several years.

We removed both projects from the Linux kernel back then and concentrated on the Elliptics distributed storage. And now it's time to resurrect the POSIX filesystem interface.

But we decided to go another way. A native filesystem interface is fast, but you have to implement it for every OS, and that requires a lot of resources which would be wasted supporting different versions for different OSes. Do you know the inode allocation differences between Windows 8 and 10?

We found that our clients do not need this; instead they want a network-attached directory, which works pretty well over the WebDAV protocol. Well, not exactly, since Windows clients do not support authenticated WebDAV and an application like NetDrive has to be installed, but surprisingly it turned out to be an almost standard application for NAS/SAN.

We implemented a WebDAV server which supports HTTP authentication and connects to the Elliptics storage. There are limitations both in the WebDAV protocol and in our server; in particular, we do not transfer locks among servers, i.e. if a client connected to the storage via one gateway and then reconnected via another, the two gateways will not see each other's locks. But that should not be a problem, since WebDAV prohibits parallel updates of any object.

We can create many private folders, one for each user, and it is even possible to add features on top of user files, like indexing for search, version control and so on – but that's a different story.

The main advantage is that this distributed storage is cheap per gigabyte. You can add commodity servers into an Elliptics cluster and this will just increase the size of the storage without interruption – the system scales linearly to ~4Tb/day of writes in our setups, and to 200+Tb/day at Yandex, for example. You can also put replicas of your data into different datacenters – an inherent feature of Elliptics – and if the connection to one datacenter drops, the client will just work with the other replicas.

And all those features are now accessible via the usual filesystem interface. Accessing the data via HTTP or other APIs is still possible, though.

If you are interested, feel free to contact us at info@reverbrain.com

Adaptive MPEG-DASH streaming and arbitrary channel muxing

Ever wanted to compose your own content stream without reencoding video files? Like taking the video stream from this file and the audio from those files, then adding another video, and so on and so on?

We have created a simple service to highlight one of our technologies, which allows you to create an adaptive MPEG-DASH stream (the same kind of stream YouTube returns) and mix many streams from different files stored in Elliptics.

[stream1: embedded screenshot of the demo player]

In the example above, a NeuralNetwork course video is being played with some of my saxophone music as the audio track.

The DASH stream is created on demand from different sources, so if you want to add a new sound track or link multiple videos one after another, there is no need to reencode the video content each time. The service we created is rather simple and does not use every feature of the technology. In particular, playlist protection is not used, i.e. you can share a playlist with others, and by initializing a DASH player with our URL you will be able to play it on your own site. We also do not turn on stream insertion, i.e. playing the main video with some additional video stream (or part of it) put in at a given time offset. We have both of these features in the server, but there are no interface controls for them in the service yet.
As a side note, this service can be used for gapless audio/video playback, i.e. without the player reinitialization between tracks that you get on popular video streaming sites.

So far we only support MPEG-DASH streaming, and while almost all modern browsers support it (Chrome, Firefox, IE and Opera), Safari is out of the list. We do not yet fully support HLS, and although Apple announced at WWDC 2016 a move away from mpeg2ts streaming (probably towards something more compatible with DASH), we still plan to implement HLS streaming too.

So, if you are interested in playing with the technology, you are welcome at our playground service: http://stream.reverbrain.com
Do not be afraid of the mad design – just upload files, they will be tagged automatically, and create playlists!

Audio/Video transcoding server

I'm pleased to announce Nullx – our standalone audio/video/subtitles transcoding server. It accepts files to be transcoded via HTTP and either returns the transcoded files in the reply or optionally uploads the file (and metadata) into the Elliptics storage.

So far it is a quite simple service which does not change parameters of the streams like height/width or bitrate; instead it transcodes streams into the h264/aac/mov_text format, which is the one suitable for HLS adaptive streaming. Since we plan to add realtime downscaling of the data for adaptive streaming, this service will be extended and will receive per-request HTTP controls telling it how exactly a given stream should be transcoded; so far I believe only the h264 profile and height/width for video and bitrate for audio streams are needed. That will be our next release.
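
To make the flow concrete, here is a hedged sketch of what a transcoding request could look like from Python; the endpoint path below is an assumption for illustration only, the actual Nullx API may differ:

import requests

# Hypothetical endpoint – check the real Nullx API for the actual path.
url = "http://localhost:8080/transcode"

with open("input.mkv", "rb") as f:
    # POST the source file; here we assume the reply body carries the transcoded file.
    r = requests.post(url, data=f)
    r.raise_for_status()

with open("output.mp4", "wb") as out:
    out.write(r.content)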

Nullx – the transcoding server – is used in our broadcast service, which allows grouping audio/video streams uploaded into the Elliptics storage, muxing them together with different options (order, timings, splits at various time positions, sound from a different source and so on), and adaptively streaming the result using the MPEG-DASH/HLS protocols (natively supported by Chrome/IE/Firefox and Safari on desktop and mobile).

Server-side operations

Elliptics distributed storage is built as a client-server architecture, and although servers may discover each other, exchange various statistics and forward requests, most of the time they serve clients' requests.

Recovery in a distributed storage is a tricky operation which requires serious thinking about which keys have to be copied to which destinations. In Elliptics, recovery is just another client process which iterates over remote nodes, reads the data to be recovered and updates the needed keys.

But there are cases when this round trip through the client is useless, for example when you need to restore a missing replica, or when you have a set of keys you want to copy or move to a new destination.

Thus I introduced two server-side operations which allow sending content from one server to multiple replicas. They are intended for recovery tools, which can optimize by not copying data from the local node to a temporary recovery location; instead they may tell the remote node to send the data directly to the required locations. They can also be used to move data from one low-level backend (for example eblob) to a newer version or to a different backend without server interruption.

There is now a new iterator type which sends all iterated keys to a set of remote groups. It runs at the speed of the network or the disk (whichever is slower): in local tests, iterating over 200GB blobs and sending the data over the network to one remote node via write commands sustained ~78MB/s. There were spikes though, especially when the remote node synced its caches. Both sender and recipient had 30GB of RAM. Rsync shows ~32MB/s on these machines – not because it is that slow, but because ssh maxed out the CPU with packet encryption.
The iterator sends a dnet_iterator_response structure with the write result for every key it has processed, just like the usual iterator, so neither API nor ABI is broken.

The second server-send command accepts a vector of keys. It finds all remote nodes/backends hosting the given keys in the one specified group, splits the keys on a per-node/backend basis and tells the remote backends to send the appropriate keys to the specified remote groups. The same iterator response is generated for every key which has been processed.

All operations are asynchronous and run in the background, with other client requests being handled in parallel.

There are 3 operation modes (see the sketch after this list):
1. default – write data to the remote node using compare-and-swap, i.e. only write if the key either doesn't exist or holds the same data on the remote servers. Server-side sending (iterator or per-key) in this mode is especially useful for recovery – there is no way it can overwrite a newer copy with old data.
2. overwrite – when a special flag is set, data is overwritten (the compare-and-swap logic is cleared).
3. move – if the write has been successful, remove the local key.
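
To make the three modes concrete, here is a small illustrative Python sketch of the per-key logic; the dicts stand in for storage backends, and this is not the actual Elliptics implementation:

def server_send_key(key, data, remote, local, mode="default"):
    """Illustrative per-key logic for the three server-send modes."""
    if mode == "default":
        # Compare-and-swap: never replace a different (possibly newer) remote copy.
        if key in remote and remote[key] != data:
            return "skipped: remote copy differs"
        remote[key] = data
    elif mode == "overwrite":
        remote[key] = data          # unconditional write
    elif mode == "move":
        remote[key] = data          # write remotely...
        local.pop(key, None)        # ...then drop the local copy on success
    return "ok"

local = {"k1": b"old", "k2": b"new"}
remote = {"k2": b"newer"}
print(server_send_key("k1", local["k1"], remote, local, mode="move"))  # ok, k1 moved
print(server_send_key("k2", local["k2"], remote, local))               # skipped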

There is an example tool in the examples directory which iterates over a remote node and its backends and copies/moves the iterated keys. The next step is to update our Backrunner HTTP proxy to use this new logic to automatically recover all buckets in the background.

Stay tuned!

HLS and DASH adaptive streaming and muxing from the same files from Elliptics

Our streaming engine Nulla is now capable of streaming in both HLS and MPEG-DASH formats from the Elliptics storage. We do not require uploading multiple files (in mp4 and mp2ts container formats) or specially repacking frames in the media files (that's what mp4box is used for) for streaming.

The Elliptics streaming service Nulla creates the DASH/HLS stream in realtime from the data file you have provided, building either an mp4 or mp2ts container on demand based on the streaming offset the client requests. This means clients are not forced to upload files in multiple formats (mp2ts for HLS, mp4 for MPEG-DASH) or to repack existing files to meet streaming demands (fragmenting the stream and putting an index box in front of the data).

Since we build the stream from data frames and create the container in realtime, we can adjust presentation and decode timestamps and thus build an audio/video muxer. This allows, for example, streaming one file with another one put into the middle, or splitting a file into X chunks, each followed by a chunk of a different file: 5 seconds from file X, 10 seconds from Y, 15 seconds from X starting at a 5-second offset, then 10 seconds from file Z, while the audio track has its own muxing, and so on and so on.
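
The essence of such a muxer is rebasing source timestamps onto one continuous output timeline. A toy Python sketch of the idea (not Nulla's actual code):

def rebase_timestamps(chunks):
    """chunks: list of (frames, start, duration) pieces, times in ms.
    frames is a list of (pts, frame_data) with pts relative to the source file.
    Returns one frame list on a continuous output timeline."""
    timeline = []
    offset = 0
    for frames, start, duration in chunks:
        for pts, frame in frames:
            if start <= pts < start + duration:
                # Shift the source pts so this piece starts at the current offset.
                timeline.append((pts - start + offset, frame))
        offset += duration
    return timeline

# Two fake sources with a frame every 1000 ms.
x = [(t, "X@%d" % t) for t in range(0, 20000, 1000)]
y = [(t, "Y@%d" % t) for t in range(0, 20000, 1000)]
# 5s of X, then 5s of Y, then X again starting at its 5-second offset.
out = rebase_timestamps([(x, 0, 5000), (y, 0, 5000), (x, 5000, 5000)])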

All of this is controlled via a simple JSON API and guarded against embedding into hostile sites by random URLs with a limited lifetime.

We built a simple example page which shows this kind of muxing and streaming.
Depending on your browser, our servers will stream either HLS (desktop Safari and iOS) or MPEG-DASH (tested in current stable versions of Chrome, Firefox and IE) from 2 video and 2 audio files uploaded quite a while ago.

Enjoy!

http://video.reverbrain.com/index.html

The source code of this simple HTML page shows how simple and convenient our API is.

The next task is to build a cache for preprocessed chunks and distribute it among multiple geographically distributed Elliptics nodes in your cluster. We also plan to add automatic transcoding of the video stream into smaller bitrates which the browser will select automatically (that's why HLS/DASH are called adaptive streaming); currently you have to upload files in multiple bitrates and reference them in the API to create the appropriate playlist. This can be automated and will be implemented in the next release.

Another major goal is to implement live broadcasting from an application (or browser) into Elliptics, which will distribute your broadcast via HLS/DASH to thousands of simultaneous viewers.

Adaptive MPEG-DASH streaming and multiple stream muxing in Elliptics

We built the Elliptics distributed storage quite a long time ago; it is a mature technology which just works when you have to store medium and large objects in geographically distributed locations.

It also supports progressive download: FLV and byte-range (tutorial, mp4) streaming. But we wanted more – native adaptive streaming with channel muxing straight from Elliptics.

Basically, we wanted to upload multiple mp4 files into Elliptics and then stream them to the client in any required order: 10 seconds of the first file, then 20 seconds of the second, then 10 minutes of the first starting from its 10th second, while another track (like sound) plays in the background, mixed in its own way. And preferably with adaptive bitrate switching when the client moves between slower and faster networks, like 3g-to-wifi and vice versa.

Another example of this muxing feature is gapless music playing – you listen to songs one after another without delays in between, without player reinitialization, and without waiting for a song (or part of it) to download when the previous one has stopped.

There are many technologies which implement streaming: HLS, HDS, MPEG-DASH, RTMP, RTSP and others. It looks like HLS is the winner for now – it is backed by Apple and supported by iOS and Safari – but MPEG-DASH is backed by a large group of vendors and is supported by all other browsers, including TVs, except on iOS. Since Flash is going to die now that youtube.com, netflix.com and other large vendors have stopped streaming in that format, I believe MPEG-DASH will become more and more popular (its market share to date is rather small) and eventually HLS and MPEG-DASH will be the only default streaming protocols.

There is a fair number of streaming services which support both HLS and MPEG-DASH, but most of the time it takes quite a lot of effort to build a fault-tolerant streaming service on top of them that works with a distributed storage. And none of them supports the audio/video muxing described above. The streaming technology itself partially supports this feature – for example, MPEG-DASH has the notion of a "Period", and multiple periods are played one after another – but this is a quite advanced feature which is not yet supported by the reference player Dash.js (there are commercially available players which somewhat support it, though). There are open implementation questions about whether a player can reinitialize itself to play multiple periods without stream interruption.

We decided to implement MPEG-DASH in our Elliptics streaming service to support both adaptive streaming and stream muxing. To do this we create the whole mpeg container at runtime and only read sample data from the audio/video files stored in Elliptics. To allow muxing, though, all files in the stream must be encoded the same way.

Using our technology one can implement 5-second muxing (5 seconds of the first video, then 5 seconds of the second, then the next 5 seconds of the first and so on) as in the example below, using the following control JSON:

"tracks": [
{
  "bucket": "b1",
  "key": "video_1.mp4",
  "duration": 5000
},
{
  "bucket": "b1",
  "key": "video_2.mp4",
  "duration": 5000
},
{
  "bucket": "b1",
  "key": "video_1.mp4",
  "start": 5000,
  "duration": 5000
},
{
  "bucket": "b1",
  "key": "video_2.mp4",
  "start": 5000,
  "duration": 5000
}
]
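
For longer mixes this control JSON is easy to generate programmatically. A small Python sketch reproducing the example above (how the JSON is submitted to the service depends on its API):

import json

def alternating_playlist(keys, chunk_ms, chunks, bucket="b1"):
    """Alternate fixed-size chunks of the given files: chunk 0 of keys[0],
    chunk 0 of keys[1], chunk 1 of keys[0], chunk 1 of keys[1], ..."""
    tracks = []
    for i in range(chunks):
        tracks.append({
            "bucket": bucket,
            "key": keys[i % len(keys)],
            "start": (i // len(keys)) * chunk_ms,
            "duration": chunk_ms,
        })
    return {"tracks": tracks}

print(json.dumps(alternating_playlist(["video_1.mp4", "video_2.mp4"], 5000, 4), indent=2))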

But enough words, show me the result.

Here it is: muxing 2 video and 2 sound channels in the way described above, without interruptions or gaps. All 4 files are stored in the Elliptics storage as usual objects.

http://video.reverbrain.com/index.html

Please note that the Elliptics and streaming servers are located in the USA, which adds 150+ ms (from Russia) to fetching the first chunk, or any chunk you seek to that is not yet in the cache; otherwise it is very fast.

You can check the source of the html page above to see how the muxing is set up; you can play with different settings and watch the whole files or mix them in other ways.

Enjoy, and stay tuned!

Range headers and native HTML5 streaming support in Backrunner

Plain HTML5 streaming requires the server to properly handle the RFC 2616 Range header; in particular, video seek/rewind uses the Range header to specify the exact position within the file to start streaming from.

The HLS protocol is a bit different, but yet again the player uses the Range header to specify a position within the timeframe.

Previously the Elliptics HTTP server Backrunner used size/offset URI parameters for this purpose, which are not ajax-friendly and obviously are not supported by standard players.

With this new Backrunner update we add support for the Range and If-Modified-Since headers.
The former allows HTML5 players to work with the Elliptics HTTP proxy out of the box; If-Modified-Since is quite useful for client-side caching.
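
Both headers are standard HTTP, so they are easy to exercise by hand. A quick Python sketch against a hypothetical Backrunner object URL:

import requests

url = "http://example.com/get/bucket/video.mp4"  # hypothetical object URL

# Ask for the first megabyte only; a Range-aware server replies 206 Partial Content.
r = requests.get(url, headers={"Range": "bytes=0-1048575"})
print(r.status_code, r.headers.get("Content-Range"))

# Revalidate with If-Modified-Since; an unchanged object yields 304 Not Modified.
stamp = r.headers.get("Last-Modified")
r = requests.get(url, headers={"If-Modified-Since": stamp})
print(r.status_code)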

Here is a simple example of our video-on-demand service.

Our future plans include realtime HLS generation and transcoding for live broadcasts built on top of Elliptics and related technologies.

Greylock tutorial – distributed base search engine based on Elliptics

We've heavily updated the Reverbrain documentation pages at doc.reverbrain.com, and I'm pleased to present our distributed base search engine Greylock. The documentation includes generic information about the search engine and a tutorial covering the installation process, configs and two types of clients: a plain HTTP API (similar to what is expected from a base search engine like Greylock or ElasticSearch) and a Python client (it works via HTTP too, but also uses Consul to acquire mailbox locks). If you need a C++ tutorial, you can check the greylock test suite, which includes insertion/selection/removal as well as various iterators over data, self-recovery tests, statistics and other interesting bits.

I get a fair number of questions about how Greylock differs from, for instance, ElasticSearch, Solr or Amazon Cloud Search. They all have an enormous amount of features and work with large amounts of data, so what's the purpose?

And the answer is just two words: scalability and automation.
If you have worked with Elastic, you know what has to be done to reshard a cluster when the current sharding scheme becomes a bottleneck (how long it takes, what the performance penalty is and how dangerous the process is). In an environment where new documents always arrive and space consumption grows over time, this resharding process has to be started again and again as new servers are added. At some point this becomes a serious issue.

With Greylock this is not needed at all. There is virtually no data or index movement when new servers are added, thanks to the Elliptics bucket system. This design has proved to work really well in Elliptics storage installations where upload rates reach tens of terabytes daily – and that's only our clients' data; there are other, seriously larger installations, for example at Yandex.

We concentrated on the scalability problem and solved it. And yet we do have a set of features. It is not comparable with Elastic, of course, even without counting the NLP tasks we will release later (language models and spelling correction, for instance, for any language where you can find a reasonably large corpus). Greylock supports a basic relevance model based on the distance between the words of the client request and the words in the document.

Probably the two worst issues are the absence of numerical indexes and of client locks. Both were omitted deliberately. Numerical indexes break pagination: if you want 100 documents out of a million, you would have to read them all into RAM, resort them either into numeric order or into lexical order (that's how document ids are stored in the inverted indexes), intersect the whole million of keys and return the first 100. Every subsequent request would have to do this again. Without numerics, pagination works with iterators pointing into the inverted indexes only; the whole index (with its millions of entries) is never read, only some pages are accessed sequentially.
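
To see why iterator-based pagination is cheap, consider lazily intersecting two sorted posting lists: the first page comes out after touching only a tiny prefix of each list. A toy Python sketch (not Greylock's code):

import itertools

def intersect(a, b):
    """Lazily intersect two sorted posting lists of document ids."""
    ia, ib = iter(a), iter(b)
    try:
        x, y = next(ia), next(ib)
        while True:
            if x == y:
                yield x
                x, y = next(ia), next(ib)
            elif x < y:
                x = next(ia)
            else:
                y = next(ib)
    except StopIteration:
        return

word1 = range(0, 1000000, 2)  # docs containing the first word
word2 = range(0, 1000000, 3)  # docs containing the second word
page = list(itertools.islice(intersect(word1, word2), 100))  # first 100 hits only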

To help with numerics, Greylock supports document timestamps, i.e. a single 64-bit number per document ID which defines the sorting order within the inverted indexes. Of course this is not a replacement for a proper numeric index, but it does cover almost all of our use cases.

The second major issue is consistency and client locking. Greylock, like Elliptics, is not a strictly consistent storage. With Greylock things are even worse: the amount of data overwritten by a single client insert can be large, and the index structure (originally started as a distributed B+/*-tree) does not tolerate broken pages. Elastic and others implement a consistency model (like Raft, Paxos or ZAB) internally; Greylock doesn't. That's why we require clients to acquire locks in some other system, like Consul, Etcd or ZooKeeper, to work properly. Our tutorial shows a basic locking scheme implemented on top of the strictly consistent Consul key-value storage.
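
In essence, the client ties a key to a session and proceeds only if the acquire succeeds. A minimal sketch using the python-consul package (the key name is made up for illustration):

import consul

c = consul.Consul()  # talks to the local Consul agent

# A session with a TTL: if the client dies, the lock is released automatically.
session = c.session.create(ttl=15, lock_delay=1)

# kv.put(..., acquire=...) is Consul's lock primitive; it returns False if held.
if c.kv.put("greylock/index.lock", "writer-1", acquire=session):
    try:
        pass  # safe to update the index here
    finally:
        c.kv.put("greylock/index.lock", "", release=session)
else:
    print("somebody else holds the lock")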

We have major plans for the Greylock distributed search engine; expect new features, and give it a try: http://doc.reverbrain.com/greylock:greylock
If you have any questions, you are welcome:
Google group: https://groups.google.com/forum/?fromgroups=#!forum/reverbrain,
or this site and its comments.
