Benji, a new Scala DSL for reactive Object storage
We are glad to announce that Benji, a new reactive Scala DSL for Object storage developed by Fabernovel, is now available as open source under the Apache License 2.0. This first article presents its key features.
Quick look at the context
Nowadays, there is a myriad of ways to store data.
For instance, databases, whether SQL or NoSQL (that’s another topic), can handle both structured and binary data (e.g. BLOB columns in an RDBMS).
However, for binary data, some kinds of storage are more efficient:
- File storage: The plain old one, with a hierarchy of files and folders (as on your computer hard drive).
- Object storage: Data is stored in objects, which are gathered in scalable collections (buckets).
Object storage has some benefits over plain file storage, starting with its abstraction.
Such storage is designed to be fully OS and hardware agnostic, on both the client and server sides.
Object storage also offers better availability, as providers are supposed to support replication across multiple nodes/data centers.
Another key point about Object storage services is scalability, as they allow the available space to be increased when needed.
Data can be organized on Object storage in much the same way as on file storage.
File Storage | Object Storage |
---|---|
Directory | Bucket: no bucket in bucket (but prefix filter) |
File | Object: range query, metadata, versioning |
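To make the "prefix filter" row concrete, here is a minimal sketch in plain Scala of how a flat list of object keys can emulate a directory hierarchy. The keys and the `PrefixListing` helper are hypothetical and only illustrate the idea; they are not part of any provider API.

```scala
object PrefixListing {
  // Hypothetical content of a single bucket: a flat list of keys, no real directories.
  val keys: List[String] = List(
    "invoices/2019/jan.pdf",
    "invoices/2019/feb.pdf",
    "invoices/2020/jan.pdf",
    "logo.png"
  )

  // Listing a "directory" is only a filter on the key prefix.
  def list(prefix: String): List[String] =
    keys.filter(_.startsWith(prefix))

  // Emulates one directory level: the distinct key segments found right after the prefix.
  def children(prefix: String): List[String] =
    list(prefix).map(_.drop(prefix.length).takeWhile(_ != '/')).distinct

  def main(args: Array[String]): Unit = {
    println(list("invoices/2019/"))
    println(children("invoices/"))
  }
}
```

The bucket itself stays flat: the "folders" only exist as a naming convention on the object keys.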
At Fabernovel, we are used to integrating such storage in reactive Scala projects.
A clear need, a tailored answer
Depending on the storage provider, many JVM libraries are available, such as the AWS library or Google Cloud client.
It’s also possible to directly use the REST API for these services.
Either way, the integration is specific to one provider: it’s not possible to use the same integration for both AWS S3 and Google Cloud Storage.
In some cases, we need to develop a module working with several storage providers.
This can be useful to reduce the refactoring required to switch from one storage to another.
It also makes it possible to benchmark the very same code against different providers, for example to determine whether S3 or Google Cloud Storage better fits a specific use case.
For quality assurance, it can also be necessary to run the same code in production with one provider while testing it in a QA environment with another.
Benji, our homemade solution
Therefore, we have designed an abstraction DSL for the common Object storage operations (listing buckets, putting data in an object, managing versioning when supported, etc.), and we have developed a Scala library which implements it: Benji (2 points for those finding the meaning of this name).
Benji currently supports any S3-compliant provider, such as AWS or Ceph, and is also compatible with Google Cloud Storage.
Thus, its DSL, described below, makes it easy to declare in your Scala project what you can do with an Object storage, whichever provider is used at runtime (like the JDBC abstraction for SQL databases).
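The idea of a provider-agnostic contract can be sketched as follows. Note that this is not Benji's actual DSL: the `SimpleStorage` trait, the `InMemoryStorage` stand-in and the `archive` function are hypothetical names, introduced only to show application code depending on an abstraction rather than on a provider.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{ Await, ExecutionContext, Future }
import scala.concurrent.duration._

object StorageAbstraction {
  // Hypothetical provider-agnostic contract (NOT Benji's actual API).
  trait SimpleStorage {
    def put(bucket: String, name: String, data: Array[Byte]): Future[Unit]
    def get(bucket: String, name: String): Future[Option[Array[Byte]]]
    def list(bucket: String): Future[List[String]]
  }

  // Stand-in provider keeping objects in memory; an S3 or
  // Google Cloud Storage implementation would expose the same trait.
  final class InMemoryStorage extends SimpleStorage {
    private val objects = TrieMap.empty[(String, String), Array[Byte]]

    def put(bucket: String, name: String, data: Array[Byte]): Future[Unit] =
      Future.successful { objects.put(bucket -> name, data); () }

    def get(bucket: String, name: String): Future[Option[Array[Byte]]] =
      Future.successful(objects.get(bucket -> name))

    def list(bucket: String): Future[List[String]] =
      Future.successful(
        objects.keys.collect { case (b, n) if b == bucket => n }.toList.sorted)
  }

  // Application code only depends on the trait, never on a given provider.
  def archive(storage: SimpleStorage, report: String)(
    implicit ec: ExecutionContext): Future[List[String]] =
    for {
      _ <- storage.put("reports", "latest.txt", report.getBytes("UTF-8"))
      names <- storage.list("reports")
    } yield names

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    println(Await.result(archive(new InMemoryStorage, "hello"), 5.seconds))
  }
}
```

Switching provider then means passing another `SimpleStorage` instance to `archive`, without touching its body, which is the same benefit JDBC brings when changing SQL database.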
We have designed it in a modular way, so you only need to add the Benji dependencies required for the storage you want to use (e.g. Benji S3 but not Google in your SBT build).
Benji is also a reactive library, based on Akka Stream, which makes it possible to process Object storage metadata (e.g. bucket or object listing) and data (from/to the objects), in a non-blocking way and without having to load everything in memory.
Thanks to its reactive approach, it addresses concerns such as backpressure (e.g. the slow consumer/fast producer issue) and supervision.
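The "without having to load everything in memory" part can be illustrated with the standard library alone. The sketch below deliberately uses a plain `java.io.InputStream` and a lazy `Iterator` in place of Akka Stream, so it only shows pull-based, bounded-memory consumption, not asynchronous non-blocking backpressure; `ChunkedRead` and its functions are hypothetical names.

```scala
import java.io.{ ByteArrayInputStream, InputStream }

object ChunkedRead {
  // Lazily exposes the stream as chunks of at most `chunkSize` bytes:
  // a chunk is only read when the consumer asks for the next element.
  def chunks(in: InputStream, chunkSize: Int): Iterator[Array[Byte]] = {
    val buffer = new Array[Byte](chunkSize)

    Iterator
      .continually(in.read(buffer)) // number of bytes read, or -1 at the end
      .takeWhile(_ != -1)
      .map(read => buffer.take(read)) // copy, as the buffer is reused
  }

  // The consumer pulls chunks one by one, so the producer never runs ahead of it.
  def totalSize(in: InputStream, chunkSize: Int): Long =
    chunks(in, chunkSize).foldLeft(0L)(_ + _.length)

  def main(args: Array[String]): Unit = {
    val data = new ByteArrayInputStream(Array.fill[Byte](10000)(1))
    println(totalSize(data, 4096))
  }
}
```

A reactive stream generalizes this pull model to asynchronous stages, where demand is signalled upstream instead of blocking the producer.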
As we find it very satisfying to develop well-tested projects, Benji is not only tested itself but also provides modules to easily test your own code.
To do so, a Benji module for testing purposes (or for QA) is available, based on Apache VFS.
With it, during tests, the Object storage abstraction can be provisioned with a local and temporary instance that does not depend on an external service, so tests can run even offline.
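As an illustration of what a local and temporary instance means, here is a minimal sketch of a test double backed by a temporary directory. It relies on `java.nio.file` rather than on Apache VFS or the Benji testing module, and the `TempDirStorage` class is a hypothetical helper, not something the library provides.

```scala
import java.io.File
import java.nio.file.{ Files, Path }

object TempStorageExample {
  // Hypothetical test double: each bucket is a sub-directory of a temporary folder.
  final class TempDirStorage {
    private val root: Path = Files.createTempDirectory("object-storage-test")

    def put(bucket: String, name: String, data: Array[Byte]): Unit = {
      val dir = Files.createDirectories(root.resolve(bucket))
      Files.write(dir.resolve(name), data)
      ()
    }

    def get(bucket: String, name: String): Option[Array[Byte]] = {
      val file = root.resolve(bucket).resolve(name)
      if (Files.isRegularFile(file)) Some(Files.readAllBytes(file)) else None
    }

    // Removes everything created during the test.
    def close(): Unit = delete(root.toFile)

    private def delete(f: File): Unit = {
      Option(f.listFiles).foreach(_.foreach(delete)) // null for regular files
      f.delete()
      ()
    }
  }

  def main(args: Array[String]): Unit = {
    val storage = new TempDirStorage
    try {
      storage.put("bucket", "hello.txt", "hi".getBytes("UTF-8"))
      println(storage.get("bucket", "hello.txt").map(b => new String(b, "UTF-8")))
    } finally storage.close()
  }
}
```

Nothing here reaches the network, so a test suite written this way can run on a laptop without any credentials.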
In order to develop Play applications using Object storage, Benji provides a convenient module to provision the storage, with either runtime or compile-time dependency injection.
Benji is not only modular, it’s also extensible: it’s possible to implement a new module without altering the core library. This extensibility is based on the Service Provider Interface mechanism.
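For readers not familiar with the Service Provider Interface mechanism, the sketch below shows the general pattern with the JDK's `java.util.ServiceLoader`. The `StorageFactory` trait and its `scheme` member are hypothetical and do not reflect the actual Benji extension point.

```scala
import java.util.ServiceLoader
import scala.jdk.CollectionConverters._

object SpiLookup {
  // Hypothetical extension point: each storage module would ship one implementation.
  trait StorageFactory {
    def scheme: String // e.g. "s3" or "google"
  }

  // Discovers the implementations registered on the classpath, through a
  // provider-configuration file under META-INF/services/ named after the interface.
  def available(): List[StorageFactory] =
    ServiceLoader.load(classOf[StorageFactory]).iterator().asScala.toList

  // Picks the implementation matching a scheme, if such a module is present.
  def forScheme(scheme: String): Option[StorageFactory] =
    available().find(_.scheme == scheme)

  def main(args: Array[String]): Unit = {
    // Without any registered module on the classpath, nothing is found.
    println(forScheme("s3").map(_.scheme).getOrElse("no module found"))
  }
}
```

With this pattern, adding a provider comes down to adding a JAR that contains an implementation and its registration file; the core code doing the lookup stays unchanged.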
Building on that, we intend to implement modules for more providers (e.g. Azure Blob Storage), and to support some compatible services (such as Git LFS or MongoDB GridFS).
See also examples.