Benji: a new Scala DSL for reactive Object storage

We are glad to announce that Benji, a new reactive Scala DSL for Object storage developed by Fabernovel, is now available as open source under the Apache License 2.0. This first article presents its key features.

Quick look at the context

Nowadays, there is a myriad of ways to store data.

For instance, databases, whether SQL or NoSQL (that’s another topic), can handle both structured and binary data (e.g. BLOBs in an RDBMS).

However, for binary data, some kinds of storage are more efficient, such as file storage and Object storage.

Object storage has several benefits over plain file storage, starting with its abstraction.

Such storage is designed to be fully OS- and hardware-agnostic, on both the client and the server side.

Object storage also offers better availability, as providers are expected to replicate data across multiple nodes or data centers.

Another key benefit of Object storage services is scalability, as they allow the available space to be increased when needed.

Data can be organized in Object storage much the same way as in file storage:

File storage    Object storage    Notes
Directory       Bucket            No nested buckets (prefix filtering instead)
File            Object            Range queries, metadata, versioning
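A short sketch makes the prefix-filter note concrete. It assumes S3-style listing semantics: a request takes a prefix and a delimiter, and it returns both the keys directly under that prefix and the "common prefixes" that play the role of sub-directories. `PrefixListing` and `Listing` are hypothetical names used for illustration only. They are not part of Benji.

```scala
// Hypothetical helper: emulates directories on a flat key namespace by
// filtering on a prefix and grouping on a delimiter (S3-style semantics assumed).
object PrefixListing {

  /** Result of listing one "directory level". */
  final case class Listing(objects: List[String], commonPrefixes: List[String])

  /** Lists keys under `prefix`, grouping deeper keys by `delimiter`. */
  def list(keys: Seq[String], prefix: String, delimiter: String = "/"): Listing = {
    // Keep only the keys that start with the requested prefix
    val matching = keys.filter(_.startsWith(prefix))

    // A key whose remainder (after the prefix) still contains the delimiter
    // belongs to a deeper "sub-directory"; the others are direct children
    val (nested, direct) = matching.partition { key =>
      key.substring(prefix.length).contains(delimiter)
    }

    // Collapse nested keys into their common prefix, up to and including the
    // first delimiter found after the requested prefix
    val prefixes = nested.map { key =>
      val rest = key.substring(prefix.length)
      prefix + rest.substring(0, rest.indexOf(delimiter) + delimiter.length)
    }.distinct

    Listing(direct.toList.sorted, prefixes.toList.sorted)
  }
}
```

Consider the keys `photos/2019/a.jpg`, `photos/cover.png` and `docs/readme.txt`, listed with the prefix `photos/`. The result is the object `photos/cover.png` and the common prefix `photos/2019/`. No real nested bucket is needed.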

At Fabernovel, we regularly integrate such storage into reactive Scala projects.

A clear need, a tailored answer

Depending on the storage provider, various JVM libraries are available, such as the AWS SDK for Java or the Google Cloud client libraries.

It’s also possible to use the REST APIs of these services directly.

Either way, the integration is specific to one provider: it’s not possible to use the same integration for both AWS S3 and Google Cloud Storage.

In some cases, we need a module that works with several storage providers.

This reduces the refactoring required to switch from one storage provider to another.

It also makes it possible to benchmark the very same code against different providers, for example to determine whether S3 or Google Cloud Storage better fits a specific use case.

For quality assurance, the same code may also need to run in production against one provider while being tested in a QA environment against another.

Benji, our homemade solution

We have therefore designed a DSL that abstracts the common Object storage operations (listing buckets, putting data into objects, managing versioning when supported, etc.), and developed a Scala library that implements it: Benji (2 points to whoever finds the meaning of this name).

Benji currently supports any S3-compliant provider, such as AWS or Ceph, and is also compatible with Google Cloud Storage.

Its DSL, summarized below, makes it easy to express in your Scala project what to do with an Object storage, whichever provider is used at runtime (much like the JDBC abstraction for SQL databases).

ObjectStorage.buckets                     // list buckets
ObjectStorage.bucket(name): BucketRef     // obtain bucket reference
    BucketRef.objects                     // list objects
    BucketRef.obj(name): ObjectRef        // obtain object reference
    ObjectRef.{ put, get, delete }        // upload, download, ...
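The sketch below illustrates why this layering matters. It is a deliberately simplified, synchronous version of the same three levels (storage, bucket, object), with a single in-memory provider. The traits `SimpleObjectStorage`, `SimpleBucketRef` and `SimpleObjectRef` are hypothetical and much simpler than Benji's real API, which is asynchronous and stream-based. The point is only that `Client.copy` is written once against the abstraction and works with any implementation.

```scala
import scala.collection.mutable

// Hypothetical, simplified take on the storage -> bucket -> object layering.
// Synchronous and in-memory, unlike Benji's asynchronous, stream-based API.
trait SimpleObjectRef {
  def put(bytes: Array[Byte]): Unit
  def get: Option[Array[Byte]]
  def delete(): Unit
}

trait SimpleBucketRef {
  def objects: List[String]
  def obj(name: String): SimpleObjectRef
}

trait SimpleObjectStorage {
  def buckets: List[String]
  def bucket(name: String): SimpleBucketRef
}

// One possible "provider": a map of maps held in memory.
// Simplification: accessing a bucket creates it lazily.
final class InMemoryStorage extends SimpleObjectStorage {
  private val data = mutable.Map.empty[String, mutable.Map[String, Array[Byte]]]

  def buckets: List[String] = data.keys.toList.sorted

  def bucket(name: String): SimpleBucketRef = new SimpleBucketRef {
    private def content = data.getOrElseUpdate(name, mutable.Map.empty)

    def objects: List[String] = content.keys.toList.sorted

    def obj(objName: String): SimpleObjectRef = new SimpleObjectRef {
      // Defensive copies so callers cannot mutate the stored bytes
      def put(bytes: Array[Byte]): Unit = content.update(objName, bytes.clone())
      def get: Option[Array[Byte]] = content.get(objName).map(_.clone())
      def delete(): Unit = { content.remove(objName); () }
    }
  }
}

// Client code depends only on the abstraction, never on a given provider.
object Client {
  def copy(storage: SimpleObjectStorage, from: String, to: String, name: String): Boolean =
    storage.bucket(from).obj(name).get match {
      case Some(bytes) => storage.bucket(to).obj(name).put(bytes); true
      case None        => false
    }
}
```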

Benji is modular, so you only need to add the dependencies for the storage you actually use (e.g. Benji S3 but not Benji Google in your SBT build).

Benji is also a reactive library based on Akka Streams. This makes it possible to process Object storage metadata (e.g. bucket or object listings) and data (from/to objects) in a non-blocking way, without loading everything into memory.

BucketRef.objects: Source[Object, ..]
ObjectRef.put: Sink[ByteString, ..]
ObjectRef.get: Source[ByteString, ..]
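The benefit of exposing content as a stream of chunks can be shown without Akka Streams. The sketch below is a plain standard-library analogue, not Benji code. An `Iterator` of byte chunks stands in for `Source[ByteString, _]`, and the consumer computes the size and SHA-256 digest of the content while holding only one chunk at a time. `ChunkedRead` and its methods are hypothetical names.

```scala
import java.io.{ ByteArrayInputStream, InputStream }
import java.security.MessageDigest

// Plain-Scala analogue (not Akka Streams) of consuming content as a stream
// of chunks: only one chunk is materialized at a time.
object ChunkedRead {

  /** Lazily reads `in` in chunks of at most `chunkSize` bytes (must be > 0). */
  def chunks(in: InputStream, chunkSize: Int = 8192): Iterator[Array[Byte]] = {
    val buffer = new Array[Byte](chunkSize)
    Iterator
      .continually(in.read(buffer))       // number of bytes read, or -1 at the end
      .takeWhile(_ != -1)
      .map(n => java.util.Arrays.copyOf(buffer, n)) // copy before the next read
  }

  /** Folds over the chunks to compute the total size and SHA-256 incrementally. */
  def sizeAndDigest(chunks: Iterator[Array[Byte]]): (Long, String) = {
    val md = MessageDigest.getInstance("SHA-256")
    val size = chunks.foldLeft(0L) { (acc, chunk) =>
      md.update(chunk)
      acc + chunk.length
    }
    (size, md.digest().map("%02x".format(_)).mkString)
  }
}
```

Memory use is bounded by the chunk size rather than by the object size. With Akka Streams, the same fold would run asynchronously over a `Source`.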

Thanks to this reactive approach, Benji addresses concerns such as backpressure (e.g. the slow consumer/fast producer problem) and supervision.
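Backpressure boils down to two rules: the consumer signals how much it can handle, and the producer never emits more than that. The single-threaded sketch below illustrates this demand-driven idea, in the spirit of the Reactive Streams `request(n)` protocol that Akka Streams implements. `DemandDrivenSource` is a hypothetical name. Real streams do the same asynchronously and without blocking.

```scala
// Minimal, single-threaded illustration of demand-driven backpressure.
// Hypothetical class; real Reactive Streams implementations are asynchronous.
final class DemandDrivenSource[A](elements: Iterator[A]) {
  private var demand = 0L

  /** Called by the consumer to signal that it can handle `n` more elements. */
  def request(n: Long, onNext: A => Unit): Unit = {
    require(n > 0, "demand must be positive")
    demand += n
    // Emit only while there is outstanding demand: a fast producer can never
    // push more than the slow consumer asked for.
    while (demand > 0 && elements.hasNext) {
      demand -= 1
      onNext(elements.next())
    }
  }
}
```

Even with an infinite producer such as `Iterator.from(1)`, the consumer receives exactly as many elements as it requested.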

As we value well-tested projects, Benji is not only tested itself but also provides modules that make it easy to test your own code.

To that end, a Benji module based on Apache Commons VFS is available for testing (or QA) purposes.

With it, the Object storage abstraction can be provisioned during tests with a local, temporary instance that does not depend on any external service, so tests can even run offline.

def vfs: Try[ObjectStorage] =
  VFSTransport.temporary("benji").map { tx => VFSStorage(tx) }
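The same idea can be sketched without Benji or VFS, assuming only that tests need isolated, disposable storage. The hypothetical `TempDirStorage` below keeps each instance in a fresh temporary directory created with `java.nio.file.Files.createTempDirectory`, so nothing outside the local machine is involved. It is an analogy to the VFS-based module, not its implementation, and it does not delete the directory afterwards.

```scala
import java.nio.file.{ Files, Path }
import scala.util.Try

// Hypothetical file-system-backed test double: each instance lives in its own
// temporary directory, so tests need no network access or external service.
final class TempDirStorage private (val root: Path) {

  // Objects are stored as files under <root>/<bucket>/<name>
  private def objectPath(bucket: String, name: String): Path =
    root.resolve(bucket).resolve(name)

  def put(bucket: String, name: String, bytes: Array[Byte]): Unit = {
    val target = objectPath(bucket, name)
    Files.createDirectories(target.getParent)
    Files.write(target, bytes)
  }

  def get(bucket: String, name: String): Option[Array[Byte]] = {
    val source = objectPath(bucket, name)
    if (Files.isRegularFile(source)) Some(Files.readAllBytes(source)) else None
  }
}

object TempDirStorage {
  /** Creates storage rooted in a new temporary directory whose name starts with `prefix`. */
  def temporary(prefix: String): Try[TempDirStorage] =
    Try(new TempDirStorage(Files.createTempDirectory(prefix)))
}
```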

For Play applications using Object storage, Benji provides a convenient module to provision the storage, with either runtime or compile-time Dependency Injection.

// COMPILE TIME: Controller, custom Module, ApplicationLoader …
class MyComponent1(...) extends BenjiFromContext(context, name) {  }

// RUNTIME:
class MyController @Inject() (..., val storage: ObjectStorage)
  extends AbstractController(components) {  }
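Both styles rest on the same principle: the component depends on the storage abstraction, and the wiring chooses the implementation. The sketch below shows this with plain constructor injection, the way compile-time DI is commonly structured in Play. `Storage`, `ReportController`, `AppComponents`, `ProdComponents` and `TestComponents` are hypothetical names, not Benji or Play classes.

```scala
// Hypothetical abstraction standing in for the injected ObjectStorage.
trait Storage {
  def name: String
}

final class S3LikeStorage extends Storage { val name: String = "s3" }
final class LocalTestStorage extends Storage { val name: String = "local" }

// The component never instantiates a provider itself: it receives one.
final class ReportController(storage: Storage) {
  def describe: String = s"Reports are stored using ${storage.name}"
}

// Compile-time wiring: one "components" class per environment.
abstract class AppComponents {
  def storage: Storage
  lazy val reportController: ReportController = new ReportController(storage)
}

final class ProdComponents extends AppComponents {
  lazy val storage: Storage = new S3LikeStorage
}

final class TestComponents extends AppComponents {
  lazy val storage: Storage = new LocalTestStorage
}
```

Switching environments changes only which components class is instantiated. `ReportController` itself stays untouched.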

Benji is not only modular but also extensible: a new module can be implemented without altering the core library. This extensibility is based on the Java Service Provider Interface (SPI) mechanism.
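The sketch below recalls how the SPI mechanism works in general. It uses `java.util.ServiceLoader` to discover implementations of a service interface declared in `META-INF/services` files, then selects one by URI scheme. `StorageFactory` and `StorageRegistry` are hypothetical, and Benji's actual SPI types and URI format are not shown here.

```scala
import java.net.URI
import java.util.ServiceLoader

// Hypothetical service interface: each provider module would ship one
// implementation and declare it in
// META-INF/services/<fully qualified name of StorageFactory>.
trait StorageFactory {
  def scheme: String
  def describe(uri: URI): String
}

object StorageRegistry {

  /** Implementations found on the classpath through the standard SPI mechanism. */
  def discovered(): List[StorageFactory] = {
    val it = ServiceLoader.load(classOf[StorageFactory]).iterator()
    val found = List.newBuilder[StorageFactory]
    while (it.hasNext) found += it.next()
    found.result()
  }

  /** Picks the factory whose scheme matches the scheme of `uri`. */
  def resolve(uri: URI, factories: List[StorageFactory]): Option[StorageFactory] =
    factories.find(_.scheme == uri.getScheme)
}
```

If no provider is declared on the classpath, `discovered()` simply returns an empty list. Adding a jar that contains the matching `META-INF/services` entry is enough to make a new factory available, without touching the registry code.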

Building on this, we intend to implement modules for more providers (e.g. Azure Blob Storage) and to support compatible services (such as Git LFS or MongoDB GridFS).

See also the examples.