Last April, I had the pleasure of speaking at the Salishan Conference on High Speed Computing, where I presented two interesting use cases for object storage in an HPC ecosystem. The first, and more traditional, is object storage as an economical active archive for simulation data, and it makes perfect sense. Once the computational analysis is over, why use expensive primary storage (think NAS and SAN) to house the raw data and results of simulations indefinitely?
It makes much more sense to store that data on cost-effective, scale-out object storage that is fully searchable, easily accessible both inside the organization and over the internet (think URL to the object), and allows NFS and S3 access to the same objects. These are just some of the many benefits of RESTful object APIs over traditional POSIX-based file systems. The downside, of course, is that data must be copied from object storage to file-based storage before compute-intensive analytics can run on it.
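To make “think URL to the object” concrete, here is a minimal Python sketch of how an S3 path-style URL is typically formed from an endpoint, a bucket and a key. The endpoint `s3.example.org`, the bucket, the key and the helper name `object_url` are all hypothetical; this is not Swarm's API, and a private object would additionally need a signed request (for example, an AWS Signature Version 4 presigned URL).

```python
from urllib.parse import quote


def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build an S3 path-style URL (https://endpoint/bucket/key) for an object.

    Slashes in the key are kept as path separators; other reserved
    characters (such as spaces) are percent-encoded.
    """
    return f"https://{endpoint}/{quote(bucket, safe='')}/{quote(key, safe='/')}"


# Hypothetical endpoint, bucket and key, for illustration only.
url = object_url("s3.example.org", "climate-runs", "run 42/output.nc")
print(url)  # https://s3.example.org/climate-runs/run%2042/output.nc
```

The same string can be pasted into a browser or passed to any HTTP client, which is what makes object storage easy to share beyond the cluster.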
But what if you could remove the network and storage performance bottlenecks of the object storage environment? Could you then replace primary storage (traditional POSIX and parallel file systems) with more economical object storage? That leads me to the second, and frankly more innovative, use case: object storage as a new storage, tenant and content management solution for read-intensive HPC workflows that could actually replace parallel file systems.
Those are exactly the questions that the team at the Science and Technology Facilities Council (STFC) Rutherford Appleton Laboratory (RAL) Space, supporting the JASMIN project, set out to answer back in 2016.
First, they tackled network latency. STFC employs an HPC “leaf/spine” routed Clos network with 100 Gb spine switches and 100 Gb leaf, or top-of-rack (TOR), switches. Every TOR is connected to every spine switch, and every TOR switch has equal uplink and downlink bandwidth. This design delivers a very low-latency, non-blocking network in which traffic between any two servers crosses at most 3 switch hops (leaf, spine, leaf).
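Both the hop count and the non-blocking property follow directly from the topology. The Python sketch below models a generic two-tier leaf/spine fabric as a graph and checks both; the switch counts (4 spines, 8 leaves) and the per-TOR port mix (16 x 25 Gb server ports, 4 x 100 Gb uplinks) are hypothetical values chosen for illustration, not STFC's actual configuration.

```python
from collections import deque
from itertools import product


def build_leaf_spine(num_spines: int, num_leaves: int) -> dict[str, set[str]]:
    """Two-tier Clos fabric: every leaf (TOR) links to every spine, and nothing else."""
    graph: dict[str, set[str]] = {}
    for i, j in product(range(num_spines), range(num_leaves)):
        spine, leaf = f"spine{i}", f"leaf{j}"
        graph.setdefault(spine, set()).add(leaf)
        graph.setdefault(leaf, set()).add(spine)
    return graph


def switch_hops(graph: dict[str, set[str]], src_leaf: str, dst_leaf: str) -> int:
    """Switches traversed between a server on src_leaf and a server on dst_leaf."""
    # Breadth-first search counts inter-switch links; switches traversed = links + 1.
    dist = {src_leaf: 0}
    queue = deque([src_leaf])
    while queue:
        node = queue.popleft()
        if node == dst_leaf:
            return dist[node] + 1
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError(f"{dst_leaf} is unreachable from {src_leaf}")


def is_non_blocking(server_ports: int, server_gbps: int, uplinks: int, uplink_gbps: int) -> bool:
    """A TOR is non-blocking (1:1 oversubscription) if uplink capacity >= downlink capacity."""
    return uplinks * uplink_gbps >= server_ports * server_gbps


fabric = build_leaf_spine(num_spines=4, num_leaves=8)
leaves = [n for n in fabric if n.startswith("leaf")]
worst_case = max(switch_hops(fabric, a, b) for a in leaves for b in leaves)
print(worst_case)                        # 3
print(is_non_blocking(16, 25, 4, 100))   # True: 400 Gb down, 400 Gb up
```

Because leaves connect only to spines, any two leaves are exactly two links apart, so the worst case stays at three switches no matter how many leaves are added, as long as each new leaf is wired to every spine.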
Next up was to identify an object storage solution that could deliver the benefits mentioned above while also achieving the performance required to replace their parallel file system for read-intensive workloads…enter Caringo Swarm.
At its core, Swarm is built around a “pure” object storage architecture that is simple and symmetrical and does not rely on traditional storage technologies such as caching servers, file systems, RAID or databases. Instead, data is written raw to disk together with its metadata, which makes objects “self-describing.” Identifying attributes such as the unique ID, object name and location on disk are published from the metadata into a patented, dynamic shared index in memory that handles lookups.
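The self-describing idea can be illustrated with a toy model. In this Python sketch, each object is appended to a byte array standing in for a drive as a length-prefixed JSON metadata header followed by the raw data, and in-memory dictionaries map names and IDs to data offsets. The record layout, the `Volume` class and its methods are hypothetical and say nothing about Swarm's real on-disk format; the point is only that, because metadata travels with the data, the index can be rebuilt by scanning the drive alone, without an external database.

```python
import json
import struct
import uuid


class Volume:
    """Toy drive holding self-describing objects (hypothetical record layout)."""

    def __init__(self) -> None:
        self.disk = bytearray()                       # raw "drive" contents
        self.index: dict[str, tuple[int, int]] = {}   # object ID -> (data offset, size)
        self.names: dict[str, str] = {}               # object name -> object ID

    def put(self, name: str, data: bytes, **meta: str) -> str:
        obj_id = uuid.uuid4().hex
        header = json.dumps({**meta, "id": obj_id, "name": name, "size": len(data)}).encode()
        # Record layout: 4-byte big-endian header length, JSON metadata, raw data.
        self.disk += struct.pack(">I", len(header)) + header + data
        self._publish(header, len(self.disk) - len(data))
        return obj_id

    def _publish(self, header: bytes, data_offset: int) -> None:
        """Publish identifying attributes from the metadata into the in-memory index."""
        meta = json.loads(header)
        self.index[meta["id"]] = (data_offset, meta["size"])
        self.names[meta["name"]] = meta["id"]

    def get(self, name: str) -> bytes:
        offset, size = self.index[self.names[name]]    # memory-only lookup
        return bytes(self.disk[offset:offset + size])  # one read of the data itself

    def rebuild_index(self) -> None:
        """Recover the index by scanning the drive; no external database needed."""
        self.index.clear()
        self.names.clear()
        pos = 0
        while pos < len(self.disk):
            (hlen,) = struct.unpack_from(">I", self.disk, pos)
            header = bytes(self.disk[pos + 4:pos + 4 + hlen])
            data_offset = pos + 4 + hlen
            self._publish(header, data_offset)
            pos = data_offset + json.loads(header)["size"]


vol = Volume()
vol.put("sim/run1.nc", b"temperature data", project="example")
vol.put("sim/run2.nc", b"pressure data")

# Simulate losing the in-memory index (e.g. a node restart): start from the raw drive only.
restored = Volume()
restored.disk = vol.disk
restored.rebuild_index()
print(restored.get("sim/run1.nc"))  # b'temperature data'
```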
This design is quite “flat,” highly scalable and very low latency, because locating an object requires no disk I/O operations before the first byte is read (“zero IOPS to first byte”). It also eliminates the need for an external metadata database, both for storing metadata and for lookups. Automatic load balancing of requests, using a patented algorithm, allows all nodes and all drives to be addressed in parallel. That removes the need for external load balancers and front-side caching, both of which can present significant performance challenges in an HPC environment where the goal is total aggregate throughput rather than single-stream performance.
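“Zero IOPS to first byte” is easiest to see by counting reads. This simplified Python model compares a cold-cache, file-system-style path walk (one directory read per path component) with a lookup in an in-memory index like the one described above. The `CountingDisk` class, the block layout and both lookup functions are hypothetical; real file systems cache directory entries and inodes, so the path-walk figure is a worst case rather than a typical one.

```python
class CountingDisk:
    """Toy block device that counts read operations."""

    def __init__(self, blocks: dict[int, object]) -> None:
        self.blocks = blocks
        self.reads = 0

    def read(self, block: int) -> object:
        self.reads += 1
        return self.blocks[block]


def fs_style_open(disk: CountingDisk, root_block: int, path: str) -> int:
    """Cold-cache path walk: one directory read per path component."""
    block = root_block
    for part in path.strip("/").split("/"):
        directory = disk.read(block)  # directory entries live on disk
        block = directory[part]
    return block  # block holding the file's data


def index_style_open(index: dict[str, int], name: str) -> int:
    """In-memory index lookup: no disk reads before the data itself."""
    return index[name]


blocks = {0: {"projects": 1}, 1: {"jasmin": 2}, 2: {"run1.nc": 3}, 3: b"simulation output"}

fs_disk = CountingDisk(blocks)
data_block = fs_style_open(fs_disk, 0, "/projects/jasmin/run1.nc")
fs_reads_before_first_byte = fs_disk.reads
fs_data = fs_disk.read(data_block)

obj_disk = CountingDisk(blocks)
data_block = index_style_open({"projects/jasmin/run1.nc": 3}, "projects/jasmin/run1.nc")
obj_reads_before_first_byte = obj_disk.reads
obj_data = obj_disk.read(data_block)

print(fs_reads_before_first_byte, obj_reads_before_first_byte)  # 3 0
```

With the index held in memory, the only disk read a GET performs is the one that returns the data itself.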
For S3 testing, COSBench was used to run ramp-up tests with up to 20 physical client machines to measure the throughput potential of the entire Swarm cluster. Sequential tests used 2 GB erasure-coded files. In this environment, Swarm achieved 35 GB/s of aggregate throughput, over 60% better than the minimum requirement. The complete results are available in the benchmark whitepaper.
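The “over 60%” figure also bounds the minimum requirement, which is not stated here. The short Python calculation below uses only the numbers quoted above; the per-client average assumes all 20 client machines were active at the peak, which the text does not say explicitly.

```python
measured_gb_s = 35.0  # aggregate throughput reported for the Swarm cluster (GB/s)
margin = 0.60         # "over 60% better than the minimum requirement"
clients = 20          # maximum number of COSBench client machines

# measured > (1 + margin) * minimum  =>  minimum < measured / (1 + margin)
max_minimum_requirement = measured_gb_s / (1 + margin)

# Average share per client, assuming all 20 clients were driving load at the peak.
per_client = measured_gb_s / clients

print(f"minimum requirement below {max_minimum_requirement:.1f} GB/s")
print(f"average per client: {per_client:.2f} GB/s")
```

So the minimum requirement was at most about 21.9 GB/s, and under the all-clients assumption each client averaged 1.75 GB/s.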
So…can object storage replace parallel file systems? In an HPC environment where high-aggregate read throughput, as well as durability and accessibility of data over a common protocol (such as S3 wrapped in a multi-tenancy framework), are required, the answer is a resounding YES with Caringo Swarm Hassle-Free, Limitless Object Storage!