Caringo Swarm’s unit of content, unsurprisingly, is the object. It isn’t a file technically, as no traditional file systems are used inside of Swarm. File systems simply aren’t robust or fast enough to scale into the hundreds of billions of objects, but as with most technologies, there are tradeoffs. Typically, once an object is written in an object storage environment, if a change is made to that object, it is deleted and re-written. Let’s take a closer look at why.
Physically, an object is stored in Swarm as a simple, contiguous sequence—or stream—of bytes on raw disk. The contiguous stream consists of two parts, data and metadata, that are written strictly once and always remain physically encapsulated together.
At first glance, there may seem to be a contradiction between the fact that a Caringo Swarm object is written out just once, in essentially a single operation, and the requirement that it can be updated. A look under the hood shows why there is no contradiction.
Caringo Swarm offers three types of objects or streams—Named, Aliased, and Immutable; they all leverage the same mechanics underneath.
The most basic type is the Immutable object. Using an HTTP POST operation, it is sent to the cluster and stored on disk in replicated or erasure-coded format, and a 128-bit UUID is sent back to the client as a sign of success. Its existence and location are recorded in Swarm’s index journal, as well as in a slot of Swarm’s RAM index itself, which takes up about 50 bytes of RAM.
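To get a feel for what "about 50 bytes" per object means at scale, here is a back-of-the-envelope sketch. The function name and the one-billion-object example are ours, purely for illustration, and the estimate deliberately ignores replicas, erasure-coded segments, and any other overhead, so treat it as a rough order of magnitude rather than a sizing guide.

```python
BYTES_PER_SLOT = 50  # approximate per-slot figure quoted above


def index_ram_bytes(object_count: int, slots_per_object: int = 1) -> int:
    """Rough RAM needed to index `object_count` objects (illustrative only)."""
    return object_count * slots_per_object * BYTES_PER_SLOT


one_billion = 1_000_000_000
estimate_gb = index_ram_bytes(one_billion) / 1e9  # decimal gigabytes
print(f"{one_billion:,} objects -> roughly {estimate_gb:.0f} GB of index RAM")
```

An object type that needs a second index slot can be approximated by passing slots_per_object=2.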
The second object type, Aliased, differs from the Immutable object in that it can be updated using an HTTP PUT. Rather than updating or overwriting in place, a new object is written and the existing UUID is “rerouted” to point to it. This additional level of indirection is maintained in the RAM index using an additional RAM slot.
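The rerouting idea is easier to see in a toy model. The sketch below is not Swarm's implementation or API: ToyStore, its method names, and the in-memory "disk" are all hypothetical. It only illustrates how an update can be served without ever overwriting a stored stream, by appending a new stream and repointing the alias. Space reclamation for the superseded stream is left out.

```python
import uuid


class ToyStore:
    """In-memory stand-in for a write-once store with a RAM index (hypothetical)."""

    def __init__(self):
        self.disk = bytearray()  # append-only stand-in for raw disk
        self.index = {}          # stream UUID -> (offset, length)
        self.aliases = {}        # alias UUID -> UUID of the current stream

    def _append(self, data: bytes) -> str:
        # Every write lands as a new contiguous stream; nothing is overwritten.
        key = uuid.uuid4().hex   # random 128-bit identifier
        self.index[key] = (len(self.disk), len(data))
        self.disk += data
        return key

    def post_immutable(self, data: bytes) -> str:
        return self._append(data)

    def post_alias(self, data: bytes) -> str:
        alias = uuid.uuid4().hex
        self.aliases[alias] = self._append(data)  # the extra level of indirection
        return alias

    def put_alias(self, alias: str, data: bytes) -> None:
        if alias not in self.aliases:
            raise KeyError(alias)
        # Write a fresh stream, then reroute the existing alias to it.
        self.aliases[alias] = self._append(data)

    def get(self, key: str) -> bytes:
        target = self.aliases.get(key, key)  # follow the alias if there is one
        offset, length = self.index[target]
        return bytes(self.disk[offset:offset + length])


store = ToyStore()
alias = store.post_alias(b"v1")
store.put_alias(alias, b"version 2")
print(store.get(alias))  # the same UUID now resolves to the newer stream
```

Note that the client keeps using the identifier it was given at creation time; only the mapping behind it changes.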
Last but not least, so-called Named objects fully conform to the naming scheme introduced by Amazon Web Services (AWS) for its S3 storage service. Rather than offering a random UUID back to the client application, they allow the application to bring its own name in an HTTP POST-based create operation. Anything that conforms to a standard URL can serve that purpose. There is also one level of container objects, called buckets, which are essentially Named objects themselves. A bucket groups Named objects together by having its name recorded in the contained objects’ metadata.
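Grouping by metadata rather than by a directory structure can be sketched in a few lines. Everything here is illustrative: the NamedObject class, the "bucket" metadata key, and the helper functions are our own assumptions and say nothing about how Swarm actually lays out or queries its metadata.

```python
from dataclasses import dataclass, field


@dataclass
class NamedObject:
    name: str
    data: bytes = b""
    metadata: dict = field(default_factory=dict)


def create_in_bucket(bucket: NamedObject, name: str, data: bytes) -> NamedObject:
    # Membership is recorded on the contained object, not inside the bucket.
    # The "bucket" key is a hypothetical field name.
    return NamedObject(name=name, data=data, metadata={"bucket": bucket.name})


def list_bucket(bucket: NamedObject, objects: list) -> list:
    # Listing a bucket amounts to filtering objects on their metadata.
    return [o.name for o in objects if o.metadata.get("bucket") == bucket.name]


photos = NamedObject("photos")  # the bucket is itself just a named object
objects = [
    create_in_bucket(photos, "cat.jpg", b"..."),
    create_in_bucket(photos, "dog.jpg", b"..."),
    NamedObject("loose.txt"),   # not part of any bucket
]
print(list_bucket(photos, objects))
```

The bucket holds no list of its members in this sketch; the relationship lives entirely in each contained object's metadata.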
Two of the three object types above are updatable (mutable), namely Aliased and Named, and two of the three are identified by UUID (unnamed), namely Immutable and Aliased.
A Caringo Swarm cluster can have multiple Tenants and associated Domains that can be secured and administered separately as a kind of virtual cluster. Domains in turn contain buckets that contain Named objects, making Caringo Swarm effectively a superset of AWS S3 in that regard. Multi-tenancy is essential to those use cases where different user populations need to be served in fundamentally independent ways.
So what’s in a name? Well…as you can see, a lot. But at Caringo, we work tirelessly to make sure that you don’t need to worry about the mechanics. We provide all of this functionality through a RESTful interface supported by SDKs in all major languages. We also have a team of solutions and integration experts ready to answer your questions. Want to learn more? Then join us for our ‘Object Storage for Developers’ webinar on March 16.