
Storage Challenges: We Love Big Objects

Learn how Swarm stores, protects and provides access to data sets that are too cumbersome for traditional file systems.


Valentine’s Day is a good day to reflect on the people and things that you love. And one of the things the Caringo Engineering team loves most is a good challenge. One problematic aspect of big data is handling individual objects that are very large. Case in point: one of our customers stores memory dumps from supercomputer runs that are multiple terabytes each. Video streaming is another common application that can create files of arbitrarily large size.

Traditional file systems have problems with these applications because individual files are tied to specific disks, each of which has finite capacity. Swarm deals with large objects differently, internally breaking them up into pieces of manageable size and seamlessly assembling the pieces when they are needed. This means, for example, that a Swarm cluster can easily store objects that are much larger than any of the disks managed by the cluster. Swarm can even do this on streaming (HTTP chunked transfer encoded) writes without missing a beat. If you have lots of objects like this, it’s easy to just add more resources to a running cluster and the cluster will keep accepting more data, large or small.
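To make the streaming-write idea concrete, here is a minimal sketch in Python. The generator below produces the kind of body that HTTP chunked transfer encoding carries: fixed-size pieces read from a stream whose total length is unknown up front. The cluster URL and object name in the comment are hypothetical placeholders, not real Swarm endpoints.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk

def chunked_body(stream, chunk_size=CHUNK_SIZE):
    """Yield fixed-size chunks from a file-like stream of unknown
    total length -- the shape of data that HTTP chunked transfer
    encoding sends on the wire."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk

# With the `requests` library, passing a generator as the request body
# makes the client send Transfer-Encoding: chunked, so no Content-Length
# is needed before the write begins (URL below is a placeholder):
#
#   requests.post("http://swarm.example.com/bucket/bigobject",
#                 data=chunked_body(open("dump.bin", "rb")))
```

The key point is that neither the client nor the server needs to know the final object size before the write starts, which is what lets Swarm accept streams of arbitrary length.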

By default, Swarm supports writing objects up to 4 terabytes, a configurable limit that is large enough for most applications. With some simple settings changes, however, Swarm can accept objects of 64 terabytes and beyond. Swarm protects these large objects just as well as smaller ones, and with Swarm’s elastic content protection, users can find the right balance between data footprint and protection.
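A rough illustration of that footprint-versus-protection tradeoff, using generic protection schemes (the replica counts and erasure-coding parameters below are illustrative examples, not specific Swarm settings):

```python
def replication_footprint(size_tb, replicas):
    """Raw capacity consumed by storing `replicas` full copies."""
    return size_tb * replicas

def erasure_coding_footprint(size_tb, data_segments, parity_segments):
    """Raw capacity for a k+p erasure coding scheme: the object is cut
    into k data segments plus p parity segments of the same size."""
    return size_tb * (data_segments + parity_segments) / data_segments

# A 4 TB object stored with 3 replicas vs. a 10+2 erasure coding scheme:
print(replication_footprint(4, 3))         # 12.0 TB of raw capacity
print(erasure_coding_footprint(4, 10, 2))  # 4.8 TB of raw capacity
```

Both schemes here tolerate the loss of two disks, but the erasure-coded layout does it with far less raw capacity, which is why large objects usually favor erasure coding while small objects favor replication.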

Few people have the patience to write a single multi-terabyte object in one serial stream. Because of this, we developed a parallel upload capability that automatically breaks an object into parts and uploads those parts concurrently. Once the parts are in Swarm, a final Swarm operation stitches them together into the complete object. So a larger cluster not only stores more data; it also provides the resources for more parallelism during ingest.
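The split-upload-stitch flow can be sketched as follows, under simplified assumptions: `upload_part` is a stand-in for the real HTTP part upload (which would PUT the body and record the part’s identifier), and the final stitch step is modeled as concatenating the parts in part-number order.

```python
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 4 * 1024 * 1024  # 4 MiB parts; real uploads would use larger parts

def split_into_parts(data, part_size=PART_SIZE):
    """Break the object body into numbered parts."""
    return [(i, data[off:off + part_size])
            for i, off in enumerate(range(0, len(data), part_size))]

def upload_part(part):
    """Stand-in for an HTTP part upload: returns (part_number, body)
    so the completion step can reassemble parts in order."""
    number, body = part
    return number, body

def parallel_upload(data, workers=4):
    parts = split_into_parts(data)
    # upload all parts concurrently
    with ThreadPoolExecutor(max_workers=workers) as pool:
        uploaded = list(pool.map(upload_part, parts))
    # "stitch" step: assemble the parts in part-number order
    uploaded.sort(key=lambda p: p[0])
    return b"".join(body for _, body in uploaded)
```

Because each part is an independent request, adding nodes to the cluster adds endpoints that can absorb parts simultaneously, which is the sense in which a larger cluster speeds up ingest.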

Thanks to this and other features that support extremely large objects, the flexibility and capability of Swarm really shine for easily storing, protecting, and providing access to data sets that are too cumbersome for traditional file systems.

Want to fall in love with Swarm object storage? Download our complimentary Swarm Developer Edition 10TB Cloud.

Don Baker

About The Author

Don Baker is the lead developer for the Swarm product. He joined Caringo in 2010. Prior to Caringo, Don worked as a research scientist in Austin for over a decade. Don earned his PhD in Computer Science from Rice University.
