Resilience & Recovery with Swarm Object Storage, part 1


Data resiliency has been a feature of Swarm from the beginning. The architecture of Swarm object storage has never had a single point of failure. One or more Swarm nodes may be down for maintenance or due to hardware failures and the rest of the Swarm cluster keeps running, servicing requests.

Because the data in Swarm remains available, users are unlikely to even notice the outage. And because throughput grows with cluster size, losing one or more nodes generally has only a small impact on performance.

Why is Swarm Object Storage Resilient?


One resiliency feature is that Swarm keeps multiple replicas of objects and does not co-locate replicas where a single failure is likely to take out more than one of them. It’s the old idea of “don’t put all your eggs in one basket.” Replica distribution is discussed in my 2016 blog series (Protecting Data in the Cloud Age with Object Storage) as well as in our Protecting Data with Caringo Swarm Object Storage whitepaper.

The level of protection is easily configured at both the cluster level and the object level. As a result, Swarm and the data it stores are resilient to failures at the level of the platter, the disk drive, the chassis, and even the logical subcluster. For subclusters, customers can configure Swarm with a cluster’s network topology and location information.
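
To make the “eggs in one basket” idea concrete, here is a minimal Python sketch of failure-domain-aware replica placement. It is not Swarm's actual placement algorithm; the `Volume` type, the `place_replicas` function, and the greedy scoring rule are all hypothetical names and choices made for illustration. The sketch simply prefers volumes that share the fewest subclusters and chassis with the replicas already placed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """A disk volume labeled with the failure domains it lives in (hypothetical model)."""
    subcluster: str
    chassis: str
    disk: str


def shared_domains(candidate, chosen):
    """Count already-chosen volumes that share a subcluster or a chassis with candidate."""
    same_subcluster = sum(1 for v in chosen if v.subcluster == candidate.subcluster)
    same_chassis = sum(
        1 for v in chosen
        if (v.subcluster, v.chassis) == (candidate.subcluster, candidate.chassis)
    )
    # Tuples compare left to right: avoid a shared subcluster first, then a shared chassis.
    return (same_subcluster, same_chassis)


def place_replicas(volumes, replicas):
    """Greedily pick up to `replicas` volumes, spreading them across failure domains."""
    chosen = []
    candidates = list(volumes)
    while len(chosen) < replicas and candidates:
        best = min(candidates, key=lambda v: shared_domains(v, chosen))
        chosen.append(best)
        candidates.remove(best)
    return chosen


# Illustrative cluster: two chassis in subcluster A, one chassis in subcluster B.
volumes = [
    Volume("A", "chassis-1", "disk-1"),
    Volume("A", "chassis-1", "disk-2"),
    Volume("A", "chassis-2", "disk-1"),
    Volume("B", "chassis-3", "disk-1"),
]
chosen = place_replicas(volumes, 3)
for v in chosen:
    print(v.subcluster, v.chassis, v.disk)
```

With three replicas requested, the sketch places each one on a different chassis and uses both subclusters, so no single disk or chassis failure removes more than one replica.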

How Replication and Distribution Increase Resiliency


Full replication and distribution of replicas (and erasure coded segments) are in place from the time of the initial object write. Swarm’s health processor is a background task that continuously checks for the presence and locations of replicas, so that a cluster’s data is always protected from all but the most catastrophic failures, such as a fire or flood. Various remote replication options provide data resiliency for those cases. You can learn more by reading the Elastic Content Protection Technical Overview paper.
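
As a rough illustration of what a background replica check involves, the Python sketch below scans a map of object locations and reports objects that have fewer live replicas than required. This is only a toy model under stated assumptions, not Swarm's health processor: `find_under_replicated`, the in-memory `replica_map`, and the volume names are hypothetical.

```python
def find_under_replicated(replica_map, live_volumes, required):
    """Return {object_id: missing_count} for objects short of `required` live replicas.

    replica_map maps each object id to the set of volumes believed to hold a replica.
    live_volumes is the set of volumes that are currently reachable.
    """
    deficits = {}
    for object_id, locations in replica_map.items():
        live_count = sum(1 for volume in locations if volume in live_volumes)
        missing = required - live_count
        if missing > 0:
            deficits[object_id] = missing
    return deficits


# Hypothetical state after volume "vol2" has failed.
replica_map = {
    "obj-a": {"vol1", "vol2"},
    "obj-b": {"vol2", "vol3"},
    "obj-c": {"vol1", "vol3"},
}
live_volumes = {"vol1", "vol3"}

deficits = find_under_replicated(replica_map, live_volumes, required=2)
print(deficits)
```

In this example, `obj-a` and `obj-b` each lost one replica with `vol2`, so each would need one new replica on a healthy volume, while `obj-c` is unaffected.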

Learn More about Resilience & Recovery of Swarm Object Storage


Data resiliency is an integral part of Swarm object storage software, and we are proud to have customers with clusters that have been operational for over 10 years, through all manner of hardware failures and upgrades. In next week’s blog post, I’ll discuss the data recovery features that come into play when a failure actually happens.

Register now for our August 20 Tech Tuesday webinar, Data Resilience & Recovery in Swarm Object Storage. It will feature T.W. Cook, VP Engineering, and John Bell, Sr. Consultant, and include live Q&A throughout the webcast.


Don Baker

About The Author

Don Baker is the lead developer for the Swarm product. He joined Caringo in 2010. Prior to Caringo, Don worked as a research scientist in Austin for over a decade. Don earned his PhD in Computer Science from Rice University.
