With the onslaught of data that organizations are experiencing in the cloud age, protecting all that data has never been more important. Storing your data and providing access to it may be the most visible thing that Swarm does, but as lead developer for Caringo’s object storage, I treat data protection as the primary concern for me and my team. Our founder, Jonathan Ring, uses the metaphor of Swarm as a ship that carries and protects your data while the river of hardware beneath it changes over time, whether through upgrades or eventual failure. This is the first of a three-part blog series on how Swarm protects our customers’ data.
It’s worth explaining the various data protection mechanisms in Swarm and how a Swarm cluster can serve as the unit of data protection even if parts of the overall system fail. Swarm is actively protecting your data even before there’s a failure. Our basic strategy is to make multiple replicas of your data so that we are never putting “all our eggs into one basket.” A central function of the Swarm health processor is to maintain the right number and placement of replicas of all objects in the cluster despite changing conditions. The cluster administrator decides how many replicas Swarm stores for an object. More replicas give higher protection at the expense of more space used.
Many customers choose three replicas, which protects all data in a cluster against two simultaneous disk losses at a cost of three times the space of the logical data stored. All replicas are completely equivalent, so there is no risk of some special replica being more vulnerable than any other. Those three replicas are placed in locations within the cluster that are unlikely to fail at the same time, and we make sure that all replicas are faithful copies by checking computed hashes during transfer.
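To make the replication trade-off concrete, here is a minimal Python sketch. It is not Swarm code: the function names are hypothetical, the loss-tolerance figure assumes each replica sits on a different disk, and SHA-256 is only a stand-in, since this post does not say which hash Swarm computes during transfer.

```python
import hashlib


def replication_profile(replicas: int, logical_bytes: int) -> dict:
    """Return fault tolerance and raw space cost for whole-object replication.

    Hypothetical helper for illustration; assumes one replica per disk.
    """
    if replicas < 1:
        raise ValueError("an object needs at least one replica")
    return {
        # The data survives as long as one replica remains.
        "tolerated_losses": replicas - 1,
        # Every replica is a full copy, so raw usage scales linearly.
        "raw_bytes": replicas * logical_bytes,
    }


def is_faithful_copy(sent: bytes, received: bytes) -> bool:
    """Compare digests of the sent and received bytes.

    SHA-256 is an assumption made for this sketch.
    """
    return hashlib.sha256(sent).hexdigest() == hashlib.sha256(received).hexdigest()


profile = replication_profile(replicas=3, logical_bytes=1_000_000)
print(profile)
print(is_faithful_copy(b"object body", b"object body"))
```

With three replicas, the sketch reports two tolerated losses and three times the logical size in raw bytes, matching the figures above.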
These mechanisms extend to erasure-coded objects, which may allow for high protection and a smaller data footprint at the expense of many parts (which we call segments) that must be assembled to reconstruct the object. Segments, too, must have independent failure modes and they are subject to similar health processor checks to make sure all the segments comprising an object are present and properly placed for optimal protection.
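The footprint difference can be sketched with a little arithmetic. The Python below is an illustration only: it assumes the common erasure-coding model in which an object is split into k data segments plus p parity segments, any k of which are enough to rebuild it. The 5 data plus 2 parity example and the function name are hypothetical and are not taken from Swarm’s configuration.

```python
from fractions import Fraction


def erasure_profile(data_segments: int, parity_segments: int) -> dict:
    """Return segment count, loss tolerance, and footprint for a k+p encoding.

    Hypothetical helper; assumes any k of the k+p segments can rebuild the object.
    """
    if data_segments < 1 or parity_segments < 0:
        raise ValueError("need at least one data segment and no negative parity")
    total = data_segments + parity_segments
    return {
        "segments": total,
        # Up to p segments can be lost before the object is unrecoverable.
        "tolerated_losses": parity_segments,
        # Raw space used relative to the logical object size.
        "footprint": Fraction(total, data_segments),
    }


ec = erasure_profile(data_segments=5, parity_segments=2)
print(ec["segments"], ec["tolerated_losses"], float(ec["footprint"]))
```

Under these assumptions, a 5+2 encoding tolerates two lost segments at 1.4 times the logical size, compared with 3 times for three full replicas, but the object is spread across seven segments that each need an independent failure mode.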
As an aside, a great feature of Swarm is that we can protect different objects with different encoding schemes and different levels of protection, all in the same Swarm cluster. This is a patented feature of Swarm that no other storage provider has. We even allow these protections to change over time, so you can protect your data more when it is more valuable and less when it is not.
There’s a lot to talk about when it comes to data protection with object storage, so watch for part 2 of this three-part series, where I’ll discuss how individual objects are protected during their lifetimes. In the meantime, if you have questions, feel free to email us at email@example.com.