In part 2 of this series on data protection, I discussed how Swarm protects individual objects during their lifetimes. In this part, I describe how Swarm protects customer data through various catastrophic failures.
What about more catastrophic failures? If a chassis is lost, the most likely cause is a power supply or motherboard failure. In that case, the Swarm disks can simply be moved to another chassis with little effort, and no data will be lost. Active recovery may be counterproductive here, because it may fully re-replicate the contents of the chassis’ disks before the administrator can provision another chassis and move the disks, so suspending recovery may be the prudent choice. But even if all the disks are gone, active recovery will do the right thing and recover the contents of every disk that was on the chassis. You may recall from part 1 of this series that Swarm’s health processor prevents the co-location of replicas or segments of the same object on the same chassis. So even though multiple disks are involved, each affected object loses just a single replica or segment; there are simply more affected objects than in a single-disk failure. Given more recovery time, Swarm easily handles a chassis loss.
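To make the co-location rule concrete, here is a minimal sketch of chassis-aware placement. It is an illustration only, not Swarm’s actual health processor: the function names, the (disk, chassis) tuples, and the rotation scheme are all hypothetical, chosen to show why a chassis loss can cost any one object at most one replica.

```python
def place_replicas(object_id, disks, replicas):
    """Pick `replicas` disks for an object, each on a different chassis.

    disks: list of (disk_id, chassis_id) tuples (hypothetical layout).
    """
    chosen = []
    used_chassis = set()
    # Start at an object-dependent offset so objects spread over the disks.
    start = sum(object_id.encode()) % len(disks)
    for i in range(len(disks)):
        disk_id, chassis_id = disks[(start + i) % len(disks)]
        if chassis_id in used_chassis:
            continue  # never put two replicas of one object on one chassis
        chosen.append((disk_id, chassis_id))
        used_chassis.add(chassis_id)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct chassis for the requested replicas")


def surviving_replicas(placement, failed_chassis):
    """Return the disks that still hold a replica after one chassis fails."""
    return [disk for disk, chassis in placement if chassis != failed_chassis]


# Three chassis with two disks each (made-up identifiers).
DISKS = [("d0", "A"), ("d1", "A"), ("d2", "B"),
         ("d3", "B"), ("d4", "C"), ("d5", "C")]

placement = place_replicas("obj-1", DISKS, replicas=2)
for chassis in ("A", "B", "C"):
    # Whichever chassis fails, at least one replica of the object remains.
    print(chassis, surviving_replicas(placement, chassis))
```

Because no two replicas share a chassis, losing every disk in one chassis removes at most one copy of any object, and recovery can rebuild that copy from a survivor.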
Swarm allows a cluster administrator to define logical subclusters. This feature lets administrators tell Swarm about even larger units of failure, for example groups of nodes that share a power supply or a part of the network structure. When this feature is used, Swarm will spread replicas and EC segments across the subclusters so that data remains accessible and recoverable, even if there’s a subcluster outage or loss.
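The arithmetic behind subcluster protection can be sketched in a few lines. This is a simplified model, not Swarm’s placement logic: it assumes segments are spread as evenly as possible (round-robin) and that an erasure-coded object with k data and p parity segments can be rebuilt from any k of its k + p segments. The function names are hypothetical.

```python
def spread(segments, subclusters):
    """Assign segments round-robin; return {subcluster: segment count}."""
    counts = {name: 0 for name in subclusters}
    for i in range(segments):
        counts[subclusters[i % len(subclusters)]] += 1
    return counts


def survives_subcluster_loss(k, p, subclusters):
    """True if the object is still recoverable after losing any one subcluster.

    Assumes any k of the k + p segments are enough to rebuild the object.
    """
    counts = spread(k + p, subclusters)
    worst_loss = max(counts.values())  # segments held by the fullest subcluster
    return (k + p) - worst_loss >= k


three = ["sub1", "sub2", "sub3"]
print(survives_subcluster_loss(4, 2, three))  # 6 segments, 2 per subcluster
print(survives_subcluster_loss(5, 2, three))  # 7 segments, one subcluster holds 3
```

In this model the object survives only when no subcluster holds more than p segments, which is why the number of subclusters and the chosen encoding need to be considered together.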
You might think that’s the end of the story, but Swarm can even protect your data from the loss of an entire cluster! We allow the entire contents of one cluster to be replicated to another cluster for disaster recovery or other purposes. It takes just minutes to set up a replication feed that can, depending only on the network bandwidth between the clusters, maintain a near-perfect backup of the original cluster in a distant physical location.
Data protection is a core function of Swarm that leverages cluster resources to protect your data all the way from bit errors to natural disasters. You can choose how your data is protected with full knowledge of the protection and resource trade-offs, and make the best decision for your needs. You can even make different decisions for different types of data. If you are serious about protecting your data, check out Swarm 8, coming 02.16.16, with even more features to protect you from common user errors. Register today for our webinar to learn more.