As I head out to New York City for Strata + Hadoop World 2015, I can’t help but reflect on just how many customers we work with. From hospitals to research labs to global telecoms, we work with some of the largest repositories of data in the world. And overall capacity and file counts are skyrocketing!
Hadoop is well established for scale-out data analysis; however, one of the challenges of a growing footprint is providing efficient archiving and disaster recovery for the data that’s been analyzed. This is why we developed SwarmFS for Hadoop, a bridge between Swarm and the scale-out analysis enabled by Apache™ Hadoop®, Cloudera® or Hortonworks® Data Platform. Files on Swarm are available to MapReduce and other Hadoop processing jobs, or can be copied back to HDFS if needed.
So you’re probably thinking, “Hadoop already provides a storage infrastructure. Why do I need Swarm as well?” Let’s take a look at the benefits of using Swarm with Hadoop:
Reduced Footprint: By using erasure coding and Elastic Content Protection, you can achieve a 10x reduction in data protection overhead compared with the Hadoop Distributed File System (HDFS), which by default keeps three replicas of the data. This makes Swarm clusters ideal for HDFS backup, archive and disaster recovery.
Application Interoperability: Swarm provides secure, high-throughput data access to other users and applications through the Swarm REST API. An open source implementation of this API, the Swarm SDK, is available in Java, Python, C++ and C# .NET.
High Performance: Hadoop jobs benefit from Caringo Swarm’s symmetric architecture, which provides fully parallel, highly available data access for high-performance analyses. Every Swarm node effectively acts as both a name node and a data node.
Compliance: Retention requirements can be enforced with Write Once, Read Many (WORM) functionality, making Swarm well suited to financial and legal data.
Self Healing: Background processes run regular integrity checks, protecting data end to end throughout its lifecycle.
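To make the Reduced Footprint comparison above concrete, here is a rough sketch of the arithmetic. It assumes a hypothetical erasure coding scheme of 10 data segments plus 2 parity segments, which is an illustrative choice and not a figure stated in this post; the actual ratio depends on the scheme configured for a given cluster. The function names are likewise made up for the example.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Extra storage per unit of user data when keeping `copies` full replicas."""
    return Fraction(copies - 1, 1)


def erasure_coding_overhead(data_segments: int, parity_segments: int) -> Fraction:
    """Extra storage per unit of user data for a data+parity erasure coding scheme."""
    return Fraction(parity_segments, data_segments)


# HDFS default: three full replicas, so two extra copies (200% overhead).
hdfs = replication_overhead(3)

# ASSUMPTION: a 10+2 erasure coding scheme, chosen only for illustration.
ec = erasure_coding_overhead(10, 2)

print(f"Triple replication overhead: {float(hdfs):.0%}")
print(f"10+2 erasure coding overhead: {float(ec):.0%}")
print(f"Overhead reduction: {float(hdfs / ec):.0f}x")
```

Under that assumption, triple replication carries 200% overhead and the erasure coded layout carries 20%, which is one way to arrive at a 10x figure. A scheme with proportionally more parity would give a smaller ratio.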
For more information, check out our SwarmFS Product Brief. We will be at Strata + Hadoop World 2015 in NYC from September 29 to October 1 at booth 562 and would love to meet you! You can also email us at firstname.lastname@example.org. And don’t forget to follow us on LinkedIn, Twitter, Google+ and Facebook to keep up with our latest news as we continue to evolve our products to meet your data storage challenges!