Frequently Asked Questions
Object storage software that gives you control over any volume, flow or size of unstructured information
What is Swarm object storage?
Swarm is a storage platform that helps you stay ahead of the ongoing challenges associated with rapidly growing unstructured data sets (documents, text files, audio, images, video, etc.). Offered as software or as appliances, Caringo Swarm includes built-in streaming, multi-site deployment, content management, security and protection. With Swarm object storage, content-driven organizations can now consolidate content stores, ensure data availability and enable secure file sharing and streaming directly from the storage layer.
What is CAStor?
CAStor was the original name of Caringo’s object storage software offering when it first launched in 2006. CAS refers to Content-Addressable Storage. Paul Carpentier, Caringo Co-Founder and Board Member, is known as the inventor of CAS. In May of 2014, due to the continued innovation by Caringo in the object storage space and enhancements in performance, content management, and search, we changed the name of the product to Swarm. However, we continued to use and increment the same version number to reflect the maturity and stability of the product.
What is Swarm architecture?
Swarm uses a parallel architecture. A bidding algorithm determines the most efficient location in the entire cluster either to store an object or to deliver it from. This is why Swarm outperforms solutions that rely on some form of caching layer. Caringo has published performance results for a large-scale, low-latency deployment that uses no caching layer.
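A toy model can illustrate the bidding idea. The sketch below is not Swarm's algorithm. The `Node` fields and the weighting inside the hypothetical `bid()` function are assumptions chosen only to show the mechanism: every node scores a request, and the lowest bidder wins it.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_fraction: float   # share of the node's capacity still free (0.0-1.0)
    load: float            # current busyness of the node (0.0-1.0)
    has_object: bool = False

def bid(node: Node, op: str) -> float:
    """Return the node's bid for a request; a lower bid means a better fit.

    The weighting is hypothetical and chosen only for illustration.
    """
    if op == "read":
        # Only nodes that hold the object can bid to serve a read.
        return node.load if node.has_object else float("inf")
    # For writes, favor nodes that are lightly loaded and have free space.
    return node.load + (1.0 - node.free_fraction)

def choose_node(nodes: list, op: str) -> Node:
    """Collect a bid from every node and hand the request to the lowest bidder."""
    winner = min(nodes, key=lambda n: bid(n, op))
    if bid(winner, op) == float("inf"):
        raise LookupError("no node is able to serve this request")
    return winner

nodes = [
    Node("a", free_fraction=0.2, load=0.1),
    Node("b", free_fraction=0.9, load=0.3),
    Node("c", free_fraction=0.5, load=0.05, has_object=True),
]
print(choose_node(nodes, "write").name)  # b
print(choose_node(nodes, "read").name)   # c
```

No request is routed through a central cache or controller in this model. Whichever node bids best handles the request directly.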
What are the benefits of Swarm object storage?
Swarm’s rapid recovery delivers up to 25X less downtime during data recovery than traditional RAID storage. All software runs from RAM, so the disks are dedicated entirely to content, resulting in an industry-leading 95% storage utilization for media and content. For S3 read and write operations, Swarm also provides up to 5X faster throughput than competitors without relying on expensive SSDs.
Is it easy to set up Swarm object storage?
The Virtual Machine-based versions of Swarm can be up and running in minutes. For larger, bare-metal or appliance-based deployments, the most time-consuming steps in Swarm deployment are racking and stacking hardware and ensuring proper network configurations. Once a Swarm cluster is deployed, new hardware for capacity or performance can be added in less than 90 seconds.
What is object storage?
From a non-technical perspective, the easiest way to understand object storage is to think of how a valet parking system works. When you hand over your car to the valet to have it parked, you are issued a ticket with a specific number on it. To retrieve the car, you simply provide the ticket/number to the valet. You don’t need to know or care about where the car is parked or if it has been moved, so long as your car is returned in the same condition.
Object storage works the same way, but with digital content. When a file is stored, the system returns a key, a universally unique identifier (UUID), to the application so it can access the file when needed. To retrieve the file later, the application passes the key back to the object storage system. The stored file has no file hierarchy, folder name or disk location associated with it, and if the file is moved within storage, the key never changes. Files are free to move system-wide, so system management and optimization can be highly automated. This is why object storage scales easily in both file count and capacity.
Learn more about object-based storage vs file-based storage.
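Here is a toy, in-memory sketch of the valet-ticket idea. It assumes a hypothetical `ToyObjectStore` class. `put()` returns a UUID. `get()` finds the object by asking every node instead of consulting a central location index. `move()` relocates the object without changing its key.

```python
import uuid

class ToyObjectStore:
    """Flat namespace: the key says nothing about where the object lives."""

    def __init__(self, node_names):
        # node name -> {key: object bytes}
        self.nodes = {name: {} for name in node_names}

    def put(self, data: bytes) -> str:
        key = str(uuid.uuid4())  # the "valet ticket" handed back to the app
        # Place the object on whichever node currently holds the fewest objects.
        target = min(self.nodes, key=lambda n: len(self.nodes[n]))
        self.nodes[target][key] = data
        return key

    def _locate(self, key: str) -> str:
        # Ask every node whether it has the key; there is no location table.
        for name, objects in self.nodes.items():
            if key in objects:
                return name
        raise KeyError(key)

    def get(self, key: str) -> bytes:
        return self.nodes[self._locate(key)][key]

    def move(self, key: str, target: str) -> None:
        # Relocating an object never changes its key.
        source = self._locate(key)
        self.nodes[target][key] = self.nodes[source].pop(key)

store = ToyObjectStore(["n1", "n2"])
ticket = store.put(b"quarterly report")
store.move(ticket, "n2")
print(store.get(ticket))  # b'quarterly report'
```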
Where should UUIDs be stored in object storage?
UUIDs or user-defined object names can be stored in applications, in documents or in a database. There are no restrictions as to where or how the UUIDs or user-defined object names are stored.
What is pure object storage?
Pure object storage does not have a file system, metadata database or any single points of failure. Pure object storage is the only way to provide uncomplicated, massive scale since all bottlenecks are removed and complexity is automatically managed by the software as the system grows. Caringo Swarm has been and always will be a pure object storage technology.
As with any technology, however, there are different ways to implement technical concepts. Some object storage solutions are built on top of file systems or employ single points of failure like controller nodes, management nodes or metadata databases; these object storage products would not be considered “pure.”
What is Elastic Content Protection (ECP)?
Elastic Content Protection (ECP) provides the storage industry’s most comprehensive data protection functionality and enables you to match the right storage durability and service level agreement (SLA) to individual users or applications. You can optimize data center footprint by using erasure coding and/or enhance data accessibility and performance utilizing replication and data distribution. ECP is the only data protection scheme to provide the choice and movement between replication and erasure coding, simultaneously available on the same node with automated migration from one protection scheme to the other. The result is a storage solution that dynamically adapts storage capacity utilization and object count based on your business, retention or accessibility requirements.
What is erasure coding?
Erasure coding provides enterprise-grade protection with a smaller storage footprint. It is ideal for moderately sized files (1 MB+) and large content stores.
As file sizes and data sets grow, operational resources such as data center space, power, cooling and hardware become harder to manage, and copy-based protection schemes become less efficient. Erasure coding addresses this by breaking a file into multiple data segments and computing parity segments. The resulting set of segments uses less capacity and fewer operational resources than an additional full copy while still providing enterprise-grade protection.
A file is split into k data segments, and p additional parity segments are computed from the content of the data segments. This results in m total segments (k + p = m), which are distributed to m different Swarm nodes or sub-clusters. The file can then be rebuilt even if up to p of those segments are lost.
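The following minimal sketch makes the k + p = m arithmetic concrete using the simplest possible code, a single XOR parity segment (p = 1). This technique is swapped in for illustration only. Production erasure coding typically uses codes such as Reed-Solomon that support several parity segments, whereas a single-parity layout like this one survives the loss of only one segment. The functions `encode()` and `decode()` are hypothetical.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list:
    """Split data into k equal data segments plus one XOR parity segment (p = 1)."""
    seg_len = -(-len(data) // k)                # ceiling division
    padded = data.ljust(seg_len * k, b"\0")     # pad so all segments are equal
    segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    parity = reduce(xor_bytes, segments)
    return segments + [parity]                  # m = k + 1 segments

def decode(segments: list, k: int, length: int) -> bytes:
    """Rebuild the original data when at most one segment is missing (None)."""
    missing = [i for i, s in enumerate(segments) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost segment")
    if missing:
        # XOR of all m segments is zero, so the lost one is the XOR of the rest.
        present = [s for s in segments if s is not None]
        segments = list(segments)
        segments[missing[0]] = reduce(xor_bytes, present)
    return b"".join(segments[:k])[:length]

data = b"unstructured data example"
segments = encode(data, k=4)   # 5 segments, one per node
segments[1] = None             # one node is lost
print(decode(segments, k=4, length=len(data)) == data)  # True
```

Here the five segments occupy 1.25 times the original size. By comparison, keeping one extra full copy would double it.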
What is storage replication?
Storage replication provides enterprise-grade protection with faster access for both small and large files. It is ideal for content delivery and for small files (KBs to MBs).
Fast response depends on rapid access to an entire file, so replicating whole copies gives faster response times than splitting a file into segments.
Replication is copy-based data protection: complete copies of a piece of content are made and distributed across nodes or sub-clusters. Each replica is stored contiguously on disk, so once its location is identified, content can be delivered rapidly and efficiently. There is no need to first reassemble (rehydrate) it from segments.
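Below is a rough sketch of whole-copy placement, together with the capacity trade-off against erasure coding. The placement rule in the hypothetical `place_replicas()` is an assumption for illustration only. It picks a hash-derived starting node and then takes consecutive nodes. The capacity helpers simply apply the definitions given above.

```python
import hashlib

def place_replicas(nodes: list, key: str, reps: int) -> list:
    """Pick `reps` distinct nodes, each to hold one complete copy of an object.

    Hypothetical placement: start at a position derived from the key's hash
    and walk around the node list, so no two copies share a node.
    """
    if reps > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(reps)]

def raw_bytes_replicated(size: int, reps: int) -> int:
    # Every replica is a full copy of the object.
    return size * reps

def raw_bytes_erasure_coded(size: int, k: int, p: int) -> float:
    # k data segments plus p parity segments, each about size / k bytes
    # (padding of the last segment is ignored here).
    return size * (k + p) / k

print(place_replicas(["a", "b", "c", "d"], "video-123", 2))
print(raw_bytes_replicated(10, 2))        # 20
print(raw_bytes_erasure_coded(10, 5, 2))  # 14.0
```

With these numbers, two replicas of a 10 MB object consume 20 MB and tolerate one lost copy. A 5 + 2 erasure-coded layout consumes about 14 MB and tolerates two lost segments, but a read must gather segments from several nodes.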
How does Swarm assure data integrity?
Swarm employs a hash algorithm that computes a digest, sometimes called a digital fingerprint, from the bit sequence of each content object (file). Swarm’s Health Processor runs in the background and uses the digest to continuously check each object’s integrity. If an object is found to be corrupt, a new replica is generated from another, uncorrupted replica stored in the system. This ensures that the correct number of clean replicas is always available and accessible in Swarm.
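This is a minimal sketch of the check-and-repair step described above. It assumes SHA-256 as the digest and a plain dictionary mapping node names to replica bytes. `health_check()` is a hypothetical name, not part of any Swarm API.

```python
import hashlib

def digest(data: bytes) -> str:
    # SHA-256 is an assumption; the FAQ does not name Swarm's algorithm.
    return hashlib.sha256(data).hexdigest()

def health_check(replicas: dict, expected: str) -> dict:
    """Return a repaired copy of `replicas` (node -> bytes).

    Any replica whose digest no longer matches `expected` is replaced
    with a copy of a replica that still verifies.
    """
    clean = [data for data in replicas.values() if digest(data) == expected]
    if not clean:
        raise RuntimeError("no uncorrupted replica left to repair from")
    return {
        node: data if digest(data) == expected else clean[0]
        for node, data in replicas.items()
    }

original = b"customer-contract-v1"
seal = digest(original)
replicas = {"node1": original, "node2": b"customer-contract-v?"}  # bit rot on node2
repaired = health_check(replicas, seal)
print(repaired["node2"] == original)  # True
```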
What is a Content Integrity Seal?
The hash digest also serves as the Content Integrity Seal. The seal proves that a content object is an authentic original, for compliance and evidentiary purposes, and it does so in an open, customer-auditable data structure. Swarm separates the content address (UUID) from the digital fingerprint (digest), so the hash algorithm can be seamlessly upgraded if the original algorithm is compromised. This has already happened with both MD5 and SHA-1, and Swarm’s patented, transparently upgradeable hash assures the long-term integrity of content.
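The sketch below shows the key idea: the seal records which algorithm produced it and is stored apart from the object's identifier. The `Seal` record, `upgrade_seal()` and the example identifier are hypothetical. The code uses only Python's standard `hashlib`.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Seal:
    algorithm: str  # recorded alongside the value so it can be changed later
    value: str

def make_seal(data: bytes, algorithm: str = "sha256") -> Seal:
    return Seal(algorithm, hashlib.new(algorithm, data).hexdigest())

def verify(data: bytes, seal: Seal) -> bool:
    return hashlib.new(seal.algorithm, data).hexdigest() == seal.value

def upgrade_seal(data: bytes, old: Seal, new_algorithm: str = "sha512") -> Seal:
    """Re-seal content with a stronger hash, only if it still matches the old seal."""
    if not verify(data, old):
        raise ValueError("content does not match its existing seal")
    return make_seal(data, new_algorithm)

# The identifier is independent of the digest, so it survives the upgrade.
object_id = "example-uuid"  # hypothetical identifier
content = b"signed invoice #1001"
store = {object_id: (content, make_seal(content, "sha1"))}

data, old_seal = store[object_id]
store[object_id] = (data, upgrade_seal(data, old_seal))
print(store[object_id][1].algorithm)  # sha512
```

Only the seal is replaced during the upgrade. The identifier applications use to find the object does not change.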
Does Swarm object storage provide WORM storage?
Yes, Swarm can provide WORM (write once, read many) storage if this is specified when content is written. Once specified, WORM content can never be deleted. Swarm can also manage content lifecycle information automatically. You can specify that a file cannot be changed throughout its defined life and cannot be deleted until its retention period has expired. This addresses regulatory mandates such as SEC 17a-4, the most stringent regulatory requirement defined for data storage.
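Here is a sketch of the delete decision these rules imply. It assumes a hypothetical `Retention` record with a `worm` flag and an optional `retain_until` timestamp. These names are illustrative and not Swarm's own.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Retention:
    worm: bool = False                       # never deletable once written
    retain_until: Optional[datetime] = None  # deletable only after this instant

def can_delete(r: Retention, now: Optional[datetime] = None) -> bool:
    now = now or datetime.now(timezone.utc)
    if r.worm:
        return False
    if r.retain_until is not None and now < r.retain_until:
        return False
    return True

policy = Retention(retain_until=datetime(2030, 1, 1, tzinfo=timezone.utc))
print(can_delete(policy, datetime(2029, 6, 1, tzinfo=timezone.utc)))  # False
print(can_delete(policy, datetime(2030, 6, 1, tzinfo=timezone.utc)))  # True
print(can_delete(Retention(worm=True)))                                # False
```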
Does Swarm object storage support Legal Hold?
Yes, Legal Hold is a built-in feature of Swarm object storage. Legal Hold creates a point-in-time snapshot of a specified set of objects. The snapshot is stored immutably regardless of what later happens to the original objects or the cluster, and it satisfies SEC 17a-4(f).
What are the requirements to scale and upgrade a Swarm object storage cluster?
Capacity can be increased in a Swarm cluster dynamically while the system is running. Simply add a new node to the cluster and the available capacity is automatically added to the available pool without the need to provision or configure the new storage. Upgrading the server hardware for Swarm nodes is similarly easy. Boot the new, updated server(s) into the cluster then gracefully retire the older node(s) to be removed. All the content on the server node being retired is replicated to other nodes in the cluster and when completed its disks are wiped clean and it can be removed. All this is done while the Swarm cluster is operational and without impact to applications or data availability. We call this “hot scaling.”
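The add-and-retire cycle can be modeled with a dictionary mapping node names to stored objects. This sketch is hypothetical. Its placement rule, which picks the least-full remaining node that does not already hold a copy, is an assumption. A real cluster copies the data over the network while it continues to serve requests.

```python
def add_node(cluster: dict, name: str) -> None:
    """New capacity joins the pool immediately; nothing to provision."""
    cluster[name] = {}

def retire_node(cluster: dict, retiring: str) -> None:
    """Copy every object off `retiring`, then wipe the node and drop it."""
    remaining = [n for n in cluster if n != retiring]
    if not remaining:
        raise ValueError("cannot retire the last node")
    for key, data in list(cluster[retiring].items()):
        # Prefer nodes that do not already hold a copy, to keep the replica count.
        candidates = [n for n in remaining if key not in cluster[n]] or remaining
        target = min(candidates, key=lambda n: len(cluster[n]))
        cluster[target][key] = data
    cluster[retiring].clear()  # stands in for wiping the retired node's disks
    del cluster[retiring]

cluster = {"old": {"k1": b"a", "k2": b"b"}, "n1": {"k1": b"a"}}
add_node(cluster, "n2")      # boot the new server into the cluster
retire_node(cluster, "old")  # then gracefully retire the old one
print(sorted(cluster))       # ['n1', 'n2']
```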
What is the level of effort to administer Swarm object storage?
Swarm is easy to administer. It eliminates the need to provision or configure storage when new capacity is added. Its self-healing characteristics allow the Swarm cluster to seamlessly recover from a failed node or disk without impacting data availability. If a node goes down, the cluster immediately recognizes its loss and the rest of the cluster works together to replicate all the content on the impaired node. This occurs without administrator intervention or impact to applications and data availability. The Swarm cluster is also self-balancing; that is, it will automatically balance stored content evenly across nodes in the cluster for optimal performance and to eliminate hot spots. All of these actions require minimal administrative overhead, and a Swarm cluster can be managed from a central browser interface. A single system administrator can easily manage a 20+ PB Swarm cluster.
What is Darkive™?
Darkive is a Caringo patented adaptive power conservation technology that monitors storage system operations and automatically spins down disks and reduces CPU utilization.
How do you integrate an application with Swarm object storage?
Applications integrate with Swarm using the S3 protocol or NFS. For direct integrations that require functionality beyond S3 or NFS, Swarm uses a simplified subset of the HTTP/1.1 standard called Simple Content Storage Protocol (SCSP). SCSP is an on-the-wire protocol, so it will never be outdated and will never require porting. Essentially, there is no proprietary API, and any application or web service can be interfaced to Swarm in a matter of hours.
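To give a feel for SCSP's plain-HTTP style, the sketch below uses Python's standard `urllib.request` to build requests without sending them. The host name is a placeholder. Mapping POST, GET, HEAD and DELETE onto write, read, info and delete is our assumption about SCSP. Check the SCSP reference for exact paths, query arguments and response formats before relying on it.

```python
from urllib.request import Request

SWARM = "http://swarm.example.com"  # hypothetical cluster address

def scsp_write(data: bytes, content_type: str = "application/octet-stream") -> Request:
    # Store a new object; the response is assumed to carry its UUID.
    return Request(SWARM + "/", data=data, method="POST",
                   headers={"Content-Type": content_type})

def scsp_read(uuid: str) -> Request:
    return Request(f"{SWARM}/{uuid}", method="GET")

def scsp_info(uuid: str) -> Request:
    # HEAD returns the object's headers (its metadata) without the body.
    return Request(f"{SWARM}/{uuid}", method="HEAD")

def scsp_delete(uuid: str) -> Request:
    return Request(f"{SWARM}/{uuid}", method="DELETE")

req = scsp_read("example-uuid")
print(req.get_method(), req.full_url)  # GET http://swarm.example.com/example-uuid
```

Passing one of these requests to `urllib.request.urlopen()` would perform the call. Any HTTP client or tool could issue the same requests, which is the point of an on-the-wire protocol.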
What are the differences between block storage and object storage?
Unlike file systems that ride on top of block storage devices, Swarm object storage provides a single, flat address space to store content. Information about the file in Swarm is stored with the object using metadata (that is, data about the data). In file-based storage, information about the file is stored in the file system while the file itself is fused to a specific hardware location through file hierarchies, folder names and physical disk location (inodes and blocks). Swarm stores files as whole objects or object segments in contiguous disk space and only needs to manage a single UUID for each piece of content. This approach virtualizes information from the hardware layer, enabling the movement of objects throughout the storage system and the continuous evolution of hardware while maintaining data integrity and durability. For a more detailed comparison between block storage and object storage, read the blog What’s the Difference Between Block, File and Object-based Data Storage?.
Does Swarm support custom metadata?
Yes, Swarm allows custom metadata to be defined by applications to uniquely describe content objects. Swarm stores all metadata with actual content and it is persisted through its lifecycle. Other metadata elements include number of replicas to be maintained, erasure coding scheme, retention period, content type, file name and originating application. Swarm also supports a special metadata element called the Lifepoint™. To learn more watch how to use metadata with object storage.
What are Lifepoints?
Lifepoints are system-enforced and managed content lifecycle policies. Swarm stores all operational and descriptive information needed to execute these policies with the object itself – eliminating the need for a vulnerable metadata database and associated database administration.
The Health Processor continuously runs in the background enforcing Lifepoints and ensuring optimal use of available resources system wide. To learn more about the Health Processor and Lifepoints, read the Protecting Data with Caringo Object Storage whitepaper.
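Below is a sketch of how a policy engine might pick the Lifepoint currently in force. The `Lifepoint` fields (`end`, `reps`, `deletable`) and `current_policy()` are hypothetical names chosen for illustration. Because the policy is stored with the object, any node can evaluate it without consulting a central database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Lifepoint:
    end: Optional[datetime]  # policy applies until this instant; None = open-ended
    reps: int                # replicas to maintain while the policy is in force
    deletable: bool          # whether a delete request is honored

def current_policy(lifepoints: list, now: datetime) -> Optional[Lifepoint]:
    """Return the earliest-ending lifepoint that has not yet expired."""
    ordered = sorted(lifepoints, key=lambda lp: (lp.end is None, lp.end or datetime.max))
    for lp in ordered:
        if lp.end is None or now < lp.end:
            return lp
    return None  # every lifepoint has expired

# Keep 3 undeletable replicas until 2030, then 2 deletable replicas indefinitely.
policies = [
    Lifepoint(end=None, reps=2, deletable=True),
    Lifepoint(end=datetime(2030, 1, 1), reps=3, deletable=False),
]
print(current_policy(policies, datetime(2025, 1, 1)).reps)  # 3
print(current_policy(policies, datetime(2031, 1, 1)).reps)  # 2
```

A background process like the Health Processor could run this check on each object it visits. It would then add or trim replicas to match `reps`.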
Can Swarm object storage store encrypted data?
Yes, Swarm supports full volume AES-256 encryption. Swarm will ensure that the information you store is exactly the same as when you stored it and has not been tampered with.
Does Swarm object storage provide index and searching?
Can you run Swarm object storage as a Virtual Machine?
Yes, Swarm object storage can run as a VM and has been tested in a VMware ESXi environment.