Swarm Technology | Caringo
 

Data Integrity, Organization and Search at Scale

At Caringo, we work tirelessly to ensure that our technology differentiators and innovations solve the most pressing issues associated with aggressively scaling storage environments, from several terabytes to multiple petabytes.

Swarm Portal

 

Unified Management & Reporting

Swarm provides a unified web console where administrators and end users can track performance trends, run reports, and manage and monitor content usage quotas (integrated with existing metering APIs). Hierarchical data protection policies can be managed by collection, bucket, and other groupings, so your storage administrator can set permissions and policies and delegate control and management to business units or end users.

Infrastructure Management

Ensure your cluster is running smoothly with a detailed view of the services needed to run Swarm, as well as network, node, and feed status. Hierarchical data protection policies can be set through Swarm’s UI or via an API across clusters, domains, buckets, or files. Historical metrics provide trend analysis for capacity planning and monitoring.

Content

The Swarm portal makes it easy to upload, search, and view content. Add custom metadata, automatically capture file system attributes on ingest, and query across all stored files at once through a web-based UI or management API.
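
For illustration, here is a minimal sketch of uploading an object with custom metadata over HTTP using Python's requests library. The endpoint, storage domain, bucket, and metadata header names are assumptions for this sketch; check the Swarm documentation for the exact conventions used in your deployment.

```python
# A minimal sketch of uploading a file with custom metadata over HTTP.
# The endpoint, domain, bucket, and header names below are illustrative
# assumptions, not a definitive reference.
import requests

SWARM_ENDPOINT = "http://swarm.example.com"   # hypothetical cluster address
DOMAIN = "storage.example.com"                # hypothetical storage domain

with open("report.pdf", "rb") as f:
    resp = requests.put(
        f"{SWARM_ENDPOINT}/projects/report.pdf",   # bucket "projects" (assumed)
        data=f,
        headers={
            "Host": DOMAIN,
            "Content-Type": "application/pdf",
            # custom metadata travels with the object and is searchable later
            "x-example-meta-department": "finance",  # assumed header pattern
            "x-example-meta-retention": "7y",
        },
    )

resp.raise_for_status()
print("Stored with status", resp.status_code)
```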

Scale-out Storage Cluster

All Swarm nodes can handle all operations. An innovative algorithm and caching dramatically reduce the overhead and manual management associated with other architectures such as RING. There are no single points of failure such as controller nodes, databases, or management nodes.

Swarm’s unique architecture enables massively parallel interaction among individual nodes, automating capacity and load balancing. Swarm doesn’t rely on a file system or database that gets brittle with size. All operational and descriptive information is stored as metadata encapsulated with each object, meaning you can scale quickly to hundreds of petabytes and beyond.

Content is managed from creation to expiration via Lifepoints: administrator-defined policies, stored as metadata with each object, that automatically manage the number of replicas, the erasure-coding scheme, delete protection, and deletion.
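
A minimal sketch of what attaching Lifepoint policies at write time could look like, assuming an SCSP-style Lifepoint header; the header syntax, dates, endpoint, and bucket shown are illustrative rather than a definitive reference.

```python
# Sketch: write an object with Lifepoint policies attached as metadata.
# Header syntax and endpoint are assumptions for illustration.
import requests

headers = {
    "Host": "storage.example.com",  # hypothetical storage domain
    # Keep 3 replicas until the first date, then drop to 2 replicas,
    # and finally allow deletion once the last lifepoint is reached.
    "Lifepoint": (
        "[Wed, 01 Jan 2025 00:00:00 GMT] reps=3, "
        "[Thu, 01 Jan 2026 00:00:00 GMT] reps=2, "
        "[] delete"
    ),
}

with open("archive.tar", "rb") as f:
    resp = requests.put(
        "http://swarm.example.com/backups/archive.tar",  # assumed bucket/path
        data=f,
        headers=headers,
    )

print(resp.status_code)
```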

Use Darkive™ patented adaptive power-conservation technology to selectively spin down drives and power-step node CPUs, reducing power consumption by up to 70% in cold-archive use cases and drastically lowering TCO to levels comparable to tape-based archives.

White Paper: "Elastic Content Protection: Replication and Erasure Coding Explained"

Elastic Content Protection

Caringo Swarm’s Elastic Content Protection combines automated management of replication and erasure coding with continuous integrity checks and fast volume recovery. All nodes participate in the recovery of lost data through Swarm’s innovative distributed algorithm, which gets faster as the cluster grows.

Swarm is highly available by design and supports hot-plug drives, adding and retiring disks, and rolling upgrades of the full software stack, all without service downtime.

Erasure coding reduces footprint and increases data durability, while replication ensures rapid access. Choose the protection method that fits your business, retention, or SLA requirements. Set protection policies per object and store replicated and erasure-coded objects on the same servers, ensuring optimal use of hardware. Automatically shift between protection methods based on age, size, location, or type.
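
To see why shifting older content from replication to erasure coding matters, here is a back-of-the-envelope comparison of raw-capacity overhead. The 5:2 scheme is purely illustrative, not a recommended setting.

```python
# Compare raw-capacity overhead of replication versus erasure coding.
def overhead_factor(data_segments: int, parity_segments: int) -> float:
    """Raw bytes stored per byte of user data for a k:p erasure code."""
    return (data_segments + parity_segments) / data_segments

replication_3x = 3.0             # three full copies of every object
ec_5_2 = overhead_factor(5, 2)   # 5 data + 2 parity segments = 1.4x

print(f"3 replicas:     {replication_3x:.1f}x raw capacity")
print(f"EC 5:2:         {ec_5_2:.1f}x raw capacity")
print(f"capacity saved: {(1 - ec_5_2 / replication_3x):.0%}")
```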

In addition, Swarm automatically caches hot content for reliable delivery, regardless of access patterns.

Meet regulatory mandates that require content to be stored on non-erasable, non-rewritable media. You can also use Legal Hold to create a point-in-time snapshot of a specified set of files at a specified time. The files are then immutably stored regardless of what happens to the original file or cluster. Patented technology lets you prove in a court of law that content has not been tampered with. Integrity seals are based only on the content and can be upgraded as newer hashing algorithms replace outdated ones.

Search

Ad Hoc Search and Query

Perform ad hoc searches and queries on metadata and export the results in JSON or XML, or view results immediately from the Swarm portal.
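
As a sketch, a metadata query returning JSON might look like the following; the endpoint, query parameters, and field names are assumptions for illustration, so consult the Swarm search documentation for the exact argument names.

```python
# Sketch: ad hoc metadata query with JSON results.
# Endpoint, parameter names, and result fields are assumed for illustration.
import requests

resp = requests.get(
    "http://swarm.example.com/projects",      # assumed bucket listing endpoint
    headers={"Host": "storage.example.com"},  # hypothetical storage domain
    params={
        "format": "json",                     # XML could be requested instead
        "fields": "name,content-length,last-modified",
        "x-example-meta-department": "finance",  # assumed custom-metadata filter
    },
)

for item in resp.json():
    print(item.get("name"), item.get("content-length"))
```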

Dynamic Organization of Content

Collections are saved queries that run dynamically when viewed, streamlining the ability to view and identify relationships across diverse data sets.

Big Data

Swarm search results can be directly analyzed via Hadoop through SwarmFS for Hadoop or easily imported to analytics applications like Kibana.

Scalability

Swarm’s no-single-point-of-failure approach makes it simple to scale. Just rack servers, boot Swarm and store. All storage balancing is automated.

A mind-numbingly large namespace combined with Swarm’s architecture enables massively parallel scaling to hundreds of petabytes and hundreds of billions of files.

With the parallel nodes loosely coupled and no bottlenecking controller nodes or metadata databases to get in the way, just add the hardware of your choice for linear scaling of capacity and throughput.

Swarm supports single-site or multi-site deployments (with parallel replication) to cover a broad range of use cases and business requirements. Sites can have different hardware configurations and protection schemes.

Swarm provides consistent, rapid performance for small (bytes) or large (terabytes) files. For huge files, Swarm supports parallel uploading and can append to existing files without a complete rewrite. Swarm can extract every bit of value from standard, economical hardware, or you can choose higher-performance servers, solid-state drives, or any network topology to support performance-intensive workloads.
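
Because Swarm also speaks the S3 API (see below), one way to exercise parallel uploading of a huge file is a multipart transfer through an S3 client such as boto3. The endpoint URL, bucket name, and credentials here are placeholders.

```python
# Sketch: parallel multipart upload of a very large file through an
# S3-compatible endpoint using boto3; all names are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client(
    "s3",
    endpoint_url="http://s3.swarm.example.com",  # hypothetical S3 gateway
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # switch to multipart above 64 MiB
    multipart_chunksize=64 * 1024 * 1024,  # upload in 64 MiB parts
    max_concurrency=8,                     # parts uploaded in parallel threads
)

s3.upload_file("huge-video.mov", "media", "huge-video.mov", Config=config)
```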

Multitenancy, Orchestration, Security and Metering

Administrators can easily manage tenants and integrate with enterprise identity management solutions such as LDAP and Active Directory, in addition to token-based authentication schemes such as those used by Amazon S3.

Access a per-domain data feed of transactions and requests (storage requests, attempted logins, and so on) to create detailed billing and accounting reports for bandwidth usage, access audits, and API request summaries. Graphical reporting is also supported through the administrator portal.
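
As a sketch, a simple billing roll-up from such a feed might look like the following; the feed URL and the field names ("domain", "bytesOut") are assumptions for illustration, since your metering API may expose different names.

```python
# Sketch: summarize egress bandwidth per domain from a usage feed.
# Endpoint and field names are assumed for illustration.
from collections import defaultdict
import requests

feed = requests.get(
    "http://swarm.example.com/_admin/metering",  # hypothetical metering endpoint
    params={"format": "json"},
).json()

bandwidth = defaultdict(int)
for record in feed:
    bandwidth[record["domain"]] += record.get("bytesOut", 0)

for domain, total in sorted(bandwidth.items()):
    print(f"{domain}: {total / 1e9:.2f} GB egress")
```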

Lightweight tenant and domain creation and allocation enable easy storage management and flexible bucket naming. Access and deliver content using your corporate domain name. Virtual hosting of buckets is also supported through the S3 API.

Optimize the use of your resources based on the value of your information with continuous protection, using replication or erasure coding at a single site or at multiple sites. We focus on fast, proactive recovery so your files will be there when you need them, and we provide AES-256 encryption at rest. Compliance features such as WORM, integrity seals, and legal hold come built in.

Swarm supports the Amazon S3 API through an extensible architecture that can be used to seamlessly support additional third-party APIs. A broad range of applications that currently support the Amazon S3 API work directly with Swarm.
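
For example, existing S3 tooling such as boto3 can typically be pointed at the cluster simply by overriding the endpoint URL; the endpoint, bucket, and credentials below are placeholders.

```python
# Sketch: use standard S3 tooling against Swarm's S3-compatible API by
# overriding the endpoint URL; all names here are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://s3.swarm.example.com",  # hypothetical Swarm S3 gateway
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.put_object(Bucket="projects", Key="notes.txt", Body=b"hello swarm")

for obj in s3.list_objects_v2(Bucket="projects").get("Contents", []):
    print(obj["Key"], obj["Size"])
```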

Use LDAP, Active Directory, and Linux PAM authentication for integration into existing corporate identity management systems. Swarm also supports token-based authentication for pre-validated access logins.
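
As an illustration of the token flow, the sketch below exchanges credentials for a token and reuses it on later requests. The token endpoint path, header name, and authorization scheme are hypothetical placeholders; LDAP, Active Directory, or PAM validation itself is configured server-side and needs no client-side code.

```python
# Sketch: token-based access, with hypothetical endpoint and header names.
import requests

# Exchange domain credentials for a token (hypothetical endpoint).
auth_resp = requests.post(
    "http://swarm.example.com/auth/token",        # hypothetical token endpoint
    headers={"Host": "storage.example.com"},
    auth=("alice", "secret"),                     # validated against LDAP/AD/PAM
)
token = auth_resp.headers.get("X-Example-Auth-Token", "")  # hypothetical header

# Present the token instead of credentials on later requests.
resp = requests.get(
    "http://swarm.example.com/projects/report.pdf",
    headers={
        "Host": "storage.example.com",
        "Authorization": f"Bearer {token}",        # hypothetical scheme
    },
)
print(resp.status_code)
```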