Elasticsearch & Object Storage: Petabyte-Scale Search Solved

As Caringo Swarm Object Storage has evolved, we have continuously added smart functionality that brings value to our customers (check out our Smarts of the Swarm whitepaper). Among the most helpful for our customers is Elasticsearch, a distributed, RESTful search and analytics engine that can be used with object storage to enhance the effectiveness of metadata searching operations. Elasticsearch provides you with the ability to see what you actually have stored in your object storage and to search for streams based on metadata.

We started using Elasticsearch in its infancy (version 0.90) because we cared about distributed computing and wanted to solve a problem with listing the streams in your Swarm cluster (Swarm had a Content Router for enumerating streams, but it was not real time and had no query capability). We evaluated a number of solutions, including Solr, Elasticsearch, and traditional SQL databases (which could not scale far enough). Elasticsearch was the most promising, and it passed our rigorous testing standards for speed of writes and searches. It also included an extensive API for diagnosing issues in an Elasticsearch cluster. Validating our choice, Elasticsearch has grown in awareness and reach among system administrators far beyond our original expectations.

In Swarm version 8, we improved Swarm feeds to allow multiple feeds. This lets us handle an Elasticsearch schema change without downtime: Swarm updates both the current Elasticsearch index and a new one, and you can switch to the new one once indexing is complete. In Swarm version 9, we updated to Elasticsearch 2, a fully integrated analytics engine designed for horizontal scalability and open access to Swarm’s operational metrics, metering, and metadata.

Elasticsearch can also be used with third-party big data tools, like Kibana and Hadoop. Swarm integrates Elasticsearch and extends the Swarm API with commands for querying Swarm objects in terms of their metadata. (Learn more by watching The Power of Metadata webinar.) Through this feature, Swarm indexes object metadata in near real time and lets you perform ad hoc searches on the attributes and metadata of stored objects. Our Content Portal also gives users an easy-to-use web UI, with Collections, for running metadata queries through Elasticsearch.
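
As a rough illustration of what an ad hoc metadata search looks like, the sketch below builds a plain Elasticsearch `_search` request in Python. It addresses Elasticsearch directly rather than going through the Swarm API, and the host, the index name (`swarm-index`), and the metadata field (`x_project_meta_name`) are hypothetical placeholders, not names Swarm is known to use.

```python
import json
from urllib.request import Request

# Hypothetical values for illustration only; substitute your own.
ES_URL = "http://localhost:9200"
INDEX = "swarm-index"


def build_metadata_search(field, value, size=10):
    """Build (but do not send) a _search request matching one metadata field."""
    body = {
        "query": {"match": {field: value}},  # standard Elasticsearch match query
        "size": size,                        # maximum number of hits to return
    }
    return Request(
        url=f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_metadata_search("x_project_meta_name", "apollo")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually run the query, pass the returned request to `urllib.request.urlopen` against a reachable cluster. Building the request separately from sending it keeps the example easy to inspect without a live Elasticsearch node.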

Swarm builds and maintains your search data (index) through your Search Feed, and it can regenerate the search index should it ever be lost. To minimize unavailability of listings, you can take a snapshot of your Elasticsearch index so that you can restore it for instant disaster recovery. You can even store this snapshot in a Swarm cluster using the Elasticsearch AWS Cloud plugin and Swarm Gateway S3.
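
The snapshot workflow above can be sketched with the standard Elasticsearch snapshot REST endpoints (`_snapshot`). This is a minimal sketch, not a documented Caringo procedure: the host, the repository name, the bucket, and the S3 endpoint are hypothetical, and the repository setting names (`bucket`, `endpoint`) should be checked against the plugin documentation for your Elasticsearch version.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # hypothetical Elasticsearch node
REPO = "swarm_backup"             # hypothetical snapshot repository name


def _json_request(method, path, body=None):
    """Build (but do not send) a JSON request against the cluster."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(
        f"{ES_URL}{path}",
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


def register_s3_repository(bucket, endpoint):
    """Register an S3-type repository; setting names are assumptions."""
    settings = {"bucket": bucket, "endpoint": endpoint}
    return _json_request(
        "PUT", f"/_snapshot/{REPO}", {"type": "s3", "settings": settings}
    )


def create_snapshot(name):
    """Snapshot the cluster's indices and wait until the snapshot finishes."""
    return _json_request(
        "PUT", f"/_snapshot/{REPO}/{name}?wait_for_completion=true"
    )


def restore_snapshot(name):
    """Restore the indices contained in a snapshot."""
    return _json_request("POST", f"/_snapshot/{REPO}/{name}/_restore")


if __name__ == "__main__":
    steps = (
        register_s3_repository("es-snapshots", "gateway.example.com"),
        create_snapshot("snap-1"),
        restore_snapshot("snap-1"),
    )
    for req in steps:
        print(req.get_method(), req.full_url)
```

Each function only builds a request; send it with `urllib.request.urlopen` once the values match your environment. Note that Elasticsearch will not restore over an index that is open, so an existing index with the same name generally has to be closed or deleted first.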

Want to learn more? Join me and Caringo VP of Product Tony Barbagallo as we take an in-depth look at best practices for using Elasticsearch with Caringo Swarm Object Storage.

Elasticsearch and Object Storage Webinar
Wed, 7/26 10am PT | 1pm ET
