As we head into the heart of summer, I am reminded of how quickly time flies and how rapidly technology changes. With a mature product such as Swarm, we add new features regularly to keep up with the evolving needs of the marketplace and our customers. We also consistently review existing features to ensure they meet current needs and anticipate future ones.
Two short years ago, my colleague Jamshid “Jam” Afshar blogged on how Elasticsearch & Object Storage solves petabyte-scale search as he prepared to discuss the topic in a webinar. Next week, Jam will be joining me on our monthly Tech Tuesday webinar to discuss using Elasticsearch with Object Storage.
What is Elasticsearch and Why Should I Use it?
Elasticsearch is a distributed search and analytics engine with a RESTful API. Paired with object storage, it makes searching on object metadata faster and more flexible.
In Swarm Object Storage, Elasticsearch provides the ability to list and query objects based on their metadata. This is a key capability needed to bring structure to a large pool of unstructured data. (If you want to learn more about metadata with object storage, I highly recommend you watch our Using Metadata with Object Storage webinar.)
Why Does Swarm Object Storage Use Elasticsearch?
At Caringo, we were early adopters of Elasticsearch (going as far back as Elasticsearch version 0.90) because we needed a scalable solution to solve the problem of listing objects in a Swarm cluster. At the time, we evaluated NoSQL approaches including Solr, Elasticsearch and MongoDB in addition to traditional SQL database offerings (noting that traditional SQL databases lacked necessary scale and still do). We found that Elasticsearch was by far the most promising solution. Specifically, it passed our rigorous testing standards for speed of writes/updates and searches.
Additionally, Elasticsearch included an extensive API for managing and diagnosing a cluster. Fulfilling the promise that we saw in its infancy, the technology has grown in popularity and reach, with many large Elasticsearch deployments in production to date.
How Does Elasticsearch Provide Structure to “Big Data”?
Swarm Object Storage software is fully integrated with Elasticsearch. This is implemented in the form of a “search feed” which populates the Elasticsearch cluster with the metadata information present on the stored objects. This information is effectively cached in the Elasticsearch cluster for fast list and query operations.
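To make the search feed idea concrete, here is a toy, in-memory sketch in Python. It is not Swarm's implementation: `ToyObjectStore`, `run_feed`, `query`, and the sample metadata fields are all hypothetical names invented for illustration. It only shows the shape of the arrangement described above, where the store stays authoritative and a feed copies metadata into a separate index that answers queries.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical stand-in for the storage cluster (the authoritative copy)."""
    objects: dict = field(default_factory=dict)  # name -> (data, metadata)
    pending: list = field(default_factory=list)  # names changed since last feed run

    def put(self, name, data, metadata):
        self.objects[name] = (data, dict(metadata))
        self.pending.append(name)


def run_feed(store, index):
    """Copy metadata (never object data) for changed objects into the index."""
    fed = 0
    for name in store.pending:
        _, metadata = store.objects[name]
        index[name] = dict(metadata)
        fed += 1
    store.pending.clear()
    return fed


def query(index, **criteria):
    """Answer a metadata query from the index alone, without touching the store."""
    return sorted(
        name for name, metadata in index.items()
        if all(metadata.get(key) == value for key, value in criteria.items())
    )


store = ToyObjectStore()
index = {}  # plays the role of the search cluster
store.put("scan-001.dcm", b"...", {"modality": "MRI", "site": "austin"})
store.put("scan-002.dcm", b"...", {"modality": "CT", "site": "austin"})
run_feed(store, index)
print(query(index, modality="MRI"))
```

Because the index holds only a copy of the metadata, it can be rebuilt from the store at any time by feeding every object again; the trade-off is that a newly written object is not searchable until the feed has processed it.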
Furthermore, the Swarm API itself is extended to allow listing and querying Swarm objects in terms of their metadata. Because Swarm indexes object metadata in near real time, you can perform ad hoc searches on the metadata attributes of stored objects. With the Swarm Content Portal, we take things further by providing a web UI that lets you easily save frequently used queries as Collections. These Collections can be presented as virtual folders that always return the latest set of objects meeting the query's criteria.
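As a rough illustration of what a metadata query looks like on the Elasticsearch side, the Python sketch below builds a search body using Elasticsearch's standard `bool`/`filter`/`term` query DSL. The field names (`modality`, `site`), the `collections` dictionary, and both helper functions are assumptions made up for this example; the real index field names and the way the Content Portal stores Collections are not covered here. Exact `term` matching also assumes the fields are indexed as keywords rather than analyzed text.

```python
import json


def build_metadata_query(filters, size=100):
    """Build a search body that matches objects whose fields equal every filter."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter context: exact matches, no relevance scoring needed.
                "filter": [
                    {"term": {name: value}}
                    for name, value in sorted(filters.items())
                ]
            }
        },
    }


# Hypothetical model of a saved query: a name mapped to its filter criteria.
collections = {"austin-mri": {"modality": "MRI", "site": "austin"}}


def collection_query(name):
    """Rebuild the query each time, so results always reflect the current index."""
    return build_metadata_query(collections[name])


body = collection_query("austin-mri")
print(json.dumps(body, indent=2))
```

Storing the criteria rather than the results is what lets a saved query behave like a virtual folder: nothing is copied, and each listing simply runs the query again.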
Note that although Swarm Software is the ultimate authority for the metadata information of all objects stored, it’s still a best practice to take a snapshot of your Elasticsearch index. This allows for decreased time to recovery in the event of an unanticipated failure and allows you to quickly return list and query capability to clients and applications which depend on it.
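For reference, Elasticsearch exposes snapshots through its REST API as `PUT /_snapshot/<repository>/<snapshot>`. The Python sketch below only constructs such a request and does not send it. The host, the repository name `swarm_backups`, and the snapshot name `nightly-1` are placeholders, and the sketch assumes a snapshot repository has already been registered on the cluster.

```python
from urllib.parse import quote
from urllib.request import Request


def snapshot_request(base_url, repository, snapshot):
    """Build (but do not send) a request that creates a named snapshot."""
    url = "{}/_snapshot/{}/{}?wait_for_completion=true".format(
        base_url.rstrip("/"),
        quote(repository, safe=""),
        quote(snapshot, safe=""),
    )
    return Request(url, method="PUT")


# Placeholder values; substitute your own cluster address and names.
req = snapshot_request("http://localhost:9200", "swarm_backups", "nightly-1")
print(req.get_method(), req.full_url)
```

To actually take the snapshot, the request could be passed to `urllib.request.urlopen(req)` against a reachable cluster. With `wait_for_completion=true` the call blocks until the snapshot finishes, which is convenient in a scheduled job but may take a while on a large index.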
Ready to Learn More?
Register today for our July 16 webinar on Elasticsearch where Jam and I will:
- Explain what Elasticsearch is and the benefit of using it with object storage
- Take an in-depth look at best practices for using Elasticsearch with object storage
- Demonstrate the use of Elasticsearch with Caringo Swarm