How Do I Use Metadata with Object-based Data Storage?

Take a deep dive into metadata and how it can be used to unlock the intelligence potential that resides in large data storage repositories

 

Metadata is data about the data, but how does it work with object storage? Let’s take a deep dive into metadata after a quick review of the basics of object storage.

Overview:
  1. Object Storage Basics
  2. REST Content API
  3. What is Metadata?
  4. Metadata Effect on Storage Objects
  5. Metadata Requirements
  6. Metadata Example
  7. Metadata Enabling Search
  8. What is Object Versioning?
  9. Managing Object Versioning
  10. Data Storage Collections
  11. Elastic Stack & Kibana
  12. What is Elasticsearch?
  13. Conclusion

What are the Basics of Object Storage?

Object-based, software-defined data storage has become more popular in the last few years, and best-of-breed object storage products like Caringo Swarm share a number of consistent qualities and features. These include:

  • Objects are composed of metadata and data, and ideally this allows them to be self-describing and available for search.
  • Data protection is provided by replication and erasure coding.
  • High availability is built into the architecture and is active-active.
  • Namespaces are flat and vast, not hierarchical and constrained.
  • Access is through the HTTP(S) protocol, making it a natural fit for mobile and responsive applications as well as traditional enterprise application development.
  • Amazon S3 *is* Object Storage, and the S3 protocol is widely supported across object storage and application vendors.

Learn More About S3

If you would like to learn more about S3 API support in object-based data storage, watch our on-demand Tech Tuesday webinar, What Your Storage Vendor Isn’t Telling You About S3.


REST Content API for Applications

HTTP is one of the most widely used protocols in the world, and client libraries exist for virtually every mainstream programming language. The REST Content API offers a number of benefits for object storage, including:

  • Direct access through HTTP/1.1
  • High throughput and parallel processing
  • Request methods: POST, PUT, GET, HEAD, DELETE
  • Metadata-driven versioning


What is Metadata?

Let’s dive a bit deeper into metadata, a term first used in 1983 (according to the Merriam-Webster dictionary). The prefix “meta” is generally used with the name of an established scientific discipline to denote a more comprehensive study that transcends the scope of the original discipline.

  1. Metadata is data about the content data
  2. Metadata is information about the object
  3. Metadata is part of every data storage object
  4. “A love note to the future.” —Jason Scott, Archivist, Internet Archive


How does Metadata Affect Data Storage Objects?

So, why does it matter? Enabled by metadata, objects become playable, displayable, actionable and executable. Maybe most importantly, metadata makes objects findable through indexing and search. If you can’t find it, you don’t have it.


What are the Requirements for Using Metadata with Object Storage?

In the Caringo Swarm ecosystem, metadata can quickly be added to objects directly from the client. Metadata needs to be portable, so Caringo Swarm stores the metadata directly with the object, rather than in a separate database (a scaling impediment employed in some object storage solutions).

Annotating objects with metadata advances the way files can be searched, organized and analyzed at scale. Once on Swarm, data can be profiled with big data analysis tools (such as Kibana) and collections of files can be mounted based on the result of a metadata search.
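To make the idea of client-side annotation concrete, here is a minimal sketch that turns a dictionary of annotations into custom header names following the `x-*-meta` pattern that appears in the field list in this article. The `metadata_headers` helper, the restriction to simple alphanumeric names, and the sample values are assumptions for illustration only.

```python
import re

# Assumption for this sketch: only simple lower-case alphanumeric names.
_NAME = re.compile(r"^[a-z0-9]+$")


def metadata_headers(annotations):
    """Map {"color": "blue"} to {"x-color-meta": "blue"}."""
    headers = {}
    for name, value in annotations.items():
        key = name.strip().lower()
        if not _NAME.match(key):
            raise ValueError(f"unsupported metadata name: {name!r}")
        headers[f"x-{key}-meta"] = str(value)
    return headers


headers = metadata_headers({"color": "blue", "Project": "apollo"})
print(headers)
```

Because the annotations travel as headers on the same request that writes the object, they stay with the object rather than in a separate database, which is the portability point made above.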



Metadata Example in Swarm Object Storage HTTP Interface

The following lists the basic metadata fields, the full metadata fields and a sample GET request in the Swarm HTTP interface.

Basic Metadata Fields: tmBorn; content-length; name; content-type; etag
Full Metadata Fields: content-base; content-disposition; content-encoding; content-language; content-location; content-md5; lifepoint; x-*-meta[-*]
Request: GET http:///?format=xml&fields=name,content-length,x-color-meta
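The query string in the request above can be assembled programmatically. The sketch below reproduces only the query portion, since the host is not given in the example; the `listing_query` helper name is an assumption for illustration.

```python
from urllib.parse import urlencode


def listing_query(fields, fmt="xml"):
    """Build the query string that selects which metadata fields to return."""
    # safe="," keeps the commas between field names unescaped.
    return urlencode({"format": fmt, "fields": ",".join(fields)}, safe=",")


query = listing_query(["name", "content-length", "x-color-meta"])
print(query)  # format=xml&fields=name,content-length,x-color-meta
```

Requesting only the fields you need, including custom `x-*-meta` fields such as `x-color-meta`, keeps listing responses small when there are many objects.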

Swarm search enabled by metadata, actionable insight, dynamic organization, integrated search

Each Search Feed indexes metadata in Elasticsearch. Essentially, this helps you find a needle in a haystack, a very large haystack at that! The Swarm Search capabilities map one to one with S3 metadata and bring a number of benefits, including:

  • Actionable insight with targeted analysis
  • Dynamic organization of content using classification, key words and descriptive content, with multiple ways to track that content
  • Integrated search stack optimized within the storage system

Metadata-Driven Object Versioning: listing all historical versions in a bucket (request and response example)

What is Metadata-Driven Object Versioning?

The request and response example above shows how metadata is used in Swarm object versioning. However, you don’t need to understand the example line by line, or write any command-line code yourself, to use Swarm object versioning.


Object versioning and editing in Swarm object storage software web interface

How Do You Manage Object Versioning and Editing in Swarm Object Storage?

At Caringo, we have added a front-end user interface (UI) that you can use to control versioning via drop-down menus and fields, as shown in the illustration above. It allows you to log in, view previous versions, and delete or revert to them.


Search metadata data storage collections at scale in Swarm object storage web interface

Searching Data Storage Collections at Scale

Collections allow you to search at scale, as illustrated above. A collection essentially corresponds to the concept of a Smart Folder in your email client or on your laptop’s filesystem. You can persist a search to pull dynamic lists of objects that match its criteria. For example, you could generate a list of objects that have been run through an anti-virus check and a list of those that have not. Then you can process objects as needed to fit them into your workflow, move them to another bucket, and so on. This means you do not have to run operations on millions of objects individually; you can script downstream processing against the objects each list returns.
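The anti-virus example can be pictured with a small in-memory sketch. This is not the Swarm collections API; it only illustrates the idea of a saved set of metadata criteria that is re-evaluated against the current objects. The `StoredObject` class, the `x-scan-meta` field name, and the sample objects are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    name: str
    metadata: dict = field(default_factory=dict)


def matches(obj, criteria):
    """True when every criterion equals the object's metadata value."""
    return all(obj.metadata.get(k) == v for k, v in criteria.items())


def collection(objects, criteria):
    """Evaluate a saved set of criteria against the current objects."""
    return [o.name for o in objects if matches(o, criteria)]


objects = [
    StoredObject("a.pdf", {"x-scan-meta": "clean"}),
    StoredObject("b.pdf", {}),
    StoredObject("c.pdf", {"x-scan-meta": "infected"}),
]

scanned_clean = collection(objects, {"x-scan-meta": "clean"})
not_scanned = [o.name for o in objects if "x-scan-meta" not in o.metadata]

print(scanned_clean)  # ['a.pdf']
print(not_scanned)    # ['b.pdf']
```

A downstream script would then loop over `not_scanned` only, rather than touching every object in the store.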



Leveraging Elastic Stack with Kibana and Object-based Storage

Caringo Object Storage leverages Elastic Stack with Kibana. This gives you the ability to run your own reports, analyze objects at scale and easily customize dashboards. It is a key tool to use at runtime, for tasks such as intrusion detection and packet analysis. Because it builds on the metadata held in the object store, it comes with the high availability and data protection that are at the core of the object storage promise.

Mathematician Clive Humby said that “Data is the new oil.” Using the Elastic Stack helps you to unlock the value of that data.


Elasticsearch and Object Storage Integration

What is Elasticsearch and How Does it Work with Object Storage?

Elasticsearch is a distributed, RESTful search and analytics engine that can be used with object storage to enhance the effectiveness of metadata search operations. Caringo Swarm Object Storage integrates Elasticsearch and extends the Swarm API with commands for querying metadata. Through this feature, Swarm indexes object metadata in near real time and lets you perform ad hoc searches (via query commands) on the attributes and metadata of your stored objects.
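For readers who query the Elasticsearch index directly (for example from Kibana), a metadata search can be expressed in the Elasticsearch query DSL. The sketch below only builds the JSON body of such a search. How Swarm names metadata fields inside the index is not stated in this article, so the field name `x-color-meta` and the `metadata_query` helper are assumptions.

```python
import json


def metadata_query(field_name, value, size=10):
    """Build an Elasticsearch search body that filters on one exact value."""
    return {
        "size": size,
        "query": {
            "bool": {
                # A filter clause matches exactly and skips relevance scoring.
                "filter": [{"term": {field_name: value}}]
            }
        },
    }


body = metadata_query("x-color-meta", "blue")
print(json.dumps(body, indent=2))
```

A `term` filter suits exact-match metadata such as tags or status values; free-text descriptions would normally use a `match` query instead.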

View our webinar on demand to learn more about best practices for using Elasticsearch with object storage and for a demonstration of how Elasticsearch is used in Caringo Swarm Object-based Data Storage Software.

Conclusion

Metadata makes it possible to unlock information and trends that can transform your business. From scientific discovery to business intelligence, metadata is the key to unlocking the data that transforms the world we live in and the way organizations and businesses function.

If you want to unlock the intelligence potential that resides in your large data repositories, contact us today to set up a consultation and custom demo.