A couple of weeks ago, I discussed RAID vs. advanced erasure coding in the context of protecting vital healthcare data. This week, I want to talk about some issues around the storage of object metadata.
One of the great advantages of object storage is that each object can have a collection of metadata that describes the object. Metadata may include a name for the object, a location, the format or type of the data, categories that it is a member of, etc. Some object storage systems store this metadata in SQL databases or, worse, in flat files in the underlying filesystem’s directories. Both are terrible architectural models. Why?
For starters, the database (or the set of flat files) becomes the bottleneck for the storage system because applications hit it with so many requests. The database or storage nodes bog down, which severely limits scalability. That’s why NoSQL is so popular in large-scale or big data applications: it scales so much better than traditional SQL or flat-file approaches.
As we all know, SQL databases require ongoing administration, and their schemas are rigid, stifling an organization’s ability to extend metadata with additional information that is relevant to the application. When flat files are used instead of a real database, additional problems crop up, such as rolling back a transaction, recovering from a data corruption event, or getting a reasonably sized answer set for a given query.
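To make the rigid-schema point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the columns, and the `modality` field are all made up for illustration; the only point is that a field the schema does not know about is rejected until someone runs a migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fixed schema for object metadata (names are illustrative).
conn.execute(
    "CREATE TABLE object_meta ("
    "object_id TEXT PRIMARY KEY, name TEXT, content_type TEXT)"
)

def insert_with_new_field(conn):
    """Try to record a metadata field ('modality') alongside the known ones."""
    try:
        conn.execute(
            "INSERT INTO object_meta (object_id, name, content_type, modality) "
            "VALUES (?, ?, ?, ?)",
            ("obj-1", "scan.dcm", "application/dicom", "MRI"),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the statement because the column does not exist.
        return False

rejected = not insert_with_new_field(conn)

# A schema change is required before the new field can be stored.
conn.execute("ALTER TABLE object_meta ADD COLUMN modality TEXT")
accepted = insert_with_new_field(conn)

print(rejected, accepted)
```

In a production database that `ALTER TABLE` step is the part that usually needs an administrator, a change window, and coordination with every application that touches the table.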
Lastly, and this is the one that makes me tremble in fear, what if a failure occurs? What if the storage of the objects somehow becomes out of sync with the metadata database because a database server or a switch or an application or something else failed or hiccuped? When the metadata database has to be updated every time an object is updated, it is the responsibility of the application to keep the metadata database in sync with the objects that are being stored. But what if the application gets confused or overwhelmed or crashes? And, honestly, do I have to administer the metadata database myself? Do I need to architect a proven, mission-critical metadata database infrastructure that can withstand drive failures or other kinds of failures and heal itself? Yikes!
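The failure mode is easy to reproduce in miniature. The sketch below is a toy model, not any real product's API: two plain dictionaries stand in for the object store and the separate metadata database, and a flag simulates the application crashing between the two writes.

```python
class CrashBetweenWrites(Exception):
    """Simulated application failure between the two writes."""

object_store = {}  # stands in for the object storage system
metadata_db = {}   # stands in for a separate metadata database

def put_object(object_id, data, metadata, crash_after_data=False):
    """Write the object, then its metadata, as two independent steps."""
    object_store[object_id] = data
    if crash_after_data:
        # The object is already stored, but its metadata never gets written.
        raise CrashBetweenWrites(object_id)
    metadata_db[object_id] = metadata

def find_orphans():
    """Return the IDs of stored objects that have no metadata entry."""
    return sorted(set(object_store) - set(metadata_db))

put_object("scan-001", b"image bytes", {"type": "MRI"})
try:
    put_object("scan-002", b"image bytes", {"type": "CT"}, crash_after_data=True)
except CrashBetweenWrites:
    pass

print(find_orphans())
```

After the simulated crash, `scan-002` exists in the object store but is invisible to any query against the metadata database. Somebody has to write and run the reconciliation job that finds it.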
Not to worry, as long as you are using Caringo storage systems. Swarm stores the metadata with the objects, so the two can never be torn apart or get out of sync. The metadata is easily extended to incorporate additional custom elements, and it is automatically protected by the same elastic content protection that Caringo Swarm uses to protect the objects. Applications are not responsible for keeping metadata in sync with the objects. They can ask any node in Swarm for an object’s metadata and get just the answer set they need, so there is no bottleneck on a centralized SQL database server. Metadata storage, like every aspect of Caringo, is simple, bulletproof and limitless.
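For contrast, here is the same kind of toy model with the metadata kept in the same record as the data. This is a sketch of the general idea only, not Swarm's actual interface: `StoredObject`, `Node`, and the shared `cluster` dictionary are hypothetical stand-ins, and the shared dictionary glosses over how real nodes locate and protect objects.

```python
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    """One record that carries both the data and its metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)

class Node:
    """Hypothetical storage node; every node sees the same cluster contents."""

    def __init__(self, cluster):
        self.cluster = cluster

    def put(self, object_id, data, **metadata):
        # A single write: there is no second store to fall out of sync with.
        self.cluster[object_id] = StoredObject(data, dict(metadata))

    def get_metadata(self, object_id):
        # Return a copy so callers cannot mutate the stored record.
        return dict(self.cluster[object_id].metadata)

cluster = {}
node_a, node_b = Node(cluster), Node(cluster)

# Custom fields need no schema change; they are just extra keys.
node_a.put("scan-001", b"image bytes", modality="MRI", department="radiology")

# Any node can answer the metadata request.
print(node_b.get_metadata("scan-001"))
```

Because the object and its metadata travel as one unit, whatever protects the object also protects the metadata, and there is no separate database for the application to keep consistent.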
Have questions? Feel free to contact us.