“Metadata, you see, is really a love note – it might be to yourself, but in fact it’s a love note to the person after you, or the machine after you, where you’ve saved someone that amount of time to find something by telling them what this thing is.”
The terms “metadata” and “metadata search” get thrown around often in the data storage industry. What does the word metadata mean, how does metadata work and why is it important? And, just as important, why is metadata suddenly a hot topic in 2020?
What is Metadata?
Metadata is, quite literally, data about data. The term, coined in 1983, combines the Greek prefix “meta-,” meaning “transcending,” with “data” (a term first used in 1646), meaning factual information.
According to Wikipedia, the first description of metadata for computer systems came from MIT’s Center for International Studies experts David Griffel and Stuart McIntosh in 1967.
“…we have statements in an object language about subject descriptions of data and token codes for the data. We also have statements in a meta language describing the data relationships and transformations, and ought/is relations between norm and data.”
David Griffel and Stuart McIntosh
Sir Tim Berners-Lee, acknowledged as the inventor of the World Wide Web, noted that the phrase “machine understandable” is key in his definition.
“Metadata is machine understandable information about web resources or other things.”
Sir Tim Berners-Lee
Where is Metadata Used?
The easy answer to where metadata is used is “just about everywhere data is stored,” and you don’t have to look far to find examples. Metadata lets us reach information and entertainment every day from whatever device is closest at hand. It is also used to aggregate data for many types of research (scientific, business intelligence, etc.). Whether the field is weather, healthcare, scientific research, biomedicine, geospatial data (such as that stored in the UK STFC JASMIN cluster), data warehousing, movies, music, or any other arena where digital data is stored, metadata is a valuable tool.
As the daughter of a librarian, I would be remiss if I did not mention metadata’s storied history in the world of library science. Remember the days of card catalogs? From the mid-1800s, those little cards were designed to record metadata for books, including:
- Title
- Author
- Subject
- Date of publication
- Location of the book on the shelves (e.g., a Dewey Decimal or Library of Congress call number)
This was a painstaking process, but it enabled librarians and library users to easily find books by author, title, subject, and so on. Libraries migrated from card catalogs to digital databases in the 1980s. At that time, I was a student at the University of North Texas working part-time in the campus library. The transition from a paper catalog to an online system was a tremendous undertaking, but the result was well worth it for both staff and students.
Surveillance video is another good example of how metadata is used. Date and time stamps and location data are critical if surveillance video is to be useful in solving and prosecuting crimes.
Last month, Michael “Q” Brame chronicled his adventures in “Home Surveillance Using NFS and Object Storage.” When we published that blog post, we created metadata for it in the form of a short description, and we also created metadata about the image we used to show Q’s set-up. This type of metadata helps people locate the material.
Metadata About a Blog Post
- Publication Date
Photo Metadata on Local Filesystem
- MIME Type
- File Size
- File Location
- Created Date/Time
- Modified Date/Time
- Date Last Opened
- Photo Dimensions
- Color Space
- Color Profile
- Alpha Channel
- Name and Extension
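Several of the fields in the photo list above can be read programmatically. The sketch below uses only Python’s standard library (`pathlib`, which wraps `os.stat`, plus `mimetypes`) and is an illustration, not how any particular operating system’s file browser gathers them. Assumptions to note: the MIME type is guessed from the file extension alone, and fields stored inside the image file itself (dimensions, color space, color profile, alpha channel) would need an image library, so they are left out. The created time is also omitted because `st_ctime` is the creation time on Windows but the last metadata-change time on Unix.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def file_metadata(path: str) -> dict:
    """Collect filesystem-level metadata for a single file."""
    p = Path(path)
    st = p.stat()  # size and timestamps are kept by the filesystem, not in the file
    mime_type, _encoding = mimetypes.guess_type(p.name)  # guessed from the extension only

    def as_utc(ts: float) -> str:
        return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

    return {
        "name": p.stem,
        "extension": p.suffix,
        "location": str(p.resolve().parent),
        "size_bytes": st.st_size,
        "mime_type": mime_type,
        "modified": as_utc(st.st_mtime),
        # Access time may be stale on filesystems mounted with noatime/relatime.
        "last_opened": as_utc(st.st_atime),
    }


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        print(file_metadata(arg))
```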
Email Metadata on Incoming Message
Every month, I write a newsletter that is sent out via email (you can subscribe if you do not currently receive it). The image above shows metadata about the newsletter I sent out yesterday.
- Date Sent
- From Email Address
- Reply-To Email Address
- …and more
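Email metadata travels in the message headers. As a minimal sketch, Python’s standard `email` package can pull the fields listed above out of a raw message. The sample message and its example.com addresses are made up for illustration.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message; real messages carry many more headers.
RAW_MESSAGE = """\
From: Newsletter <news@example.com>
Reply-To: replies@example.com
To: reader@example.com
Subject: Monthly newsletter
Date: Tue, 14 Jul 2020 09:30:00 -0500

Body text goes here.
"""


def email_metadata(raw: str) -> dict:
    """Extract a few header fields (the message's metadata) from a raw email."""
    msg = message_from_string(raw)
    return {
        "date_sent": parsedate_to_datetime(msg["Date"]),  # timezone-aware datetime
        "from": msg["From"],
        "reply_to": msg["Reply-To"],
        "subject": msg["Subject"],
    }


if __name__ == "__main__":
    for key, value in email_metadata(RAW_MESSAGE).items():
        print(f"{key}: {value}")
```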
Metadata for a Relational Database
Metadata is also recorded for items in a relational database, such as in this listing of content from the Caringo.com website.
- Allow Null
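In a relational database, column-level metadata such as the name, data type and whether nulls are allowed is kept by the database engine itself and can be queried. Here is a sketch using SQLite through Python’s `sqlite3` module; the `content` table is hypothetical and is not the Caringo.com schema.

```python
from __future__ import annotations

import sqlite3


def column_metadata(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Return name, type, nullability and primary-key flag for each column."""
    # pragma_table_info yields (cid, name, type, notnull, dflt_value, pk) per column;
    # the table-valued form lets the table name be passed as a bound parameter.
    rows = conn.execute("SELECT * FROM pragma_table_info(?)", (table,)).fetchall()
    return [
        {
            "name": name,
            "type": col_type,
            "allow_null": not notnull,
            "primary_key": bool(pk),
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table of website content, for illustration only.
    conn.execute(
        "CREATE TABLE content ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, published_at TEXT)"
    )
    for column in column_metadata(conn, "content"):
        print(column)
```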
What are the Types of Metadata?
Types of metadata include descriptive, structural, administrative, rights management, preservation, guide and accessibility metadata. Efforts to standardize metadata have led to multiple schemes, and what one industry or standards organization uses is not necessarily used by others.
How is Metadata Generated and Stored?
Swarm Object Storage allows applications to define custom metadata, which is used to uniquely describe content objects. Swarm also stores a number of other metadata elements, including:
- Number of replicas to be maintained
- Erasure coding scheme
- Retention period
- Content type
- File name
- Originating application
- Lifepoint™ (Caringo system-enforced and managed content lifecycle policies)
Swarm stores the operational and descriptive information needed to execute Lifepoint policies, along with all other metadata, with the object itself. This keeps the metadata portable and eliminates the need for a separate metadata database and its administration, which would introduce unnecessary vulnerability and become an impediment to scaling.
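To illustrate keeping metadata with the object rather than in a separate database, here is a hypothetical sketch. The `StoredObject` class, its field names, the default replica count and the retention check are assumptions made for illustration; they are not Swarm’s API and not how Lifepoint policies are actually expressed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class StoredObject:
    """Hypothetical object whose metadata is stored with its content."""

    content: bytes
    content_type: str
    file_name: str
    replicas: int = 2  # assumed default, for illustration only
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    retention: timedelta = timedelta(0)
    custom: dict[str, str] = field(default_factory=dict)  # application-defined tags

    def deletable(self, now: datetime | None = None) -> bool:
        """Apply a retention rule using only metadata carried by the object."""
        now = now or datetime.now(timezone.utc)
        return now >= self.created + self.retention


if __name__ == "__main__":
    clip = StoredObject(
        content=b"video bytes",
        content_type="video/mp4",
        file_name="front-door.mp4",
        retention=timedelta(days=30),
        custom={"camera": "front-door"},
    )
    print(clip.deletable())  # False until 30 days after creation
```

Because the retention rule reads only fields carried by the object, copying the object elsewhere carries the policy along with it; there is no external database to migrate in step.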
How Does Metadata Help You?
Simply put, metadata helps you help your organization by making sure you can efficiently move and retrieve data in a way that makes sense for your business or endeavor. For example, if you are a news organization looking for footage of a certain event, wouldn’t it be nice to have all of your footage in a centralized storage repository, with each file tagged with the date, time, place and event, so you could retrieve it quickly? That is certainly easier and faster than pulling old tapes and then reviewing them to find the footage.
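The news-footage scenario can be sketched as a filter over tagged clips. The `Clip` record and its `shot_at`, `place` and `event` fields are hypothetical tags chosen to match the example above; a real repository would query its own metadata store rather than scan a Python list.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Clip:
    """Hypothetical footage record with date/time, place and event tags."""

    object_id: str
    shot_at: datetime
    place: str
    event: str


def find_footage(
    clips: list[Clip],
    *,
    event: str | None = None,
    place: str | None = None,
    start: datetime | None = None,
    end: datetime | None = None,
) -> list[Clip]:
    """Return the clips whose tags match every filter that was supplied."""
    return [
        c
        for c in clips
        if (event is None or c.event == event)
        and (place is None or c.place == place)
        and (start is None or c.shot_at >= start)
        and (end is None or c.shot_at <= end)
    ]


if __name__ == "__main__":
    library = [
        Clip("a1", datetime(2020, 3, 1, 10, 0), "City Hall", "council meeting"),
        Clip("b2", datetime(2020, 3, 2, 14, 0), "Downtown", "marathon"),
    ]
    print(find_footage(library, event="marathon"))
```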
Using metadata to annotate objects in storage can open up opportunities for how files can be searched, sorted and analyzed at scale. It is also key to being able to monetize existing content and build new revenue streams.
In the screenshot above, you can see how a search can be performed on the objects in a Swarm Cluster by selecting the type of metadata you want to search on.
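Searching at scale usually means not scanning every object. One common approach, shown here only as a generic sketch and not as a description of how Swarm implements its search, is an inverted index that maps each metadata field and value to the IDs of the objects carrying it, so searching on a chosen field becomes a dictionary lookup. The object IDs and fields are made up.

```python
from __future__ import annotations

from collections import defaultdict


def build_index(objects: dict[str, dict[str, str]]) -> dict[str, dict[str, set[str]]]:
    """Map each metadata field to its values, and each value to object IDs."""
    index: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for object_id, metadata in objects.items():
        for field_name, value in metadata.items():
            index[field_name][value].add(object_id)
    return index


def search(index: dict[str, dict[str, set[str]]], field_name: str, value: str) -> set[str]:
    """Look up the objects tagged field_name=value without scanning every object."""
    # .get() avoids creating empty entries in the defaultdicts for misses.
    return set(index.get(field_name, {}).get(value, ()))


if __name__ == "__main__":
    objects = {
        "obj-1": {"content-type": "video/mp4", "event": "marathon"},
        "obj-2": {"content-type": "image/jpeg", "event": "marathon"},
        "obj-3": {"content-type": "video/mp4", "event": "city council"},
    }
    idx = build_index(objects)
    print(sorted(search(idx, "event", "marathon")))  # ['obj-1', 'obj-2']
```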
Why is Metadata a Hot Topic Right Now?
To say there is a lot going on in today’s world would be a tremendous understatement. Scientific research organizations are using high-performance computing (HPC) and managing huge data sets, and video production organizations are looking at how to create fresh content from an active archive. When used properly, metadata unleashes a host of benefits for almost every organization that stores some type of digital information or assets.
In an article, Caringo Co-founder Jonathan Ring listed five benefits of metadata for business information storage:
- Eliminates ambiguities.
- Creates a data trail.
- Improves risk profiling.
- Establishes more efficient storage uses.
- Eases migration.
This is a good starting point for thinking about just how metadata might benefit your organization.
Resources to Learn More About Metadata
The following resources were consulted in writing this blog, and may be helpful in broadening your knowledge of metadata:
- Metadata Architecture, Sir Tim Berners-Lee, World-Wide Web Consortium, 1997-01-06
- The Metadata Mania, Jason Scott, ASCII weblog, 2011-06-11
- Understanding Metadata, National Information Standards Organization (NISO), 2004
- Metadata, Wikipedia
- Metadata: Your City’s Secret Weapon, Jonathan Ring, Government Technology, 2016-05-27
- What Metadata Means for Business Information Storage, Jonathan Ring, Alley Watch, 2016
- How Do I Use Metadata with Object-based Data Storage? (webinar featuring Ryan Meek, Caringo Principal Solutions Architect), 2019-02-26