At the heart of every digital data storage system is an abstraction layer that maps the human-readable form of data to the underlying binary segments on the storage media. This layer is commonly referred to as a namespace. Historically, the namespace is what made digital data usable, but in today’s world of exponential data growth and instant accessibility, it has become the main factor limiting data reuse and operational flexibility. Let’s take a closer look at why.
A File System Is a Namespace
In a file system, file names are stored hierarchically: a path of the form [device name]:directory/subdirectory/filename leads to the actual file, with access enabled via a standard storage protocol such as SMB or NFS. This location points to a specific set of inodes, which ultimately lead to the binary data on the storage media (an oversimplification, but you get the idea). There are two major issues with this approach. The first is that there is a fixed number of inodes, which limits the address space and therefore the ultimate scale of the data store. The second is that you need to know the entire path of a file; the file is essentially tied to that logical location. As a result, moving a large number of files or upgrading the underlying hardware is a difficult and often risky process.
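To make those two limits concrete, here is a toy Python model of a hierarchical file system. It is a minimal sketch under stated assumptions, not how any real file system is implemented: `ToyFileSystem`, `Inode`, and `max_inodes` are hypothetical names, the inode table is sized once when the file system is created, and a file can only be reached by walking its full path from the root.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Heavily simplified inode: directories map names to inode numbers,
    files hold their bytes directly."""
    is_dir: bool
    entries: dict = field(default_factory=dict)  # name -> inode number (directories)
    data: bytes = b""                            # contents (files)


class ToyFileSystem:
    """Hypothetical toy model: the inode table size is fixed at creation time."""

    def __init__(self, max_inodes: int):
        self.max_inodes = max_inodes
        self.inodes = {0: Inode(is_dir=True)}  # inode 0 is the root directory

    def _alloc(self, node: Inode) -> int:
        if len(self.inodes) >= self.max_inodes:
            raise OSError("no free inodes")  # the fixed address space is exhausted
        ino = len(self.inodes)  # sequential numbering (this toy never deletes)
        self.inodes[ino] = node
        return ino

    def resolve(self, path: str) -> int:
        """Walk from the root one path component at a time."""
        ino = 0
        for name in filter(None, path.split("/")):
            ino = self.inodes[ino].entries[name]  # KeyError if any component is wrong
        return ino

    def create(self, path: str, data: bytes = b"", is_dir: bool = False) -> int:
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self.inodes[self.resolve(parent_path)]
        ino = self._alloc(Inode(is_dir=is_dir, data=data))
        parent.entries[name] = ino
        return ino


fs = ToyFileSystem(max_inodes=4)
fs.create("/projects", is_dir=True)
fs.create("/projects/q3", is_dir=True)
fs.create("/projects/q3/report.txt", b"hello")
print(fs.inodes[fs.resolve("/projects/q3/report.txt")].data)  # b'hello'

try:
    fs.create("/projects/q3/notes.txt", b"more")  # would need a fifth inode
except OSError as err:
    print(err)  # no free inodes

try:
    fs.resolve("/report.txt")  # right file name, wrong path
except KeyError:
    print("not found without the full path")
```

The root, two directories, and one file use all four inodes, so the next create fails because the inode table is full, and `report.txt` cannot be found without its full path. Moving a file in this model means rewriting directory entries along its path, which hints at why reorganizing large trees is risky.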
Distributed and Clustered File Systems
The scale issue, both within a single location and across geographic distances, led to the creation of Distributed File Systems and Clustered File Systems. These took care of the inode limit, and data remained accessible through standard protocols, but the underlying operational issues of managing the file system and tracking the physical location of the bits remained, not to mention the added complexity of keeping data durable and consistent across nodes.
Caringo’s Global Namespace
At the top of every organization’s IT wish list are data and application portability. To address this, many vendors have integrated global namespaces and open-standard APIs (or de facto standards such as Amazon S3). I will save a discussion of APIs for a future blog post, but I do want to discuss our approach to the Global Namespace.
First, we’ve integrated S3 and NFS as front-end protocols into our universal namespace so that legacy applications and workflows can all operate on the same data.
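The idea of several protocol front ends sharing one namespace can be sketched in a few lines of Python. This is a hypothetical illustration, not Caringo’s implementation: the `UnifiedNamespace` class, its methods, and the rule that an NFS path `/<bucket>/<key>` maps to the S3 bucket and key of the same name are all assumptions, and real NFS semantics (directories, partial writes, locking, permissions) are ignored.

```python
import uuid


class UnifiedNamespace:
    """Hypothetical sketch: S3-style and NFS-style front ends over one namespace.

    Assumption for illustration only: the NFS path "/<bucket>/<key>" refers to
    the same object as S3 bucket <bucket>, key <key>.
    """

    def __init__(self):
        self._names = {}    # (bucket, key) -> object ID
        self._objects = {}  # object ID -> bytes (a single copy of the data)

    # S3-style front end
    def s3_put(self, bucket: str, key: str, data: bytes) -> str:
        oid = self._names.get((bucket, key)) or uuid.uuid4().hex
        self._names[(bucket, key)] = oid
        self._objects[oid] = data
        return oid

    def s3_get(self, bucket: str, key: str) -> bytes:
        return self._objects[self._names[(bucket, key)]]

    # NFS-style front end: translate the path, then use the same lookup
    @staticmethod
    def _split(path: str):
        bucket, _, key = path.lstrip("/").partition("/")
        return bucket, key

    def nfs_write(self, path: str, data: bytes) -> str:
        return self.s3_put(*self._split(path), data)

    def nfs_read(self, path: str) -> bytes:
        return self.s3_get(*self._split(path))


ns = UnifiedNamespace()
ns.s3_put("media", "clips/intro.mp4", b"v1")   # written by an S3 application
print(ns.nfs_read("/media/clips/intro.mp4"))   # b'v1'
ns.nfs_write("/media/clips/intro.mp4", b"v2")  # updated by a legacy NFS workflow
print(ns.s3_get("media", "clips/intro.mp4"))   # b'v2'
```

Because both front ends translate their names into the same lookup, a write through one protocol is visible through the other without copying data between separate stores.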
Second, many Global Namespaces are still limited because they rely on an underlying file system or a controller node (or database). We take the Global Namespace further than most, to a 128-bit Universal Namespace with an address space of 2^128, or roughly 3.4E+38, unique IDs. Every file/object across any deployed Caringo solution gets a unique ID, and that ID is stored with the object itself, meaning the object is self-describing. This removes both the scale limits and the operational limits associated with file-system-based solutions, and it also solves the rebalancing issues found in some file-system-based object storage solutions. The result is the ability to instantly find data regardless of location, continuously upgrade hardware, mix and match hardware, and achieve an unrivaled level of resilience. For example, you can take a drive out of one chassis, plug it into another, and the data on it will be instantly accessible.
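The arithmetic behind the address space is 2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456, or about 3.4 × 10^38. The sketch below pairs that with the self-describing property: each object carries its own ID, so an index can be rebuilt just by scanning whatever drives are present, wherever they are plugged in. It is a minimal Python model under stated assumptions, not Caringo’s actual ID scheme or on-disk format; `new_object_id`, `StoredObject`, `Drive`, `Chassis`, and `build_index` are hypothetical, and drawing 128 random bits per ID is an assumed scheme chosen for illustration.

```python
import secrets
from dataclasses import dataclass, field

ID_BITS = 128
ADDRESS_SPACE = 2 ** ID_BITS  # 340282366920938463463374607431768211456, about 3.4e38


def new_object_id() -> int:
    """Hypothetical ID scheme: 128 random bits (an assumption for illustration)."""
    return secrets.randbits(ID_BITS)


@dataclass
class StoredObject:
    object_id: int  # the ID is stored with the object, so it is self-describing
    name: str
    data: bytes


@dataclass
class Drive:
    serial: str
    objects: list = field(default_factory=list)


@dataclass
class Chassis:
    name: str
    drives: list = field(default_factory=list)


def build_index(cluster):
    """Rebuild the object ID -> location map by scanning the drives themselves.

    No path hierarchy, controller node, or external database is consulted.
    """
    index = {}
    for chassis in cluster:
        for drive in chassis.drives:
            for obj in drive.objects:
                index[obj.object_id] = (chassis.name, drive.serial)
    return index


chassis_a, chassis_b = Chassis("chassis-a"), Chassis("chassis-b")
disk = Drive("disk-1")
chassis_a.drives.append(disk)
obj = StoredObject(new_object_id(), "scan-0001.tif", b"pixels")
disk.objects.append(obj)
cluster = [chassis_a, chassis_b]

print(build_index(cluster)[obj.object_id])  # ('chassis-a', 'disk-1')

# Pull the drive from one chassis and plug it into another.
chassis_a.drives.remove(disk)
chassis_b.drives.append(disk)
print(build_index(cluster)[obj.object_id])  # ('chassis-b', 'disk-1')

print(f"{ADDRESS_SPACE:.2e}")  # 3.40e+38
```

After the drive moves, the rebuilt index finds the same object ID at its new location without any path or controller database being updated. Random IDs keep generation coordination-free; the trade-off is that collisions become a realistic concern only around 2^64 objects (the birthday bound).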
From Object Storage to Hybrid Cloud
With enterprises moving more and more of their data and processes to the cloud, the nagging question is which namespace that data will reside in once it is in the cloud. Most gateway vendors today offer hybrid cloud solutions in which the data must always pass through their systems. But if you want to leverage your favorite cloud provider’s compute services, your data must be in a format that the provider supports. In the coming months, we’ll talk more about extending our universal namespace to the cloud. In the meantime, if you have questions, feel free to contact us or visit us at one of our upcoming events.