Caringo: Fixed Content Storage

CAStor Content Storage Software
Frequently Asked Questions
1. What is CAStor?
2. How does CAStor work?
3. What is Content Addressable Storage (CAS)?
4. Where should UUIDs be stored?
5. How does CAStor provide data protection for content?
6. How is data integrity assured by CAStor?
7. How difficult is CAStor to set up?
8. What is required to scale and upgrade a CAStor cluster?
9. What is the level of effort to administer CAStor?
10. How does an application integrate with CAStor?
11. How is CAStor different from traditional file systems using block storage?
12. Does CAStor support custom metadata?
13. Is encryption built in to CAStor?
14. Does CAStor provide index/search functionality?
15. Does CAStor provide de-duplication?
16. Does CAStor provide WORM storage?
17. How can CAStor reduce my total cost of ownership (TCO)?

1. What is CAStor?
CAStor is content storage software designed to store unstructured data, also referred to as fixed content or reference information, such as documents, e-mail, images, audio, video, voice mails, ringtones, and medical images and records. Essentially, CAStor handles any digital data other than transactional database data.

CAStor software is distributed on a USB thumb drive and runs on standard, commodity x86 server hardware, which enables organizations to implement affordable clustered storage that delivers high performance, scalability, and reliability. CAStor may be used to store all of an organization’s content because it combines the speed of primary storage with the security and cost-effectiveness needed to archive content for the duration of its useful life.

2. How does CAStor work?
CAStor is based on a content addressable storage (CAS) architecture. Unlike a file system, CAS does not use path names or physical addresses to store and access content; such schemes become complex and brittle at scale. Instead, it provides a flat address space in which each object is identified by a Universally Unique Identifier (UUID), enabling it to scale to billions of objects. CAStor virtualizes the internal disks across a cluster of commodity server nodes, creating a single storage pool that can grow to petabytes in capacity.

3. What is Content Addressable Storage (CAS)?
The easiest way to understand CAS is to use the analogy of valet parking. When you hand over your car to have it parked, you are issued a ticket with a specific number on it. In order to retrieve the car, you simply provide the ticket/number to the valet. You don’t care where the car is parked, if it was moved or how it is stored as long as it is returned in the same condition.

CAS functions similarly, but with digital content. A file is submitted to CAS and it returns a key or unique identifier to the application for future access. When that file is later requested for retrieval, the application passes the key back to the CAS and the file is retrieved. There are no file hierarchies or folder names or disk locations associated with the stored file. If the file is moved in storage, the key never changes.
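
The ticket analogy can be sketched in a few lines of Python. This is a toy in-memory model, not CAStor code: the class name `ToyContentStore` and its `put`/`get` methods are hypothetical names chosen for illustration.

```python
import uuid


class ToyContentStore:
    """In-memory stand-in for a content addressable store (illustrative only)."""

    def __init__(self):
        # Flat address space: a single dictionary, no directories or paths.
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store the bytes and hand back a key (the "valet ticket")."""
        key = uuid.uuid4().hex
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        """Return the bytes previously stored under the key."""
        return self._objects[key]


store = ToyContentStore()
ticket = store.put(b"scanned invoice")
retrieved = store.get(ticket)
print(retrieved)
```

The application keeps only `ticket`; where the bytes live inside the store is never exposed to it.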

4. Where should UUIDs be stored?
UUIDs can be stored in applications, in documents or in a database. There are no restrictions as to where or how the UUIDs are stored.

5. How does CAStor provide data protection for content?
CAStor provides several facets of data protection. It protects against loss of content objects through replication, creating one or more exact replicas (copies) of each file stored in the system. Each replica is stored on a different node within the CAStor cluster, so that if a node fails, another replica remains accessible on a different node. Content can also be replicated to geographically dispersed clusters for disaster recovery or business continuance purposes. CAStor also protects content by providing WORM storage and by enforcing retention policies to meet regulatory compliance and internal governance mandates. Content cannot be changed once stored and cannot be deleted until its retention period has expired.
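
The rule that each replica lives on a different node can be illustrated with a minimal sketch. The function `place_replicas` and the node names are hypothetical, and random selection is an assumption made for brevity; the source does not describe how CAStor actually chooses nodes.

```python
import random


def place_replicas(nodes, replica_count, rng=random):
    """Pick a distinct node for each replica of one object."""
    if replica_count > len(nodes):
        raise ValueError("not enough nodes to keep every replica on its own node")
    # sample() never returns the same node twice.
    return rng.sample(nodes, replica_count)


cluster = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
holders = place_replicas(cluster, replica_count=2)

# Simulate losing one of the nodes that holds a replica.
failed = holders[0]
survivors = [node for node in holders if node != failed]
print(f"{failed} failed; object still available on {survivors}")
```

Because the two replicas never share a node, a single node failure always leaves at least one copy reachable.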

6. How is data integrity assured by CAStor?
CAStor employs a hash algorithm that computes a digest, sometimes referred to as a digital fingerprint, based on the bit sequence of each content object (file). The digest is used by CAStor’s Health Processor (HP), which runs in the background and continuously checks the content’s integrity to determine whether there has been any corruption on disk. If an object is determined to be corrupt, a new replica is generated from another correct replica stored in the system. This ensures that the correct number of clean replicas is always available and accessible in CAStor.
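
A background integrity pass of this kind can be sketched as follows. This is not the Health Processor itself: the function names are hypothetical, the replicas are modeled as an in-memory dictionary, and SHA-256 is an assumption, since the source does not say which hash algorithm CAStor uses.

```python
import hashlib


def digest(data: bytes) -> str:
    # SHA-256 is an assumption made for this sketch.
    return hashlib.sha256(data).hexdigest()


def check_and_repair(replicas: dict, expected: str) -> list:
    """Verify every replica against the expected digest.

    `replicas` maps a node name to the bytes held on that node. Corrupt
    replicas are overwritten with a clean copy; the names of the repaired
    nodes are returned.
    """
    clean = next((d for d in replicas.values() if digest(d) == expected), None)
    if clean is None:
        raise RuntimeError("no clean replica left to repair from")
    repaired = []
    for node, data in replicas.items():
        if digest(data) != expected:
            replicas[node] = clean  # regenerate from the good replica
            repaired.append(node)
    return repaired


original = b"x-ray image bytes"
expected = digest(original)
# node-b holds a copy with one flipped character, simulating disk corruption.
replicas = {"node-a": original, "node-b": b"x-ray imagE bytes"}
repaired = check_and_repair(replicas, expected)
print(repaired)  # ['node-b']
```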

The hash digest is also used as the Content Integrity Seal, which is a method to prove the authenticity of a content object as an original for compliance and evidentiary purposes in an open, customer-auditable data structure. Unlike other CAS systems, CAStor separates the content address (UUID) from the digital fingerprint (digest), allowing the hash algorithm to be seamlessly upgraded if the original algorithm is compromised. This has already happened to both the MD5 and SHA-1 algorithms, and CAStor’s patent-pending, transparently upgradeable hash assures the long-term integrity of content.
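
Why separating the address from the digest matters can be shown with a small sketch. All names here (`StoredObject`, `seal_and_store`, `upgrade_seal`) are hypothetical, and the choice of SHA-1 as the old algorithm and SHA-256 as the new one is an assumption for illustration only.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass
class StoredObject:
    key: str        # content address handed to the application; never changes
    data: bytes
    algorithm: str  # name of the hash behind the current seal
    seal: str       # hex digest of `data` under `algorithm`


def seal_and_store(data: bytes, algorithm: str = "sha1") -> StoredObject:
    seal = hashlib.new(algorithm, data).hexdigest()
    return StoredObject(uuid.uuid4().hex, data, algorithm, seal)


def upgrade_seal(obj: StoredObject, new_algorithm: str) -> None:
    """Re-seal an object under a stronger hash without touching its key."""
    # Verify under the old algorithm first, so the new seal covers intact data.
    if hashlib.new(obj.algorithm, obj.data).hexdigest() != obj.seal:
        raise ValueError("object fails its current seal; refusing to re-seal")
    obj.algorithm = new_algorithm
    obj.seal = hashlib.new(new_algorithm, obj.data).hexdigest()


obj = seal_and_store(b"contract bytes")
key_before = obj.key
upgrade_seal(obj, "sha256")
print(obj.key == key_before, obj.algorithm)
```

If the digest itself were the address, changing the hash algorithm would change every key that applications have already stored; with a separate UUID, the keys stay valid.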

7. How difficult is CAStor to set up?
CAStor is a software solution designed for simple setup. Plug a CAStor USB key into a commodity server, boot the system, and in 60 seconds you have a running CAStor node. Connect a second server via Gigabit Ethernet, boot it with another USB key, and you have a two-node cluster. Simply repeat the process to build a cluster of the size initially required.

8. What is required to scale and upgrade a CAStor cluster?
Capacity can be increased in a CAStor cluster dynamically while the system is running. Simply add a new node to the cluster and its capacity is automatically added to the storage pool without the need to provision or configure the new storage. Upgrading the server hardware for CAStor nodes is similarly easy. Boot the new, updated server(s) into the cluster, then gracefully retire the older node(s) to be removed. All the content on the node being retired is replicated to other nodes in the cluster; when replication completes, the retired node’s disks are wiped clean and it can be removed. All this is done while the CAStor cluster is operational and without impact to applications or data availability. Caringo calls this “hot scaling.”
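
The add-then-retire sequence can be modeled with a short sketch. The cluster is represented as a dictionary from node name to the set of object keys on that node; `retire_node` and the least-loaded placement rule are assumptions for illustration, not CAStor's actual algorithm.

```python
def retire_node(cluster: dict, retiring: str) -> None:
    """Copy every object off the retiring node, then remove the node.

    `cluster` maps a node name to the set of object keys held on that node.
    Each object goes to the least-loaded remaining node that does not
    already hold a replica of it.
    """
    remaining = [name for name in cluster if name != retiring]
    if not remaining:
        raise ValueError("cannot retire the last node")
    for obj in sorted(cluster[retiring]):
        candidates = [name for name in remaining if obj not in cluster[name]]
        if not candidates:
            continue  # every remaining node already holds this object
        target = min(candidates, key=lambda name: len(cluster[name]))
        cluster[target].add(obj)
    del cluster[retiring]  # stands in for wiping and removing the node


# Hypothetical three-node cluster; "u1" has two replicas, "u2" has one.
cluster = {"old": {"u1", "u2"}, "n1": {"u1"}, "n2": set()}
cluster["new"] = set()        # hot-add: the new node simply joins the pool
retire_node(cluster, "old")   # graceful retirement of the old hardware
print(sorted(cluster))
```

After the call, every object has the same number of replicas it had before, and none of them is on the retired node.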

9. What is the level of effort to administer CAStor?
The CAStor cluster is easy to administer. It eliminates the need to provision or configure storage when new capacity is added. Its self-healing characteristics allow the CAStor cluster to seamlessly recover from a failed node or disk without impacting data availability. If a node goes down, the cluster immediately recognizes its loss and the rest of the cluster works together to replicate all the content on the impaired node. This occurs without administrator intervention or impact to applications and data availability. The CAStor cluster is also self-balancing such that it will automatically balance stored content evenly across nodes in the cluster for optimal performance and to eliminate “hot spots.” All of these actions require minimal administrative overhead and a CAStor cluster can be managed from a central browser interface that is the same whether there are 3 nodes or 3,000 nodes in the cluster.

10. How does an application integrate with CAStor?
The Simple Content Storage Protocol (SCSP) is the native interface to CAStor and is a subset of the HTTP 1.1 standard. It is an on-the-wire protocol that will never be outdated and will never require porting. Essentially, there is no proprietary API and any application or web service can be interfaced to CAStor in a matter of hours.
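
Because the interface is plain HTTP, a stock HTTP client library is all an application needs. The sketch below uses only Python's standard library and talks to a local stand-in server so that it can run anywhere. It does not reproduce SCSP's actual methods, paths, headers, or status codes: the `X-Object-Key` header, the URL layout, and the handler are all hypothetical.

```python
import http.client
import threading
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

OBJECTS = {}  # in-memory stand-in for the storage cluster


class StandInHandler(BaseHTTPRequestHandler):
    """Local stand-in server; NOT an implementation of SCSP."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        key = uuid.uuid4().hex
        OBJECTS[key] = body
        self.send_response(201)
        self.send_header("X-Object-Key", key)  # hypothetical header name
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        data = OBJECTS.get(self.path.lstrip("/"))
        if data is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the example quiet


server = HTTPServer(("127.0.0.1", 0), StandInHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Write: send the bytes, receive a key.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("POST", "/", body=b"hello")
response = conn.getresponse()
response.read()
key = response.getheader("X-Object-Key")
conn.close()

# Read: present the key, receive the bytes.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/" + key)
fetched = conn.getresponse().read()
conn.close()

server.shutdown()
server.server_close()
print(fetched)
```

The client half of the example contains no storage-specific library at all, which is the practical meaning of "no proprietary API."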

For applications whose source code is not available, or that are based on traditional file-based protocols, Caringo offers the CAStor File System Gateway (FSG). This gives applications the ability to access CAStor using CIFS and NFS. CAStor FSG is not a classic file system. Rather, it is a thin mapping layer that looks like a file system to applications and speaks HTTP to CAStor. On the front end it presents a standard file system interface, and on the back end it delivers a vast flat address space, massive scalability, high performance, and reliability.

11. How is CAStor different from traditional file systems using block storage?
Unlike file systems that ride on top of block storage devices, CAStor provides a single, flat address space to store content and does not have the complexity of file hierarchies, folder names or physical disk locations associated with each file. It does not break a file up into blocks (bits and pieces) and then need to manage a number of individual blocks associated with each file. CAStor stores files on a whole object basis in contiguous disk space and only needs to manage a single UUID for each piece of content. That means CAStor FSG simply manages UUIDs giving it the ability to scale beyond the constraints traditional file systems encounter in terms of number of files and amount of capacity supported.

12. Does CAStor support custom metadata?
Yes, CAStor allows custom metadata to be defined by applications to uniquely describe content objects. CAStor stores all metadata with the actual content, and the metadata persists throughout the content’s life cycle. Other metadata elements include the number of replicas to be maintained, retention period, content type, file name, originating application, and others. CAStor also supports a special metadata element called the LifePoint that allows an application to describe how a file should be managed during its life cycle in CAStor.

13. Is encryption built in to CAStor?
CAStor does not encrypt data by itself. As there are many excellent encryption products available today, there is no reason to reinvent the wheel. Choose the encryption algorithm you like best, encrypt your content, store it in CAStor, and decrypt it when it comes out. CAStor will store whatever you put into it.

14. Does CAStor provide index/search functionality?
Caringo is firmly on the side that believes index/search belongs at the application layer and not the storage layer. Most, if not all, content management products provide index/search as a feature and other stand-alone search engines can easily crawl a CAStor repository using FSG to index content and provide search functionality.

15. Does CAStor provide de-duplication?
There are two approaches to eliminating duplicate data. De-duplication is a block-level compression technique that is most appropriate for backup scenarios and has no meaning in a file-based approach. In this method, duplicate blocks of data, which may span multiple files, are eliminated. Single instance storage (SIS) is the file-level analog of de-duplication. It reduces duplicate content while maintaining unique metadata. CAStor has implemented the architecture for asynchronous single instance storage, so that a background process will eliminate duplicate content over time with little or no impact on performance.
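
The idea of asynchronous single instance storage can be sketched as a write path that does no duplicate checking, followed by a later consolidation pass. Everything here is a hypothetical model (`ToySisStore`, `consolidate`, the record layout, and the use of SHA-256 to detect identical content); the source describes only the architecture, not its implementation.

```python
import hashlib
import uuid


class ToySisStore:
    """Toy model: writes are stored as-is, duplicates are merged later."""

    def __init__(self):
        self.records = {}  # key -> {"inline", "digest", "metadata"}
        self.blobs = {}    # digest -> bytes, one shared copy per unique content

    def put(self, data: bytes, metadata: dict) -> str:
        key = uuid.uuid4().hex
        # No duplicate lookup on the write path, so writes stay fast.
        self.records[key] = {"inline": data, "digest": None,
                             "metadata": dict(metadata)}
        return key

    def consolidate(self) -> None:
        """Background-style pass: move content into one shared copy per digest."""
        for record in self.records.values():
            if record["inline"] is None:
                continue  # already consolidated
            d = hashlib.sha256(record["inline"]).hexdigest()
            self.blobs.setdefault(d, record["inline"])
            record["digest"] = d
            record["inline"] = None  # drop the duplicate bytes

    def get(self, key: str):
        record = self.records[key]
        if record["inline"] is not None:
            return record["inline"], record["metadata"]
        return self.blobs[record["digest"]], record["metadata"]


sis = ToySisStore()
k1 = sis.put(b"quarterly report", {"owner": "alice"})
k2 = sis.put(b"quarterly report", {"owner": "bob"})
sis.consolidate()
print(len(sis.blobs), sis.get(k1)[1], sis.get(k2)[1])
```

Both keys remain valid and keep their own metadata; only the stored bytes are shared.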

16. Does CAStor provide WORM storage?
Yes, CAStor can provide WORM storage if specified when content is written. Once specified, WORM content can never be deleted. CAStor can also manage content life cycle information automatically: one can specify that a file cannot be changed throughout its defined life and cannot be deleted until its retention period has expired. This addresses regulatory mandates such as SEC Rule 17a-4, which is the most stringent regulatory requirement defined for data storage. CAStor has been designed to deliver the assurance that content can be stored safely over extremely long periods of time.
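
The two rules above (no changes after writing, no deletion before the retention period ends) can be expressed in a minimal sketch. `ToyRetentionStore`, `RetentionError`, and the `FOREVER` sentinel are hypothetical names; how CAStor actually records retention is not described in the source.

```python
from datetime import datetime, timedelta, timezone


class RetentionError(Exception):
    pass


# Sentinel for content that may never be deleted.
FOREVER = datetime.max.replace(tzinfo=timezone.utc)


class ToyRetentionStore:
    def __init__(self):
        self._objects = {}  # key -> (data, retain_until)

    def put(self, key: str, data: bytes, retain_until: datetime = FOREVER) -> None:
        if key in self._objects:
            raise RetentionError("stored content cannot be changed")
        self._objects[key] = (data, retain_until)

    def delete(self, key: str, now: datetime) -> None:
        _, retain_until = self._objects[key]
        if now < retain_until:
            raise RetentionError("retention period has not expired")
        del self._objects[key]


worm = ToyRetentionStore()
written = datetime(2024, 1, 1, tzinfo=timezone.utc)
worm.put("trade-001", b"trade record", retain_until=written + timedelta(days=365))

try:
    worm.delete("trade-001", now=written + timedelta(days=30))
    blocked = False
except RetentionError:
    blocked = True  # still inside the retention period

worm.delete("trade-001", now=written + timedelta(days=366))  # now permitted
print(blocked)
```

Passing the current time in explicitly keeps the example deterministic; a real system would read the clock itself.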

17. How can CAStor reduce my total cost of ownership (TCO)?
CAStor has a dramatic effect on the components of TCO, which are described in detail in the CAStor and the TCO Effect Information Brief.