- What is CAStor?
- How does CAStor work?
- How easy is CAStor to set up?
- What is Content Addressable Storage (CAS)?
- Where should UUIDs be stored?
- How does CAStor provide data protection for content?
- How is data integrity assured by CAStor?
- What is required to scale and upgrade a CAStor cluster?
- What is the level of effort to administer CAStor?
- How does an application integrate with CAStor?
- How is CAStor different from traditional file systems using block storage?
- Does CAStor support custom metadata?
- Is encryption built in to CAStor?
- Does CAStor provide index/search functionality?
- Does CAStor provide WORM storage?
- How can CAStor reduce my total cost of ownership (TCO)?
- Does CAStor provide high performance for both small and large files?
- Can you run CAStor in a VMware environment?
- Can you run Content File Server (CFS) and Content Router (CR) in a VMware environment?
1. What is CAStor?
CAStor is object storage software designed to store unstructured data, also referred to as fixed content or reference information. This includes documents, e-mail, images, audio, video, voice mails, ring tones, and medical images and records. Essentially, CAStor handles any digital data other than transactional database data.
CAStor software runs on standard, commodity server hardware (x86), which enables organizations to implement affordable clustered storage that delivers high performance, scalability and reliability. CAStor may be used to store all of an organization's content because it delivers the speed of primary storage while being secure and cost-effective enough to archive content for the duration of its useful life.
2. How does CAStor work?
CAStor does not use traditional path names or physical addresses to store and access content, as a file system does; those become complex and brittle at scale. Instead, it provides a flat address space in which each object is identified by a Universally Unique Identifier (UUID), enabling it to scale to billions of objects. Applications that require predetermined names for their stored objects can define their own object names while still taking advantage of CAStor's scalability. CAStor virtualizes internal disks across a cluster of commodity server nodes, creating a single storage pool that can grow to petabytes in capacity.
3. How easy is CAStor to set up?
CAStor is designed for simple setup. Plug a CAStor USB key into a commodity server, boot the system, and in 60 seconds you have a running CAStor node. Connect a second server via Gigabit Ethernet, boot it with another USB key, and you have a two-node cluster. Repeat the process to build a cluster of the size initially required. Alternatively, large clusters can be booted centrally from a PXE boot server.
4. What is Content Addressable Storage (CAS)?
The easiest way to understand object storage is to use the analogy of valet parking. When you hand over your car to have it parked, you are issued a ticket with a specific number on it. To retrieve the car, you simply hand the ticket back to the valet. You don't care where the car is parked, whether it was moved, or how it is stored, as long as it is returned in the same condition.
Object storage functions similarly, but with digital content. A file is submitted for storage and a key (a unique identifier) is returned to the application for future access. When that file is later requested, the application passes the key back to the object storage system and the file is retrieved. There are no file hierarchies, folder names or disk locations associated with the stored file. If the file is moved in storage, the key never changes. Because the file is free to move anywhere in the system, management and optimization can be highly automated, which is why object storage scales easily in both file count and capacity.
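The ticket-and-key exchange can be sketched in a few lines of Python. This is a toy in-memory model, not CAStor code; the class name `FlatObjectStore` and its `put`/`get` methods are hypothetical and exist only to illustrate a flat key space with no paths or folders.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory store (hypothetical): flat key space, no paths or folders."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The store, not the caller, chooses the identifier (the "valet ticket").
        key = uuid.uuid4().hex
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        # Retrieval needs only the key; no location information is involved.
        return self._objects[key]


store = FlatObjectStore()
ticket = store.put(b"scanned-invoice.pdf contents")
restored = store.get(ticket)
```

The application keeps only `ticket`; where the bytes live inside the store is never exposed to it.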
5. Where should UUIDs be stored?
UUIDs or user-defined object names can be stored in applications, in documents or in a database. There are no restrictions as to where or how the UUIDs or user-defined object names are stored.
6. How does CAStor provide data protection for content?
CAStor provides several facets of data protection. It protects against loss of content objects through replication, creating one or more exact replicas (copies) of each file stored in the system. Each replica is stored on a different node within the CAStor cluster, so that if a node fails, a replica remains accessible on another node. Content can also be replicated to geographically dispersed clusters for disaster recovery or business continuance purposes. CAStor also protects content by providing WORM storage and by enforcing retention policies to meet regulatory compliance and internal governance mandates. Content cannot be changed once stored and cannot be deleted until its retention period has expired.
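The rule that each replica lives on a different node can be illustrated with a small sketch. The helper names `place_replicas` and `surviving_replicas` are hypothetical, and taking the first N nodes is a deliberate simplification; the placement policy CAStor actually uses is not described here.

```python
def place_replicas(nodes, replica_count):
    """Pick a distinct node for each replica (hypothetical helper)."""
    if replica_count > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    # Simplification: take the first N nodes. A real cluster would presumably
    # also weigh free capacity and load when choosing.
    return list(nodes[:replica_count])


def surviving_replicas(placement, failed_node):
    """Return the nodes that still hold a replica after one node fails."""
    return [node for node in placement if node != failed_node]


placement = place_replicas(["node-a", "node-b", "node-c"], replica_count=2)
remaining = surviving_replicas(placement, failed_node="node-a")
```

Because no two replicas share a node, losing any single node leaves at least one replica reachable whenever the replica count is two or more.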
7. How is data integrity assured by CAStor?
CAStor employs a hash algorithm that computes a digest, sometimes referred to as a digital fingerprint, based on the bit sequence of each content object (file). The digest is used by CAStor's Health Processor (HP), which runs in the background and continuously checks the content's integrity to determine whether there has been any corruption on disk. If an object is determined to be corrupt, a new replica is generated from another correct replica stored in the system. This ensures that the correct number of clean replicas is always available and accessible in CAStor.
The hash digest is also used as the Content Integrity Seal, which is a method to prove the authenticity of a content object as an original for compliance and evidentiary purposes in an open, customer-auditable data structure. Unlike other CAS systems, CAStor separates the content address (UUID) from the digital fingerprint (digest), allowing the hash algorithm to be seamlessly upgraded if the original algorithm is compromised. This has happened with both the MD5 and SHA-1 algorithms, and CAStor's patented, transparently upgradeable hash assures the long-term integrity of content.
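A background integrity check of this kind can be sketched as follows. This is an illustration only: the text does not name the hash algorithm, so SHA-256 is an assumption, and the functions `fingerprint` and `heal` are hypothetical. For brevity the sketch overwrites a corrupt replica in place rather than generating a new one elsewhere.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    # SHA-256 is an assumption; the algorithm is not specified in the text.
    return hashlib.sha256(data).hexdigest()


def heal(replicas: dict, expected_digest: str) -> list:
    """Replace corrupt replicas with a clean copy; return the repaired node names."""
    clean = next(
        (data for data in replicas.values() if fingerprint(data) == expected_digest),
        None,
    )
    if clean is None:
        raise RuntimeError("no clean replica left to repair from")
    repaired = []
    for node, data in replicas.items():
        if fingerprint(data) != expected_digest:
            replicas[node] = clean  # overwrite the corrupt copy
            repaired.append(node)
    return repaired


original = b"medical image bytes"
digest = fingerprint(original)
# node-b holds a copy with one corrupted byte.
replicas = {"node-a": original, "node-b": b"medical imagX bytes"}
repaired_nodes = heal(replicas, digest)
```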
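Keeping the address separate from the digest means the digest can be recomputed under a new algorithm without changing the identifier applications hold. The sketch below shows the idea only; the `StoredObject` record and the `seal` and `upgrade_hash` functions are hypothetical and do not represent CAStor's patented mechanism.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass
class StoredObject:
    address: str    # stable identifier handed to the application
    data: bytes
    algorithm: str  # name of the hash currently sealing the content
    digest: str


def seal(data: bytes, algorithm: str) -> StoredObject:
    digest = hashlib.new(algorithm, data).hexdigest()
    return StoredObject(uuid.uuid4().hex, data, algorithm, digest)


def upgrade_hash(obj: StoredObject, new_algorithm: str) -> None:
    # Verify under the old algorithm before re-sealing under the new one.
    if hashlib.new(obj.algorithm, obj.data).hexdigest() != obj.digest:
        raise ValueError("content does not match its current digest")
    obj.algorithm = new_algorithm
    obj.digest = hashlib.new(new_algorithm, obj.data).hexdigest()


record = seal(b"contract text", "sha1")
address_before = record.address
upgrade_hash(record, "sha256")
```

In a design where the digest itself is the address, the same upgrade would change every identifier that applications have already stored.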
8. What is required to scale and upgrade a CAStor cluster?
Capacity can be increased in a CAStor cluster dynamically while the system is running. Simply add a new node to the cluster and its capacity is automatically added to the available pool without the need to provision or configure the new storage. Upgrading the server hardware for CAStor nodes is similarly easy. Boot the new, updated server(s) into the cluster, then gracefully retire the older node(s) to be removed. All the content on the node being retired is replicated to other nodes in the cluster; when replication completes, the node's disks are wiped clean and it can be removed. All this is done while the CAStor cluster is operational and without impact to applications or data availability. Caringo calls this hot scaling.
9. What is the level of effort to administer CAStor?
The CAStor cluster is easy to administer. It eliminates the need to provision or configure storage when new capacity is added. Its self-healing characteristics allow the CAStor cluster to seamlessly recover from a failed node or disk without impacting data availability. If a node goes down, the cluster immediately recognizes its loss and the remaining nodes work together to re-create, from surviving replicas, all the content that was on the impaired node. This occurs without administrator intervention or impact to applications and data availability. The CAStor cluster is also self-balancing: it automatically distributes stored content evenly across the nodes in the cluster for optimal performance and to eliminate hot spots. All of these actions require minimal administrative overhead, and a CAStor cluster can be managed from a central browser interface that is the same whether there are a couple of nodes or thousands of nodes in the cluster.
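The self-healing behavior after a node loss can be modeled as restoring the replica count of every object that had a copy on the failed node. The sketch below is a simplified illustration; the function `recover_from_node_loss` and its data layout (object id mapped to a list of holder nodes) are hypothetical.

```python
def recover_from_node_loss(placements, failed_node, live_nodes, replica_count):
    """Restore the replica count for every object that lost a copy.

    placements maps an object id to the list of nodes holding a replica.
    """
    for holders in placements.values():
        # Forget the replica that lived on the failed node.
        holders[:] = [node for node in holders if node != failed_node]
        # Live nodes that do not already hold this object.
        candidates = [node for node in live_nodes if node not in holders]
        while len(holders) < replica_count and candidates:
            holders.append(candidates.pop(0))
    return placements


placements = {
    "obj-1": ["node-a", "node-b"],
    "obj-2": ["node-b", "node-c"],
}
recover_from_node_loss(
    placements,
    failed_node="node-b",
    live_nodes=["node-a", "node-c", "node-d"],
    replica_count=2,
)
```

After the call, each object again has two holders and none of them is the failed node; no operator input was needed beyond the cluster knowing which node was lost.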
10. How does an application integrate with CAStor?
Applications integrate with CAStor using standard HTTP 1.1. CAStor uses a simplified subset of the HTTP 1.1 standard called Simple Content Storage Protocol (SCSP) as the native interface to CAStor. It is an on-the-wire protocol that will never be outdated and will never require porting. Essentially, there is no proprietary API and any application or web service can be interfaced to CAStor in a matter of hours.
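Because the interface is plain HTTP, a client needs nothing beyond a standard HTTP library. The sketch below only constructs requests with Python's `urllib.request` and does not send them. The host name, the URL layout (object identifier as the path) and the choice of POST for writes are assumptions made for illustration; consult the SCSP documentation for the actual request format.

```python
import urllib.request

# Hypothetical endpoint: the host name and URL layout are assumptions.
CLUSTER_URL = "http://castor.example.com"


def build_write_request(data: bytes, content_type: str) -> urllib.request.Request:
    # Assumed shape: send the object body to the cluster root.
    return urllib.request.Request(
        CLUSTER_URL + "/",
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )


def build_read_request(object_id: str) -> urllib.request.Request:
    # Assumed shape: the object identifier is the request path.
    return urllib.request.Request(CLUSTER_URL + "/" + object_id, method="GET")


write_req = build_write_request(b"hello", "text/plain")
read_req = build_read_request("0123456789abcdef0123456789abcdef")
```

Passing either request to `urllib.request.urlopen` would send it; that step is left out here because it requires a reachable cluster.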
For applications whose source code is not accessible, or that are based on traditional file-based protocols, Caringo offers the CAStor Content File Server (CFS), a Linux file system that supports all the major file protocols, including CIFS, NFS, FTP and WebDAV, as well as Mac clients, and that can also be run as a native Linux file system. CAStor CFS is not a classic file system. Rather, it is a thin mapping layer that looks like a file system to applications and speaks HTTP to CAStor. On the front end it presents a standard file system interface, and on the back end it delivers a vast flat address space, massive scalability, high performance, and reliability.
11. How is CAStor different from traditional file systems using block storage?
Unlike file systems that ride on top of block storage devices, CAStor provides a single, flat address space to store content and does not have the complexity of file hierarchies, folder names or physical disk locations associated with each file. It does not break a file up into blocks (bits and pieces) and then need to manage a number of individual blocks associated with each file. CAStor stores files on a whole object basis in contiguous disk space and only needs to manage a single UUID for each piece of content. That means CAStor CFS simply manages UUIDs giving it the ability to scale beyond the constraints traditional file systems encounter in terms of number of files and amount of capacity supported.
12. Does CAStor support custom metadata?
Yes, CAStor allows custom metadata to be defined by applications to uniquely describe content objects. CAStor stores all metadata with the actual content and persists it throughout the content's life cycle. Other metadata elements include the number of replicas to be maintained, retention period, content type, file name, originating application and others. CAStor also supports a special metadata element called the LifePoint that allows an application to describe how a file should be managed during its life cycle in CAStor.
13. Is encryption built in to CAStor?
No. Many excellent encryption products are available today, so CAStor does not encrypt data itself; it simply stores your encrypted data. Choose the encryption algorithm you like best, encrypt your content, store it in CAStor, and decrypt it when it comes out. CAStor will store whatever you put into it, including encrypted data.
14. Does CAStor provide index/search functionality?
Caringo is firmly on the side that believes index/search belongs at the application layer and not the storage layer. Most, if not all, content management products provide index/search as a feature and other stand-alone search engines can easily crawl a CAStor repository using CAStor CFS to index content and provide search functionality.
15. Does CAStor provide WORM storage?
Yes, CAStor can provide WORM storage if it is specified when content is written. Once specified, WORM content can never be deleted. CAStor can also manage content life cycle information automatically: one can specify that a file cannot be changed throughout its defined life and cannot be deleted until its retention period has expired. This addresses regulatory mandates such as SEC Rule 17a-4, which is the most stringent regulatory requirement defined for data storage. CAStor has been designed to deliver the assurance that content can be stored safely over extremely long periods of time.
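The retention rule (no changes after writing, no deletion before the retention period expires) can be sketched as a small policy check. The classes `RetainedStore` and `RetentionError` are hypothetical and model only the rule itself, not how CAStor enforces it.

```python
from datetime import datetime, timedelta, timezone


class RetentionError(Exception):
    """Raised when an operation would violate the retention policy."""


class RetainedStore:
    def __init__(self):
        self._objects = {}  # key -> (data, earliest allowed deletion time)

    def put(self, key, data, retention: timedelta, now: datetime):
        if key in self._objects:
            raise RetentionError("stored content cannot be overwritten")
        self._objects[key] = (data, now + retention)

    def delete(self, key, now: datetime):
        _, deletable_after = self._objects[key]
        if now < deletable_after:
            raise RetentionError("retention period has not expired")
        del self._objects[key]

    def __contains__(self, key):
        return key in self._objects


written = datetime(2024, 1, 1, tzinfo=timezone.utc)
archive = RetainedStore()
archive.put("report", b"q4 filing", timedelta(days=365), now=written)
try:
    archive.delete("report", now=written + timedelta(days=30))
    early_delete_blocked = False
except RetentionError:
    early_delete_blocked = True
archive.delete("report", now=written + timedelta(days=366))
```

The deletion attempted 30 days after writing is refused, while the one made after the 365-day retention period succeeds.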
16. How can CAStor reduce my total cost of ownership (TCO)?
CAStor has a dramatic effect on TCO components which is described in detail in the following information brief.
CAStor and the TCO Effect Information Brief
17. Does CAStor provide high performance for both small and large files?
For small files, CAStor performs approximately 700 writes/second in a 4-node cluster; sustained over 24 hours, that is about 60 million writes/day. The rate rises to 3000 writes/second in a 32-node cluster. Large file performance is equally high, with an 8-node cluster providing over 200MB/second for reads and writes and a 32-node cluster performing at over 1GB/second on writes and approximately 900MB/second on reads.
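Per-second write rates convert to daily totals with simple arithmetic, assuming the rate is sustained for a full 24 hours (an assumption; real workloads vary over the day). The function name below is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def writes_per_day(writes_per_second: int) -> int:
    """Daily total for a rate sustained over 24 hours."""
    return writes_per_second * SECONDS_PER_DAY


four_node_daily = writes_per_day(700)         # 60,480,000
thirty_two_node_daily = writes_per_day(3000)  # 259,200,000
```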
18. Can you run CAStor in a VMware environment?
No, this is not recommended or supported for CAStor given the known and ongoing NTP/clock speed issues. CAStor already provides a hardware virtualization layer for all the hardware/servers that are under control of CAStor. This way, CAStor can make sure that the clock speed is accurate and synchronized.
19. Can you run Content File Server (CFS) and Content Router (CR) in a VMware environment?
Yes. Because neither product has the same sensitivity to clock speed as CAStor, you can run both products under VMware or as virtual instances on a server.