Caringo: Fixed Content Storage

CAStor Content Storage Software
CAS: What is it and who needs it?  
CAS (Content Addressed Storage) was invented because the 30-year-old file systems that people use today are sorely outdated. Designed at a time when hard drives were measured in megabytes and file counts in the hundreds, file systems do not scale, suffer from poor performance and break at the most inconvenient times.

Paul Carpentier, CTO of Caringo, invented CAS in the 1990s at a company called FilePool. That company was sold to EMC in 2001 and became Centera, EMC's fastest-growing product line. EMC brilliantly positioned Centera as an archiving application so as not to erode their high-speed (and high-priced) SAN and NAS business. This was brilliant in two ways. First, they created a whole new market. Second, being positioned as an archive solution meant that companies would be handcuffed to Centera hardware for as long as their content had to exist. The CAS market is now a multibillion-dollar market thanks to EMC and other hardware vendors evangelizing and selling CAS technology.

Caringo looked at the current state of the market and realized that much more could be done with CAS. Hardware vendors were holding customers hostage to their proprietary hardware and those same customers were being locked out of the best features of a pure CAS system.

CAS presents a huge, single, flat address space. Think about that for a minute. What could you do if you had a storage space that never had to be reconfigured to add new disk drives? Applications would be free to access information anywhere, anytime. Hardware could expand the storage space with no effect on the applications or users. The flat address space would just grow and grow seamlessly. Scalability is one of a file system's biggest problems. CAS solves it.

What about performance? Lots of small files can bring a file system to its knees. Not in a properly designed CAS system. Performance is another dimension that can be improved by orders of magnitude. In fact, if one combines scalability with performance, one can imagine even greater performance due to additional entry points into the CAS store.

There is no reason why a well designed CAS system cannot run on commodity hardware today. In fact, the price/performance curve of commodity x86 based hardware plays to the benefit of the customer.

Caringo's CAStor runs on commodity hardware, so it is much more cost effective than proprietary hardware solutions. CAStor is also totally symmetrical. This means that each node in a cluster is independent. Each is an entry point into the cluster. As the cluster grows, the performance improves. As the cluster grows, recovery from a bad disk can occur faster than with RAID. It is simply a beautiful example of putting the best price/performance hardware to work for you with an elegant software implementation.

So, CAS can improve performance. CAS can scale massively in a linear fashion. But only CAStor can free you from the bonds of proprietary vendor hardware lock-in. But wait. There's more.

At Caringo, we looked at a number of elements that are torturing the industry. Performance, scalability and hardware independence are three of the biggest, but complexity, system management and maintenance, information integrity and disaster recovery also rank in the 'this keeps me awake at night' realm.

CAS to the rescue. The concept is simplicity itself, but the implementation is sheer magic. Traditional CAS (that is, the second generation after FilePool: Centera and others) assigns a hash value to each piece of content. This is the content's unique ID. If the hash is run again at read time and the same value is derived, one can be assured that the content has not changed. Until recently this was true. Nearly two years ago, the MD5 hashing algorithm was cracked in the labs. As we now realize, hashing algorithms of any kind grow stale over time as computers get more powerful and hackers get smarter. As a result, petabytes of information are at potential risk in hundreds of CAS sites.
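The hash-as-ID scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of Centera or any other product: the class name HashAddressedStore, the in-memory dictionary and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when stored bytes no longer match their content address."""


class HashAddressedStore:
    """Hypothetical in-memory store where the hash of the content is its ID."""

    def __init__(self, algorithm="sha256"):  # algorithm choice is an assumption
        self.algorithm = algorithm
        self._blobs = {}

    def _digest(self, data):
        return hashlib.new(self.algorithm, data).hexdigest()

    def put(self, data):
        # The address is derived from the content itself.
        address = self._digest(data)
        self._blobs[address] = data
        return address

    def get(self, address):
        data = self._blobs[address]
        # Re-run the hash at read time; a mismatch means the content changed.
        if self._digest(data) != address:
            raise IntegrityError(address)
        return data
```

Note the weakness the text points out: because the ID is the hash, the guarantee is only as strong as the algorithm, and replacing the algorithm would change every ID that applications have already recorded.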

CAStor separates the unique ID of content from the hashing algorithm. Due to this elegant indirection, the hashing algorithm can be changed at any time with no effect on the content or the applications accessing the CAS system. Simple yet devastatingly powerful. CAStor assures data integrity for the lifetime of your content. Even if that means hundreds of years.
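One way to picture this indirection is sketched below. It is an illustration under stated assumptions, not CAStor's actual design: the IndirectStore class, the use of a random UUID as the unique ID, and the SHA-256 to SHA-512 upgrade are all hypothetical choices made for the example.

```python
import hashlib
import uuid


class IndirectStore:
    """Hypothetical store whose object IDs are independent of the hash in use."""

    def __init__(self):
        # object_id -> {"data": bytes, "algorithm": str, "digest": str}
        self._records = {}

    def put(self, data, algorithm="sha256"):
        object_id = uuid.uuid4().hex  # assumption: the ID is not derived from the hash
        self._records[object_id] = {
            "data": data,
            "algorithm": algorithm,
            "digest": hashlib.new(algorithm, data).hexdigest(),
        }
        return object_id

    def verify(self, object_id):
        rec = self._records[object_id]
        return hashlib.new(rec["algorithm"], rec["data"]).hexdigest() == rec["digest"]

    def rehash(self, object_id, new_algorithm):
        # Check against the old digest first, then re-seal with the new algorithm.
        if not self.verify(object_id):
            raise ValueError("content failed verification; refusing to re-seal")
        rec = self._records[object_id]
        rec["algorithm"] = new_algorithm
        rec["digest"] = hashlib.new(new_algorithm, rec["data"]).hexdigest()

    def algorithm_of(self, object_id):
        return self._records[object_id]["algorithm"]
```

In this sketch an application that stored an object ID before the upgrade can keep using the same ID afterwards; only the integrity seal held by the store changes.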

When you store content on a CAStor cluster, the information is replicated (you can set the number of replicas down to the object/file level). With a minimum of two replicas, you are always assured continuous data availability. If a disk goes down, another replica still exists. All other disks in the cluster join in to replace the lost replicas while the system still runs at nearly full speed. Compare this to a RAID system. When a RAID disk goes down, the system slows to a crawl while it takes hours to rebuild the disk. In CAStor, recovery takes minutes, not hours, and your systems continue to serve the users.
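The recovery behavior described above can be modeled with a toy simulation. The Cluster class, its least-loaded placement rule and all method names are assumptions made for illustration; they are not CAStor's interfaces or its real placement algorithm.

```python
class Cluster:
    """Toy model: each node holds a set of object IDs; replica count is per object."""

    def __init__(self, node_names):
        self.nodes = {name: set() for name in node_names}
        self.policy = {}  # object_id -> desired number of replicas

    def _least_loaded(self, exclude_holders_of=None):
        # Assumed placement rule: prefer nodes holding the fewest objects.
        names = [n for n, objs in self.nodes.items()
                 if exclude_holders_of is None or exclude_holders_of not in objs]
        return sorted(names, key=lambda n: (len(self.nodes[n]), n))

    def store(self, object_id, replicas=2):
        targets = self._least_loaded()[:replicas]
        if len(targets) < replicas:
            raise ValueError("not enough nodes for the requested replica count")
        for name in targets:
            self.nodes[name].add(object_id)
        self.policy[object_id] = replicas

    def holders(self, object_id):
        return sorted(n for n, objs in self.nodes.items() if object_id in objs)

    def fail_node(self, name):
        # The failed node disappears; survivors restore the lost replicas.
        lost = self.nodes.pop(name)
        for object_id in sorted(lost):
            missing = self.policy[object_id] - len(self.holders(object_id))
            for target in self._least_loaded(exclude_holders_of=object_id)[:missing]:
                self.nodes[target].add(object_id)
```

Because the replacement replicas are spread over every surviving node rather than rebuilt onto a single spare disk, the repair work is shared across the cluster, which is the intuition behind the faster-than-RAID claim in the text.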

System management and recovery are two of the most expensive items in an IT shop today. We carefully designed CAStor to minimize system headaches. To start a CAStor node, simply plug in a Caringo USB key, power it up and 60 seconds later you have a working CAStor node. Plug a USB key into another node, power it up and 60 seconds later you have a two node cluster. It's that simple. Each node automatically configures, joins the cluster and begins work. There is no partitioning, no configuration, no management, no allocation, none of the usual tasks.

You can run CAStor on desktops, servers, rack mounts, any form factor. You can mix and match hardware and you never have to pre-buy lots more storage than you need today. Start small and grow as your business or applications demand. This strategy saves you a huge initial outlay for a large system you will grow into (as other vendors will say you need). Since you can add CAStor nodes at any time while the system is running, it means that chances are, nodes you add in six months will be faster and more cost effective than nodes you use today (remember, commodity hardware keeps getting faster and cheaper). The customer wins on every side with a CAStor cluster.

If a disk or a node goes bad, the system will recover itself and flag the disk or node as bad. You can replace it at a time of your own choosing. No rush. No emergency service call. In fact, you can even retire older hardware with a simple 'retire' command. The node will throw its contents out to the rest of the cluster and when it is finished, it will shred any data on its drives and take itself out of service. Remove the node (again, at YOUR convenience) and replace it with newer hardware.

What long-term storage system would be complete today without a disaster recovery plan? Again, CAS supplies a tremendous advantage here. Replicas in CAStor are identical and a cluster stores at least two. This means your content is always available. But what about a disaster? Caringo provides two strategies. The first is to duplicate the cluster in the customer's disaster recovery center and then set the primary cluster to write a third replica to the remote cluster. If the primary suffers a disaster, cutover to the duplicate can be seamless and immediate. Remember, all replicas of the content have the same unique ID. Since CAS is location independent, your applications never care about where the data is stored. The second alternative is to write a third replica to a shared CAStor cluster available over the internet for a monthly fee. The recovery scenario is the same (gated only by the speed of your network connection), at a lower overall cost.

If you're beginning to get the feeling that CAS can solve many of the storage industry's woes, you are starting to understand the true power of CAStor. Here's another one for you. Imagine that your company uses CAStor for content storage and you just bought another company that uses CAStor for content storage. To integrate the content stores together, simply network the two CAStor clusters together and the integration is immediate and complete.

We believe that CAS is the new age of simple yet powerful storage architecture for content storage. And we think that CAStor is the best implementation of hardware agnostic, high performance, scalable, cost effective CAS on the market. We're shipping today. Please contact us for more information at info@caringo.com.