DiVault, Archiving in the Cloud powered by Caringo

DiVault, an initiative of R5 Projects B.V. and EMID Consult B.V., has launched a new cloud-based digital archiving service powered by Caringo software. The new service uses the Caringo Object Storage Platform with commodity hardware to guarantee the continuity, safety and security of data for organizations looking for a better way to structure, organize, store, search and retrieve their growing volume of digital information.

Targeting those in the government, industry, education, and financial markets, the new cloud-based service allows customers to store large quantities of data in data centers throughout the Netherlands to cost-effectively archive and retrieve business-critical information.  Caringo software delivers a secure, multi-tenant system utilizing a single namespace that can scale to billions of objects and petabytes per location with superb performance and guaranteed data protection.

Three of the key benefits that Caringo provides are 1) the ability to keep the solution responsive regardless of the number of files/objects or file size (4KB to 4TB+), 2) vendor-neutral, plug-and-play scale-out with no provisioning and automated storage balancing, and 3) a symmetric architecture with automated data integrity checking, self-healing and rapid recovery that actually gets faster as the service grows.

So what does this mean for cloud storage service management staff? Well, it means that one IT administrator can manage over 10 PB of storage AND they can easily keep up with hardware power consumption and capacity improvements. We are proud to add DiVault to our expanding list of customers and look forward to their continued success.

Caringo Cloud Architecture Helps Dutch Real Estate Association Provide Reliable, Scalable and Cost-Effective Rich Media Storage

Nederlandse Vereniging van Makelaars (NVM), or in English the Dutch Association of Real Estate Agents, is the largest association of real estate agents and experts in the Netherlands. Founded in 1898, NVM promotes the interests of its 3,000 members among consumers and policy makers in the Netherlands. One of the tools it offers is an online environment called Tiara, an extensive database of images and real estate transactions from 1985 to the present. Tiara enables NVM members and other real estate parties, like the real estate listing and advertisement site Funda.nl, to access complete real estate information for analysis or sales purposes. Since 2004, development, management and maintenance of Tiara have been provided by Mirabeau, a full-service agency that helps the largest companies in Holland with Internet strategy, concept design and maintenance.

The Challenge

Originally, storage for Tiara was delivered through two IP-SANs — one master that replicated to the second for redundancy and DR — hosted at a large global service provider. In 2011, the NVM changed its policy from keeping one image for historical records to keeping all images. Available disk space shrank rapidly. In addition, there was a reliability issue with the SANs where images were not available to agents for an entire day. It was clear that a more scalable, cost-effective and reliable solution was needed.

The Requirements

NVM tasked Mirabeau with finding a storage solution that delivers cost-effective scalability, high availability and reliable performance. The solution needed to provide storage and access to the existing 25 million images and be able to support an additional 30 thousand images per day. Mirabeau looked at several solutions, including multiple file systems, SQL Server, MongoDB and Jackrabbit. However, only the Caringo Object Storage Platform fulfilled all of the requirements, providing a “nice, clean good architecture” in a storage solution that was easy to evaluate and integrate.

The Solution

As part of the complete solution, Mirabeau developed Fleuron, an interface layer to Tiara that is responsible for the processing, transcoding, metering, distribution and storage of various kinds of media files. Fleuron stores all current and historical files on Caringo CAStor. A two-cluster system is deployed with two replicas of each file in the primary cluster and a third replica sent to the secondary cluster for DR purposes. Caringo Content Router automatically manages all replication. Media files in various formats are then made available to applications within the NVM and to a growing number of applications of external real estate parties.
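The replica layout described above (two copies in the primary cluster, one in the secondary) can be pictured with a small model. This is only an illustrative sketch: the node names and the `place_replicas` function are hypothetical, and rendezvous hashing is used purely as a stand-in, since the article does not say how CAStor or Content Router actually chooses nodes.

```python
import hashlib


def rank_nodes(object_id: str, nodes: list[str]) -> list[str]:
    """Order nodes by a per-object hash score (rendezvous hashing)."""
    def score(node: str) -> str:
        return hashlib.sha256(f"{object_id}:{node}".encode()).hexdigest()
    return sorted(nodes, key=score, reverse=True)


def place_replicas(object_id: str,
                   primary_nodes: list[str],
                   secondary_nodes: list[str]) -> dict[str, list[str]]:
    """Pick two distinct primary-cluster nodes and one secondary-cluster node.

    Hypothetical policy for illustration only, not the CAStor algorithm.
    """
    if len(primary_nodes) < 2 or not secondary_nodes:
        raise ValueError("need at least 2 primary nodes and 1 secondary node")
    return {
        "primary": rank_nodes(object_id, primary_nodes)[:2],
        "secondary": rank_nodes(object_id, secondary_nodes)[:1],
    }


if __name__ == "__main__":
    layout = place_replicas("house-123/front.jpg",
                            ["p1", "p2", "p3", "p4"], ["s1", "s2"])
    print(layout)
```

Because the choice depends only on the object identifier and the node list, any client computes the same layout without a central lookup table, which is one common reason to use this kind of hashing in a model like this.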

The Result

By using the Caringo Object Storage Platform, Mirabeau was able to offer NVM a cost-effective storage solution that can easily scale utilizing standard x86 server hardware with any size hard drives, providing a unified storage system that can scale to billions of files and hundreds of petabytes per location. NVM can now increase the number of images stored per house, including support for high-resolution images for print or higher-quality transcoding. In addition, NVM can now expand its business by offering real estate organizations with outdated libraries a central database that contains all relevant information, not just images.

Caringo Used as Cloud Infrastructure for Medical Record Cloud

Our friends at The Register posted a great article today giving an overview of how Caringo, VMware and Intel are playing a key role in what will become the central, secure storage repository of digital records for 330 million people as part of a medical imaging cloud for patients across the United States. The solution is a project developed through a partnership between Johns Hopkins University and Harris Corporation called Peake Healthcare Innovations.

The medical record cloud, called PeakeSecure, uses our object storage platform as cloud storage infrastructure to automatically protect hundreds of millions of files, each with 3 replicas. When dealing with an archive of this size, drive failure is assured, but by using replicas our software can reliably protect billions of files and hundreds of petabytes of records. This simply isn’t possible with RAID 5 or 6 because of performance issues associated with striping and rebuild times that increase as capacity increases.
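To see why rebuild time grows with capacity: a rebuild has to rewrite at least the full capacity of the replaced drive, so drive size divided by sustained write speed gives a lower bound. The sketch below is a back-of-the-envelope calculation only; the 100 MB/s throughput and the drive sizes are illustrative assumptions, not figures from the article, and real rebuilds under load take longer.

```python
def min_rebuild_hours(drive_tb: float, write_mb_per_s: float) -> float:
    """Lower bound on rebuild time: full drive capacity / sustained write rate.

    Uses decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).
    """
    if drive_tb <= 0 or write_mb_per_s <= 0:
        raise ValueError("capacity and throughput must be positive")
    seconds = (drive_tb * 1e12) / (write_mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed sustained write rate; purely illustrative.
    for size_tb in (1, 2, 4):
        hours = min_rebuild_hours(size_tb, 100)
        print(f"{size_tb} TB drive: at least {hours:.1f} h")
```

At the assumed 100 MB/s, a 1 TB drive needs roughly 2.8 hours at best and a 4 TB drive four times as long, and the array runs with reduced protection for that whole window.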

PeakeSecure also uses our adaptive power consumption technology (Darkive), which allows the CPU and the disks within the array to spin up or down as access patterns to data change, resulting in huge savings on operational and energy costs. Darkive is especially useful for cost savings in the medical industry, where records need to be stored for a minimum of 7 years and are rarely accessed after a patient is treated. We have some customers that store children’s medical records and plan to keep them indefinitely. Imagine the cost if those disks kept spinning forever.

The solution has already been successfully tested in a private cloud utilized by two Johns Hopkins hospitals with a full version rolling out to its university hospital system in March.  The public cloud version is scheduled to be completed later this year.

What the Storage Industry Can Learn From Volvo

In the fifties and early sixties, year over year automobile progress was being written in terms of front grille chrome and whitewall tires. Menial topics such as occupant safety were off-limits, unmentionable. You simply didn’t want to risk scaring off potential buyers of your latest shiny gas guzzler. Until one company did and started talking up safety in its marketing and advertising. Volvo. It mentioned the unmentionable and in doing so totally transformed the market as well as the perception of the automobile. That still reflects on the Volvo brand to this day.

I often think we may need a similar shock to happen in the storage industry. No storage vendor will ever spontaneously mention data loss, even though in terms of reality it’s right up there with death and taxes. It will happen, period. There are just two things about it that we can influence to a certain extent: probability and size, which are simply two sides of the same coin.

Now strangely enough, people really only seem concerned with the former of the two: probability of “something” happening. As in the question: “In your storage system, how many simultaneous disk failures can you afford before you sustain any data loss?” An interesting question indeed, as it totally disregards the other question of **exactly how much** data you lose when you do.

For example, take a file system on a RAID configuration consisting of 5 drives plus 1 parity drive. Can take any single disk failure without data loss, with just 20% overhead. Nice. Except: lose another drive during recovery and you just lost everything. 100%. Bet you knew already.

Now consider this: use the same drives in a CAStor object storage cluster configuration, and specify two replicas for each object stored. Contrary to popular belief, you’ll be able to store about the same (!) net amount of data (because of CAStor’s avoidance of filesystem related overhead at multiple levels – details in our white paper at http://goo.gl/4F9Bb). Lose a drive. No sweat, no data loss. Lose a second drive during recovery. Oops. Data loss. But how much? At most 6.7% of the objects. 3.3% on average (*). Whole objects, not fragments. While the integrity of the remaining objects remains unaffected and guaranteed.

Summarizing: same disks, same events. First question: data loss? Yes, in both cases. Second question: how much? 100% in one, 3.3% in the other.

So now, how many disks can you afford to lose? Make sure you ask the second question.

– Paul

(*) It’s easy to compute: 2 replicas of each object, spread over 6 drives, with replicas of the same object always on different drives. Have one drive fail. That drive held one sixth of all replicas, and since every object has two replicas, one third of the objects are now at risk, i.e., down to a single replica. Now have another drive fail: one fifth of the aforementioned objects will be affected and have no replicas left, a data loss of one fifth of a third, i.e., 6.67%. That latter value is a maximum that is only valid if both failures happen at exactly the same moment. If they don’t, some recovery will already have taken place. That sliding window effect reduces the average loss to half of the maximum, i.e., 3.33%.

BTW, if computing data loss probabilities triggers that funny feeling in your stomach, please remember that CAStor will gladly maintain 3, 4 or more replicas, selectable on a per object basis and automatically varying over the life cycle of the object. Hey, your choice.
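A loss fraction like the one in the footnote can be computed by enumeration. The sketch below assumes replicas are placed uniformly on distinct drives (the article does not describe CAStor's actual placement) and counts whole objects, so an object is lost only when every drive holding one of its replicas has failed at the same moment. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def loss_fraction(drives: int, replicas: int, failed: int) -> Fraction:
    """Closed form: share of replica placements that fit inside the failed set."""
    return Fraction(comb(failed, replicas), comb(drives, replicas))


def loss_fraction_enumerated(drives: int, replicas: int, failed: int) -> Fraction:
    """Same quantity by brute force, with one object per possible drive set."""
    placements = list(combinations(range(drives), replicas))
    dead = set(range(failed))  # by symmetry it does not matter which drives fail
    lost = sum(1 for placement in placements if set(placement) <= dead)
    return Fraction(lost, len(placements))


if __name__ == "__main__":
    for replicas in (2, 3):
        worst = loss_fraction(6, replicas, 2)
        assert worst == loss_fraction_enumerated(6, replicas, 2)
        print(f"{replicas} replicas, 2 of 6 drives failed at once: "
              f"{worst} of objects lost ({float(worst):.2%})")
```

For 6 drives and 2 replicas, two simultaneous failures lose 1/15 of the objects; with 3 replicas per object, the same two failures lose none. The calculation covers only the simultaneous worst case and does not model recovery that happens between failures.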

5 reasons Object Storage will take over the cloud in 2012

1. Object Storage Will Bring Cloud Economics to Any Organization in 2012

A key to the affordability of cloud storage is starting with commodity components in a redundant, multi-tenant architecture with highly efficient disk utilization. These benefits are driven by object storage infrastructure, although most assume they are due to economies of scale and proprietary development. However, advancements in object storage, including significant progress in standardization of interfaces and a maturing ISV ecosystem, are making it one of the best options for organizations looking to cost-effectively store their unstructured data.

2. The Relentless Growth of Data Will Expose the Limitations of SAN and NAS
The complexity and cost of traditional NAS and SAN storage arrays will become increasingly prohibitive, ruling them out as a viable long-term storage option for unstructured data. Information created by organizations will need to be stored and accessible for indefinite periods of time, driven by the potential for re-use and by regulatory compliance requirements. Object storage will be one of the few ways that organizations can easily and cost-effectively store massive libraries of information that are instantly accessible.

3. Object Storage Will Enable the Adoption of Hybrid and Private Cloud
Organizations will increase their adoption of hybrid and private cloud storage solutions as IT departments look to solve management and cost issues associated with data growth. The move to hybrid and private cloud solutions will be based primarily on the ability to guarantee the security and integrity of their information while still benefiting from cloud economics.

4. Object Storage Will Help with File Count in Addition to Storage Capacity
In addition to an increase in capacity, organizations are also seeing an increase in the number of files, driven by Big Data applications, research equipment, and continued optimization of web-delivered information. These files range from a few KBs to tens of TBs and are exposing the limitations of file systems. As the number of files increases, file systems become less responsive, requiring IT to purchase new storage systems to increase performance and responsiveness rather than capacity. Organizations will turn to object storage to provide a flat and highly efficient address space with no limit in file count or capacity.

5. Data Growth Will Prohibit Backups
As data sets grow, backup windows get longer and longer until they ultimately become unmanageable and IT must decide what data to back up or, even worse, not back up at all. Organizations will turn to object storage as they realize that they can use file replication and integrated self-healing, self-optimization and metadata-driven data lifecycle management to eliminate backups altogether.