Object storage momentum grows with Dell DX wins

For many of us in the object storage market, 2012 felt like a tipping point. True object storage, rather than file-system-based storage, is the only way that organizations can handle the volume, velocity, and variability of their data sets. Some refer to this as Big Data, but these characteristics are increasingly commonplace across all unstructured enterprise data sets. We could get into storage growth curves and stats to show you why object storage is needed, but nothing beats highlighting real-world use cases and recent wins. To that point, I wanted to direct you to some wins and write-ups for the Dell DX, one of the most popular object storage solutions on the market (which just happens to be powered by Caringo).

 

The Army and Navy will use the DX as part of their next-generation healthcare solution, which will provide an on-premises, vendor-neutral archiving system to improve the quality of care for military families – read more here.

 

Memorial Hospital of Union County has been using the DX for its long-term picture archiving and communication system (PACS) for over a year with 100% availability! – download the use case here.

ViaLink, a digital signature company, is using the DX to automate its storage infrastructure, giving it greater data protection and agility and enabling it to offer more value-added services – download the use case here.

 

The University of Illinois was awarded a grant to integrate the Dell DX with Fedora Commons, a popular open source architecture for storing, managing and accessing digital content that is used by many libraries, museums and cultural archives – read more here.

Steve Jobs – an Object Storage Champion?

The recent launch of Mountain Lion, updates to iCloud and the continued proliferation of iOS devices are giving everyone a view of what a file system-less world will look like (and it looks pretty darn good to us at Caringo). Of particular interest is what Steve Jobs had to say all the way back in 2005 and how, with all of the recent software and hardware launches, Apple is rapidly removing traditional file system conventions from its product line. Why? It can be summarized in two words: simplicity and portability. The subject was discussed a bit further in Fortune Mag’s blog – Steve Jobs: “Why is the file system the face of the OS?”. Jobs’s exact words were:

in every user interface study we’ve ever done […], [we found] it’s pretty easy to learn how to use these things ’til you hit the file system and then the learning curve goes vertical. So you ask yourself, why is the file system the face of the OS? Wouldn’t it be better if there was a better way to find stuff?

What Jobs was really highlighting is the way files are stored and addressed. In a traditional file system you store a file on a device, then on a volume (C:, D:…), then in a directory or a bunch of directories, and finally under a name and file type.

File systems = complexity

What goes on behind the scenes (in a corporate filer) is even more complex, with each file being split into thousands of blocks, each addressed through an inode and then stored in different places on a disk. It is all this complexity that ultimately leads to what Jobs described as the learning curve going “vertical”. Users need to remember where their files are, what they are called, and to back them up, and in a corporate world it gets even more complicated. But the real issue is that file systems have limitations, and in today’s “cloudy” world they lock files into a single physical location, making data portability extremely difficult.

File systems don’t = portability

The way that file systems address content also locks it into a particular location, making it very difficult to manage content portability across devices and platforms. Think about the opposite action: retrieving a file through a file system. You need to know the entire path (\\server\volume\directory..\filename.ext). Again, what goes on behind the scenes is even more complex. Every inode is looked up, the blocks that represent the file are all found on the disk and stitched back together, and the file is delivered. Now think about this for thousands, millions or billions of files on different devices, all trying to stay synchronized. It is just way too complex to do in a practical fashion.

So… what is Apple doing?

What Apple is doing is leveraging a key/value method of storage and addressing, the same method used in most object storage solutions. It assigns a unique key to each piece of content, which is stored together with metadata that describes what the content is and how it should be managed. Everything can then be stored in one big pool of storage, and any piece of content can be accessed with only that one key. So the application doesn’t need to know the exact location at all times and doesn’t need to stitch everything together. This also dramatically improves storage management and access. Instead of trying to manage a long directory path and thousands of inode addresses, applications only need to manage a single key and its metadata, which can all be synced to the same repository (like iCloud).
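
To make the key/value idea concrete, here is a minimal sketch in Python. It is a toy of our own for illustration, not Apple's or Caringo's implementation: the ObjectStore class, its method names and the random-UUID key scheme are all assumptions.

```python
import uuid


class ObjectStore:
    """Toy flat key/value object store (illustrative only, hypothetical API)."""

    def __init__(self):
        # One flat pool: key -> (content bytes, metadata dict).
        # There are no volumes, directories or paths.
        self._pool = {}

    def put(self, content, **metadata):
        """Store content together with its metadata and return the new key."""
        key = uuid.uuid4().hex  # assumed key scheme; real systems differ
        self._pool[key] = (bytes(content), dict(metadata))
        return key

    def get(self, key):
        """Return (content, metadata) using nothing but the key."""
        return self._pool[key]

    def find(self, **criteria):
        """Return the keys whose metadata matches every given criterion."""
        return [
            key
            for key, (_, meta) in self._pool.items()
            if all(meta.get(name) == value for name, value in criteria.items())
        ]


store = ObjectStore()
key = store.put(b"holiday photo bytes", content_type="image/jpeg", owner="alice")
content, meta = store.get(key)
```

The application holds on to a single key; where the bytes physically live is the store's problem, and the metadata travels with the content.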

User-friendly storage (for end users and admins)

We leverage these principles in the Object Storage Platform and, like Apple, think storage should be simplified to the extent that IT administrators don’t have to think about it and are free to focus on what is core to their business. So we agree with Jobs’s comments and want to extend them a bit: ‘why is the file system the face of the OS, and why is it the face of storage?’ There is an easier and better way to store content, one that you are most likely already benefiting from today, and it is easier than you think to set up and use throughout your organization.

Redefining Elasticity

You may have noticed the market buzz around erasure coding in object storage lately, essentially presenting sliced objects as the best thing since sliced bread.

Erasure coding (EC), sometimes also referred to as Reed-Solomon encoding after one widely used family of codes, is a RAID-like technique using Forward Error Correction codes where you slice up an object into, say, 5 chunks and generate 2 additional parity chunks from those. Then you store all 7 chunks on different devices to protect the content: even if devices fail, as long as you have any 5 of the 7 stored chunks, you can regenerate the full set. In the above case you would be able to sustain 2 device failures without data loss, while only incurring 40% disk footprint overhead.
The interest is clear: save disk space, data center footprint, power, cooling. Additionally, by increasing the number of parity slices, like in a 10 + 6 scheme for example, the level of protection against device failures can be raised at will, way beyond what replication based schemes can offer.
So, what do you think: a free lunch? Well, as usual, not really; or at least, not always. If you cut up an object into a number of slices, you increase the object count in your repository accordingly, and that comes with overhead, two kinds of it:
1. it increases the index size: a RAM based index (like in Caringo’s CAStor Object Storage SW) will need more RAM, while an SSD or hard disk based index will be even slower than it already was before.
2. minimum object size overhead: CAStor doesn’t use a file system to park its objects and can store on tiny 1KB boundaries, but some file system based offerings out there come with 64 or even 128KB minimum object sizes! With the 10+6 erasure code scheme, that means 16 slices of at least 64KB each, i.e. a full megabyte or more of footprint regardless of object size. It’s quite obvious that EC simply does not apply to small objects, especially when riding on/ridden by file systems.
In addition, to achieve its protection potential, EC needs to have a very large number of devices at its disposal in the cluster to reduce data loss risks associated with simultaneous failures. So it cannot effectively support smaller cluster sizes without compromising protection either. Which brings us to a point that one-trick-pony EC object storage companies keep very mum about: the mechanism may bring benefits in large file, large cluster use cases, but it isn’t nearly as versatile as replication.
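
The arithmetic behind both figures is easy to check. Below is a rough sketch; the function names are ours, and the model (every slice occupying at least one minimum allocation unit) is a simplifying assumption rather than a description of any particular product.

```python
import math


def ec_overhead(data_slices, parity_slices):
    """Extra raw capacity as a fraction of the protected data (0.4 means 40%)."""
    return parity_slices / data_slices


def ec_footprint_kb(object_kb, data_slices, parity_slices, min_slice_kb):
    """Disk footprint of one erasure-coded object, in KB.

    Assumes each slice occupies at least min_slice_kb on disk.
    """
    slice_kb = math.ceil(object_kb / data_slices)
    stored_slice_kb = max(slice_kb, min_slice_kb)
    return stored_slice_kb * (data_slices + parity_slices)


print(ec_overhead(5, 2))               # 0.4, the 40% of the 5+2 example
print(ec_footprint_kb(10, 10, 6, 64))  # 1024 KB for a 10KB object on 64KB minimums
print(ec_footprint_kb(10, 10, 6, 1))   # 16 KB when storing on 1KB boundaries
```

In this model a 10KB object in a 10+6 scheme costs a full megabyte with 64KB minimum sizes, against 16KB on 1KB boundaries.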
For CAStor, we simply would not even think of sacrificing our wide application range, so we only introduced erasure coding once we saw a way of integrating it with our flexible replication and fast recovery capabilities. Which does make CAStor 6 with Elastic Content Protection the only object store in the market that effectively combines the strengths of both approaches while compensating for their mutual drawbacks. CAStor was already able to specify individual replication schemes (2, 3, 4, …) as lifepoint metadata on a per cluster, per domain, per bucket or even per object basis; now we have added erasure codes to that scheme as if they were just another form of replication (5+2, 7+3, 10+6,…). A settable threshold (e.g. 1 Mbyte) specifies the automatic boundary below which replication will be used, and above which erasure coding. All of this inside the same infrastructure.
Last but not least, our EC implementation brings another major exclusive: fast, active recovery of segments lost in device failure. Where other EC offerings tend to rely on passive background or read based discovery and repair of EC sets, CAStor leverages its “turbo” Fast Volume Recovery scheme as used in its replication; EC sets are back to spec minutes after device failure, rather than hours, days or longer. This does make a huge difference in protection level, of course, and it means net footprint savings as less redundancy may still guarantee a similar SLA.
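
The size threshold can be pictured as a simple per-object decision. The sketch below is purely illustrative and is not CAStor's API or lifepoint syntax: the function name, the defaults, and the choice to erasure-code objects exactly at the threshold are our assumptions.

```python
ONE_MBYTE = 1024 * 1024  # assumed interpretation of "1 Mbyte"


def choose_protection(object_bytes, threshold_bytes=ONE_MBYTE,
                      replicas=2, ec_scheme=(5, 2)):
    """Pick a protection scheme for one object based on its size.

    Below the threshold the object is replicated; at or above it the object
    is erasure coded (the behavior exactly at the boundary is an assumption).
    """
    if object_bytes < threshold_bytes:
        return ("replicate", replicas)
    return ("erasure_code", ec_scheme)


small = choose_protection(4 * 1024)          # a 4KB thumbnail
large = choose_protection(50 * 1024 * 1024)  # a 50MB video
```

Here the thumbnail ends up as two replicas while the video is stored as a 5+2 erasure-coded set, both in the same pool.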
Yes, I agree, object storage is becoming intricate stuff to keep track of. But it’s worthwhile to study it, as it is here to stay. And it pays to get into the details. Literally.

DiVault, Archiving in the Cloud powered by Caringo

DiVault, an initiative of R5 Projects B.V. and EMID Consult B.V., has launched a new digital archiving in the cloud service powered by Caringo software.  The new service uses the Caringo Object Storage Platform with commodity hardware to guarantee the continuity, safety and security of data for organizations looking for a better way to structure, organize, store, search and retrieve their growing volume of digital information.

Targeting those in the government, industry, education, and financial markets, the new cloud-based service allows customers to store large quantities of data in data centers throughout the Netherlands to cost-effectively archive and retrieve business-critical information.  Caringo software delivers a secure, multi-tenant system utilizing a single namespace that can scale to billions of objects and petabytes per location with superb performance and guaranteed data protection.

Three of the key benefits that Caringo provides are 1) the ability to keep the solution responsive regardless of the number of files/objects or file size (4KB to 4TB+), 2) vendor neutral, plug and play scale out with no provisioning and automated storage balancing, and 3) a symmetric architecture with automated data integrity checking, self healing and rapid recovery that actually gets faster as the service grows.

So what does this mean for cloud storage service management staff? Well, it means that one IT administrator can manage over 10 PB of storage AND they can easily keep up with hardware power consumption and capacity improvements. We are proud to add DiVault to our expanding list of customers and look forward to their continued success.

Caringo Cloud Architecture Helps Dutch Real Estate Association Provide Reliable, Scalable and Cost-Effective Rich Media Storage

Nederlandse Vereniging van Makelaars (NVM), or in English the Dutch Association of Real Estate Agents, is the largest association of real estate agents and experts in the Netherlands. Founded in 1898, NVM promotes the interests of its 3,000 members among consumers and policy makers in the Netherlands. One of the tools it offers is an online environment called Tiara, an extensive database of images and real estate transactions from 1985 to the present. Tiara enables NVM members and other real estate parties, like the real estate listing and advertisement site Funda.nl, to access complete real estate information for analysis or sales purposes. Since 2004, development, management and maintenance of Tiara have been provided by Mirabeau, a full-service agency that helps the largest companies in Holland with Internet strategy, concept design and maintenance.

The Challenge

Originally, storage for Tiara was delivered through two IP-SANs — one master that replicated to the second for redundancy and DR — hosted at a large global service provider. In 2011, the NVM changed its policy from keeping one image for historical records to keeping all images. Available disk space shrank rapidly. In addition, there was a reliability issue with the SANs where images were not available to agents for an entire day. It was clear that a more scalable, cost-effective and reliable solution was needed.

The Requirements

NVM tasked Mirabeau with finding a storage solution that delivers cost-effective scalability, high availability and reliable performance. The solution needed to provide storage and access to the existing 25 million images and be able to support an additional 30 thousand images per day. Mirabeau looked at several solutions including multiple file systems, SQL Server, MongoDB and Jackrabbit. However, only the Caringo Object Storage Platform fulfilled all of the requirements, providing a “nice, clean good architecture” in a storage solution that was easy to evaluate and integrate.

The Solution

As part of the complete solution, Mirabeau developed Fleuron, an interface layer to Tiara that is responsible for the processing, transcoding, metering, distribution and storage of various kinds of media files. Fleuron stores all current and historical files on Caringo CAStor. A two-cluster system is deployed with two replicas of each file in the primary cluster and a third replica sent to the secondary cluster for DR purposes. Caringo Content Router automatically manages all replication. Media files in various formats are then made available to applications within the NVM and to a growing number of applications of external real estate parties.

The Result

By using the Caringo Object Storage Platform, Mirabeau was able to offer NVM a cost-effective storage solution that can easily scale utilizing standard x86 server hardware with any size hard drives, providing a unified storage system that can scale to billions of files and hundreds of petabytes per location. NVM can now increase the number of images stored per house, including support for high-resolution images for print or higher-quality transcoding. In addition, NVM can now expand its business by providing real estate organizations that have outdated libraries with a central database containing all relevant information, not just images.

Caringo Used as Cloud Infrastructure for Medical Record Cloud

Our friends at The Register posted a great article today giving an overview of how Caringo, VMware and Intel are playing a key role in what will become the central, secure storage repository of digital records for 330 million people as part of a medical imaging cloud for patients across the United States. The solution is a project developed through a partnership between Johns Hopkins University and Harris Corporation called Peake Healthcare Innovations.

The medical record cloud, called PeakeSecure, uses our object storage platform as cloud storage infrastructure to automatically protect hundreds of millions of files, each with 3 replicas. When dealing with an archive of this size, drive failure is assured, but by using replicas our software can reliably protect billions of files and hundreds of petabytes of records. This simply isn’t possible with RAID 5 or 6 because of performance issues associated with striping and rebuild times that increase as capacity increases.

PeakeSecure also uses our adaptive power consumption technology (Darkive), which allows the CPU and the disks within the array to spin up or down as access patterns to data change, resulting in huge savings on operational and energy costs. Darkive is especially useful for cost savings in the medical industry, where records need to be stored for a minimum of 7 years and are rarely accessed after a patient is treated. We have some customers that store children’s medical records and plan to keep them indefinitely – imagine the cost if those disks kept spinning forever.

The solution has already been successfully tested in a private cloud utilized by two Johns Hopkins hospitals with a full version rolling out to its university hospital system in March.  The public cloud version is scheduled to be completed later this year.

What the Storage Industry Can Learn From Volvo

In the fifties and early sixties, year over year automobile progress was being written in terms of front grille chrome and whitewall tires. Menial topics such as occupant safety were off-limits, unmentionable. You simply didn’t want to risk scaring off potential buyers of your latest shiny gas guzzler. Until one company did and started talking up safety in its marketing and advertising. Volvo. It mentioned the unmentionable and in doing so totally transformed the market as well as the perception of the automobile. That still reflects on the Volvo brand to this day.
I often think we may need a similar shock to happen in the storage industry. No storage vendor will ever spontaneously mention data loss, even though in terms of reality it’s right up there with death and taxes. It will happen, period. There are just two things about it that we can influence to a certain extent: probability and size, which are simply two sides of the same coin.

Now strangely enough, people really only seem concerned with the former of the two: probability of “something” happening. As in the question: “in your storage system, how many simultaneous disk failures can you afford before you sustain any data loss?”. An interesting question indeed, as it totally disregards the other question of **exactly how much** data you lose when you do.

For example, take a file system on a RAID configuration consisting of 5 drives plus 1 parity drive. Can take any single disk failure without data loss, with just 20% overhead. Nice. Except: lose another drive during recovery and you just lost everything. 100%. Bet you knew already.

Now consider this: use the same drives in a CAStor object storage cluster configuration, and specify two replicas for each object stored. Contrary to popular belief, you’ll be able to store about the same (!) net amount of data (because of CAStor’s avoidance of filesystem related overhead at multiple levels – details in our white paper at http://goo.gl/4F9Bb). Lose a drive. No sweat, no data loss. Lose a second drive during recovery. Oops. Data loss. But how much? Less than 3.3% of the objects. 1.7% on average (*). Whole objects, not fragments. While the integrity of the remaining objects remains unaffected and guaranteed.

Summarizing: same disks, same events. First question: data loss? Yes, in both cases. Second question: how much? 100% in one, 1.7% in the other.

So now, how many disks can you afford to lose? Make sure you ask the second question.

– Paul

(*) It’s easy to compute: 2 replicas of each object, spread over 6 drives, with replicas of the same object always on different drives. Have one drive fail. Now one sixth of the objects are at risk i.e., without replica. Now have another drive fail: one fifth of the aforementioned objects will be affected and have no replicas left – a data loss of one fifth of a sixth i.e., 3.33%. That latter value is a maximum that is only valid if both failures happen at exactly the same moment. If they don’t, some recovery will already have taken place. That sliding window effect reduces the average loss to half of the maximum i.e. 1.67%.
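
For those who prefer to see the footnote as code, here is a small sketch of the same calculation. The function name is ours, and the even spread of replicas across drives is the same simplifying assumption the footnote makes.

```python
from fractions import Fraction


def max_loss_two_replicas(drives):
    """Worst-case fraction of objects lost when two drives fail at the same moment.

    Assumes 2 replicas per object, spread evenly, never both on the same drive.
    """
    at_risk = Fraction(1, drives)         # objects that had a replica on the first failed drive
    second_hit = Fraction(1, drives - 1)  # share of those whose other replica sits on the second
    return at_risk * second_hit


worst = max_loss_two_replicas(6)
average = worst / 2  # the footnote's sliding-window estimate
print(f"worst case: {float(worst):.2%}, average: {float(average):.2%}")
# prints "worst case: 3.33%, average: 1.67%"
```

Under the same assumptions, adding drives shrinks the exposure further: with 12 drives the worst case drops to 1/132, well under 1% of the objects.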

BTW, if computing data loss probabilities triggers that funny feeling in your stomach, please remember that CAStor will gladly maintain 3, 4 or more replicas, selectable on a per object basis and automatically varying over the life cycle of the object. Hey, your choice.

5 reasons Object Storage will take over the cloud in 2012

1. Object Storage will bring Cloud economics to any organization in 2012

A key to the affordability of cloud storage is to begin with commodity components in a redundant, multi-tenant architecture with highly efficient disk utilization. These benefits are driven by object storage infrastructure. Most assume they are due to economies of scale and proprietary development; however, advancements in object storage, including significant progress in the standardization of interfaces and a maturing ISV ecosystem, are making object storage one of the best options for organizations looking to cost-effectively store their unstructured data.

2. The Relentless Growth of Data Will Expose the Limitations of SAN and NAS
The complexity and cost of traditional NAS and SAN storage arrays will become increasingly prohibitive, ruling them out as a viable long-term storage option for unstructured data. Information created by organizations will need to be stored and accessible for indefinite periods of time, driven by the potential for re-use and by regulatory compliance. Object storage will be one of the few ways that organizations can easily and cost-effectively store massive libraries of information that are instantly accessible.

3. Object Storage will enable the Adoption of Hybrid and Private Cloud 
Organizations will increase their adoption of hybrid and private cloud storage solutions as IT departments look to solve management and cost issues associated with data growth. The move to hybrid and private cloud solutions will be based primarily on the ability to guarantee the security and integrity of their information while still benefiting from cloud economics.

4. Object Storage Will Help with File Count in Addition to Storage Capacity
In addition to an increase in capacity, organizations are also seeing an increase in the number of files, driven by Big Data applications, research equipment, and continued optimization of web-delivered information. These files range from a few KBs to tens of TBs and are exposing the limitations of file systems. As the number of files increases, file systems become less responsive, requiring IT to purchase new storage systems to increase performance and responsiveness rather than capacity. Organizations will turn to object storage to provide a flat and highly efficient address space with no limit on file count or capacity.

5. Data Growth Will Prohibit Backups
As data sets grow, backup windows get longer and longer until they ultimately become unmanageable and IT must decide which data to back up or, even worse, not back up at all. Organizations will turn to object storage as they realize that they can use file replication and integrated self-healing, self-optimization and metadata-driven data lifecycle management to eliminate backups altogether.