
Advances in research instrumentation such as gene sequencers and mass spectrometers, combined with increases in image resolution, are generating tremendous amounts of data. Life sciences and bioinformatics organizations often have to slow their research to ensure that data can be stored, accessed and preserved at the rate at which it is created. The resulting inefficiencies lengthen research times, can delay clinical trials or a product launch, and can put you at a competitive disadvantage. Caringo object storage software, powered by CAStor, solves these issues by combining highly automated storage with plug-and-play capacity and performance in a single solution that you don't need a PhD to manage.

Highly Automated Storage

  • Massively scalable capacity to meet any demand — Scale by simply adding one or more nodes to the cluster. The additional capacity is available automatically, with no provisioning required by an administrator. Minimal administrative overhead makes it easy to grow storage to petabytes of capacity in a single cluster.
  • Simple, adaptive, symmetric software — CAStor uses symmetric clusters to distribute processing evenly across all nodes. Adaptive algorithms optimize performance by identifying, at any point in time, the nodes best equipped to handle each requested operation.
  • No file system limitations — Objects are identified by universally unique IDs in a flat address space, which eliminates the complexity of hierarchical file system architectures. There are no limits on the number of files and no capacity limits. Content of all types, large or small, is delivered with the performance of primary storage.
  • Self-managing cluster — CAStor organically balances storage and CPU loads across all nodes. Because there are no specialized nodes, there are no bottlenecks.
  • Continuous integrity checks — CAStor's Health Processor continuously monitors data integrity and cardinality (the number of replicas) and automatically heals any degradation or nonconformity. Metadata life point rules are also checked and enforced.
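The bullets above combine two ideas: a flat, UUID-addressed namespace and a background process that keeps each object at its target replica count with intact content. The toy model below is a minimal sketch of those ideas. It is not Caringo's implementation or API. The class ToyCluster, its methods write and health_check, the SHA-256 checksums, and the "least-full node" placement rule (a stand-in for CAStor's adaptive node selection) are all hypothetical assumptions made for illustration.

```python
import hashlib
import uuid


class ToyCluster:
    """Hypothetical flat object store (not CAStor's API): objects are addressed only by UUID."""

    def __init__(self, node_names, replicas=2):
        # Each node maps object UUID -> bytes; there are no directories or paths.
        self.nodes = {name: {} for name in node_names}
        self.replicas = replicas  # target cardinality (replica count)
        self.checksums = {}       # UUID -> SHA-256 digest recorded at write time (assumption)

    def write(self, data: bytes) -> str:
        oid = str(uuid.uuid4())  # flat address: a UUID, not a hierarchical path
        self.checksums[oid] = hashlib.sha256(data).hexdigest()
        # Place replicas on the least-full nodes, a stand-in for "best-equipped".
        targets = sorted(self.nodes, key=lambda n: len(self.nodes[n]))[: self.replicas]
        for name in targets:
            self.nodes[name][oid] = data
        return oid

    def health_check(self) -> int:
        """Discard corrupt copies, then re-replicate under-replicated objects.

        Returns the number of new replicas written.
        """
        written = 0
        for oid, digest in self.checksums.items():
            good = []
            for name, store in self.nodes.items():
                if oid in store:
                    if hashlib.sha256(store[oid]).hexdigest() == digest:
                        good.append(name)
                    else:
                        del store[oid]  # integrity failure: drop the bad copy
            if not good:
                continue  # no healthy copy survives; this toy model cannot heal it
            data = self.nodes[good[0]][oid]
            spare = sorted((n for n in self.nodes if n not in good),
                           key=lambda n: len(self.nodes[n]))
            for name in spare[: max(0, self.replicas - len(good))]:
                self.nodes[name][oid] = data
                written += 1
        return written
```

For example, suppose one replica is silently corrupted in a three-node cluster with two replicas per object. A single health_check() call discards that copy and writes a fresh one from the surviving good replica, which restores the object to two intact copies.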

Johns Hopkins University CIDR protects its genotyping and statistical genetics data.

The Johns Hopkins University Center for Inherited Disease Research (CIDR) is a centralized facility providing genotyping and statistical genetics services to investigators seeking to identify genes that contribute to human disease. CIDR is a high-throughput genotyping lab for projects funded by the National Institutes of Health (NIH), and its genetic analyses can generate thousands of gigabytes of new data daily.

The Challenge

  • Huge storage capacity needs on a shoestring budget
  • NAS systems could not deliver the throughput needed
  • Need to protect the investment in legacy hardware while moving to object storage
  • Need to integrate with the industry-standard CIFS file-sharing protocol

The Solution

  • Mix and match old and new hardware in the cluster for massive CapEx savings
  • Start small and scale seamlessly: the configuration has grown 70-fold since initial deployment
  • Multi-protocol support for CIFS and HTTP to integrate scanned images and workstations
  • Expand capacity and add server nodes at a lower TCO every year, guaranteed
  • Expand storage with no provisioning ever, avoiding the complexity of SAN and NAS

The Results

  • Investment protection and future-proofing with standard storage servers
  • A self-managing, self-healing cluster improves the bottom line with compelling OpEx savings
  • Exceeds expectations for even the most stringent throughput and retention SLAs
  • Dynamic configuration management provides continuous data availability with no downtime

The Configuration

  • A CAStor storage solution for genome research applications that uses multiple protocols to ingest terabytes per day of scanned images into the object storage repository in real time, where they are accessed by a network of remote research workstations.
CIDR uses Caringo CAStor object storage