Everywhere you look these days, there are articles about new scientific breakthroughs. The storage world is abuzz about the first real picture of a supermassive black hole. This black hole is at the center of galaxy M87, and it took about 3.5 PB of data to generate the picture. In total, the team collected 5 PB, that is, 5,000 TB, or roughly 625 8 TB drives. Amazingly enough, hard drives with the data were shipped via airplane from different locations to be consolidated!
Why in the world would you ship data around on drives instead of using the cloud or FTP? The problem was not storage capacity (5 PB is easy enough to store in AWS). The issue was transferring that amount of data in a reasonable time frame.
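The arithmetic behind that decision is easy to sketch. The link speeds below are illustrative assumptions of ours, not figures from the project:

```python
# Back-of-the-envelope check: how long would moving 5 PB take over a network?

def transfer_days(petabytes: float, gbits_per_sec: float) -> float:
    """Days needed to move `petabytes` at a sustained rate of `gbits_per_sec`."""
    bits = petabytes * 1e15 * 8            # decimal petabytes -> bits
    seconds = bits / (gbits_per_sec * 1e9) # bits / (bits per second)
    return seconds / 86_400                # seconds -> days

# A sustained 1 Gbps link needs well over a year; even 10 Gbps needs ~46 days.
print(f"{transfer_days(5, 1):.0f} days at 1 Gbps")    # 463 days
print(f"{transfer_days(5, 10):.0f} days at 10 Gbps")  # 46 days
```

At those timescales, a plane full of hard drives really is the higher-bandwidth option.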
A World Full of Data
Scientific data is often collected from an eclectic mix of sources, and can easily fall victim to the age-old curse of storage silos.
Consider just a few of the various sources for data:
- Historical records on various types of storage (from handwritten notes to archival tape to various storage platforms)
- IoT devices, telemetry units, telescopes, etc.
- Surveys and interviews
- Observation (by researchers or by video)
Scientific Organizations Arm Researchers with Technology
As science progresses, research organizations around the world strive to arm their researchers with the technology to continue making advancements, and data storage is an important tool in the world of high-performance computing (HPC). However, as AJ Herrera noted in his recent blog What are the 5 tiers of Storage for New Video Production Workflows, one tier of storage does not fit all.
In a research setting, a well-designed storage infrastructure integrates various tiers (or types) of storage to enable the collection, storage and analysis of scientific data. However, recent advances in globally distributed workflows and the resulting access requirements are driving a paradigm shift from distributed and parallel file systems to object storage.
Can Object-Based Data Storage Replace Parallel File Systems?
“Yes! For read-intensive workloads,” concluded CEO Tony Barbagallo when he posed the “Can Object Storage Really Replace Parallel File Systems?” question in our blog. To say this another way, object storage (on the appropriate underlying infrastructure) can give researchers high-throughput, managed access to data, streaming it to distributed collaborators and reducing time to discovery. An example of this is how the UK’s Science and Technology Facilities Council (STFC) Rutherford Appleton Laboratory (RAL) Space uses Caringo Swarm Object Storage as part of their JASMIN super data cluster. Prior to selecting Caringo Swarm, STFC performed extensive benchmark testing on a number of object storage solutions to determine which best met the requirements for the project.
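One reason object storage suits read-intensive workloads is that S3-style HTTP access lets many clients (or threads) pull disjoint byte ranges of the same object in parallel. A minimal sketch of that pattern, with the actual network fetch left as a stub (the object size and chunk size are hypothetical, not from the STFC deployment):

```python
from concurrent.futures import ThreadPoolExecutor

def byte_ranges(size: int, chunk: int) -> list[str]:
    """Split an object of `size` bytes into HTTP Range header values."""
    return [f"bytes={start}-{min(start + chunk, size) - 1}"
            for start in range(0, size, chunk)]

def fetch_range(range_header: str) -> bytes:
    # A real client would issue a GET against an S3-compatible endpoint
    # with this Range header; stubbed here to keep the sketch self-contained.
    return b""

# e.g. read a 1 GiB object as 8 parallel 128 MiB chunks
ranges = byte_ranges(1 << 30, 128 << 20)
with ThreadPoolExecutor(max_workers=8) as pool:
    parts = list(pool.map(fetch_range, ranges))
print(ranges[0])  # bytes=0-134217727
```

Because each range request is independent and stateless, reads scale out across clients and storage nodes without the lock coordination a parallel file system needs.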
It is going to take more than just massive amounts of data storage for the scientific community to streamline distributed collaboration. It will take a coordinated approach between storage, networking and data analysis tools, such as those provided by our partner Globus. Globus is a secure, reliable research data management service used by thousands of organizations to move, share and discover data via a single web browser interface.
Learn more about how Globus works with Caringo Object Storage, combining the benefits of S3-enabled private cloud storage with secure, reliable research data management services, by reading our solution brief.