Quick. Fast. Performant. We often hear requirements for high performance when talking about storage. When we drill deeper, that usually translates to "fast, we need it to be fast." This response is common, and it is troublesome.

How Fast is Object Storage?

Fast is relative. Compared to a semi-truck, a Ferrari is fast, but each serves a different purpose. When it comes to storage, measuring "fast" and what that means in the real world can be a complicated endeavor. Two metrics are typically measured: input/output operations per second (IOPS) and throughput.

What is the Difference Between IOPS vs Throughput?

IOPS measures the speed of individual operations, which is a useful measure of performance for file-system-based solutions, since files are essentially shredded into thousands of pieces and need to be stitched back together quickly on read. Throughput, however, measures the total amount of data that can be read from or written to a storage system over a given period of time.

IOPS is a great measurement to determine if you can support a specific application—like a 4K or 8K editing suite (as one example). Alternatively, throughput is often used in the context of how quickly you can deliver content to different applications, clients and users.
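To see how the two metrics relate, here is a minimal back-of-the-envelope sketch. The IOPS and I/O-size figures are hypothetical, chosen only to illustrate the arithmetic, not measurements from any particular system:

```python
# Rough relationship: throughput ~= IOPS x average I/O size.
# The figures below are hypothetical, for illustration only.

iops = 5000                  # operations per second (hypothetical)
io_size_bytes = 1 * 1024**2  # 1 MiB average object size (hypothetical)

throughput_bytes_per_s = iops * io_size_bytes
print(f"Throughput: {throughput_bytes_per_s / 1024**3:.1f} GiB/s")
# 5000 IOPS at 1 MiB objects is roughly 4.9 GiB/s, while the same
# 5000 IOPS at 100-byte objects is only about 0.5 MB/s -- which is
# why the two metrics answer different questions.
```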

For object storage, throughput is the main performance characteristic to measure with IOPS being a secondary measurement.

What is the Throughput of Object-based Storage?

When measuring the performance of an object storage system, not only must raw throughput be measured for both data ingest and retrieval, but data protection and recovery speeds must be accounted for as well. What use is a massive ingest rate if the data is not protected? Being able to claim that you can lose data faster than the competition is not smart marketing. And not being able to keep up with a client workload is problematic.
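As a rough sketch of what a simple throughput measurement might look like against an S3-compatible endpoint (Swarm exposes an S3 interface), the snippet below times a single PUT and GET. The endpoint, bucket name, and credentials are placeholders, and a real benchmark would drive many parallel clients and objects rather than one:

```python
import os
import time
import boto3

# Placeholder endpoint and credentials -- substitute your own S3-compatible target.
s3 = boto3.client(
    "s3",
    endpoint_url="http://swarm.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

bucket, key = "benchmark-bucket", "throughput-test"
payload = os.urandom(64 * 1024**2)  # 64 MiB of random data

start = time.perf_counter()
s3.put_object(Bucket=bucket, Key=key, Body=payload)  # ingest
write_s = time.perf_counter() - start

start = time.perf_counter()
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()  # retrieval
read_s = time.perf_counter() - start

mib = len(payload) / 1024**2
print(f"Write: {mib / write_s:.1f} MiB/s, Read: {mib / read_s:.1f} MiB/s")
```

A single-threaded, single-object run like this only establishes a floor; sustained ingest under data-protection overhead and recovery load is what the real testing has to capture.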

Why Should I Conduct Performance Testing of Object Storage?

While being able to quote large throughput numbers is impressive, performance testing storage systems helps us out in many other ways as well. It helps us fine-tune the software and underlying mechanisms. It shows us how the software scales as environments grow in capacity and complexity. Most of all, performance testing lets us know whether the storage solution will meet the needs of the customer workload, as it did in our recent object storage performance benchmarking for the UK Science and Technology Facilities Council’s (STFC) JASMIN super data cluster. Download the STFC Object Storage Benchmarking Case Study & Whitepaper.

[Image: STFC-Rutherford and Swarm for the JASMIN project]

Lies, Damned Lies, and Statistics

It has been said, “there are three kinds of lies: lies, damned lies, and statistics.” One thing to watch for when reviewing performance statistics is how the testing was performed. For example, if a system posts impressive numbers and performs fantastically with 100-byte files but performance falls off dramatically for files over 1 KB, then the usefulness of that system would be very limited. Was the testing performed in simulation only, or with real-world tools and data? What does the test data actually measure: writes to cache, or the final write to the target media (usually HDD for object storage)? Useless metrics are just that: useless.
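One way to guard against size-skewed results is to run the same measurement across a range of object sizes rather than a single one. Here is a hedged sketch of that idea, with the same placeholder endpoint, credentials, and bucket name as the earlier example:

```python
import os
import time
import boto3

# Placeholder endpoint, credentials, and bucket name -- illustrative only.
s3 = boto3.client("s3", endpoint_url="http://swarm.example.com",
                  aws_access_key_id="ACCESS_KEY",
                  aws_secret_access_key="SECRET_KEY")
bucket = "benchmark-bucket"

# Sweep object sizes to see whether throughput holds up beyond tiny payloads.
for size in (100, 1024, 1024**2, 64 * 1024**2):  # 100 B, 1 KiB, 1 MiB, 64 MiB
    payload = os.urandom(size)
    start = time.perf_counter()
    s3.put_object(Bucket=bucket, Key=f"size-{size}", Body=payload)
    elapsed = time.perf_counter() - start
    print(f"{size:>10} bytes: {size / elapsed / 1024**2:.2f} MiB/s write")
```

Even a sweep like this only shows what the client observes; whether the data has actually reached the target media or is still sitting in a cache is something the benchmark methodology itself has to spell out.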