To meet business demands for infinite-scale storage and flexible access to data for their applications, IT organizations are deploying object storage on-premises, in public clouds, or in a hybrid combination of both. Some organizations use the public cloud for 100% of their storage and application infrastructure, and others will keep storage and applications completely on-premises for the foreseeable future, but many are somewhere in the middle: they are exploring ways to incorporate public cloud storage as a component of their overall data storage infrastructure.
The first-order benefits of using a public cloud service like Microsoft Azure include off-site data protection as part of a disaster recovery plan or as a cost-effective way to store infrequently accessed archival data. Higher-order benefits of placing data into a public cloud come about when the cloud provider’s computational and analytics services are utilized as part of a comprehensive data processing and storage architecture.
So, what does that have to do with your storage provider? In short, you will be encumbered or empowered by their vision for data management and by how they architect their product to use public cloud.
When evaluating hybrid cloud off-site data protection, an important consideration is the tools the vendor provides to specify which data should be sent to the cloud and which should not. Due to internal policies or legal requirements, some data may not be appropriate to send to a third-party provider. A hybrid-cloud storage system should provide the ability to designate which data may be transmitted outside of the organization.
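In practice, this kind of designation often takes the form of a per-object policy check. The following is a minimal, hypothetical sketch of the idea; the tag names and the tag-based rule are illustrative assumptions, not the API of any particular product:

```python
# Hypothetical policy sketch: decide, per object, whether replication to a
# public cloud is permitted. Tag names and the rule itself are illustrative
# assumptions, not a specific vendor's mechanism.
RESTRICTED_TAGS = {"pii", "legal-hold", "export-controlled"}

def cloud_eligible(obj_tags: set[str]) -> bool:
    """An object may be replicated off-premises only if none of its
    tags mark it as restricted by internal policy or regulation."""
    return not (obj_tags & RESTRICTED_TAGS)

print(cloud_eligible({"backup", "archive"}))  # True
print(cloud_eligible({"backup", "pii"}))      # False
```

Whether the policy is expressed through tags, directories, buckets, or storage classes varies by product; what matters is that the administrator, not the system, decides what leaves the premises.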
An equally important consideration is that, due to WAN bandwidth limitations, it may be infeasible to transmit the contents of an entire storage system to a public cloud. For example, sending 100 TB of data over a 100 Mbps connection could easily take six months. And for organizations with a 1 Gbps or faster connection, unless special routing equipment with TCP optimizations is used, the effective transmission rates may be no better.
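The arithmetic behind that estimate is easy to verify. A quick sketch, using assumed values (decimal terabytes and an illustrative 50% effective link utilization, which roughly accounts for TCP and protocol overhead on a high-latency WAN):

```python
# Back-of-the-envelope WAN transfer time. The 50% efficiency figure is an
# illustrative assumption for overhead on a high-latency link, not a benchmark.
def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 0.5) -> float:
    """Days to transmit data_tb terabytes over a link_mbps connection,
    assuming the given fraction of line rate is achieved in practice."""
    bits = data_tb * 1e12 * 8                       # decimal TB -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)  # bits / effective bps
    return seconds / 86400

print(round(transfer_days(100, 100), 1))       # ~185 days at 50% efficiency
print(round(transfer_days(100, 100, 1.0), 1))  # ~92.6 days even at full line rate
```

Even at a theoretical 100% of line rate, 100 TB over 100 Mbps takes roughly three months; at a more realistic utilization, six months is unsurprising.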
Since bandwidth cost and availability are often barriers to transmitting huge data sets in a reasonable amount of time, public cloud providers offer physical shipment options to bulk-load the initial, and presumably largest, set of data into their clouds. If the data set is reasonably static and the future ingest rate is within an organization's WAN bandwidth capabilities, these physical shipment options make sense. If, however, the data changes so rapidly that WAN transmission cannot keep up, it will be infeasible to use a public cloud without the ability to be selective about which data is transmitted and which data remains solely on-premises. In addition to transmitting data from an on-premises storage system to a public cloud, the storage system's ability to recover your data in the event of corruption or loss in the on-premises system is also an important consideration.
A well-architected storage system allows an administrator to use the public cloud data set(s) as a source to repopulate some or all of the original on-premises data, even if the replacement on-premises hardware and its location are completely different. Additionally, the storage system should automatically determine which files are missing or corrupted and recover only those, without needlessly transmitting and overwriting perfectly good files that are already in the on-premises system.
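One common way to implement that selectivity is to compare local checksums against a manifest of the cloud copy and fetch only what fails the comparison. A minimal sketch, assuming a manifest of the form {relative_path: sha256} (the manifest format and recovery mechanism are assumptions for illustration, not a specific vendor's design):

```python
import hashlib
from pathlib import Path

# Hypothetical sketch of selective recovery: identify files that are missing
# locally or whose checksum differs from a manifest of the cloud copy.
# The {relative_path: sha256} manifest format is an assumption.

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 of a file, reading in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def files_to_recover(local_root: Path, manifest: dict[str, str]) -> list[str]:
    """Return only the paths that must be re-fetched from the cloud:
    those missing on-premises or whose contents no longer match."""
    needed = []
    for rel, expected in manifest.items():
        p = local_root / rel
        if not p.exists() or sha256_of(p) != expected:
            needed.append(rel)
    return needed
```

Intact files never appear in the returned list, so they are neither re-transmitted nor overwritten, which is exactly the behavior the paragraph above calls for.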
The architecture decisions and data management capabilities that a storage vendor incorporates into their product will determine if you are encumbered or empowered in managing your data. This affects your control over what goes into the cloud and your ability to use that data if it needs to come back to the on-premises storage.
In the second part of this article, I will cover the higher-order benefits of placing data into a public cloud and how to evaluate the hybrid-cloud functionality of an object storage system.