Is your storage provider empowering you or encumbering you? This second part of the series discusses high-order benefits of placing data into a public cloud when your storage provider empowers you to use your data.
The first part of this series showed why a storage system needs tools for designating which data is sent to the public cloud for off-site data protection. This ability to manage data sets by organizational unit within the storage system, and to select data using object metadata combined with data management policies, becomes crucial when going beyond hybrid-cloud basics.
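As a rough illustration of selecting data by organizational unit and object metadata, the following minimal sketch filters objects against a policy. The class names, fields, and metadata keys are hypothetical and do not describe any particular product's policy engine.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    tenant: str      # organizational unit that owns the object (hypothetical field)
    bucket: str      # container/bucket within that tenant
    name: str        # object name
    metadata: dict   # user-defined object metadata


@dataclass
class CloudPolicy:
    """Hypothetical policy: scope by tenant and bucket, then match metadata."""
    tenant: str
    bucket: str
    required_metadata: dict

    def matches(self, obj: StoredObject) -> bool:
        in_scope = obj.tenant == self.tenant and obj.bucket == self.bucket
        has_tags = all(
            obj.metadata.get(key) == value
            for key, value in self.required_metadata.items()
        )
        return in_scope and has_tags


def select_for_cloud(objects, policy):
    """Return only the objects the policy designates for the public cloud."""
    return [obj for obj in objects if policy.matches(obj)]


objects = [
    StoredObject("marketing", "video", "launch.mp4", {"status": "final"}),
    StoredObject("marketing", "video", "draft.mp4", {"status": "draft"}),
    StoredObject("research", "video", "lab.mp4", {"status": "final"}),
]
policy = CloudPolicy("marketing", "video", {"status": "final"})
selected = select_for_cloud(objects, policy)
```

Here only `launch.mp4` is selected, because it is the one object that is both inside the policy's scope and tagged with the required metadata.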
When using public cloud for data tiering, the on-premises storage acts as a cache for data ingest and holds the working set of data that is still active. Typically, data that is inactive or that meets other policy rules is moved into the public cloud for long-term storage using a least-recently-used (LRU) replacement strategy. Even though some or all of the content may reside off-site, all of the data appears to reside locally and users retain full use of their data, including searching based on content metadata. (For more on metadata, read this article by our CEO and Co-Founder Jonathan Ring.)
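The tiering behavior described above can be sketched in a few lines. This is a toy model under stated assumptions, not a real storage implementation: the `TieringCache` class, its byte-based capacity limit, and the in-memory dictionary standing in for a cloud bucket are all hypothetical.

```python
from collections import OrderedDict


class TieringCache:
    """Toy on-premises tier that demotes least recently used objects to a cloud tier."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.local = OrderedDict()  # name -> bytes, least recently used first
        self.cloud = {}             # stand-in for a public cloud bucket
        self.metadata = {}          # name -> dict, always kept on-premises

    def _local_size(self):
        return sum(len(data) for data in self.local.values())

    def _evict(self):
        # Move the least recently used objects off-site until we fit again.
        while self.local and self._local_size() > self.capacity_bytes:
            name, data = self.local.popitem(last=False)
            self.cloud[name] = data

    def put(self, name, data, **metadata):
        self.cloud.pop(name, None)      # a rewrite supersedes any tiered copy
        self.local[name] = data
        self.local.move_to_end(name)    # mark as most recently used
        self.metadata[name] = metadata
        self._evict()

    def get(self, name):
        if name in self.local:
            self.local.move_to_end(name)
            return self.local[name]
        data = self.cloud.pop(name)     # transparent recall from the cloud tier
        self.local[name] = data
        self._evict()
        return data

    def search(self, key, value):
        # Metadata stays local, so search covers objects in both tiers.
        return sorted(n for n, m in self.metadata.items() if m.get(key) == value)


cache = TieringCache(capacity_bytes=10)
cache.put("a", b"12345", kind="image")
cache.put("b", b"12345", kind="video")
cache.get("a")                          # "a" becomes the most recently used
cache.put("c", b"12345", kind="image")  # over capacity, so "b" is tiered out
```

After the last call, `b` lives only in the cloud tier, yet `cache.search("kind", "video")` still finds it and `cache.get("b")` recalls it as if it were local.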
By tiering content to a public cloud, storage administrators can reduce the overall footprint of their physical hardware and rely on the cloud provider's scale-out capabilities. The additional benefits discussed in this article remain available at the same time.
If backup and tiering will forever be your only goals for using public cloud, then the in-cloud data representation is not a primary concern for you or your users. However, the real power of public cloud platforms lies in the data processing services that are available to multiply the value of your data. These high-order gains from public cloud are only possible when the representation of the data, including its metadata, is native to the cloud provider’s object storage system.
This native format for the in-cloud data means that your storage vendor must not use their own, homegrown data structures for the data and must not break it apart into pieces that are meaningless to the other cloud services. Your storage vendor should also preserve the organizational structure of the on-premises data when it is placed into the public cloud. This means that the multi-tenancy groups, containers/buckets, and even object names should be translated to the native format of the cloud provider’s system. Compromising on these will limit your ability to utilize the full power of the public cloud provider.
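To make the idea of a native representation concrete, here is a minimal sketch of translating an on-premises object into one whole cloud object. The naming scheme (tenant and bucket joined into a lowercase cloud bucket name) and the `CloudObject` type are assumptions for illustration only; real providers impose their own bucket naming and metadata rules.

```python
from dataclasses import dataclass, field


@dataclass
class CloudObject:
    bucket: str                                    # cloud provider bucket name
    key: str                                       # object key, readable by other cloud services
    metadata: dict = field(default_factory=dict)   # user metadata stored with the object


def to_native(tenant, bucket, name, metadata):
    """Map an on-premises object to a single, whole cloud object.

    Hypothetical scheme: the tenant and bucket become the cloud bucket name,
    and the object name is used unchanged as the key, so nothing is chunked
    or renamed into a vendor-specific structure.
    """
    cloud_bucket = f"{tenant}-{bucket}".lower()
    native_metadata = {str(k): str(v) for k, v in metadata.items()}
    return CloudObject(bucket=cloud_bucket, key=name, metadata=native_metadata)


obj = to_native(
    "Acme", "Media", "videos/2020/launch.mp4",
    {"camera": "A7", "duration_s": 93},
)
```

Because the key is still `videos/2020/launch.mp4` and the metadata travels with the object as plain strings, a cloud-side service can locate and process the file without any knowledge of the on-premises system.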
As mentioned, the real gains from using public clouds for more than bulk data storage come about as the other cloud services are brought to bear for data processing. The ability to designate data sets becomes crucial so that end-users and content owners, not just storage administrators, are able to direct the movement of data into the cloud so that additional tools can operate on it.
Example use cases for copying targeted data sets into a public cloud include:
- CDN for high-bandwidth content delivery to customers
- Image analysis and facial recognition
- Video transcoding and scene analysis
- Voice analysis and transcription
Designing, hosting, and managing these use cases on-premises takes more money and specialized expertise than many organizations can reasonably allocate. Selective data set movement to a public cloud, combined with these additional services, allows for rapid application augmentation. Without this capability, many organizations would be limited in the variety of data analytics that they can use.
After data sets are processed in a public cloud, your storage vendor should provide the tools for your end-users and data owners to combine the results of the in-cloud data processing back into the storage system. Additionally, your users should be able to direct how the in-cloud data sets are handled after processing.
Does your storage provider encumber or empower you? Their vision for fitting public cloud storage into their own storage product determines the answer to that question. Will you be able to do more than bulk storage with public cloud and use the data processing tools within the cloud? Look for storage systems that address first-order needs like off-site data protection and bulk storage and, at the same time, empower your users to use the high-order data processing tools available within the cloud.