“Data is the new oil” is a popular way of saying that data will be one of the most crucial assets of any organization, and perhaps something the development of mankind will depend on. It is often said that every company is a data company, meaning that every company generates data, whether a little or a lot. Data is a powerful resource that, used correctly, can change the fortunes of a company. For these reasons, it is extremely important to store and preserve data so that it can be used continuously for analysis and insight. Data on its own is just a raw form of information; by processing and analyzing it, we convert it into a useful form that supports data-driven business decisions. Traditionally, companies stored this data on their own private network storage. Nowadays, most companies prefer cloud storage services offered by providers such as Google, Amazon, or Microsoft, which are usually cheaper and easier to use than maintaining their own hardware.
Cloud storage is very useful, provided that it is properly optimized for your use case. Done well, it can save a lot of cost and keep data highly available. The main task in optimizing cloud storage is selecting the location and the storage class.
Cloud storage keeps data in the form of objects. In Google Cloud, objects are stored in containers called buckets. When setting up a new bucket, we must select where the data will be geographically located. Locations are generally organized into regions, each made up of several zones. Regional storage means that the data is stored redundantly across multiple zones within a single region (the exact number depends on the provider). It is the right choice when you know that the users or applications accessing the data are located in or near that region. For example, if your storage will be used by a Compute Engine instance, it is recommended that the compute and storage resources be in the same region to reduce latency. Single-region storage is less costly, but it has drawbacks in availability: users located far away from the region may experience higher latency and lower throughput, and in the case of a region-wide outage or a natural calamity, your data might become unavailable or even be lost (which is very unlikely).
Multi-regional storage means that the data is replicated and stored in multiple regions. It is used when the data is accessed by many people across regions and must be highly available in all of them. It costs a little more but improves availability and access times for a distributed audience. For example, if you run a video streaming platform whose videos are watched across the globe, it is better to store them in multiple regions for faster access.
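The rule of thumb above can be sketched as a small decision function. This is only an illustration: the `AccessProfile` type, the `choose_location` helper, and the returned labels are hypothetical names invented here and do not belong to any cloud SDK. The region names are just example inputs.

```python
from dataclasses import dataclass


@dataclass
class AccessProfile:
    """Hypothetical description of who reads the bucket."""
    user_regions: set[str]               # regions where users or applications run
    needs_geo_redundancy: bool = False   # must data stay available in a region-wide outage?


def choose_location(profile: AccessProfile) -> str:
    """Return 'region:<name>' or 'multi-region' based on the access profile."""
    if not profile.user_regions:
        raise ValueError("at least one user region is required")
    # All readers sit in one region and a regional outage is acceptable:
    # the cheaper single-region bucket is enough.
    if len(profile.user_regions) == 1 and not profile.needs_geo_redundancy:
        (region,) = profile.user_regions
        return f"region:{region}"
    # Readers are spread out, or the data must survive a region-wide
    # outage: pay more for multi-region storage.
    return "multi-region"


# A batch job running next to its data in one region.
print(choose_location(AccessProfile({"us-central1"})))
# A video platform with viewers on several continents.
print(choose_location(AccessProfile({"us-central1", "europe-west1", "asia-south1"})))
```

A real decision would also weigh price differences and any data residency requirements, which this sketch leaves out.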
Storage class:
The next choice when creating a bucket is the storage class. The available classes vary by service provider, but a few are offered in some form by all cloud services. The first is the standard class, which has a higher storage cost but gives very high availability and fast access, with little or no retrieval cost. The second is an archive class, meant for archiving data. It has a very low storage cost and is mostly used for data that is accessed once a year or less. Each access incurs a high retrieval cost, so it is the best choice only when you know the data will be accessed infrequently. Providers also offer classes that lie between these two, which you can choose according to your application.
A best practice is to set up lifecycle management for your objects. For example, you can create an object in the standard class and have it moved to the archive class after one year. In this way only older data is archived, while newer data stays readily accessible for its first year. This reduces the cost of storing objects over a long time.
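The lifecycle idea can be sketched in a few lines of Python. The rule below is modeled on the JSON shape Google Cloud Storage uses for lifecycle configuration (a `SetStorageClass` action with an `age` condition in days); other providers use different formats. The two prices are hypothetical placeholders chosen only to show the arithmetic, not any provider's real rates, and the helper names are made up for this example.

```python
import json

# Hypothetical prices per GB per month, for illustration only.
STANDARD_PER_GB_MONTH = 0.020
ARCHIVE_PER_GB_MONTH = 0.002


def lifecycle_config(age_days: int, target_class: str = "ARCHIVE") -> dict:
    """Build a lifecycle configuration that changes an object's storage
    class once the object is older than age_days."""
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": target_class},
                "condition": {"age": age_days},
            }
        ]
    }


def storage_cost(size_gb: float, months: int, months_in_standard: int) -> float:
    """Storage cost of keeping size_gb for `months`, with the first
    `months_in_standard` months in standard and the rest in archive.
    Retrieval and operation charges are ignored."""
    standard_months = min(months, months_in_standard)
    archive_months = months - standard_months
    return size_gb * (
        standard_months * STANDARD_PER_GB_MONTH
        + archive_months * ARCHIVE_PER_GB_MONTH
    )


# Move objects to the archive class after one year.
print(json.dumps(lifecycle_config(365), indent=2))

# Compare three years in standard with one year in standard plus two in archive.
always_standard = storage_cost(1000, months=36, months_in_standard=36)
with_lifecycle = storage_cost(1000, months=36, months_in_standard=12)
print(f"standard only: {always_standard:.2f}, with lifecycle: {with_lifecycle:.2f}")
```

The comparison ignores retrieval charges, so it only holds if the archived data is rarely read. If older objects are still accessed often, the retrieval cost of the archive class can cancel out the savings.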
Optimizing storage location and class depends heavily on the use case. The best person to choose these parameters is someone who knows both the available storage classes and location options and how the storage is used in your application. You have to choose the options carefully, based on the location of users, required availability, cost, how long the data will be stored, and so on. Done well, this will save you a lot of money and at the same time boost the performance of your project.