GCP Series ~ Part 1: What is Google Cloud Storage?

Gobalakrishnan Viswanathan
12 min read · Mar 4, 2024

From the official documentation:

Cloud Storage is a managed service for storing unstructured data. Store any amount of data and retrieve it as often as you like.

Cloud Storage is for storing unstructured data, unlike relational (RDBMS) or NoSQL databases.

  • This storage is meant to store anything like images, videos, large volumes of text documents, any files, and possibly everything as objects.
  • It is used to store archived data that is not accessed often, like company policies and legal documents.
  • It is also useful for temporary data. When we use other GCP services, the output of one service can be consumed by another service in the same pipeline, and we can leverage GCS as temporary storage for those intermediate outputs.
  • Objects stored in GCS are globally accessible and web-accessible using URLs (see the example just after this list).
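For example, once an object has been made publicly readable, anyone can fetch it over plain HTTPS. A minimal sketch (the bucket and object names below are placeholders):

```
# a publicly readable object can be downloaded straight over HTTPS
curl -o logo.png https://storage.googleapis.com/my-example-bucket/images/logo.png
```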

Informally, you can think of GCS as something like a Windows folder where you can store anything. But it is not a file system; it is an object storage system.

Object Organization:

  • The high-level construct in GCS is called a bucket. Within buckets, we can have folders, and objects can be stored inside folders or directly in the bucket (see the example path after this list).
  • Here, objects are typically different types of files.
  • Since objects in GCS are web-accessible, the bucket name must be unique across all of Cloud Storage. GCP validates this at the time of bucket creation.
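To make this concrete, here is what a typical object path looks like (bucket and folder names are placeholders). The "folders" are really just prefixes in the object names:

```
# bucket: my-example-bucket, "folder": raw/2024-03-04/, object: events.json
gs://my-example-bucket/raw/2024-03-04/events.json

# list the objects under a "folder" prefix
gsutil ls gs://my-example-bucket/raw/
```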

Buckets:

Buckets are the basic containers that hold the objects. All the objects we store in Cloud Storage must be inside a bucket. We can use buckets to organize our data based on the incoming data, for example a Raw bucket to hold raw information coming from servers and a Processed bucket to store the processed data derived from the Raw bucket.

  • There is no limit to the number of buckets we can have in a project, but there is a limit on the rate at which we can create and delete buckets.
  • Bucket names must be unique across all of Cloud Storage. A bucket name can be up to 63 characters; when the name contains `.`, it can be up to 222 characters (with each dot-separated component limited to 63). A quick creation example follows this list.
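As a quick sketch, this is what creating a bucket looks like with gsutil (the project ID and bucket name are placeholders; the command fails if the name is already taken anywhere in Cloud Storage):

```
# create a bucket in a chosen project and location
gsutil mb -p my-project-id -l us-central1 gs://my-example-bucket/
```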

Storage Locations:

A bucket's location defines where the bucket and its data objects are physically stored, and we choose it based on our availability needs.

  • When we create a bucket, we specify the location where the bucket and its data should be stored. Once the bucket is created, we can't change its location, but we can move the data to another bucket in a different location.
  • There are three location types:
    1. Single Region (Single Geo Location)
    2. Dual Region (Two Geo Locations)
    3. Multi-Region (Which has many Geo locations)
  • The location type determines how your data is replicated and priced. As the names suggest, in a single region the data is stored and replicated only within one region, which does not give maximum availability when something unwanted happens.
  • The other two location types store data in diverse geographic locations, which gives good availability and durability with maximum uptime. Data is also replicated across the regions, offering an immediate backup when outages happen. The table below, taken from the GCP site, compares the availability, performance, and pricing of the different location types.
comparison between storage locations.

When we need more availability, we can go for dual-region or multi-region locations. When our data does not need to be available worldwide, a single region is a good choice.
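The location type is fixed at creation time via the -l flag. A rough sketch (the bucket names are placeholders; nam4 and us are examples of a real dual-region and multi-region code):

```
gsutil mb -l us-central1 gs://my-single-region-bucket/   # single region (Iowa)
gsutil mb -l nam4 gs://my-dual-region-bucket/            # dual-region (Iowa + South Carolina)
gsutil mb -l us gs://my-multi-region-bucket/             # multi-region (United States)
```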

Storage classes in GCS:

The storage class is a piece of metadata attached to every object in GCS. It defines the object's availability and pricing.

  • We can change the storage class of an existing object either by rewriting the object or by using Object Lifecycle Management (example commands follow this list).
  • Or we can enable the Autoclass feature at the bucket level to let GCP handle the transitions from one storage class to another automatically.
  • When a bucket is created, we can specify its default storage class. When we add objects to this bucket, they inherit this storage class unless we set a different one.
  • When we don't specify a storage class at bucket creation time, the default is used, which is Standard storage.
  • Changing the default storage class of a bucket does not affect the objects already present in the bucket.
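As mentioned in the list above, both the bucket default and an object's own storage class can be changed later. A minimal gsutil sketch (bucket, object, and file names are placeholders):

```
# set / check the bucket's default storage class
gsutil defstorageclass set nearline gs://my-example-bucket
gsutil defstorageclass get gs://my-example-bucket

# change an existing object's storage class by rewriting it
gsutil rewrite -s coldline gs://my-example-bucket/logs/2023/app.log

# or let Object Lifecycle Management do it, e.g. move objects to Coldline after 90 days
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 90}
    }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://my-example-bucket
```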

Before moving into types of storage classes, let’s see some aspects that apply to all storage classes.

  • Unlimited storage with unlimited access.
    Unlimited data can be stored in all classes, and all of them offer unlimited access.
  • Data can be stored in worldwide locations with worldwide accessibility.
  • Very low latency and high durability (11 9's).
  • Data is replicated so that it stays available.
  • All storage classes also offer security, tools, and APIs for handling the data effectively.

Available Storage classes:

  1. Standard Storage
  2. Nearline Storage
  3. Coldline Storage
  4. Archive Storage

To understand the storage classes better, I took the table below from GCP itself.

Standard Storage class:

  • This is the best storage class for the data that frequently needs to be accessed and can be termed “hot-data”. Typically, the data is stored for only a brief period in this class.
  • When Compute Engine instances or Kubernetes Engine clusters need to access this data, storing it in the same single region is useful for data-intensive operations and reduces network charges.
  • When we store the data in a dual-region, we still get good performance when the data is accessed by other Google products running in the associated regions, and we also gain availability by keeping the data in two regions.
  • Multi-region is appropriate for the Standard class when the data needs to be accessed worldwide, like static web pages, data-oriented mobile applications, etc.
  • Availability % for standard storage class across the regions:
  • As per the SLA (Service Level Agreement), it is GCP's responsibility to provide the availability percentages given above for the Standard class at all location types. When it fails to meet that uptime, users are eligible to receive credits as defined in the SLA.

Nearline Storage class:

  • This is a low-cost, highly durable storage class for storing infrequently accessed data. It is a good fit when slightly lower availability compared with the Standard class is acceptable.
  • The minimum storage duration for this class is 30 days, and there is a cost for data access, which is an acceptable trade-off for the low storage cost.
  • When we need to store a lot of files and will access them only about once a month, this class can be a great choice.
  • Availability % for Nearline storage class for all the location types:

Coldline Storage class:

  • Coldline storage is a low-cost, highly durable storage class for storing infrequently accessed data. It is the best choice when we can afford slightly lower availability than the Standard and Nearline storage classes.
  • The minimum storage duration for this class is 90 days.
  • Higher cost for accessing data, lower cost for storing it.
  • It can be selected when the data will be accessed only about once a quarter.
  • Availability % for Coldline storage class for all the location types:

Archive Storage Class:

  • Archive storage is the lowest-cost, highly durable storage class for data archiving, online backup, and disaster recovery. Our data is still available within milliseconds, not hours or days.
  • It has a higher cost for data access and a 365-day minimum storage duration. It is suitable for data accessed only about once a year.
  • We can also use it as backup storage for disaster recovery purposes.
  • Availability % for Archive storage class for all the location types:
SLA for Archive Storage class

Control on Objects:

We can control who has access to our buckets and objects and the level of that access. When we create a bucket, we can choose between two types of access control.

  1. Uniform Access
  2. Fine-grained Access

Uniform access uses Identity and Access Management (IAM) to define the permissions on your objects. IAM applies permissions to all the objects in the bucket, or to groups of objects based on common name prefixes. IAM also gives more options, like the sample use cases below (a short gsutil sketch follows them).

  • Applying permissions on managed folders
  • Conditional permissions on objects. For example, temporary access for a user to resolve the latest issue, or access only for requests coming from office connections.
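A small sketch of what uniform, IAM-based access looks like with gsutil (the email address and bucket name are placeholders):

```
# enforce uniform bucket-level access (per-object ACLs are disabled)
gsutil uniformbucketlevelaccess set on gs://my-example-bucket

# grant a user read access to all objects in the bucket
gsutil iam ch user:alice@example.com:objectViewer gs://my-example-bucket

# inspect the bucket's IAM policy
gsutil iam get gs://my-example-bucket
```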

Fine-grained access allows using both IAM and Access Control Lists (ACLs) together to manage object permissions. ACLs let you specify permissions down to the individual object level. Since this is a fine-grained access system, there is a chance of unintentional data exposure because we may need to juggle both kinds of access control (IAM and ACLs). GCP recommends uniform access for better clarity and manageability unless fine-grained control is really necessary.

Once we enable uniform access on a bucket, we have 90 days to switch back to fine-grained access; otherwise, uniform access becomes permanent. Apart from these methods, there are other ways to manage access to objects.

  • Signed URLs are used to give time-limited access to the objects in a bucket. We generate a URL that grants read/write access for a limited time and then share it with the people we want to give access to. This works whether or not they have a Google account. We can use this option in addition to IAM and ACLs. For example, you can give bucket-level access to a few users inside your organization and, using signed URLs, give access to other users who are not part of it (a signurl example follows this list).
  • Signed policy documents are used to define what can be added to a bucket. This allows greater control over what gets uploaded to a specific bucket, based on the type of object, the size of the object, and other object characteristics. For example, if you allow users of your website to upload photos, a signed policy document can restrict the uploads to specific criteria. Cool, right?
  • There are a few more options available in GCP to manage permissions on objects.
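For example, a signed URL can be generated with gsutil using a service-account private key (the key path, duration, and object name below are placeholders):

```
# create a URL that grants read access to one object for 10 minutes
gsutil signurl -d 10m path/to/service-account-key.json gs://my-example-bucket/reports/q1-summary.pdf
```

Anyone holding that URL can download the object until it expires, with no Google account required.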

GCS — Hands On

To do the hands-on, we can leverage the Free Tier option in GCP, which is enough to understand how most of the services work. Use this link GCP Free Tier to create your free account. You will need credit card details for auto-payment; mostly, we don't need to pay anything for this learning, but this is GCP's process to collect card details for billing. So we must be careful when spinning up Compute Engine instances and other costlier services that could lead to charges.

Once you create a Free Tier account, there will already be a project created for you named "My First Project". You can continue with this or create a new project. When you click on New Project, you will see something like below. Fill in the name of your project and GCP will take care of the rest. Once you create the new project, you can use the project dropdown to select it.

Project creation
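If you prefer the command line, the same project setup can be sketched with gcloud (the project ID is a placeholder and must be globally unique):

```
# create a new project and make it the active one
gcloud projects create my-gcs-learning-123 --name="My GCS Learning"
gcloud config set project my-gcs-learning-123
```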

Bucket Creation:

The next important thing in GCS is to create a bucket to store our objects. Let's do that. The images below show how I created the bucket.

name your bucket.
  • When we type the bucket name, GCP automatically validates that the name is unique across GCP and also shows the rules to follow when naming it.
  • Once you give the name, labels are an optional parameter we can add. They are useful when we have many buckets in our project; we can use labels to group the buckets, for example buckets related to batch data, streaming data, etc. Once we finish naming our bucket, let's proceed to the configuration of where we store our data.
Storage Location Option
  • You can see that I selected a single-region location, which is more than enough for practice purposes. But in real-world scenarios, where availability and performance are key, we have to choose an appropriate storage location. Keep in mind that going for multiple regions can also increase the cost of the storage.
  • The next option we see is storage classes which we discussed earlier.
Storage classes
  • There are a few options for specifying the storage class when creating the bucket. When we are not sure how often the data will be accessed, we can use Autoclass to let GCP handle the transitions between storage classes automatically.
  • Otherwise, we can set a default class for the bucket from the available classes, which can be changed later when required.
Data protection
  • Google always gives high priority to data protection, but if needed we can add a few rules to protect our data from accidental deletions, etc. Object Versioning is one option to restore deleted or overwritten objects. Since versions also add cost, it is better to limit the number of versions kept per object; we can schedule older versions to be removed after some days. There is also an option to make sure a particular object is not deleted or modified for a specific period, which is called retention. (A quick gsutil example follows this list.)
  • There is one more option for data encryption. Data in GCP is always encrypted, but we can change this behavior to use our own key for encryption (customer-managed encryption keys); otherwise, GCP does the encryption for us at no additional cost.
  • When you click the Create button, a pop-up appears about public access to your objects. When the option is selected, public access to your objects is prevented. If your bucket holds resources used by a static website, you need to allow public access so they can be reached publicly. So based on our use case, we can decide on this setting.
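Here is a quick sketch of the versioning and retention settings with gsutil (the bucket name is a placeholder):

```
# turn on object versioning and confirm it
gsutil versioning set on gs://my-example-bucket
gsutil versioning get gs://my-example-bucket

# list every stored version (generation) of the objects
gsutil ls -a gs://my-example-bucket

# set a 30-day retention period: objects cannot be deleted or overwritten before it expires
gsutil retention set 30d gs://my-example-bucket
```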

We created our first bucket, and it now shows up in the Buckets section.

Bucket creation
  • We can click on the bucket name and see the many options offered, like upload files, upload folder, create folder, transfer data, and much more. We can play around with these options by adding any type of files or folders. We also see tabs like Objects, Configuration, Permissions, Protection, etc.

Now it is time to get our hands dirty with GCS. We can also access our bucket using the gsutil tool available in Google Cloud Shell; you can open Cloud Shell from the top-right corner of the console.

This opens a terminal where you can use gsutil to explore your buckets. Please do your own exploration of this tool; just typing gsutil on the command line lists all the options available. Below are some commands I ran to check out how it works.

gsutil
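In case the screenshot is hard to follow, a few typical gsutil commands of that kind look like this (bucket and file names are placeholders, not the exact commands from the screenshot):

```
gsutil ls                                     # list the buckets in the current project
gsutil ls gs://my-example-bucket              # list the objects inside a bucket
gsutil cp notes.txt gs://my-example-bucket/   # upload a file
gsutil cat gs://my-example-bucket/notes.txt   # print an object's contents
gsutil rm gs://my-example-bucket/notes.txt    # delete an object
```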

I think I have covered most of the basic topics on GCS and did some hands-on. (I didn't add all the hands-on images in this post; I am leaving that part to you to get your hands dirty with GCS.) The most important features I loved in GCS are:

  • Object Versioning and control over it.
  • Signed URLs and signed policy documents, along with uniform and fine-grained access.
  • And we can use GCS to host a static website. How cool is that? (I will explore this and create a new post on it.)

Do comment if there are any other important or interesting features or use cases in GCS; I'd love to explore, try, and post about them. Also, we can have learning conversations in the comments section to increase our knowledge of GCS. Looking forward to adding more posts on GCP services. Ta-ta for now.

Gobalakrishnan Viswanathan.
