Another cheat sheet: 18 load balancing options on GCP (2024)

Like my two previous posts, this one contains yet another cheat sheet for Google Cloud Platform (GCP) architects. And like the previous ones, I re-use here bits of the concise overview of GCP from my book, “The Professional Cloud Architect’s Big Fact Sheet”. You can read more about the book at economo.tech. I’ll focus this time on choosing the right approach to load balancing.

Load balancing plays an important role in ensuring the high availability and scalability of your system: it distributes incoming requests across your resources so that the whole system works reliably and doesn’t get overwhelmed by traffic.

There are systems with and without load balancing, and when it’s “with”, it can be done in different ways. In GCP, under the title Cloud Load Balancing (CLB), Google offers several services that implement the “anycast” routing concept: a single IP address lets users access multiple “backend” destinations, with the right one selected automatically. The right one could be the one with the most spare capacity, the one closest to the user, or another choice depending on the balancing mode. CLB decides how to split traffic after it has health-checked the available resources. CLB also lets you choose whether you’d like a load balancer that acts as a “reverse proxy”, hiding the users and the backend from each other. I’ve listed all the important facts about all of these in the book.
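To make the “single entry point, many backends” idea concrete, here is a toy sketch in plain Python (no GCP APIs; all backend names and the capacity numbers are made up for illustration). It mimics the two steps described above: filter out backends that failed their health check, then pick among the healthy ones according to a balancing mode, here “most spare capacity”.

```python
# Hypothetical backend records: a name, the result of the last
# health check, and the fraction of capacity still available.
BACKENDS = [
    {"name": "us-east-a", "healthy": True,  "capacity": 0.8},
    {"name": "us-east-b", "healthy": False, "capacity": 0.9},  # failed health check
    {"name": "eu-west-a", "healthy": True,  "capacity": 0.3},
]

def pick_backend(backends):
    """Return the healthy backend with the most spare capacity."""
    healthy = [b for b in backends if b["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy backends")
    return max(healthy, key=lambda b: b["capacity"])

print(pick_backend(BACKENDS)["name"])  # → us-east-a
```

Note that us-east-b has the most capacity on paper, but it never receives traffic because it is unhealthy; that’s the point of doing the health check before the balancing decision.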

In this post I actually want to highlight a different point: GCP offers plenty of load balancing even where the CLB title doesn’t appear. Depending on what your system needs, a similar balancing of the load can sometimes be achieved more simply than with CLB, or with GCP services that use different names for the same ideas. In an exam, an interview, or a conversation at work, a discussion about balancing incoming requests shouldn’t automatically lead you to CLB; there are alternatives that might meet your requirements more economically.

The table later in this post reviews 18 different GCP services or features that may, in some situations, give you the load balancing you need. I’ve used the same funny faces as in my database choice cheat sheet. The green smiley faces in the table mean “yes” or “comprehensive”. The red sad faces mean “no” or “very limited”, and the amber ones are somewhere in the middle. Note that even though the red face means that the GCP service (the row) doesn’t support a certain feature (the column), this isn’t always a bad thing because sometimes you don’t need this feature.

Let me explain what each column in the cheat sheet is for.

  • “Can spread load across healthy instances”. This column simply indicates whether the service performs this basic function of load balancing.
  • “Can spread load across VMs”. This column clarifies whether a “yes” in the first column applies to resources that sit in their own virtual machines. Often these VMs are added to an “instance group” that gets connected to the load balancer.
  • “Can spread load across services”. By “services” here I mainly refer to services that run in containers. So this column clarifies whether a “yes” in the first column applies to containerised services. Often these services are defined as a “network endpoint group” that gets connected to the load balancer.
  • “Auto-scaling”. I know, auto-scaling isn’t the same as load balancing; it’s about the ability of your solution to grow or shrink based on demand. Still, there are cases when you’re asked to ensure that your backend resources have the availability needed to serve all the requests in a balanced way, and you can achieve this by auto-scaling alone. Also, many services combine load balancing with auto-scaling.
  • “Session affinity”. When a client sends multiple requests to a server as part of one session, the “affinity” is the ability of the load balancer to send the later requests to the same backend that the client has already started talking to. This is important if earlier in the session the backend stored some parameters for this client. On the other hand, if requests remain “loyal” to the backend they already spoke to, then less load balancing actually takes place.
  • “Content-based”. This is when the load balancer can examine the request and decide where it needs to go, for example based on the path in the URL. Since such load balancers aren’t blind, you can use one load balancer for multiple services, each with its own set of backends.
  • “Global”. Some load balancing options allow you to balance the load across resources in different regions, so that users can send requests to your system in a single universal way.
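Two of the columns above, “content-based” and “session affinity”, combine nicely in one toy sketch (again plain Python, with a made-up URL map and backend names, not any GCP API): first the request path selects a backend set, then a hash of the client IP keeps each client pinned to one backend within that set.

```python
import hashlib

# Hypothetical URL map: path prefix -> backends for that service.
URL_MAP = {
    "/video": ["video-1", "video-2"],
    "/api":   ["api-1", "api-2", "api-3"],
}

def route(path, client_ip):
    # Content-based step: pick the backend set by the longest matching prefix.
    matches = [p for p in URL_MAP if path.startswith(p)]
    if not matches:
        raise ValueError(f"no route for {path}")
    backends = URL_MAP[max(matches, key=len)]
    # Session-affinity step: hashing the client IP means the same
    # client keeps landing on the same backend.
    h = int(hashlib.sha256(client_ip.encode()).hexdigest(), 16)
    return backends[h % len(backends)]

# The same client always reaches the same backend of the /api service:
assert route("/api/users", "10.0.0.7") == route("/api/orders", "10.0.0.7")
```

The affinity step also illustrates the trade-off mentioned above: the more “loyal” clients are to one backend, the less actual balancing happens.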

There are many facts behind this cheat sheet, all of which are explained in more detail in the book, but I’d like to point out that it’s also largely based on my own judgement, and you might have a different opinion on some things. For example, I’ve given Cloud Pub/Sub a green face in the “content-based” column, because you can control the distribution of messages by the way you create topics. It’s a different approach from having a URL map in a CLB, but in my opinion it’s still a powerful content-based load balancing approach. See also the points below (after the table) about why I chose an “amber” face in some places.

There’s a small number of empty cells in the cheat sheet — they are for cases where the infrastructure is all handled for you, so you don’t need to worry about whether it’s a VM or a container.

The acronym GKE means “Google Kubernetes Engine”.

[Cheat sheet table image: the 18 GCP services/features rated (green/amber/red) against the seven columns described above]

In case you’re curious what my thinking was behind the “amber” faces, here are a few examples of how I got there.

  • I gave Pub/Sub an amber for spreading the load because it ensures the load of messages is distributed well between data centres, but distribution to the final destinations depends on your Pub/Sub topics and on the subscriber systems you set up at these destinations.
  • I gave Cloud Tasks an amber for spreading the load and auto-scaling because it doesn’t exactly spread and scale; if it can’t handle the workload now, it puts the task in a queue and runs it when it can, which can be a legitimate alternative to classical load balancing in some situations.
  • I gave the GKE cluster auto-scaler an amber for session affinity because GKE apps are generally happier when you use them for stateless systems that don’t rely on sessions.
  • The amber for the TCP proxy is because it sends all requests from the same client to the same backend, which is more crude than session affinity.
  • The TCP/UDP load balancers aren’t fully content-based, but you can include multiple instance groups in a target pool and multiple forwarding rules.
  • I gave the external TCP/UDP load balancer an amber for balancing the load because it relies on a legacy method, unlike most other load balancers.
  • I gave Istio and Anthos an amber in the VM column because you can probably somehow use VMs, but you’d typically use these with micro-services.
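The Cloud Tasks point above — queueing work instead of spreading it — is sometimes called load leveling, and a toy loop shows the idea (pure Python, no Cloud Tasks API; the tick-based “worker” is an invented stand-in): instead of adding backends to absorb a burst, the burst waits in a queue and is drained at the rate one worker can sustain.

```python
from collections import deque

def run_queue(tasks, capacity):
    """Drain a queue at a fixed rate (`capacity` tasks per tick):
    nothing is dropped, the burst is just stretched over time."""
    queue = deque(tasks)
    ticks = 0
    done = []
    while queue:
        ticks += 1
        for _ in range(min(capacity, len(queue))):
            done.append(queue.popleft())
    return done, ticks

# A burst of 5 tasks handled at 2 per tick takes 3 ticks.
done, ticks = run_queue(["t1", "t2", "t3", "t4", "t5"], capacity=2)
```

This is why the amber face is fair: the load is absorbed, but latency grows with queue depth rather than staying flat as it would with real horizontal spreading.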

I hope this sheet doesn’t throw you off balance... If it does, consider using one of these 18 load balancing options.

Author: Ray Christiansen