The Four Golden Signals: What Google's Engineering Team Taught Me
What SREs at Google taught me about watching over systems
Imagine you have a big, cool robot. It's working hard, moving parts and making things. Your job? Make sure it keeps working and doesn’t break down.
Not fix it after it breaks — spot the warning signs before it even thinks about breaking.
That's a little like what Site Reliability Engineers (SREs) at Google do. And in their world, they talk about something called The Four Golden Signals.
I still remember the first time I heard about them. It was like someone finally handed me a map when I had been hiking blindfolded.
So today, I want to pass that map to you.
Why Do We Need "Signals"?
Before we dive into what the four golden signals are, let’s ask: why even have them?
If you’re building or running systems — websites, apps, APIs, whatever — you can’t watch everything. There’s just too much.
Instead, you want to look at a few key indicators that whisper, “Hey, something’s not right here...” before a full-blown disaster hits.
That’s what these signals are: a simple, powerful set of checks to keep your system healthy without drowning you in noise.
The Four Golden Signals (a Quick Peek)
Here’s the cheat sheet:
Latency: How long does it take?
Traffic: How much is happening?
Errors: How often does it fail?
Saturation: How full is it?
That's it. Just four.
But the magic is in how they help you see your system like an SRE does — catching problems early and understanding them fast.
Let’s walk through each one in a simple, real-life way.
1. Latency — "Are We Slow?"
Latency is about speed.
Think about when you open an app and it takes forever to load.
That frustration? That’s latency showing its ugly face.
For services, it’s how long a request takes to get a response. You can measure it in seconds or milliseconds, per endpoint or per component, depending on where in the system you want visibility.
If latency spikes, users start noticing. And trust me, they’re not forgiving.
👉 Tip: Pick a latency threshold that reflects what's normal for your system (for example, a limit on the 95th-percentile response time) and set up a simple alert for when responses get slower than that. It's easier to fix a slow system early than after users start leaving.
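Here's a minimal sketch of what that kind of check could look like. The function names, the 500 ms threshold, and the choice of the 95th percentile are all my assumptions for illustration, not anything prescribed by Google; in practice your monitoring tool would usually compute this for you.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; pct is a whole number from 1 to 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Ceiling division in integers avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_is_slow(samples_ms, threshold_ms=500, pct=95):
    """True when the chosen percentile of response times exceeds the threshold.

    threshold_ms and pct are illustrative defaults, not recommendations.
    """
    return percentile(samples_ms, pct) > threshold_ms


# Hypothetical response times in milliseconds: mostly fast, two very slow.
recent = [120] * 18 + [900, 950]
print(latency_is_slow(recent))  # True
```

The average of those samples is only about 200 ms, so an alert on the average would stay quiet. Looking at a high percentile is one way to notice the slow tail that some users are actually feeling.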
2. Traffic — "How Busy Are We?"
Traffic measures how much work your system is doing.
Imagine running a coffee shop. Traffic is how many customers come through the door.
Ten per hour? Manageable.
A thousand per hour? Panic.
In systems, it might be the number of requests per second, the number of active users, or the size of the data moving around.
👉 Tip: Sudden drops or crazy spikes in traffic can hint at bigger problems — a system crash, a DDoS attack, or even just a trending TikTok.
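To make that concrete, here's a small sketch that turns request timestamps into requests per second and compares the result against a baseline. The function names and the spike and drop factors (3x and 0.3x) are hypothetical values I picked for the example; you'd tune them to your own traffic patterns.

```python
def requests_per_second(timestamps, window_start, window_seconds):
    """Average request rate inside [window_start, window_start + window_seconds)."""
    window_end = window_start + window_seconds
    count = sum(1 for t in timestamps if window_start <= t < window_end)
    return count / window_seconds


def classify_traffic(current_rps, baseline_rps, spike_factor=3.0, drop_factor=0.3):
    """Label traffic as 'spike', 'drop', or 'normal' relative to a baseline.

    spike_factor and drop_factor are illustrative assumptions.
    """
    if baseline_rps <= 0:
        return "unknown"  # no meaningful baseline to compare against
    ratio = current_rps / baseline_rps
    if ratio >= spike_factor:
        return "spike"
    if ratio <= drop_factor:
        return "drop"
    return "normal"


# Hypothetical request arrival times, in seconds.
arrivals = [0.1, 0.5, 1.2, 9.9, 10.0]
print(requests_per_second(arrivals, window_start=0, window_seconds=10))
print(classify_traffic(current_rps=300, baseline_rps=100))
```

A label like "spike" or "drop" doesn't tell you why traffic changed, only that it's worth a look.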
3. Errors — "Are We Messing Up?"
Errors track how often things go wrong.
Simple enough, right? You want to know when requests fail, when weird bugs pop up, or when users hit 404 pages instead of getting what they asked for.
👉 Tip: Track different types of errors separately, like 5XX (server errors) and 4XX (client errors, such as bad requests or missing pages). This helps you quickly tell whether the problem is something you need to fix on the server or whether clients are just sending bad requests.
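As a rough sketch, assuming you already have a list of HTTP status codes from your logs, splitting them by class could look like this. The `error_rates` name and the sample data are made up for the example.

```python
from collections import Counter


def error_rates(status_codes):
    """Fraction of responses that were 4XX and 5XX, reported separately."""
    total = len(status_codes)
    if total == 0:
        return {"4xx": 0.0, "5xx": 0.0}
    # Integer-divide by 100 to group codes by class: 404 -> 4, 503 -> 5.
    classes = Counter(code // 100 for code in status_codes)
    return {"4xx": classes[4] / total, "5xx": classes[5] / total}


# Hypothetical sample: six successes, two client errors, two server errors.
codes = [200] * 6 + [404, 400] + [500, 503]
print(error_rates(codes))  # {'4xx': 0.2, '5xx': 0.2}
```

Keeping the two rates apart means a burst of bad client requests doesn't hide (or get mistaken for) a real server-side failure.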
4. Saturation — "Are We Almost Full?"
Saturation is about capacity.
How close are you to the edge? How much strain can your system take before it tips over?
Think about your laptop running a million tabs. It slows down, fans roar, and eventually... it crashes.
Servers do the same when they’re saturated.
👉 Tip: It’s not just about CPU. Look at memory, disk space, database connections — anything that can run out.
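One simple way to picture this: treat every limited resource the same way, as "how much is used" out of "how much there is." The sketch below assumes you've already collected those numbers somewhere; the resource names, the sample values, and the 80% warning level are all illustrative assumptions.

```python
def nearly_full(resources, warn_at=0.8):
    """Return names of resources at or above warn_at utilization, fullest first.

    resources maps a name to a (used, capacity) pair.
    warn_at is an illustrative default, not a recommendation.
    """
    utilization = {
        name: used / capacity for name, (used, capacity) in resources.items()
    }
    flagged = [name for name, value in utilization.items() if value >= warn_at]
    return sorted(flagged, key=lambda name: utilization[name], reverse=True)


# Hypothetical snapshot of a single server.
snapshot = {
    "cpu_percent": (35, 100),
    "memory_gb": (14, 16),
    "disk_gb": (200, 500),
    "db_connections": (95, 100),
}
print(nearly_full(snapshot))  # ['db_connections', 'memory_gb']
```

Notice that CPU looks perfectly healthy here while the database connection pool is almost exhausted, which is exactly the kind of thing a CPU-only dashboard would miss.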
Why the Golden Signals Matter (and Always Will)
The Four Golden Signals aren’t just "a Google thing."
Usually, when engineers talk about Google's best practices, they say things like, 'Oh, that's only applicable to large-scale companies like Google.' But I actually think the four golden signals are a great starting point for observability, even for your pet project. Let me know in the comments if you disagree — I'm not sure my opinion will change, but let's see.
They simplify complexity.
They help you focus on what matters.
And they make sure you spend more time building cool stuff — not firefighting.
Whenever I build a new app, design an API, or spin up a service, I think about these four. They are my early warning system. My safety net.
And honestly? They’ve saved my butt more times than I can count.
A Final Thought
If you’re running anything important (and these days, what isn’t important?), start watching these four signals.
You’ll sleep better. Your users will thank you. Your team will think you have superpowers.
Until next time,
Adlet