m@ksim.pro
IT 4 min read

Public cloud SLA: what it says and what it does not

A breakdown of where the provider's responsibility ends and the customer's begins - and why this matters before an incident, not after.

When a company decides to move to a public cloud, one of the main selling points sounds like this: "There is a guaranteed SLA, we do not have to worry about reliability anymore." That is not entirely wrong. But it is far from the whole story. The decision to move to the cloud deserves careful framing from the start: choosing between your own datacenter, hosting, or cloud is the first decision, and the boundary between virtualisation and the cloud is one that is often blurred.

I have seen situations where, after a serious incident, a company discovered that the provider had done everything according to the contract - and the business was still down for several hours. The problem was not the provider. The problem was that nobody had read the SLA carefully.

What the provider actually guarantees

A typical public cloud SLA guarantees the availability of a specific service - a virtual machine, storage, a load balancer. Usually 99.9% or 99.95% per month. In a 30-day month that works out to roughly 43 minutes or 22 minutes of allowed downtime respectively.
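
The arithmetic behind such figures can be sketched in a few lines. This is a minimal illustration, assuming a 30-day month; the function name `allowed_downtime_minutes` is made up for this example, and a real SLA defines its own measurement period and its own way of counting downtime.

```python
def allowed_downtime_minutes(availability_percent: float, days_in_month: int = 30) -> float:
    """Return the monthly downtime budget in minutes for an availability target."""
    total_minutes = days_in_month * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - availability_percent / 100)


for target in (99.9, 99.95):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min per month")
# 99.9% -> 43.2 min per month
# 99.95% -> 21.6 min per month
```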

Details that are also in the document, but less often read:

  • the guarantee covers one service, not the entire application;
  • compensation for SLA breach is a credit toward future invoices, not reimbursement for damages;
  • planned maintenance is often excluded from the SLA or counted under separate terms;
  • the SLA only kicks in once an incident is acknowledged by the provider through their own procedures.
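
The first point is worth quantifying. If an application needs several services at the same time, and each one fails independently of the others (a simplifying assumption that real outages often violate), the individual availabilities multiply. The sketch below uses a hypothetical stack with illustrative figures, not any particular provider's numbers.

```python
from math import prod


def composite_availability(availabilities: list[float]) -> float:
    """Availability of a chain in which every component must be up.

    Assumes failures are independent, which is an idealisation.
    """
    return prod(availabilities)


# Hypothetical stack, expressed as fractions rather than percentages.
stack = {"vm": 0.9995, "storage": 0.999, "load_balancer": 0.9995}
combined = composite_availability(list(stack.values()))
print(f"combined availability: {combined:.4%}")
```

Under these assumptions the chain lands at about 99.8%, lower than any single service in it, even though every service individually meets its own SLA.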

What the provider does not guarantee

This part matters more. The provider does not guarantee:

  • the availability of your application - only the infrastructure under it;
  • that the configuration your team built is correct;
  • that data is preserved if backups were not set up or were set up incorrectly;
  • recovery within your timeline, unless you have your own DR plan;
  • continuity of service in the region you chose, if the outage hits exactly that region.

The boundary of responsibility runs exactly where the provider's control ends. Beyond that, it is your zone.

The customer's zone: where problems usually appear

In practice, most incidents that feel like "the cloud went down" happen in the customer's zone:

  • one availability zone instead of two, because "it costs a bit more for now";
  • no automatic failover, because "we'll sort that out later";
  • backups configured but never tested;
  • dependency on a single region with no fallback;
  • network configuration that only one person on the team understands.

None of these problems appear in the provider's SLA - because they are not the provider's responsibility.
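
The first item in that list is a good example of how small the saving is compared to the effect. As a rough sketch, assuming zones fail independently and failover between them is automatic and instant (both optimistic assumptions), a deployment is down only when every zone is down at once. The per-zone figure of 99.9% is hypothetical.

```python
def redundant_availability(zone_availability: float, zones: int) -> float:
    """Availability when at least one of `zones` independent zones must be up."""
    unavailability = (1 - zone_availability) ** zones
    return 1 - unavailability


for zones in (1, 2):
    print(zones, f"{redundant_availability(0.999, zones):.6f}")
# 1 0.999000
# 2 0.999999
```

The real gain is smaller than this idealised number, because failover takes time and zones share some failure modes, but the direction of the effect is the same.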

How to read an SLA properly

Before signing a contract or finalizing an architecture, I recommend answering a few questions:

  1. Which specific service does this SLA cover - and is that the only service the application depends on?
  2. How does the provider measure availability - from outside its network, the way your users see it, or from inside?
  3. What happens to data if the service is unavailable beyond the guaranteed window?
  4. What does the compensation process look like, and how long does it take?
  5. Do we have our own SLA toward the business - and does it align with what the provider guarantees?

The last question is the most uncomfortable one. If the business expects recovery within 15 minutes, and the provider's SLA tolerates roughly three quarters of an hour of downtime per month, those do not match: a single outage can stay fully within the contract and still break your promise to the business. The gap has to be closed with architectural decisions on your side.
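
The mismatch can be made explicit with a very small check. This is only a sketch: the `Commitment` type and the `uncovered_minutes` function are hypothetical names, and the comparison deliberately simplifies by treating the provider's whole monthly budget as one possible outage.

```python
from dataclasses import dataclass


@dataclass
class Commitment:
    name: str
    max_outage_minutes: float  # longest outage this commitment tolerates


def uncovered_minutes(business: Commitment, provider: Commitment) -> float:
    """Minutes of outage that stay within the provider's SLA but break the business promise."""
    return max(0.0, provider.max_outage_minutes - business.max_outage_minutes)


business = Commitment("internal SLA toward the business", 15.0)
provider = Commitment("provider SLA, monthly downtime budget", 45.0)
print(uncovered_minutes(business, provider))  # 30.0
```

Anything above zero is time that no contract covers and that only your own architecture can absorb.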

The practical conclusion

An SLA is a contract about minimum guarantees, not about the reliability of your product. Product reliability is a separate engineering challenge that the cloud makes easier, but does not solve for you.

A useful check: take the last real incident, or model a hypothetical one, and walk through every layer explicitly - what fails, who fixes it, by what time. That is usually how you find out where the boundary really sits.
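
One way to keep such a walkthrough honest is to write it down as data and look for the blanks. The sketch below is an illustration only: the `Layer` type, the `unresolved` function, and the example layers are all hypothetical, not a standard format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    name: str
    failure: str                     # what fails
    owner: Optional[str]             # who fixes it; None if nobody has been named
    recover_minutes: Optional[int]   # by what time; None if never agreed


def unresolved(layers: list[Layer]) -> list[str]:
    """Names of layers that have no owner or no recovery target."""
    return [
        layer.name
        for layer in layers
        if layer.owner is None or layer.recover_minutes is None
    ]


walkthrough = [
    Layer("virtual machine", "host failure", "provider", 45),
    Layer("database backup", "restore was never tested", "customer", None),
    Layer("network configuration", "only one person understands it", None, None),
]
print(unresolved(walkthrough))
# ['database backup', 'network configuration']
```

Every name in the output is a layer where the question "who fixes it, and by when" has no answer yet.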
