What is the difference between SLA, SLO, and SLI?

SLI (Service Level Indicator) is the metric itself -- for example, the percentage of requests completed in under 200ms. SLO (Service Level Objective) is the internal target for that metric -- say, 99.95% of requests under 200ms. SLA (Service Level Agreement) is the contractual promise to customers, with financial penalties if breached -- typically set below the SLO to provide a buffer. Google Cloud's approach: SLIs measure, SLOs target, SLAs commit.

What happens when an SLA is breached?

Most SaaS SLAs include service credits as the remedy. AWS's EC2 SLA, for example, offers a 10% credit for monthly uptime below 99.99% but at least 99.0%, a larger credit for uptime between 95.0% and 99.0%, and a 100% credit below 95.0%. Exact tiers vary by service. Customers must typically file a claim within 30 days. In enterprise contracts, SLA breaches can trigger penalty clauses, contract renegotiation rights, or even termination rights.
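To make the tiered structure concrete, here is a minimal Python sketch of a credit lookup. The thresholds and percentages in `CREDIT_TIERS` are illustrative assumptions shaped like the tiers described above. They are not any provider's current published schedule, so always read the actual SLA document.

```python
# Hypothetical credit schedule shaped like the tiers described above.
# Each entry is (minimum monthly uptime %, credit %), checked from the top down.
CREDIT_TIERS = [
    (99.99, 0),   # commitment met: no credit owed
    (99.0, 10),   # 99.0% <= uptime < 99.99%
    (95.0, 25),   # 95.0% <= uptime < 99.0%
    (0.0, 100),   # uptime < 95.0%
]


def service_credit_percent(monthly_uptime: float) -> int:
    """Return the credit percentage owed for a month's measured uptime."""
    for floor, credit in CREDIT_TIERS:
        if monthly_uptime >= floor:
            return credit
    raise ValueError("uptime must be between 0 and 100")


def credit_amount(monthly_bill: float, monthly_uptime: float) -> float:
    """Credit in currency units: the bill times the applicable credit percentage."""
    return monthly_bill * service_credit_percent(monthly_uptime) / 100
```

For example, `credit_amount(1000.0, 97.0)` returns 250.0 under these assumed tiers. Keeping the schedule as data rather than nested `if` statements makes it easy to swap in the real numbers from a specific contract.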

Service Level Agreement (SLA)

Definition

A Service Level Agreement (SLA) is a formal contract between a service provider and a customer that defines the expected level of service -- typically covering uptime (availability), performance (latency, throughput), and support responsiveness. SLAs include measurable targets and specify what happens when those targets are missed, usually in the form of financial credits or penalties.

The SLA sits at the top of a three-level hierarchy. Service Level Indicators (SLIs) are the raw metrics: request latency, error rate, uptime percentage. Service Level Objectives (SLOs) are the internal targets engineering teams aim for. SLAs are the external, contractual commitments -- set no stricter than the SLO, and usually looser, to provide a margin of safety. Google popularized this hierarchy in its Site Reliability Engineering (SRE) book.

Common SLA tiers in SaaS: 99.9% uptime (the "three nines") allows roughly 8.7 hours of downtime per year. 99.99% ("four nines") allows about 52 minutes per year. 99.999% ("five nines") allows about 5 minutes per year. Each additional nine cuts the allowed downtime by a factor of ten and typically requires disproportionately more engineering investment in redundancy, failover, and monitoring.
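The downtime figures above follow from simple arithmetic: allowed downtime = (1 - target) × period length. A small Python sketch follows. It assumes a 365-day year and ignores leap years.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(target_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime (in minutes) an availability target permits over a period."""
    return (1 - target_percent / 100) * period_minutes


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} min/year ({minutes / 60:.2f} h)")
```

Passing `period_minutes=30 * 24 * 60` gives the monthly budget instead, which is the window most SaaS SLAs actually measure against.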

Why It Matters for Product Managers

SLAs constrain what you can ship and how you ship it. A 99.99% uptime guarantee leaves about 4.3 minutes of downtime in a 30-day month, so any deployment that takes the service offline, during business hours or not, can breach it. Zero-downtime deployment practices become a requirement. Maintenance windows shrink. The bar for testing before production rises. Every architecture decision must consider failure modes.

For PMs at B2B SaaS companies, SLAs are also a competitive differentiator and a pricing lever. Enterprise customers evaluate SLAs during procurement, and a startup offering 99.9% uptime can lose deals to a competitor offering 99.99% -- assuming both can actually deliver. Promising an SLA you cannot meet is worse than not offering one, because breaches erode trust and cost real money in service credits.

PMs also need to understand SLAs when their product depends on third-party services. Suppose your payment processor has a 99.9% SLA, your notification provider has a 99.9% SLA, and their failures are independent. A checkout-plus-notification flow that needs both can then expect only about 99.9% × 99.9% ≈ 99.8% availability, before counting any downtime of your own. Each dependency in your stack compounds the risk.
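For dependencies that must all be up at once, the combined availability is the product of the individual availabilities. This minimal Python sketch assumes independent failures. Correlated outages, such as two providers sharing a cloud region, would change the result.

```python
from math import prod


def serial_availability(*availabilities: float) -> float:
    """Availability of a flow that needs every dependency up simultaneously.

    Assumes failures are independent of one another.
    """
    return prod(availabilities)


# Payment processor and notification provider, each at 99.9%.
checkout = serial_availability(0.999, 0.999)
print(f"{checkout:.4%}")  # 99.8001%
```

Adding your own service as a third argument (say, 0.9995) lowers the figure further, which is why every new hard dependency deserves scrutiny.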

How It Works in Practice

Define SLIs -- Work with engineering to identify the metrics that matter most to customers. For a web application: availability (percentage of successful responses), latency (95th percentile response time), and error rate. For an API: all of the above plus throughput and rate limit fairness.
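As a rough illustration, the two core web SLIs can be computed from a request log like this. The `Request` record shape is a hypothetical assumption. Counting only 5xx responses as failures is one common convention, not a universal rule.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned
    latency_ms: float  # end-to-end response time in milliseconds


def availability(requests: list[Request]) -> float:
    """Share of requests that did not fail with a server error (5xx)."""
    ok = sum(1 for r in requests if r.status < 500)
    return ok / len(requests)


def p95_latency_ms(requests: list[Request]) -> float:
    """95th-percentile latency using the nearest-rank method."""
    latencies = sorted(r.latency_ms for r in requests)
    rank = (95 * len(latencies) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return latencies[rank - 1]                # nearest rank is 1-based
```

Under this definition the error rate is simply `1 - availability(requests)`. Both functions expect a non-empty window of requests.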

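Set SLOs -- Establish internal targets that are stricter than the customer-facing SLA. If the SLA promises 99.9% uptime, the SLO might target 99.95%. The SLO defines the team's error budget -- the 0.05% of unreliability it tolerates, about 21.6 minutes in a 30-day month, which can be "spent" on risky deployments or experiments. Because the SLO is stricter than the SLA, exhausting the budget still leaves a margin before the contract is breached.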
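An error budget can also be counted in requests rather than minutes. Here is a minimal Python sketch under that request-based assumption. The function names are illustrative.

```python
def error_budget(slo: float, total_events: int) -> float:
    """Failed events the SLO tolerates over a period (slo=0.9995 means 99.95%)."""
    return (1 - slo) * total_events


def budget_remaining(slo: float, total_events: int, failed_events: int) -> float:
    """Fraction of the error budget still unspent; negative once it is overspent."""
    return 1 - failed_events / error_budget(slo, total_events)
```

Take 10,000,000 requests in a month at a 99.95% SLO. `error_budget` allows roughly 5,000 failed requests. After 1,250 failures, `budget_remaining` reports about 0.75, meaning three quarters of the budget is left to spend.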

Formalize the SLA -- Legal and product teams draft the customer-facing agreement specifying the commitment, measurement methodology, exclusions (e.g., scheduled maintenance, customer-caused issues), and remedies (service credits). Salesforce, AWS, and Azure all publish their SLAs.
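The measurement methodology decides what actually counts against uptime. This Python sketch shows one simple version. It assumes excluded outages are dropped from the downtime total while the denominator stays the full month. Some SLAs instead remove excluded time from the denominator, so check the contract's wording. `Outage` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    excluded: bool  # True if the SLA exempts it (e.g., scheduled maintenance)


def monthly_uptime_percent(outages: list[Outage], days_in_month: int = 30) -> float:
    """Measured uptime, counting only non-excluded downtime."""
    total_minutes = days_in_month * 24 * 60
    counted = sum(o.minutes for o in outages if not o.excluded)
    return 100 * (1 - counted / total_minutes)
```

With 43.2 minutes of unplanned downtime plus a two-hour scheduled maintenance window, this method reports 99.9% for a 30-day month.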

Monitor continuously -- Automated dashboards track SLIs against SLOs in real time, and alerts fire when an SLI approaches its SLO threshold. Error budgets then drive release decisions, an approach popularized by Google's SRE teams. Under one example policy, if half the monthly error budget is consumed by mid-month, the team freezes risky deployments.
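A policy like the mid-month rule is simple enough to encode and run automatically, for example as a pre-deploy check. The sketch below is hypothetical. It applies that rule and adds a hard stop once the budget is fully spent.

```python
def freeze_risky_deploys(budget_consumed: float, month_elapsed: float) -> bool:
    """Hypothetical freeze rule.

    budget_consumed: fraction of the monthly error budget already spent (1.0 = all)
    month_elapsed:   fraction of the month that has passed (0.5 = mid-month)
    """
    if budget_consumed >= 1.0:
        # Budget exhausted: any further incident eats into the SLA margin.
        return True
    # Mid-month rule: half the budget gone while half the month (or more) remains.
    return month_elapsed <= 0.5 and budget_consumed >= 0.5
```

Writing the policy down as code forces the team to agree on edge cases in advance, such as what happens after mid-month, instead of debating them during an incident.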

Report and remediate -- Provide customers with regular uptime reports (monthly or quarterly). When the SLA is breached, issue service credits proactively rather than waiting for claims. This builds trust even when things go wrong.

Common Pitfalls

Promising more than you can deliver. An SLA of 99.99% uptime requires redundant infrastructure, automated failover, zero-downtime deployments, and 24/7 on-call coverage. If your engineering team does not have these capabilities, set a lower SLA and invest in the infrastructure to raise it over time.

Measuring the wrong SLIs. Server uptime is not the same as user-perceived availability. Your server might return 200 OK while serving a blank page due to a front-end bug. Measure what customers experience, not what your server reports.
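The gap between server-reported and user-perceived availability can be sketched in Python. The sketch assumes a hypothetical real-user-monitoring beacon that reports whether each page actually rendered in the browser. `PageView` and its fields are illustrative, not taken from any specific monitoring product.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    http_status: int   # what the server logged
    rendered_ok: bool  # what the browser reported after rendering (hypothetical beacon)


def server_availability(views: list[PageView]) -> float:
    """What the server sees: any non-5xx response counts as available."""
    return sum(v.http_status < 500 for v in views) / len(views)


def user_perceived_availability(views: list[PageView]) -> float:
    """What users see: the request succeeded AND the page actually rendered."""
    return sum(v.http_status < 500 and v.rendered_ok for v in views) / len(views)


# A front-end bug blanks 2 of 10 pages while the server returns 200 OK for all.
views = [PageView(200, True)] * 8 + [PageView(200, False)] * 2
```

Here `server_availability(views)` returns 1.0 while `user_perceived_availability(views)` returns 0.8. Only the second number reflects the outage customers experienced.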

Ignoring the cost of each nine. Going from 99.9% to 99.99% might double your infrastructure bill and require hiring an SRE team. PMs should model the ROI: does the additional reliability win enough enterprise deals to justify the cost?
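A first-pass ROI model can be a single formula. In this Python sketch, every input is the PM's own estimate, and the function name and parameters are hypothetical.

```python
def reliability_upgrade_roi(added_annual_cost: float, extra_deals_won: int,
                            avg_annual_deal_value: float,
                            credits_avoided: float = 0.0) -> float:
    """Return on investment of moving to a stricter SLA tier.

    All inputs are estimates. A negative result means the extra nine
    costs more than it brings in.
    """
    gain = extra_deals_won * avg_annual_deal_value + credits_avoided
    return (gain - added_annual_cost) / added_annual_cost
```

Suppose the extra nine adds $500,000 a year in cost and wins four additional $200,000 deals. `reliability_upgrade_roi(500_000, 4, 200_000)` then returns 0.6, a 60% return. The hard part is estimating the deal count honestly, not the arithmetic.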

SLAs without error budgets. An SLA without an internal error budget means the team either never takes risks (no deployments, no experiments) or constantly violates the SLA. Error budgets provide a structured way to balance reliability with velocity.

Related Terms

DevOps -- the practices and culture that enable teams to meet SLA commitments through automation and monitoring

Continuous Delivery -- the deployment practice that supports SLA compliance through safer, incremental releases

Dependency -- each external dependency in your stack affects your ability to meet SLA targets
