Curated source intelligence

Incident Library

Postmortems, reliability notes, and operational lessons from engineering teams.

Source-backed
Cloudflare Cloudflare Post Mortems - May 1

Code Orange: Fail Small is complete. The result is a stronger Cloudflare network

We have completed a massive engineering effort to make our infrastructure more resilient. Through new tools like Snapstone and the Engineering Codex, we've implemented safer configuration changes and automated best practices to prevent future incidents.

Incidents
Cloudflare Cloudflare Post Mortems - Feb 21

Cloudflare outage on February 20, 2026

Cloudflare suffered a service outage on February 20, 2026. A subset of customers who use Cloudflare’s Bring Your Own IP (BYOIP) service saw their routes to the Internet withdrawn via Border Gateway Protocol (BGP).

Incidents
Cloudflare Cloudflare Post Mortems - Jan 23

Route leak incident on January 22, 2026

An automated routing policy configuration error caused us to leak some Border Gateway Protocol prefixes unintentionally from a router at our Miami data center. We discuss the impact and the changes we are implementing as a result.

Incidents
Cloudflare Cloudflare Post Mortems - Jan 14

What came first: the CNAME or the A record?

A recent change to 1.1.1.1 accidentally altered the order of CNAME records in DNS responses, breaking resolution for some clients. This post explores the technical root cause, examines the source code of affected resolvers, and dives into the inherent ambiguities of the DNS RFCs.

Incidents
Cloudflare Cloudflare Post Mortems - Dec 19

Code Orange: Fail Small — our resilience plan following recent incidents

We have declared “Code Orange: Fail Small” to focus everyone at Cloudflare on a set of high-priority workstreams with one simple goal: ensure that the cause of our last two global outages never happens again.

Incidents
Cloudflare Cloudflare Post Mortems - Dec 5

Cloudflare outage on December 5, 2025

Cloudflare experienced a significant traffic outage on December 5, 2025, starting at approximately 08:47 UTC. The incident lasted approximately 25 minutes before resolution. We are sorry for the impact that it caused to our customers and the Internet. The incident was not caused by an attack; it resulted from configuration changes applied in an attempt to mitigate a recent industry-wide vulnerability impacting React...

Incidents
Cloudflare Cloudflare Post Mortems - Nov 18

Cloudflare outage on November 18, 2025

Cloudflare suffered a service outage on November 18, 2025. The outage was triggered by a bug in generation logic for a Bot Management feature file causing many Cloudflare services to be affected.

Incidents
Cloudflare Cloudflare Post Mortems - Sep 13

A deep dive into Cloudflare’s September 12, 2025 dashboard and API outage

Cloudflare’s Dashboard and a set of related APIs were unavailable or partially available for an hour starting on Sep 12, 17:57 UTC. The outage did not affect the serving of cached files via the

Incidents
Cloudflare Cloudflare Post Mortems - Sep 2

The impact of the Salesloft Drift breach on Cloudflare and our customers

An advanced threat actor, GRUB1, exploited the integration between Salesloft’s Drift chat agent and Salesforce to gain unauthorized access to Salesforce tenants of Cloudflare and many other companies.

Incidents
Cloudflare Cloudflare Post Mortems - Aug 22

Cloudflare incident on August 21, 2025

On August 21, 2025, an influx of traffic directed toward clients hosted in AWS us-east-1 caused severe congestion on links between Cloudflare and us-east-1. In this post, we explain the details.

Incidents
Cloudflare Cloudflare Post Mortems - Aug 8

Redesigning Workers KV for increased availability and faster performance

Workers KV is Cloudflare's global key-value store. After the incident on June 12, we re-architected KV’s redundant storage backend, removed single points of failure, and made substantial improvements.

Incidents
Cloudflare Cloudflare Post Mortems - Jun 12

Cloudflare service outage June 12, 2025

Multiple Cloudflare services, including Workers KV, Access, WARP and the Cloudflare dashboard, experienced an outage for up to 2 hours and 28 minutes on June 12, 2025.

Incidents