Cloudflare 1.1.1.1 Outage Explained: How CNAME Ordering in RFC Specs Caused the Incident (2026)

How CNAME Ordering in RFC Specs Caused the Cloudflare 1.1.1.1 Outage: A Technical Deep Dive

The Unseen Impact of DNS Standard Ambiguity

In internet infrastructure, even small changes can have significant consequences. That is what happened when an ambiguity in the DNS standards contributed to a major outage of Cloudflare's 1.1.1.1 service. The issue lies in how CNAME records are ordered in DNS responses, a detail that most software ignores but that some clients silently depend on.

On January 8, a routine update to the DNS service changed the order in which CNAME records appeared in responses, causing some DNS clients to fail when resolving names. While most modern software treats the order of records in DNS responses as irrelevant, the Cloudflare team found that some implementations expect CNAME records to appear before all other record types. This discrepancy led to a significant outage, affecting millions of users who rely on Cloudflare's public DNS service.

The Root Cause: Unclear RFC Specifications

The issue stems from the ambiguity in older DNS standards regarding the order of records. When a DNS resolver looks up a name with a CNAME record, it may see a series of alias records linking the original name to a final address. The resolver caches each step with its own expiry time. However, if part of this chain has expired in the cache, the resolver only re-fetches the expired portion and combines it with the valid parts to form the complete response.
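The partial-expiry behaviour described above can be illustrated with a minimal sketch. It assumes a simplified cache in which every record carries its own absolute expiry time; the `Record` type, the `split_chain` helper, and the example names and addresses are hypothetical and are not taken from Cloudflare's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str          # owner name of the record
    rtype: str         # "CNAME" or "A"
    value: str         # alias target (CNAME) or address (A)
    expires_at: float  # absolute expiry time in seconds (hypothetical clock)


def split_chain(chain, now):
    """Split a cached chain into still-valid records and expired ones."""
    valid = [r for r in chain if r.expires_at > now]
    expired = [r for r in chain if r.expires_at <= now]
    return valid, expired


# A two-step chain whose final address expires before the alias does.
cached = [
    Record("www.example.com", "CNAME", "cdn.example.net", expires_at=300.0),
    Record("cdn.example.net", "A", "192.0.2.10", expires_at=60.0),
]

# At t=120 the CNAME is still valid, so only the A record must be re-fetched.
valid, expired = split_chain(cached, now=120.0)
```

In this sketch only the expired step would be queried upstream again, and the result would then be combined with the records in `valid` to build the response.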

Cloudflare explains that this merge step is where the problem arose. Previously, the code created a new list, inserted the existing CNAME chain, and then appended the new records. To save memory allocations and copies, the code was changed to append the CNAMEs to the existing answer list instead, so the CNAME records could end up after the records they point to.
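The two merge strategies described above can be sketched as follows. This is an illustration only, assuming records are plain `(name, type, value)` tuples; the function names `merge_old` and `merge_new` are hypothetical, and the real resolver code is not shown in this article.

```python
def merge_old(cached_cnames, fresh_records):
    """Earlier behaviour: build a new list, CNAME chain first, then new records."""
    answer = []
    answer.extend(cached_cnames)
    answer.extend(fresh_records)
    return answer


def merge_new(cached_cnames, fresh_records):
    """Changed behaviour: reuse the existing answer list and append the CNAMEs."""
    fresh_records.extend(cached_cnames)  # avoids allocating a second list
    return fresh_records


cnames = [("www.example.com", "CNAME", "cdn.example.net")]
fresh = [("cdn.example.net", "A", "192.0.2.10")]

# Copies are passed in so that each call starts from the same input.
old_order = merge_old(cnames, list(fresh))
new_order = merge_new(cnames, list(fresh))
```

Both lists contain the same records. Only the order differs: `old_order` lists the CNAME before the A record, while `new_order` lists it after.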

The Controversy: Different Interpretations of RFCs

Many DNS client implementations, such as systemd-resolved, do not depend on the order of records. Others, including the getaddrinfo function in glibc, process the chain by tracking the name they expect next and iterating through the records sequentially, so they expect to find the CNAME records before any answers. This subtle difference in how the RFCs are interpreted can have significant consequences.
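To see why ordering matters to a sequential parser, consider the following simplified sketch. It is not glibc's or systemd-resolved's actual code; `resolve_sequential` and `resolve_any_order` are hypothetical functions that only mimic the two strategies described above, using `(name, type, value)` tuples and example data.

```python
def resolve_sequential(qname, answers):
    """Single forward pass: a CNAME is followed only if it appears in order."""
    expected = qname
    addresses = []
    for name, rtype, value in answers:
        if name != expected:
            continue  # record for a name we are not looking for at this point
        if rtype == "CNAME":
            expected = value  # follow the alias for the remaining records
        elif rtype == "A":
            addresses.append(value)
    return addresses


def resolve_any_order(qname, answers):
    """Index the CNAMEs first, then follow the chain, so order is irrelevant."""
    cnames = {name: value for name, rtype, value in answers if rtype == "CNAME"}
    expected = qname
    seen = set()  # guards against CNAME loops
    while expected in cnames and expected not in seen:
        seen.add(expected)
        expected = cnames[expected]
    return [v for name, rtype, v in answers if rtype == "A" and name == expected]


cname_first = [
    ("www.example.com", "CNAME", "cdn.example.net"),
    ("cdn.example.net", "A", "192.0.2.10"),
]
cname_last = list(reversed(cname_first))

seq_first = resolve_sequential("www.example.com", cname_first)
seq_last = resolve_sequential("www.example.com", cname_last)
any_last = resolve_any_order("www.example.com", cname_last)
```

With the CNAME listed last, the sequential pass skips the A record because it has not yet learned the alias, and it returns no addresses. The order-independent version returns the address for both orderings.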

On Reddit, a user comments: "On the one hand, I really respect the details in their post-mortems and really high standard in engineering, but on the other hand, I can't help to think that they do not have proper testing in place (and culture thereof) to understand the impact they have globally." This highlights the importance of thorough testing and the potential consequences of not doing so.

On a popular Hacker News thread, many users discuss whether the RFC is actually unclear, with the subtle distinction of RRsets versus RRs in message sections, or whether developers at Cloudflare misunderstood it. Patrick May comments: "A great example of Hyrum's Law: 'With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.' combined with failure to follow Postel's Law: 'Be conservative in what you send, be liberal in what you accept.'" This underscores the importance of clear and consistent specifications in API design.

The Solution: A Clearer RFC Specification

In an Internet-Draft to be discussed at the IETF, Cloudflare proposes text that explicitly defines how to correctly handle CNAME records in DNS responses. According to the published timeline, Cloudflare began the global rollout on January 7 and reached 90% of servers by January 8 at 17:40 UTC. The company declared the incident soon after, began reverting the change at 18:27 UTC on January 8, and completed the rollback by 19:55 UTC.

This incident serves as a reminder of the importance of clear and consistent specifications in DNS standards. By proposing a clearer RFC specification, Cloudflare is taking steps to prevent similar incidents in the future and ensure the stability and reliability of its services.

Article information

Author: Sen. Emmett Berge
