LogicLoop Logo
LogicLoop
LogicLoop / clean-code-principles / How a Simple Null Pointer Crashed Google and Took Down the Internet
clean-code-principles June 17, 2025 4 min read

How a Simple Null Pointer Dereference Crashed Google and Took Down the Internet for Hours

Sophia Okonkwo

Sophia Okonkwo

Technical Writer

How a Simple Null Pointer Crashed Google and Took Down the Internet

On June 12, a significant portion of the internet experienced a major outage lasting approximately three hours. Major services including Google, AWS, Cloudflare, Spotify, and numerous downstream services were affected. What's remarkable about this incident is that it stemmed from one of the most basic programming errors: a null pointer dereference in Google's infrastructure code.

The Scale and Impact of the Outage

The outage began around 1:45 PM EST, quickly affecting a wide range of services. Down Detector, which tracks website outages, showed simultaneous failures across major internet platforms. This wasn't just a minor glitch - it effectively rendered large portions of the internet unusable for many users, particularly in the United States.

The Google outage effectively took down a majority of internet services for about 3 hours across multiple regions
The Google outage effectively took down a majority of internet services for about 3 hours across multiple regions

The incident highlighted a concerning vulnerability in our internet infrastructure: the cascading effect when core services fail. As cloud providers like Google handle increasingly critical infrastructure, their failures can have disproportionate impacts across the digital ecosystem.

Understanding What Happened: The Technical Details

According to Google's incident report, the root cause was traced to their API management and control planes. These systems are responsible for ensuring that API requests are properly authorized and meet policy requirements before reaching their endpoints.

The specific component that failed was a binary called "service control" - a regional service that reads quota and policy information from a data store. This metadata gets replicated globally almost instantly to manage quota policies for Google Cloud and its customers.

Control planes and management planes handle the configuration and administration of network infrastructure
Control planes and management planes handle the configuration and administration of network infrastructure

The Timeline of Events

  1. On May 29, 2025, Google added a new feature to the service control binary for additional quota policy checks
  2. This code change went through a region-by-region rollout
  3. The problematic code path was never exercised during testing because it required a specific policy change to trigger it
  4. As a safety precaution, engineers included a "red button" to disable this policy serving path if needed
  5. On June 12, 2025, at approximately 10:45 AM PDT (1:45 PM EST), a policy change was inserted into the regional database tables
  6. This policy data contained unintended blank fields
  7. When service control attempted to process these blank fields, it encountered a null pointer dereference
  8. The binaries crashed globally, taking down Google Cloud services and many dependent systems

What is a Null Pointer Dereference?

For non-programmers, a null pointer dereference is one of the most fundamental errors in programming. In simple terms, a pointer is a variable that contains a memory address - it "points" to another location in memory where data is stored.

A pointer stores the memory address of data, while a null pointer has no valid memory reference
A pointer stores the memory address of data, while a null pointer has no valid memory reference

When a pointer has the value "null" (essentially zero), it means it doesn't point to any valid memory location. If code attempts to access or modify data at that non-existent location (dereferencing the null pointer), the program crashes.

C
// Example of a null pointer dereference in C
char *not_null = "hello";  // Valid pointer to a string
char *null_ptr = NULL;     // Null pointer (value is 0)

printf("%s %s\n", not_null, null_ptr);  // This might work
null_ptr[0] = 'A';                     // This will crash the program
1
2
3
4
5
6

In Google's case, when their service control binary encountered blank fields in the policy data, it failed to check if a pointer was null before attempting to use it. This caused the binary to crash, and because this happened globally across all regions simultaneously, it led to a widespread outage.

Why Wasn't This Caught Earlier?

There were several factors that contributed to this issue not being caught during testing:

  • The code path containing the bug was never exercised during testing because it required a specific policy change to trigger it
  • The service control changes were not protected by feature flags, which would have allowed for gradual rollout and earlier detection
  • Feature flags are typically used to gradually enable features region by region, starting with internal projects to catch issues before they affect customers
  • Had this been feature-flagged, the issue would likely have been caught in staging environments

Lessons for Developers and System Architects

This incident provides several valuable lessons for developers and system architects:

  • Always check for null pointers before dereferencing them, even in seemingly safe code paths
  • Implement comprehensive error handling for all possible edge cases
  • Use feature flags for all significant changes, even for internal systems
  • Implement gradual rollouts with proper testing at each stage
  • Design systems with failure isolation to prevent cascading failures
  • Include emergency kill switches (like Google's "red button") for critical components
  • Test not just the happy path but also error conditions and edge cases

The Broader Implications

This incident highlights the fragility of our interconnected digital infrastructure. As more services rely on cloud providers like Google, AWS, and Cloudflare, these providers become critical points of failure that can affect vast portions of the internet.

It's particularly notable that such a simple programming error - one that's taught in introductory programming courses - could have such far-reaching consequences. This underscores the importance of robust error handling, comprehensive testing, and careful deployment practices, especially for infrastructure that supports critical services.

Conclusion

The June 2025 Google outage serves as a powerful reminder of how even the most sophisticated technology companies can be brought down by basic programming errors. A simple null pointer dereference, combined with deployment practices that didn't catch the issue early, resulted in one of the most significant internet outages in recent years.

For developers and system architects, this incident reinforces the importance of fundamentals: proper error handling, comprehensive testing, gradual rollouts, and designing systems that can fail gracefully. As our digital infrastructure becomes increasingly interconnected, the stakes for getting these fundamentals right only continue to grow.

Let's Watch!

How a Simple Null Pointer Crashed Google and Took Down the Internet

Ready to enhance your neural network?

Access our quantum knowledge cores and upgrade your programming abilities.

Initialize Training Sequence
L
LogicLoop

High-quality programming content and resources for developers of all skill levels. Our platform offers comprehensive tutorials, practical code examples, and interactive learning paths designed to help you master modern development concepts.

© 2025 LogicLoop. All rights reserved.