Global internet infrastructure experienced a notable surge in outages during the week of October 21-27, 2024. According to ThousandEyes, the total number of global outages rose significantly for the first time in four weeks. The report recorded 181 network outages, a 17% increase from 155 the prior week. U.S.-specific outages also climbed, rising 10% from 63 to 69. This article dissects the weekly trends across ISP, public cloud, and collaboration app networks, and puts them in context with notable incidents involving Rackspace Technology and Cogent Communications.
Trends in Global and U.S. Outages: A Mixed Performance Across Categories
The week of October 21-27 saw a 13% rise in ISP outages globally, from 95 to 107. The trend extended to the U.S., where ISP outages grew 19%, from 37 to 44. The parallel growth in ISP-related disruptions, globally and in the U.S., underscores the challenges providers face in maintaining stable connections amid growing internet traffic demands. The increase is concerning given ISPs’ foundational role in providing connectivity to millions of users worldwide.
Public cloud network outages also rose steeply. Globally, they jumped from 19 to 26, a 37% increase, while in the U.S. they nearly tripled, from four to 11. This surge points to growing pressure on cloud service providers as enterprises increasingly migrate to cloud-based solutions. Collaboration app network outages increased globally as well, from one to six. In the U.S., however, collaboration app network outages remained at zero, a rare point of stability amid the otherwise rising trend.
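The week-over-week percentages above follow directly from the reported counts. As a minimal sketch, the Python snippet below recomputes each change from the figures quoted in this article. The function name `pct_change` and the table layout are illustrative choices, not part of the ThousandEyes report, and results are printed to one decimal place, so they can differ slightly from the rounded figures in the text.

```python
from typing import Optional

def pct_change(previous: int, current: int) -> Optional[float]:
    """Week-over-week change in percent; None when the prior count is zero."""
    if previous == 0:
        return None  # a percentage change from zero is undefined
    return (current - previous) / previous * 100

# (prior week, current week) counts as quoted in the article.
REPORTED = {
    "All outages (global)": (155, 181),
    "All outages (U.S.)": (63, 69),
    "ISP (global)": (95, 107),
    "ISP (U.S.)": (37, 44),
    "Public cloud (global)": (19, 26),
    "Public cloud (U.S.)": (4, 11),
    "Collaboration apps (global)": (1, 6),
    "Collaboration apps (U.S.)": (0, 0),
}

for label, (prev, cur) in REPORTED.items():
    change = pct_change(prev, cur)
    shown = "n/a" if change is None else f"{change:+.1f}%"
    print(f"{label:<30} {prev:>4} -> {cur:<4} {shown}")
```

The zero-to-zero U.S. collaboration row is reported as "n/a" rather than 0% because the calculation divides by the prior week's count.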
This surge in outages across multiple network categories highlights the fragility of global internet infrastructure, particularly under the strain of increasing digital workloads. The varying performance across regions and categories suggests that some networks may lack the redundancy and scalability needed to prevent disruptions under peak usage.
Spotlight on Rackspace Technology: A 53-Minute Global Disruption
One of the most significant outages of the week involved Rackspace Technology, a U.S.-based managed cloud computing provider. The disruption, which lasted 53 minutes on October 21, affected customers and downstream partners across regions including the U.S., Europe, and Asia. Initially centered on nodes in Dallas, TX, the outage quickly cascaded to other regions, exposing the interconnectedness and vulnerabilities of modern cloud infrastructures.
Rackspace’s reliance on a centralized node structure may have contributed to the scale of the impact. Within 40 minutes, some of the affected nodes in Dallas cleared, coinciding with a reduction in impacted regions. However, the length of the outage raises questions about the effectiveness of Rackspace’s incident response mechanisms. With the company catering to diverse industries, from financial services to healthcare, the outage likely disrupted critical operations, underscoring the high stakes involved in ensuring cloud reliability.
This incident illustrates the growing need for cloud providers to implement distributed and decentralized network architectures. By doing so, they can limit the ripple effects of localized disruptions, minimizing their impact on global operations.
Cogent Communications: A Multinational Ripple Effect
On October 26, Cogent Communications experienced an outage that demonstrated the complexity of managing a multinational transit network. The incident lasted a total of 23 minutes, split into two occurrences over a 30-minute span. Initially concentrated in Washington, D.C., Raleigh, NC, and Bilbao, Spain, the outage quickly spread to nodes in New York, Los Angeles, Dallas, and other major cities.
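The difference between 23 minutes of outage time and a 30-minute span is easy to blur, so a short sketch may help. The snippet below totals the downtime across outage windows and separately measures the span from first onset to final recovery. The window boundaries are hypothetical: the report gives only the 23-minute total and the 30-minute span, not how the two occurrences were split, and `summarize` is an illustrative helper rather than any vendor's method.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (start, end) as minutes from the first onset

def summarize(windows: List[Window]) -> Tuple[int, int]:
    """Return (total outage minutes, overall span in minutes)."""
    ordered = sorted(windows)
    merged = [list(ordered[0])]
    for start, end in ordered[1:]:
        if start <= merged[-1][1]:
            # Overlapping or touching windows count as one occurrence.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    total = sum(end - start for start, end in merged)
    span = merged[-1][1] - merged[0][0]
    return total, span

# HYPOTHETICAL split chosen only to match the reported totals:
# a 14-minute occurrence, a 7-minute recovery, then a 9-minute occurrence.
cogent_windows = [(0, 14), (21, 30)]

total, span = summarize(cogent_windows)
print(f"outage time: {total} min over a {span}-minute span")
```

Merging overlapping windows before summing keeps the total from double-counting minutes when two occurrences are recorded as overlapping.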
The cascading nature of this outage highlights the challenge of maintaining stability in transit networks that span continents. Cogent’s failure to contain the disruption within a single region underscores the vulnerabilities inherent in large-scale internet transit systems. As the outage evolved, the number of impacted regions, customers, and downstream partners grew, demonstrating the extensive reach of even brief disruptions in Tier 1 networks.
The incident also raises concerns about the resilience of global transit providers. With the internet’s backbone dependent on a handful of Tier 1 carriers like Cogent, any disruption in their networks can have far-reaching consequences, affecting not only end-users but also the operations of other network providers.
Lessons Learned and the Path Forward
The network outages reported during the week of October 21-27 reflect a broader trend of increasing disruptions as internet traffic and reliance on digital infrastructure grow. ISPs, cloud providers, and transit networks must prioritize investments in redundancy, scalability, and automation to prevent such incidents from escalating in the future.
For ISPs, the 13% global rise in outages underscores the need for enhanced infrastructure maintenance and proactive monitoring systems. Public cloud providers, grappling with a 37% global increase in outages, must adopt distributed cloud architectures to mitigate the impact of localized failures. Collaboration app providers, though stable in the U.S., need to address the vulnerabilities that took global outages from one to six in a single week.
Looking forward, network providers can draw valuable lessons from incidents like those involving Rackspace Technology and Cogent Communications. These cases highlight the importance of decentralized node structures, real-time monitoring, and rapid incident response capabilities. Furthermore, collaboration across ISPs, cloud providers, and transit networks will be essential to building a more resilient global internet.
In conclusion, the ThousandEyes report for October 21-27 provides a sobering look at the challenges facing global internet infrastructure. By analyzing trends and learning from notable outages, stakeholders across the internet ecosystem can work toward creating a more robust and reliable digital foundation for the future.