May 7 (Reuters) – Amazon’s cloud services were largely back online on Friday after overheating at one of its data centers triggered an outage that impacted companies including cryptocurrency exchange Coinbase.
The cloud giant said it was making progress in resolving the issue after a rapid spike in temperatures at a single data center in northern Virginia on Thursday knocked out power. A full recovery would take several hours, it said. Coinbase said its services had been restored after the outage disrupted them.
Overheating in data centers has been a key problem for companies: advanced AI and cloud servers crunching data require massive amounts of power and give off intense heat. To regulate the heat, data center operators have been increasingly turning to water or specialized coolants, which are thousands of times more efficient than traditional air cooling.
Thursday’s outage was the second major overheating-driven disruption in recent months, after derivatives marketplace CME Group suffered one of its longest outages in years last November, due to a cooling failure at data centers run by CyrusOne.
At 8:12 a.m. ET, outage reports for AWS on outage tracking website Downdetector had gone down to 72, from a peak of nearly 600 on Thursday night.
AWS has been bringing additional cooling system capacity online but said it was taking longer than expected to add the capacity required to safely restore all remaining affected systems.
The cloud computing platform also said it had shifted traffic away from the impacted Availability Zone for most services. An “Availability Zone” comprises one or more connected physical data centers and is designed to operate independently within an AWS Region.
The trading platform of CME Group, the world’s largest derivatives marketplace, had also faced some technical issues earlier, but was back online after it completed essential maintenance work. The AWS outage had no impact on CME Group, the company said.
AWS saw a major outage last October that caused global turmoil among thousands of sites, including some of the most popular apps like Snapchat and Reddit.
That was the largest internet disruption since the CrowdStrike malfunction in 2024 hobbled technology systems in hospitals, banks and airports, highlighting the vulnerability of the world’s interconnected technologies.
(Reporting by Mrinmay Dey in Mexico City, Deborah Sophia, Rhea Rose Abraham and Shivani Tanna in Bengaluru; Additional reporting by Rishabh Jaiswal; Writing by Shubham Kalia; Editing by Sumana Nandy, Kim Coghill and Devika Syamnath)




