Massive Microsoft 365 outage caused by faulty ECS deployment

By Sergiu Gatlan, Bleeping Computer

In a preliminary post-incident report, Microsoft has revealed that this week's 5-hour-long Microsoft 365 worldwide outage was triggered by a faulty Enterprise Configuration Service (ECS) deployment that led to cascading failures and availability impact across multiple regions.

ECS is an internal central configuration repository designed to enable Microsoft services to make wide-scope dynamic changes across multiple services and features, as well as targeted ones such as specific configurations per tenant or user.
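To make the distinction between wide-scope and targeted changes concrete, here is a minimal sketch of a layered configuration lookup. It is not Microsoft's implementation: the `ConfigStore` class, its fields, and the `feature_x` setting are hypothetical names invented for illustration, and the "most specific scope wins" rule is an assumption about how such a store might resolve values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ConfigStore:
    """Hypothetical layered store: global defaults plus tenant and user overrides."""

    global_settings: Dict[str, Any] = field(default_factory=dict)
    tenant_overrides: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    user_overrides: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def resolve(self, key: str, tenant: Optional[str] = None,
                user: Optional[str] = None) -> Any:
        # Assumed rule: the most specific scope that defines the key wins.
        if user is not None and key in self.user_overrides.get(user, {}):
            return self.user_overrides[user][key]
        if tenant is not None and key in self.tenant_overrides.get(tenant, {}):
            return self.tenant_overrides[tenant][key]
        # Fall back to the wide-scope value shared by every consumer.
        return self.global_settings.get(key)


store = ConfigStore(
    global_settings={"feature_x": False},
    tenant_overrides={"contoso": {"feature_x": True}},
    user_overrides={"alice": {"feature_x": False}},
)
```

In a design like this, a bad value written to the global layer reaches every service that reads it, which is one way a single deployment can have the broad reach described in the report.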

What initially started as a minor Microsoft Teams outage ended up expanding downstream to multiple Microsoft 365 services with Teams integration that also leverage ECS, including Exchange Online, Windows 365, and Office Online.

As a result, users worldwide began reporting that they could not use Microsoft Teams and multiple Microsoft 365 services or features.

"This issue affected the users' ability to connect to the Microsoft Teams Desktop, Web and Mobile clients," the company explained in its preliminary report.

"Telemetry indicated that approximately 300k calls were impacted by this event. The Asia Pacific (APAC) region was most affected due to business hours coinciding with the impact window. Additionally, Direct Routing and Skype MFA were mostly impacted service."

According to Redmond's report, the incident started on Thursday, July 21, at 1:05 AM UTC, with the company's engineers remediating most of its impact within five hours, by 6:00 AM UTC.

However, there was also some isolated residual impact until 1:14 PM UTC the same day, matching customer reports on social media.
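The durations implied by these timestamps can be checked with a few lines of Python. The times come from the report as quoted here; the year is not stated in the article, so the `YEAR` constant is an assumption (2022 is used because July 21 falls on a Thursday that year).

```python
from datetime import datetime, timedelta, timezone

YEAR = 2022  # assumption: the article gives only "Thursday, July 21"

start = datetime(YEAR, 7, 21, 1, 5, tzinfo=timezone.utc)          # incident start
mitigated = datetime(YEAR, 7, 21, 6, 0, tzinfo=timezone.utc)      # most impact remediated
residual_end = datetime(YEAR, 7, 21, 13, 14, tzinfo=timezone.utc)  # residual impact ends

main_window = mitigated - start
total_window = residual_end - start


def fmt(delta: timedelta) -> str:
    """Format a timedelta as hours and minutes."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60:02d}m"


print(fmt(main_window))   # 4h 55m
print(fmt(total_window))  # 12h 09m
```

The main impact window is just under five hours, consistent with the headline figure, while the residual impact extends the total to a little over twelve hours.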

In the end, the incident affected users attempting to use one or more of the following Microsoft 365 services and features:

  • Exchange Online (Delays sending mail)
  • Microsoft 365 admin center (Inability to access)
  • Microsoft Word within multiple services (Inability to load)
  • Microsoft Forms (Inability to use via Teams)
  • Microsoft Graph API (Any service relying on this API may have been affected)
  • Office Online (Microsoft Word access issues)
  • SharePoint Online (Microsoft Word access issues)
  • Project Online (Inability to access)
  • PowerPlatform and PowerAutomate (Inability to create an environment with a database)
  • Autopatches within Microsoft Managed Desktop
  • Yammer (Impact to Yammer flighting)
  • Windows 365 (Unable to provision Cloud PCs)

