Microsoft “deeply” apologizes for worldwide Azure, Teams outage

Microsoft apologized Tuesday for a global outage that affected Azure cloud services, including Microsoft Teams, Office 365 and Dynamics 365.

“We understand how incredibly shocking and unacceptable this is, and we deeply apologize,” Microsoft said in a post-incident review report on the outage, which was the result of “authentication errors” in several Microsoft cloud services. “We are continually taking steps to improve our Microsoft Azure platform and processes to ensure that such incidents do not occur in the future.”

Microsoft referred to changes made after a September 28, 2020 outage that affected Microsoft 365 users for five hours.

“In the September incident, we indicated our plans to apply additional protections to the Azure AD (Active Directory) backend Safe Deployment Process (SDP) system to prevent the class of issues identified here.”

Microsoft said the first phase of the SDP changes is complete, and the second phase is in “very careful development” that is due to finish mid-year.

“The initial analysis indicates that, once fully implemented, it will prevent the type of outage that occurred today, as well as the related incident in September 2020,” Microsoft said. “In the meantime, additional safeguards have been added to our key removal process, which will remain until the completion of the second phase of SDP implementation.”

Microsoft said Tuesday morning that “most services” affected by the global outage of Azure and Teams have returned online, with the exception of Intune and Microsoft Managed Desktop.

The most recent update on the outage came in a tweet at 6:34 a.m. from the Microsoft 365 Status account.

Microsoft’s apology came after a global outage that affected the Teams collaboration application, as well as “multiple services” across Azure, Office 365 and Dynamics 365.

The problems, which Microsoft disclosed on Twitter starting at 3:40 p.m. Monday, could affect any user “around the world,” the company said at the time.

Even with the outage, some industry executives are urging MSPs to move customers to the cloud faster, following the March 2 attack on Exchange Server by Chinese state-sponsored hackers.

That attack affected only on-premises versions of Exchange Server, not Exchange Online or the cloud-based Office 365 email service. About 30,000 U.S. organizations and 60,000 organizations around the world had emails stolen as a result of the breach because they were still running on-premises versions of Exchange.

Last week, Microsoft alerted customers to DearCry ransomware attacks that followed the attack on on-premises Exchange servers. On March 12, it warned that “human-operated ransomware attacks use Microsoft Exchange vulnerabilities to exploit customers.”

Emmet Tydings, president of Columbia, Md.-based AB&T Telecom, which provides internet data, voice and failover stability for MSPs, said it is essential for partners to move customers to the cloud to avoid serious security issues such as those that came with the Chinese attack on on-premises Exchange servers.

“MSPs need to move their customers to the cloud faster, and they also need to stabilize their communications infrastructure with diversity in their circuits and failover,” Tydings said. “Microsoft has emphasized that they are better able to provide security in the cloud than by sharing services locally.”

Tydings said partners need to provide robust internet connectivity with SD-WAN and wireless failover, using carrier plans via a SIM module and a cable backup to a primary fiber line.

In the event of an outage such as the one that hit Microsoft Teams, MSPs should turn to alternative communications platforms such as Zoom or Cisco Webex, he said.

Given that the global pandemic is leading to a more distributed workforce, on-premises Exchange no longer makes sense for customers, according to Tydings.

“The MSPs we work with have been heroes in converting their customers from on-prem to cloud since the pandemic hit,” he said.

Rapid migration to the cloud has led companies to invest in faster software development but not in improving the resilience of cloud services, said Ofer Smadari, co-founder and CEO of Portland, Ore.-based StackPulse, whose reliability platform helps teams detect, respond to and resolve incidents with code-based automation.

“We see the results in the headlines every week, it seems, because the big brands have site interruptions,” Smadari said. “Most companies still use traditional IT tools, such as ticketing systems, service management tools or communication applications, to share information and collaborate to restore service. Companies need to move from an IT management mindset to an engineering mindset in which they build resilience in their business applications and operations to adopt a more risk-conscious approach. Only then can they recover quickly from interruptions and fulfill their promise to their customers.”
