CrowdStrike Blames Single Undetected Error In Falcon Sensor For Global Outage
Cybersecurity company CrowdStrike has released a Root Cause Analysis (RCA) report that blames an undetected error in its Falcon sensor software for the Windows system crash in July.
In the detailed 12-page report, the company confirmed that an undetected error written into an update for its Falcon software caused the outage, which affected about 8.5 million Windows devices globally, ABC News reported.
The global IT outage, probably the biggest in history, affected businesses, airports and personal users. In Australia, the loss was estimated to be more than AU$1 billion.
As CrowdStrike announced it would take preventive steps, analysts called it an "embarrassing" mistake that first-year programming students were taught how to avoid.
"When they wrote this report, they must have been feeling very embarrassed. First-year programming students are taught about the 'stack', the series of instructions that need to be executed in a CPU (central processing unit)," Sigi Goode, a professor of information systems at the Australian National University, said.
Falcon software is installed to check for potential threats and to block them once detected.
"Falcon is looking at a range of sensors — a range of indicators — to see if something is wrong," Goode said.
In what the report calls the Channel File 291 incident, the company introduced a new capability in Falcon's sensors and sent the update to certain Windows hosts. While checking for dubious activity, sensors change the location or the number of sensors.
On July 19, when the update was sent, Falcon expected 20 input fields, but the update supplied 21. The crash occurred due to this "count mismatch," the report stated.
As Falcon is integrated into Windows, the error caused the entire system to crash, affecting users worldwide.
"The Content Interpreter expected only 20 values," the RCA report said. "Therefore, the attempt to access the 21st value produced an out-of-bounds memory read beyond the end of the input data array and resulted in a system crash."
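The mismatch the report describes can be illustrated with a minimal sketch. This is not CrowdStrike's code: the names `counts_match` and `safe_read` are hypothetical, and Python is used only for readability. Python itself raises an error on an out-of-range index, whereas lower-level code with no such check can read past the end of the array, which is the failure the report describes.

```python
# Illustrative sketch only; the function names are hypothetical.

def counts_match(defined_field_count, values):
    """Check that the number of fields a template defines equals
    the number of input values actually supplied."""
    return defined_field_count == len(values)


def safe_read(values, index):
    """Return values[index], or None if index is outside the array.

    The explicit bounds check stands in for the check that was
    missing when the 21st value was accessed."""
    if 0 <= index < len(values):
        return values[index]
    return None


supplied = ["value"] * 20            # 20 input values are available
print(counts_match(21, supplied))    # 21 fields defined vs 20 supplied
print(safe_read(supplied, 20))       # attempt to read the 21st value
```

With either check in place, the mismatch is reported as a handled condition and never becomes a read beyond the end of the input data.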
Toby Murray, associate professor at the University of Melbourne's School of Computing and Information Systems, pointed out that basic checks by a human developer would have detected the error.
"That is an incredibly basic and fundamental mismatch that was always going to lead to catastrophic problems, sooner or later," he said. "The fact that the CrowdStrike developers were able to have this obvious inconsistency between the data file format and the software code means that the most basic forms of quality review and assurance were not being correctly carried out."
Goode said the update should have been carried out through a staged deployment.
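A staged deployment of the kind Goode describes can be sketched roughly as follows. The function `staged_rollout`, its parameters and the stage percentages are assumptions made for illustration, not a description of CrowdStrike's release process.

```python
# Illustrative sketch only; names and stage sizes are hypothetical.

def staged_rollout(hosts, deploy, is_healthy, stage_percents=(1, 10, 100)):
    """Deploy to progressively larger shares of hosts.

    stage_percents must be ascending. After each stage, every host
    updated in that stage is health-checked; the rollout halts at the
    first stage containing an unhealthy host.
    Returns (updated_hosts, completed)."""
    done = 0
    for percent in stage_percents:
        target = len(hosts) * percent // 100
        batch = hosts[done:target]
        for host in batch:
            deploy(host)
        if not all(is_healthy(host) for host in batch):
            return hosts[:target], False  # halt; remaining hosts untouched
        done = target
    return hosts[:done], True


hosts = list(range(100))
updated = []
result, completed = staged_rollout(hosts, updated.append, lambda h: False)
print(len(result), completed)  # a faulty update stops after the first stage
```

In this sketch a faulty update reaches only the first, smallest group of hosts before the rollout stops.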
However, CrowdStrike's quality assurance team has said that it follows extensive processes, including automated and manual testing, validation and rollout steps.
The RCA report comes as U.S. carrier Delta Air Lines seeks damages from CrowdStrike and Microsoft over an estimated revenue loss of $500 million (AU$760 million). The two IT companies said the airline had refused on-site assistance, indicating that the airline's problems might have started long before the global outage, The Hacker News reported.
Australian Industry Group CEO Innes Willox put the loss at billions of dollars, adding that it was not clear how the affected businesses could claim compensation from CrowdStrike.
Apologizing for the global meltdown, CrowdStrike CEO George Kurtz had said, "We are using the lessons learned from this incident to better serve our customers. To this end, we have already taken decisive steps to help prevent this situation from repeating, and to help ensure that we — and you — become even more resilient."
© Copyright 2024 IBTimes AU. All rights reserved.