CrowdStrike has blamed a bug in its test software for Friday’s outage, which caused chaos across the world, and the company’s CEO has now been called to testify before the US Congress. Following the news, David Ferbrache OBE, Managing Director at Beyond Blue, gives his views on what we can learn from the incident.
“Until last Friday, few outside of the security and technology industries had heard of CrowdStrike; now the company has been catapulted into the conversations of consumers and business leaders alike. How could a company hired to protect the digital world bring down over 8.5 million machines with a single action?
“While shocking to many, it’s not entirely surprising to some. Just as ubiquitous technology providers, such as Microsoft, have the power to cause mass outages when things go wrong, heavily concentrated security vendors, such as CrowdStrike, can do the same. Widely used security software has the potential to cause resilience issues or even open new attack surfaces. As a result, these vendors have a duty to robustly test updates before they are rolled out, and they must prioritise the security of their own products and code.”
Ferbrache continues: “But the incident also stems in part from the risks of rapid automated updates in live production environments, and from the need for organisations to control how those updates are applied: weighing the risk of a deferred update (potentially leaving security issues open, but allowing additional testing) against the risk of immediate application. This is a fine balance, and sophisticated customers need to be able to strike it.
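To make that trade-off concrete, the following is a minimal, hypothetical sketch of the kind of staged-rollout policy Ferbrache describes: a small canary group takes updates immediately, while broader rings defer them for a soak period. The ring names, soak periods and timestamps are illustrative assumptions, not a description of CrowdStrike’s actual update mechanism.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Ring:
    """A group of hosts that waits a set soak period before taking a vendor update."""
    name: str
    soak_period: timedelta

# Illustrative rings only: real fleets would tune these to their own risk appetite.
RINGS = [
    Ring("canary", timedelta(hours=0)),       # small test population, updates immediately
    Ring("early", timedelta(hours=24)),       # broader pilot group after a day of soak time
    Ring("production", timedelta(hours=72)),  # main fleet waits the longest
]

def update_allowed(ring: Ring, released_at: datetime, now: datetime) -> bool:
    """Return True once the update has soaked long enough for this ring."""
    return now - released_at >= ring.soak_period

if __name__ == "__main__":
    released = datetime(2024, 7, 19, 4, 9)          # illustrative release timestamp
    check_time = released + timedelta(hours=30)     # 30 hours after release
    for ring in RINGS:
        decision = "apply" if update_allowed(ring, released, check_time) else "defer"
        print(f"{ring.name:<10} -> {decision}")
```

The point of the sketch is simply that deferral is a policy decision the customer can own: the longer a ring waits, the more testing time it gains, at the cost of running without the latest protection.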
“While the response from Microsoft and CrowdStrike was commendable, with both communicating quickly and issuing a fix and an apology, the incident has raised several important issues that lawmakers will undoubtedly want to question George Kurtz on when he testifies before Congress. Insurance claims and legal debates around liability are mounting, and the incident demonstrated the real damage that can be caused when something goes wrong in the interconnected digital world. Governments will be keen to work with the security industry to understand how such situations can be avoided in the future.
“Organisations are also now dissecting the incident to understand the extent of the damage caused and how well they managed the issue. Key learnings for many will be the importance of resilience and business continuity, as well as understanding the nature of their dependencies on third parties that were impacted.
“We have seen how complex it is to restore critical infrastructure, especially when cyber controls such as BitLocker and local administrative passwords are involved. A significant amount of manual intervention has been required and, in some cases, significant engineering effort. Impacted systems have needed to be rebooted into safe mode to apply the necessary configuration updates, which has been straightforward in some cases but harder where systems are physically remote and require hands-on intervention.
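To illustrate the manual step described here, below is a minimal Python sketch of the widely publicised workaround: booting the affected machine into safe mode (or the Windows Recovery Environment) and deleting the faulty channel file from the Falcon sensor’s driver directory. The path and filename pattern follow CrowdStrike’s published remediation guidance; the script itself is an illustrative assumption, would need administrator rights, and on BitLocker-protected devices the recovery key is needed before the disk can be reached at all.

```python
import glob
import os

# Standard install location of the Falcon sensor's channel files on Windows.
CHANNEL_DIR = r"C:\Windows\System32\drivers\CrowdStrike"

def remove_faulty_channel_files(directory: str = CHANNEL_DIR) -> list[str]:
    """Delete channel files matching the pattern named in CrowdStrike's
    published remediation guidance and return the paths that were removed."""
    removed = []
    for path in glob.glob(os.path.join(directory, "C-00000291*.sys")):
        os.remove(path)
        removed.append(path)
    return removed

if __name__ == "__main__":
    deleted = remove_faulty_channel_files()
    if deleted:
        print("Removed:", *deleted, sep="\n  ")
    else:
        print("No matching channel files found - nothing to do.")
```

Even a script this small has to be applied machine by machine from safe mode, which is why the recovery effort scaled so poorly across large, geographically dispersed fleets.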
“This raises further questions around resilience and reinforces the importance of organisations having disaster recovery plans in place to prepare for incidents like these. It also reminds us of the importance of managing a broader community response.”