Breaking News - Comment on CrowdStrike IT Outage - Richard Ford, Integrity360

July 2024 by Richard Ford, CTO at Integrity360

The comment below, in response to the CrowdStrike outage, is sourced from Richard Ford, CTO at Integrity360.

"Crowdstrike has had a catastrophic error that has taken a large percentage of the global IT systems offline. On the one hand it’s shown how large Crowdstrike’s market share is, but it’s also shown how fragile the interconnected world we live in can be. This issue has grounded airlines, halted broadcasters and taken channels offline, and, at the most critical end, severely impacted emergency services. In this instance a small change has led to a huge global impact, and the questions will be how and why it happened. Crowdstrike were very bullish in their mission statement: "We Stop Breaches". Unfortunately, this time, they’ve created the outage.

The CrowdStrike ecosystem revolves around a single agent deployment to deliver their portfolio of security solutions, which operates permanently online, connected to their SaaS-based management platform. In a world where threats are constantly evolving, and we need to move quickly and often to counter them, this approach really works and has become the industry norm. Updates are delivered directly to the endpoint agents as they become available, ensuring systems have the real-time protection they need. The downside, and what has happened with CrowdStrike today, is that a bad update can have wide-ranging ramifications.

With this specific issue, CrowdStrike pushed, as it does routinely, what it refers to as a Channel File, which would likely include updates to their threat detection definitions, to all CrowdStrike agents. This file, when processed by the CrowdStrike agent running on a Windows device, causes the agent to crash dramatically, creating a Blue Screen of Death (BSOD) and restarting the machine. Unfortunately, the file is also processed by the agent during system boot, crashing the system again and creating a restart loop.

The fix is relatively trivial. Once the agent is online, it will simply download the fixed Channel File. The challenge is getting it online, and this is no small feat. As the system crashes and reboots before getting online, in many cases it requires manual intervention: users will need to enter a special administrative mode before the system boots and use the command line to search for and delete the file. As many users aren’t IT experts, or won’t be old enough to remember the days of MS-DOS, this will be entirely new to them, and a nightmare for IT teams to orchestrate.
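
For illustration, the manual workaround that was widely reported amounts to booting into Safe Mode or the Recovery Environment and deleting the offending channel files. A minimal sketch in Python, assuming the default CrowdStrike driver directory and the publicly circulated C-00000291*.sys file pattern (in practice this is done from a recovery command prompt, so treat it as pseudocode for the equivalent command-line steps):

# Sketch of the widely reported workaround, assuming the default CrowdStrike
# driver path and the publicly circulated channel file pattern.
from pathlib import Path

CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")  # default install path
FAULTY_PATTERN = "C-00000291*.sys"  # channel file implicated in the crash loop

def remove_faulty_channel_files() -> None:
    for channel_file in CROWDSTRIKE_DIR.glob(FAULTY_PATTERN):
        print(f"Deleting {channel_file}")
        channel_file.unlink()  # after reboot, the agent downloads the fixed file

if __name__ == "__main__":
    remove_faulty_channel_files()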

But, unfortunately, it gets worse. Best security practice is to have the data stored on the system encrypted at the hard drive level. This prevents data being directly extracted from the drive should it be stolen but, importantly, also protects the boot process. The knock-on effect is that, to access the administrative mode on systems with drive encryption (provided as part of Microsoft Windows), systems will need to be put into recovery mode and may require a recovery key, unique to that system, in order to implement the fix. The recovery process is going to be long and will really test IT teams and the resilience of organisations.
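
To see why encryption complicates the recovery, a minimal sketch, assuming Windows with BitLocker and the built-in manage-bde tool: an IT team can dump a machine's key protectors, including its 48-digit recovery password, before that machine ever drops into recovery mode.

# Sketch: list BitLocker key protectors (including the recovery password)
# for the system drive. Assumes Windows, BitLocker and administrator rights.
import subprocess

def dump_bitlocker_protectors(drive: str = "C:") -> str:
    result = subprocess.run(
        ["manage-bde", "-protectors", "-get", drive],
        capture_output=True, text=True, check=True,
    )
    return result.stdout  # includes the numerical recovery password, if present

if __name__ == "__main__":
    print(dump_bitlocker_protectors("C:"))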

Questions need to be asked about how this has happened. Is this the product of agile, CI/CD (Continuous Integration/Continuous Delivery) software development? If you’re introducing an update, even to an external file, has this not been thoroughly tested through a QA process? The widespread impact calls this into question. Or, if it has gone through the testing and QA process, has the file been subverted further along the process by a threat actor? There’s absolutely no evidence currently that this is the case, but we only need to look at the SolarWinds breach to see evidence of this happening in the past. These are all questions CrowdStrike will need to answer over the coming days and weeks.

Lastly, we should also think about the current approach to security, and the unification of technologies and vendors. The move to XDR, for example, is putting a lot of eggs in a single basket. Will the impact of this incident cause the more risk-averse organisations to distribute their security controls and risk across more vendors and to segment areas of the business? Potentially, but all will certainly be acutely aware of the trust we put into vendors and of our recovery plans. CrowdStrike haven’t been alone in this, with Microsoft only this week confirming a major outage for Microsoft 365 caused by a configuration change. A not-so-gentle reminder that Availability is just as important as Confidentiality and Integrity in the CIA security triad."

