Learning from Crowdstrike

On Friday evening, I caught up with a fellow cybersecurity professional over dinner. Luckily, neither of us had to deal with the meltdown Crowdstrike had caused. But it caused much grief and many frenetic hours for IT administrators around the globe, who had to manually apply workarounds to recover affected Windows servers and clients.

As the largest cybersecurity outage in history, the Crowdstrike incident has, unsurprisingly, continued to hog the news and will likely keep doing so. But all of us on Team Cyber know the most important order of business is to learn from the debacle and understand how we can do better. The real enemies are the black hats who seek to wreak havoc on our systems.

Why Agent-Based Defence?

I remember reading through the virus definitions of Norton Antivirus when I was younger. One of the more interesting types of virus then was the polymorphic virus, which changes its signature to avoid detection. But file-based viruses, as long as they share similar routines, can still be detected and quarantined. Nonetheless, the basic idea of obfuscation persists today.

Surprisingly, even simple-sounding obfuscation techniques, like implementing a Caesar Cipher routine, worked three years ago and could still work today (PEN-300 opened Pandora’s box for many of us building payloads to evade AV engines).

Extracted from slides on antivirus evasion from a talk I gave at both Vulncon 2021 and Mystikcon 2021. Briefly, the idea is to show how even simple evasion techniques can reduce detection rates. Mystikcon slides available here but be aware 2021 techniques may no longer be as effective. You don’t need a paid copy of Cobalt Strike to do C2 work anonymously either; a free C2 such as Sliver will also do the trick.
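To see why a Caesar-style shift is enough to defeat a purely signature-based check, here is a minimal sketch on a harmless byte string. The `naive_scan` function, the sample bytes and the signature are all hypothetical stand-ins; real AV engines are considerably more sophisticated than this.

```python
import hashlib


def caesar_shift(data: bytes, shift: int) -> bytes:
    """Shift every byte by `shift` positions, wrapping around at 256."""
    return bytes((b + shift) % 256 for b in data)


# Harmless stand-in for a file that a scanner already "knows" about.
SAMPLE = b"HARMLESS-DEMO-PAYLOAD"
SIGNATURE = b"DEMO-PAYLOAD"  # hypothetical byte signature
KNOWN_HASH = hashlib.sha256(SAMPLE).hexdigest()  # hypothetical hash blocklist entry


def naive_scan(blob: bytes) -> bool:
    """Return True if the blob matches the byte signature or the known hash."""
    return SIGNATURE in blob or hashlib.sha256(blob).hexdigest() == KNOWN_HASH


encoded = caesar_shift(SAMPLE, 3)

print(naive_scan(SAMPLE))                    # the original bytes are caught
print(naive_scan(encoded))                   # the shifted bytes are not
print(caesar_shift(encoded, -3) == SAMPLE)   # yet the content is fully recoverable
```

The shifted bytes share neither the signature nor the hash, which is precisely why defenders moved towards watching what a process does rather than what a file looks like.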

One other means of running malware became extremely popular: reflection via Powershell. Typically, this involves some Powershell script being run (e.g. disguised as a system start-up script or an image, or called by a VB script in an Office macro). The Powershell script in question does not contain the original malware; instead, it fetches the malicious payload from a remote server and runs it. How would an antivirus catch this?

Enter “behavioural monitoring”.

To decide whether a process is malicious, it helps to understand what constitutes “good” traffic and “bad” traffic, especially when there are no signatures to catch. To do that, a capability called endpoint detection & response (EDR) was introduced.

Typically, EDRs are installed on workstations and servers. In fact, if you use a Windows machine that runs Windows Defender, you are already using an EDR. EDRs work by hooking onto various Windows APIs and judging, from the activities being performed, whether they are seeing “good” traffic or “bad” traffic. Malicious activity typically gets caught when it abuses APIs that “good” traffic does not normally use, or when it performs actions inconsistent with the normal, baseline “good” traffic.
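The baseline idea can be sketched in a few lines. This is a toy illustration only: the `BASELINE` table, the `SUSPICIOUS` set and the `score_events` function are hypothetical, and the API names are simply well-known Windows calls chosen for the example. It is not how any particular EDR product works.

```python
# Hypothetical baseline: API calls observed from a process during normal use.
BASELINE = {
    "winword.exe": {"CreateFileW", "ReadFile", "WriteFile", "RegOpenKeyExW"},
}

# Calls often associated with process injection; the list is illustrative.
SUSPICIOUS = {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"}


def score_events(process: str, api_calls: list[str]) -> dict:
    """Compare observed API calls against the baseline for a process."""
    baseline = BASELINE.get(process, set())
    unusual = sorted(set(api_calls) - baseline)    # never seen in "good" traffic
    flagged = sorted(set(unusual) & SUSPICIOUS)    # unusual and high-risk
    return {"unusual": unusual, "flagged": flagged, "alert": bool(flagged)}


normal = score_events("winword.exe", ["CreateFileW", "ReadFile"])
odd = score_events(
    "winword.exe",
    ["CreateFileW", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
)
print(normal["alert"], odd["alert"])
```

A word processor that suddenly allocates and writes memory in another process is inconsistent with its baseline, so the second call raises an alert even though no file signature was ever involved.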

On Hooking

Many EDR solutions hook in “user-land” (Ring 3). This means they have the same privileges as you and me: typical users on our workstations or servers performing normal tasks. If a crash happens in user-land, there are usually error-handling techniques that prevent the entire system from going down, but a crash in kernel-land (Ring 0) will almost always lead to a kernel panic. In Windows, this means a BSOD. In fact, Apple decided, in 2021, that hooking beyond user-land was dangerous, and hence required “permissions” to hook in kernel-land.

The levels of trust. Ring 3 is typically “user-land” whereas Ring 0 is “kernel-land”. Most applications run in Ring 3, and anything that needs higher privileges is typically requested through APIs.
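The containment that user-land offers can be seen in a small experiment: a child process that dies abnormally takes only itself down, and the parent carries on. This is a minimal sketch; `run_crashing_child` is a hypothetical helper name, and the point is only that the operating system isolates the failure. Code running in Ring 0 has no such supervisor above it.

```python
import subprocess
import sys


def run_crashing_child() -> int:
    """Start a child Python process that aborts abnormally; return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


status = run_crashing_child()
print("child terminated abnormally:", status != 0)
print("parent is still running")
```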

Much of the Internet is now abuzz over whether Crowdstrike uses hooks in kernel-land. A number of EDR vendors do hook at Ring 0, but identifying the exact hooks they use is a cat-and-mouse game: EDR vendors routinely update their hooks and do not release them publicly, so that malware writers have a harder time bypassing EDR solutions. Usually, independent security researchers publish their findings (helpfully on Github) so we are aware of the hooks that might be used.

Kernel-land hooks are powerful. On one hand, they catch attacks such as attempted privilege escalation. Having direct access to the kernel means such EDRs can achieve administrative levels of control. On the other hand, a crash in kernel-land will likely cause a BSOD; there is not always a graceful way to recover from such errors.

The good old saying from Spider-Man, “With great power comes great responsibility”, holds true. When software bugs are triggered, cybersecurity vendors often receive much more backlash than others would, because many businesses view cybersecurity measures as “additional costs” rather than “risk reducers”. The high level of privilege that effective cybersecurity mitigations often require can also be turned against the host: it can crash the system, or it can harbour privilege escalation vulnerabilities that put the host at risk.

On Testing: No Model Answers

Security vendors like Crowdstrike typically have it harder than a typical application vendor. When a customer such as a business purchases a cybersecurity solution, it typically demands timely security updates to deal with “zero-day” threats, which are attacks on vulnerabilities that are not known, at least publicly. Perhaps the most infamous zero-day threat is Stuxnet, which set the Iranian nuclear programme back significantly.

As a security vendor that has received threat intelligence of zero-days being exploited in the wild, there are several questions you will need to answer:

  • What are the tactics, techniques and procedures (TTPs) used by the attackers?
  • How exploitable are the vulnerabilities that your customers may have?
  • Are your customers currently being targeted?
  • How can we fix the situation?

The battle here is between operational risk and cybersecurity risk. The longer your processes take in releasing a security update for your customers, the smaller the operational risk (you can verify your updates don’t crash the system), but the larger the cybersecurity risk (your customers may not obtain the protection they pay you for). In extreme scenarios, you may even get whistle-blown for being too slow!
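The trade-off can be made concrete with a toy model. Everything here is an assumption made for illustration: the exponential shapes, the parameter values (`p0`, `tau`, `rate`) and the equal weighting of the two risks are invented, not measured. The sketch only shows that when one risk falls with testing time and the other rises, the least-bad release delay sits somewhere in between.

```python
import math


def operational_risk(hours: float, p0: float = 0.2, tau: float = 6.0) -> float:
    """Assumed chance that an update breaks hosts; decays as testing time grows."""
    return p0 * math.exp(-hours / tau)


def cyber_risk(hours: float, rate: float = 0.01) -> float:
    """Assumed chance that a customer is hit before the update ships; grows with delay."""
    return 1.0 - math.exp(-rate * hours)


def total_risk(hours: float) -> float:
    """Sum of both risks, weighted equally (also an assumption)."""
    return operational_risk(hours) + cyber_risk(hours)


def best_delay(max_hours: int = 72) -> int:
    """Whole number of testing hours that minimises total risk on a simple grid."""
    return min(range(max_hours + 1), key=total_risk)


delay = best_delay()
print(f"least-bad delay under these assumptions: {delay} h")
print(f"total risk at 0 h: {total_risk(0):.3f}, at {delay} h: {total_risk(delay):.3f}")
```

Change the assumed parameters and the optimum moves, which is the point: the right answer depends on the operating environment, and no single release schedule is correct for everyone.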

In that regard, CISOs face a dilemma similar to that of the security vendors: balancing the speed at which a patch should be deployed against the extent to which it should be tested, except that CISOs base their decisions on their own operating environment. Sometimes, there may only be wrong answers.

An Important Lesson: Cyber Threat Intelligence

Now that the Crowdstrike incident has unfolded, the threat landscape will change. Here are some of the changes:

  • Threat actors, just by reading the news, can pinpoint companies that use Crowdstrike.
  • IT administrators, in search of quick fixes, could fall for phishing websites that purport to deliver Crowdstrike hotfixes.
  • IT administrators may disable Crowdstrike entirely without consulting the cybersecurity team to provide adequate compensating controls whilst some hosts run without an EDR.

To some extent, the CISOs’ battle continues even after the IT administrators’ battles end. CISOs will likely have to instruct their threat hunting teams and SOC analysts to identify, based on cyber threat intelligence, whether or not the organisation was compromised through such means.

Other social engineering attacks are possible too: threat actors may impersonate legal services and entice businesses to file legal claims against Crowdstrike for compensation.

An example of free threat intelligence one can obtain is a list of typosquat and look-alike domain names that are highly likely to be malicious. As there are likely many such domain names, and the fallout will likely take a long time to peter out, CISOs will likely fight a much longer, and arguably quieter, battle that receives far less media coverage.
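As a rough illustration of how a SOC team might triage domains seen in proxy or DNS logs against such a list, here is a minimal sketch using only the Python standard library. The `looks_like` function, the 0.8 threshold and the test domains (all under the reserved `.example` TLD) are assumptions for the example; a curated threat intelligence feed would be far more reliable than string similarity alone.

```python
import re
from difflib import SequenceMatcher

LEGITIMATE = "crowdstrike.com"


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] describing how alike two strings are."""
    return SequenceMatcher(None, a, b).ratio()


def looks_like(domain: str, legit: str = LEGITIMATE, threshold: float = 0.8) -> bool:
    """Flag domains that embed or closely imitate the brand but are not the real one."""
    domain = domain.lower().rstrip(".")
    if domain == legit or domain.endswith("." + legit):
        return False  # the genuine domain or one of its subdomains
    brand = legit.split(".")[0]
    tokens = re.split(r"[.-]", domain)  # break into labels and hyphenated parts
    return any(brand in t or similarity(t, brand) >= threshold for t in tokens)


# Hypothetical domains as they might appear in proxy or DNS logs.
for d in ["www.crowdstrike.com", "crowdstrike-hotfix.example",
          "crowdstr1ke.example", "intranet.example"]:
    print(d, looks_like(d))
```

Anything flagged this way is only a lead for an analyst to review, not a verdict.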

Another Important Lesson: A Narrower Risk Window

The crux of the Crowdstrike incident is that the cybersecurity business is never purely about cybersecurity risk. The pace of the threat landscape has narrowed the acceptable window between two competing demands. On one side is the speed at which security updates must be delivered to mitigate APT groups, which likely leverage zero-days and automate their attacks against a variety of targets (your firm could well become collateral damage in an APT group’s mass attack). On the other is the time a security company, at the very least, needs to test the rigour of its patches and hotfixes.

In plainspeak, CISOs and security vendors now work with tighter margins of acceptable risk. Until the post-mortem for Crowdstrike is released, we should not speculate on the exact cause. But cybersecurity practitioners must learn more about the operating environment they are in, and about the constraints within it that could push their organisations into accepting an unacceptable level of risk.
