CWSandbox - Behavior-based Malware Analysis

Short summary

Malicious software artifacts like viruses, worms and bots are currently among the largest threats to the security of the Internet. Upon discovery, such malware must be analyzed to determine the danger it poses. Because of the speed at which malware spreads and the large number of new malware samples that appear every day, malware analysis calls for automation. CWSandbox is an approach to automatic malware analysis based on behavior analysis: malware samples are executed for a finite time in a simulated environment, where all system calls are closely monitored. From these observations, CWSandbox automatically generates a detailed report which greatly simplifies the task of a malware analyst.

Motivation

Software artifacts that serve malicious purposes are usually termed malware. Particularly menacing is malware that spreads automatically over the network from machine to machine by exploiting known or unknown vulnerabilities. Such malware is not only a constant threat to the integrity of individual computers on the Internet. The combined power of many compromised machines is also a constant danger to uninfected sites, for example in the form of botnets that can bring down almost any server through distributed denial of service.

Malware is notoriously difficult to combat. Usually, security products such as virus scanners look for characteristic byte sequences (signatures) to identify malicious code. However, malware has become more and more adept at avoiding detection by changing its appearance, for example in the form of poly- or metamorphic worms. The rate at which new malware appears on the Internet is also still very high. Furthermore, flash worms pose a novel threat in that they stealthily perform reconnaissance for vulnerable machines for a long time without infecting them, and then all of a sudden pursue a strategic and coordinated spreading plan by infecting thousands of vulnerable machines within seconds.

In the face of such automated threats, security researchers cannot combat malicious software using traditional methods of decompilation and reverse engineering by hand.
Automated malware must be analyzed:

  1. Automatically
  2. Effectively
  3. Correctly

Automation means that the analysis tool should create a detailed analysis report of a malware sample quickly and without user intervention. A machine-readable report can in turn be used to initiate automated response procedures like updating signatures in an intrusion detection system, thus protecting networks from new malware samples on the fly. Effectiveness of a tool means that all relevant behavior of the malware should be logged; no executed functionality of the malware should be overlooked. This is important to realistically assess the threat posed by the malware sample. Finally, a tool should produce a correct analysis of the malware, i.e., every logged action should in fact have been initiated by the malware sample, so that no false claims are made about it.

CWSandbox is a tool for malware analysis that fulfills the three design criteria of automation, effectiveness and correctness for the Win32 family of operating systems:

  1. Automation is achieved by performing a dynamic analysis of the malware. This means that malware is analyzed by executing it within a simulated environment (sandbox), which works for any type of malware in almost all circumstances. A drawback of dynamic analysis is that it only analyzes a single execution of the malware. This is in contrast to static analysis, in which the source code is analyzed, making it possible to reason about all executions of the malware at once. Static analysis of malware, however, is rather difficult since the source code is commonly not available. Even if the source code were available, one could never be sure that the binary executable had not been modified in ways that the source does not document. Static analysis at the machine code level is often extremely cumbersome since malware often uses code-obfuscation techniques like compression, encryption or self-modification to evade decompilation and analysis.
  2. Effectiveness is achieved by using the technique of API hooking. API hooking means that calls to the Win32 application programming interface (API) are re-routed to the monitoring software before the actual API code is called, thereby creating insight into the sequence of system operations performed by the malware sample. API hooking ensures that every aspect of the malware behavior is monitored for which the corresponding API calls are hooked. System-level behavior (which at some point in time must use an API call) is therefore not overlooked, provided that the corresponding API call is hooked.

    API hooking can be bypassed by programs which directly call kernel code in order to avoid using the Windows API. However, this is rather uncommon in malware, as the malware author needs to know the target operating system, its service pack level and some other information in advance. Our empirical results show that most autonomously spreading malware is designed to attack a large user base and thus commonly uses the Windows API.

  3. Correctness of the tool is achieved through the technique of DLL code injection. Roughly speaking, DLL code injection allows API hooking to be implemented in a modular and reusable way, thereby raising confidence in the implementation and the correctness of the reported analysis results.
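
The idea behind API hooking (item 2 above) can be illustrated with a small, language-neutral sketch. It is not CWSandbox code and does not touch the Win32 API: the "API" is a plain Python dictionary of stand-in functions, and all names (make_api, install_hook, sample, run_demo) are hypothetical and chosen for illustration only. The sketch shows two points made above: a hooked call passes through the monitoring code before the real function runs, and a call that is not hooked leaves no trace in the log.

```python
import functools
import json


def make_api():
    """Stand-in 'API table': plain Python functions, not real Win32 calls."""
    files = {}

    def create_file(path):
        files[path] = ""
        return True

    def write_file(path, data):
        files[path] += data
        return len(data)

    def delete_file(path):
        return files.pop(path, None) is not None

    return {
        "create_file": create_file,
        "write_file": write_file,
        "delete_file": delete_file,
    }


def install_hook(api, name, log):
    """Re-route api[name] through a wrapper that logs the call, then forwards it."""
    original = api[name]

    @functools.wraps(original)
    def hooked(*args):
        record = {"call": name, "args": list(args)}
        log.append(record)                  # monitoring code runs first
        record["result"] = original(*args)  # then the real function is called
        return record["result"]

    api[name] = hooked


def sample(api):
    """Hypothetical program under observation; it only goes through the table."""
    api["create_file"]("C:\\temp\\a.txt")
    api["write_file"]("C:\\temp\\a.txt", "hello")
    api["delete_file"]("C:\\temp\\a.txt")


def run_demo():
    api = make_api()
    log = []
    # delete_file is deliberately left unhooked to show the coverage gap.
    for name in ("create_file", "write_file"):
        install_hook(api, name, log)
    sample(api)
    return log


if __name__ == "__main__":
    # The log is already machine-readable; print it as JSON.
    print(json.dumps(run_demo(), indent=2))
```

Running the demo yields two log records, for create_file and write_file. The delete_file call executes normally but is not recorded, which mirrors the caveat that behavior is only visible for those calls that are actually hooked.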

The combination of these three techniques within the CWSandbox makes it possible to trace and monitor all relevant system calls and to generate an automated, machine-readable report that describes, for example

Obviously, the reporting features of the CWSandbox cannot be perfect, i.e., they can only report on the visible behavior of the malware and not on how the malware is programmed. Using the CWSandbox also entails some danger which arises from executing dangerous malware on a machine which is connected to a network. However, the information derived from executing malware for even very short periods of time in the CWSandbox is surprisingly rich and in most cases sufficient to assess the danger originating from the malware.