Old Dog, New Tricks: Attackers Adopt Exotic Programming Languages
By the BlackBerry Research & Intelligence Team
July 2021 by BlackBerry
An examination of the trend, among threat actors and security researchers alike, of leveraging new and uncommon programming languages to evade detection and hinder analysis.
Eric Milam, VP of Threat Research at BlackBerry
“Malware authors are known for their ability to adapt and modify their skills and behaviors to take advantage of newer technologies. This has multiple benefits from the development cycle and inherent lack of coverage from protective solutions. This paper will look into less prolific programming languages and their use in the malware space. It is critical that industry and customers understand and keep tabs on these trends, as they are only going to increase,” explains Eric Milam, VP of Threat Research at BlackBerry.
Malware authors have a reputation for being slow to change what works for them. But this is not always the case. Some malware groups have taken the opportunity to branch out and experiment with new or “exotic” programming languages, whether to evade detection by the security community or to address specific pain points in their development process.
The BlackBerry Research & Intelligence Team chose four uncommon programming languages to examine: Go, D, Nim, and Rust. This choice was driven in part by our detection methodology: we’ve identified an increase in the use of these languages for malicious intent, and an escalation in the number of malware families written in them being identified and published.
These languages have also piqued our interest because they are relatively mature and have strong community backing. While this trend is nothing new, BlackBerry aims to shed light on the current threat landscape as it relates to these new and emerging languages. In our new white paper, we’ll cover the reasons for their adoption, as well as the areas where we expect to see a further uptick as this trend enters its next evolution.
And perhaps most importantly, we’ll discuss ways both private individuals and corporations can address these growing risks.
Key Findings
• More Loaders and Droppers are Being Written in Newer Languages
The BlackBerry Research & Intelligence Team has been seeing a growing number of loaders and droppers written in uncommon languages. These new first-stage pieces of malware are designed to decode, load, and deploy commodity malware such as the Remcos and NanoCore Remote Access Trojans (RATs), as well as Cobalt Strike. They are commonly used to help threat actors evade detection on the endpoint.
• Why Use a New Programming Language?
New programming languages are typically adopted because they address a shortcoming in an existing language. Their creators could be in search of simpler syntax, performance boosts, or more efficient memory management. Or the nature of the new language could better suit the environment it is to be used within (e.g., Internet of Things devices use lower-level languages such as C or assembly). The user-friendly nature of some languages and their tooling can also drastically improve both ease of development and the developer’s quality of life (e.g., the pip package manager for Python, or npm for Node.js).
• Using Uncommon Languages Can Hamper Reverse-Engineering Efforts
The use of certain languages can significantly hamper reverse-engineering efforts. Malware analysis tooling does not always adequately support exotic programming languages, which can make analysis a more tedious experience. Binaries written in the languages this white paper focuses on (Go, Rust, Nim, and DLang) can appear more complex and convoluted when disassembled than their traditional C/C++/C#-based counterparts.
• Uncommon Languages Can Thwart Existing Signature-Based Detection
Signature-based detection of malware depends on specific static characteristics being present within a file. These are properties of the file that do not change and that can be observed without executing it. Hashes are one example of a static characteristic; a hash match requires every byte within the target scope to be identical (e.g., a hash of the whole file, or a hash of a certificate). When malware is rewritten in a new language instead of the one it was traditionally authored in (e.g., BazarLoader being rewritten in Nim), signatures written to detect the previous iteration will most likely not match.
• New Languages May Appear to Add Layers of Obfuscation to Malware
In the case of more uncommon programming languages, the language itself can act almost as a layer of obfuscation. Because of their relative youth and obscurity, these languages can have an effect similar to traditional obfuscation, and they can be used to attempt to bypass conventional security measures and hinder analysis efforts.
• Cross Compilation – One Malware to Many Machines
Modern-day organizations use a mixture of Windows® and macOS® across departments for typical end-user work. Cross-compilation gives an attacker the option of authoring a malware variant once, in one language, and compiling it for different architectures and operating systems while keeping the same or similar functionality. This could cut down on the number of tools an attacker needs to meet their goals, and widen the net of any malicious campaign.
• New Languages Rejuvenate Old Malware
Older malware written in traditional languages like C++ and C# is actively being given new life by droppers and loaders written in exotic languages. Typically, the older malware is stored in encrypted or encoded form within the first stage, using XOR, RC4, AES, or other methods. Once decoded, the binary is either dropped to disk and executed (by a dropper), or injected into a running process and loaded into memory (by a loader). This is an attractive proposition for a threat actor, as they do not need to go to the lengthy effort of recoding the malware and can instead “wrap” it in one of these delivery methods.
• Some Older Malware is Being Completely Rewritten in New Languages
While wrappers and loaders are more cost effective, some well-resourced threat actors are beginning to rewrite their existing malware in exotic languages. Examples include the switch from BazarLoader to NimzaLoader, and from Buer to RustyBuer. The new names reference the language each rewrite uses, which helps researchers track them.
• It’s All-Go in ‘Go’ for Threat Actors
Based on BlackBerry’s research and trends within the current threat landscape, it appears that Go has matured to the point where it is now one of the "Go-to" languages for threat actors developing malware variants, at both the APT and commodity level. This assessment is based on the fact that new Go-based samples now appear on a semi-regular basis, spanning malware of all types, targeting all major operating systems, and seen across multiple campaigns.
• DLang on the Slow Burn
Findings by the BlackBerry Research & Intelligence Team show that DLang appears to be the least used of these languages within the threat landscape, despite its adoption by several industry players over the last few years. However, it’s worth noting that DLang has seen an uptick in use over the past year, across numerous types of malware. This could be the beginning of a new trend of DLang adoption within the threat landscape.
• Cobalt Strike + Go/Nim Pose Future Threat
The BlackBerry Research & Intelligence Team has seen a large uptick in initial stagers for Cobalt Strike being compiled in Go, and more recently in Nim. These stagers are the first-stage binaries that facilitate initial access by reaching out to download the Cobalt Strike beacon from a TeamServer, the server responsible for serving the beacons themselves. It is important that defenders stay ahead of the curve in catching Cobalt Strike-related files written in these languages, to strengthen their defenses against such a formidable threat.
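The “New Languages Rejuvenate Old Malware” finding describes payloads carried XOR-, RC4-, or AES-encoded inside a first-stage dropper or loader. As a hedged illustration of the analyst's side of that process, the Python sketch below covers only the simplest case, repeating-key XOR. It assumes (hypothetically) that the encoded blob has already been carved out of the sample and that the decoded payload is a Windows PE starting at offset 0; the key `0x5A` and the fake payload are made up. RC4 and AES would need real cipher implementations and the actual key, which this sketch does not attempt.

```python
from itertools import cycle


def xor_decode(blob: bytes, key: bytes) -> bytes:
    """Repeating-key XOR. XOR is its own inverse, so the same call encodes and decodes."""
    if not key:
        raise ValueError("key must be non-empty")
    return bytes(b ^ k for b, k in zip(blob, cycle(key)))


def looks_like_pe(data: bytes) -> bool:
    """Crude check: Windows PE files start with the DOS 'MZ' magic."""
    return data[:2] == b"MZ"


def brute_force_single_byte(blob: bytes) -> list[int]:
    """Try all 256 single-byte keys; return those whose output starts with 'MZ'."""
    return [k for k in range(256) if looks_like_pe(xor_decode(blob, bytes([k])))]


# Hypothetical example: a fake 'payload' encoded with the made-up key 0x5A.
fake_payload = b"MZ" + bytes(62)
encoded = xor_decode(fake_payload, b"\x5a")
candidates = brute_force_single_byte(encoded)
print([hex(k) for k in candidates])
assert xor_decode(encoded, bytes([candidates[0]])) == fake_payload
```

Checking only the two-byte `MZ` magic is a deliberately crude heuristic; a real triage script would go on to validate the rest of the PE header before trusting a candidate key.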
Technological advancements are one of the driving factors in modern society. New technologies can revolutionize lives, improve efficiency at an incredibly large scale, and permanently alter the status quo. They also have the capacity to be misused by bad actors with ulterior motives, or to be turned against the very purpose for which they were created.
For example, while the concept of email had been around since the advent of ARPANET in the 1970s, it didn’t reach mainstream adoption until the explosion of the Internet in the mid-to-late 1990s. With it came a deluge of email abuse, such as the ILOVEYOU computer worm in 2000, which ran rampant and affected an estimated 10% of all Internet-connected computers at the time.
This pattern of abusing new technology is not unique to computing, and it has been observed repeatedly with new and uncommon programming languages. While new languages are usually created to improve on existing languages and technologies, it is almost inevitable that they will also be turned to offensive use by individuals or groups, whether by security researchers creating a new proof of concept to help prevent future threats, or by a threat actor developing a new malware variant. From the use of Delphi and VB6 as a wrapper layer for malware, to the rewrite of the now-infamous BazarLoader in the Nim programming language (dubbed ‘NimzaLoader’), we’ve seen history repeat itself.
And we ask – why is this the case?
Benefits and Drawbacks of Using Uncommon Programming Languages
Each programming language has its own benefits and drawbacks for different scenarios: C is not object oriented, whereas C++ is. C++ is statically typed, whereas Python is dynamically typed. Python is great for data science, but it is a less than ideal choice for devices with limited performance. Put simply, each language has areas of application where it excels, and areas where it falls short.
When choosing a language, a developer must weigh factors such as the target environment, syntax, purpose, and the suitability of the language to the problem at hand. Memory management, static vs. dynamic linking, and codebase extensibility, among many others, should also be major considerations.
On the positive side, newer programming languages are often designed with a higher degree of security consideration, offering features such as memory safety by design. This can protect the developer from introducing easily overlooked memory-related bugs and vulnerabilities.
Additionally, the use of new languages can help to demonstrate that an individual, a development team, or company is on the technological cutting edge. It shows that they are using the most modern, most efficient, and most productive means of developing their products. However, this can come at a cost – be it financial or temporal.
Developers with experience in these languages are hard to come by, and they can command a higher salary. This increases the overhead for such a project.
In a similar vein, training existing developers to write code in these languages can be a significant time investment. This is not always the case, but in a tight development pipeline it can still set a project back.
The same considerations apply within the threat landscape, but security researchers and threat actors alike have further reasons to benefit from these uncommon languages. Why would threat actors care about using these more secure languages, you might ask? The answer is simple: they don’t want to leave themselves open to exploitation.
This was recently demonstrated in the case of ‘EmoCrash’, where security researcher James Quinn discovered that the Emotet infostealer was vulnerable to a buffer overflow within the installation routine of its main binary. Quinn developed EmoCrash to exploit this flaw and act as an Emotet ‘vaccine’, preventing installation of the malware in the first place.
Uncommon Language Adoption by Threat Actors
Notable malware has been written in Go, Rust, Nim, and DLang since their inception, and most of what has been found is written in Go. These uncommon programming languages are no longer as rarely used as once thought. Threat actors have begun to adopt them to rewrite known malware families or to create tools for new malware sets.
While C-language malware is still the most widespread, threat actors such as APT28 and APT29 have been using these unconventional programming languages in their malware sets more often than other groups. APT28, or Fancy Bear, is a Russian state-sponsored group that has been operating since 2004. The group has frequently made headlines worldwide and is most notably known for allegedly infiltrating the United States’ Democratic National Committee in an attempt to influence the 2016 presidential election. APT28 has been associated with a wide range of attacks and malware families, but the Zebrocy malware family is notable for using multiple uncommon programming languages within its kill chain.
The first sample of Zebrocy seen in 2015 was made up of three components: a Delphi downloader, an AutoIT downloader, and a Delphi backdoor. Regardless of the programming language Zebrocy has been written in, the malware is spread through phishing campaigns that contain an initial Trojan which will attempt to communicate with a C2 server and execute a downloader to drop a malicious payload via an established backdoor. Though the malware has seen multiple rewrites and has evolved over time, the method of delivery via email attachment and general functionality remains largely the same.
In 2018, analysts linked a Go-based Trojan to APT28 and identified it as a rewritten version of the original Zebrocy Delphi downloader. In the years following, most recently in 2020, Go has proven to be an APT28 favorite, as the other core components of Zebrocy – the backdoor payload and downloader – were also found rewritten into Go.
In 2019, a Nim downloader was found alongside the Go backdoor in the same Zebrocy campaign, which targeted embassies and ministries of foreign affairs in Eastern Europe and Central Asia. The group is still active and was last seen using the COVID-19 pandemic as a lure to deliver the Go downloader variant in late 2020.
Like APT28, APT29 (known as Cozy Bear) is a Russian threat actor group found to be using Go in recent malware sets. The group is best known for its involvement in the SolarWinds compromise in early 2020. APT29 was seen targeting Windows and Linux machines in 2018 with WellMess, a RAT with variants written in Go and .NET.
The Go version of WellMess is the most prevalent and comes in both 32-bit and 64-bit variants as PE and ELF files, giving APT29 the ability to deploy it to more than one type of architecture and OS. The group typically gains access to a victim’s network by performing vulnerability scans of an organization’s external IP addresses and using public exploits against the vulnerable systems they encounter.
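Shipping WellMess as 32-bit and 64-bit PE and ELF builds means one family can arrive in several container formats. The sketch below shows one hedged way an analyst might triage such a set by reading only the file headers: for ELF, the magic and the `EI_CLASS` byte at offset 4; for PE, the `e_lfanew` pointer at offset `0x3C`, the `PE\0\0` signature it points to, and the COFF `Machine` field that follows. It maps only the x86 and x86-64 machine values and reports anything else raw. The function name `identify_binary` and the synthetic headers are hypothetical, not real samples.

```python
import struct


def identify_binary(data: bytes) -> str:
    """Classify a file as ELF or PE and report its word size or machine type from the headers."""
    # ELF: 4-byte magic, then EI_CLASS at offset 4 (1 = 32-bit, 2 = 64-bit).
    if data[:4] == b"\x7fELF" and len(data) > 4:
        return {1: "ELF 32-bit", 2: "ELF 64-bit"}.get(data[4], "ELF (unknown class)")

    # PE: 'MZ' DOS header; the uint32 at 0x3C (e_lfanew) points to the 'PE\0\0'
    # signature, which is immediately followed by the 2-byte COFF Machine field.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
        if data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00" and len(data) >= e_lfanew + 6:
            (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
            names = {0x14C: "PE 32-bit (x86)", 0x8664: "PE 64-bit (x86-64)"}
            return names.get(machine, f"PE (machine 0x{machine:04x})")

    return "unknown"


# Minimal synthetic headers (not real samples) to exercise both branches.
elf64 = b"\x7fELF\x02" + bytes(11)
pe = bytearray(0x86)
pe[0:2] = b"MZ"
struct.pack_into("<I", pe, 0x3C, 0x80)    # e_lfanew points to offset 0x80
pe[0x80:0x84] = b"PE\x00\x00"
struct.pack_into("<H", pe, 0x84, 0x8664)  # IMAGE_FILE_MACHINE_AMD64
print(identify_binary(elf64), "|", identify_binary(bytes(pe)))
```

Reading a handful of header bytes like this is language-agnostic, so it works the same whether the sample was built from Go, C++, or .NET, which is useful when one family ships in several formats.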
In 2020, APT29 was seen using more sophisticated versions of WellMess in attempts to steal information about COVID-19 vaccine development from multiple organizations located in the U.K., the U.S., and Canada. Although the newer variant is still written in Go, the group has added more complexity to the malware, including support for more network communication protocols and the ability to run PowerShell scripts post-infection.
Both threat actors are still active and have conducted some of the most impactful Russian cyberattacks to date. Recent activity suggests that these groups have been using the uncommon programming languages mentioned in this paper to add complexity to their malware, target multiple platforms, and evade detection.
Language Breakdowns: One Timeline to Bind Them All
Below is a timeline of some prominent examples of malware written in these languages over the last decade. It illustrates the uptick in their usage, particularly of Rust, Nim, and D. (Note that this is not an exhaustive list of malware families developed in these languages.)
Figure 1: Timeline of prominent examples of malware written in the languages of Go, Rust, Nim, and DLang.
Security Community Adoption of Uncommon Languages
Developers and threat actors are not the only groups capitalizing on the popularity and benefits of these newer programming languages. In recent years, the security community has also adopted them for offensive purposes, such as Red Team tools. Many of these tools are open source or publicly available, and their repositories cite features such as cross-compilation and efficiency as the motives for using these more uncommon languages.
In December 2020, FireEye reported that a sophisticated threat actor had gained unauthorized access to their Red Team tools. As a countermeasure, FireEye publicly released a statement along with a GitHub repository containing detection signatures to help identify the stolen tools. In this repository, FireEye revealed that their Red Team had been using a combination of specially modified, publicly available tools as well as tools that were created in-house for their team. These were written in various languages including Go, DLang, and Rust.
Signatures for existing malware families that are based on static properties have little success in tagging the same malware once it is rewritten in these more obscure languages. In cases such as Buer and RustyBuer (as well as BazarLoader and NimzaLoader), new rules usually must be created to tag these tangentially related variants.
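To make the point about static signatures concrete, the sketch below treats a set of exact SHA-256 digests as the "signature" and uses made-up byte strings rather than real samples. It shows that changing a single byte is enough to defeat a hash match. A rewrite in Nim or Rust changes far more than one byte, so hash-based rules, and most byte-pattern rules, written for the original family have little chance of matching.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Made-up bytes standing in for a known sample and a rebuilt variant.
original = b"MZ" + bytes(62) + b"stage1-v1"
rebuilt = original[:-1] + b"2"  # only the final byte differs

# A hash-based 'signature' here is just a set of known-bad digests.
known_bad = {sha256_hex(original)}


def hash_match(data: bytes) -> bool:
    return sha256_hex(data) in known_bad


print(hash_match(original))  # True
print(hash_match(rebuilt))   # False: one changed byte yields an unrelated digest
```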
So, if static signatures are being broken each time a malware family is rewritten, is there much we can do to tag them?
We have a greater chance of catching these multi-language malware families using dynamic or behavioral signatures, that is, signatures that tag behavior observed in sandbox output, EDR telemetry, or log data. These techniques can be far more reliable in such cases.
While the codebase may be ported to a new language, breaking the static indicators, the actions of the malware often stay the same. This is especially true when the malware is re-coded. Other samples, such as shellcode loaders, often inject into processes using a limited subset of Windows API calls and can be identified by that subset.
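A behavioral rule keys on what a sample does rather than how it was compiled. The sketch below assumes a simplified sandbox or EDR trace reduced to an ordered list of Win32 API names, and flags the classic remote-injection sequence (`OpenProcess`, then `VirtualAllocEx`, then `WriteProcessMemory`, then `CreateRemoteThread`) regardless of whether the loader was written in C++, Go, or Nim. The traces are hypothetical and the rule is illustrative only: real telemetry carries arguments and target handles, and loaders that call native `Nt*` functions instead would need additional patterns.

```python
from typing import Iterable, Sequence

# Illustrative rule: the classic remote process-injection call order.
# These are real Win32 API names; using exactly this sequence as a detection
# rule is an assumption for the sketch, not a production signature.
REMOTE_INJECTION = ("OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread")


def contains_in_order(trace: Iterable[str], pattern: Sequence[str]) -> bool:
    """Return True if every call in `pattern` appears in `trace` in order (other calls may sit in between)."""
    calls = iter(trace)
    # `step in calls` consumes the iterator up to and including the match,
    # so each later step is only searched for after the previous one.
    return all(step in calls for step in pattern)


# Hypothetical traces: the same injection behavior from loaders in two languages.
go_loader_trace = ["GetModuleHandleW", "OpenProcess", "VirtualAllocEx",
                   "WriteProcessMemory", "CreateRemoteThread", "CloseHandle"]
nim_loader_trace = ["Sleep", "OpenProcess", "GetLastError", "VirtualAllocEx",
                    "VirtualProtectEx", "WriteProcessMemory", "CreateRemoteThread"]
benign_trace = ["CreateFileW", "ReadFile", "CloseHandle"]

for name, trace in [("go", go_loader_trace), ("nim", nim_loader_trace), ("benign", benign_trace)]:
    print(name, contains_in_order(trace, REMOTE_INJECTION))
```

Allowing unrelated calls between the steps is the design choice that makes the rule survive a port to another language, since different runtimes interleave different housekeeping calls around the same core behavior.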
The languages investigated in this report have bindings which allow them to interface with the Win32 API and use these API calls. In essence, they can use an almost-identical methodology to that of more traditional languages such as C++. This is not always the case, as particular languages can use their own APIs in place of Win32 APIs. For example, they could use cryptographic libraries that would restrict the visibility of certain events. However, the use of these libraries within a binary can often be “signaturized” too.
By taking a step back from the implementation and looking at the core concept of how these pieces of malware interact with the system, threat researchers and software engineers alike can create more implementation-agnostic detection rules to be able to tag these dynamic behaviors if static signatures fail. This is not to say that dynamic signatures trump their static ilk by any means. Both are now necessary to have a comprehensive detection capability on the endpoint and beyond, and they should be used accordingly.
Does Adoption in Industry Mirror Adoption in the Threat Landscape?
Since the dawn of computing itself, the success or failure of a new language has depended on its adoption within legitimate business.
An endorsement from any industry titan can be significant to a language’s adoption into the mainstream. As has frequently been observed with new programming languages and associated technologies, the rest of the industry tends to follow where the titans lead.
This is not always the case, however. Many cutting-edge startups leverage new technologies, which can (at least eventually) influence market leaders in turn.
Malware developers also contribute, inadvertently, to the growing trend. Being the first to break ground by pioneering their product (in this case, a new malware variant) in new and uncommon languages can be just as much of a goal and an ambition to a threat group as it would be to a legitimate business. This can mean a greater level of kudos and reputation gain for the developer, regardless of the color of the hat that they wear.
Another aspect to consider is that the security industry typically does not develop analysis tools and techniques for a new language until a certain volume of malware is being written in it. Even if a language begins to gain adoption within the business world, it can take time for analysis tooling to process it adequately, if it ever does.
These languages can bring threat actors several improvements once adopted into their software development lifecycle. While this might sound bad for researchers, there is an upside. When threat actors use these languages for enhanced detection evasion or for quality-of-life improvements, they also inadvertently aid us in our hunt for malicious samples.
Because relatively few binaries are compiled in these languages, it is arguably easier to identify malicious samples among them.
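One hedged illustration of that point: a first-pass triage step could bucket samples by likely source language using marker strings that compilers and runtimes tend to leave behind, and then focus analysis on the small Go, Rust, or Nim subset. The marker strings below are assumptions chosen for illustration (for example, the `Go build ID:` string the Go toolchain embeds); they are not guaranteed to be present, stripping or obfuscation can remove them, and DLang is omitted from this sketch. The names `LANGUAGE_MARKERS`, `guess_languages`, and `triage` are hypothetical.

```python
# Hypothetical marker table. These byte strings are assumptions about what
# compilers and runtimes commonly leave in builds; absence proves nothing.
LANGUAGE_MARKERS = {
    "Go": [b"Go build ID:", b"runtime.gopanic"],
    "Rust": [b"RUST_BACKTRACE", b"rust_panic"],
    "Nim": [b"fatal.nim", b"NimMain"],
}


def guess_languages(data: bytes) -> list[str]:
    """Return the (sorted) languages whose marker strings appear anywhere in `data`."""
    return sorted(
        lang for lang, markers in LANGUAGE_MARKERS.items()
        if any(marker in data for marker in markers)
    )


def triage(samples: dict[str, bytes]) -> dict[str, list[str]]:
    """Keep only the samples that match at least one language marker."""
    result = {}
    for name, data in samples.items():
        langs = guess_languages(data)
        if langs:
            result[name] = langs
    return result


# Synthetic stand-ins for file contents, not real samples.
samples = {
    "a.bin": b"...\xff Go build ID: \"x\"...",
    "b.bin": b"...RUST_BACKTRACE...",
    "c.bin": b"plain C runtime strings",
}
print(triage(samples))
```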
This report is intended to add new insight to the existing work of the security community on the topic of less-common programming languages and their application in malicious software and threat actor campaigns. It is important for defenders to further the discussion on the risk and effects of not defending against parts of the threat landscape that could seem obscure.
Programs that use the same malicious techniques but are written in a new language are not usually detected at the same rate as those written in a more mature language. The loaders, droppers, and wrappers previously discussed are in many cases simply altering the first stage of the infection process rather than changing the core components of the campaign.
This is the latest example of threat actors moving just outside the range of security software, in a way that might not trigger defenses in later stages of the original campaign.
This discrepancy in detection rates can be attributed to many factors. A smaller sample set for product testing, training, and improvement, along with a lack of supporting tooling, are part of the equation. Many features that analysts and researchers have come to enjoy, and at times rely on, for binary analysis are simply not available during the early stages of a language’s adoption. The limited usage of these more modern technologies compared to more mature workflows does not lend itself to an outpouring of market support, but these threats are active and continue to have a very real impact.
Malicious binaries written in languages like D, Rust, Go, or Nim currently comprise a small percentage of the languages being used by bad actors in the world today, but it is imperative that the security community stay proactive in defending against the malicious use of emerging technologies and techniques.