New Trojan Source Technique Lets Hackers Hide Vulnerabilities in Source Code (15th November 2021)

Ref# AL2021_41 | Date: Nov 15th 2021

Cambridge University researchers Nicholas Boucher and Ross Anderson have discovered a novel class of vulnerabilities that malicious actors can exploit to inject visually deceptive malware in a way that is semantically permissible but alters the logic defined by the source code, effectively opening the door to more first-party and supply chain risks.


According to the researchers, the technique, dubbed “Trojan Source attacks,” “exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed, leading to vulnerabilities that cannot be perceived directly by human code reviewers.”

These vulnerabilities, tracked as CVE-2021-42574 and CVE-2021-42694, affect compilers and interpreters for all popular programming languages, including C, C++, C#, JavaScript, Java, Rust, Go, and Python. Compilers are programs that translate high-level, human-readable source code into lower-level representations such as assembly language, object code, or machine language, which can then be executed by the operating system.

How it works

The problem revolves around Unicode’s bidirectional (Bidi) algorithm, which supports both left-to-right (e.g., English) and right-to-left (e.g., Arabic) languages. The algorithm includes bidirectional override characters, which allow left-to-right words to be written inside a right-to-left sentence, or vice versa, by forcing text in one direction to be displayed as if it ran in the other.
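As an illustration of the characters involved, the following minimal Python sketch lists the nine Bidi embedding, override, and isolate controls together with their official Unicode names. The constant and function names are hypothetical and chosen only for this example; the choice of these nine code points is an assumption about which controls matter most here.

```python
import unicodedata

# The nine Bidi formatting controls: embeddings, overrides, isolates,
# and the characters that terminate them. (Names below are illustrative.)
BIDI_CONTROLS = "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"


def describe_bidi_controls():
    """Return (code point, Bidi class, Unicode name) for each control."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.name(ch))
        for ch in BIDI_CONTROLS
    ]


if __name__ == "__main__":
    for code_point, bidi_class, name in describe_bidi_controls():
        print(code_point, bidi_class, name)
```

These characters are invisible in most editors, which is why they can change how a line is displayed without any visible trace.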

While a compiler’s output is expected to correctly implement the source code provided to it, discrepancies created by inserting Unicode Bidi override characters into comments and strings can produce syntactically valid source code in which the display order of characters presents logic that differs from the real logic.

Stated differently, rather than deliberately introducing logical bugs, the attack targets the encoding of source code files. It visually reorders tokens so that the code, while rendered in a perfectly acceptable way, tricks the compiler into processing it differently and drastically changes the program flow. A typical example is a comment made to appear as if it were code.
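The gap between what is displayed and what is processed can be sketched in Python. The one-line snippet below is hypothetical: it is stored as a single comment, but it contains Bidi controls that may cause a Bidi-aware editor to show part of it as if it were an `if` statement (the exact rendering depends on the editor). Python's standard `tokenize` module shows what the interpreter actually sees.

```python
import io
import tokenize

RLO, LRI, PDI = "\u202e", "\u2066", "\u2069"

# Hypothetical line in logical (stored) order. Everything after '#' is a
# comment to the interpreter, whatever an editor chooses to display.
line = f"#{RLO} {LRI}if is_admin:{PDI} {LRI} admins only\n"


def token_types(source):
    """Return the token type names the Python tokenizer produces for source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tokenize.tok_name[tok.type] for tok in tokens]


if __name__ == "__main__":
    # The list contains a COMMENT token and no NAME token for 'if' or 'is_admin'.
    print(token_types(line))
```

A reviewer who sees a conditional on screen would therefore be reviewing logic that the interpreter never executes.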

The researchers explain: “In effect, we anagram program A into program B.” They add that an attacker could introduce targeted vulnerabilities without being detected if the change in logic is subtle enough to escape subsequent testing.

The researchers warned that when imperceptible software vulnerabilities introduced into open-source software make their way downstream, all users of that software may be affected. Such adversarial encodings can therefore have a significant impact on the supply chain. Even worse, Trojan Source attacks can become more serious if an attacker uses homoglyphs (visually near-identical characters) to redefine pre-existing functions in an upstream package and invoke them from a victim application.
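The homoglyph variant can be sketched as follows. The identifiers are hypothetical, and the check is a crude heuristic (it takes the first word of each letter's Unicode name as its script), not a complete confusable-character detector.

```python
import unicodedata


def scripts_in(identifier):
    """Return the set of script labels for the letters in an identifier.

    Heuristic: the first word of a letter's Unicode name
    (e.g. 'LATIN', 'CYRILLIC') is treated as its script.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def is_mixed_script(identifier):
    """Flag identifiers whose letters come from more than one script."""
    return len(scripts_in(identifier)) > 1


# Hypothetical function names: the second uses U+041D (Cyrillic capital En)
# in place of the Latin 'H', so the two are different identifiers.
latin_name = "sayHello"
lookalike_name = "say\u041dello"

if __name__ == "__main__":
    print(latin_name == lookalike_name)
    print(is_mixed_script(latin_name), is_mixed_script(lookalike_name))
```

Flagging mixed-script identifiers during review or in continuous integration is one possible way to surface such lookalike names, although legitimate multilingual code may also trigger the check.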


While there is no single patch that addresses these vulnerabilities, it is recommended that developers establish a defence-in-depth mechanism by implementing the following:

  • Implement static application security testing (SAST), a testing methodology that analyzes source code to find security vulnerabilities that make your organization’s applications susceptible to attack.
  • Remove all bidirectional control characters from comments and string literals. This is required to avoid cases in which the display order of characters presents logic that differs from the true logic.
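The second recommendation can be sketched as a small scanner. This is a minimal illustration under stated assumptions, not a replacement for a SAST tool: the function names are hypothetical, it works on text already read into memory, and it covers only the nine embedding, override, and isolate controls (U+202A to U+202E and U+2066 to U+2069).

```python
import re

# Character class covering the Bidi embedding/override controls
# (U+202A..U+202E) and the isolate controls (U+2066..U+2069).
BIDI_RE = re.compile("[\u202a-\u202e\u2066-\u2069]")


def find_bidi_controls(text):
    """Return (line, column, code point) for each Bidi control, 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in BIDI_RE.finditer(line):
            hits.append((lineno, match.start() + 1, f"U+{ord(match.group()):04X}"))
    return hits


def strip_bidi_controls(text):
    """Return text with every Bidi control character removed."""
    return BIDI_RE.sub("", text)


if __name__ == "__main__":
    sample = "x = 1\ny = 'a\u202eb'  # note\n"
    print(find_bidi_controls(sample))
    print(repr(strip_bidi_controls(sample)))
```

Reporting hits before stripping them lets a reviewer confirm whether a control character was placed deliberately, for example in legitimately bidirectional text, before it is removed.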

The Guyana National CIRT recommends that users and administrators review this alert and apply it where necessary.