“Trojan Source” hides flaws in source code from humans
Organizations urged to take action to combat the new threat that could result in SolarWinds-style attacks
Security researchers have revealed a flaw in compilers that could be used to slip vulnerabilities into open source projects. The researchers, who dubbed the attack Trojan Source, said it is particularly potent in the context of software supply chains, pointing to incidents such as the SolarWinds breach disclosed in December 2020.
“If an adversary successfully commits targeted vulnerabilities into open-source code by deceiving human reviewers, downstream software will likely inherit the vulnerability,” said researchers.
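Researchers said the attack exploits subtleties in text-encoding standards such as Unicode, specifically its bidirectional (Bidi) control characters, to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed. The resulting vulnerabilities cannot be seen directly by human code reviewers.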
“These visually reordered tokens can be used to display logic that, while semantically correct, diverges from the logic presented by the logical ordering of source code tokens,” said researchers.
They added that compilers and interpreters adhere to the logical ordering of source code, not the visual order.
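One way to see the gap between logical and visual order is to make the invisible control characters visible. The following is a minimal Python sketch, not a tool from the researchers: the helper name `reveal_bidi` and the choice of which code points to flag are assumptions made for illustration.

```python
import unicodedata

# Unicode bidirectional control characters (explicit embeddings,
# overrides, isolates, and their terminators).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def reveal_bidi(text: str) -> str:
    """Return text in logical order with each Bidi control shown as <NAME>."""
    return "".join(
        f"<{unicodedata.name(ch)}>" if ch in BIDI_CONTROLS else ch
        for ch in text
    )


# The escape \u202e inserts a RIGHT-TO-LEFT OVERRIDE that most editors hide.
print(reveal_bidi("a\u202eb"))  # prints: a<RIGHT-TO-LEFT OVERRIDE>b
```

The output is the order in which a compiler or interpreter reads the characters, whatever an editor chooses to draw on screen.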
Attackers can use several techniques to exploit the visual reordering of source code tokens, according to researchers.
The first technique is called “Early Returns.” This causes a function to short-circuit by executing a return statement that visually appears to be inside a comment.
The second is “Commenting-Out.” This makes a comment visually appear as code, so a reviewer sees logic that is never actually executed.
Lastly, there are “Stretched Strings.” These cause portions of string literals to visually appear as code, which has the same effect as commenting-out and causes string comparisons to fail.
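The “Early Returns” technique can be sketched in a few lines of Python. This is a hedged illustration rather than the researchers' own proof of concept: the function `subtract_funds`, the `bank` dictionary, and the choice of control character are hypothetical. The source is built from an escape sequence so that the hidden character stays visible here, and how the generated line is actually drawn depends on the editor or viewer.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE, invisible in most editors

# Logical order: the docstring ends at ''' and is followed by ";return".
# A viewer that honours the control character may draw "return" as if it
# were part of the docstring text.
source = (
    "def subtract_funds(bank, account, amount):\n"
    "    ''' Subtract funds from bank account then " + RLI + "''' ;return\n"
    "    bank[account] -= amount\n"
)

namespace = {}
exec(source, namespace)  # the interpreter follows the logical order

bank = {"alice": 100}
namespace["subtract_funds"](bank, "alice", 30)
print(bank["alice"])  # prints: 100, because the function returned early
```

The subtraction on the last line of the function is never reached, even though a reviewer may believe it runs.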
There is also a variant that uses homoglyphs: characters, often from different scripts, that look nearly identical to one another, such as the Latin letter “H” and the Cyrillic letter “Н”.
“An attacker can define such homoglyph functions in an upstream package imported into the global namespace of the target, which they then call from the victim code,” said researchers.
The homoglyph variant is tracked as CVE-2021-42694, while the bidirectional reordering attack is tracked as CVE-2021-42574.
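The homoglyph variant described above can be demonstrated with two function names that look alike but differ by one code point. The names `sayHello` and its Cyrillic look-alike are illustrative assumptions, and the escape `\u041d` is used so the substitution stays visible in this sketch.

```python
import unicodedata

# Two definitions whose names render almost identically. The second uses
# U+041D (CYRILLIC CAPITAL LETTER EN) in place of the Latin "H".
source = (
    "def sayHello():\n"
    "    return 'original'\n"
    "def say\u041dello():\n"
    "    return 'impostor'\n"
)

namespace = {}
exec(source, namespace)

latin_name = "sayHello"
lookalike_name = "say\u041dello"

print(latin_name == lookalike_name)          # prints: False
print(namespace[latin_name]())               # prints: original
print(namespace[lookalike_name]())           # prints: impostor
print(unicodedata.name(lookalike_name[3]))   # prints: CYRILLIC CAPITAL LETTER EN
```

The interpreter treats the two names as distinct functions, so a call site that looks correct to a reviewer can resolve to the attacker's definition.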
To defend against such attacks, researchers said that compilers, interpreters, and build pipelines supporting Unicode should throw errors or warnings for unterminated bidirectional control characters in comments or string literals, and for identifiers with mixed-script confusable characters.
“Language specifications should formally disallow unterminated bidirectional control characters in comments and string literals,” they added. “Code editors and repository frontends should make bidirectional control characters and mixed-script confusable characters perceptible with visual symbols or warnings.”
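The checks the researchers recommend could be approximated in a build pipeline along the following lines. This is a rough sketch under stated assumptions, not a complete defence: it scans whole lines rather than parsing comments and string literals, it counts openers and terminators without modelling the full Unicode bidirectional algorithm, and it guesses a character's script from the first word of its Unicode name instead of using proper script or confusables data. All function names are hypothetical.

```python
import unicodedata

EMBEDDING_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING, closes embeddings and overrides
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE, closes isolates


def has_unterminated_bidi(line: str) -> bool:
    """Simplified check: True if a Bidi opener on this line is never closed."""
    open_embeddings = 0
    open_isolates = 0
    for ch in line:
        if ch in EMBEDDING_OPENERS:
            open_embeddings += 1
        elif ch in ISOLATE_OPENERS:
            open_isolates += 1
        elif ch == PDF and open_embeddings:
            open_embeddings -= 1
        elif ch == PDI and open_isolates:
            open_isolates -= 1
    return open_embeddings > 0 or open_isolates > 0


def scripts_in(identifier: str) -> set:
    """Heuristic: take the first word of each letter's Unicode name as its script."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def is_mixed_script(identifier: str) -> bool:
    """True if the identifier's letters appear to come from more than one script."""
    return len(scripts_in(identifier)) > 1


print(has_unterminated_bidi("x = 1  # note \u202e"))  # prints: True
print(is_mixed_script("say\u041dello"))               # prints: True
print(is_mixed_script("sayHello"))                    # prints: False
```

A pipeline could fail the build, or simply warn, whenever either check returns True for a changed file.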