Yandex data breach reveals source code littered with racist language


Published 30 January 2023

Russian tech company Yandex has issued an apology after racial slurs were discovered in source code leaked in a recent data breach.

Several references to racial slurs, including the ‘N-word’, were found in the company’s source code last week.

A researcher first revealed the use of offensive terminology in a series of posts on Twitter on 26 January, sparking heavy criticism.

In a statement, Yandex told IT Pro that an initial investigation showed that the leaked code "appears to be old fragments differing from the current version of the company’s repository".

The company added that the leaked code "would never have affected any of the company’s services".

"We deeply regret that this word ever appeared in our internal codes," Yandex said. "It is unacceptable and a blatant violation of our corporate ethics."

"We are currently conducting an internal review to better understand how this happened, and will be taking appropriate measures, including to ensure that this does not happen again."

Yandex data breach leaked source code

The discovery follows a recent data breach at Yandex, in which 44.7 gigabytes of source code were leaked on the popular hacking forum BreachForums.

Leaked files were found to contain code on a range of Yandex products. The company is one of Russia’s largest tech firms and provides email, advertising, cloud computing and online sales services.

Responding to the breach, Yandex insisted that its systems were not hacked and instead attributed the leak to a former employee.


In a blog post detailing the scale of the leak, security researcher Arseniy Shestakov said the exposed files date back to February 2022, coinciding with the Russian invasion of Ukraine.

While Shestakov said the leaked files included source code for a range of services, they did not contain sensitive user data.

"Since this leak only contains contents of git repositories there is no personal data," he wrote. "There are at least some API keys, but they are likely only been used for testing deployment only."

Racist coding

Racial slurs were dotted throughout Yandex's leaked Git codebase. They appeared in function and variable names, in printed messages, and elsewhere in configuration files.

Programmers frequently use specific terms or names to enable other developers to understand what function or action a certain line of code performs.

The use of easy-to-read terms is a common approach that reduces the time engineers need to modify or update code.
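As a rough illustration of why naming matters, the short Python sketch below contrasts an opaque function with a descriptively named equivalent. It is a hypothetical example written for this article and is not taken from the leaked code; the names `f`, `select_busy_workers`, `workers` and `min_tasks` are all assumptions made for illustration.

```python
# Hypothetical example for illustration only; not taken from any leaked code.

def f(a, b):
    # Opaque names: a reader must study the body to work out the intent.
    return [x for x in a if x[1] > b]


def select_busy_workers(workers, min_tasks):
    """Return the (name, task_count) pairs whose task_count exceeds min_tasks."""
    return [
        (name, task_count)
        for name, task_count in workers
        if task_count > min_tasks
    ]


# Both functions behave identically; only the readability differs.
workers = [("indexer", 12), ("mailer", 0), ("crawler", 7)]
busy = select_busy_workers(workers, 5)
```

Both functions return the same result, but only the second tells a maintainer what it does without reading the body.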

In this instance, Yandex developers appear to have substituted a generic term for a function with offensive language.

Exactly why these terms were included is unclear. However, the use of offensive language in code violates both best practice and, as Yandex pointed out in its statement, the company's corporate ethics.

Yandex did not provide additional information on why the ‘N-word’ was used in this case, but observers noted that it appears to have replaced the word 'workers' in various parts of the codebase.


Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.

He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.

For news pitches, you can contact Ross at ross.kelly@futurenet.com, or on Twitter and LinkedIn.