QUICKLOOK: GitHub Token Leak Exposes New York Times Source Code

GitHub Tokens Are Leaky: Massive Data Breach at the New York Times Leaks 270GB of Data

Jun 09, 2024

BLUF

A threat actor has leaked 270GB of internal New York Times data on the 4chan message board, including roughly 5,000 source code repositories and 3.6 million files. The attacker claims to have accessed the data using an exposed GitHub token in January 2024. While the Times confirmed the breach, it stated that its internal systems and operations were not impacted.

Abstract

The New York Times data breach shows how much damage a single unsecured access token can cause. This report analyzes the attack chain of CVE-2024-28849, a related credential-leak vulnerability, explores methods of obtaining GitHub tokens, and compares this incident to similar breaches. The findings underscore the need for strict token management and place such attacks within a broader trend of exploiting development environments for cyber espionage and financial gain.

Introduction

The data breach at the New York Times, facilitated by an exposed GitHub token, highlights weaknesses in token management and the potential impact of such breaches. GitHub tokens have been exploited in earlier incidents, such as the Mercedes-Benz breach, in which a mishandled token exposed internal source code. This report examines the attack chain, a related CVE, and the broader implications for organizations using third-party code repositories. The motives behind such attacks often include espionage, financial extortion, and unauthorized access to sensitive data. Examining these tactics, techniques, and procedures (TTPs) clarifies how threats against development environments are evolving.

Analyst Comments

The New York Times data breach highlights the critical importance of securing access tokens and closely monitoring third-party repositories. The attacker's ability to access a vast amount of data using a single exposed token underscores the far-reaching consequences of inadequate access controls. Organizations must implement strict policies around token management and limit permissions to prevent unauthorized access.

The scope of the breach suggests that the Times stored a significant amount of sensitive information on GitHub, emphasizing the need for careful consideration of data storage on third-party platforms. While the Times stated that its internal systems were not affected, the leak of source code and proprietary data could have long-term implications, such as vulnerability exploitation and reputational damage.

This breach appears to be part of a larger trend of incidents involving mishandled GitHub tokens, as seen in the recent Mercedes-Benz breach. Vulnerabilities like CVE-2024-28849, involving improper handling of authorization headers in redirects, further highlight the importance of constant vigilance and timely updates to security practices.

Key Findings:

Exposed access tokens: The attacker's ability to access such a vast amount of data using a single exposed GitHub token highlights the far-reaching consequences of inadequate access controls. Organizations must implement strict policies around token management, regularly rotating and revoking tokens, and limiting their permissions to prevent unauthorized access.
Scope of the breach: The sheer volume of data allegedly stolen - 5,000 repositories and 3.6 million files - suggests that the Times had a significant amount of sensitive information stored on GitHub. While the exact nature of the data remains unclear, the folder names hint at a wide range of assets, from IT documentation to proprietary code like the Wordle game.
Potential impact: Although the Times stated that its internal systems and operations were not affected, the leak of source code and other proprietary data could have long-term implications. Attackers could exploit vulnerabilities in the code, gain insights into the Times' infrastructure and processes, or repurpose the code for malicious purposes.
Similarities to other recent breaches: The Times breach closely follows a similar incident involving Disney, where an attacker claiming ties to the defunct Club Penguin game leaked internal data on 4chan. The use of 4chan as a platform to publicize and distribute stolen data appears to be a growing trend among cybercriminals, leveraging the site's anonymity and reach.
Importance of rapid response: The Times' confirmation that the breach occurred in January 2024, months before the data surfaced on 4chan in June, raises questions about the timeliness of breach detection and disclosure. Prompt identification, investigation, and remediation of security incidents are crucial to minimizing their impact and preventing further compromise.
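
The token-management policy described in the findings above (regular rotation and limited permissions) can be sketched as a simple audit. This is a minimal illustration, not a description of the Times' tooling: the TokenRecord inventory format, the 90-day maximum age, and the choice of which scopes count as "broad" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy values; adjust to your organization's standards.
MAX_TOKEN_AGE = timedelta(days=90)
BROAD_SCOPES = {"repo", "admin:org", "delete_repo"}


@dataclass
class TokenRecord:
    """Hypothetical inventory entry describing one access token."""
    name: str
    created: date
    scopes: frozenset


def audit_token(token: TokenRecord, today: date) -> list[str]:
    """Return the policy findings for a single token (empty if compliant)."""
    findings = []
    # Rotation check: flag tokens that have outlived the maximum age.
    if today - token.created > MAX_TOKEN_AGE:
        findings.append("rotate: older than maximum age")
    # Least-privilege check: flag scopes that grant wide access.
    broad = sorted(token.scopes & BROAD_SCOPES)
    if broad:
        findings.append("narrow: broad scopes " + ", ".join(broad))
    return findings
```

Running audit_token over every entry in a token inventory on a schedule gives a list of tokens to rotate or narrow before an exposed one can be abused.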

Attack Chain Analysis of CVE-2024-28849:

Vulnerability Description: CVE-2024-28849 affects follow-redirects, an npm package for Node.js. Versions before 1.15.6 fail to clear the Proxy-Authorization header during cross-domain redirects, potentially leaking credentials.
Attack Method: The attacker either intercepts a request or tricks a user into making one that is redirected to a domain the attacker controls. If the Proxy-Authorization header is not cleared, the credentials it carries are sent to that domain.
Mitigation: Update to version 1.15.6 or later, which clears the header during redirects.
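
The general idea behind the fix can be sketched as follows. This is a simplified Python illustration of stripping credential headers on a cross-origin redirect, not the follow-redirects source code (which is JavaScript and uses its own matching rules). The function name headers_for_redirect and the set of headers treated as sensitive are assumptions for the example.

```python
from urllib.parse import urlsplit

# Headers that carry credentials and should not follow a request to a new origin.
SENSITIVE_HEADERS = {"authorization", "proxy-authorization", "cookie"}


def headers_for_redirect(original_url: str, redirect_url: str, headers: dict) -> dict:
    """Return the headers that are safe to send to redirect_url."""
    src = urlsplit(original_url)
    dst = urlsplit(redirect_url)
    # Conservative comparison: scheme, host, and explicit port must all match.
    same_origin = (src.scheme, src.hostname, src.port) == (dst.scheme, dst.hostname, dst.port)
    if same_origin:
        return dict(headers)
    # Cross-origin redirect: drop every credential-bearing header.
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}
```

A vulnerable client corresponds to a SENSITIVE_HEADERS set that omits "proxy-authorization": the header then survives the redirect and reaches the attacker's server.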

Methods of Acquiring Tokens:

Code Repositories: Accidental commits of tokens to public repositories.
Phishing Attacks: Tricking users into revealing their tokens.
Intercepting Traffic: Capturing tokens from unsecured communications.
Exploiting Vulnerabilities: Extracting tokens from applications or services through software flaws.
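
The first method above, accidental commits, is the easiest to check for before code leaves a developer's machine. The sketch below scans text for strings that look like GitHub tokens. The patterns are approximations based on GitHub's published token prefixes, and the minimum lengths are assumptions; a dedicated secret scanner will be more thorough. The function name find_token_candidates is hypothetical.

```python
import re

# Approximate patterns for GitHub token prefixes; the lengths are assumptions.
TOKEN_PATTERNS = [
    re.compile(r"\b(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9]{36,}\b"),
    re.compile(r"\bgithub_pat_[A-Za-z0-9_]{22,}\b"),
]


def find_token_candidates(text: str) -> list[tuple[int, str]]:
    """Return (line_number, redacted_match) pairs for token-like strings."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in TOKEN_PATTERNS:
            for match in pattern.finditer(line):
                token = match.group(0)
                # Redact so the scanner's own output does not leak the secret.
                hits.append((lineno, token[:8] + "..."))
    return hits
```

Such a check could run as a pre-commit hook or in CI so that a commit containing a token-like string is rejected before it reaches a shared repository.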

From the Media:

According to a report by The Register, an anonymous 4chan user claims to have leaked 270GB of internal New York Times data, including roughly 5,000 source code repositories and 3.6 million files. The attacker allegedly accessed the data using an exposed GitHub token. The Times confirmed to BleepingComputer that the breach occurred in January 2024 after credentials for a cloud-based third-party code platform, later identified as GitHub, were exposed. However, the company stated that the breach did not affect its internal systems or operations.

The folder names in the leaked data suggest a wide range of information was compromised, including IT documentation, infrastructure tools, and source code, potentially even for the popular Wordle game. The attacker shared details on how to access the stolen files via peer-to-peer networks on 4chan.

This breach comes just days after a similar incident involving Disney, where an attacker linked to the defunct Club Penguin game leaked internal data on 4chan. These incidents highlight the growing trend of cybercriminals using 4chan to publicize and distribute stolen data.

