An unknown attacker slipped a malicious binary into the PyTorch machine learning project by registering a malicious project with the Python Package Index (PyPI), infecting users’ machines if they downloaded a nightly build between Dec. 25 and Dec. 30.
The PyTorch Foundation stated in an advisory on Dec. 31 that the effort was a dependency confusion attack, in which an unknown entity created a package in the Python Package Index with the same name, torchtriton, as a code library on which the PyTorch project depends. The malicious library included the functions normally used by PyTorch but with a malicious modification: It would upload data from the victim’s system to a server at a now-defunct domain.
The malicious function would grab a variety of system-specific information, including the username, environment variables, a list of hosts to which the victim’s machine connects, the list of password hashes, and the first 1,000 files in the user’s home directory.
“Since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository,” the advisory stated. “This design enables somebody to register a package by the same name as one that exists in a third party index, and [the package manager] will install their version by default.”
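The precedence problem the advisory describes can be illustrated with a toy model. The sketch below is not the package manager's real resolver: it assumes a simplified installer that pools candidates from every configured index and picks the highest version, and the index names and version numbers are hypothetical.

```python
# Toy model of dependency confusion. This is NOT a real package manager's
# resolver; index names and version numbers are made up for illustration.

def parse_version(text):
    """Turn '2.0.0' into (2, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def resolve(package, indexes):
    """Pool candidates from every index and return (index, version) for the
    highest version, ignoring which index it came from."""
    candidates = [
        (parse_version(version), index_name, version)
        for index_name, packages in indexes.items()
        for version in packages.get(package, [])
    ]
    if not candidates:
        raise LookupError(f"{package} not found in any index")
    _, index_name, version = max(candidates)
    return index_name, version

indexes = {
    "project-nightly-index": {"torchtriton": ["2.0.0"]},  # intended source
    "public-index": {"torchtriton": ["3.0.0"]},           # attacker's upload
}

chosen_index, chosen_version = resolve("torchtriton", indexes)
print(f"installing torchtriton {chosen_version} from {chosen_index}")
```

In this model the attacker only needs to publish a same-named package with a higher version number on the public index. Nothing about the private index has to be compromised.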
The attack is the latest software supply chain attack to target open source repositories. In mid-December, for example, researchers discovered a malicious package disguised as a client from cybersecurity firm SentinelOne that had been uploaded to PyPI. In another dependency confusion attack in November, attackers created more than two dozen clones of popular software with names designed to fool unwary developers. Similar attacks have targeted the .NET-focused NuGet repository and the Node.js Package Manager (npm) ecosystem.
Same Name, Different Packages
In the latest attack on PyTorch, the attacker used the name of a software package that PyTorch developers would load from the project’s private repository, and because the malicious package existed in the PyPI repository, it gained precedence. The PyTorch Foundation removed the dependency in its nightly builds and replaced the PyPI project with a benign package, the advisory stated.
The group also removed any nightly builds that depend on the torchtriton dependency from the project’s download page and says it plans to take ownership of the torchtriton project on PyPI.
Fortunately, because the torchtriton dependency was only imported into the nightly builds of the program, the impact of the attack did not propagate to typical users, Paul Ducklin, a principal research scientist at cybersecurity firm Sophos, said in a blog post.
“We’re guessing that the majority of PyTorch users won’t have been affected by this, either because they don’t use nightly builds, or weren’t working over the vacation period, or both,” he wrote. “But if you are a PyTorch enthusiast who does tinker with nightly builds, and if you’ve been working over the holidays, then even if you can’t find any clear evidence that you were compromised, you might nevertheless want to consider generating new SSH key pairs as a precaution, and updating the public keys that you’ve uploaded to the various servers that you access via SSH.”
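For readers who want a first look at their own environment, the following is a minimal sketch, not the check published in the advisory. It assumes only that the package would be registered under the distribution name torchtriton, and it uses Python's standard importlib.metadata module to see whether such a distribution is installed.

```python
# Sketch: report whether a distribution named "torchtriton" is installed
# in the current Python environment. Presence alone does not prove a
# compromise, and absence does not rule one out.
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("torchtriton")
if version is None:
    print("torchtriton is not installed in this environment")
else:
    print(f"torchtriton {version} is installed; check where it came from")
```

A hit is only a prompt for further investigation, since the PyPI project has since been replaced with a benign package. The script also inspects only the interpreter that runs it, so it would need to be repeated in each virtual environment.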
The PyTorch Foundation confirmed that users of the stable version of the PyTorch library would not be affected by the issue.
In a widely circulated mea culpa, the attacker claimed that they are a legitimate researcher and that the issue resulted from their investigation into dependency confusion issues.
“I want to assure that it was not my intention to steal someone’s secrets,” the person wrote, claiming to have notified Facebook on Dec. 29 of the issue and made reports to companies using the HackerOne crowdsourcing platform. “Had my intents been malicious, I would never have filled [sic] any bug bounty reports, and would have just sold the data to the highest bidder.”
Still, the attack could have exposed victims’ sensitive information, even if the person behind the malware had good intentions, Sophos’ Ducklin wrote in his blog post about the software supply chain attack.
“How is this a ‘false alarm’?” he also said in a tweet. “This malware deliberately steals your data… and transmits it scrambled, not encrypted … so anyone on your network path who recorded it can trivially decode it.”
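Ducklin's distinction between scrambling and encryption can be shown with a small example. The sketch below assumes a fixed-key XOR followed by hex encoding, chosen purely for illustration. It is not a reconstruction of the scheme this malware used, and the key and sample data are hypothetical.

```python
# Illustration only: a fixed, embedded key makes "scrambling" reversible
# by anyone who has seen the code. The key and scheme are hypothetical.
from itertools import cycle

EMBEDDED_KEY = b"not-a-secret"  # ships inside the program, so it is public

def scramble(data: bytes) -> str:
    """XOR each byte with the repeating key, then hex-encode."""
    return bytes(b ^ k for b, k in zip(data, cycle(EMBEDDED_KEY))).hex()

def unscramble(text: str) -> bytes:
    """Reverse the process; XOR with the same key undoes itself."""
    raw = bytes.fromhex(text)
    return bytes(b ^ k for b, k in zip(raw, cycle(EMBEDDED_KEY)))

captured = scramble(b"USER=alice")  # what an eavesdropper might record
print(captured)                     # unreadable at a glance
print(unscramble(captured))         # recovers b'USER=alice'
```

Because the key travels with the code, the output offers no confidentiality. Real encryption would require a secret that an observer on the network path does not hold.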