Shadow libraries have increasingly faced legal pressure to either stop pirating books or risk being shut down or driven to the dark web. Among the biggest targets are Z-Library, which the US Department of Justice has charged with criminal copyright infringement, and Library Genesis (Libgen), which was sued by textbook publishers last fall for allegedly distributing digital copies of copyrighted works "on a massive scale in willful violation" of copyright laws.
Nvidia seemed to defend the shadow libraries as a valid source of information online when responding to a lawsuit from book authors over the list of data repositories that were scraped to create the Books3 dataset used to train Nvidia's AI platform NeMo.
The authors argued that that list includes some of the most "notorious" shadow libraries -- Bibliotik, Z-Library (Z-Lib), Libgen, Sci-Hub, and Anna's Archive. However, Nvidia hopes to invalidate authors' copyright claims partly by denying that these controversial websites should even be considered shadow libraries.
"Nvidia denies the characterisation of the listed data repositories as 'shadow libraries' and denies that hosting data in or distributing data from the data repositories necessarily violates the US Copyright Act," Nvidia's court filing said.
The chipmaker did not go into further detail to define what counts as a shadow library or what potentially absolves these controversial sites from key copyright concerns raised by various ongoing lawsuits. Instead, Nvidia kept its response brief while curtly disputing the authors' petition for class-action status and defending its AI training methods for fair use.
"Nvidia denies that it has improperly used or copied the alleged works," the court filing said, arguing that "training is a highly transformative process that may include adjusting numerical parameters including 'weights,' and that outputs of an LLM may be based, at least in part, on such 'weights.'"
"Nvidia's argument likely depends on the court agreeing that AI models ingesting published works in order to transform those works into weights governing AI outputs is fair use," notes Ars. "However, authors have argued that 'these weights are entirely and uniquely derived from the protected expression in the training dataset' that has been copied without getting authors' consent or providing authors with compensation."
"Authors suing Nvidia have taken the next step, linking the chipmaker to shadow libraries by arguing that 'these shadow libraries have long been of interest to the AI-training community because they host and distribute vast quantities of unlicensed copyrighted material. For that reason, these shadow libraries also violate the US Copyright Act.'"