In response to legal actions, defendants—including Meta, OpenAI, and Bloomberg—have argued that their actions fall under the umbrella of fair use. Interestingly, the plaintiffs voluntarily dismissed a case against EleutherAI, the entity responsible for initially scraping and publicly sharing books.
However, litigation in the remaining cases is still in its early stages, leaving unresolved questions about permission and compensation. Although "The Pile" dataset has been removed from its official download site, it remains accessible through file-sharing services.
Amy Keller, a consumer protection attorney and partner at DiCello Levitt, said: "Technology companies have acted with impunity. People are rightly concerned about not having a say in the matter. That's the crux of the issue."
Creators now find themselves uncertain about the road ahead. Full-time YouTubers diligently monitor unauthorised use of their content, frequently issuing takedown notices. However, there's growing concern that AI could soon generate content akin to their own—or even outright copy it.
David Pakman, creator of "The David Pakman Show," recently encountered the power of AI while scrolling through TikTok. He stumbled upon a video labelled as a Tucker Carlson clip, but upon closer inspection, it turned out to be a replica of what Pakman himself had said on his YouTube show. The voice, the words—everything matched. Pakman expressed alarm that only one of the video's commenters recognised it as a fake—a voice clone of Carlson reading Pakman's script.
"This is going to be a problem," Pakman warned in a YouTube video addressing the issue. "Essentially, anyone can be replicated."
Sid Black, cofounder of EleutherAI, revealed that he created "YouTube Subtitles" using a script. The script downloads subtitles through YouTube's API, the same way a viewer's browser does while a video is playing.
Black's search terms—495 in total—included phrases like "funny vloggers," "Einstein," "black protestant," "Protective Social Services," "infowars," "quantum chromodynamics," "Ben Shapiro," "Uighurs," "fruitarian," "cake recipe," "Nazca lines," and "flat earth."
Although YouTube's terms of service prohibit accessing videos via "automated means," over 2,000 GitHub users have bookmarked or endorsed Black's code.
Machine learning engineer Jonas Depoix, who published the code on GitHub, noted, "There are ways YouTube could prevent this module from functioning if they chose to." However, this hasn't happened.
Google spokesperson Jack Malon stated that the company has taken action over the years to prevent abusive, unauthorised scraping. However, he remained silent on other companies' use of such material for AI training.
AI companies have used more than 146 videos from the channel Einstein Parrot, which has nearly 150,000 subscribers.
The African grey parrot's caretaker, Marcia (who preferred not to disclose her last name for the bird's safety), found it amusing that AI models had absorbed the words of her mimicking parrot.