
Too much AI-generated data can cause an AI model to collapse

26 July 2024


Scraping from other models is a bad thing

A new study published in Nature has found that training AI models using datasets created by other AI models can lead to “model collapse,” where the models start producing increasingly nonsensical outputs over time.

One model began with a text about medieval European architecture and, by the ninth generation, was spouting nonsense about jackrabbits.

The research, led by Ilia Shumailov of Google DeepMind and Oxford, found that because models favour their most probable outputs, the less common lines of text in a training dataset tend to get dropped from what they generate.

This means models trained on the output of earlier models cannot carry those nuances forward, and each generation in the recursive loop loses a little more of them.

Duke University assistant professor Emily Wenger said that an AI model generating images of dogs will focus on recreating the most common breeds in its training data, over-representing Golden Retrievers at the expense of the Petit Basset Griffon Vendéen.

“If later models are trained on an AI-generated dataset that over-represents Golden Retrievers, the problem gets worse. After enough cycles, the model will forget about less common breeds like the Petit Basset Griffon Vendéen and only generate pictures of Golden Retrievers. Eventually, the model will collapse and be unable to generate meaningful content.”

While she admits that a glut of Golden Retrievers is hardly a catastrophe, the collapse process is a serious problem for producing meaningful, representative outputs that include less common ideas and ways of writing.

“This is the problem at the heart of model collapse,” she said.
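
The dynamics are easy to reproduce in miniature. The Python sketch below is our own toy illustration, not code from the Nature paper: each "model" is just a categorical distribution over dog breeds, and each generation a finite dataset is sampled from the previous model, with the next model refitted on it. The breed list and probabilities are invented for the example.

# A minimal sketch (not from the paper) of how recursive training erodes
# rare categories. Each "generation" is the empirical distribution of
# samples drawn from the previous generation -- the categorical analogue
# of retraining a model on its predecessor's output.
import random
from collections import Counter

random.seed(0)

# Hypothetical breeds and probabilities; the rare breed sits in the tail.
BREEDS = ["golden_retriever", "labrador", "beagle", "petit_basset_griffon_vendeen"]
probs = [0.50, 0.30, 0.18, 0.02]

SAMPLES_PER_GEN = 200   # finite sampling is what loses the tail
GENERATIONS = 10

for gen in range(GENERATIONS):
    # "Generate a dataset" by sampling from the current model ...
    data = random.choices(BREEDS, weights=probs, k=SAMPLES_PER_GEN)
    # ... then "train" the next model on it (maximum-likelihood fit).
    counts = Counter(data)
    probs = [counts[b] / SAMPLES_PER_GEN for b in BREEDS]
    print(f"gen {gen + 1}: "
          + ", ".join(f"{b}={p:.3f}" for b, p in zip(BREEDS, probs)))

Run it and the Petit Basset Griffon Vendéen's share drifts towards zero, and once it reaches zero it can never come back: a breed the model never samples is a breed the next model can never learn.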

 
