Published in AI

Stability AI messed up its own AI again

13 June 2024

Keeping US puritans happy creates monsters 

Stability AI’s Stable Diffusion 3 Medium has been borked by its desperate attempt to keep US puritans happy.

The much-awaited AI image-synthesis model that turns text prompts into AI-generated images has been ridiculed online because it will not think about human bodies in case they lead to erotic thoughts.

This 19th-century approach to the human body is a step backwards from other state-of-the-art image-synthesis models like Midjourney or DALL-E 3. As a result, SD3 Medium easily produces wild, anatomically incorrect visual abominations.

A thread on Reddit titled, "Is this release supposed to be a joke? [SD3-2B]" details the spectacular failures of SD3 Medium at rendering humans, especially human limbs like hands and feet. Another thread titled, "Why is SD3 so bad at generating girls lying on the grass?" shows similar issues, but for entire human bodies.

AI image fans blame Stable Diffusion 3's anatomy failures on Stability's insistence on filtering adult content (often called "NSFW" content) from the SD3 training data that teaches the model how to generate images.

While this “censorship first” approach satisfies the moral codes of nuns, retired colonials, and religious loonies, it also prevents the model from understanding any human anatomy.

It is not as if Stability AI did not know this. The release of Stable Diffusion 2.0 in late 2022 suffered from similar problems in depicting humans accurately. AI researchers soon discovered that censoring adult content that contains nudity also severely hampers an AI model's ability to generate accurate human anatomy.

At the time, Stability AI reversed course with SD 2.1 and SD XL, regaining some of the abilities lost by excluding NSFW content.

"It works fine as long as there are no humans in the picture. I think their improved NSFW filter for filtering training data decided anything humanoid is NSFW," wrote a Redditor.

Any time a prompt homes in on a concept that isn't represented well in its training dataset, the image model will confabulate its best interpretation of what the user is asking for. And sometimes, that can be completely terrifying. Using a free online demo of SD3 on Hugging Face, we ran prompts and saw similar results to those reported by others.

For example, the prompt "a man showing his hands" returned an image of a man holding up two giant-sized backward hands, although each hand had at least five fingers.
