Database behind AI-image generators found hiding a dark secret: Child sexual abuse content


Artificial Intelligence-based image generators are hugely popular across the internet, and throughout 2023 AI-generated images went viral, entertaining people. However, in a startling revelation, a study by the Stanford Internet Observatory uncovered thousands of child sexual abuse images hidden inside a repository used to train these AI generators.

The discovery was made in the dataset behind the LAION AI database. Researchers reportedly found over 3,200 images of suspected child sexual abuse hidden within it, of which more than 1,000 were confirmed as child sexual abuse material.

The analysis was done in collaboration with the Canadian Centre for Child Protection and other anti-abuse charities. 

“We find that having possession of a LAION‐5B dataset populated even in late 2023 implies the possession of thousands of illegal images,” said the researchers. 

Which AI image generators used this database?

According to The Guardian, the database was used to train leading AI image-makers such as Stable Diffusion.

The report states that many text-to-image generators are in some way derived from the LAION database. However, it goes on to say that exactly which ones use it isn’t always clear.

OpenAI, the popular artificial intelligence company behind DALL-E and ChatGPT, denied using LAION and said it had fine-tuned its models to refuse any requests that might involve sexual content involving minors.

Imagen, a text-to-image model from tech giant Google, was also built using the LAION dataset; however, the company decided against making it public in 2022. This was because an audit of the dataset “uncovered a wide range of inappropriate content, including pornographic imagery, racist slurs, and harmful social stereotypes,” reports the publication.

LAION’s response

LAION, or Large-scale Artificial Intelligence Open Network, a non-profit, swiftly responded by temporarily removing its datasets, emphasising it “has a zero-tolerance policy for illegal content.” 

It said that “in an abundance of caution, we have taken down the LAION datasets to ensure they are safe before republishing them.” 

The more than 3,200 images make up only a tiny fraction of the roughly 5.8 billion images contained in the dataset. However, according to the Stanford researchers, their presence likely influences the ability of AI generators to produce harmful outputs.

Such images, they say, make it easier to create realistic, explicit imagery and to transform social media photos of fully clothed teenagers into nudes. The latter has recently become a cause of concern among schools and law enforcement agencies around the world.

The Stanford Internet Observatory has advocated for drastic measures, urging those using LAION-5B datasets (containing over 5 billion image-text pairs) to delete or clean the material. It also suggests making older versions of models such as Stable Diffusion less accessible, especially if they are used to generate abusive images and lack adequate safeguards.

(With inputs from agencies)


