- Generative AI could soon be trained on AI-generated content — and experts are raising the alarm.
- The phenomenon, which some experts call "model collapse," could result in AI producing low-quality outputs down the line.
- The new term comes as AI-generated content filled with errors continues to flood the internet.
Experts warn that AI-generated content may pose a threat to the AI technology that produced it.
In a recent paper on how generative AI tools like ChatGPT are trained, a team of AI researchers from institutions including the University of Oxford and the University of Cambridge found that the large language models behind the technology may increasingly be trained on other AI-generated content as it spreads across the internet, a phenomenon they dubbed "model collapse." In turn, the researchers argue, generative AI tools may respond to user queries with lower-quality outputs as their models are trained more heavily on "synthetic data" instead of the human-made content that makes their responses unique.
Other AI researchers have coined their own terms for the phenomenon. In a paper released in July, researchers from Stanford and Rice universities called it "Model Autophagy Disorder," arguing that the "self-consuming" loop of AI training itself on content generated by other AI could leave generative AI tools "doomed" to falter in the "quality" and "diversity" of the images and text they generate. Jathan Sadowski, a senior fellow at the Emerging Technologies Research Lab in Australia who researches AI, called the phenomenon "Habsburg AI," arguing that AI systems heavily trained on the outputs of other generative AI tools can produce "inbred mutant" responses that contain "exaggerated, grotesque features."
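To see why such a feedback loop worries researchers, consider a deliberately simplified illustration rather than either paper's actual setup: a "model" that just fits a normal distribution to its training data, and whose own samples then become the next generation's training set. In the minimal Python sketch below, where every name and parameter is illustrative, the spread of the data tends to shrink across generations, a toy stand-in for the loss of "diversity" the researchers describe.

```python
# Toy "self-consuming" training loop: each generation fits a Gaussian to the
# previous generation's samples, then its own outputs become the next corpus.
import numpy as np

rng = np.random.default_rng(seed=42)

# Generation 0: "human-made" data drawn from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=50)

for generation in range(1, 101):
    # "Train" the model: estimate the mean and spread of the current corpus.
    mu, sigma = data.mean(), data.std()
    # "Deploy" the model: its samples replace human data as the next corpus.
    data = rng.normal(loc=mu, scale=sigma, size=50)
    if generation % 20 == 0:
        print(f"generation {generation:3d}: mean={mu:+.3f}, spread={sigma:.3f}")

# Estimation error compounds across generations: the spread tends to drift
# toward zero and rare "tail" behavior disappears, a toy analogue of collapse.
```

Real language models are vastly more complex, but the worry is the same mechanism: each generation learns from an imperfect copy of the last, and the errors compound.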
While the specific effects of these phenomena are still unclear, some tech experts believe that "model collapse" and AI inbreeding could make it difficult to pinpoint the original sources of the information an AI model is trained on. As a result, providers of accurate information such as the media may decide to limit the content they post online, or even put it behind paywalls, to prevent it from being used to train AI, which could create a "dark ages of public information," according to an essay by Ray Wang, the CEO of tech research firm Constellation Research.
Some tech experts are less worried about the growth of AI-generated content on the internet. Saurabh Baji, the senior vice president of engineering at AI firm Cohere, told Axios that human guidance is "still critical to the success and quality" of the company's AI models, and others told the outlet that the rise of AI-generated content will only make human-crafted content more valuable.
These new terms come as AI-generated content has flooded the internet since OpenAI launched ChatGPT last November. As of August 28, NewsGuard, a company that rates the reliability of news websites, had identified 452 "unreliable AI-generated news outlets with little to no human oversight" that publish stories filled with errors. AI-generated sites with generic names like iBusiness Day, Ireland Top News, and Daily Time Update may strike consumers as established sources of information, which could bolster the spread of misinformation, according to NewsGuard.
It's not just AI-generated websites that have produced articles filled with inaccuracies. In January, tech publication CNET published 77 articles using an "internally designed AI engine" and had to issue significant corrections after learning that the articles were riddled with basic math errors. Months later, Gizmodo staffers criticized company executives after the outlet published AI-written articles containing factual inaccuracies. Most recently, Microsoft removed a string of articles from its travel blog, one of which was found to be an AI-generated article recommending that visitors to Ottawa visit the Ottawa Food Bank and "consider going into it on an empty stomach."
Now that AI-content detectors like ZeroGPT and OpenAI's Text Classifier have been found to be unreliable, people may find it harder to locate accurate, human-vetted information online, Kai-Cheng Yang, a computational social science researcher who has written a paper on the malicious actors that could take advantage of OpenAI's chatbot, previously told Insider.
"The advancement of AI tools will distort the idea of online information permanently," Yang said