- Naveen Rao is VP of Generative AI at Databricks and co-founder of LLM-training platform MosaicML.
- Rao says copyright infringement could prevent companies from successfully monetizing AI.
- Databricks and MosaicML are building open-source LLMs to help companies monetize AI without that legal risk.
Generative AI has a monetization problem, says Naveen Rao.
Rao, who oversees generative AI strategy for Databricks after it bought his startup MosaicML for $1.3 billion, likens it to the issue that crushed Napster, the 2000s-era music-sharing platform.
Napster, wildly popular at first, fundamentally changed the way people consume and share music, until it was sued over copyright infringement so many times that it went bankrupt three years after its launch. It has been bought and sold so many times since then that it is now a shadow of its former self. Meanwhile, Apple launched iTunes and became the dominant force in online music.
Rao sees a similar scenario playing out with generative AI, which writes or creates art, but only after it ingests vast amounts of data to train its models. OpenAI's introduction of ChatGPT last year sparked a frenzy of AI model training, along with widespread concern that those models are ingesting copyrighted material, learning to imitate it, or even reproducing actual bits and pieces of it, leaving individuals open to intellectual property infringement claims and companies vulnerable to lawsuits.
Such lawsuits are already beginning. Just last month, a group of 17 popular authors, including Jodi Picoult and 'Game of Thrones' creator George R.R. Martin, sued OpenAI in federal court for "systematic theft on a mass scale," over concerns their work is being used to train its models. For Rao, this lawsuit is reminiscent of some of the first lawsuits against Napster, like one filed by Metallica in 2000.
"That needs to be respected," Rao said of copyrighted materials. "And we need tools to do that."
Rao has spent his entire career building tools for this moment. A verification engineer by training with a PhD in neuroscience, Rao researched neuromorphic machines, computers inspired by the human brain, at Qualcomm. He sold his first company, deep-learning startup Nervana, to Intel in 2016 for more than $350 million.
With MosaicML, Rao built a platform that offered companies foundation models they could turn into their own LLMs and train on their own data in a secure environment.
Rao's thesis is that if companies have a way to use their own data safely for model training on a transparent, open-source platform, they'll be free from worrying about legal challenges and free to successfully monetize their AI-based services.
Data sources aside, Rao sees another key business reason for companies to build their own LLMs — differentiation. MosaicML's platform provides a user-friendly infrastructure for companies to build their own models, something Rao says competitors like OpenAI don't necessarily offer.
"We build tools that enable companies to differentiate their AI from everyone else's and leverage their data uniquely," Rao told Insider.
Rao thinks of Mosaic's technology as a kind of democratization of generative AI, and he has brought that ethos to Databricks, which in April launched its own open-source-trained LLM, called Dolly, that companies can use to help train their own models. The more people building generative AI technology, the better, Rao says.
"What's interesting about technology is there's always some element of it that can be used in nefarious ways," said Rao. "But the way to stop that is by having more people armed with the same tools and being able to use them for good."