Google's AI can ingest multiple books. What will it do with all that data?

Google Gemini 1.5 Pro has a context window of 1 million tokens.
Users will be able to submit giant prompts that are equivalent to multiple books.
What will Google do with all the data that users share with the company?

Two decades ago, Google cofounder Larry Page had a dream to digitally scan millions of books. It turned into a long and bitter legal battle that the company eventually won.

Today, the emergence of huge AI models is turning this book-scanning debate on its head.

Google will soon release a powerful new model called Gemini 1.5 Pro that has a context window of 1 million tokens. That's about 750,000 words, or the equivalent of 3 to 7 books, depending on their length. It can also suck in 1 hour of video, 11 hours of audio, and more than 30,000 lines of code via user prompts.
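The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough illustration, not Google's method: the 0.75 words-per-token ratio is simply what the article's own numbers imply (750,000 words for 1 million tokens), and the two book lengths are assumed values chosen to show how a 3-to-7-book range could come about.

```python
# Rough token-to-word arithmetic for a 1-million-token context window.
# Assumption: 0.75 words per token, the ratio implied by the article's
# figures. Real tokenizers vary by language and by content.
CONTEXT_WINDOW_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens, words_per_token=WORDS_PER_TOKEN):
    """Estimate how many words fit in a given number of tokens."""
    return round(tokens * words_per_token)


def books_that_fit(tokens, words_per_book):
    """Estimate how many whole books of a given length fit in the window."""
    return tokens_to_words(tokens) // words_per_book


# Hypothetical book lengths, picked only for illustration.
LONG_BOOK_WORDS = 250_000
SHORT_BOOK_WORDS = 100_000

print(tokens_to_words(CONTEXT_WINDOW_TOKENS))
print(books_that_fit(CONTEXT_WINDOW_TOKENS, LONG_BOOK_WORDS))
print(books_that_fit(CONTEXT_WINDOW_TOKENS, SHORT_BOOK_WORDS))
```

Under these assumptions, the window holds about 750,000 words, which is 3 long books or 7 shorter ones.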

"Entirely new capabilities"

Until recently, AI models could only handle a few thousand tokens. This meant that users were limited in their interactions with these systems. It was a bit like having a conversation with a forgetful friend who would have to restart the chat from scratch every so often.

Gemini 1.5 Pro is being previewed to a few lucky early testers. When it rolls out fully, users will be able to dump in whole book series, codebases, entire legal case histories, or really anything they want. This Google model can ingest all this information quickly and then answer questions about the data.

"Longer context windows show us the promise of what is possible," Google CEO Sundar Pichai said when unveiling Gemini 1.5 in February. "They will enable entirely new capabilities."

A giant digital vacuum

What is Google going to do with the data people share through Gemini 1.5?

After trying so hard for so many years to scan millions of books itself, Google will now have users willingly dumping whole volumes into its AI model, along with mountains of other text, code, images, and video.

It's highly likely that this information will be used as training data to help Google build other models. The emergence of generative AI has sparked a global race for high-quality data, so a huge context window can work as a giant digital vacuum.

Google says data shared with Gemini "helps improve and develop Google products, services, and machine-learning technologies."

Machine learning is a type of AI. So it's safe to interpret this comment as a yes: Google will use this data to train future AI models.

Developers versus corporate customers

The internet giant treats information shared with its AI models and services differently, depending on the offering.

Google AI Studio is a new developer tool for Gemini. For this service, the company says content submitted "may be used to improve our services, including our machine-learning technologies."

Vertex AI is an enterprise platform for larger corporate customers. Google told BI that in this case the company "doesn't use customer data to train Google models without that customer's permission."

Gemini 1.5 Pro, the fanciest Google AI model with the largest context window, is not fully available yet, so the terms of service aren't out. A Google spokesperson declined to comment on which data-use approach will apply to this top model. "We'll prioritize transparency, choice, and control," they added.

A brave new AI world

Either way, this is a brave new AI world of information sharing. It's probably why some big companies have recently renewed their warnings prohibiting employees from sharing sensitive data with AI models.

Google also warns users about sharing certain data with its models.

"Do not submit sensitive, confidential, or personal information to the Services," the company says in bold writing in one of its current Gemini terms of service.

Prompt data controls

Here are some other important tips for controlling how Google uses any prompts you submit to its AI models. These are from a company spokesperson and Google's terms of service.

You can turn Gemini Apps Activity off in your Gemini Apps Activity settings. This prevents your future conversations from being used to improve Google's generative AI models.
If this setting is off, your conversations will still be saved for up to 72 hours to help Google provide the Gemini AI service and process any feedback you want to share with the company.
In those 72 hours, unless you give feedback, your conversations also won't be used to improve Google products, including its AI models.
If you're 18 or older, Google stores your Gemini Apps activity in your Google Account for up to 18 months by default. You can change that to 3 or 36 months in your Gemini Apps Activity settings.
You can also review or delete your activity in that same dashboard at any time.

Read the original article on Business Insider