Are we all ready to do a little Googling?
Welcome to Google I/O, it’s great to have all of you with us.
We’ll begin launching this fully revamped experience, AI Overviews, to everyone in the US this week, and we’ll bring it to more countries soon. With Gemini, we’re making that a whole lot easier.
Ask Photos
Say you’re at a parking station, ready to pay. Now you can simply ask Photos. It knows the cars that appear often, triangulates which one is yours, and just tells you the license plate number.
You can even follow up with something more complex: “Show me how Luci’s swimming has progressed.” Here, Gemini goes beyond a simple search, recognizing different contexts, from doing laps in the pool to snorkeling in the ocean. We are rolling out Ask Photos this summer, with more capabilities to come.
Gemini 1.5 Pro
Multimodality radically expands the questions we can ask and the answers we will get back. Long context takes this a step further, enabling us to bring in even more information: hundreds of pages of text, hours of audio, a full hour of video or entire code repositories.
For that, you need a 1-million-token context window, now possible with Gemini 1.5 Pro. I’m excited to announce that we are bringing this improved version of Gemini 1.5 Pro to all developers globally. Gemini 1.5 Pro with the 1-million-token context window is now directly available to consumers in Gemini Advanced and can be used across 35 languages.
So today, we are expanding the context window to 2 million tokens. This represents the next step on our journey towards the ultimate goal of infinite context. And say you couldn’t make the PTA meeting: the recording of the meeting is an hour long, but if it’s from Google Meet, you can ask Gemini to give you the highlights. There’s a parents’ group looking for volunteers, and you’re free that day; of course, Gemini can draft a reply. Gemini 1.5 Pro is available today in Workspace Labs.
NotebookLM
Presenter:
NotebookLM is going to take all the materials on the left as input and turn them into a lively science discussion personalized for him.
Software voice:
So let’s, uh, let’s dive into physics. What’s on deck for today? Well, uh, we’re starting with the basics: force and motion. Okay, and that of course means we have to talk about Sir Isaac Newton and his three laws of motion.
Presenter:
And what’s amazing is that my son and I can join the conversation and steer it in whichever direction we want when I tap Join.
Software voice:
Hold on, we have a question. What’s up Josh?
Presenter:
Yeah, can you give my son Jimmy a basketball example?
Software voice:
Hey Jimmy, that’s a fantastic idea! Basketball is actually a great way to visualize force and motion. Let’s break it down, okay? So first, imagine a basketball just sitting there on the court. It’s not moving, right? That’s because all the forces acting on it are balanced. The downward pull of gravity…
Presenter:
It connected the dots and created that age-appropriate example for him.
Google CEO:
Making AI helpful for everyone!
DeepMind founder:
Last year, we reached a milestone on that path when we formed Google DeepMind. So today, we’re introducing Gemini 1.5 Flash. Flash is a lighter-weight model compared to Pro. Starting today, you can use 1.5 Flash and 1.5 Pro with up to 1 million tokens in Google AI Studio and Vertex AI. Today, we have some exciting new progress to share about the future of AI assistants, which we’re calling Project Astra.
Demonstration: filming with a smartphone in hand and asking questions
“Tell me when you see something that makes sound.”
“I see a speaker, which makes sound.”
“What is that part of the speaker called?”
“That is the tweeter. It produces high-frequency sounds.”
“What does that part of the code do?”
“This code defines encryption and decryption functions. It seems to use AES-CBC encryption to encode and decode data based on a key and an initialization vector (IV).”
“What can I add here to make this system faster?”
“Adding a cache between the server and database could improve speed.”
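The caching suggestion at the end of the demo is a standard pattern. As a minimal sketch (not Astra’s implementation), here is a read-through cache in Python, where `slow_db_query` is a hypothetical stand-in for the database call:

```python
import functools
import time

def slow_db_query(user_id: int) -> dict:
    """Hypothetical stand-in for an expensive database query."""
    time.sleep(0.05)  # simulate query + network latency
    return {"user_id": user_id, "name": f"user-{user_id}"}

# Read-through cache: the first request for a key pays the database
# cost; repeat requests for the same key are served from memory.
@functools.lru_cache(maxsize=1024)
def cached_query(user_id: int) -> dict:
    return slow_db_query(user_id)
```

Repeated calls with the same `user_id` skip `slow_db_query` entirely; a real system would also need an invalidation policy so stale entries don’t outlive database updates.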
DeepMind founder:
Today, we’re introducing a series of updates across our generative media tools with new models covering image, music and video.
Unknown presenter:
Today, I’m so excited to introduce Imagen 3. Imagen 3 is more photorealistic. You can literally count the whiskers on its snout with richer details like this incredible sunlight in the shot, and fewer visual artifacts or distorted images. You can sign up today to try Imagen 3 in ImageFX, part of our suite of AI tools at labs.google.com. Together with YouTube, we’ve been building Music AI Sandbox, a suite of professional music AI tools that can create new instrumental sections from scratch, transfer styles between tracks, and more.
DeepMind founder:
Today, I’m excited to announce our newest, most capable generative video model called Veo. Veo creates high-quality 1080p videos from text, image, and video prompts. It can capture the details of your instructions in different visual and cinematic styles. You can prompt for things like aerial shots of a landscape or time-lapse, and further edit your videos using additional prompts. You can use Veo in our new experimental tool called VideoFX. We’re exploring features like storyboarding and generating longer scenes. Not only is it important to understand where an object or subject should be in space, it needs to maintain this consistency over time, just like the car in this video. Over the coming weeks, some of these features will be available to select creators through VideoFX at labs.google.com, and the waitlist is open now.
Google CEO:
Today we are excited to announce the sixth generation of TPUs, called Trillium. Trillium delivers a 4.7x improvement in compute performance per chip over the previous generation. We will make Trillium available to our Cloud customers in late 2024.
Unknown presenter:
We’re making AI overviews even more helpful for your most complex questions. To make this possible, we’re introducing multi-step reasoning in Google search. Soon, you’ll be able to ask search to find the best yoga or Pilates studios in Boston and show you details on their intro offers and the walking time from Beacon Hill. You get some studios with great ratings and their introductory offers, and you can see the distance for each. Like this one, it’s just a 10-minute walk away. Right below, you see where they’re located, laid out visually. It breaks your bigger question down into all its parts and figures out which problems it needs to solve and in what order.
Next, take planning, for example. Now you can ask search to create a 3-day meal plan for a group that’s easy to prepare. Here, you get a plan with a wide range of recipes from across the web. If you want to get more veggies in, you can simply ask search to swap in a vegetarian dish. You can export your meal plan or get the ingredients as a list just by tapping here. Soon, you’ll be able to ask questions with video right in Google search.
Demonstration:
I’m going to take a video and ask Google, “Why will this not stay in place?” And almost instantly, Google gives me an AI Overview, with some reasons this might be happening and steps I can take to troubleshoot.
Back to the unknown presenter.
You’ll start to see these features rolling out in search in the coming weeks.
Indian presenter:
And now we’re really excited that the new Gemini-powered side panel will be generally available next month. Three new capabilities are coming to Gmail mobile. It looks like there’s an email thread on this with lots of emails that I haven’t read. And luckily for me, I can simply tap the summarize option up top and skip reading this long back-and-forth. Now, Gemini pulls up this helpful mobile card as an overlay, and this is where I can read a nice summary of all the salient information that I need to know. Now I can simply type out my question right here in the mobile card and say something like, “Compare my roof repair bids by price and availability.” This new Q&A feature makes it so easy to get quick answers on anything in my inbox without having to first search Gmail, then open the email, and then look for the specific information, attachments, and so on.
I see some suggested replies from Gemini now. Here, one of them suggests declining the service, something I’ve done a few times before. These new capabilities in Gemini in Gmail will start rolling out this month to Labs users.
This one’s got a PDF attachment from a hotel, a receipt, and I see a suggestion in the side panel to help me organize and track my receipts.
Step one: create a Drive folder and put this receipt, along with 37 others it’s found, into that folder.
Step two: extract the relevant information from those receipts in that folder into a new spreadsheet. Gemini offers you the option to automate this, so that this particular workflow is run on all future emails. Gemini does the hard work of extracting all the right information from all the files in that folder and generates this sheet for you.
Show me where the money is spent. Gemini not only analyzes the data from the sheet but also creates a nice visual to help me see the complete breakdown by category. This particular ability will be rolling out to Labs users this September.
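The two steps above, collecting receipts into a folder and extracting fields into a sheet, can be sketched in plain Python. This is an illustrative sketch only: the real feature works over Gmail, Drive, and Sheets, whereas the snippet below assumes hypothetical text-file receipts with `Vendor:` and `Total:` lines and emits a CSV:

```python
import csv
import re
from pathlib import Path

# Illustrative patterns: assumes each receipt file contains lines like
# "Vendor: Hotel Lux" and "Total: $120.50". Real receipts would need a
# model or a proper parser, not a regex.
VENDOR_RE = re.compile(r"Vendor:\s*(.+)")
TOTAL_RE = re.compile(r"Total:\s*\$([0-9]+\.[0-9]{2})")

def extract_receipt(path: Path) -> dict:
    """Pull the vendor name and total out of one receipt file."""
    text = path.read_text()
    vendor = VENDOR_RE.search(text)
    total = TOTAL_RE.search(text)
    return {
        "file": path.name,
        "vendor": vendor.group(1).strip() if vendor else "",
        "total": float(total.group(1)) if total else 0.0,
    }

def build_sheet(folder: Path, out_csv: Path) -> list[dict]:
    """Extract fields from every receipt in the folder and write a sheet (CSV)."""
    rows = [extract_receipt(p) for p in sorted(folder.glob("*.txt"))]
    with out_csv.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "vendor", "total"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Given a folder of such files, `build_sheet(folder, folder / "receipts.csv")` returns the extracted rows and writes the spreadsheet; a spend breakdown is then just a matter of summing the `total` column.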
We’re prototyping a virtual Gemini powered teammate.
Presenter (man in black):
Chip’s been given a specific job role with a set of descriptions on how to be helpful for the team. You can see that here, and some of the tasks are to monitor and track projects, organize information, provide context, and more. Are we on track for launch? Chip gets to work, not only searching through everything it has access to but also synthesizing what’s found and coming back with an up-to-date response.
There it is, a clear timeline, a nice summary. And notice even in this first message here, Chip flags a potential issue the team should be aware of. Because we’re in a group space, everyone can follow along. Anyone can jump in at any time, as you see someone just did, asking Chip to help create a doc to address the issue.
Presenter (woman in a light suit):
And this summer, you can have an in-depth conversation with Gemini using your voice. We’re calling this new experience Live. When you go live, you’ll be able to open your camera so Gemini can see what you see and respond to your surroundings in real-time.
So, we’re rolling out a new feature that lets you customize Gemini for your own needs and create personal experts on any topic you want. We’re calling these “Gems.” Just tap to create a Gem, write your instructions once, and come back whenever you need it. For example, here’s a Gem that I created that acts as a personal writing coach. It specializes in short stories with mysterious twists and even builds on the story drafts in my Google Drive. Gems will roll out in the coming months.
That reasoning and intelligence all come together in the new trip planning experience in Gemini Advanced. We’re going to Miami, my son loves art, my husband loves seafood, and our flight and hotel details are already in my Gmail inbox. To make sense of these variables, Gemini starts by gathering all kinds of information from search and helpful extensions like maps and Gmail. The end result is a personalized vacation plan presented in Gemini’s new Dynamic UI.
I like these recommendations, but my family likes to sleep in, so I tap to change the start time, and just like that, Gemini adjusted my itinerary for the rest of the trip. This new trip planning experience will be rolling out to Gemini Advanced this summer.
You can upload your entire thesis, your sources, your notes, your research, and soon interview audio recordings and videos too. It can dissect your main points, identify improvements, and even role-play as your professor.
Maybe you have a side hustle selling handcrafted products. Simply upload all of your spreadsheets and ask Gemini to visualize your earnings. Gemini goes to work calculating your returns and pulling its analysis together into a single chart. And of course, your files are not used to train our models. Later this year, we’ll be doubling the long context window to two million tokens.
Presenter (man in sportswear, standing):
We’re putting AI-powered search right at your fingertips. Let’s say my son needs help with a tricky physics word problem like this one. If he’s stumped on this question, instead of putting me on the spot, he can circle the exact part he’s stuck on and get step-by-step instructions right where he’s already doing the work. This new capability is available today.
Another presenter:
Now we’re making Gemini context aware. So, my friend Pete is asking if I want to play pickleball this weekend. So, I’m going to reply and try to be funny. I’ll say, “Uh, is that like tennis but with pickles?” And I’ll say, “Uh, create image of tennis with pickles.”
Now, one new thing you’ll notice is that the Gemini window now hovers in place above the app, so I stay in the flow. Okay, so that generated some pretty good images. What’s nice is I can then drag and drop any of these directly into the Messages app below. So cool, let me send that. And because it’s context-aware, Gemini knows I’m looking at a video, so it proactively shows me an “ask this video” chip. “What is the two-bounce rule?” By the way, this uses signals like YouTube’s captions, which means you can use it on billions of videos. So give it a moment, and there’s the answer. This is rolling out starting with Pixel.
Later this year, we’ll be expanding what’s possible with our latest model, Gemini Nano, with multimodality. So, several years ago, we developed TalkBack, an accessibility feature that helps people navigate their phone through touch and spoken feedback. And now we’re taking that to the next level with the multimodal capabilities of Gemini Nano. So when someone sends Cara a photo, she’ll get a richer and clearer description of what’s happening. And the model even works when there’s no network connection. These improvements to TalkBack are coming later this year.
Another presenter (2):
1.5 Pro is $7 per 1 million tokens, and I’m excited to share that for prompts up to 128k, it’ll be 50% less, at $3.50. 1.5 Flash will start at 35 cents per 1 million tokens. Today’s newest member of the family, PaliGemma, our first vision-language open model, is available right now. I’m also excited to announce that we have Gemma 2 coming. It’s the next generation of Gemma, and it will be available in June.
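Taking the quoted prices at face value, per-prompt cost is straightforward arithmetic. Here is a small sketch; the exact tier boundary and the simple linear pricing are assumptions read off the stage figures, not the official billing rules:

```python
# Prices per 1 million tokens, as quoted on stage. The 128k tier
# boundary and linear pricing below are assumptions for illustration.
PRO_RATE = 7.00             # Gemini 1.5 Pro
PRO_RATE_UNDER_128K = 3.50  # 50% less for prompts up to 128k tokens
FLASH_RATE = 0.35           # Gemini 1.5 Flash, starting price

def pro_cost(prompt_tokens: int) -> float:
    """Dollar cost of a 1.5 Pro prompt, applying the sub-128k discount."""
    rate = PRO_RATE_UNDER_128K if prompt_tokens <= 128_000 else PRO_RATE
    return prompt_tokens / 1_000_000 * rate

def flash_cost(prompt_tokens: int) -> float:
    """Dollar cost of a 1.5 Flash prompt at the starting rate."""
    return prompt_tokens / 1_000_000 * FLASH_RATE
```

For example, a 100k-token prompt on 1.5 Pro lands in the discounted tier and costs $0.35, the same as a full 1-million-token prompt on Flash.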
Presenter (man in black, 2):
Today, we’re expanding SynthID to two new modalities: text and video. And in the coming months, we’ll be open-sourcing SynthID text watermarking. I’m excited to introduce LearnLM, our new family of models based on Gemini and fine-tuned for learning. We’re developing some pre-made Gems which will be available in the Gemini app and web experience, including one called Learning Coach.
Google CEO:
I have a feeling that someone out there might be counting how many times we have mentioned AI today. We went ahead and counted so that you don’t have to. That might be a record for how many times someone has said AI. Here’s to the possibilities ahead and to creating them together. Thank you.