1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more. This is because it’s been trained by 1.5 Pro through a process called “distillation,” where the most essential knowledge and skills from a larger model are transferred to a smaller, more efficient model.
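To make the idea of distillation concrete, here is a minimal sketch of one classic formulation: the student is trained to match the teacher’s temperature-softened output distribution. This is an illustration only. The post does not say which distillation recipe was used for 1.5 Flash, and the function names, temperature value and example logits below are all assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: a generic soft-target loss, not the recipe used for 1.5 Flash.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2

# Made-up logits for a three-class toy problem.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(loss_close < loss_far)
```

A student whose predictions track the teacher’s gets a lower loss, so minimizing this quantity pushes the smaller model toward the larger model’s behavior.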
Read more about 1.5 Flash in our updated Gemini 1.5 technical report, on the Gemini technology page, and learn about 1.5 Flash’s availability and pricing.
Significantly improving 1.5 Pro
Over the last few months, we’ve significantly improved 1.5 Pro, our best model for general performance across a wide range of tasks.
Beyond extending its context window to 2 million tokens, we’ve enhanced its code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding through data and algorithmic advances. We see strong improvements on public and internal benchmarks for each of these tasks.
1.5 Pro can now follow increasingly complex and nuanced instructions, including ones that specify product-level behavior involving role, format and style. We’ve improved control over the model’s responses for specific use cases, like crafting the persona and response style of a chat agent or automating workflows through multiple function calls. And we’ve enabled users to steer model behavior by setting system instructions.
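As a rough illustration of what automating a workflow through multiple function calls can look like on the application side, the sketch below runs a list of structured call requests against a registry of local functions. It does not use the Gemini API. The request format, the tool names (`get_weather`, `book_table`) and the helper `run_function_calls` are all hypothetical stand-ins.

```python
import json

# Hypothetical tools the application exposes; names are illustrative only.
def get_weather(city):
    return {"city": city, "forecast": "sunny"}

def book_table(restaurant, guests):
    return {"restaurant": restaurant, "guests": guests, "confirmed": True}

TOOLS = {"get_weather": get_weather, "book_table": book_table}

def run_function_calls(calls):
    """Execute each requested call in order and collect the results."""
    results = []
    for call in calls:
        func = TOOLS.get(call["name"])
        if func is None:
            # Report unknown functions instead of raising, so the workflow continues.
            results.append({"name": call["name"], "error": "unknown function"})
            continue
        results.append({"name": call["name"], "result": func(**call["args"])})
    return results

# Stand-in for the structured calls a model might request (assumed format).
requested = json.loads(
    '[{"name": "get_weather", "args": {"city": "Paris"}},'
    ' {"name": "book_table", "args": {"restaurant": "Chez Nous", "guests": 2}}]'
)
outputs = run_function_calls(requested)
print(json.dumps(outputs, indent=2))
```

In a real integration the results would typically be sent back to the model so it can compose its final response; that round trip is omitted here.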
We added audio understanding in the Gemini API and Google AI Studio, so 1.5 Pro can now reason across image and audio for videos uploaded in Google AI Studio. And we’re now integrating 1.5 Pro into Google products, including Gemini Advanced and Workspace apps.
Read more about 1.5 Pro in our updated Gemini 1.5 technical report and on the Gemini technology page.
Gemini Nano understands multimodal inputs
Gemini Nano is expanding beyond text-only inputs to include images as well. Starting with Pixel, applications using Gemini Nano with Multimodality will be able to understand the world the way people do, not just through text, but also through sight, sound and spoken language.
Read more about Gemini 1.0 Nano on Android.