What Is Gemini 3.5 Flash?

Google unveiled Gemini 3.5 Flash at Google I/O 2026 on May 19, 2026. It is the company's strongest model yet for coding and autonomous AI agents — and unlike many flagship AI announcements that take weeks to reach users, this one went live immediately for billions of people worldwide.

To understand the significance, you need to know where Flash models sit in Google's hierarchy. Flash has always been the faster, more affordable sibling to Pro. It's the workhorse — the model you throw at high-volume tasks where latency matters. What makes 3.5 Flash different is that it has crossed a threshold: it now outperforms older Pro-tier models on multiple benchmarks, including coding. The gap between Flash and Pro has shrunk dramatically, which is a genuinely big deal.

💡
Entity Context

Gemini 3.5 Flash is a large language model (LLM) developed by Google DeepMind, part of the Gemini model family. It is designed for agentic and coding tasks requiring speed and efficiency, operating on Google's proprietary AI infrastructure.

Koray Kavukcuoglu, DeepMind's chief technologist, put it plainly ahead of the launch: "3.5 Flash offers an incredible combination of quality and low latency. It outperforms our latest frontier model, 3.1 Pro, on nearly all the benchmarks" — covering coding, agentic tasks, and multimodal reasoning.

That's not marketing fluff. The benchmarks back it up, and we'll get into the numbers shortly.

Why Gemini 3.5 Flash Is Now the Default in AI Mode

Google's AI Mode in Search is the conversational layer built on top of Google's traditional search index. Instead of returning ten blue links, AI Mode generates synthesized, context-aware responses — drawing from live web data and structured knowledge to answer complex, multi-step questions.

Until this week, that experience ran on Gemini 3 Flash (launched in December 2025). Now it runs on 3.5 Flash. The reason for the switch is straightforward: 3.5 Flash is materially better in every way that matters for a real-time search experience — faster responses, richer multimodal understanding, and dramatically stronger reasoning on technical and scientific questions.

Previous Default
Gemini 3 Flash
  • Good at general Q&A
  • Strong multimodal base
  • 11% on Humanity's Last Exam
  • Limited agentic capabilities
  • Standard coding support
New Default (May 2026)
Gemini 3.5 Flash ✦
  • Frontier-level reasoning at speed
  • 4× faster than comparable models
  • 90.4% on GPQA Diamond
  • Full agentic, long-horizon tasks
  • 78% SWE-bench Verified score

The practical effect: if you've been using AI Mode in Google Search, your answers just got significantly sharper — particularly for questions involving science, mathematics, coding, and multi-step reasoning. Most users won't need to do anything; the upgrade happened automatically.

Speed and Coding Capabilities Explained

The Speed Story

Google claims Gemini 3.5 Flash is four times faster than comparable frontier models. That's not a claim they make about raw token generation speed — it's about useful work done per unit of time. For developers running agentic pipelines, this is the number that actually matters.

In practical terms, tasks that previously took developers hours or auditors weeks can now be completed in a fraction of the time, according to Google. Partner companies already testing the model in enterprise settings — including banks and fintech firms — are reportedly automating multi-week workflows. These aren't marketing claims from early adopters; they're observations from teams that have been using the model in production environments before the public launch.

Speed Context

Gemini 3.5 Flash uses 30% fewer tokens on average than Gemini 2.5 Pro for thinking tasks, meaning you get better answers while spending less on API costs — a rare combination in large language models.

The Coding Story

This is where 3.5 Flash genuinely surprised the industry. On SWE-bench Verified — the standard benchmark for evaluating coding agent capabilities — it scored 78%. That score outperforms not just previous Flash models but also Gemini 3.1 Pro, the former top of the Gemini lineup.

The model was co-developed with Antigravity, Google's new agent-first development environment (previously known as Project IDX). That integration shows: 3.5 Flash can independently execute coding pipelines, manage multi-file codebases, build and test components iteratively, and — according to one of Google's internal demos — assemble an operating system from scratch using a team of coordinated sub-agents.

That last demo is worth dwelling on. A group of software agents, coordinated by 3.5 Flash, split up the OS components, built them in parallel, then stitched everything together. Whether that demo represents something you'd deploy in production is a separate question. But the architecture it illustrates — Flash as the executor, Pro as the planner — is the actual model Google is building toward.

🧠
PhD-Level Reasoning

Scored 90.4% on GPQA Diamond, a benchmark designed to test doctoral-level scientific reasoning across biology, chemistry, and physics.

🖥️
Advanced Coding

78% on SWE-bench Verified. Can execute entire coding pipelines autonomously — from understanding requirements to writing, testing, and iterating on code.

🎯
Multimodal Input

Processes text, images, audio, and video natively with near real-time response performance, scoring 81.2% on MMMU-Pro for multimodal understanding.

🔧
Tool Use at Scale

Scored 83.6% on MCP Atlas scaled tool use benchmark — the measure of how well an agent coordinates multiple tools and APIs simultaneously.

Benchmark Comparison: Gemini 3.5 Flash vs Older Models

Numbers without context aren't useful. Here's what each benchmark actually measures, alongside the scores:

Benchmark What It Tests Gemini 2.5 Flash Gemini 3.1 Pro Gemini 3.5 Flash NEW
GPQA Diamond PhD-level science reasoning ~85% 90.4%
Humanity's Last Exam Expert-level multi-domain tasks 11% ~30% 33.7%
MMMU-Pro Multimodal understanding ~80% 81.2%
SWE-bench Verified Coding agent capabilities ~74% 78%
Terminal-Bench 2.1 Agentic terminal task execution 76.2%
MCP Atlas Tool Use Multi-tool coordination at scale 83.6%
⚠️
Important Note

Benchmark scores are useful for comparison but don't capture everything. Real-world performance on your specific task may differ. Use benchmarks as a starting point, not a final verdict.

Agentic Features: What Actually Changed for Users

Google has been saying "agentic AI" for two years. Gemini 3.5 Flash is the first moment where that phrase maps onto something most people will actually experience in their day-to-day searches.

AI Mode in Search now handles genuinely long-horizon requests. You can upload a PDF, a screenshot, or a video directly into Search and ask follow-up questions across multiple turns. Search doesn't reset between queries — it holds context, refines understanding, and generates increasingly specific answers as the conversation develops.

For developers specifically, the model can now autonomously iterate on code, manage entire project workflows, and coordinate sub-agents that handle separate components of a larger task. The comparison Google draws is instructive: tasks that previously required a developer's sustained attention for a full day can now be handed to the model with minimal input and reviewed when complete.

"3.5 Flash delivers frontier-level intelligence at exceptional speed — proving you no longer have to trade quality for latency."

— Google DeepMind, May 2026

The safety story is also part of this. Google says 3.5 Flash is less likely to produce harmful content while simultaneously being less likely to refuse safe queries — a balance that prior models handled poorly. The approach uses interpretability tools that check the model's internal reasoning before it generates a response, rather than applying blunt post-generation filters.

Gemini Spark: Google's New 24/7 Agent, Powered by 3.5 Flash

One of the bigger announcements alongside 3.5 Flash is Gemini Spark — a personal AI agent that runs continuously in the background, handling tasks across Gmail, Google Docs, Calendar, and more than 30 third-party integrations.

Spark runs entirely on Gemini 3.5 Flash. It's designed to operate under your direction — not autonomously — taking actions like drafting emails, scheduling meetings, surfacing relevant documents, and tracking information you asked it to monitor. Think of it as a digital assistant that doesn't log off.

Gemini Spark is rolling out to trusted testers immediately. A wider beta for Google AI Ultra subscribers in the US follows next week. Google has not given a timeline for broader availability.

📱
Where to Access Gemini 3.5 Flash

Gemini app (Android & iOS) · AI Mode in Google Search · Google Antigravity IDE · Gemini API in Google AI Studio · Android Studio · Gemini Enterprise Agent Platform. No account upgrade needed for the Gemini app and Search.

Who Benefits Most from This Update

The honest answer is: most people who use Gemini or Google Search with AI Mode. But the benefit isn't equal across user types.

Developers and engineers get the most obvious upgrade. If you're building with the Gemini API or inside Antigravity, 3.5 Flash's coding performance means faster, more accurate code generation, better debugging assistance, and the ability to hand off longer workflows to the model without babysitting every step.

Students and researchers will notice the improvement on complex questions. GPQA Diamond at 90.4% means the model can reason through graduate-level science problems with meaningful accuracy. That's not the same as being infallible on research — don't use it to generate citations without verification — but it's a significant step up for understanding and exploring difficult material.

Everyday users get a subtler improvement. Searches involving follow-up questions, image interpretation, document analysis, or anything requiring reasoning across multiple pieces of information will feel noticeably more coherent. The model holds context better and produces more specific answers.

Enterprises using Vertex AI or Gemini Enterprise get the most structured access, with the ability to deploy Flash inside their existing infrastructure for high-volume workflows — at a pricing point Google describes as "often less than half the cost" of comparable frontier models.

Frequently Asked Questions

Q What is Gemini 3.5 Flash?
Q Is Gemini 3.5 Flash the default model in AI Mode in Google Search?
Q How does Gemini 3.5 Flash perform on coding benchmarks?
Q Is Gemini 3.5 Flash free to use?
Q What is Gemini Spark and how does it use Gemini 3.5 Flash?
Q When is Gemini 3.5 Pro coming?
The Bottom Line

Gemini 3.5 Flash is a meaningful upgrade, not a rebrand. The combination of frontier-level benchmark scores, genuine coding capability, and four-times-faster inference puts it in a different category than prior Flash models. For the average person using Google Search, the improvement is mostly invisible in the best way — answers get better without requiring any effort. For developers, the upgrade is substantial enough to revisit workflows you may have ruled out as too slow or too expensive six months ago. And for the broader AI field, it signals that Google's bet on efficiency over raw parameter count is paying off in ways that actually matter.