Google launches Gemini 2.0, focusing on AI agents and multimodal capabilities

Google DeepMind today announced the next version of its AI model. Gemini 2.0 Flash Experimental is available now in the web chat app and to developers and select testers through the Gemini API in Google AI Studio and Vertex AI, with a broader release planned for early 2025.

The new version significantly improves multimodal capabilities: it processes text, images, video, and audio, and natively generates images and multilingual speech. Google plans to integrate Gemini 2.0 into its AI Overviews, which have a reputation for mixed accuracy, so they can handle more complex topics and multi-step questions, including advanced math equations, multimodal queries, and coding challenges.

According to Google, Gemini 2.0 Flash runs twice as fast as Gemini 1.5 Pro. While it nearly matches Anthropic's updated Claude 3.5 Sonnet (informally called Sonnet "3.6") in benchmarks, it may be much cheaper, judging by Google's pricing for Gemini 1.5 Flash. Keep in mind that benchmark performance often differs from real-world performance.

Figure: Benchmark results for Gemini versions across categories such as coding, mathematics, and reasoning.

Google is rolling out a chat-optimized version of Gemini 2.0 Flash Experimental to all Gemini users through desktop and mobile web browsers. The company plans to add mobile app integration in the near future.

For developers, Google plans to integrate Gemini 2.0 into various platforms including Android Studio, Chrome DevTools, and Firebase. The enhanced coding support, called Gemini Code Assist, will be available in popular integrated development environments such as Visual Studio Code, IntelliJ, and PyCharm.

Three specialized AI agents

Along with Gemini 2.0, Google has introduced two new research prototypes, Project Mariner and Jules, that showcase the model's agentic capabilities. It is also updating the previously announced Project Astra.

Project Mariner is an experimental Chrome extension designed for web-based tasks. According to Google, the prototype achieved an 83.5 percent success rate on the WebVoyager benchmark, which tests agents on real-world web tasks. For security, the agent can only operate within the active browser tab and requires explicit user confirmation for sensitive actions such as purchases.
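
The two safeguards described above (acting only in the active tab, and asking the user before sensitive actions) can be illustrated with a minimal Python sketch. This is not Google's implementation; the names `Action`, `SENSITIVE_KINDS`, `execute`, and the `confirm` callback are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical set of action kinds that need explicit user approval.
SENSITIVE_KINDS = {"purchase"}


@dataclass
class Action:
    kind: str    # e.g. "click", "type", "purchase"
    tab_id: int  # the browser tab the agent wants to act in


def execute(action: Action, active_tab_id: int,
            confirm: Callable[[Action], bool]) -> str:
    """Apply both guards before letting an agent action through."""
    # Guard 1: the agent may only act inside the currently active tab.
    if action.tab_id != active_tab_id:
        return "rejected: not the active tab"
    # Guard 2: sensitive actions require an explicit yes from the user.
    if action.kind in SENSITIVE_KINDS and not confirm(action):
        return "cancelled: user declined"
    return "executed"
```

In this sketch, `confirm` stands in for whatever prompt the extension would show the user; a real agent would perform the browser action where the function returns "executed".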

The second agent, Jules, focuses on supporting developers through GitHub workflow integration. This agent can work asynchronously, develop multi-stage troubleshooting plans, and prepare pull requests. Currently, Jules is available only to a select group of testers.

Project Astra, which Google had previously announced, will take advantage of Flash's speed and multimodal capabilities. This universal AI assistant can maintain multilingual conversations with up to ten minutes of context memory. The system integrates with Google Search, Lens, and Maps to provide comprehensive assistance.

Google is also upgrading its existing data science agent for Google Colab to use Gemini 2.0. The agent can automatically generate analyses based on user descriptions. In a recent project at the Lawrence Berkeley National Laboratory, Google claims the system cut analysis time from a week to minutes. Developers interested in testing the agent can submit requests for access.

Gaming and robotics experiments

Additionally, Google DeepMind is testing Gemini 2.0 in video games, where agents provide real-time strategic advice to players by analyzing screen content. The speed of the Flash model makes these real-time applications possible. The company also plans to test the model's enhanced spatial reasoning capabilities in robotics applications.

Google launches "Deep Research" for Gemini Advanced

Google has also introduced Deep Research for Gemini Advanced subscribers. This new agent-based feature automates complex searches and quickly generates comprehensive reports.

The company says the system is designed to mimic human research methods: searching, analyzing information, and initiating new queries based on findings. Results appear in structured reports with sources that can be exported to Google Docs. The feature combines Google's search technology with Gemini's analysis capabilities and uses a large context window of 1 million tokens.
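
The search, analyze, and follow-up cycle described above can be sketched as a simple loop in Python. This is only an illustration of the general pattern, not how Deep Research works internally; `research`, `toy_search`, `toy_follow_ups`, and the toy index are hypothetical stand-ins for a real search backend and a model that proposes new queries.

```python
from collections import deque
from typing import Callable, Dict, List


def research(question: str,
             search: Callable[[str], List[str]],
             follow_ups: Callable[[str, List[str]], List[str]],
             max_queries: int = 5) -> List[Dict[str, object]]:
    """Run queries breadth-first, recording sources and queuing new queries."""
    pending = deque([question])
    seen = set()
    report = []
    while pending and len(seen) < max_queries:
        query = pending.popleft()
        if query in seen:
            continue
        seen.add(query)
        findings = search(query)  # "searching"
        report.append({"query": query, "sources": findings})
        # "analyzing" and "initiating new queries based on findings"
        for new_query in follow_ups(query, findings):
            if new_query not in seen:
                pending.append(new_query)
    return report


# Toy stand-ins so the sketch runs without any external service.
TOY_INDEX = {
    "gemini 2.0": ["doc-a", "doc-b"],
    "doc-a details": ["doc-c"],
}


def toy_search(query: str) -> List[str]:
    return TOY_INDEX.get(query, [])


def toy_follow_ups(query: str, findings: List[str]) -> List[str]:
    return [f"{source} details" for source in findings]
```

Calling `research("gemini 2.0", toy_search, toy_follow_ups, max_queries=3)` returns a list of entries, each pairing a query with the sources found for it, which loosely mirrors the structured, sourced report the article describes. The `max_queries` cap is a design choice of this sketch to keep the loop bounded.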
