After expanding AI Mode, Google has announced an update to the tool: it now lets users explore visually through conversational descriptions. In other words, users can describe what they have in mind, as if telling a friend, and receive images and naturally related products as results.
For example, if you are looking for inspiration to decorate a bedroom in a "maximalist style," AI Mode will display images aligned with that "visual feeling," and you can keep refining with follow-ups such as "more dark tones?", "more contrast?" or "ideas with striking prints?", all without applying manual filters.
You can also start a search from an image, either by uploading one or taking a photo, and then "converse" with AI Mode about what you see.
If you wish to purchase an item you have seen, you simply need to describe it. For instance: “barrel jeans that are not too loose,” and AI Mode will present visual, purchasable options. If a particular garment appeals to you, you can proceed directly to the seller’s website.
Behind this, Google maintains a Shopping Graph with more than 50 billion product listings, letting you browse products from stores around the globe, from major retailers to local shops, each with details such as reviews, latest offers, color choices, and availability. "You will only see the most recent shopping results, as more than 2 billion of these product listings on Google are updated every hour," the company explains.
To power the new functionality, Google combines its expertise in visual search (with Lens and Image Search) with the multimodal capabilities of the Gemini 2.5 model, which can interpret both language and images.
Most notably, it uses a technique called "visual search fan-out": rather than recognizing only the primary subject of an image, it generates multiple queries about secondary details, context, additional objects, and visual nuances to produce more comprehensive answers.
This means that not only the main object matters, but also its environment and visual relationships.
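The fan-out idea can be sketched in a few lines. The sketch below is purely illustrative and makes no claim about Google's actual pipeline: the function name, the inputs (a main subject, its attributes, and secondary objects, as a vision model might extract them), and the query templates are all assumptions.

```python
from itertools import product

def fan_out_queries(main_subject, attributes, context_objects):
    """Expand one image description into many sub-queries.

    Hypothetical sketch: inputs stand in for what a vision model
    might extract from an image (primary subject, visual attributes,
    and secondary objects detected around it).
    """
    # Start with the main subject itself ("bedroom").
    queries = [main_subject]
    # One query per attribute of the main subject ("maximalist bedroom").
    queries += [f"{attr} {main_subject}" for attr in attributes]
    # One query per secondary object, tied back to the subject.
    queries += [f"{obj} for {main_subject}" for obj in context_objects]
    # Attribute/object pairs capture visual relationships.
    queries += [f"{attr} {obj}" for attr, obj in product(attributes, context_objects)]
    return queries

subqueries = fan_out_queries(
    "bedroom",
    attributes=["maximalist", "dark-toned"],
    context_objects=["patterned wallpaper", "velvet headboard"],
)
# Nine sub-queries from a single image description, each of which
# could be run against an index and the results merged.
```

The point of the expansion is that a result matching "maximalist patterned wallpaper" can surface even though the user never typed those words, because the secondary details of the image were turned into queries of their own.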
Additionally, the new mode is integrated multimodally: you may combine text with images, continue querying about what you see in an image, and refine your search progressively.
How Google AI Mode works
At this time, this conversational visual experience is being rolled out in English in the United States for users of AI Mode.
Photo: Google.