AI chatbots use different sources than Google search and often cite less-known websites


A detailed study from Ruhr University Bochum and the Max Planck Institute for Software Systems highlights how traditional search engines and generative AI systems differ in the way they select sources and present information.

The researchers compared Google's organic search results with four generative AI search systems: Google AI Overview, Gemini 2.5 Flash with search, GPT-4o-Search, and GPT-4o with the search tool enabled. Their analysis of more than 4,600 queries across six topics, including politics, product reviews, and science, shows just how differently these systems approach the web.

A key difference is when and how these systems choose to search online. GPT-4o-Search always performs a live web search for every query. In contrast, GPT-4o with the search tool enabled decides for each question whether to rely on its internal knowledge or look up new information.

Violin plots of the number of links per query for AIO, Gemini, GPT-Tool, GPT-Search, and Organic across six data sets, with mean values and distributions.
GPT-Tool pulls in the fewest links per answer, AI Overview usually draws from more sources, and organic Google search caps results at ten links. | Image: Kirsten et al.

What source selection means for search results

AI search systems surface information from a wider and less predictable set of sources compared to traditional search engines. In the study, 53 percent of the websites cited by AI Overview didn’t appear in Google’s top 10 organic results, and 27 percent weren’t even in the top 100. This means users could be seeing content from sites that are less vetted or less familiar.


Proportion of AIO links falling into the organic ranking ranges 1–10, 11–30, 31–50, and 51–100, or with no overlap, for six data sets.
For product and science queries, up to 60 percent of AI Overview links come from outside the top 100 organic results; for political topics, around 55 percent come from the top 10. | Image: Kirsten et al.

The domains chosen by AI systems are often less well-known. Only about a third of the domains used by AI Overview and GPT-Tool were among the 1,000 most-visited sites, compared to 38 percent for organic search. This shift expands the pool of information but may also introduce more obscure perspectives.

The depth of research also varies widely. GPT-Tool averages just 0.4 external sources per answer, relying heavily on its internal knowledge, while AI Overview and Gemini pull from more than eight sites per query. GPT-Search sits in the middle at about four sources per answer.

Pie charts of domain categories (e.g., corporate entity, news media, encyclopedia) for science queries on AIO, Gemini, GPT-Tool, GPT-Search, and Organic.
On science questions, GPT-Tool almost always references corporate sites, organic search leans toward news outlets, and AI Overview and Gemini tap into encyclopedias, NGOs, and government sources. | Image: Kirsten et al.

Average text length versus average number of links per response for AIO, Gemini, GPT-Tool, GPT-Search, and Organic across six data sets.
Longer answers tend to have more links: GPT-Tool gives the shortest responses with few sources, while Gemini typically produces more detailed answers with more references, especially for product searches. | Image: Kirsten et al.

How content diversity could shape user understanding

AI search systems and organic search cover similar ground on most topics, but the way they do it can affect what users learn. Using the LLooM framework, the researchers found that even the most limited AI system (GPT-Tool) still captured 71 percent of the topics covered across all search tools combined.

Heat map of 15 inequality concepts (columns) across five search strategies (rows); colors indicate concept coverage and show differences in topic diversity.
For a question like "What is an example of inequality?", AI Overview and organic search cover a broad set of concepts, while GPT-Tool and Gemini include fewer aspects. | Image: Kirsten et al.

But when a query could have multiple interpretations, organic search returns more diverse results. For ambiguous questions, organic search covered 60 percent of possible subtopics, compared to 51 percent for AI Overview and just 47 percent for GPT-Tool. In practice, users who rely solely on AI-generated responses might miss different angles or nuances.

Search engines still have the edge on breaking news

When it comes to current events, traditional search still outperforms most AI systems. In a test of 100 trending topics from September 2025, AI Overviews appeared for only 3 percent of queries. GPT-Search covered 72 percent of topics, followed by organic search at 67 percent and Gemini at 66 percent. GPT-Tool lagged at 51 percent.


Violin plots of the number of links per trending query for AIO, Gemini, GPT-Tool, GPT-Search, and Organic, with mean and median values.
On trending topics, organic search delivers up to eleven links, Gemini averages eight, GPT-Tool about five, GPT-Search six, and AI Overview also about six links per query. | Image: Kirsten et al.

A telling example: when asked about Ricky Hatton's cause of death, GPT-Tool relied on outdated internal knowledge instead of searching and incorrectly reported that the boxer was still alive. Systems that don't pull in fresh information struggle with up-to-the-minute accuracy, which can lead to misinformation.

Reliability and consistency in search are changing

The study found that traditional search is more consistent over time. When the same questions were asked two months apart, organic search returned the same sources 45 percent of the time. Gemini came in at 40 percent, but AI Overview matched its earlier results only 18 percent of the time. As a result, AI Overview users may get completely different supporting links depending on when they ask, a problem that is less pronounced with classic search.

Still, even when sources shift, the general coverage of topics remains stable. The content may look similar overall, but the underlying evidence and perspectives can change.

Why the rules for evaluating search need to change

The researchers argue that current benchmarks for search quality don’t reflect the complexity of modern AI-driven systems. They call for new evaluation methods that consider source diversity, topic breadth, and how information is summarized.

These shifts in how sources are chosen and how knowledge is presented can subtly reshape what users see, trust, and verify. As AI chatbots like ChatGPT become embedded in search and AI Overviews become more common, many people no longer actively choose how their information is gathered.

Meanwhile, language models still face major hurdles, like hallucinating facts. As search engines and AI tools merge, companies are rethinking their SEO strategies to stay visible in this new landscape.
