Advancements in generative AI are moving at a breakneck pace, and AI-powered search is one of the newest artificial intelligence trends. It feels like just yesterday that ChatGPT hit the scene in November 2022. More recently, we’ve seen search engines like Google and Bing integrate AI overview summaries into their search results pages. Microsoft has offered AI-powered search summaries for some time, while Google introduced AI Overviews for search at its 2024 I/O conference in May. “Were the AI overviews at least useful?” you may wonder.
The answer: not very, at least for now. Microsoft’s AI-powered chatbot got an alarming amount of election information wrong. Meanwhile, Google scaled back its AI Overviews feature after it told users to put glue on pizza to help the cheese stick, among many other flubs and falsehoods.
Tech companies seem insistent on bringing AI summaries to their search results, but there are worries, namely hallucinations, which lead an AI to confidently make something up when it doesn’t know the answer. What can companies do if these hallucinations are unavoidable? What can be done to make sure their AI chatbots offer the most accurate information possible? We spoke with four AI experts about the problem and what companies can do to fix it.
Abeba Birhane - Senior Advisor, AI Accountability, at Mozilla
“Google’s AI overviews hallucinate too much to be reliable. There is no clear evidence that users even want these AI overviews. On the contrary, many have expressed frustration with AI overviews being forced onto them when they would prefer a simple list of links.
Generative systems tend to encode societal and historical patterns, patterns that often mistreat, misrepresent and disadvantage individuals and groups at the margins of society. As a result, genders, races, identities and cultures that are outside the status quo are often disproportionately negatively impacted.
Unfortunately, it currently isn’t clear whether there’s a reliable solution to the AI hallucination problem. Overall, we can expect generative systems to continue to improve but, right now, they have proven to be inherently unreliable, and this means that Google’s AI overviews will also remain unreliable.”
Laura Edelson - Assistant Professor of Computer Science at Northeastern University
There's a difference between how these systems are trained and where they source their input data from. The real question is: what are you training for? Companies building these systems need to do a significant amount of additional work to make sure their systems reflect the best of us, not the worst. It's disappointing that so many systems get released in a state where they still regularly espouse racist, sexist, and classist views. Yet companies ship these features despite glaring safety concerns.
AI overviews are being created by systems that use retrieval-augmented generation (RAG) to fetch results and then generate a summary. The dream is to create systems that can not only do this, but can also track which search results led to which sentences in the summary. As for whether they are going to get better or worse: the answer is likely both!
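For readers curious about the mechanics Edelson describes, here is a minimal, toy sketch in Python of the RAG-with-attribution idea: retrieve documents for a query, generate a summary from them, and map each summary sentence back to a source. The keyword-overlap retriever, the stand-in "generator," and the example documents are illustrative placeholders, not how Google or Bing actually build their overviews.

```python
# Toy sketch of retrieval-augmented generation (RAG) with per-sentence source
# attribution. Retrieval scoring, the stand-in "generator", and the document
# set are all placeholders for illustration only.
from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    text: str


def retrieve(query: str, corpus: list[Doc], k: int = 2) -> list[Doc]:
    """Rank documents by naive keyword overlap with the query (stand-in for a real retriever)."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda d: len(q_terms & set(d.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def generate_summary(docs: list[Doc]) -> list[str]:
    """Stand-in for an LLM call: lift the first sentence from each retrieved document."""
    return [d.text.split(".")[0].strip() + "." for d in docs]


def attribute(sentences: list[str], docs: list[Doc]) -> list[tuple[str, str]]:
    """Map each summary sentence back to the retrieved document it overlaps with most."""
    pairs = []
    for s in sentences:
        s_terms = set(s.lower().split())
        best = max(docs, key=lambda d: len(s_terms & set(d.text.lower().split())))
        pairs.append((s, best.url))
    return pairs


if __name__ == "__main__":
    corpus = [
        Doc("https://example.com/a", "Retrieval-augmented generation fetches documents before generating text."),
        Doc("https://example.com/b", "Search engines rank pages by relevance to a query."),
        Doc("https://example.com/c", "Pizza recipes usually call for mozzarella cheese."),
    ]
    docs = retrieve("how does retrieval-augmented generation work", corpus)
    for sentence, source in attribute(generate_summary(docs), docs):
        print(f"{sentence}  [source: {source}]")
```

In a real system the overlap scoring would be replaced by an embedding-based retriever and the stand-in generator by an LLM call; the attribution step is what would let an overview link each sentence in its summary back to the pages it drew from.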
Jesse McCrosky - Principal Researcher, Open Source Research & Investigations at Mozilla (and former Google employee)
I’m shocked by how badly Google handled the rollout of AI Overviews. On the one hand, I would say that AI Overviews’ problems are generally overhyped, faked, or, if real, the result of attempts to find concerning examples. However, there have been some seriously concerning examples, like when the AI Overview for the search “how many Muslim presidents has the US had?” claimed that Barack Obama was the first Muslim US president.
AI overviews present two sorts of risks: risks to the people who use them and risks to the broader information ecosystem. For the broader information ecosystem, the overviews could rob content creators of visits to their sites, reducing revenue streams or eliminating creators’ incentives to write. This is really about the platforms taking an even bigger slice of the pie for themselves. I don’t think the pros outweigh the cons, at least not in their current form.
So how do we fix it? I don’t think the problem can be solved perfectly. There are many cases in which humans can’t agree on what is just or true, so why should we expect machines to do so? What’s needed is good governance, risk assessment, and risk mitigation. The real challenge is that the companies that need to be doing this don’t necessarily have the right incentives. Regulatory approaches to this are developing quickly, but I don’t know if they are quick enough.
Richard Whitt - President, GLIA Foundation; Mozilla Foundation Fellow in Residence (and former Google employee)
Google is signaling to the world that it is no longer a search company, but instead wants to be an answers company. When I worked at Google, we took pride in having the end user spend as little time on Google's search site as possible, but now the company has all but abandoned that mission. It's no longer about making information "accessible," but rather about pulling it all together from many sources into a neat package and presenting it to the end user as “The Truth”. The fact that the company is using more advanced AI to accomplish this impressive feat should not obscure this reality.
This decided shift means that the sites behind those blue links are not receiving the full recognition, or the revenues, that come from forging direct relationships with end users.
Having a company distill answers for us from the full panoply of the Web can be useful in some circumstances, but it can also prove challenging when we consider something like hallucinations. I don't believe we should adopt tech companies' term "hallucinations," as it suggests an actual human being simply having a bad day; that kind of anthropomorphizing is dangerous. One pundit suggested we call it "making up shit." My glia.net project seeks to help solve this, but whatever term we use, my understanding is that companies are no closer to solving it. "Making up shit" may always be with us in these highly complex tools. That makes having our own AIs, trained on us and responding to us authentically, all the more critical.