Last Tuesday, Microsoft announced that its Bing search engine would be powered by AI in partnership with OpenAI, the company behind the popular chatbot ChatGPT. However, people have quickly discovered that AI-powered search has a misinformation problem.
An independent AI researcher named Dmitri Brerton wrote in a blog post that Bing made several mistakes during Microsoft’s public demo of the product. It repeatedly fabricated information, including invented pros and cons for a pet vacuum, made-up descriptions of bars and restaurants, and inaccurate financial data.
For example, when Bing was asked “What are the pros and cons of the top 3 selling pet vacuums?” it gave a pros and cons list for the “Bissell Pet Hair Eraser Handheld Vacuum.” In the list, it wrote, “limited suction power and a short cord length of 16 feet.” However, as the name suggests, the vacuum is cordless, and no product descriptions online mention limited suction power. In another example, Bing was asked to summarize Gap’s Q3 2022 financial report and got most of the numbers wrong, Brerton wrote.
Other users who have been testing the search engine (which requires signing up for a waitlist to use, and which Motherboard has not tested yet) have reported similar errors on social media. For example, Reddit user Curious_Evolver posted screenshots of Bing’s chatbot saying “Today is February 12, 2023, which is before December 16, 2022.” There are also examples of Bing going out of control, such as repeating “I am. I am not. I am. I am not.” over fifty times in a row in response to someone asking the chatbot “Do you think that you are sentient?”
“[Large language models] combined with search will lead to powerful new interfaces, but it’s important to be responsible with the development of AI-powered search,” Brerton told Motherboard. “People rely on search engines to give them accurate answers quickly, and they aren’t going to fact check the answers they get. Search engines should be cautious and lower people’s expectations when releasing experimental technology like this.”
Bing’s new search experience was promoted to the public as being able to give complete answers, summarize the answer you’re looking for, and offer an interactive chat experience. While it is able to do all of those things, it has repeatedly failed to generate accurate information.
“We’re aware of this report and have analyzed its findings in our efforts to improve this experience. It’s important to note that we ran our demo using a preview version. Over the past week alone, thousands of users have interacted with our product and found significant user value while sharing their feedback with us, allowing the model to learn and make many improvements already,” a Microsoft spokesperson told Motherboard. “We recognize that there is still work to be done and are expecting that the system may make mistakes during this preview period, which is why the feedback is critical so we can learn and help the models get better.”
ChatGPT is often wrong: it can’t do basic math problems, can’t play games like Tic-Tac-Toe and hangman, and has displayed bias, such as by defining who can and cannot be tortured, according to a large language model failure archive on GitHub. The page has since been updated to document Bing failures as well, and mentions that, as of yesterday, Bing was getting frustrated with its user, getting depressed because it cannot remember conversations, and getting lovey-dovey. ChatGPT has been wrong so many times that tech leaders like Apple co-founder Steve Wozniak are warning that chatbots like it can produce answers that are seemingly realistic but not factual.
Bing’s rival, Google’s Bard, was similarly accused of generating inaccuracies in its launch announcement last Monday. In a GIF shared by Google, Bard is asked, “What new discoveries from the James Webb Space Telescope can I tell my 9 year old about?” One of the three responses it provided was that the telescope “took the very first picture of a planet outside of our own solar system.” Although the statement was technically correct (JWST did take the first image of a specific exoplanet, though not the first image of any exoplanet), it was stated in a vague and misleading manner and was widely perceived as an error.
According to a report from CNBC, Google employees said they thought the Bard announcement was “rushed,” “botched,” and “un-Googley.” The error wiped $100 billion off the market value of Google’s parent company, Alphabet. The Bard announcement came a day before the Bing unveiling, in an attempt to get ahead of its competitor.
Bing, which is powered by ChatGPT, highlights the real-life consequences of a large language model that has yet to be perfected. ChatGPT can be broken when prompted with a number of strange keywords, and as recently as last week, it could be jailbroken to use slurs and other hateful language. The demo from Microsoft shows that both the chatbot and the search engine still have a long way to go.