
Journalists Are Accusing This AI Chatbot of Stealing Their Work

June 20, 2024

Google introduced AI Overviews in search results shortly after Google I/O in May, but it wasn’t first to the AI search game. It had already given Gemini the ability to search the internet, and Meta and other competing AI companies had done similarly with their own models. One of the biggest players in this field is Perplexity, which markets itself as a “conversational search engine”: basically another chatbot with internet access, but with even more of a focus on summaries and current events. Unfortunately, Perplexity now finds itself in hot water after allegedly breaking rules and, like Google, returning wrong answer after wrong answer.

On June 11, Forbes published an article accusing Perplexity of stealing its content by quickly rewriting original articles without sourcing and passing them off as its own. The AI company went as far as to adapt Forbes’ reporting to podcast form. Shortly after, Wired ran an exposé on Perplexity, accusing it of “bullshitting” and breaking a widely held internet rule (more on that shortly). Now, we’re learning a lot more about what kind of recent data an AI might be able to train on going forward, and why AIs often make so many mistakes when trying to sum up current events.

Perplexity is accused of breaking a longstanding internet rule

Bots aren’t anything new on the internet. Before AI scraped websites for training material, search engines scraped websites to determine where to place them in search results. This led to a standard called the Robots Exclusion Protocol, which allows developers to lay out which parts of their site they don’t want bots to access. Perplexity says it follows this rule, but, spurred on by the Forbes story and an accusation of rule breaking from developer Robb Knight, Wired conducted its own investigation. What it discovered wasn’t flattering to Perplexity.
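To make the rule concrete, here is a minimal sketch of how a well-behaved bot might check a site's robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt contents, the bot name `ExampleBot`, and the `example.com` URLs are made up for illustration; they are not taken from Forbes, Wired, or Perplexity. Note that the protocol is voluntary: nothing in it technically stops a bot that skips this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named bot entirely and keeps
# every other bot out of /private/. Names and paths are illustrative.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named bot is shut out of the whole site.
print(can_fetch("ExampleBot", "https://example.com/story"))
# Other bots may read ordinary pages, but not anything under /private/.
print(can_fetch("OtherBot", "https://example.com/story"))
print(can_fetch("OtherBot", "https://example.com/private/notes"))
```

A crawler that honors the protocol would run a check like this before every request and skip any URL for which it returns `False`.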

“Wired provided the Perplexity chatbot with the headlines of dozens of articles published on our website this year, as well as prompts about the subjects of Wired reporting,” Wired’s article reads. According to the investigation, the bot then returned answers “closely paraphrasing Wired stories,” complete with original Wired art. Further, it would summarize stories “inaccurately and with minimal attribution.”

Examples include the chatbot inaccurately accusing a police officer of stealing bicycles, and, in a test, responding to a request to summarize a webpage containing a single sentence with a wholly invented story about a young girl going on a fairy tale adventure. Wired concluded Perplexity’s summaries were the result of the AI flagrantly breaking the Robots Exclusion Protocol, and that its inaccuracies likely stemmed from an attempt to sidestep said rule.

According to both Knight and Wired, when users ask Perplexity questions that would require the bot to summarize an article protected by the Robots Exclusion Protocol, a specific IP address running what is assumed to be an automated web browser would access the websites bots are not supposed to scrape. The IP address couldn’t be tracked back to Perplexity with complete certainty, but its frequent association with the service raised suspicions.

In other cases, Wired recognized traces of its metadata in Perplexity’s responses, which could mean the bot is not reading the articles themselves but accessing traces of them left in URLs and search engines. These wouldn’t be protected by the Robots Exclusion Protocol, but are so light on information that they’re more likely to lead to AI hallucinations, hence the problem with misinformation in AI search results.

Both of these issues presage a battle for the future of AI in search engines, from both ethical and technical standpoints. Even as artists and other creators argue over AI’s right to scrape older works, accessing writing that is just a few days old puts Perplexity at further legal risk.

Perplexity CEO Aravind Srinivas issued a statement to Wired that said “the questions from Wired reflect a deep and fundamental misunderstanding of how Perplexity and the Internet work.” At the same time, Forbes this week reportedly sent Perplexity a letter threatening legal action over “willful infringement” of its copyrights.

Source: LifeHacker.com