Allegations that Perplexity ignores websites' robots.txt files are in the spotlight

Perplexity, a company that describes itself as a “free AI search engine,” has faced a string of criticism in recent days. Shortly after Forbes accused Perplexity of stealing its reporting and republishing it on multiple platforms, Wired reported that the company was ignoring the Robots Exclusion Protocol (robots.txt) and scraping various websites, including Condé Nast publications. Technology site The Shortcut also accused the company of scraping its articles. Now Reuters has reported that Perplexity is not the only AI company bypassing robots.txt.

Reuters said it had seen a letter sent to publishers from TollBit, a startup that matches AI firms with publishers to strike licensing deals. This letter warned that “AI agents from many sources (not just one company) are choosing to bypass the robots.txt protocol to collect content.” The robots.txt file contains instructions to web crawlers about which pages they can access and which they cannot. Web developers have been using this protocol since 1994, but compliance is completely voluntary.
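To make the voluntary nature of the protocol concrete, here is a minimal sketch in Python of how a well-behaved crawler might consult robots.txt before fetching a page. It uses the standard library's `urllib.robotparser`. The robots.txt contents, the bot names (`ExampleBot`, `OtherBot`), the `example.com` URLs, and the `is_allowed` helper are all hypothetical, chosen for illustration. This is not a description of how any company named in this article actually crawls.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
# ExampleBot is blocked from the whole site; every other crawler
# is blocked only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/private/report"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The check runs entirely on the crawler's side. Nothing in the protocol stops a crawler from skipping this step and requesting the page anyway, which is why compliance depends on the crawler's operator.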

Although TollBit’s letter did not name any companies, Business Insider has learned that OpenAI and Anthropic, developers of the ChatGPT and Claude chatbots respectively, also ignore robots.txt signals. Both companies have previously said that they respect the “do not crawl” instructions that websites place in their robots.txt files.

Wired’s investigation found that a machine on an Amazon server “strictly operated by Perplexity” was ignoring its website’s robots.txt instructions. To check whether Perplexity was scraping its content, Wired gave the company’s tool the headlines of its articles or short descriptions of its stories. The tool produced results that closely paraphrased the articles with “minimal attribution.” At times it even generated inaccurate summaries of the stories. Wired noted that in one case the chatbot falsely claimed that Wired had reported on a particular California police officer committing a crime.

In an interview with Fast Company, Perplexity CEO Aravind Srinivas said his company “does not ignore the Robot Exclusions Protocol and then lie about it.” That does not mean, however, that it does not benefit from crawlers that do ignore the protocol. Srinivas explained that the company uses third-party web crawlers in addition to its own, and that the crawler Wired identified is one of them. When Fast Company asked whether Perplexity had told its crawler provider to stop scraping Wired’s website, Srinivas simply replied, “it’s complicated.”

Srinivas defended his company’s practices, stating that the Robots Exclusion Protocol “is not a legal framework” and suggesting that a new type of relationship might need to be established between publishers and companies like his. He also hinted that Wired deliberately used prompts designed to make Perplexity’s chatbot behave this way, so ordinary users would not get the same results. Regarding the inaccurate summaries produced by the tool, Srinivas said, “We never said that we did not see hallucinations.”
