Danish Kapoor

Reddit sues Perplexity and three data companies for unauthorized content use

Reddit has filed a lawsuit against four companies, alleging that its content was collected without permission and used for commercial purposes. The defendants are the AI search company Perplexity and the data-scraping firms SerpApi, Oxylabs, and AWMProxy. Reddit claims these companies access its content through Google search results and incorporate that data into their commercial products. The company is seeking not only damages but also a permanent ban on the use of all data scraped in the past.

In 2023, Reddit introduced a licensing process for companies that want to access data on its platform. Under this structure, commercial use of user-generated content requires official permission from the platform. Reddit says, however, that some companies operate outside this system and continue their data-collection activities. Licensing agreements with major technology companies such as Google and OpenAI show that Reddit has taken a clear stance on the issue; even so, the company alleges that some firms simply bypass the licensed channel.

Reddit claims to have documented unauthorized data use with a technical testing method

The most striking section of Reddit’s filing concerns the technical evidence against Perplexity. Reddit prepared a special test post configured so that it could be indexed only by Google and would not surface anywhere else. Although the post was viewable only through Google, within a few hours its text appeared verbatim in Perplexity’s answer system. According to Reddit, this shows that the data was obtained by directly scraping Google search results: the content was not accessed from a publicly available source but from a location intended to remain behind restricted access.
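The logic of such a test post is essentially a canary token: embed a unique, unguessable string in content with restricted visibility, then check whether that string resurfaces elsewhere. The filing does not describe Reddit's implementation; the sketch below is a hypothetical illustration of the general technique, with all names invented:

```python
import uuid

def make_canary_post(body: str) -> tuple[str, str]:
    """Embed a unique, unguessable token in a test post so any
    verbatim reuse of the text can be traced back to this post."""
    token = f"canary-{uuid.uuid4().hex}"
    return token, f"{body}\n\n{token}"

def contains_canary(response_text: str, token: str) -> bool:
    """Check whether an AI system's answer reproduces the canary,
    which would indicate the post's text was scraped verbatim."""
    return token in response_text

# Hypothetical usage: plant the post, then probe the AI system's answers.
token, post = make_canary_post("Test content visible only via Google.")
print(contains_canary(f"An answer quoting: {post}", token))  # True
```

Because the token never appears anywhere except the restricted post, a single match in a third-party system's output is strong evidence of scraping rather than coincidence.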

Reddit accuses not only Perplexity but also the three other companies of scraping data by similar methods. SerpApi, Oxylabs, and AWMProxy are known for collecting data from search-engine results and selling it to third parties. Arguing that these companies profit commercially without going through any licensing process, Reddit characterizes the practice as data exploitation and calls it unacceptable that content created by users’ efforts is sold without any return to them. The filing also states that these data-collection activities are systematic and have been going on for years.

Perplexity said in a statement that it has not yet been served with the suit, and that it defends users’ right to access public information freely and fairly. Reddit counters that the data in question is not merely publicly visible: it is made available under specific conditions, so free accessibility alone is not sufficient. Reddit further argues that stripping content from its context and processing it systematically creates serious problems for information security.

Reddit had previously sent Perplexity a warning asking it to stop its data-scraping activities. When the company continued to use Reddit content in its systems despite the warning, Reddit created special test content to make tracking easier and gather more definitive evidence. These test posts were configured so that only Google’s crawlers could index them; the fact that the content nonetheless appeared in Perplexity’s results within a short time, Reddit argues, proves the violation directly.
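Making a page indexable by Google alone is typically done by allowing only Google’s crawler in robots.txt and disallowing everyone else. Reddit’s actual configuration is not disclosed in the filing; the fragment below, with an invented path, is only an illustration of the general technique:

```text
# Illustrative only: allow Google's crawler, block all other bots.
User-agent: Googlebot
Allow: /r/test/comments/canary-post/

User-agent: *
Disallow: /
```

Well-behaved crawlers honor these rules, so the post reaches only Google’s index. A scraper that instead lifts the post’s text out of Google’s search results sidesteps robots.txt entirely, which is exactly the behavior Reddit says its test exposed.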

The tightening of Reddit’s data policies has not been limited to litigation. In 2024, the company introduced platform-wide rate limits to protect against unidentified bots and data collectors, and in August 2025 it restricted access for the Internet Archive’s popular Wayback Machine. On top of this, Reddit adopted the Really Simple Licensing (RSL) protocol to make the rules around data access more explicit. Under RSL, license terms are referenced from robots.txt files, making it clearer which of a site’s data may and may not be scraped.
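Concretely, RSL works by having robots.txt point crawlers at a machine-readable license document. The fragment below follows the published RSL examples as best understood here and uses a hypothetical URL; it is an illustrative sketch, not Reddit’s actual configuration:

```text
# Illustrative robots.txt with an RSL reference (hypothetical URL).
User-agent: *
Allow: /

# RSL directive: points crawlers to the machine-readable license
# terms governing reuse of this site's content.
License: https://example.com/license.xml
```

A crawler that supports RSL can then fetch the referenced document and determine, before scraping, what uses (such as AI training) are licensed and on what terms.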
