Perplexity hits back at Cloudflare amid claims of website 'stealth crawling' to dodge AI blocks
Perplexity denies its bots are slipping past Cloudflare blocks or ignoring robots.txt files


Cloudflare has accused Perplexity of failing to honour requests from websites to opt out of content scraping by AI companies.
Last month, the web infrastructure company announced a system to block AI companies from accessing websites without permission or compensation. The move was part of a pushback against AI companies hoovering up the entire internet to use as training data, a tactic that has sparked lawsuits.
Cloudflare's system lets online publishers and other website owners block AI crawlers from seeing their content, with future plans to only allow those who have paid to scrape.
Several weeks after launching the blocking system, Cloudflare has reported that AI company Perplexity is using evasive techniques to access that content regardless. In a blog post this week, Cloudflare said Perplexity changes how it presents itself to a website when it encounters a block.
"Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website’s preferences," the post noted.
ITPro contacted Perplexity for a statement, but had received no response at time of publication. A spokesperson for the firm told TechCrunch that the Cloudflare research was a "sales pitch" for the blocking product and said that the bot discussed "isn't even ours."
In a separate statement to The Verge, the company said Cloudflare's report was a "publicity stunt" and that there were "a lot of misunderstandings in the blog post".
It's not the first time Perplexity has been accused of letting its bots crawl where they're not wanted. Wired reported such behavior last year, while Forbes, the New York Times, and the BBC have accused the company of scraping and reproducing their content without permission.
Perplexity has denied the accusations.
What Cloudflare claims
Cloudflare said it saw "continued evidence" that Perplexity is changing its user agent and the source its traffic comes from to hide its crawling activity, and is even ignoring or failing to fetch robots.txt files. These files are lists of instructions telling bots what they may and may not access, long used for search crawlers and now for AI agents.
After hearing complaints from customers who had tried to block AI crawlers, Cloudflare set up a series of experiments using brand-new test websites that were not publicly accessible, each with a robots.txt file intended to stop "respectful bots from accessing any part of a website."
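As a rough illustration of what such a file asks of a crawler, the sketch below uses Python's standard urllib.robotparser module to evaluate a blanket disallow rule. The file contents, the bot name ExampleAIBot and the example.com URL are assumptions made for illustration, not details taken from Cloudflare's tests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks every bot to stay away from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant crawler would run this check before requesting any page.
print(may_fetch(BLOCK_ALL, "ExampleAIBot", "https://example.com/article"))  # False
# A robots.txt with no rules places no restriction on the crawler.
print(may_fetch("", "ExampleAIBot", "https://example.com/article"))  # True
```

The check is voluntary: robots.txt only states a preference, and nothing in the file itself stops a crawler that chooses not to read or follow it.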
Cloudflare then asked Perplexity AI questions about the domains, and found it was able to access detailed information from the restricted test sites.
"This response was unexpected, as we had taken all necessary precautions to prevent this data from being retrievable by their crawlers," the post noted.
Cloudflare said Perplexity uses not only its declared user agent but also, when that agent is blocked, a generic browser user agent that impersonates Chrome on macOS.
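To see why a change of user agent matters, consider a simplified block that looks only at the User-Agent header. This is a minimal sketch under stated assumptions: both header strings and the ExampleAIBot token are hypothetical, and the check is far simpler than anything Cloudflare describes using.

```python
# Hypothetical User-Agent strings; neither is quoted from the Cloudflare post.
DECLARED_UA = "Mozilla/5.0 (compatible; ExampleAIBot/1.0; +https://example.com/bot)"
GENERIC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

# Bot tokens a site owner has chosen to block (assumed for illustration).
BLOCKED_TOKENS = ("exampleaibot",)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked bot token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


print(is_blocked(DECLARED_UA))  # True: the declared crawler is stopped
print(is_blocked(GENERIC_UA))   # False: the browser-like string passes
```

A request carrying the browser-like string is indistinguishable, by header alone, from an ordinary visitor using Chrome on a Mac, so a filter of this kind cannot catch it.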
For comparison, Cloudflare ran similar tests with OpenAI's ChatGPT, finding it fetched the robots.txt file and stopped crawling when told not to access a page; when there were no instructions in the robots.txt file but there was a block page, ChatGPT again stopped crawling.
"Both of these demonstrate the appropriate response to website owner preferences," Cloudflare said.
Danger to the internet?
Cloudflare said that this behavior threatens the network of trust that underpins the internet.
"There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences," the post said.
The company added that it would now block the AI company from websites using its service.
"Based on Perplexity’s observed behavior, which is incompatible with those preferences, we have de-listed them as a verified bot and added heuristics to our managed rules that block this stealth crawling," the post added.
Calling on AI companies to behave better, Cloudflare said "well-intentioned crawlers acting in good faith" should be transparent and identify the agent honestly, and not attempt to dodge detection by sites attempting to block such access.
For sites that do allow access, AI crawlers should behave fairly: they should not flood sites with too much traffic or scrape sensitive data, and they should serve a "clear purpose", such as checking a price or powering a voice assistant.
Cloudflare also suggested AI companies use separate web crawlers for each activity, letting website owners more easily allow some crawler activity but not others. "Don’t force site owners to make an all-or-nothing decision," the post said.
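One way to picture the separate-crawler suggestion is a robots.txt file with one rule group per crawler, evaluated here with Python's standard urllib.robotparser module. The crawler names ExampleSearchBot and ExampleTrainingBot and the example.com URL are hypothetical and stand in for whatever agents an AI company might declare.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow a search crawler, refuse a training crawler.
PER_ACTIVITY_RULES = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: ExampleTrainingBot
Disallow: /
"""


def access_by_agent(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each crawler name to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = access_by_agent(
    PER_ACTIVITY_RULES,
    ["ExampleSearchBot", "ExampleTrainingBot"],
    "https://example.com/pricing",
)
print(result)  # {'ExampleSearchBot': True, 'ExampleTrainingBot': False}
```

If both activities shared a single crawler name, the site owner could only allow or refuse them together, which is the all-or-nothing decision Cloudflare warns against.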
Freelance journalist Nicole Kobie first started writing for ITPro in 2007, with bylines in New Scientist, Wired, PC Pro and many more.
Nicole is the author of a book about the history of technology, The Long History of the Future.