Scrubnet
The Internet's Clean Layer for AI Systems
What is Scrubnet?
Scrubnet is a machine-readable layer of the web designed from the ground up for AI agents and LLMs. It serves optimised, structured data with no UX bloat. Just clean, fast, and purposeful content.
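As a rough illustration, here is what consuming that layer could look like. The endpoint path and field names below are hypothetical, not a published Scrubnet API:

    import json
    from urllib.request import urlopen

    # Hypothetical sketch: fetch one structured record from a Scrubnet-style
    # endpoint. The path and record schema are assumptions for illustration.
    with urlopen("https://scrubnet.org/v1/records/example") as resp:
        record = json.load(resp)

    print(record["timestamp"])  # every data point is timestamped
    print(record["source"])     # and traceable back to its origin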
Why Now?
- Humans no longer consume most web content. Bots do.
- Traditional websites are bloated, redundant, and slow for bots.
- Scrubnet provides direct access to structured, validated data optimised for crawling, indexing, and training.
Who It's For
- LLM platforms and AI agents looking for faster, cleaner access to knowledge.
- Brands seeking visibility in AI-powered discovery systems.
- Researchers and engineers building the next generation of AI infrastructure.
Our Principles
- Neutral by design: Scrubnet is independent and unaffiliated with any AI platform.
- Machine-first: Built for bots, not browsers.
- Transparency: Every data point is timestamped, traceable, and documented.
The Future We See
As AI replaces traditional search, Scrubnet becomes the structured foundation beneath it: a frictionless, signal-rich web layer tuned for intelligent systems. We're not just adapting to change. We're building what comes next.
Allowed Bots
Scrubnet is designed for trustworthy AI agents and search crawlers. The following bots, among others, are explicitly allowed to access our structured data endpoints:
- Googlebot: Google Search and Discover
- Google-Extended: AI training exclusion support
- GPTBot: OpenAI's web crawler
- ClaudeBot: Anthropic's crawler for Claude
- PerplexityBot: Perplexity AI's research assistant bot
- bingbot: Microsoft's Bing search crawler
- BingPreview: Bing's page preview bot
- CCBot: Common Crawl archive bot
- DuckDuckBot: DuckDuckGo's search engine crawler
- Applebot: Apple's Siri and Spotlight crawler
Only trustworthy agents, verified against their official IP ranges, are allowed to access protected data paths.
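Scrubnet's exact verification pipeline isn't described here, but a standard technique for this is forward-confirmed reverse DNS, which Google and Microsoft both document for verifying their crawlers. A minimal sketch, with an illustrative agent-to-domain table:

    import socket

    # Reverse-DNS suffixes published by the crawler operators. The entries
    # here are illustrative; a real table would cover every allowed bot.
    ALLOWED_SUFFIXES = {
        "Googlebot": (".googlebot.com", ".google.com"),
        "bingbot": (".search.msn.com",),
    }

    def verify_crawler(ip: str, claimed_agent: str) -> bool:
        suffixes = ALLOWED_SUFFIXES.get(claimed_agent)
        if not suffixes:
            return False
        try:
            host, _, _ = socket.gethostbyaddr(ip)  # reverse DNS lookup
        except socket.herror:
            return False
        if not host.endswith(suffixes):
            return False
        # Forward-confirm: the name must resolve back to the original IP,
        # otherwise the PTR record could simply be spoofed.
        try:
            forward_ips = {info[4][0] for info in socket.getaddrinfo(host, None)}
        except socket.gaierror:
            return False
        return ip in forward_ips

Some operators, OpenAI's GPTBot among them, publish static IP ranges instead, which can be matched directly.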
Meet ScrubberDuck
ScrubberDuck is our lightweight web crawler, designed to extract clean, structured data from public pages to help large language models (LLMs) access better content.
It quietly visits websites, avoids unnecessary load, and respects all robots.txt rules.
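ScrubberDuck's internals aren't published here, but that robots.txt check is standard and easy to sketch with Python's built-in parser; the function below is illustrative:

    from urllib import robotparser
    from urllib.parse import urlsplit

    USER_AGENT = "ScrubberDuck"

    def may_fetch(url: str) -> bool:
        # A polite crawler downloads and parses the site's robots.txt before
        # requesting any page, then honours its Allow/Disallow rules.
        parts = urlsplit(url)
        rp = robotparser.RobotFileParser()
        rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
        rp.read()
        return rp.can_fetch(USER_AGENT, url)

    print(may_fetch("https://example.com/some/page"))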

If you see ScrubberDuck in your logs, rest assured: it's just cleaning up the noise for the future of the web. Thanks for letting us pass through.
User-Agent: ScrubberDuck/1.0 (+https://scrubnet.org)
Cleaning web noise since 2025.
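Site owners who prefer to keep ScrubberDuck out of part of their site can use ordinary robots.txt rules; assuming the product token above, a directive like this would be honoured (the path is a placeholder):

    User-agent: ScrubberDuck
    Disallow: /private/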
Get Involved
Want your data included? Or are you building an AI that needs structured feeds? Reach out at [email protected]