What is Scrubnet?
Scrubnet is an independent experimentation hub for understanding how search crawlers and AI bots behave. We create custom feeds, observe how bots discover and revisit them, publish the resulting data and case studies, and build practical tools for technical SEO and generative engine optimisation (GEO).
Why study crawlers?
- 🤖 Search engines, AI assistants and answer engines use different crawlers with different purposes.
- 📊 Their real fetch behaviour is often hidden behind assumptions, documentation and incomplete third-party data.
- 🧪 Scrubnet provides a controlled environment where discovery, recrawling, freshness and format preferences can be observed.
Who it’s for
- 🔎 SEO and GEO practitioners: use real crawler data to inform audits, experiments and recommendations.
- 🧪 Researchers and crawler teams: examine public feeds, fetch patterns and technical findings.
- 🌐 Site owners: contribute a site for free and help broaden the research dataset.
Our principles
- 🛡️ Independent: our work is not tied to a search engine, AI platform or vendor.
- 🔬 Evidence-led: we separate observed behaviour from hypotheses and marketing claims.
- 🌍 Open by default: feeds, live logs and findings are public wherever access and privacy allow.
What Scrubnet brings together
Scrubnet combines a growing set of custom feeds, a public crawler log dashboard, technical case studies, controlled experiments and free tools. Together they help us move beyond crawler speculation and build a clearer picture of what may improve visibility in traditional search and AI-powered discovery.
Meet ScrubberDuck
ScrubberDuck is our lightweight research crawler. It collects public content and creates the custom Scrubnet feeds used to test discovery, formats, freshness signals and recrawl behaviour.
It’s designed to minimise load, avoid unnecessary requests, and respect robots.txt.
If you see ScrubberDuck in your logs, it means your site is contributing content to a Scrubnet feed experiment.
User-Agent: ScrubberDuck/1.0 (+https://scrubnet.org)
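For site owners who want to see how a robots.txt rule would apply to this user agent, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URLs are hypothetical, and this illustrates general robots.txt matching rather than ScrubberDuck's actual implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish (not from Scrubnet).
ROBOTS_TXT = """\
User-agent: ScrubberDuck
Disallow: /private/

User-agent: *
Disallow: /
"""

# The user-agent string published above.
UA = "ScrubberDuck/1.0 (+https://scrubnet.org)"


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The ScrubberDuck group applies, so only /private/ is off limits.
allowed = can_fetch(ROBOTS_TXT, UA, "https://example.com/blog/post")
blocked = can_fetch(ROBOTS_TXT, UA, "https://example.com/private/report")
print(allowed, blocked)
```

In this sketch the parser matches on the product token before the slash ("ScrubberDuck"), so a `User-agent: ScrubberDuck` group is enough to allow or restrict the crawler independently of other bots.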
How the research works
We add participating websites, fetch their public pages efficiently and publish consistent machine-readable feeds. We then monitor which verified and unverified bots request those feeds, when they return, which formats they choose and how requests relate to content changes.
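One common way to separate verified from unverified bots is forward-confirmed reverse DNS: resolve the requesting IP to a hostname, check that the hostname belongs to the operator's domain, then resolve that hostname back and confirm it returns the same IP. The sketch below assumes this technique and is not a description of Scrubnet's own pipeline. The hostname suffixes are the ones Google documents for Googlebot, and the IP address and hostnames in the demo are illustrative stand-ins. Some operators publish IP ranges instead, which would need a different check.

```python
import socket
from typing import Callable, List, Sequence

# Suffixes Google documents for Googlebot reverse DNS (assumed still current).
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address to hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """A-record lookup: hostname to IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified(
    ip: str,
    suffixes: Sequence[str],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        host = reverse(ip)
    except OSError:  # no PTR record or resolver failure
        return False
    if not host.lower().endswith(tuple(suffixes)):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False


# Offline demo with stand-in resolvers, so no network access is needed.
def fake_reverse(ip: str) -> str:
    return "crawl-66-249-66-1.googlebot.com"


def fake_forward(host: str) -> List[str]:
    return ["66.249.66.1"]


print(is_verified("66.249.66.1", GOOGLEBOT_SUFFIXES, fake_reverse, fake_forward))
```

Passing the resolvers as parameters keeps the check testable offline. With the defaults, the function performs real DNS lookups and handles IPv4 only.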
The observations feed into case studies, technical SEO and GEO guidance, and tools such as SEO Scrubbox. Adding more varied sites increases the volume and variety of feed content and gives the experiments a broader base.
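Observations such as when a bot returns and which format it requests can be summarised from request logs in a few lines. The sketch below is an illustration under stated assumptions: the records are hand-written, and the feed paths, file formats and timestamps are hypothetical rather than real Scrubnet data.

```python
from collections import Counter, defaultdict
from datetime import datetime
from pathlib import PurePosixPath

# Hypothetical records: (bot name, requested feed path, ISO timestamp).
REQUESTS = [
    ("GPTBot", "/feeds/site-a.json", "2025-01-01T00:00:00"),
    ("GPTBot", "/feeds/site-a.json", "2025-01-01T06:00:00"),
    ("GPTBot", "/feeds/site-a.xml", "2025-01-02T00:00:00"),
    ("bingbot", "/feeds/site-a.xml", "2025-01-01T12:00:00"),
    ("GPTBot", "/feeds/site-a.json", "2025-01-01T18:00:00"),
]


def revisit_gaps(requests):
    """Return {(bot, path): [timedelta, ...]} between consecutive fetches."""
    seen = defaultdict(list)
    for bot, path, ts in requests:
        seen[(bot, path)].append(datetime.fromisoformat(ts))
    gaps = {}
    for key, times in seen.items():
        times.sort()  # records may arrive out of order
        gaps[key] = [later - earlier for earlier, later in zip(times, times[1:])]
    return gaps


def format_counts(requests):
    """Return {bot: Counter of requested file extensions}."""
    counts = defaultdict(Counter)
    for bot, path, _ in requests:
        counts[bot][PurePosixPath(path).suffix.lstrip(".")] += 1
    return counts


gaps = revisit_gaps(REQUESTS)
for (bot, path), deltas in gaps.items():
    hours = [delta.total_seconds() / 3600 for delta in deltas]
    print(bot, path, hours)
print({bot: dict(counter) for bot, counter in format_counts(REQUESTS).items()})
```

Relating requests to content changes would need a third input, the times at which each feed changed, which this sketch leaves out.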
Participation is free: add a website with up to 50,000 public URLs. There are no visibility or ranking guarantees.
Crawlers we monitor
The live dashboard records recognised search, AI and archive crawlers requesting Scrubnet feeds, including:
- Googlebot – Google Search and Discover
- Google-Extended – Google’s robots.txt token for opting content out of AI training (a control token rather than a separate crawler user agent)
- GPTBot – OpenAI’s web crawler
- ClaudeBot – Anthropic’s crawler for Claude
- PerplexityBot – Perplexity AI’s research assistant bot
- bingbot – Microsoft Bing search crawler
- BingPreview – Bing page preview bot
- CCBot – Common Crawl archive bot
- DuckDuckBot – DuckDuckGo crawler
- Applebot – Apple Siri and Spotlight crawler
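As a rough illustration of how requests might be grouped under the names above, the sketch below matches the tokens from this list against a request's User-Agent header. It is an assumption about one simple approach, not the dashboard's actual logic. The sample header values are illustrative, Google-Extended is left out because it does not appear in request headers, and a match only shows what a client claims to be.

```python
from typing import Optional

# Tokens taken from the list above, checked in order, case-insensitively.
KNOWN_TOKENS = [
    "Googlebot",
    "GPTBot",
    "ClaudeBot",
    "PerplexityBot",
    "bingbot",
    "BingPreview",
    "CCBot",
    "DuckDuckBot",
    "Applebot",
]


def classify(user_agent: str) -> Optional[str]:
    """Return the first known crawler token found in the header, else None."""
    lowered = user_agent.lower()
    for token in KNOWN_TOKENS:
        if token.lower() in lowered:
            return token
    return None


# Illustrative header values; real strings vary by crawler version.
samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "curl/8.4.0",
]
labels = [classify(sample) for sample in samples]
print(labels)
```

Because User-Agent headers are trivial to spoof, a label from this kind of matching is a claim that still needs separate verification.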
Get involved
Add a site to expand the research dataset, explore the live logs, or collaborate with us on a technical experiment or case study.
Or reach out at contact@scrubnet.org
SEO Scrubbox
A Chrome extension for technical SEO and crawler diagnostics. Compare view-source vs rendered signals, spot canonical drift, validate JSON-LD, audit sitemaps, hreflang, redirects, and crawl signals without leaving the page.