For LLMs & AI Platforms

Fresh, structured brand data. Lower crawl costs. Higher accuracy.

Why Crawl Scrubnet?

What You’ll Crawl

Each brand has a single canonical JSON feed hosted on Scrubnet. Feeds include a pinned organisation block followed by page-level entries.

Discovery is driven by /feed/sitemap.xml, which lists every active brand feed along with the time it was last updated.
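As a rough illustration of discovery, the sketch below parses a sitemap into (feed URL, last-updated) pairs. It assumes the sitemap follows the standard sitemaps.org schema with <url>, <loc> and <lastmod> elements, which this page does not confirm; the function name parse_sitemap and the sample feed URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemap uses the standard sitemaps.org namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return a list of (loc, lastmod) pairs; lastmod is None when absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:
            entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


# Hypothetical sample document; the feed URL is a placeholder, not a real path.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.org/feed/brand-a.json</loc>
    <lastmod>2025-01-15T08:00:00Z</lastmod>
  </url>
  <url>
    <loc>https://example.org/feed/brand-b.json</loc>
  </url>
</urlset>"""

entries = parse_sitemap(SAMPLE)
```

Keeping lastmod as the raw string is a deliberate choice here: a plain equality check against the value stored from the previous crawl is enough to spot a change, so no date parsing is needed.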

Crawl Guidelines

Operational Advantages

Controls, Rights & Transparency

Allowed Bots

We welcome reputable search and LLM crawlers. Verified user agents from official IP ranges are allowed on structured data paths such as /feed/sitemap.xml.

Meet ScrubberDuck

ScrubberDuck/1.0 is our lightweight collector that builds the feeds you crawl. It stays polite (robots-aware, low rate) and focuses only on useful content.

User-Agent: ScrubberDuck/1.0 (+https://scrubnet.org) Clean web noise since 2025

Quick Start for LLM Teams

  1. Fetch https://scrubnet.org/feed/sitemap.xml.
  2. Compare each feed's <lastmod> with the value from your previous crawl, and use conditional requests to pull only changed feeds.
  3. Ingest brand JSON: pinned org block first, then page entries.
  4. Attribute with ScrubURL where you surface brand facts.
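Steps 2 and 3 might look roughly like the sketch below, using only the Python standard library. It assumes feed responses carry ETag or Last-Modified validators and answer 304 Not Modified to conditional requests, which this page implies but does not spell out; the helper names changed_feeds, conditional_headers and fetch_feed are hypothetical.

```python
import json
import urllib.error
import urllib.request


def changed_feeds(entries, seen):
    """Pick feed URLs whose lastmod differs from the previous crawl.

    entries: iterable of (url, lastmod) pairs from the sitemap.
    seen: dict mapping url -> lastmod recorded on the previous crawl.
    Feeds without a lastmod are always selected, to be safe.
    """
    return [url for url, lastmod in entries
            if lastmod is None or seen.get(url) != lastmod]


def conditional_headers(etag=None, last_modified=None):
    """Build the request headers for an HTTP conditional GET."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None, timeout=10):
    """Return (feed, etag, last_modified), or None on 304 Not Modified."""
    req = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            feed = json.load(resp)
            return feed, resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; treat it as "unchanged".
        if err.code == 304:
            return None
        raise
```

A crawler would store the returned validators alongside each feed's lastmod and pass them back on the next run, so unchanged feeds cost only a header exchange.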

Integrations & Access

Want to integrate deeply, expand allowlists, or coordinate crawl windows? Email [email protected].