Google’s 2 MB Crawl Limit
Why byte budgets matter and how Scrubnet fits the future of crawling
Google clarified a major crawling constraint. When Googlebot crawls a page for Search, it fetches only the first 2 MB of each supported resource. This is down from older, looser limits, and the limit is measured against uncompressed data.
This is not a theoretical guideline. It is a hard cutoff. Once the limit is reached, the fetch stops and only the retrieved portion is considered for indexing and rendering.
The change in practical terms
The important detail is that the 2 MB limit applies per fetched resource, not per page.
That means:
- The HTML document has its own 2 MB limit
- Each JavaScript file has its own 2 MB limit
- Each CSS file has its own 2 MB limit
- Each referenced asset is fetched independently
Google does not pool these together. There is no overall page allowance. Each request is bounded on its own.
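Because each resource is bounded independently, a per-resource audit is more useful than a total page-weight number. The sketch below flags oversized resources from a map of URLs to uncompressed byte sizes; the byte value of "2 MB" is an assumption (Google does not publish the exact internal figure), and the sample page data is invented for illustration.

```python
# LIMIT assumes "2 MB" means 2 * 1024 * 1024 bytes; the exact figure
# Google enforces internally is not published, so treat it as a heuristic.
LIMIT = 2 * 1024 * 1024

def oversized_resources(resources: dict[str, int], limit: int = LIMIT) -> dict[str, int]:
    """Given {url: uncompressed_size_in_bytes}, return only the entries over the limit."""
    return {url: size for url, size in resources.items() if size > limit}

# Hypothetical page: three independently fetched resources.
page = {
    "/index.html": 180_000,    # well under budget
    "/bundle.js": 2_400_000,   # over budget: the crawler sees a partial file
    "/styles.css": 90_000,
}
print(oversized_resources(page))  # only /bundle.js is flagged
```

Note that the sizes are checked one by one; summing them, as a page-weight report would, tells you nothing about whether any single fetch gets cut off.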
Why this matters more than it sounds
Many modern sites transfer ten to twenty megabytes per page without issue. That alone is not a problem. What matters is the size of individual resources and where important content appears.
If the HTML exceeds 2 MB, anything beyond that point is invisible to Google. If a JavaScript bundle exceeds 2 MB, Google executes only a partial file. Critical rendering logic, injected content, or internal links may never be reached.
Compression does not help here. The limit applies after decompression. A heavily compressed file can still exceed the cutoff once inflated.
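The point is easy to demonstrate: highly repetitive markup compresses extremely well, so a response that travels far under 2 MB on the wire can still overshoot the limit once inflated. A self-contained illustration with a synthetic payload (again assuming 2 MB means 2 × 1024 × 1024 bytes):

```python
import gzip

LIMIT = 2 * 1024 * 1024  # assumed interpretation of "2 MB"

# Synthetic, highly repetitive HTML: ~3.3 MB uncompressed.
payload = b"<li><a href='/item'>item</a></li>" * 100_000

compressed = gzip.compress(payload)

print(f"on the wire:  {len(compressed):,} bytes")  # tiny, well under the limit
print(f"decompressed: {len(payload):,} bytes")     # over the limit
print("subject to truncation:", len(payload) > LIMIT)
```

The transfer would look harmless in a network waterfall, but it is the decompressed size that gets counted against the budget.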
The role of access logs
In most setups, this cutoff is observable. When Googlebot reaches its internal limit, it closes the connection and the server stops sending data, so the truncated transfer leaves a trace in the logs.
With proper access logging, you can see:
- The exact number of bytes actually sent
- Requests that terminate early
- Responses that cluster around the 2 MB boundary
This moves the discussion from documentation to evidence. You can see what Google actually consumed, not what the page was meant to deliver.
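As a sketch of what that analysis can look like, here is a small parser for Apache/Nginx combined-format log lines that pulls out Googlebot requests whose bytes-sent field sits near the per-resource limit. The sample log lines, the user-agent match, and the "near the boundary" threshold are all illustrative assumptions, not a prescribed method:

```python
import re

# Combined log format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<ua>[^"]*)"'
)

LIMIT = 2 * 1024 * 1024  # assumed interpretation of "2 MB"

def googlebot_truncation_candidates(lines, threshold=0.95):
    """Yield (path, bytes_sent) for Googlebot hits whose response size
    clusters near the per-resource limit."""
    for line in lines:
        m = LOG_RE.match(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        sent = m.group("bytes")
        if sent == "-":  # no body sent
            continue
        sent = int(sent)
        if sent >= LIMIT * threshold:
            yield m.group("path"), sent

# Hypothetical log lines for illustration.
sample = [
    '66.249.66.1 - - [10/May/2025:13:55:36 +0000] "GET /bundle.js HTTP/1.1" 200 2097152 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/May/2025:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 48210 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/May/2025:13:55:41 +0000] "GET /bundle.js HTTP/1.1" 200 2097152 "-" "Mozilla/5.0"',
]
for path, sent in googlebot_truncation_candidates(sample):
    print(path, sent)  # only the Googlebot hit at the 2 MB boundary
```

Responses that repeatedly land at the same ceiling across many URLs are the strongest signal: real page sizes vary, while a crawler-imposed cutoff produces a suspiciously consistent number.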
How Scrubnet aligns with this reality
Scrubnet was built for constrained, deterministic crawlers. Not browsers. Not humans. Agents.
Every Scrubnet feed is:
- Compact and predictable in size
- Clean HTML with no rendering dependency
- Free from oversized scripts and layout noise
- Designed to be fully consumable in a single fetch
This ensures that crawlers never hit byte ceilings before reaching meaningful content. What is published is what is read.
What this signals about the future
Google’s clarification is not a regression. It is an admission. Crawlers need firm limits to operate at scale. Rendering everything is expensive. Guessing intent is risky.
The web is splitting into layers. A human web full of interaction and presentation. A machine web focused on clarity, structure, and bounded cost.
Scrubnet exists in the second layer.
Key takeaways
- The 2 MB limit applies per resource, not per page
- Oversized HTML and JS can silently lose content
- Access logs reveal real crawl behaviour
- Machine-first publishing reduces uncertainty
- Clean feeds are future-proof by design
Build for crawlers, not assumptions
Scrubnet helps brands publish content that bots can fully consume, trust, and reuse. No truncation. No guesswork. No rendering gamble.