Google’s 2 MB Crawl Limit
Why byte budgets matter and how Scrubnet fits the future of crawling
Google clarified a major crawling constraint. When Googlebot crawls a page for Search, it fetches only the first 2 MB of each supported resource. This is down from older, looser limits, and the limit is measured against uncompressed data.
This is not a theoretical guideline. It is a hard cutoff. Once the limit is reached, the fetch stops and only the retrieved portion is considered for indexing and rendering.
The change in practical terms
The important detail is that the 2 MB limit applies per fetched resource, not per page.
That means:
- The HTML document has its own 2 MB limit
- Each JavaScript file has its own 2 MB limit
- Each CSS file has its own 2 MB limit
- Each referenced asset is fetched independently
Google does not pool these together. There is no overall page allowance. Each request is bounded on its own.
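Because each resource is bounded independently, a per-resource audit is more useful than a total page-weight number. The sketch below flags oversized resources from a map of URLs to uncompressed byte sizes; the byte value of "2 MB" is an assumption (Google does not publish the exact internal figure), and the sample page data is invented for illustration.

```python
# LIMIT assumes "2 MB" means 2 * 1024 * 1024 bytes; the exact figure
# Google enforces internally is not published, so treat it as a heuristic.
LIMIT = 2 * 1024 * 1024

def oversized_resources(resources: dict[str, int], limit: int = LIMIT) -> dict[str, int]:
    """Given {url: uncompressed_size_in_bytes}, return only the entries over the limit."""
    return {url: size for url, size in resources.items() if size > limit}

# Hypothetical page: three independently fetched resources.
page = {
    "/index.html": 180_000,    # well under budget
    "/bundle.js": 2_400_000,   # over budget: the crawler sees a partial file
    "/styles.css": 90_000,
}
print(oversized_resources(page))  # only /bundle.js is flagged
```

Note that the sizes are checked one by one; summing them, as a page-weight report would, tells you nothing about whether any single fetch gets cut off.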
Why this matters more than it sounds
Many modern sites transfer ten to twenty megabytes per page without issue. That alone is not a problem. What matters is the size of individual resources and where important content appears.
If the HTML exceeds 2 MB, anything beyond that point is invisible to Google. If a JavaScript bundle exceeds 2 MB, Google executes only a partial file. Critical rendering logic, injected content, or internal links may never be reached.
Compression does not help here. The limit applies after decompression. A heavily compressed file can still exceed the cutoff once inflated.
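The point is easy to demonstrate: highly repetitive markup compresses extremely well, so a response that travels far under 2 MB on the wire can still overshoot the limit once inflated. A self-contained illustration with a synthetic payload (again assuming 2 MB means 2 × 1024 × 1024 bytes):

```python
import gzip

LIMIT = 2 * 1024 * 1024  # assumed interpretation of "2 MB"

# Synthetic, highly repetitive HTML: ~3.3 MB uncompressed.
payload = b"<li><a href='/item'>item</a></li>" * 100_000

compressed = gzip.compress(payload)

print(f"on the wire:  {len(compressed):,} bytes")  # tiny, well under the limit
print(f"decompressed: {len(payload):,} bytes")     # over the limit
print("subject to truncation:", len(payload) > LIMIT)
```

The transfer would look harmless in a network waterfall, but it is the decompressed size that gets counted against the budget.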
The role of access logs
In most setups, this cutoff is observable. When Googlebot reaches its internal limit, it closes the connection and the server stops sending data, so the truncated transfer leaves a trace in the logs.
With proper access logging, you can see:
- The exact number of bytes actually sent
- Requests that terminate early
- Responses that cluster around the 2 MB boundary
This moves the discussion from documentation to evidence. You can see what Google actually consumed, not what the page was meant to deliver.
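As a sketch of what that analysis can look like, here is a small parser for Apache/Nginx combined-format log lines that pulls out Googlebot requests whose bytes-sent field sits near the per-resource limit. The sample log lines, the user-agent match, and the "near the boundary" threshold are all illustrative assumptions, not a prescribed method:

```python
import re

# Combined log format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<ua>[^"]*)"'
)

LIMIT = 2 * 1024 * 1024  # assumed interpretation of "2 MB"

def googlebot_truncation_candidates(lines, threshold=0.95):
    """Yield (path, bytes_sent) for Googlebot hits whose response size
    clusters near the per-resource limit."""
    for line in lines:
        m = LOG_RE.match(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        sent = m.group("bytes")
        if sent == "-":  # no body sent
            continue
        sent = int(sent)
        if sent >= LIMIT * threshold:
            yield m.group("path"), sent

# Hypothetical log lines for illustration.
sample = [
    '66.249.66.1 - - [10/May/2025:13:55:36 +0000] "GET /bundle.js HTTP/1.1" 200 2097152 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/May/2025:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 48210 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/May/2025:13:55:41 +0000] "GET /bundle.js HTTP/1.1" 200 2097152 "-" "Mozilla/5.0"',
]
for path, sent in googlebot_truncation_candidates(sample):
    print(path, sent)  # only the Googlebot hit at the 2 MB boundary
```

Responses that repeatedly land at the same ceiling across many URLs are the strongest signal: real page sizes vary, while a crawler-imposed cutoff produces a suspiciously consistent number.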
How Scrubnet aligns with this reality
Scrubnet was built for constrained, deterministic crawlers. Not browsers. Not humans. Agents.
Every Scrubnet feed is:
- Compact and predictable in size
- Clean HTML with no rendering dependency
- Free from oversized scripts and layout noise
- Designed to be fully consumable in a single fetch
This ensures that crawlers never hit byte ceilings before reaching meaningful content. What is published is what is read.
What this signals about the future
Google’s clarification is not a regression. It is an admission. Crawlers need firm limits to operate at scale. Rendering everything is expensive. Guessing intent is risky.
The web is splitting into layers. A human web full of interaction and presentation. A machine web focused on clarity, structure, and bounded cost.
Scrubnet exists in the second layer.
Key takeaways
- The 2 MB limit applies per resource, not per page
- Oversized HTML and JS can silently lose content
- Access logs reveal real crawl behaviour
- Machine-first publishing reduces uncertainty
- Clean feeds are future-proof by design
Build for crawlers, not assumptions
Scrubnet helps brands publish content that bots can fully consume, trust, and reuse. No truncation. No guesswork. No rendering gamble.