⭐️⭐️ Googlebot Will Crawl and Index The First 15MB of Content Per Page

RealDaddy

Source - Search Engine Land

Google has updated its help document on Googlebot to specify that Googlebot will crawl up to the first 15MB of the page and then stop. So if you want to ensure that Google ranks your page appropriately, make sure the content you want crawled and indexed sits within the first 15MB of the page.

What is new. In the Googlebot help document (https://developers.google.com/search/docs/advanced/crawling/googlebot), Google added this section, which reads:

Googlebot can crawl the first 15MB of content in an HTML file or supported text-based file (https://support.google.com/webmasters/answer/35287). After the first 15MB of the file, Googlebot stops crawling and only considers the first 15MB of content for indexing.

- In general, you probably want to keep your pages pretty light for both users and search engine crawlers. But here Google is being very clear about how much Googlebot will consume from your page.

- A good way to test this is to use the URL Inspection tool in Google Search Console and see what parts of the page Google renders and sees within the debugging tool.
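Before reaching for Search Console, a quick local check can show whether a piece of content you care about sits inside the first 15MB of the raw HTML. The sketch below is only an illustration: the function name `content_within_limit` is made up for this example, and treating "15MB" as 15 * 1024 * 1024 bytes is an assumption, since the quoted help text does not give an exact byte count.

```python
# Assumption: "15MB" means 15 * 1024 * 1024 bytes. The quoted help text
# does not state the exact byte count, so adjust this if needed.
LIMIT_BYTES = 15 * 1024 * 1024


def content_within_limit(html: bytes, snippet: str, limit: int = LIMIT_BYTES) -> bool:
    """Return True if `snippet` ends at or before byte offset `limit` in `html`."""
    needle = snippet.encode("utf-8")
    start = html.find(needle)
    if start == -1:
        raise ValueError("snippet not found in HTML")
    return start + len(needle) <= limit


if __name__ == "__main__":
    # Tiny stand-in page: some filler followed by the paragraph we care about.
    padding = b"<!-- filler -->" * 10
    page = b"<html><body>" + padding + b"<p>Important paragraph</p></body></html>"
    print(content_within_limit(page, "Important paragraph"))            # True
    print(content_within_limit(page, "Important paragraph", limit=50))  # False
```

This only looks at byte offsets in the HTML you feed it. It says nothing about what Google actually renders, so the URL Inspection tool is still the real check.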
 
Thanks for this valuable information
 
I don't trust them. Because of the complexity of the algorithm, even Google's developers can't fully understand how it functions. In that case, we can't blindly believe their claims.
 
This can easily be tested. Make a few 14-20 MB pages and see how much data Google fetches by checking the logs. If it is more than 15 MB, then Google lied.
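A rough sketch of how that test could be set up, assuming a server that writes the common "combined" access log format. Everything named here (`build_test_page`, `write_pages`, `googlebot_bytes`, the `zq15test` marker prefix) is hypothetical. Each page gets a unique marker just before every MB boundary, so you can later search for the markers as well as read the byte counts in the logs.

```python
import re
from pathlib import Path

MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def build_test_page(size_mb: int, tag: str = "zq15test") -> bytes:
    """Build an HTML page of roughly `size_mb` MB with one marker per MB."""
    head = b"<!doctype html><html><head><title>size test</title></head><body>\n"
    tail = b"</body></html>\n"
    filler = b"<p>lorem ipsum dolor sit amet</p>\n"
    parts = [head]
    size = len(head)
    for mb in range(1, size_mb + 1):
        # Pad with filler so the marker ends just before the mb-th MB boundary.
        marker = f"<p>{tag}-{size_mb}mb-marker-{mb:02d}</p>\n".encode()
        target = mb * MB - len(marker)
        repeats = max(0, (target - size) // len(filler))
        parts.append(filler * repeats)
        parts.append(marker)
        size += repeats * len(filler) + len(marker)
    parts.append(tail)
    return b"".join(parts)


def write_pages(out_dir: Path, sizes_mb) -> list:
    """Write one test page per requested size and return the file paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for size_mb in sizes_mb:
        path = out_dir / f"test-{size_mb}mb.html"
        path.write_bytes(build_test_page(size_mb))
        paths.append(path)
    return paths


# Matches the request, status, bytes sent, referrer and user agent fields
# of a combined-format access log line (an assumption about your server).
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"')


def googlebot_bytes(log_lines):
    """Yield (path, bytes_sent) for requests whose user agent mentions Googlebot."""
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group(4):
            sent = 0 if match.group(3) == "-" else int(match.group(3))
            yield match.group(1), sent
```

Calling `write_pages(Path("out"), range(14, 21))` would produce the 14 to 20 MB files. Keep in mind that the byte count in the log is what the server sent (possibly compressed), and that the user agent string can be spoofed, so treat the numbers as a hint rather than proof.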
 
I think they are lying (without even testing it, I just know these people) :D
 
Just open any big news website page and check, and voilà, their lie is caught. They just want to scare people; don't fall for the trap.
 

No, this 15 MB does not include images, ads, JS, CSS etc. This 15 MB is applicable only to the core HTML.

They do mention this:

Any resources referenced in the HTML such as images, videos, CSS, and JavaScript are fetched separately.
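To make the distinction concrete, here is a small sketch that measures only the HTML document and lists the resources it references, which per the quoted text are fetched separately. `ResourceCollector` and `html_only_size` are names invented for this example, and the set of tag/attribute pairs is a simplification, not Googlebot's actual logic.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect URLs of resources that are referenced by, not part of, the HTML."""

    # (tag, attribute) pairs treated as separately fetched resources in this
    # sketch. Note that <link href> also covers non-stylesheet links.
    RESOURCE_ATTRS = {
        ("img", "src"), ("script", "src"), ("link", "href"),
        ("video", "src"), ("source", "src"), ("iframe", "src"),
    }

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in self.RESOURCE_ATTRS:
                self.resources.append(value)


def html_only_size(html: str):
    """Return (size of the HTML itself in bytes, list of referenced resource URLs)."""
    collector = ResourceCollector()
    collector.feed(html)
    return len(html.encode("utf-8")), collector.resources


if __name__ == "__main__":
    sample = (
        '<html><head><link rel="stylesheet" href="style.css">'
        '<script src="app.js"></script></head>'
        '<body><img src="huge-photo.jpg"><p>Hello</p></body></html>'
    )
    size, resources = html_only_size(sample)
    print(size, "bytes of HTML;", len(resources), "resources fetched separately")
```

However large `huge-photo.jpg` is, it does not add to the HTML byte count here, which is why a heavy news page can still be far below 15MB of HTML.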
 
Googlebot will crawl up to the first 15MB of the page and then stop
It's rare for a single web page to reach 15MB if the content is mostly text and images. A 15MB web page is really heavy.

I also don't see the difference it would make for most websites.
 
15MB pure HTML is a lot IMO
 
In other words, have a lightweight, speedy site with good internal linking to maximize whatever crawl budget they allocate to each site.


If your target pages aren't getting indexed, then perhaps add more direct internal links to them from the pages that get crawled most frequently.
 
Pages of 15MB pure code???

Smells like poop to me.
 