[Easy Trick] How to Bulk Check if URLs are Indexed in Google Sheets

aidenhera

BANNED
Joined
Nov 30, 2016
Messages
3,059
Reaction score
913
I messed with it some and came up with the code below. It mostly works, but is pretty buggy.

If you can fix the bugs and make a few adjustments, it should work.

Code:
function checkIfPageIsIndexed(url)
{
   if (!url)
     return "N/A";

   url = "https://www.google.com/search?q=site:"+url;
   var options = {
     'muteHttpExceptions': true,
     'followRedirects': false
   };
   var response = UrlFetchApp.fetch(url, options);
   var html     = response.getContentText();

   if ( html.match(/Your search -.*- did not match any documents./) )
     return "URL is Not Indexed";

   // <div id="resultStats">1 result<nobr> (0.21 seconds)&nbsp;</nobr></div>
   // <div id="resultStats">About 112,000,000 results<nobr> (0.22 seconds)&nbsp;</nobr></div>
   var num_pages_indexed = false;
   if ( num_pages_indexed = html.match(/id="resultStats">.*result/) )
   {
     num_pages_indexed = num_pages_indexed[0] + "(s)";
     num_pages_indexed = num_pages_indexed.replace("id=\"resultStats\">","");
     // num_pages_indexed = num_pages_indexed.replace("<","");
   }
   else
     num_pages_indexed = "ERROR";
   return "URL is Indexed and returned "+num_pages_indexed;
}

Example Output
View attachment 100611


I have my code already :) took me half a day, because I dont know javascript nor google apps script. but yours looks more smooth.


Code:
function checkIfPageIsIndexed(url)
{
  url = "https://www.google.com/search?q="+url;
   var options = {
     'muteHttpExceptions': true,
     'followRedirects': false
   };
   var response = UrlFetchApp.fetch(url, options);
   var html     = response.getContentText();

  var arr = html.match(/resultStats">.*? result.../);
  return(arr);

}

it simply takes html and returns regex.

its based on your indexing checking code. I noticed that sometimes your code shows "INDEXED" if the string is not found, but sometimes the search is probably not performed because it showed me indexed many times while it was not.

Same with the edition I made. Sometimes it wont show anything, even if it should be there. Maybe google blocks searches from google sheets (just as it blocks from importxml function).

What bugs did you mean? Ill try your code
 

MatthewGraham

BANNED
Joined
Oct 6, 2015
Messages
1,759
Reaction score
2,753
Website
www.youtube.com
its based on your indexing checking code. I noticed that sometimes your code shows "INDEXED" if the string is not found, but sometimes the search is probably not performed because it showed me indexed many times while it was not.
What bugs did you mean? Ill try your code

Yes, that's pretty much the same bug I was referring to. Needs better error handling or similar to deal with that case or retry. The code that was mentioned by the user in the quote below might have better error handling in that regard. Might be worth checking out:
  • there is sheet already created by Greenlane called Greenlane Indexation Tester and its working too
  • https://docs.google.com/spreadsheets/d/1AYAMWqIj6F1DRe1iwLmpVANGwiXUHRHgcYDAsG0GF_Q/copy?usp=sharing
 

aidenhera

BANNED
Joined
Nov 30, 2016
Messages
3,059
Reaction score
913
Yes, that's pretty much the same bug I was referring to. Needs better error handling or similar to deal with that case or retry. The code that was mentioned by the user in the quote below might have better error handling in that regard. Might be worth checking out:
  • there is sheet already created by Greenlane called Greenlane Indexation Tester and its working too
  • https://docs.google.com/spreadsheets/d/1AYAMWqIj6F1DRe1iwLmpVANGwiXUHRHgcYDAsG0GF_Q/copy?usp=sharing
I did already, although this code looks big and really complicated. and its also much slower than your code. But I will have a look again. thanx again you saved my life with this thread : - )


theres a lot of fancy things to do within google sheets. i consider learning javascript and google apps script
 

MatthewGraham

BANNED
Joined
Oct 6, 2015
Messages
1,759
Reaction score
2,753
Website
www.youtube.com
This should be consistently working for checking the number of indexed results for a query (not a URL, but you can make a slight adjustment either to 4th line of that function or just have your input cell include the "site:" directly).

Works great to help determine keyword difficulty. Might bastardize it to check for inurl and intitle results as well.

Code:
function checkNumResults(query)
{
   if (!query)
     return "N/A";
   
   var url = "https://www.google.com/search?q="+query;
   var options = {
     'muteHttpExceptions': true,
     'followRedirects': false
   };
   var response = UrlFetchApp.fetch(url, options);
   var html     = response.getContentText();

   if ( html.match(/Your search -.*- did not match any documents./) )
     return "URL is Not Indexed";

   // <div id="resultStats">1 result<nobr> (0.21 seconds)&nbsp;</nobr></div>
   // <div id="resultStats">About 112,000,000 results<nobr> (0.22 seconds)&nbsp;</nobr></div>
   var num_pages_indexed = false;
   if ( num_pages_indexed = html.match(/id="resultStats">.*result/) ) 
   {
     num_pages_indexed = num_pages_indexed[0] + "(s)";
     num_pages_indexed = num_pages_indexed.replace("id=\"resultStats\">","");
     // num_pages_indexed = num_pages_indexed.replace("<","");

     // Remove all after "result"
     num_pages_indexed = num_pages_indexed.replace(/result.*/,""); 

     // Remove non-numeric from num_pages_indexed
     num_pages_indexed = num_pages_indexed.replace(/\D/g,'');
   }
   else
     num_pages_indexed = "ERROR";
   
   return parseInt(num_pages_indexed);
}
 

RayChand

Elite Member
Joined
Aug 3, 2010
Messages
1,530
Reaction score
546
Nice tips, but since now it is on BHW very soon it is going to be removed from google
 

Javicuse

Junior Member
Joined
Jan 26, 2018
Messages
139
Reaction score
123

MatthewGraham

BANNED
Joined
Oct 6, 2015
Messages
1,759
Reaction score
2,753
Website
www.youtube.com
Check out this sweet-ass alternative that is substantially easier to use with way fewer bugs.

Copy-paste this into the script editor under "Tools --> Script Editor":
Code:
function getStatusCode(url){
   var options = {
     'muteHttpExceptions': true,
     'followRedirects': true
   };
   var response = UrlFetchApp.fetch(url, options);
   return response.getResponseCode();
}
Call this function to check if a URL is indexed (where A2 contains the URL that you want to check):

  • =getStatusCode("https://webcache.googleusercontent.com/search?q=cache%3A"&A2)

Code 200 = Page has been cached by Google (and indexed)
Code 404 = Page has not been cached by Google (and is not indexed)
Code 503 = Google's cache page timed out or otherwise didn't respond. Page or may not be cached. Wait or delete and redrag the formula and generally fixes itself.
 

itmark

Junior Member
Joined
Aug 4, 2008
Messages
190
Reaction score
35
Age
48
Thanks this is very helpfull script.

Anyone know how to extract the title of the first found indexed url? Does not matter if its a sub site.

I would be very grateful...

Like this: http://prntscr.com/j9w1j9
 

terrycody

Jr. VIP
Jr. VIP
Joined
Sep 29, 2012
Messages
5,168
Reaction score
2,489
Sorry to bump, I bookmarked this months ago and now I need this lol.

I use several hours to figure out how to use this, watch a video, bla bla, no idea if I did the right thing, but here's what I did:


first paste the URLs I want to check
at first URL right column, I type "=" then I input the function name (in this example, the function name is "checkIfPageIsIndexed()" , at this step, I use mouse to click left column (first URL), and then press enter

it worked and showed indexed!

And I use old school trick to drag first column to the bottom of all my URLs, then some URL showed indexed, while some showed noindexed, but as others said, there are some bugs when running, sometimes it shows right results, sometimes not, no idea how to overcome this.

I also tried others said URL and it worked like a charm, but apparently, that one is much more complicated for me.

https://docs.google.com/spreadsheet...rKH0TW-TLKju4qSk2r1ToOEVc/edit#gid=1274402483

At the last, thanks for sharing good things to us, I bookmarked all your valuable threads :D

Update: sorry I tried this too, seems still got bugs?! all show 200, while some pages not indexed for sure....
 
Last edited:

hajroboss

Junior Member
Joined
Aug 11, 2015
Messages
122
Reaction score
44
This is amazing, will need it very soon to check about 300k links when I publish my website. :D Thanks
 

mmulder1985

Regular Member
Joined
Dec 16, 2008
Messages
211
Reaction score
45
You can scrape a lot of things with Google Sheets -- here's one of many things you can do with this. Script searches for "site:domain.com" in Google to check if there are indexed results for the input URL.

No need to run proxies since Google is scraping on your behalf and then returning the data to you.

This base code can be modified to scrape many other things as well. Working well for Reddit, as well as a number of other websites.

Step #1: Open the Script Editor in Google Sheets
View attachment 100448
Go to Tools --> Script Editor... to open the script editor.

Step #2: Copy-Paste Code into Script Editor
Make sure to save the file with Ctrl+S. The code to copy-paste is below.

Code:
function checkIfPageIsIndexed(url)
{
   url = "https://www.google.com/search?q=site:"+url;
   var options = {
     'muteHttpExceptions': true,
     'followRedirects': false
   };
   var response = UrlFetchApp.fetch(url, options);
   var html     = response.getContentText();

   if ( html.match(/Your search -.*- did not match any documents./) )
     return "URL is Not Indexed";
   return "URL is Indexed";
}

View attachment 100449
The file should look like the above once the code is inserted.

Step #3: Run the Function in Google Sheets

View attachment 100450
The function that was created in step #2 (checkIfPageIsIndexed(url)) can be run the same as any other spreadsheet function.


Step #4: Use the Output
Bonus: Use conditional formatting to add red/green highlight to cells depending on whether or not the input URL is indexed.

View attachment 100451
Your output should look something like this.

I tried this but it always gives me back that it is indexed. When I debug it returns me that it is not indexed but when I do it directly from the function in google sheets it always gives back that it is indexed even if I fill in rubish.
 

rideronjourney

Regular Member
Joined
Jul 3, 2016
Messages
223
Reaction score
72
I have tried OP method and verified it with results, It does NOT show accurate results.

Some urls are not indexed and the excel function shows it is indexed.

You can verify results by two ways:

#1 - Manual search for that url in Google, If the Google listing shows with Title, Description and Same url, then it is indexed.
#2 - Login to Google Webmaster Tools, Go to Status > Index Coverage, Click the url you want to check and on right hand side new box will appear, From the new box options select "Inspect URL".

Note : Above Steps are for new GWT layout not old one.
 

moneyflower

Power Member
Joined
Jan 22, 2016
Messages
594
Reaction score
197
Website
vcc.is
Sorry to bump, I bookmarked this months ago and now I need this lol.


Update: sorry I tried this too, seems still got bugs?! all show 200, while some pages not indexed for sure....

Were you able to find any fix for the bug?
 

terrycody

Jr. VIP
Jr. VIP
Joined
Sep 29, 2012
Messages
5,168
Reaction score
2,489
Were you able to find any fix for the bug?

I didnt use that thing for a long time, but I guess the bug is still exist, for some weird reasons, I dont care about the index anymore, because if you write good contents, your articles will finally indexed. Try to sumbit your sitemap to search console, it helps a bit.
 

moneyflower

Power Member
Joined
Jan 22, 2016
Messages
594
Reaction score
197
Website
vcc.is
I didnt use that thing for a long time, but I guess the bug is still exist, for some weird reasons, I dont care about the index anymore, because if you write good contents, your articles will finally indexed. Try to sumbit your sitemap to search console, it helps a bit.

Thanks for your reply. I wanted to use it to check for some web 2.0 I ordered. Search console might not really be possible in this case.
 
Top