[Easy Trick] How to Bulk Check if URLs are Indexed in Google Sheets

MatthewGraham · Feb 6, 2018

You can scrape a lot of things with Google Sheets -- here's one of many things you can do with this. Script searches for "site:domain.com" in Google to check if there are indexed results for the input URL.

No need to run proxies since Google is scraping on your behalf and then returning the data to you.

This base code can be modified to scrape many other things as well. Working well for Reddit, as well as a number of other websites.

Step #1: Open the Script Editor in Google Sheets

Go to Tools --> Script Editor... to open the script editor.

Step #2: Copy-Paste Code into Script Editor
Make sure to save the file with Ctrl+S. The code to copy-paste is below.

Code:

function checkIfPageIsIndexed(url)
{
   url = "https://www.google.com/search?q=site:"+url;
   var options = {
     'muteHttpExceptions': true,
     'followRedirects': false
   };
   var response = UrlFetchApp.fetch(url, options);
   var html     = response.getContentText();

   if ( html.match(/Your search -.*- did not match any documents./) )
     return "URL is Not Indexed";
   return "URL is Indexed";
}

The file should look like the above once the code is inserted.

Step #3: Run the Function in Google Sheets

The function that was created in step #2 (checkIfPageIsIndexed(url)) can be run the same as any other spreadsheet function.

Step #4: Use the Output
Bonus: Use conditional formatting to add red/green highlight to cells depending on whether or not the input URL is indexed.

Your output should look something like this.

Snoop Dawg · Feb 6, 2018

Really helpful tip, by a real practitioner and not some arm chair SEO

Good job

fasttrak · Feb 6, 2018

I do not bookmark a lot of posts but that is superb. Thank you.

Phiman · Feb 6, 2018

That's so cool! Thanks OP!

KaiserTom · Feb 7, 2018

That's really useful. Thanks!

ScribScribScrib · Feb 7, 2018

MatthewGraham said:
You can scrape a lot of things with Google Sheets -- here's one of many things you can do with this. Script searches for "site:domain.com" in Google to check if there are indexed results for the input URL.

No need to run proxies since Google is scraping on your behalf and then returning the data to you.

This base code can be modified to scrape many other things as well. Working well for Reddit, as well as a number of other websites.

Step #1: Open the Script Editor in Google Sheets

View attachment 100448
Go to Tools --> Script Editor... to open the script editor.

Step #2: Copy-Paste Code into Script Editor
Make sure to save the file with Ctrl+S. The code to copy-paste is below.

Code:

function checkIfPageIsIndexed(url) { url = "https://www.google.com/search?q=site:"+url; var options = { 'muteHttpExceptions': true, 'followRedirects': false }; var response = UrlFetchApp.fetch(url, options); var html = response.getContentText(); if ( html.match(/Your search -.*- did not match any documents./) ) return "URL is Not Indexed"; return "URL is Indexed"; }

View attachment 100449
The file should look like the above once the code is inserted.

Step #3: Run the Function in Google Sheets

View attachment 100450
The function that was created in step #2 (checkIfPageIsIndexed(url)) can be run the same as any other spreadsheet function.

Step #4: Use the Output
Bonus: Use conditional formatting to add red/green highlight to cells depending on whether or not the input URL is indexed.

View attachment 100451
Your output should look something like this.

Cool shit bro, could you do me the favor and explain the code to me?
Like I understand what it does but I want to know every little part contributes to the whole, you know what I mean

Especially this part

var options = {
'muteHttpExceptions': true,
'followRedirects': false
};
var response = UrlFetchApp.fetch(url, options);
var html = response.getContentText();

if ( html.match(/Your search -.*- did not match any documents./) )

And one more thing, how would anyone go about starting to learn stuff like this, any books/articles you would personally recommend?

terrycody · Feb 7, 2018

Damn I am wondering how much your brain can contain a lot of such weird but useful magics?!

710fla · Feb 7, 2018

Thanks man, really appreciate it! Would have never discovered this trick

MatthewGraham · Feb 7, 2018

ScribScribScrib said:
Cool shit bro, could you do me the favor and explain the code to me?
Like I understand what it does but I want to know every little part contributes to the whole, you know what I mean

Especially this part

var options = {
'muteHttpExceptions': true,
'followRedirects': false
};
var response = UrlFetchApp.fetch(url, options);
var html = response.getContentText();

if ( html.match(/Your search -.*- did not match any documents./) )

And one more thing, how would anyone go about starting to learn stuff like this, any books/articles you would personally recommend?

Added a few comments explaining the details below. You might also check out the official Google Apps Script documention here:
https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app

Also, Google Apps scripts use Javascript, so any JS documentation applies to Google Apps scripts as well. This script is a bastardization of a different script that I used to bulk check page codes / bulk check whether pages return 404. And that script is a bastardization of some other person's similar script, which was a bastardization of the official Google Apps documentation, which was a bastardization of some JS documentation, which was a bastardization of... probably some Stack Exchange threads, which were in and of themselves bastardizations of... something, which was a bastardization of...

To answer the question about recommended reading / ways to learn programming/scripting, the best way is to decide what you want to do and then Google search your issue until you find some code that you can hack together into something usable. Start by building scripts and then figure out how to actually make them work as you go.

Code:

function checkIfPageIsIndexed(url)
{
   url = "https://www.google.com/search?q=site:"+url;

   // The options parameter for the function UrlFetchApp.fetch() is an optional parameter.
   // This whole section could probably be removed now that I think about it. Left it in since it was needed for a different similar script to bulk check page code.
   var options = {
   // if this is set to true, the fetch will not throw an exception if the response code
   // indicates failure, and will instead return the HTTPResponse (default: false).
   // This could probably be omitted from this script. IIRC, I originally put this in for a
   // similar script to bulk check page code (404/200/403/other/etc), which would
   // throw an error if the page failed to load with a 200 code.
    'muteHttpExceptions': true,
   // if this is set to false, the fetch not automatically follow HTTP redirects;
   // it will return the original HTTP response. The default is true. E.g., if you load an
   // http page, but the page 301s to an https version, this determines whether to follow
   // the 301 or to return the original page's data.
    'followRedirects': false
   };

   // Makes a request to fetch a URL.
   var response = UrlFetchApp.fetch(url, options);

   // From the HTML that has been retreived, gets the content of the HTTP response encoded as a string.
   // https://developers.google.com/apps-script/reference/url-fetch/http-response#getContentText()
   var html     = response.getContentText();

   // Now that the HTML is encoded as a string, it can be searched using a regex
   // https://en.wikipedia.org/wiki/Regular_expression
   // html.match() searches for all sections of the the html string that match the pattern inserted
   // If a site:url.com search returns no results, Google will instead return an error that looks like the image below this code
   // It should be something like "Your search -" + [INSERT YOUR SEARCH HERE] + "- did not match any documents"
   // the "." indicates any character in a regex.
   // A regular expression followed by an asterisk (*) matches zero or more occurrences of the regular expression. If there is any choice, the first matching string in a line is used.
   // http://users.cs.cf.ac.uk/Dave.Marshall/Internet/NEWS/regexp.html
   // .* therefore means up to unlimited of any character
   // So this regex returns matches for any page that includse "Your search -" followed by up to unlimited arbitrary characters followed by "- did not match any documents."
   // Which will evaluate as being true for Google searches that did not yield any results.
   if ( html.match(/Your search -.*- did not match any documents./) )
    return "URL is Not Indexed";
   return "URL is Indexed";
}

Bonus: The Script to Bulk Check URL Status Code / Bulk Check 404 Errors that this Script was Bastardized From

Code:

// https://medium.com/@the.benhawy/how-to-use-google-spreadsheets-to-check-for-broken-links-1bb0b35c8525
function getStatusCode(url){
   var options = {
     'muteHttpExceptions': true,
     'followRedirects': false
   };
   var response = UrlFetchApp.fetch(url, options);
   return response.getResponseCode();
}

ScribScribScrib · Feb 8, 2018

MatthewGraham said:
Added a few comments explaining the details below. You might also check out the official Google Apps Script documention here:
https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app

Also, Google Apps scripts use Javascript, so any JS documentation applies to Google Apps scripts as well. This script is a bastardization of a different script that I used to bulk check page codes / bulk check whether pages return 404. And that script is a bastardization of some other person's similar script, which was a bastardization of the official Google Apps documentation, which was a bastardization of some JS documentation, which was a bastardization of... probably some Stack Exchange threads, which were in and of themselves bastardizations of... something, which was a bastardization of...

To answer the question about recommended reading / ways to learn programming/scripting, the best way is to decide what you want to do and then Google search your issue until you find some code that you can hack together into something usable. Start by building scripts and then figure out how to actually make them work as you go.

Code:

function checkIfPageIsIndexed(url) { url = "https://www.google.com/search?q=site:"+url; // The options parameter for the function UrlFetchApp.fetch() is an optional parameter. // This whole section could probably be removed now that I think about it. Left it in since it was needed for a different similar script to bulk check page code. var options = { // if this is set to true, the fetch will not throw an exception if the response code // indicates failure, and will instead return the HTTPResponse (default: false). // This could probably be omitted from this script. IIRC, I originally put this in for a // similar script to bulk check page code (404/200/403/other/etc), which would // throw an error if the page failed to load with a 200 code. 'muteHttpExceptions': true, // if this is set to false, the fetch not automatically follow HTTP redirects; // it will return the original HTTP response. The default is true. E.g., if you load an // http page, but the page 301s to an https version, this determines whether to follow // the 301 or to return the original page's data. 'followRedirects': false }; // Makes a request to fetch a URL. var response = UrlFetchApp.fetch(url, options); // From the HTML that has been retreived, gets the content of the HTTP response encoded as a string. // https://developers.google.com/apps-script/reference/url-fetch/http-response#getContentText() var html = response.getContentText(); // Now that the HTML is encoded as a string, it can be searched using a regex // https://en.wikipedia.org/wiki/Regular_expression // html.match() searches for all sections of the the html string that match the pattern inserted // If a site:url.com search returns no results, Google will instead return an error that looks like the image below this code // It should be something like "Your search -" + [INSERT YOUR SEARCH HERE] + "- did not match any documents" // the "." indicates any character in a regex. // A regular expression followed by an asterisk (*) matches zero or more occurrences of the regular expression. If there is any choice, the first matching string in a line is used. // http://users.cs.cf.ac.uk/Dave.Marshall/Internet/NEWS/regexp.html // .* therefore means up to unlimited of any character // So this regex returns matches for any page that includse "Your search -" followed by up to unlimited arbitrary characters followed by "- did not match any documents." // Which will evaluate as being true for Google searches that did not yield any results. if ( html.match(/Your search -.*- did not match any documents./) ) return "URL is Not Indexed"; return "URL is Indexed"; }

View attachment 100470

Bonus: The Script to Bulk Check URL Status Code / Bulk Check 404 Errors that this Script was Bastardized From

Code:

// https://medium.com/@the.benhawy/how-to-use-google-spreadsheets-to-check-for-broken-links-1bb0b35c8525 function getStatusCode(url){ var options = { 'muteHttpExceptions': true, 'followRedirects': false }; var response = UrlFetchApp.fetch(url, options); return response.getResponseCode(); }

wow man this is all so very cool, i had to read it like 10 times to understand anything

''Start by building scripts and then figure out how to actually make them work as you go.''

there's one script I would like to build and it could be quite useful, could you help me out with it?
I mean the moment I share the code I already have, you'll know what's up so I guess you'll get a method for free?
I'll pm you if you're fine with it.

MatthewGraham · Feb 8, 2018

ScribScribScrib said:
there's one script I would like to build and it could be quite useful, could you help me out with it?
I mean the moment I share the code I already have, you'll know what's up so I guess you'll get a method for free?
I'll pm you if you're fine with it.

Sounds good to me. Feel free to send a PM with details!

buybdl · Feb 8, 2018

Bookmarked!!!

picapiedra18 · Feb 11, 2018

Mind doing a little video how to run this script for the not so savvy guys in here.

Thanks

ThatSEO · Feb 11, 2018

If we all did this at the same time, we could DDOS their own servers lol

bArAsc · Feb 11, 2018

there is sheet already created by Greenlane called Greenlane Indexation Tester and its working too https://docs.google.com/spreadsheets/d/1AYAMWqIj6F1DRe1iwLmpVANGwiXUHRHgcYDAsG0GF_Q/copy?usp=sharing

theRevolt · Feb 11, 2018

picapiedra18 said:
Mind doing a little video how to run this script for the not so savvy guys in here.

Thanks

You can also use these very simple solution using importxml, just paste this into B1, in a1 write your domain .

=IMPORTXML(CONCAT("https://www.google.com/search?q=site:",A1),"//div[@id='resultStats']")

A1 = yoursite.com
B1 , paste formula above
C1 will show you either blank cell or X results, to actually show you how m any URLs are being indexed

simple and no scripts needed

illuminateme · Feb 11, 2018

@MatthewGraham @bArAsc thank you! no more wasted proxies!

PennyAHREF · Feb 11, 2018

Damn good man. this will make the job much easier

aidenhera · Feb 11, 2018

MatthewGraham said:
You can scrape a lot of things with Google Sheets -- here's one of many things you can do with this. Script searches for "site:domain.com" in Google to check if there are indexed results for the input URL.

No need to run proxies since Google is scraping on your behalf and then returning the data to you.

This base code can be modified to scrape many other things as well. Working well for Reddit, as well as a number of other websites.

Step #1: Open the Script Editor in Google Sheets

View attachment 100448
Go to Tools --> Script Editor... to open the script editor.

Step #2: Copy-Paste Code into Script Editor
Make sure to save the file with Ctrl+S. The code to copy-paste is below.

Code:

function checkIfPageIsIndexed(url) { url = "https://www.google.com/search?q=site:"+url; var options = { 'muteHttpExceptions': true, 'followRedirects': false }; var response = UrlFetchApp.fetch(url, options); var html = response.getContentText(); if ( html.match(/Your search -.*- did not match any documents./) ) return "URL is Not Indexed"; return "URL is Indexed"; }

View attachment 100449
The file should look like the above once the code is inserted.

Step #3: Run the Function in Google Sheets

View attachment 100450
The function that was created in step #2 (checkIfPageIsIndexed(url)) can be run the same as any other spreadsheet function.

Step #4: Use the Output
Bonus: Use conditional formatting to add red/green highlight to cells depending on whether or not the input URL is indexed.

View attachment 100451
Your output should look something like this.

Great one! I never bothered checking what is indexed, I just blasted links to it and assumed it is. But this is great and quick tool to do it.

@MatthewGraham do you have an idea how to make the script check how many results are returned?

Like:

And put the 261,000,000 into excel? Would save my life.

MatthewGraham · Feb 12, 2018

aidenhera said:
Great one! I never bothered checking what is indexed, I just blasted links to it and assumed it is. But this is great and quick tool to do it.

@MatthewGraham do you have an idea how to make the script check how many results are returned?

Like:

And put the 261,000,000 into excel? Would save my life.

I messed with it some and came up with the code below. It mostly works, but is pretty buggy.

If you can fix the bugs and make a few adjustments, it should work.

Code:

function checkIfPageIsIndexed(url)
{
   if (!url)
     return "N/A";

   url = "https://www.google.com/search?q=site:"+url;
   var options = {
     'muteHttpExceptions': true,
     'followRedirects': false
   };
   var response = UrlFetchApp.fetch(url, options);
   var html     = response.getContentText();

   if ( html.match(/Your search -.*- did not match any documents./) )
     return "URL is Not Indexed";

   // <div id="resultStats">1 result<nobr> (0.21 seconds)&nbsp;</nobr></div>
   // <div id="resultStats">About 112,000,000 results<nobr> (0.22 seconds)&nbsp;</nobr></div>
   var num_pages_indexed = false;
   if ( num_pages_indexed = html.match(/id="resultStats">.*result/) ) 
   {
     num_pages_indexed = num_pages_indexed[0] + "(s)";
     num_pages_indexed = num_pages_indexed.replace("id=\"resultStats\">","");
     // num_pages_indexed = num_pages_indexed.replace("<","");
   }
   else
     num_pages_indexed = "ERROR";
   return "URL is Indexed and returned "+num_pages_indexed;
}

Example Output

[Easy Trick] How to Bulk Check if URLs are Indexed in Google Sheets

BANNED

Attachments

BANNED

Elite Member

Regular Member

Newbie

Elite Member

Elite Member

Elite Member

BANNED

Elite Member

BANNED

Registered Member

Power Member

Supreme Member

Regular Member

Jr Vip

Power Member

Newbie

BANNED

BANNED

Main Menu

Marketplace

Making Money

BlackHat World