[Easy Trick] How to Bulk Check if URLs are Indexed in Google Sheets

Discussion in 'White Hat SEO' started by MatthewGraham, Feb 6, 2018.

  1. MatthewGraham

    MatthewGraham Jr. VIP Jr. VIP

    Joined:
    Oct 6, 2015
    Messages:
    1,683
    Likes Received:
    2,226
    Gender:
    Male
    Occupation:
    Rolling Face on Keyboard
    Location:
    United States of America
    Home Page:
    You can scrape a lot of things with Google Sheets -- here's one of many things you can do with this. Script searches for "site:domain.com" in Google to check if there are indexed results for the input URL.

    No need to run proxies since Google is scraping on your behalf and then returning the data to you.

    This base code can be modified to scrape many other things as well. Working well for Reddit, as well as a number of other websites.

    Step #1: Open the Script Editor in Google Sheets
    step-1-script-editor.png
    Go to Tools --> Script Editor... to open the script editor.

    Step #2: Copy-Paste Code into Script Editor
    Make sure to save the file with Ctrl+S. The code to copy-paste is below.

    Code:
    function checkIfPageIsIndexed(url)
    {
       url = "https://www.google.com/search?q=site:"+url;
       var options = {
         'muteHttpExceptions': true,
         'followRedirects': false
       };
       var response = UrlFetchApp.fetch(url, options);
       var html     = response.getContentText();
    
       if ( html.match(/Your search -.*- did not match any documents./) )
         return "URL is Not Indexed";
       return "URL is Indexed";
    }
    
    step-2-insert-code.png
    The file should look like the above once the code is inserted.

    Step #3: Run the Function in Google Sheets

    step-3-input.png
    The function that was created in step #2 (checkIfPageIsIndexed(url)) can be run the same as any other spreadsheet function.


    Step #4: Use the Output
    Bonus: Use conditional formatting to add red/green highlight to cells depending on whether or not the input URL is indexed.

    step-4-output.png
    Your output should look something like this.

     
    • Thanks Thanks x 41
  2. Snoop Dawg

    Snoop Dawg BANNED BANNED

    Joined:
    Jan 31, 2018
    Messages:
    57
    Likes Received:
    36
    Really helpful tip, by a real practitioner and not some arm chair SEO

    Good job
     
    • Thanks Thanks x 1
  3. fasttrak

    fasttrak Jr. VIP Jr. VIP

    Joined:
    Sep 12, 2016
    Messages:
    988
    Likes Received:
    338
    Gender:
    Male
    I do not bookmark a lot of posts but that is superb. Thank you.
     
  4. Phiman

    Phiman Junior Member

    Joined:
    Oct 17, 2017
    Messages:
    191
    Likes Received:
    52
    Occupation:
    Fry cook at the Krusty Krab
    Location:
    Bikini Bottom
    That's so cool! Thanks OP!
     
  5. KaiserTom

    KaiserTom Newbie

    Joined:
    Feb 5, 2018
    Messages:
    1
    Likes Received:
    0
    That's really useful. Thanks!
     
  6. ScribScribScrib

    ScribScribScrib Jr. VIP Jr. VIP

    Joined:
    Mar 15, 2017
    Messages:
    1,170
    Likes Received:
    1,346
    Gender:
    Male
    Occupation:
    Grim Reaper
    Location:
    Your Home
    Home Page:
    Cool shit bro, could you do me the favor and explain the code to me?
    Like I understand what it does but I want to know every little part contributes to the whole, you know what I mean

    Especially this part

    var options = {
    'muteHttpExceptions': true,
    'followRedirects': false
    };
    var response = UrlFetchApp.fetch(url, options);
    var html = response.getContentText();

    if ( html.match(/Your search -.*- did not match any documents./) )


    And one more thing, how would anyone go about starting to learn stuff like this, any books/articles you would personally recommend?
     
  7. terrycody

    terrycody Elite Member

    Joined:
    Sep 29, 2012
    Messages:
    2,069
    Likes Received:
    570
    Occupation:
    marketer
    Location:
    Hell
    Damn I am wondering how much your brain can contain a lot of such weird but useful magics?!
     
  8. 710fla

    710fla Jr. VIP Jr. VIP

    Joined:
    Aug 25, 2015
    Messages:
    1,251
    Likes Received:
    534
    Thanks man, really appreciate it! Would have never discovered this trick
     
  9. MatthewGraham

    MatthewGraham Jr. VIP Jr. VIP

    Joined:
    Oct 6, 2015
    Messages:
    1,683
    Likes Received:
    2,226
    Gender:
    Male
    Occupation:
    Rolling Face on Keyboard
    Location:
    United States of America
    Home Page:
    Added a few comments explaining the details below. You might also check out the official Google Apps Script documention here:
    https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app

    Also, Google Apps scripts use Javascript, so any JS documentation applies to Google Apps scripts as well. This script is a bastardization of a different script that I used to bulk check page codes / bulk check whether pages return 404. And that script is a bastardization of some other person's similar script, which was a bastardization of the official Google Apps documentation, which was a bastardization of some JS documentation, which was a bastardization of... probably some Stack Exchange threads, which were in and of themselves bastardizations of... something, which was a bastardization of...

    To answer the question about recommended reading / ways to learn programming/scripting, the best way is to decide what you want to do and then Google search your issue until you find some code that you can hack together into something usable. Start by building scripts and then figure out how to actually make them work as you go.

    Code:
    function checkIfPageIsIndexed(url)
    {
       url = "https://www.google.com/search?q=site:"+url;
    
       // The options parameter for the function UrlFetchApp.fetch() is an optional parameter.
       // This whole section could probably be removed now that I think about it. Left it in since it was needed for a different similar script to bulk check page code.
       var options = {
       // if this is set to true, the fetch will not throw an exception if the response code
       // indicates failure, and will instead return the HTTPResponse (default: false).
       // This could probably be omitted from this script. IIRC, I originally put this in for a
       // similar script to bulk check page code (404/200/403/other/etc), which would
       // throw an error if the page failed to load with a 200 code.
        'muteHttpExceptions': true,
       // if this is set to false, the fetch not automatically follow HTTP redirects;
       // it will return the original HTTP response. The default is true. E.g., if you load an
       // http page, but the page 301s to an https version, this determines whether to follow
       // the 301 or to return the original page's data.
        'followRedirects': false
       };
    
       // Makes a request to fetch a URL.
       var response = UrlFetchApp.fetch(url, options);
    
       // From the HTML that has been retreived, gets the content of the HTTP response encoded as a string.
       // https://developers.google.com/apps-script/reference/url-fetch/http-response#getContentText()
       var html     = response.getContentText();
    
       // Now that the HTML is encoded as a string, it can be searched using a regex
       // https://en.wikipedia.org/wiki/Regular_expression
       // html.match() searches for all sections of the the html string that match the pattern inserted
       // If a site:url.com search returns no results, Google will instead return an error that looks like the image below this code
       // It should be something like "Your search -" + [INSERT YOUR SEARCH HERE] + "- did not match any documents"
       // the "." indicates any character in a regex.
       // A regular expression followed by an asterisk (*) matches zero or more occurrences of the regular expression. If there is any choice, the first matching string in a line is used.
       // http://users.cs.cf.ac.uk/Dave.Marshall/Internet/NEWS/regexp.html
       // .* therefore means up to unlimited of any character
       // So this regex returns matches for any page that includse "Your search -" followed by up to unlimited arbitrary characters followed by "- did not match any documents."
       // Which will evaluate as being true for Google searches that did not yield any results.
       if ( html.match(/Your search -.*- did not match any documents./) )
        return "URL is Not Indexed";
       return "URL is Indexed";
    }
    
    google-domain-search-failure.png

    Bonus: The Script to Bulk Check URL Status Code / Bulk Check 404 Errors that this Script was Bastardized From
    Code:
    // https://medium.com/@the.benhawy/how-to-use-google-spreadsheets-to-check-for-broken-links-1bb0b35c8525
    function getStatusCode(url){
       var options = {
         'muteHttpExceptions': true,
         'followRedirects': false
       };
       var response = UrlFetchApp.fetch(url, options);
       return response.getResponseCode();
    }
    
     
    • Thanks Thanks x 3
    Last edited: Feb 7, 2018
  10. ScribScribScrib

    ScribScribScrib Jr. VIP Jr. VIP

    Joined:
    Mar 15, 2017
    Messages:
    1,170
    Likes Received:
    1,346
    Gender:
    Male
    Occupation:
    Grim Reaper
    Location:
    Your Home
    Home Page:
    wow man this is all so very cool, i had to read it like 10 times to understand anything

    ''Start by building scripts and then figure out how to actually make them work as you go.''

    there's one script I would like to build and it could be quite useful, could you help me out with it?
    I mean the moment I share the code I already have, you'll know what's up so I guess you'll get a method for free?
    I'll pm you if you're fine with it.
     
  11. MatthewGraham

    MatthewGraham Jr. VIP Jr. VIP

    Joined:
    Oct 6, 2015
    Messages:
    1,683
    Likes Received:
    2,226
    Gender:
    Male
    Occupation:
    Rolling Face on Keyboard
    Location:
    United States of America
    Home Page:
    Sounds good to me. Feel free to send a PM with details!
     
  12. buybdl

    buybdl Registered Member

    Joined:
    Aug 3, 2015
    Messages:
    77
    Likes Received:
    11
    Bookmarked!!! ;)
     
  13. picapiedra18

    picapiedra18 Regular Member

    Joined:
    Jan 30, 2009
    Messages:
    420
    Likes Received:
    247
    Gender:
    Male
    Location:
    La Perla del Pacifico
    Mind doing a little video how to run this script for the not so savvy guys in here.

    Thanks
     
  14. ThatSEO

    ThatSEO Jr. VIP Jr. VIP

    Joined:
    Jan 22, 2016
    Messages:
    1,056
    Likes Received:
    647
    Gender:
    Male
    Occupation:
    Self employed marketing stuff
    Location:
    Sometimes UK
    If we all did this at the same time, we could DDOS their own servers lol
     
    • Thanks Thanks x 1
  15. bArAsc

    bArAsc Regular Member

    Joined:
    Jun 2, 2015
    Messages:
    235
    Likes Received:
    83
    Gender:
    Male
    • Thanks Thanks x 3
  16. theRevolt

    theRevolt Jr. VIP Jr. VIP

    Joined:
    Jul 29, 2009
    Messages:
    2,130
    Likes Received:
    875
    Occupation:
    Click below to find out
    Location:
    CPA Money
    Home Page:
    You can also use these very simple solution using importxml, just paste this into B1, in a1 write your domain .

    =IMPORTXML(CONCAT("https://www.google.com/search?q=site:",A1),"//div[@id='resultStats']")

    A1 = yoursite.com
    B1 , paste formula above
    C1 will show you either blank cell or X results, to actually show you how m any URLs are being indexed

    simple and no scripts needed
     
    • Thanks Thanks x 3
  17. illuminateme

    illuminateme Jr. VIP Jr. VIP

    Joined:
    Feb 24, 2016
    Messages:
    424
    Likes Received:
    213
    Occupation:
    human meat bag
    Location:
    the universe
  18. PennyAHREF

    PennyAHREF Newbie

    Joined:
    Jul 16, 2017
    Messages:
    27
    Likes Received:
    9
    Gender:
    Female
    Damn good man. this will make the job much easier
     
  19. aidenhera

    aidenhera Elite Member

    Joined:
    Nov 30, 2016
    Messages:
    2,661
    Likes Received:
    686
    Gender:
    Male
    Great one! I never bothered checking what is indexed, I just blasted links to it and assumed it is. But this is great and quick tool to do it.

    @MatthewGraham do you have an idea how to make the script check how many results are returned?

    Like:

    [​IMG]

    And put the 261,000,000 into excel? Would save my life.
     
  20. MatthewGraham

    MatthewGraham Jr. VIP Jr. VIP

    Joined:
    Oct 6, 2015
    Messages:
    1,683
    Likes Received:
    2,226
    Gender:
    Male
    Occupation:
    Rolling Face on Keyboard
    Location:
    United States of America
    Home Page:
    I messed with it some and came up with the code below. It mostly works, but is pretty buggy.

    If you can fix the bugs and make a few adjustments, it should work.

    Code:
    function checkIfPageIsIndexed(url)
    {
       if (!url)
         return "N/A";
    
       url = "https://www.google.com/search?q=site:"+url;
       var options = {
         'muteHttpExceptions': true,
         'followRedirects': false
       };
       var response = UrlFetchApp.fetch(url, options);
       var html     = response.getContentText();
    
       if ( html.match(/Your search -.*- did not match any documents./) )
         return "URL is Not Indexed";
    
       // <div id="resultStats">1 result<nobr> (0.21 seconds)&nbsp;</nobr></div>
       // <div id="resultStats">About 112,000,000 results<nobr> (0.22 seconds)&nbsp;</nobr></div>
       var num_pages_indexed = false;
       if ( num_pages_indexed = html.match(/id="resultStats">.*result/) ) 
       {
         num_pages_indexed = num_pages_indexed[0] + "(s)";
         num_pages_indexed = num_pages_indexed.replace("id=\"resultStats\">","");
         // num_pages_indexed = num_pages_indexed.replace("<","");
       }
       else
         num_pages_indexed = "ERROR";
       return "URL is Indexed and returned "+num_pages_indexed;
    }
    
    Example Output
    page-index-buggy-code.png
     
    • Thanks Thanks x 2