
Question for a Google query whizz

Discussion in 'BlackHat Lounge' started by LukesDad, Dec 6, 2012.

  1. LukesDad

    LukesDad Junior Member

    Joined: Oct 24, 2009 | Messages: 135 | Likes Received: 71 | Location: Düsseldorf
    Hi,

    When I checked a new client's domain on Google (site:domain.tld), I got a list of 37 pages, each with a note that it could not be crawled because of an idiotic robots.txt.
    This is going to be easy money :)

    I was very surprised that Google knows about pages it is not allowed to crawl!
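    For what it's worth, robots.txt only stops crawling, not discovery: Google can still list a URL it found through links, it just cannot fetch the page to build a snippet. To confirm that robots.txt is what blocks a client's pages, you can test URLs against its rules. Below is a minimal sketch using Python's standard urllib.robotparser; the robots.txt content and the URLs are hypothetical examples, not the client's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind described above: it blocks every
# crawler from the whole site. Illustrative only, not the client's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given user agent may not fetch under robots_txt."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines instead of downloading it.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    pages = ["https://domain.tld/", "https://domain.tld/products.html"]
    for url in blocked_urls(ROBOTS_TXT, pages):
        print("blocked:", url)
```

    In practice you would load the live file with RobotFileParser.set_url() and read() instead of an inline string.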

    Can anyone come up with a query to find more domains that Google is not allowed to crawl?
     
  2. upl8t

    upl8t Regular Member

    Joined: Apr 9, 2008 | Messages: 475 | Likes Received: 84 | Location: New Scotland
    Great question. I've been seeing these robots.txt messages ("can't be crawled") more and more. I can't think of a query that would work, but you could scrape the SERPs for all kinds of keywords and then parse the results for that message. Interesting opportunity.
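    The parsing step could look roughly like the sketch below. It assumes the results have already been collected as (url, snippet) pairs (the collection step is not shown), and the marker text is an assumption based on the wording Google has used for robots.txt-blocked results, which may vary by date and locale.

```python
from urllib.parse import urlparse

# Assumed wording of the notice shown for robots.txt-blocked results.
# Adjust it to whatever text actually appears in the results you collect.
BLOCKED_MARKER = "because of this site's robots.txt"


def domains_blocked_by_robots(results: list[tuple[str, str]]) -> set[str]:
    """Return the hostnames whose result snippet contains the robots.txt notice.

    `results` is a list of (url, snippet) pairs gathered beforehand.
    """
    domains = set()
    for url, snippet in results:
        # Case-insensitive substring match on the snippet text.
        if BLOCKED_MARKER.lower() in snippet.lower():
            host = urlparse(url).hostname
            if host:
                domains.add(host)
    return domains
```

    Collecting hostnames in a set deduplicates domains that show up for several keywords.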