How to stop getting blocked by Google when using the site: operator?

Discussion in 'BlackHat Lounge' started by jamie3000, Jul 11, 2016.

  1. jamie3000

    jamie3000 Supreme Member

    Joined:
    Jun 30, 2014
    Messages:
    1,414
    Likes Received:
    655
    Occupation:
    Finance coder looking for semi-retirement
    Location:
    uk
    So I'm scraping to check whether websites are indexed on Google, using the URL below with the site: search operator, but Google seems to block me after about 30 queries, even though I'm leaving a gap of roughly 1 minute between requests plus 10+ seconds of random delay.

    "https://www.google.com/search?btnG=1&pws=0&q=site:" + HttpUtility.UrlEncode(_website) + "&gws_rd=ssl"
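For anyone reading along, the fragment above could be wrapped up roughly as follows. This is only a minimal sketch of the string concatenation from the post: the class name `IndexCheckUrl` and the method name `Build` are made up for illustration, and it assumes `System.Web.HttpUtility` is available in the project.

```csharp
using System;
using System.Web;

// Hypothetical helper; the names are illustrative and not from the original post.
public static class IndexCheckUrl
{
    // Builds the site: query URL from the post for a given host name.
    public static string Build(string website)
    {
        // Same concatenation as in the post: only the host is URL-encoded,
        // and the query parameters (btnG, pws, gws_rd) are kept unchanged.
        return "https://www.google.com/search?btnG=1&pws=0&q=site:"
            + HttpUtility.UrlEncode(website)
            + "&gws_rd=ssl";
    }
}
```

Calling `IndexCheckUrl.Build("example.com")` would give the same URL the post builds with `_website` set to `example.com`.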

    Is this normal? If so, any idea how I can get around this?

    Thanks! :)
     
  2. sumitgupta1992

    sumitgupta1992 Junior Member

    Joined:
    Jul 20, 2013
    Messages:
    176
    Likes Received:
    26
    Gender:
    Male
    Multiple high-quality rotating proxies are the only solution. Search in the marketplace; there are a lot of high-quality proxy providers.
     
  3. jamie3000

    jamie3000 Supreme Member

    Ummm, it's for a desktop application, so I was hoping I could avoid having to get users to load in proxies. Still getting blocked with a 1 min delay just seems massive though; maybe I'll try a headless browser.
     
  4. sumitgupta1992

    sumitgupta1992 Junior Member

    Without proxies, I'd doubt its viability.
     
  5. jamie3000

    jamie3000 Supreme Member

    Probably right, but I will experiment and post for others if I have any success. :)
     
  6. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    588
    It is not you. I can use the 'site' operator in a browser, scroll down the page, not find what I am looking for, and move to the next page. Generally by page three a captcha is thrown up.
     
  7. jamie3000

    jamie3000 Supreme Member

    Ummm, well that's annoying! I'm going to try a different Google URL and rotating user agents; that might help...
     
  8. 5zz

    5zz Newbie

    Joined:
    May 23, 2016
    Messages:
    45
    Likes Received:
    15
    If you're doing THE SAME query, even every 10 minutes, it will look weird to Google's algorithms.
    So even with lots of proxies, you'd probably get blocked.
     
  9. jamie3000

    jamie3000 Supreme Member

    I did think of mixing it up a bit, but then that's my request frequency doubled or tripled.