
Bypass Google scraping detection

Discussion in 'White Hat SEO' started by content spinner, Aug 18, 2015.

  1. content spinner

    content spinner Newbie

    Joined:
    Aug 18, 2015
    Messages:
    2
    Likes Received:
    0
    Hi dear SEO men and women,
    I'm working on a Google scraper to build a rank checker.

    I'd like to know how to scrape Google without getting busted by their anti-scraping detection. And in case I do get caught, I'd like to know whether it's possible to break the captcha (if it's not too expensive, of course).

    To bypass detection, I already switch my user agent, wait a random time between two queries, and delete Google's cookies, roughly as in the sketch below.
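
    A minimal sketch of that rotation (the user-agent strings, keywords, and delay range here are placeholders, not from the post):

    ```php
    <?php
    // Rotate the user agent, wait a random time between queries, and keep
    // cookies only in memory so they die with each curl handle.
    $userAgents = [
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/600.7.12',
        'Mozilla/5.0 (X11; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0',
    ];

    function fetchSerp($query, array $userAgents)
    {
        $ch = curl_init('https://www.google.com/search?q=' . urlencode($query));
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_USERAGENT      => $userAgents[array_rand($userAgents)],
            CURLOPT_COOKIEFILE     => '',  // in-memory cookie jar only
        ]);
        $html = curl_exec($ch);
        curl_close($ch);                   // closing the handle deletes the cookies
        return $html;
    }

    foreach (['keyword one', 'keyword two'] as $q) {
        $html = fetchSerp($q, $userAgents);
        sleep(rand(5, 20));                // random wait between two queries
    }
    ```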


    Thank you very much
     
  2. puneetas3

    puneetas3 Senior Member

    Joined:
    Jan 8, 2012
    Messages:
    909
    Likes Received:
    390
    I hate to break the secret, but here it is. Get any cheap $5 VPS that comes with a /64 or smaller IPv6 block. Install Debian/Ubuntu and set up 10K to 25K random IPv6 addresses from your assigned block on your network interface.

    Now create a PHP script that can read the interface and collect all 10K to 25K IPv6 addresses, and use curl in that script to pick one of them at random for each request. Make the outgoing calls to Google with curl and scrape as you like; each request goes out from a random IPv6. I personally use this setup, scrape at a speed of 2 pages/second, and it doesn't get me banned on Google. Once a week you can generate new random IPv6 addresses from your block and replace the old ones on the interface. A rough sketch is below.
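
    A rough sketch of this setup, assuming a Debian/Ubuntu VPS with the /64 routed to eth0; the documentation prefix 2001:db8:1:2, the address count, and the helper name fetchViaRandomIpv6() are illustrative, not from the post:

    ```php
    <?php
    // One-time setup (run as root): add random addresses from the /64 to eth0.
    $prefix = '2001:db8:1:2';                 // first four hextets of your /64
    for ($i = 0; $i < 10000; $i++) {          // 10K to 25K, per the post
        $addr = sprintf('%s:%x:%x:%x:%x', $prefix,
            mt_rand(0, 0xffff), mt_rand(0, 0xffff),
            mt_rand(0, 0xffff), mt_rand(0, 0xffff));
        shell_exec('ip -6 addr add ' . escapeshellarg($addr . '/64') . ' dev eth0');
    }

    // Per-request: read the pool back off the interface and bind curl
    // to a randomly chosen source address.
    function fetchViaRandomIpv6($url)
    {
        $out = shell_exec('ip -6 addr show dev eth0 scope global');
        preg_match_all('#inet6 ([0-9a-f:]+)/#', $out, $m);
        $pool = $m[1];                        // every global IPv6 on eth0

        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_INTERFACE      => $pool[array_rand($pool)], // source addr
            CURLOPT_IPRESOLVE      => CURL_IPRESOLVE_V6,        // force IPv6
        ]);
        $html = curl_exec($ch);
        curl_close($ch);
        return $html;
    }

    echo fetchViaRandomIpv6('https://www.google.com/search?q=example');
    ```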

    You can use this to scrape any IPv6-enabled site, and it works great for server-side scraping. If you are using desktop software to scrape, you will need to do the following: edit the Windows hosts file so that google.com points at your server's IPv4 address. Now whenever the software makes a request to google.com, everything is served through the PHP curl script above, which listens on the IPv4 address and makes the outgoing request from a random IPv6. A sketch of such a relay is below.
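
    A bare-bones sketch of that relay, reusing the hypothetical fetchViaRandomIpv6() helper from the previous sketch (assumed saved as ipv6_fetch.php). Note that this only works cleanly for plain-HTTP traffic; an HTTPS client would reject the relay's certificate:

    ```php
    <?php
    // relay.php - serve at the VPS's IPv4, e.g.:  php -S 0.0.0.0:80 relay.php
    // On the Windows box, map Google to the VPS in
    // C:\Windows\System32\drivers\etc\hosts (203.0.113.10 is a placeholder):
    //     203.0.113.10  www.google.com
    require 'ipv6_fetch.php';  // hypothetical file holding fetchViaRandomIpv6()

    // Replay the incoming request against Google from a random IPv6 source.
    $target = 'http://www.google.com' . $_SERVER['REQUEST_URI'];
    echo fetchViaRandomIpv6($target);
    ```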

    If you do lots of scraping, you can just add multiple cheap VPSes with different /64 blocks.
     
    • Thanks x 7
  3. content spinner

    content spinner Newbie

    Joined:
    Aug 18, 2015
    Messages:
    2
    Likes Received:
    0
    Aw, what an awesome trick!
    Thank you very much for your answer, puneetas3.
     
  4. SEO FOX

    SEO FOX Jr. VIP

    Joined:
    Apr 27, 2015
    Messages:
    3,713
    Likes Received:
    753
    Gender:
    Male
    Location:
    Infront Of U!!
    Nice method, dude. Thanks for the info.
     
  5. Asterixpro

    Asterixpro Junior Member

    Joined:
    Jan 9, 2009
    Messages:
    133
    Likes Received:
    33
    Occupation:
    Internet Marketer SEO!
    Location:
    London
    How do you take care of latency and errors?
     
  6. puneetas3

    puneetas3 Senior Member

    Joined:
    Jan 8, 2012
    Messages:
    909
    Likes Received:
    390
    Curl can detect an error and report it. If there is an error, just make the last request again, roughly as sketched below. There shouldn't be any latency issue; it's just like browsing a normal site.
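
    A minimal sketch of that retry loop; the three-attempt cap is my own choice, not from the post:

    ```php
    <?php
    // Repeat the same request until curl reports no error (max 3 attempts).
    function fetchWithRetry($url, $maxAttempts = 3)
    {
        for ($try = 1; $try <= $maxAttempts; $try++) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            $html  = curl_exec($ch);
            $errno = curl_errno($ch);     // 0 means the transfer succeeded
            curl_close($ch);
            if ($errno === 0) {
                return $html;
            }
            // On error, loop around and make the last request again.
        }
        return false;                     // all attempts failed
    }
    ```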
     
  7. Ramse

    Ramse Jr. VIP Premium Member

    Joined:
    Jan 6, 2014
    Messages:
    838
    Likes Received:
    93
    This neat little trick came out of nowhere. Thank you for sharing it :)
     
  8. sergey007

    sergey007 Jr. VIP

    Joined:
    Nov 13, 2014
    Messages:
    900
    Likes Received:
    332
    Location:
    pbn.rocks
    puneetas3, wow, I have been looking for a way to solve this problem, and it seems you have just solved it. Thanks a lot!
     
  9. puneetas3

    puneetas3 Senior Member

    Joined:
    Jan 8, 2012
    Messages:
    909
    Likes Received:
    390
    I'm thinking of creating a video showing scraping on the VPS using the above method, of course only if I get the time.
     
  10. Mister Don

    Mister Don Junior Member

    Joined:
    May 30, 2014
    Messages:
    113
    Likes Received:
    24
    Great solution you have there. I would be eager to see that video!
     
  11. M4DM4X

    M4DM4X Regular Member

    Joined:
    Jan 21, 2015
    Messages:
    246
    Likes Received:
    35
    I'm searching for a coder who can implement this on my server for a few bucks :)
     
  12. viliche

    viliche Newbie

    Joined:
    Nov 4, 2015
    Messages:
    2
    Likes Received:
    0
    I wanted to scrape Google using a wide IPv6 range. I didn't test it with 10K or 25K addresses, but with "only" 500 random ones. I started scraping Google (the first 3 SERPs, 1 query for each) using specific user agents and headers (well, simulating a browser as much as I can with curl). Despite using a different IPv6 for each request, and a random query interval of 2-10 seconds, I always got blocked after scraping 80 URLs or so. I really wonder what made your solution work so well. Any clue? (If you have any questions, please feel free to ask.)
     
  13. epitome

    epitome Newbie

    Joined:
    Apr 7, 2013
    Messages:
    25
    Likes Received:
    3
    2-10 seconds probably isn't enough delay on a single IP to keep it alive for very long.
     
  14. viliche

    viliche Newbie

    Joined:
    Nov 4, 2015
    Messages:
    2
    Likes Received:
    0
    I agree; it's just that puneetas3 said he scraped at a speed of 2 pages/second without getting banned by Google using this setup. My guess is that using thousands of IPv6 addresses amounts to using a single IPv4. On my side, running a query every 15 seconds looks optimal when using a single IP.
     
  15. DSNYC

    DSNYC Regular Member

    Joined:
    Dec 23, 2014
    Messages:
    204
    Likes Received:
    45
    Location:
    New York, NY
    You just need to take this method, turn it into some bullshit SaaS platform, and sell it to BHW users for $39.99 / month.
     
  16. razzaguhl

    razzaguhl Newbie

    Joined:
    Jan 12, 2016
    Messages:
    22
    Likes Received:
    6
    The method puneetas3 (http://www.blackhatworld.com/members/puneetas3.254804/) mentioned is not working for me.
    I scraped with a 45-second delay, but after a few minutes Google banned the whole /64 IPv6 block and just sent me 503 errors.