
PhantomJS Google scraper using Antigate

Discussion in 'Hire a Freelancer' started by terebi, Jun 17, 2016.

  1. terebi

    Jr. VIP, Premium Member | Joined: Apr 11, 2011 | Messages: 357 | Likes Received: 94

    Looking for someone to code a PhantomJS script that can scrape a Google page and use Anti-Gate to solve the captcha.

    I have example code to get you started. Looking for people to bid up to $250 on this project.

    Please send me a message.
     
  2. Javardo69

    Junior Member | Joined: Jul 19, 2014 | Messages: 106 | Likes Received: 6

    Is the captcha just a text code, or is it the new reCAPTCHA where you have to select pictures?

    I've only used PhantomJS for browser automation (because it is headless), and I try to avoid it when possible. Send me a PM with the link and I'll tell you whether I can do something or not.
     
  3. terebi

    Jr. VIP, Premium Member | Joined: Apr 11, 2011 | Messages: 357 | Likes Received: 94

    The captcha is a standard reCAPTCHA (not the picture-selection kind).

    Code should be based on this:
    https://github.com/PavelPolyakov/parsing-with-php/blob/master/5_phantomjs_antigate/antigate.js

    Remember that the page being visited is a Google URL.
     
  4. Javardo69

    Junior Member | Joined: Jul 19, 2014 | Messages: 106 | Likes Received: 6

    That code sends a captcha to Antigate and gets the solution text back. Antigate provides examples of its API in 11 languages, as you can see at https://anti-captcha.com/apidoc (skip the intro and go to the code examples). Here's one for Python, for instance: https://github.com/gotlium/antigate.

    I've written a small Python script that scrapes the first 20 images from Google Images without automating a browser, just by replicating the calls the browser makes. That's why I was asking which Google page you want to scrape and what you want scraped. I asked whether it was the picture-selection reCAPTCHA because I have no solution for that one, but if it is the standard kind, where you see an image and answer with text, the Antigate service should be enough to tackle the problem.
     
  5. MrBlue

    Senior Member | Joined: Dec 18, 2009 | Messages: 974 | Likes Received: 680 | Occupation: Web/Bot Developer

    Interesting, but I've never been hit by rate limiting when scraping Google with headless browsers like PhantomJS or CasperJS. How many requests are you making per second? Are you using proxies? Have you set all your headers correctly?
     
  6. terebi

    Jr. VIP, Premium Member | Joined: Apr 11, 2011 | Messages: 357 | Likes Received: 94

    I want to scrape Google search results, but after a while that captcha pops up (just a simple captcha). I need the script to detect it, grab it, and send it to Anti-Gate for solving.
     
  7. terebi

    Jr. VIP, Premium Member | Joined: Apr 11, 2011 | Messages: 357 | Likes Received: 94

    I totally agree that it shouldn't, but here are the edge cases:

    a) the current PC's IP has already been flagged by Google (e.g. from running other scraping tools)
    b) using proxies, which Google is really good at banning

    I'm making no more than one call every 30-60 seconds, so rate shouldn't be an issue.

    Can you tell me more about the headers I should be setting? So far I set:
    * User agent
    * Encoding
    * Viewport size

    What other ones should I be setting?