Parse Google with Python...

Discussion in 'General Programming Chat' started by William Rufino, Aug 18, 2011.

  1. William Rufino

    William Rufino Newbie

    Joined:
    Aug 3, 2011
    Messages:
    32
    Likes Received:
    2
    Hello everyone,

    I'm trying to make a bot, but I'm having a problem with pycurl: Google returns a 400 error code, although the same request works when I do it in PHP.

    I'm even using the same user agent. I already have it working in PHP, but I want to port it to Python so I can use multithreading.
     
  2. Mosquera

    Mosquera Newbie

    Joined:
    Feb 24, 2009
    Messages:
    32
    Likes Received:
    18
    Any more info? There are times when Google decides you're a bot and asks you to complete a captcha.

    I'm not sure what status code it returns in those cases. It could be 400, but I don't remember. I would check everything before assuming anything.
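
    As one way to "check everything", here is a minimal Python 3 sketch, not a diagnosis of the original problem. It assumes the 400 might come from a malformed URL (for example, an unencoded space in the query), which is only one possible cause. The function names `build_search_url` and `describe_response` are made up for illustration; the status code and body would come from whatever HTTP client is in use, such as pycurl.

```python
from urllib.parse import urlencode


def build_search_url(query, base="https://www.google.com/search"):
    """Return a search URL with the query string properly encoded."""
    # urlencode escapes spaces and special characters that would
    # otherwise make the request line invalid.
    return base + "?" + urlencode({"q": query})


def describe_response(status, body, limit=200):
    """Summarize a response so a plain 400 can be told apart from a block page.

    `status` and `body` are assumed to be supplied by the HTTP client.
    """
    # Collapse whitespace so the snippet fits on one log line.
    snippet = " ".join(body[:limit].split())
    return "HTTP %d: %s" % (status, snippet)
```

    Logging the output of `describe_response` for the failing request shows what the server actually sent back, which is more reliable than guessing from the status code alone.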
     
  3. confined

    confined Regular Member

    Joined:
    Jan 4, 2009
    Messages:
    216
    Likes Received:
    91
    It has to do with scraping frequency. Proxies and random sleeps help with that.
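
    A rough Python 3 sketch of that idea, under stated assumptions: the proxy addresses are placeholders from a documentation-only IP range, the 2 to 8 second delay window is an arbitrary choice, and `schedule` is a hypothetical helper name. It only decides which proxy to use and how long to wait; the actual request is left to the caller.

```python
import itertools
import random
import time

# Placeholder proxy addresses, for illustration only.
PROXIES = [
    "http://192.0.2.10:8080",
    "http://192.0.2.11:8080",
    "http://192.0.2.12:8080",
]


def random_delay(min_s=2.0, max_s=8.0):
    """Pick a random pause length in seconds between min_s and max_s."""
    return random.uniform(min_s, max_s)


def schedule(queries, proxies, sleep=time.sleep, min_s=2.0, max_s=8.0):
    """Yield (query, proxy) pairs, rotating proxies round-robin.

    A random pause is taken before every query except the first.
    `sleep` is injectable so the pacing can be tested without waiting.
    """
    rotation = itertools.cycle(proxies)
    for index, query in enumerate(queries):
        if index:
            sleep(random_delay(min_s, max_s))
        yield query, next(rotation)
```

    Usage would look like `for query, proxy in schedule(my_queries, PROXIES): ...`, with the request for `query` sent through `proxy` inside the loop.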
     
  4. wu1239

    wu1239 Newbie

    Joined:
    Jun 4, 2011
    Messages:
    16
    Likes Received:
    0
    Send us more info and we will help you.
    Limited detail won't help you.
     
  5. Xooor

    Xooor Newbie

    Joined:
    Aug 14, 2011
    Messages:
    18
    Likes Received:
    17
    I've got a working Google scraping bot of my own up and running. I wrote it in PHP using cURL, and it switches between 25 paid proxies to avoid getting banned by Google.

    I'm also a great fan of Python. I would recommend looking into urllib2 and BeautifulSoup, two brilliant Python libs I use for writing bots.

    If you need any help, say so. Good luck!
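
    For a feel of the fetch-and-parse pattern, here is a minimal sketch under some assumptions. `urllib2` is the Python 2 module; in Python 3 the same functionality lives in `urllib.request`, which is used here. To stay runnable without third-party packages, the standard library's `html.parser` stands in for BeautifulSoup, so this is not the BeautifulSoup API. The names `LinkCollector`, `extract_links`, and `fetch`, and the user agent string, are placeholders.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, user_agent="Mozilla/5.0 (example)", timeout=10):
    """Download a page as text, sending a custom User-Agent header."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

    Calling `extract_links(fetch(url))` ties the two together. With BeautifulSoup installed, the extraction step would typically be shorter.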