Scrape Google...

Discussion in 'General Programming Chat' started by William Rufino, Sep 28, 2011.

  1. William Rufino

    William Rufino Newbie

    Joined:
    Aug 3, 2011
    Messages:
    32
    Likes Received:
    2
    Hello there,

    Google isn't accepting the num parameter anymore, so I can't get more than 10 results...

    Has anyone seen a workaround for that?
     
  2. scraper1

    scraper1 Regular Member

    Joined:
    May 28, 2011
    Messages:
    214
    Likes Received:
    207
    Location:
    Kontiki
    Yes, just turn off Google Instant in Preferences; while Instant is on, Google shows only 10 results per page.
     
  3. webmast

    webmast Regular Member

    Joined:
    Dec 18, 2010
    Messages:
    337
    Likes Received:
    238
    Code:
    http://www.google.com/search?q=buy+viagra&num=100
    Still works for me :)
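For anyone doing this from code rather than the browser, here is a minimal Python sketch that builds the same kind of search URL as the one above. The function name `build_search_url` is hypothetical, and whether Google actually honors `num` at any given time is exactly what this thread is debating, so the sketch only constructs the URL and sends nothing.

```python
from urllib.parse import urlencode


def build_search_url(query: str, num: int = 100) -> str:
    """Build a Google web-search URL with a results-per-page hint.

    `q` carries the query and `num` the requested number of results.
    This only builds the string; it does not check that Google honors `num`.
    """
    # urlencode() uses quote_plus by default, so spaces become '+',
    # and the integer num is converted with str().
    params = {"q": query, "num": num}
    return "http://www.google.com/search?" + urlencode(params)
```

Calling `build_search_url("buy viagra")` reproduces the URL in the post exactly.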
     
    Last edited: Sep 28, 2011
  4. scraper1

    scraper1 Regular Member

    There are more ways to do the same thing:
    Code:
    http://www.google.com/preferences
     
  5. William Rufino

    William Rufino Newbie

    I know that, guys, but I'm talking about doing it programmatically: I use cURL to crawl Google, and the preferences are saved in a cookie. How do I retrieve that cookie so I can also use it with cURL?
     
  6. scraper1

    scraper1 Regular Member

    Check out the Google Web Search API; you can use it with curl:
    Code:
    http://code.google.com/apis/websearch/docs/
    Code:
    curl -e http://www.my-ajax-site.com \
    'https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Paris%20Hilton&key=INSERT-YOUR-KEY'
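In Python, the curl call above can be approximated with the standard library alone. curl's `-e` flag sets the `Referer` header, so the sketch passes that header explicitly. The helper name `build_ajax_request` is hypothetical, and because Google deprecated this AJAX Search API, the sketch only builds the request object rather than sending it. Note that `urlencode` encodes the space in the query as `+` where the curl example uses `%20`; both are valid in a query string.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl example above (deprecated API).
AJAX_ENDPOINT = "https://ajax.googleapis.com/ajax/services/search/web"


def build_ajax_request(query: str, key: str, referer: str) -> Request:
    """Build (but do not send) a request equivalent to the curl call above."""
    params = urlencode({"v": "1.0", "q": query, "key": key})
    # curl -e <url> sends that URL as the Referer header.
    return Request(AJAX_ENDPOINT + "?" + params, headers={"Referer": referer})
```

With the values from the curl example, `build_ajax_request("Paris Hilton", "INSERT-YOUR-KEY", "http://www.my-ajax-site.com")` yields a request whose `full_url` ends in `?v=1.0&q=Paris+Hilton&key=INSERT-YOUR-KEY`.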
    
    Never mind, it seems the API is limited to 10 results.
    As for the cookie: you can store it in a file and use it from there (curl's -c option writes cookies to a file, and -b reads them back).
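A minimal Python sketch of the cookie-file approach, using the standard library's `http.cookiejar.MozillaCookieJar`. That class reads and writes the Netscape cookie-file format, which is also the format curl uses for `-b`/`-c`, so in the usual case one file can be shared between curl and a Python script. The helper name `make_opener` is hypothetical.

```python
import http.cookiejar
import urllib.request


def make_opener(cookie_file: str):
    """Return (opener, jar): an opener whose cookies live in cookie_file.

    Cookies already in the file (for example, written by `curl -c`) are
    loaded; call jar.save(...) after requesting pages to persist new ones.
    """
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    try:
        # Also load session and expired cookies so nothing in the file
        # is silently skipped.
        jar.load(ignore_discard=True, ignore_expires=True)
    except FileNotFoundError:
        pass  # First run: no cookie file yet.
    handler = urllib.request.HTTPCookieProcessor(jar)
    return urllib.request.build_opener(handler), jar
```

Typical use: call `opener.open(url)` for each page, then `jar.save(ignore_discard=True, ignore_expires=True)` so the next run (or `curl -b cookies.txt`) sees the same cookies.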
     
  7. William Rufino

    William Rufino Newbie

    The problem with the API is that it returns different results than the website does... :/
     
  8. scriptomania

    scriptomania Junior Member

    Joined:
    Dec 28, 2010
    Messages:
    127
    Likes Received:
    250
    Occupation:
    A full time pirate at sea
    Location:
    The European capital of politics
    1. Look through a list of user agents
    2. Change yours to an old-ass one (Win98 era)
    3. ???
    4. Profit!
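Steps 1 and 2 amount to sending a different `User-Agent` header. A minimal Python sketch with the standard library follows; the Win98-era string is a hypothetical example picked for illustration, not one named in the post, and the helper name `build_request` is made up.

```python
from urllib.request import Request

# Hypothetical Win98-era Internet Explorer user-agent string.
OLD_USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"


def build_request(url: str, user_agent: str = OLD_USER_AGENT) -> Request:
    """Return a Request that will send the given User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})
```

Pass the result to `urllib.request.urlopen()` to fetch the page. urllib stores header names via `str.capitalize()`, so the header reads back with `req.get_header("User-agent")`. With PyCURL, the equivalent is `setopt(pycurl.USERAGENT, ...)`.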
     
  9. MrBlue

    MrBlue Senior Member

    Joined:
    Dec 18, 2009
    Messages:
    975
    Likes Received:
    682
    Occupation:
    Web/Bot Developer
    I stopped using PHP/cURL to scrape Google and switched to Python: it's far more powerful, and my scripts are much less brittle now.
     
  10. christoss1959

    christoss1959 Senior Member

    Joined:
    Nov 25, 2010
    Messages:
    894
    Likes Received:
    1,153
    Read this:
    Code:
    http://www.our-picks.com/archives/2007/01/30/google-search-urls-revealed-or-how-to-create-your-own-search-url/
     
  11. William Rufino

    William Rufino Newbie

    I use Python too, but with PyCURL + BeautifulSoup :p

    I'll look through a list of old user agents to try.
     
  12. MrBlue

    MrBlue Senior Member

    The creator of BeautifulSoup isn't supporting it anymore. Take a look at lxml!
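Whichever parser you choose, the job is the same: pull the result links out of the page. Here is a dependency-free sketch that uses the standard library's `html.parser` as a stand-in for BeautifulSoup or lxml (neither ships with Python). The assumption that each result title is an `<a>` inside an `<h3>` is hypothetical and has to be checked against the HTML you actually receive; `ResultLinkParser` and `extract_result_links` are made-up names.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect href values of <a> tags that appear inside <h3> elements.

    Assumption: result titles look like <h3><a href="...">; this layout is
    hypothetical and must be adjusted to the real page.
    """

    def __init__(self):
        super().__init__()
        self.h3_depth = 0   # >0 while we are inside an <h3>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.h3_depth += 1
        elif tag == "a" and self.h3_depth:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h3" and self.h3_depth:
            self.h3_depth -= 1


def extract_result_links(html: str) -> list:
    """Return the hrefs of links found inside <h3> elements, in page order."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Links outside an `<h3>` (navigation, ads, footers) are skipped, which is the point of tracking the `<h3>` depth.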
     
  13. William Rufino

    William Rufino Newbie

    Seriously? Didn't know that! But for now, BeautifulSoup is pretty awesome...

    And it was just a matter of the user agent! Hehe, fixed it now, thx guys.
     
  14. xenon2010

    xenon2010 Regular Member

    Joined:
    Apr 27, 2010
    Messages:
    231
    Likes Received:
    48
    Occupation:
    web and desktop apps programmer
    Location:
    prison
    Yeah, just use random user agents and you'll be fine...
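Picking a random user agent per request can be sketched like this with the standard library. The three strings in the pool are hypothetical placeholders to be replaced from a real user-agent list, `random_request` is a made-up name, and the `rng` parameter exists only so the choice can be made reproducible.

```python
import random
from urllib.request import Request

# Hypothetical pool; fill it from a real user-agent list.
USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.9.168 Version/11.50",
]


def random_request(url: str, rng=random) -> Request:
    """Build a Request whose User-Agent is picked at random from USER_AGENTS."""
    return Request(url, headers={"User-Agent": rng.choice(USER_AGENTS)})
```

Passing `random.Random(seed)` as `rng` makes a run repeatable, which helps when debugging which agent a given response came back for.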
     
  15. omega369

    omega369 BANNED

    Joined:
    Sep 20, 2015
    Messages:
    37
    Likes Received:
    17
    /Up