
Scrape Google...

Discussion in 'General Programming Chat' started by William Rufino, Sep 28, 2011.

  1. William Rufino

    William Rufino Newbie

    Joined:
    Aug 3, 2011
    Messages:
    32
    Likes Received:
    2
    Hello there,

    Google isn't accepting the num parameter anymore, so I can't get more than 10 results.

    Has anyone found a workaround for that?
     
  2. scraper1

    scraper1 Regular Member

    Joined:
    May 28, 2011
    Messages:
    214
    Likes Received:
    207
    Location:
    Kontiki
    Yes, just turn off Google Instant from Preferences.
     
  3. webmast

    webmast Regular Member

    Joined:
    Dec 18, 2010
    Messages:
    337
    Likes Received:
    238
    Code:
    http://www.google.com/search?q=buy+viagra&num=100
    Still works for me :)
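A minimal Python sketch (Python because the thread later moves to it) of building the same kind of search URL in code rather than by hand. It assumes only the q and num parameters from the URL above; build_search_url is a hypothetical helper, and whether Google honors num at any given time is exactly what this thread disagrees about.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_search_url(query, num=100):
    """Hypothetical helper: build a search URL with the q and num parameters."""
    # urlencode uses quote_plus by default, so spaces become '+'
    params = urlencode({"q": query, "num": num})
    return "http://www.google.com/search?" + params


url = build_search_url("web scraping", 100)
print(url)  # http://www.google.com/search?q=web+scraping&num=100

# Round-trip check: parse the parameters back out of the URL
print(parse_qs(urlparse(url).query))  # {'q': ['web scraping'], 'num': ['100']}
```

Using urlencode instead of string concatenation keeps queries with spaces or symbols correctly escaped.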


     
    Last edited: Sep 28, 2011
  4. scraper1

    scraper1 Regular Member

    There are more ways to do the same thing:
    Code:
    http://www.google.com/preferences
     
  5. William Rufino

    William Rufino Newbie

    I know that, guys, but I'm talking about doing it programmatically. I use cURL to crawl Google, and Preferences saves the setting in a cookie. How do I retrieve that cookie so I can also use it with cURL?
     
  6. scraper1

    scraper1 Regular Member

    Check out the Google search API; you can use it with cURL:
    Code:
    http://code.google.com/apis/websearch/docs/
    Code:
    curl -e http://www.my-ajax-site.com \
    'https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Paris%20Hilton&key=INSERT-YOUR-KEY'
    
    Never mind, it seems the API is limited to 10 results.
    You can store the cookie in a file and load it from there.
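With the curl command line, -c (--cookie-jar) writes cookies to a file and -b (--cookie) reads them back. The sketch below swaps in Python's standard library instead of PyCURL: http.cookiejar.MozillaCookieJar, which reads and writes the same Netscape cookie-file format, plugged into a urllib opener. The fake_cookie helper, the cookie name example_pref and the .example.com domain are made up so the demo runs without network access; in real use the cookie would come from a response.

```python
import http.cookiejar
import os
import tempfile
import time
import urllib.request


def make_opener(cookie_path):
    """Build a urllib opener whose cookies are kept in a Netscape-format file."""
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    if os.path.exists(cookie_path):
        # Reuse cookies saved by an earlier run (e.g. a preferences cookie)
        jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fake_cookie(name, value, domain):
    """Hypothetical helper: create a cookie by hand so the demo needs no network."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=int(time.time()) + 3600, discard=False,
        comment=None, comment_url=None, rest={},
    )


cookie_path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

# First run: no file yet, so the jar starts empty; a real response would fill it
_, jar = make_opener(cookie_path)
jar.set_cookie(fake_cookie("example_pref", "num=100", ".example.com"))
jar.save(ignore_discard=True, ignore_expires=True)

# Second run: the cookie is loaded back from the file
_, jar2 = make_opener(cookie_path)
print([(c.name, c.value) for c in jar2])  # [('example_pref', 'num=100')]
```

Requests made through the returned opener (opener.open(url)) send matching cookies and store new ones in the jar; call jar.save() again to persist them.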
     
  7. William Rufino

    William Rufino Newbie

    The problem with the API is that it returns different results than the website does. :/
     
  8. scriptomania

    scriptomania Junior Member

    Joined:
    Dec 28, 2010
    Messages:
    127
    Likes Received:
    249
    Occupation:
    A full time pirate at sea
    Location:
    The European capital of politics
    1. Look through a list of user agents
    2. Change yours to a really old one (Win98 era)
    3. ???
    4. Profit!
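Changing the user agent just means sending a different User-Agent header with each request. A minimal sketch with Python's urllib, assuming an illustrative old-browser string; OLD_USER_AGENT and build_request are hypothetical names, and the request is built but not sent.

```python
import urllib.request

# Illustrative string in the style of an old Windows 98 browser
OLD_USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"


def build_request(url, user_agent=OLD_USER_AGENT):
    """Return a Request that sends the given User-Agent instead of Python's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.example.com/")
# urllib stores header names via str.capitalize(), hence the "User-agent" key
print(req.get_header("User-agent"))  # Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
# urllib.request.urlopen(req) would send it; not called here to avoid network access
```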
     
  9. MrBlue

    MrBlue Senior Member

    Joined:
    Dec 18, 2009
    Messages:
    950
    Likes Received:
    662
    Occupation:
    Web/Bot Developer
    I stopped using PHP/cURL to scrape Google and switched to Python. It's way more powerful, and my scripts are much less brittle now.
     
  10. christoss1959

    christoss1959 Senior Member

    Joined:
    Nov 25, 2010
    Messages:
    894
    Likes Received:
    1,150
    Read this:
    Code:
    http://www.our-picks.com/archives/2007/01/30/google-search-urls-revealed-or-how-to-create-your-own-search-url/
     
  11. William Rufino

    William Rufino Newbie

    I use Python too, but with PyCURL + BeautifulSoup :p

    I'll check a list of old user agents to try.
     
  12. MrBlue

    MrBlue Senior Member

    The creator of BeautifulSoup isn't supporting it anymore. Take a look at lxml!
     
  13. William Rufino

    William Rufino Newbie

    Seriously? Didn't know that! But for now, BeautifulSoup is pretty awesome...

    And it was just a matter of the user agent! Hehe, fixed it now. Thanks, guys!
     
  14. xenon2010

    xenon2010 Regular Member

    Joined:
    Apr 27, 2010
    Messages:
    231
    Likes Received:
    48
    Occupation:
    web and desktop apps programmer
    Location:
    prison
    Yeah, just use random user agents and you'll be fine...