
Scrape google...

Discussion in 'General Programming Chat' started by William Rufino, Sep 28, 2011.

Thread Status:
Not open for further replies.
  1. William Rufino

    William Rufino Newbie

    Joined:
    Aug 3, 2011
    Messages:
    32
    Likes Received:
    2
    Hello There,

    Google isn't taking the num parameter anymore to get more than 10 results...

    Has anyone seen a workaround for that?
     
  2. scraper1

    scraper1 Regular Member

    Joined:
    May 28, 2011
    Messages:
    214
    Likes Received:
    208
    Location:
    Kontiki
    Home Page:
    Yes, just turn off Google Instant from Preferences.
     
  3. webmast

    webmast Regular Member

    Joined:
    Dec 18, 2010
    Messages:
    337
    Likes Received:
    238
    Code:
    http://www.google.com/search?q=buy+viagra&num=100
    Still works for me :)
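For anyone wiring that URL into a script, a minimal sketch of building it with Python's standard library (the query and num value are just the example from above; whether Google honors num depends on your preferences/cookies at the time):

```python
from urllib.parse import urlencode

# Build a Google search URL that asks for up to 100 results per page.
def google_search_url(query, num=100):
    params = {"q": query, "num": num}
    return "http://www.google.com/search?" + urlencode(params)

url = google_search_url("buy viagra")
print(url)  # http://www.google.com/search?q=buy+viagra&num=100
```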


     
    Last edited: Sep 28, 2011
  4. scraper1

    scraper1 Regular Member

    Joined:
    May 28, 2011
    Messages:
    214
    Likes Received:
    208
    Location:
    Kontiki
    Home Page:
    There are more ways to do the same thing:
    Code:
    http://www.google.com/preferences
     
  5. William Rufino

    William Rufino Newbie

    Joined:
    Aug 3, 2011
    Messages:
    32
    Likes Received:
    2
    I know that, guys, but I'm talking about programming: I use cURL to crawl Google, and the preferences page saves a cookie. How do I retrieve that cookie so I can also use it with cURL?
     
  6. scraper1

    scraper1 Regular Member

    Joined:
    May 28, 2011
    Messages:
    214
    Likes Received:
    208
    Location:
    Kontiki
    Home Page:
    Check out the Google search API; you can use it with curl.
    Code:
    http://code.google.com/apis/websearch/docs/
    Code:
    curl -e http://www.my-ajax-site.com \
    'https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Paris%20Hilton&key=INSERT-YOUR-KEY'
    
    Never mind, it seems the API is limited to 10 results only.
    You can store the cookie in a file, and use it from there.
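    A sketch of the cookie-file idea with Python's standard library: save the preferences cookie to a Netscape-format file (which curl can also read with -b), then reload it later. In a real run, a request to /preferences would populate the jar; the PREF=NUM=100 cookie below is added by hand purely for illustration.

```python
import http.cookiejar
import urllib.request

# A Netscape/Mozilla-format cookie file, compatible with curl -b.
jar = http.cookiejar.MozillaCookieJar("google_cookies.txt")

# Illustrative cookie; normally set by Google when you save preferences.
cookie = http.cookiejar.Cookie(
    version=0, name="PREF", value="NUM=100", port=None, port_specified=False,
    domain=".google.com", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True, secure=False, expires=None, discard=False,
    comment=None, comment_url=None, rest={}, rfc2109=False,
)
jar.set_cookie(cookie)
jar.save(ignore_discard=True)

# Later, or in another process: reload the file and attach it to an
# opener so every request sends the saved cookie.
jar2 = http.cookiejar.MozillaCookieJar("google_cookies.txt")
jar2.load(ignore_discard=True)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar2))
print([c.name for c in jar2])  # ['PREF']
```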
     
    • Thanks Thanks x 1
  7. William Rufino

    William Rufino Newbie

    Joined:
    Aug 3, 2011
    Messages:
    32
    Likes Received:
    2
    The problem with the API is that it shows different results than the web interface... :/
     
  8. scriptomania

    scriptomania Junior Member

    Joined:
    Dec 28, 2010
    Messages:
    127
    Likes Received:
    250
    Occupation:
    A full time pirate at sea
    Location:
    The European capital of politics
    1. Look around through some useragents list
    2. Change yours to an old ass one (win98 era)
    3. ???
    4. Profit!
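    The recipe above, sketched with Python's standard library. The user-agent strings are real Win98-era IE strings, but which ones Google treats leniently is anyone's guess:

```python
import random
import urllib.request

# Step 1-2: some genuinely ancient user agents (IE5/IE6 on Windows 98).
OLD_USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)",
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)",
]

def make_request(url):
    # Send an old UA instead of the default Python-urllib one.
    ua = random.choice(OLD_USER_AGENTS)
    return urllib.request.Request(url, headers={"User-Agent": ua})

req = make_request("http://www.google.com/search?q=test&num=100")
print(req.get_header("User-agent") in OLD_USER_AGENTS)  # True
```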
     
    • Thanks Thanks x 1
  9. MrBlue

    MrBlue Senior Member

    Joined:
    Dec 18, 2009
    Messages:
    975
    Likes Received:
    682
    Occupation:
    Web/Bot Developer
    I stopped using PHP/cURL to scrape Google. Now using Python...way more powerful and my scripts are now much less brittle.
     
    • Thanks Thanks x 1
  10. christoss1959

    christoss1959 Senior Member

    Joined:
    Nov 25, 2010
    Messages:
    894
    Likes Received:
    1,153
    Read this:
    Code:
    http://www.our-picks.com/archives/2007/01/30/google-search-urls-revealed-or-how-to-create-your-own-search-url/
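    The article linked above documents URL parameters like start (result offset), num (page size), and hl (interface language). A sketch of paging through results by URL alone, assuming those parameters behave as they did in the thread's era:

```python
from urllib.parse import urlencode

# start is the zero-based result offset, num the page size.
def search_pages(query, pages=3, num=10, hl="en"):
    urls = []
    for page in range(pages):
        params = {"q": query, "num": num, "hl": hl, "start": page * num}
        urls.append("http://www.google.com/search?" + urlencode(params))
    return urls

urls = search_pages("curl scraping")
print(urls[1])  # second page starts at result offset 10
```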
     
  11. William Rufino

    William Rufino Newbie

    Joined:
    Aug 3, 2011
    Messages:
    32
    Likes Received:
    2
    I use Python too, but I use PyCURL + BeautifulSoup :p

    I'll check a list of old user agents to try.
     
  12. MrBlue

    MrBlue Senior Member

    Joined:
    Dec 18, 2009
    Messages:
    975
    Likes Received:
    682
    Occupation:
    Web/Bot Developer
    The creator of BeautifulSoup isn't supporting it anymore. Take a look at lxml!
     
  13. William Rufino

    William Rufino Newbie

    Joined:
    Aug 3, 2011
    Messages:
    32
    Likes Received:
    2
    Seriously? I didn't know that! But for now BeautifulSoup is pretty awesome...

    And it was just a matter of the user agent! hehe, fixed it now, thanks guys.
     
  14. xenon2010

    xenon2010 Regular Member

    Joined:
    Apr 27, 2010
    Messages:
    231
    Likes Received:
    48
    Occupation:
    web and desktop apps programmer
    Location:
    prison
    Yeah, just use random user-agents and you will be fine...
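    Rotating randomly per request can be as simple as cycling through a pool so no single string dominates your traffic. The UA strings below are illustrative placeholders:

```python
import itertools
import urllib.request

# Cycle through a pool of user agents, one per request.
UA_POOL = itertools.cycle([
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/536.26",
    "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0",
])

def next_request(url):
    return urllib.request.Request(url, headers={"User-Agent": next(UA_POOL)})

r1 = next_request("http://www.google.com/search?q=a")
r2 = next_request("http://www.google.com/search?q=b")
print(r1.get_header("User-agent") != r2.get_header("User-agent"))  # True
```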
     
  15. omega369

    omega369 BANNED

    Joined:
    Sep 20, 2015
    Messages:
    37
    Likes Received:
    17
    /Up
     
  16. maxsch0610

    maxsch0610 Newbie

    Joined:
    Jul 27, 2018
    Messages:
    17
    Likes Received:
    2
    Try using proxies; on Google you quickly get banned and need to verify that you are human via reCAPTCHA.
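    A minimal sketch of the proxy idea with the standard library. The proxy address is a placeholder you would replace with a real one; in practice you'd cycle several proxies and back off when a CAPTCHA page appears:

```python
import urllib.request

# Placeholder proxy address; bans then hit the proxy IP, not yours.
PROXY = "http://127.0.0.1:8080"

handler = urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
opener = urllib.request.build_opener(handler)
# opener.open(url) would now route through PROXY.
print(handler.proxies["http"])  # http://127.0.0.1:8080
```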
     
  17. theRevolt

    theRevolt Jr. VIP

    Joined:
    Jul 29, 2009
    Messages:
    2,130
    Likes Received:
    877
    You do see this thread is 7 years old, right?
     
  18. maxsch0610

    maxsch0610 Newbie

    Joined:
    Jul 27, 2018
    Messages:
    17
    Likes Received:
    2
    I didn't see that, sorry.
     