
[GET] Google Result URL Scraper script

Discussion in 'Black Hat SEO' started by Rudyzplace, Nov 25, 2009.

  1. Rudyzplace

    Rudyzplace Regular Member

    Joined:
    Aug 24, 2009
    Messages:
    266
    Likes Received:
    117
    Occupation:
    SEO expert
    Location:
    GPS signal dead...please hold
    Google search result URL scraper

    I developed this script to scrape Google search results and use them with PR Storm.


    Lately I've noticed a high demand for a URL scraper, so I'm giving back to the forum as thanks for the tools and knowledge I've gained over the past months.

    This is a free distribution, so feel free to alter the script if you have different requirements (scraping Yahoo or Bing, etc.).

    Instructions: Simply copy the files to your htdocs folder and browse to GoogleScraper.php --> you will then input a query and a results multiplier (the script fetches the multiplier x 100 results).
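
    To make the multiplier concrete, here is a rough sketch of how a query plus a multiplier could map to Google result-page URLs. This is illustrative only, not the actual script; q, num and start are Google's standard result-page parameters.

    Code:
    <?php
    // Sketch only: a multiplier of N means N requests of 100 results each.
    $query      = 'some keyword';
    $multiplier = 3; // 3 * 100 = 300 results

    $urls = array();
    for ($page = 0; $page < $multiplier; $page++) {
        $urls[] = 'http://www.google.com/search?q=' . urlencode($query)
                . '&num=100&start=' . ($page * 100);
    }
    print_r($urls);
    ?>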


    Download link:

    Code:
    http://www.mediafire.com/?vo23zy2yjgj
    
    Please hit the T-H-A-N-K-S button if you find this useful.

     
    • Thanks x 59
    Last edited: Nov 25, 2009
  2. ariknite

    ariknite Newbie

    Joined:
    Oct 20, 2009
    Messages:
    5
    Likes Received:
    0
    How can I change the result sets to x10?
     
  3. Rudyzplace

    Rudyzplace Regular Member

    Joined:
    Aug 24, 2009
    Messages:
    266
    Likes Received:
    117
    Occupation:
    SEO expert
    Location:
    GPS signal dead...please hold
    Simply use a fractional multiplier --> if you wish to receive 10 results per page, set it to 0.1 (0.1*100=10); if you wish to receive 20, set it to 0.2, and so on.

    Let me know if you encounter any problems, I'll be glad to help.
     
  4. ariknite

    ariknite Newbie

    Joined:
    Oct 20, 2009
    Messages:
    5
    Likes Received:
    0
    Nice!! Is there a result limit?
     
  5. Rudyzplace

    Rudyzplace Regular Member

    Joined:
    Aug 24, 2009
    Messages:
    266
    Likes Received:
    117
    Occupation:
    SEO expert
    Location:
    GPS signal dead...please hold
    Last time I checked, Google was limiting each IP to 6,000 results per 15 minutes; this might have changed.

    I'm extracting around 1,000 results in that period of time and it has never given me the "sorry" captcha page.
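
    If you want to stay well under that ceiling, pacing the requests is enough. A minimal sketch; the numbers are just my figures above, not a guarantee:

    Code:
    <?php
    // Sketch: spread ~1,000 results (10 requests of 100) over a
    // 15-minute window, i.e. one request roughly every 90 seconds.
    $requests = 10;
    $window   = 15 * 60;                     // window length in seconds
    $delay    = (int) ($window / $requests); // ~90 s between requests

    for ($i = 0; $i < $requests; $i++) {
        // ... fetch and parse result page $i here ...
        sleep($delay);
    }
    ?>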
     
  6. redsasy

    redsasy Newbie

    Joined:
    Feb 16, 2009
    Messages:
    45
    Likes Received:
    78
    Occupation:
    In a relationship with a female hacker
    Location:
    Bulgaria
    Are you the owner of
    Code:
    gscrape.org
    If so, thank you a lot for this, and for the website if it's yours. :)
     
  7. Icarion

    Icarion Newbie

    Joined:
    Apr 11, 2007
    Messages:
    1
    Likes Received:
    0
    I want to use this for Google, and I found that the following modifier works with your script:

    bphonebook:keyword:.com

    But I guess this simply parses domain names and doesn't provide a modifier so that I can restrict the search to, say, one state?

    Thx!
     
  8. Rudyzplace

    Rudyzplace Regular Member

    Joined:
    Aug 24, 2009
    Messages:
    266
    Likes Received:
    117
    Occupation:
    SEO expert
    Location:
    GPS signal dead...please hold
    The script takes any given input and carries it over to Google search. If it works there, it should work in the script.
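
    In other words, the whole query is URL-encoded and appended, so any operator Google accepts survives the trip. Roughly like this (illustrative, not the script's exact code):

    Code:
    <?php
    // Sketch: advanced operators pass through untouched because the
    // entire query string is urlencode()d before being sent to Google.
    $query = 'bphonebook:keyword:.com';
    echo 'http://www.google.com/search?q=' . urlencode($query) . '&num=100';
    ?>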
     
  9. blackmagicmaster

    blackmagicmaster BANNED

    Joined:
    Dec 11, 2008
    Messages:
    587
    Likes Received:
    932
    Nice code. I'm also researching DOM features, so this helps me a lot! Thanks for the share!
     
  10. zjfmcl

    zjfmcl Newbie

    Joined:
    May 19, 2009
    Messages:
    2
    Likes Received:
    0
    I can't download it, is there another download link?
     
  11. 1link

    1link Registered Member

    Joined:
    Dec 9, 2008
    Messages:
    93
    Likes Received:
    196
    This is new to me. Can anyone explain in a bit more detail what it is and how it works?

    Thanks :D
     
  12. mtravel13

    mtravel13 Registered Member

    Joined:
    Sep 2, 2009
    Messages:
    81
    Likes Received:
    17
    Occupation:
    web designer
    Location:
    internet
    I created a similar script in iMacros, but this one is genius.
    Thanks for sharing.
     
  13. soulfly

    soulfly Junior Member

    Joined:
    Nov 20, 2008
    Messages:
    118
    Likes Received:
    138
    Location:
    BHW
    I was looking for such a script for a long time. Thank you.
     
  14. Rudyzplace

    Rudyzplace Regular Member

    Joined:
    Aug 24, 2009
    Messages:
    266
    Likes Received:
    117
    Occupation:
    SEO expert
    Location:
    GPS signal dead...please hold
    It uses an external library called HTML DOM.

    This set of commands allows you to locate a <tag> in the HTML and extract its inner text, link text, or outer text.

    For example, this tag --> <a class="link" href="somesite.com">link text to extract</a>

    will be located using:

    find me an <a> tag with the class "link" and bring me its link text.

    Result: "link text to extract"
     
  15. oni3350

    oni3350 Regular Member

    Joined:
    Sep 24, 2008
    Messages:
    361
    Likes Received:
    194
    Occupation:
    Internet Marketer/ Black Hatter
    Location:
    Perth, Western Australia
    Home Page:
    Sorry to bring up an old thread, but this is exactly what I was looking for: something to scrape the site:domain.com URLs into a nice, easy copy-and-paste file.

    THANKS!
     
  16. bbrez1

    bbrez1 Power Member

    Joined:
    Feb 21, 2009
    Messages:
    675
    Likes Received:
    2,360
    I know this is an old thread, but I modified the script for my own needs and ran into a problem. I'm using a very "complex" query, and Google always blocks me (503 error) after a few pages are parsed. I tried increasing the sleep time, but it didn't help.

    Does anyone have any ideas?
     
  17. kaidoristm

    kaidoristm Power Member

    Joined:
    Feb 13, 2009
    Messages:
    561
    Likes Received:
    726
    Occupation:
    Freelancer
    Location:
    Estonia
    Home Page:
    Simple: Google notices if you're trying to scrape search results too fast without proxies, and I must admit that even with proxies you will get blocked.
    The best idea for scraping I have found is to use Yahoo, because they are not such bitches as Google and share their search results. You can use their search API, although it is limited to 5,000 queries a day. So the best option is to use their BOSS search; there is no limitation.
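
    For reference, a BOSS v1 web search is a single HTTP GET. This is a sketch from memory of the v1 endpoint and response fields; YOUR_APP_ID is a placeholder you register with Yahoo, so double-check the field names against their docs:

    Code:
    <?php
    // Sketch: Yahoo BOSS v1 web search, JSON response.
    $query = urlencode('some keyword');
    $url   = 'http://boss.yahooapis.com/ysearch/web/v1/' . $query
           . '?appid=YOUR_APP_ID&format=json&count=50';

    $data = json_decode(file_get_contents($url), true);
    foreach ($data['ysearchresponse']['resultset_web'] as $result) {
        echo $result['url'] . "\n";
    }
    ?>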
     
    • Thanks x 1
    Last edited: Apr 8, 2010
  18. Rudyzplace

    Rudyzplace Regular Member

    Joined:
    Aug 24, 2009
    Messages:
    266
    Likes Received:
    117
    Occupation:
    SEO expert
    Location:
    GPS signal dead...please hold
    Is there any way I can help? When I developed the script, it didn't get banned from big G using the sleep; we can improve this and work around it.

    PM me the script if you would like me to go over it and improve it.
     
    • Thanks x 1
  19. bbrez1

    bbrez1 Power Member

    Joined:
    Feb 21, 2009
    Messages:
    675
    Likes Received:
    2,360
    I think this would only be possible using proxies (for a huge amount of results, at least). Google throws the error even when I search manually and browse only about 3-5 pages.

    The query was: site:facebook.com inurl:pages/ group names

    It had about 86 pages of results (10 per page); setting it to 100 per page would probably work for me, since that's only 9 pages. But I decided I will write a Facebook scraper and get the results from there instead (more results on FB anyway).

    Also: when I first tried the above query in your script (pasting it into the input box), it did not work, so I just changed the whole URL when calling getresults, and then it worked (maybe because of the slash?).

    Thanks to both
     
  20. aftershock2020

    aftershock2020 Senior Member

    Joined:
    Oct 19, 2007
    Messages:
    981
    Likes Received:
    477
    I was thinking the exact same thing. I use the same line of coding for the process; however, I pass a variable for the standing keyword and build a list during the workday, dropping it into my database to be searched over a more natural, random cycle of searches throughout the day.