[GET] Google Result URL Scraper script

Rudyzplace (Regular Member, joined Aug 24, 2009)

Google search result URL scraper

I developed this script to scrape Google search results and use them with PR Storm.


Lately I've noticed a high demand for a URL scraper, so I'm giving back to the forum as thanks for the tools and knowledge I've gained over the past months.

This is a free distribution, so feel free to alter the script if you have different requirements (scraping Yahoo or Bing, etc.).

Instructions: copy the files to your htdocs folder and browse to GoogleScraper.php. You will then enter a query and a multiplier; the number of results fetched is the multiplier times 100.


Download link: http://www.mediafire.com/?vo23zy2yjgj
Please hit the T-H-A-N-K-S button if you find this useful.

 
To get fewer than 100 results, use a fraction: set the multiplier to the number of results you want divided by 100. For 10 results per page, set it to 0.1 (0.1 × 100 = 10); for 20, set it to 0.2; and so on.
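Put differently, the number of results is the multiplier times 100. The sketch below turns that rule into code and splits the total into one search URL per 100-result page. The function names, the google.com/search endpoint, and the q/num/start parameters are assumptions for illustration, not taken from GoogleScraper.php.

```php
<?php
// Sketch only: resultsForMultiplier() and pageUrls() are hypothetical helpers.
// The endpoint and the q/num/start parameters are assumed, not from the script.
function resultsForMultiplier(float $multiplier): int
{
    // 0.1 -> 10, 0.2 -> 20, 1 -> 100; round() absorbs floating-point error.
    return (int) round($multiplier * 100);
}

function pageUrls(string $query, int $total, int $perPage = 100): array
{
    $urls = [];
    // One URL per page; "start" is the offset of the first result on that page.
    for ($start = 0; $start < $total; $start += $perPage) {
        $urls[] = 'https://www.google.com/search?' . http_build_query([
            'q'     => $query,
            'num'   => min($perPage, $total - $start), // last page may be partial
            'start' => $start,
        ]);
    }
    return $urls;
}

// Usage: a multiplier of 2.5 asks for 250 results, i.e. three page requests.
foreach (pageUrls('php', resultsForMultiplier(2.5)) as $url) {
    echo $url, "\n";
}
```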

Let me know if you encounter any problems; I'll be glad to help.
 
Last time I checked, Google was limiting each IP to 6,000 results per 15 minutes; this might have changed.

I'm extracting around 1,000 results in that period, and it has never given me the "sorry" CAPTCHA page.
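These figures translate into a minimum pause between page requests. Assuming 100 results per request (an assumption, not stated for GoogleScraper.php), 1,000 results per 15 minutes is 10 requests per 900 seconds, or one request every 90 seconds; the 6,000 figure works out to one every 15 seconds. A small hypothetical helper makes the arithmetic explicit:

```php
<?php
// Sketch only: secondsBetweenRequests() is a hypothetical helper that spreads
// a results budget evenly over a time window.
function secondsBetweenRequests(int $resultsPerWindow, int $windowSeconds, int $resultsPerRequest): float
{
    // Whole requests that fit in the budget (at least one, to avoid dividing by zero).
    $requests = max(1, intdiv($resultsPerWindow, $resultsPerRequest));
    return $windowSeconds / $requests;
}

printf("%.0f s\n", secondsBetweenRequests(1000, 15 * 60, 100)); // prints 90 s
printf("%.0f s\n", secondsBetweenRequests(6000, 15 * 60, 100)); // prints 15 s
```

Passing the result to `sleep()` between pages keeps a scraper under whichever budget you choose.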
 
Are you the owner of gscrape.org? If so, thanks a lot for this and for the website :)
 
I want to use this for Google, and I found that the following modifier works with your script:

bphonebook:keyword:.com

But I guess this simply parses domain names and doesn't provide a modifier for narrower searches, so can I restrict the search to, say, one state?

Thx!
 
The script passes whatever you enter straight to Google search, so if a query works there, it should work with the script.
 
This is new to me. Can anyone explain in a bit more detail what it is and how it works?

Thanks :D
 
I created a similar script in iMacros, but this one is genius.
Thanks for sharing.
 
I was looking for a script like this for a long time. Thank you.
 
The script uses an external library called HTML DOM.

This set of commands lets you locate a <tag> in the HTML and extract its inner text, link text, or outer text.

For example, this tag: <a class="link" href="somesite.com">link text to extract</a>

will be located using:

find me an <a> tag with the class "link" and return its link text.

Result: "link text to extract"
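For comparison, here is a minimal sketch of the same lookup using PHP's built-in DOMDocument and DOMXPath classes instead of the external HTML DOM library the script uses. The helper name extractLinks() is hypothetical, and returning the href alongside the text is an addition for URL scraping.

```php
<?php
// Sketch only: swaps PHP's built-in DOM extension in for the "HTML DOM"
// library the script uses. extractLinks() is a hypothetical helper.
function extractLinks(string $html, string $class): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // "Find me an <a> tag with the class 'link'": match $class as a whole word
    // inside the class attribute ($class is assumed to contain no quotes).
    $expr = sprintf(
        '//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $links = [];
    foreach ($xpath->query($expr) as $a) {
        $links[] = [
            'text' => trim($a->textContent),     // the link text
            'href' => $a->getAttribute('href'),  // the URL itself
        ];
    }
    return $links;
}

$html = '<a class="link" href="somesite.com">link text to extract</a>';
print_r(extractLinks($html, 'link'));
```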
 
Sorry to bring up an old thread, but this is exactly what I was looking for: something to scrape site:domain.com URLs into a nice, easy copy-and-paste file.

THANKS!
 
I know this is an old thread, but I modified the script for my own needs and ran into a problem. I'm using a very "complex" query, and Google always blocks me (503 error) after a few pages are parsed. I tried increasing the sleep time, but it still didn't help.

Does anyone have any ideas?
 
Simple: Google notices when you're scraping search results too fast without proxies, and I must admit that even with proxies you will get blocked.
The best approach to scraping I have found is to use Yahoo, because they are not as restrictive as Google and share their search results. You can use their search API, although it is limited to 5,000 queries a day, so the best option is their Search BOSS service, which has no such limit.
 
Is there any way I can help? When I developed the script, using the sleep kept it from getting banned by big G, so we can improve yours and work around the block.

PM me the script if you'd like me to go over it and improve it.
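A fixed sleep between pages doesn't react when Google starts answering 503. A common alternative is capped exponential backoff: after each 503, wait twice as long before the next try, up to a ceiling, and give up after a few attempts. This is a sketch under stated assumptions, not part of the original script: the 30-second base and 15-minute cap are illustrative values, and `$fetch` is a hypothetical callable returning `[statusCode, body]`.

```php
<?php
// Sketch only: backoffDelay() and fetchWithBackoff() are hypothetical helpers.
function backoffDelay(int $attempt, int $baseSeconds = 30, int $capSeconds = 900): int
{
    // 30 s, 60 s, 120 s, ... but never more than $capSeconds.
    return min($capSeconds, $baseSeconds * (2 ** $attempt));
}

// $fetch is assumed to return [statusCode, body] for one results page.
// $sleep defaults to PHP's sleep(); tests can pass a stand-in.
function fetchWithBackoff(callable $fetch, int $maxAttempts = 5, ?callable $sleep = null): ?string
{
    $sleep = $sleep ?? 'sleep';
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        [$status, $body] = $fetch();
        if ($status === 200) {
            return $body;
        }
        if ($status !== 503) {
            return null; // not a rate-limit response, so give up
        }
        if ($attempt + 1 < $maxAttempts) {
            $sleep(backoffDelay($attempt)); // wait longer before each retry
        }
    }
    return null; // still getting 503 after $maxAttempts tries
}

// Usage with a stand-in fetcher; a real one would request a results page.
$body = fetchWithBackoff(fn () => [200, '<html>...</html>']);
echo $body === null ? "blocked\n" : "fetched\n";
```

If every retry still returns 503, stopping for the rest of the window is usually better than hammering the server.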
 
I think this would only be possible using proxies (at least for a large number of results). Google throws the error even when I search manually and browse only about 3 to 5 pages.

The query was: site:facebook.com inurl:pages/ group names

It had about 86 pages of results at 10 per page (roughly 860 results), so setting it to 100 per page would probably work for me, since that's only 9 pages. But I decided to write a Facebook scraper and get the results from there instead (more results on FB anyway).

Also, when I first tried the above query in your script (typing it into the input box), it did not work, so I changed the whole URL passed to getresults, and then it worked (maybe because of the slash?).
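Whether the slash is really the culprit isn't clear from the thread, since GoogleScraper.php's getresults code isn't shown. One thing worth checking is that the query is URL-encoded exactly once before it goes into the URL. A minimal sketch, with a hypothetical buildQueryUrl() helper and an assumed google.com/search endpoint:

```php
<?php
// Sketch only: buildQueryUrl() is hypothetical; the endpoint and the "q"
// parameter are assumptions for illustration.
function buildQueryUrl(string $query): string
{
    // urlencode() turns ":" into %3A, "/" into %2F and spaces into "+";
    // the server decodes them back to the original operators.
    return 'https://www.google.com/search?q=' . urlencode($query);
}

$q = 'site:facebook.com inurl:pages/ group names';
echo buildQueryUrl($q), "\n";

// Encoding twice is a common mistake: "%3A" becomes "%253A", so the search
// engine receives the literal text "%3A" instead of ":".
echo urlencode(urlencode($q)), "\n";
```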

Thanks to both
 
The script is huge.
Why didn't you just use this?

Code:
<?php


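// Keyword to search for.
$query = "ohohoh";

// Encode the query only once (the original encoded it twice).
$html = file_get_contents("http://www.google.com/ie?q=" . urlencode($query) . "&num=100&start=1");
if ($html === false) {
    die("Request failed");
}

// Capture the href value of each result link in the /ie page markup.
preg_match_all('/<a title=".*?" href=(.*?)>/', $html, $matches);

print implode("<br>", $matches[1]);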

?>
Change ohohoh to the keyword you want to search for.

I was thinking the exact same thing. I use the same line of code for the process, but I pass in a variable for the keyword, build a keyword list during the workday, and drop it into my database so the searches run on a more natural, random cycle throughout the day.
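The approach described here (a keyword list spread across the day rather than fired in one burst) could be sketched as follows. scheduleKeywords() is a hypothetical helper; picking uniformly random times within a window is an assumption about what "random cycle" means, and the database step is left out.

```php
<?php
// Sketch only: scheduleKeywords() is hypothetical. Each keyword gets a
// uniformly random run time inside [$windowStart, $windowEnd] (Unix timestamps).
function scheduleKeywords(array $keywords, int $windowStart, int $windowEnd): array
{
    $schedule = [];
    foreach ($keywords as $keyword) {
        $schedule[] = [
            'keyword' => $keyword,
            'run_at'  => random_int($windowStart, $windowEnd),
        ];
    }
    // Sort so the jobs run in chronological order.
    usort($schedule, fn ($a, $b) => $a['run_at'] <=> $b['run_at']);
    return $schedule;
}

$start = strtotime('today 09:00');
$end   = strtotime('today 17:00');
foreach (scheduleKeywords(['ohohoh', 'php scraper', 'html dom'], $start, $end) as $job) {
    echo date('H:i', $job['run_at']), '  ', $job['keyword'], "\n";
}
```

A cron job or loop can then run each stored keyword when its time comes.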
 