[GET] Google Result URL Scraper script

Rudyzplace · Nov 25, 2009

Google search result URL scraper

I developed this script to scrape Google search results and use them with PR Storm.

Lately I've noticed a high demand for a URL scraper, so I'm giving back to the forum as thanks for the tools and knowledge I've gained over the past months.

This is a free distribution, so feel free to alter the script if you have different requirements (scraping Yahoo or Bing, etc.).

Instructions: simply copy the files to your htdocs folder and browse to GoogleScraper.php. You will then enter a query and a multiplier; the number of results per page is the multiplier × 100.

Download link:

http://www.mediafire.com/?vo23zy2yjgj

Please hit the T-H-A-N-K-S button if you find this useful.

ariknite · Nov 25, 2009

How can I change the result sets to x10?

Rudyzplace · Nov 25, 2009

Simply divide the input multiplier by 10: if you wish to receive 10 results per page, set it to 0.1 (0.1 * 100 = 10); if you wish to receive 20, set it to 0.2, and so on.
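The multiplier rule above is plain arithmetic: results per page = multiplier × 100. A minimal PHP sketch, assuming a hypothetical helper named `results_per_page()` (it is not taken from GoogleScraper.php), shows the conversion and why rounding matters when the multiplier is a fraction:

```php
<?php
// Hypothetical helper, not from GoogleScraper.php: converts the form's
// multiplier into a results-per-page count (results = multiplier * 100).
function results_per_page(float $multiplier): int
{
    // round() before casting: some products land just below the intended
    // integer in floating point (0.57 * 100 is 56.99999999999999), and a
    // bare (int) cast would truncate that to 56.
    return (int) round($multiplier * 100);
}

foreach ([0.1, 0.2, 0.57, 1.0] as $multiplier) {
    echo $multiplier, ' -> ', results_per_page($multiplier), " results per page\n";
}
```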

Let me know if you encounter any problems, I'll be glad to help.

ariknite · Nov 25, 2009

Nice!! Is there a result limit?

Rudyzplace · Nov 25, 2009

Last time I checked, Google was limiting each IP to 6000 results per 15 minutes; this might have changed.

I'm extracting around 1000 results in that period, and it has never given me the "sorry" CAPTCHA page.

redsasy · Nov 26, 2009

Are you the owner of gscrape.org?

If so, thank you a lot for this, and for the website if it's yours.

Icarion · Nov 28, 2009

I want to use this for Google, and I found that the following modifier works with your script:

bphonebook:keyword:.com

But I guess this simply parses domain names and doesn't provide a modifier for narrower searches. Is there a way to restrict the search to, say, one state?

Thx!

Rudyzplace · Nov 29, 2009

The script takes any given input and passes it on to Google search. If it works there, it should work with the script.
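Passing the input on to Google means URL-encoding it into a search URL. A minimal sketch, assuming a hypothetical `build_search_url()` helper and the `q`/`num`/`start` query parameters (neither is copied from the script), shows how PHP's `http_build_query()` keeps operators such as `site:` and characters such as `/` intact:

```php
<?php
// Hypothetical sketch: carrying a typed query over to a search URL.
// The base URL and the q/num/start parameter names are assumptions for
// illustration, not taken from GoogleScraper.php.
function build_search_url(string $query, int $num = 100, int $start = 0): string
{
    // http_build_query() percent-encodes each value exactly once
    // (":" becomes %3A, "/" becomes %2F, spaces become "+").
    return 'http://www.google.com/search?' . http_build_query([
        'q'     => $query,
        'num'   => $num,
        'start' => $start,
    ]);
}

echo build_search_url('site:example.com inurl:pages/', 10), "\n";
```

Encoding the whole query in one place also avoids accidentally encoding it twice.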

blackmagicmaster · Nov 29, 2009

Nice code! I'm also researching DOM features, so this helps me a lot. Thanks for sharing!

zjfmcl · Dec 1, 2009

I can't download it. Is there another download link?

1link · Dec 1, 2009

This is a new thing I am learning. Can anyone explain in a bit more detail what it is and how it works?

Thanks

mtravel13 · Dec 1, 2009

I created a similar script in iMacros, but this one is genius.
Thanks for sharing!

soulfly · Dec 1, 2009

I was looking for a script like this for a long time. Thank you!

Rudyzplace · Dec 9, 2009

1link said:
This is a new thing I am learning. Can anyone explain in a bit more detail what it is and how it works?

Thanks

It uses an external library called HTML DOM.

This set of commands allows you to locate a <tag> in the HTML and extract its inner text, link text, or outer text.

For example, this tag: <a class="link" href="somesite.com">link text to extract</a>

will be located using:

find me an <a> tag with the class "link" and bring back its link text.

Result: "link text to extract"
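The lookup described above can be sketched with PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`). This swaps the standard extension in for the external HTML DOM library the script uses, and the `extract_links()` helper is hypothetical; it returns both the link text and the `href`, since the URL is what a URL scraper keeps:

```php
<?php
// Sketch using PHP's built-in DOM extension in place of the external
// HTML DOM library: find <a> tags with a given class and return their
// link text and href.
function extract_links(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    // Exact match on the class attribute; $class must not contain a double quote.
    foreach ($xpath->query('//a[@class="' . $class . '"]') as $a) {
        $links[] = [
            'text' => $a->textContent,          // the link text
            'href' => $a->getAttribute('href'), // the URL itself
        ];
    }
    return $links;
}

$html = '<p><a class="link" href="somesite.com">link text to extract</a> '
      . '<a class="other" href="elsewhere.com">not this one</a></p>';

foreach (extract_links($html, 'link') as $link) {
    echo $link['text'], ' (', $link['href'], ")\n";
}
```

The exact `@class` match misses tags that carry several classes (e.g. `class="link external"`); an XPath predicate such as `contains(concat(' ', normalize-space(@class), ' '), ' link ')` covers that case.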

oni3350 · Apr 8, 2010

Sorry to bring up an old thread, but this is exactly what I was looking for: something to scrape the site:domain.com URLs into a nice, easy copy-and-paste file.

THANKS!

bbrez1 · Apr 8, 2010

I know this is an old thread, but I modified the script for my own needs and ran into a problem. I'm using a very "complex" query, and Google always blocks me (503 error) after a few pages are parsed. I tried increasing the sleep time, but it still didn't help.

Does anyone have any ideas?

kaidoristm · Apr 8, 2010

Simple: Google notices if you're trying to scrape search results and doing it too fast without proxies. And I must admit that even with proxies you will get blocked.
The best approach to scraping I have found is to use Yahoo, because they are not nearly as restrictive as Google and share their search results. You can use their search API, although it is limited to 5000 queries a day, so the best option is their Search BOSS, which has no such limit.

Rudyzplace · Apr 9, 2010

bbrez1 said:
I know this is an old thread, but I modified the script for my own needs and ran into a problem. I'm using a very "complex" query, and Google always blocks me (503 error) after a few pages are parsed. I tried increasing the sleep time, but it still didn't help.

Does anyone have any ideas?

Is there any way I can help? When I developed the script, it didn't get banned by big G thanks to the sleep; we can improve this and work around it.

PM me the script if you would like me to go over it and improve it.

bbrez1 · Apr 9, 2010

Rudyzplace said:
Is there any way I can help? When I developed the script, it didn't get banned by big G thanks to the sleep; we can improve this and work around it.

PM me the script if you would like me to go over it and improve it.

I think this would only be possible using proxies (at least for a huge number of results). Google throws the error even when I search manually and browse only about 3-5 pages.

The query was: site:facebook.com inurl:pages/ group names

It had about 86 pages of results (10 per page); setting it to 100 per page would probably work for me, since that leaves only 9 pages. But I decided to write a Facebook scraper and get the results from there instead (there are more results on FB anyway).
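The page count at 100 results per page follows from the figures above, assuming "about 86 pages" at 10 per page means roughly 860 results:

```latex
N \approx 86 \times 10 = 860, \qquad
\text{pages}_{100} = \left\lceil \frac{N}{100} \right\rceil = \lceil 8.6 \rceil = 9
```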

Also, when I first tried to use the above query in your script (by typing it into the input box), it did not work, so I just changed the whole URL when calling getresults, and then it worked (maybe because of the slash?).

Thanks to both

aftershock2020 · Apr 9, 2010

M.A.D said:
The script is huge.
Why didn't you just use this?

Code:

<?php $query = "ohohoh"; // replace with your keyword; it is URL-encoded once below
preg_match_all('/<a title=".*?" href=(.*?)>/', file_get_contents("http://www.google.com/ie?q=" . urlencode($query) . "&num=100&start=1"), $matches); print implode("<br>", $matches[1]); ?>

Change ohohoh to the keyword you want to search for.

I was thinking the exact same thing. I use this same line of code for the process; however, I pass a variable for the standing keyword and build a list during the workday, dropping it into my database to be searched on a more natural, random cycle throughout the day.
