Scrapebox problem with .edu and .gov

kalseo · Dec 30, 2010

Hi black hatter friends,

Today I found that I have a problem scraping .edu and .gov domains with SB. Does anyone else have the same problem?

Cheers

Kal

cyberzilla · Dec 30, 2010

What problem are you facing exactly?

kalseo · Dec 30, 2010

cyberzilla said:
What problem are you facing exactly?

I am not getting any .edu or .gov domains. I am using custom footprints:

inurl:.edu "Powered by wordpress"
inurl:.gov "Powered by wordpress"

The only results I am getting are websites that mention those footprints.

Yzord · Dec 30, 2010

I checked it out with the same footprint. Got 200 results, but only a few of them are .edu sites. Guess we have to use another footprint.

pasdoy · Dec 30, 2010

Don't forget the keyword field...

cyberzilla · Dec 31, 2010

The footprint you are using is correct. Make sure you are using proxies for scraping. I think the search engines have put a soft block on your IP; that's why you are not getting any results. You can also try the footprints below.

Edit: Oops, just now saw your second update, so you are getting results. It's common to get irrelevant results. Just export them and remove all the non-.edu sites.

site:.edu inurl:blog "Add comment" "Notify me when new comments are added" -"comments closed" -"you must be logged in"
site:.edu inurl:blog "Write a comment" -"comments closed" -"you must be logged in"
site:.edu inurl:blog "Notify me of followup comments via e-mail" -"comments closed" -"you must be logged in"
site:.edu inurl:blog "comment" -"you must be logged in" -"posting closed" -"comment closed" "keyword"
site:.gov inurl:blog "comment" -"you must be logged in" -"posting closed" -"comment closed" "keyword"
site:.gov "Leave a Reply" "Name (required)" "Mail (will not be published) (required)" "Website" + "Keyword"
site:.edu "Leave a Reply" "Name (required)" "Mail (will not be published) (required)" "Website" + "Keyword"
site:.edu "Powered By Wordpress" + "keyword"
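The "export and remove the non-edu sites" step can also be done outside SB with a small script. This is a minimal Python sketch, not a Scrapebox feature, assuming the harvested URLs were exported as a plain text file with one URL per line (the script and file names are hypothetical). It checks the hostname rather than the whole URL, which matters because a footprint like inurl:.edu also matches pages such as example.com/page.edu.

```python
import sys
from urllib.parse import urlsplit

# TLDs to keep; an assumption based on this thread (.edu and .gov only).
KEEP_TLDS = (".edu", ".gov")


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, tolerating lines without a scheme."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # urlsplit only recognises the host after a scheme
    return (urlsplit(url).hostname or "").lower()


def keep_edu_gov(urls):
    """Keep only URLs whose hostname ends in one of KEEP_TLDS; blank lines are dropped."""
    return [u.strip() for u in urls if host_of(u).endswith(KEEP_TLDS)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python filter_edu.py harvested_urls.txt > edu_gov_only.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for url in keep_edu_gov(f):
            print(url)
```

Note that country-specific academic domains such as uni.edu.au end in .au, so they are dropped by this check; add them to KEEP_TLDS if you want them.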

kalseo · Dec 31, 2010

Thanks mate, those extra footprints will come in very useful.

jrtaylor · Dec 31, 2010

I'm new at scrapebox.

If I use "inurl:blog xyz", does that search the URL itself rather than the contents of the page?

artetatu · Dec 31, 2010

Can you give me more details?
