How do I find local sites for a specific industry using ScrapeBox?

massonspy (Junior Member, joined Apr 3, 2010, 141 messages, reaction score 29)
I am trying to target websites in a specific industry, for example hairdressers, and get their email addresses from their contact pages.

Can anyone help me with a footprint that will save me time?
 
In the Harvester, check "Custom Footprint", enter something like "Hair Dresser in Boston" in the box, and click "Start Harvester". After that completes, go to the "Grab" menu, select "Grab Emails From Harvester List", and you're done.
 
That didn't give many results. I need a method to extract all the business links.
 
You could try allintitle:"hair dresser" OR allinurl:"hair dresser"

You could try that, or if you like, I have a program that can grab tons of business listings for that market.

Shoot me a PM.
 
Try something similar to the above, but after you get your initial results you need to extract other pages, since the email address may not be on the page Google returns.

So you would want to trim the URLs to root and remove duplicate domains. Then load them back into the keyword harvester box and do a site: search:

site:http://www.domain.com
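The trim-to-root and dedupe step (done inside ScrapeBox in this thread) can be illustrated with a minimal Python sketch. It assumes every harvested URL is a full URL with a scheme (http:// or https://), and it treats www.domain.com and domain.com as different hosts; the function names are hypothetical, not part of ScrapeBox.

```python
from urllib.parse import urlsplit


def trim_to_root(url: str) -> str:
    """Reduce a URL to scheme://host/ (path, query and fragment are dropped)."""
    parts = urlsplit(url.strip())
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def unique_roots(urls):
    """Trim every URL to its root and keep only the first root seen per host."""
    seen_hosts = set()
    roots = []
    for url in urls:
        root = trim_to_root(url)
        host = urlsplit(root).netloc
        if host not in seen_hosts:
            seen_hosts.add(host)
            roots.append(root)
    return roots


def site_queries(roots):
    """Build one site: query per root, in the site:http://www.domain.com form."""
    return [f"site:{root.rstrip('/')}" for root in roots]


if __name__ == "__main__":
    harvested = [
        "http://www.example-salon.com/about",
        "http://www.example-salon.com/contact",
        "https://Another-Salon.example/index.html",
    ]
    for query in site_queries(unique_roots(harvested)):
        print(query)
```

Deduplicating by host rather than by full root means an http:// and an https:// copy of the same site collapse into one entry, which matches "remove duplicate domains".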

And/or load them into the Link Extractor addon and extract internal links. Then load the results back in 2 or 3 times, or however many you like, and extract internal links again.
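"Extracting internal links" boils down to resolving each link on a page against that page's URL and keeping only the links on the same host. The sketch below covers just that filtering step and assumes you already have the raw href values from a page (in the thread, the Link Extractor addon does the fetching); internal_links is a hypothetical helper, and stripping #fragments so that contact and contact#form count as one page is a design choice, not something the post specifies.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def internal_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep unique http(s) links on the same host."""
    host = urlsplit(page_url).netloc.lower()
    seen = set()
    result = []
    for href in hrefs:
        # Make relative links absolute, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        # Skip mailto:, javascript: and links pointing at other sites.
        if parts.scheme not in ("http", "https") or parts.netloc.lower() != host:
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    hrefs = ["/contact", "contact#form", "team.html", "http://other.example/", "mailto:info@example.com"]
    for link in internal_links("http://www.example-salon.com/about", hrefs):
        print(link)
```

Each extra pass the post describes would feed these results back in as new page URLs.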

Then load them into the URLs Harvested section and remove duplicate URLs. Then export and randomize. Then use "import and add to" to import them, and grab emails.

It's important that you use "import and add to" and not "import and replace" on that last step. Import and replace sorts the list in alphabetical order, so you end up hitting the same domain with multiple simultaneous connections. That can result in IP bans or the site taking too long to respond, and then you lose that email.

Bringing them back in randomized lets you grab emails and spread out your hits among many domains at once.
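The post randomizes the list; the Python sketch below swaps in a different technique, a deterministic round-robin interleave by host, which pursues the same goal more strictly: consecutive entries come from different domains whenever more than one domain remains. It assumes full URLs with a scheme and that consecutive entries are what a multi-connection grab hits at the same time; interleave_by_domain is a hypothetical helper, and random.shuffle would be the literal equivalent of "randomize".

```python
from collections import deque
from urllib.parse import urlsplit


def interleave_by_domain(urls):
    """Reorder URLs round-robin by host so one domain's URLs are spread out, not adjacent."""
    buckets = {}  # host -> queue of its URLs; dicts keep first-seen host order
    for url in urls:
        host = urlsplit(url).netloc.lower()
        buckets.setdefault(host, deque()).append(url)

    ordered = []
    while buckets:
        # Take one URL from each remaining host per pass.
        for host in list(buckets):
            queue = buckets[host]
            ordered.append(queue.popleft())
            if not queue:
                del buckets[host]
    return ordered


if __name__ == "__main__":
    # An alphabetically sorted list, like the one "import and replace" produces.
    sorted_urls = sorted([
        "http://www.a-salon.example/contact",
        "http://www.a-salon.example/about",
        "http://www.b-salon.example/contact",
        "http://www.c-salon.example/team",
        "http://www.b-salon.example/about",
    ])
    for url in interleave_by_domain(sorted_urls):
        print(url)
```

Once only one domain has URLs left, its remaining entries necessarily end up next to each other.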
 
Thanks for the suggestion, but I tried this earlier and ended up with fewer than 50 emails. The problem is getting good initial results with the footprint. It's not getting me far, and I know there should be at least 1000.
 
Excellent advice, and the method I was about to explain, but loopline has explained it much better here.
Scritty
 
Well, can you give some examples of pages you do want? With the info you've given, I'd search

hair dresser

That's about the extent of it, based on the info you provided.

If these hairdressers use a particular CMS, we can target that, or if there are commonalities among the pages you're looking for, we can build a footprint. But you will have to provide several examples of the pages you want, along with more detail.

Making footprints is an art, and it varies dramatically from one application to another. So the more detail you can give, the better.

The one thing is, if there are no commonalities between the pages you are looking for, then you're just dealing with sheer numbers using a basic footprint, such as

hair dressers

However, you can do things like

"hair dressers" "a"
"hair dressers" "b"
"hair dressers" "c"
"hair dressers" "1"
"hair dressers" "2"
"hair dressers" "3"
"hair dressers" "dallas texas"

etc. This gives you a larger volume of results and finer control over a geographic area, or whatever you're looking for.
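Typing these variations by hand gets tedious, so a short script can generate them. Below is a minimal Python sketch, assuming you paste the printed lines into the ScrapeBox keyword box; expand_footprint is a hypothetical helper, and extending the post's a, b, c, 1, 2, 3 examples to the full a-z and 0-9 range is an assumption about what "etc." covers.

```python
from itertools import product
from string import ascii_lowercase, digits


def expand_footprint(*term_lists):
    """Return one query per combination of terms, each term wrapped in double quotes."""
    return [" ".join(f'"{term}"' for term in combo) for combo in product(*term_lists)]


if __name__ == "__main__":
    # Single letters and digits (a-z, 0-9) plus one geographic modifier.
    modifiers = list(ascii_lowercase) + list(digits) + ["dallas texas"]
    queries = expand_footprint(["hair dressers"], modifiers)
    # A third list (e.g. contact-related words) multiplies the number of queries.
    queries += expand_footprint(["hair dressers"], ["dallas texas"], ["email", "contact us"])
    print("\n".join(queries))
```

Every extra list multiplies the query count, so a few short lists quickly produce hundreds of keywords.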

Also check your settings in ScrapeBox, because bad settings can skew the entire result set. Specifically, make sure your results are set to 1000, that you are searching google.com and don't have a custom Google set, and then go to Settings >> Use Multi Threaded Harvester and uncheck it. Then watch the status column as you harvest to make sure you aren't getting a lot of "302 blocked" responses from Google.

Also, when you're done, make sure all keywords completed in green; otherwise, export the red ones and rerun them. I'd use the single-threaded harvester (by unchecking the multi-threaded harvester) in your situation anyway, since the single-threaded harvester is built for accuracy and does things in the back end that the multi-threaded harvester does not. Multi-threaded is built for volume and errs toward volume, sacrificing accuracy if need be, which is how it should be. Single-threaded errs toward accuracy, sacrificing speed, which is also how it should be.

Edit:
After thinking on this a bit longer, perhaps you're also talking about using something like @hotmail.com, so you could do things like:

"hair dressers" "a" "@hotmail.com"
"hair dressers" "b" "@gmail.com"
"hair dressers" "c" "email"
"hair dressers" "1" "e-mail"
"hair dressers" "2" "mail"
"hair dressers" "3" "contact"
"hair dressers" "dallas texas" "contact us"
etc...
 
I am trying to extract the emails of dental offices in one particular geographical area: dentists in Florida. They usually have a contact form rather than an email address, so it will probably take a combination of both methods:
1- Extracting from business directories
2- Extracting directly from their web pages

- I want to first be able to find their links; then, if I cannot get their email, I will use their contact form manually.
- I can't use the CMS platform as a commonality, because some dentists use plain HTML sites and some use WordPress or Joomla.
- Commonalities I can mention are keywords like dentist in Florida, dental implants, cosmetic dentist florida and teeth whitening. These are very common on dental websites because they are services all of them offer.
- I guess I can use intitle: and intext:, but what's the right combination for a footprint that gives accurate results for a keyword + location combination?


Hope this extra info helps.

PS
Do you suggest using google.com even for Canadian pages?
I have tried "hair dressers" "dallas texas", but it gives you lots of directory links.
 
I think I would start by scraping for something like

"dental keywords" "Florida city" "email"
"dental keywords" "Florida city" "contact"

Then I'd scrape out the emails you can and save them.

Then I'd load the rest into Xrumer and turn on self-learning. Create a comment like "Hey, I would like more info about dental services for my family."

Then comment the daylights out of the forms and put in a legit email address where you can get responses.

Then use self-learning to work on any forms it didn't recognize, and filter out the successful ones so you aren't hitting them 10 times in a row. Then keep working on it.

Use ScrapeBox to scrape; Hrefer is poor for this type of scraping, and for most scraping in general, IMHO. Use Xrumer to post.

Then wait a week for responses. Then capture all the email addresses that respond, and that's your gold.

Also, don't run the Xrumer blacklist against your scrapes; it's built to weed out contact forms, and I know I have hit an awful lot of contact forms like this. Also, don't put any legit info into the campaign other than a throwaway email address. It's a good idea to plan ahead how you're going to scrape the email addresses out of the replies you receive, so you pick a throwaway email provider you don't have to copy and paste the response addresses from.

ScrapeBox is awesome for scraping and very flexible, and it's great for posting via the Learning Mode addon on lots of different platforms, but that addon isn't built to quickly adapt to many different forms in one run, so Xrumer would be a better choice in this instance for posting to the forms.

My 2 cents.
 
Makes sense. Thanks, I will try it and let you know what happens.
 
It seems to work, but I'm only getting about a 10% response rate. What do you use to save all the emails you receive to a text file? And what email account do you think I should use to comment with Xrumer? Ideally one that automatically saves all sender addresses in the address book so you can export them.
 
Use the right tool for the right job.

An app like Places Scout will work a lot better for what you are trying to do and save you a whole lot of time.
 