ScrapeBox: Question about extracting URLs

CoyoteAssassin

I have ScrapeBox, added the ScrapeBox Link Extractor addon, imported my URL, and started extracting internal URLs.

But it only has two results.

All of the links in the HTML are relative, like "/folder/folder2/page1.html". How do I get ScrapeBox to grab these links, since they are not full URLs?

Thanks!
 
The Link Extractor addon sometimes acts weird for me too. When I choose "internal", I only get a few internal links, but when I choose "both", I get all the internal links as well as the external ones. So try extracting both and see how it works.

Another option is the SB Sitemap Extractor, if the site has a sitemap. It works perfectly for scraping all internal pages.

If there is no sitemap, or if SB doesn't recognize it, studying the URL structure is the only way. Once you know the structure, it is pretty easy to grab the internal links you need. Enter the site URL with the "site:" operator and look at the URL structure of the results. Most blog/site URLs follow one of these patterns...

Year wise:

So I use the custom footprint like this

Category wise:

So the footprints would be:
site:http://dogsite.com/dogfood/
site:http://dogsite.com/dogtraining/
site:http://dogsite.com/dogallergies/
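
If there are a lot of categories, typing these by hand gets tedious. Here is a minimal Python sketch for generating the footprint list outside ScrapeBox, which you can then paste in. The function name build_site_footprints is made up for illustration, and the domain and category names are just the examples from above.

```python
def build_site_footprints(base_url, categories):
    """Return one "site:" footprint per category folder.

    base_url and categories are whatever you found by studying
    the site's URL structure; nothing here talks to ScrapeBox.
    """
    base = base_url.rstrip("/")
    return [f"site:{base}/{category.strip('/')}/" for category in categories]


# Example values taken from the footprints listed above.
footprints = build_site_footprints(
    "http://dogsite.com",
    ["dogfood", "dogtraining", "dogallergies"],
)

for footprint in footprints:
    print(footprint)
```

Save the printed lines to a text file and import them as custom footprints.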

Hope it helps...
 
Thanks for the reply. Nothing new for me, but it is good to see that others are experiencing the same issue. I'm pretty good at building links and extracting them.

I decided to spend 15 minutes manually building the category links and then loaded those into SB. I then ran SB, which extracted the profile pages (which were full URLs).

Thanks.
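
For anyone who hits the same relative-link problem and wants a workaround outside ScrapeBox, here is a rough Python sketch that turns hrefs like "/folder/folder2/page1.html" into full URLs and keeps only the internal ones. It uses only the standard library (html.parser and urllib.parse). The class and function names, the sample HTML, and the dogsite.com base URL are all assumptions for illustration, not anything ScrapeBox does internally.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect every <a href> on a page, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # urljoin turns a relative path into a full URL and
            # leaves already-absolute URLs unchanged.
            self.links.append(urljoin(self.base_url, href))


def internal_links(html, base_url):
    """Return the resolved links that stay on the same host as base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlsplit(base_url).netloc
    return [url for url in collector.links if urlsplit(url).netloc == host]


# Hypothetical page source with one relative and one external link.
sample_html = (
    '<a href="/folder/folder2/page1.html">Page 1</a> '
    '<a href="http://other.example/x">Elsewhere</a>'
)
result = internal_links(sample_html, "http://dogsite.com/index.html")
print(result)
```

The resulting list of full URLs can be saved to a text file and loaded into SB like any other URL list.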
 