Scrape URLs of my article site

Krusty

Is there a scraper I can use to scrape the URLs of all the articles on my website? There are over 1,000 articles and I don't want to do it manually. I want to be able to submit all the links to ScrapeBox.
Is ScrapeBox designed to scrape a single website for all its URLs?
Any suggestions?
 
I believe scraping your own website for links using the "site:yourdomain.com" search operator should work. It gives you the links Google has indexed, I believe.
 
You can use xml-sitemaps.com; it will generate a sitemap of your website with all the URLs.
 
If it's WordPress-based, there are a number of sitemap plugins you can install that will help you out...
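
Once you have a sitemap from either route, pulling the URLs out of it is only a few lines of Python. This is a minimal sketch, assuming the sitemap follows the standard sitemaps.org format; the function name urls_from_sitemap and the sample XML are made up for illustration, and a sitemap index file (a sitemap that points to other sitemaps) would need one more pass.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (sitemaps.org protocol)
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_from_sitemap(xml_text):
    """Return the text of every <loc> element in a sitemap XML string."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]

# Hypothetical sample; in practice this would be the contents of your sitemap.xml
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/article-1</loc></url>
  <url><loc>https://example.com/article-2</loc></url>
</urlset>"""

for url in urls_from_sitemap(sample):
    print(url)
```

Redirect the output to a text file and you have a plain list of URLs to work with.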
 
Using Screaming Frog would probably be the most accurate solution. Just take care not to use too many threads, otherwise your own server could ban your IP (it already happened to me... ^^)
 
As Bass Tracker Boats noted, Screaming Frog will actually "crawl" your website.

If you want to use ScrapeBox and there is a sitemap (which I doubt, or you wouldn't be asking, but if there is), then you can use the Sitemap addon and select deep crawl in the settings.

Otherwise you can do a

site:domain.com
search in Google, for example, then load the results into the Link Extractor addon and extract internal links. Then export the results, load them back in and extract internal links again. Keep doing this until you are happy that you have all the URLs.
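
The "extract internal links, load them back in, repeat" loop is the same idea as a small crawler, so if you'd rather script it, here is a rough sketch using only the Python standard library. It is not how ScrapeBox does it internally; the names LinkParser, internal_links and crawl are my own, and fetch is assumed to be any function you supply that takes a URL and returns the page HTML as a string.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def internal_links(page_url, html):
    """Return the set of absolute, same-host URLs linked from one page."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment part
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host:
            found.add(absolute)
    return found

def crawl(start_url, fetch, max_pages=5000):
    """Keep extracting internal links until no new URLs turn up.

    max_pages is only a rough safety cap on how many URLs get collected.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        for link in internal_links(url, fetch(url)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

To try it without touching a live server, pass a dictionary lookup as fetch, e.g. crawl("https://example.com/", pages.get) where pages maps each URL to its HTML.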

Whichever program you use, bear in mind not to turn the connections up too high. It may seem like you want to go faster, but remember you are hammering away on your own web server and you could take it down. I inadvertently crashed a web server using ScrapeBox. :)
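
If you script the crawl yourself, the equivalent of keeping the connections low is to space out your requests. A minimal sketch, assuming a single-threaded script; throttled is a hypothetical helper name, and the one-second default is just a guess at a polite interval, not a recommendation from any tool.

```python
import time

def throttled(fetch, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap fetch so consecutive calls start at least min_interval seconds apart."""
    last_call = None

    def wrapper(url):
        nonlocal last_call
        if last_call is not None:
            # Sleep off whatever is left of the interval since the previous call
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        return fetch(url)

    return wrapper
```

Wrap whatever function downloads your pages, e.g. polite_fetch = throttled(my_fetch, min_interval=2.0), and use polite_fetch in its place.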
 