Scrape URLs of my article site

Krusty

Registered Member
Joined
Jul 14, 2014
Messages
98
Reaction score
36
Is there a scraper I can use to scrape the URLs of all the articles on my website? There are over 1,000 articles and I don't want to do it manually. I want to be able to submit all the links to Scrapebox.
Is Scrapebox designed to scrape a single website for all its URLs?
Any suggestions?
 

cottonwolf

Regular Member
Joined
Jan 20, 2015
Messages
469
Reaction score
242
I believe scraping your own website for links using the "site:yourdomain.com" search operator should work. It gives you the links indexed by Google, I believe.
 

M4XW3LL

Elite Member
Joined
Feb 5, 2013
Messages
1,629
Reaction score
2,152
You can use xml-sitemaps.com; it will generate a sitemap of your website with all the URLs.
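Once you have a sitemap file, pulling the URLs out into a plain list is straightforward. Here is a minimal sketch in Python, assuming the file follows the standard sitemaps.org format; the function name `urls_from_sitemap` is made up for illustration and is not part of any tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_data):
    """Return the text of every <loc> element in a sitemap document.

    Works for both a <urlset> (page URLs) and a <sitemapindex>
    (URLs of further sitemap files), since both use <loc>.
    """
    root = ET.fromstring(xml_data)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
```

Read the downloaded sitemap in binary mode, pass its contents to the function, and write the result out one URL per line, which should be easy to import into whatever tool you use next.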
 

Automated

Regular Member
Joined
Jun 7, 2012
Messages
290
Reaction score
125
If it's WordPress-based, there are a number of sitemap plugins you can install that will help you out...
 

Florent1933

Registered Member
Joined
Nov 23, 2010
Messages
59
Reaction score
52
Using Screaming Frog would probably be the most accurate solution. Just take care not to use too many threads, otherwise your own server could ban your IP (it already happened to me... ^^)
 

loopline

Jr. Executive VIP
Jr. VIP
Joined
Jan 25, 2009
Messages
6,026
Reaction score
3,428
Website
contactformmarketing.com
As Bass Tracker Boats noted, Screaming Frog will actually "crawl" your website.

If you want to use Scrapebox and there is a sitemap (which I doubt, or you wouldn't be asking, but if there is), then you can use the sitemap addon and select deep crawl in the settings.

Else you can do a

site:domain.com
in Google, for example, and then load the results into the link extractor addon and extract internal links. Then export the results, load them back in, and extract internal links again. Keep doing this until you are happy that you have all the URLs.
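The "extract internal links, load them back in, repeat" loop is essentially a breadth-first crawl. Here is a rough Python sketch of that idea using only the standard library. It is not how Scrapebox works internally; the names `internal_links` and `collect_urls` are made up, and `fetch` is a function you supply yourself that returns a page's HTML as a string (or an empty string on failure).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, page_url):
    """Return absolute links on the same host as page_url, minus #fragments."""
    parser = _LinkParser()
    parser.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            found.add(absolute)
    return found


def collect_urls(start_url, fetch, max_pages=5000):
    """Keep extracting internal links from newly found pages until none are new.

    max_pages is a soft cap: the loop stops fetching once that many
    URLs are known, so the result can slightly exceed it.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # caller-supplied; should return "" on failure
        for link in internal_links(html, url) - seen:
            seen.add(link)
            queue.append(link)
    return sorted(seen)
```

Unlike the site: search, this only finds pages that are actually linked from somewhere on the site, so orphaned articles would be missed.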

Whichever program you use, bear in mind not to turn the connections up too high. It may seem like you want to go faster, but remember you are hammering away on your own web server and you could take it down. I inadvertently crashed a web server using Scrapebox. :)
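If you script the fetching yourself, the simplest way to go easy on the server is a single connection with a pause between requests. A minimal sketch, assuming plain Python and the standard library; `Throttle` and `polite_fetch` are made-up names, and the right interval depends on your server, so treat any number you pick as a guess to tune.

```python
import time
import urllib.request


class Throttle:
    """Enforces a minimum gap (in seconds) between consecutive wait() calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def polite_fetch(url, throttle, timeout=10):
    """Fetch one page over a single connection, pausing first if needed."""
    throttle.wait()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `Throttle(1.0)` caps you at roughly one request per second, which would take around 17 minutes for 1,000 articles but is very unlikely to trouble the server.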
 