I want to 'clone' a WordPress blog. The site is huge: about 2,200 pages indexed in Google.
Is there any tool that would scrape the content as is? Maybe one that uses the category-specific RSS feeds, so the posts end up in the right categories on my domain?
Well, there are a few things I can think of. I downloaded this from BHW, I think. It was completely free at the time, but I'm not sure which version it is. It's called Full-Text RSS, and it's from here: http://fivefilters.org/content-only/
Here is the MediaFire link to the version I have uploaded on my server. I don't know the rules about sharing links since I've never done it, but here are the MediaFire link and the VirusTotal scan..
Anyway, it takes any RSS feed and turns it into a full-post feed, with the formatting fully preserved, I believe. What I did was use Backlink Energizer, which posts content from URLs, and put in the URL that the RSS tool generated. The problem is that I think it only does up to 30 posts, though I can't remember exactly. At least it's a start toward a possible solution.
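To give a rough idea of the category part of the question, here is a minimal Python sketch that reads a full-text feed and groups the posts by category. It assumes the feed is standard RSS 2.0 with `<category>` tags on each item and the post body in `content:encoded`, which is how WordPress feeds are usually laid out. The sample feed, the `group_items_by_category` name, and the "Uncategorized" fallback are made up for illustration, not taken from any of the tools above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace of the RSS "content" module that carries the full post body.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Hypothetical sample feed, stands in for whatever the RSS tool generates.
SAMPLE_FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first-post/</link>
      <category>News</category>
      <content:encoded><![CDATA[<p>Full body of the first post.</p>]]></content:encoded>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second-post/</link>
      <category>News</category>
      <category>Reviews</category>
      <content:encoded><![CDATA[<p>Full body of the second post.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def group_items_by_category(feed_xml):
    """Map each category name to the list of posts filed under it."""
    root = ET.fromstring(feed_xml)
    grouped = defaultdict(list)
    for item in root.iter("item"):
        post = {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "html": item.findtext(CONTENT_NS + "encoded", default=""),
        }
        # An item can carry several categories; fall back to a default if none.
        names = [c.text for c in item.findall("category") if c.text]
        for name in names or ["Uncategorized"]:
            grouped[name].append(post)
    return dict(grouped)


if __name__ == "__main__":
    for category, posts in group_items_by_category(SAMPLE_FEED).items():
        print(category, [p["title"] for p in posts])
```

A post with two categories shows up under both, so whatever does the posting would need to decide whether to file it once or twice.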
The other thing I can think of is iMacros (the full version is available in the download section somewhere). It can scrape the entire site: you could use ScrapeBox or something similar to get all of the URLs, plug them in, and scrape away. Then you can also use a macro to post it to your own blog. You can put all of the scraped HTML files into a folder, open each one locally on your PC (C:\blackhat\user\pages\1.html etc.), and post it onto your blog that way. Sorry this is kind of mangled lol, didn't sleep well last night.
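If the scraped pages end up saved as 1.html, 2.html and so on, walking through them in order could look roughly like this in Python. This is only a sketch: the function names are made up, and it assumes the files are named by number and saved as UTF-8.

```python
from pathlib import Path


def numbered_pages(folder):
    """Return the .html files in folder, sorted by numeric name (1, 2, ..., 10)."""
    pages = [p for p in Path(folder).glob("*.html") if p.stem.isdigit()]
    # Sort by the number, not the string, so 10.html comes after 2.html.
    return sorted(pages, key=lambda p: int(p.stem))


def read_pages(folder):
    """Yield (file name, HTML text) for each numbered page, in order."""
    for page in numbered_pages(folder):
        # Assumption: pages are UTF-8; undecodable bytes are replaced, not fatal.
        yield page.name, page.read_text(encoding="utf-8", errors="replace")
```

Each `(name, html)` pair from `read_pages` could then be handed to whatever does the actual posting.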
Other than that, I can't really think of an easy way to do it, though there probably is one. Another thing I just thought of: you could scrape all of the URLs of the site, put them in the format site:blahblah.com/URL or similar, and then use those as the keywords in the autoblog tool on your WordPress install so it does them all right away. Not sure how well it would work, but hopefully it gives you some ideas on where to start.
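Turning a list of scraped URLs into that site: format is easy to script. A small Python sketch, where `to_site_query` is a made-up helper and the URLs are placeholders:

```python
from urllib.parse import urlsplit


def to_site_query(url):
    """Turn http://blahblah.com/some-post/ into site:blahblah.com/some-post/."""
    parts = urlsplit(url.strip())
    # Drop the scheme, query string and fragment; keep host and path.
    return "site:" + parts.netloc + parts.path


if __name__ == "__main__":
    # Placeholder URLs; in practice these would come from the scraped list.
    urls = [
        "http://blahblah.com/some-post/",
        "http://blahblah.com/category/news/another-post/",
    ]
    for keyword in (to_site_query(u) for u in urls):
        print(keyword)
```

Whether the autoblog tool accepts these as keywords is untested, same as the idea itself.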