Is there any tool to scrape content off WordPress blogs?

Discussion in 'BlackHat Lounge' started by daserpent, Aug 4, 2011.

  1. daserpent

    daserpent Power Member

    Joined:
    May 10, 2010
    Messages:
    762
    Likes Received:
    470
    I want to 'clone' a WordPress blog. The site is huge: about 2,200 pages indexed in Google.

    Is there any tool that would scrape the content as it is? Maybe one that uses category-specific RSS feeds to scrape the posts and put them in the right categories on my domain?
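
The category-feed idea can be sketched roughly as follows. This is only a minimal illustration, assuming a blog you own or have permission to mirror and a standard RSS 2.0 feed in which each item carries one or more `<category>` elements (WordPress typically serves per-category feeds at paths like /category/<slug>/feed/). The sample feed, the function name `group_items_by_category`, and the "Uncategorized" fallback are all made up for the example; it parses an inline string instead of fetching anything.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample feed; a real one would come from the blog's feed URL.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post/</link>
      <category>News</category>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second-post/</link>
      <category>News</category>
      <category>Guides</category>
    </item>
  </channel>
</rss>"""


def group_items_by_category(feed_xml):
    """Return {category name: [(title, link), ...]} for an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    grouped = defaultdict(list)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # An item may list several categories; file it under each of them.
        categories = [c.text for c in item.findall("category") if c.text]
        for name in categories or ["Uncategorized"]:
            grouped[name].append((title, link))
    return dict(grouped)


if __name__ == "__main__":
    for category, posts in group_items_by_category(SAMPLE_FEED).items():
        print(category, posts)
```

Each key of the returned dictionary could then be matched against a category on the destination blog. Note that a feed usually exposes only the most recent posts, so this alone would not cover a site with thousands of pages.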
     
  2. thesuvo

    thesuvo Registered Member

    Joined:
    Feb 3, 2010
    Messages:
    82
    Likes Received:
    18
    Location:
    India
    I'm also looking for the same tool Bro :)
     
  3. stimpo321

    stimpo321 Registered Member

    Joined:
    Aug 31, 2010
    Messages:
    69
    Likes Received:
    6
    Occupation:
    Seo
    Location:
    Worcs UK
    Would WP Robot do that?
     
  4. DustinX

    DustinX Junior Member

    Joined:
    Nov 20, 2009
    Messages:
    105
    Likes Received:
    30
    Gender:
    Male
    Location:
    United States
    Well, there are a few things I can think of. I downloaded this from BHW, I think; at the time it was completely free, but I'm not sure which version it is. It's called Full-Text RSS, and it's from here: http://fivefilters.org/content-only/

    Here is the MediaFire link to the version I have uploaded. I don't know the rules about sharing links, as I've never done it before, but here are the MediaFire link and the VirusTotal scan..

    Anyway, it takes any RSS feed and turns it into a full-post feed, with the formatting fully preserved, I believe. What I did was use Backlink Energizer, which posts content from URLs, and put in the URL that the RSS tool generated. The problem is that I think it only does up to 30 posts; I can't remember exactly. At least it's a start toward a possible solution.

    The other thing I can think of is iMacros; the full version is available in the download section somewhere. It can scrape the entire site: you could use ScrapeBox or something similar to get all of the URLs, plug them in, and scrape away. You can then use another macro to post the content to your own blog. Put all of the scraped HTML files into a folder, open each one locally on your PC (like C:\blackhat\user\pages\1.html, etc.), and post it to your blog that way. Sorry this is kind of mangled lol, I didn't sleep well last night.

    Other than that, I can't really think of an easy way to do it, though there probably is one. Another thing I just thought of: you could scrape all of the URLs of the site, put them in the format site:blahblah.com/URL or whatever, then use those as the keywords in the autoblog tool on your WordPress install and have it process them all right away. I'm not sure how it would work out, but hopefully it gives you some ideas on where to start.

    Good luck!
     
  5. kokoloko75

    kokoloko75 Elite Member

    Joined:
    Jan 1, 2011
    Messages:
    1,628
    Likes Received:
    1,935
    Occupation:
    Design director
    Location:
    Paris (France)
    Yes, use the RSS feed with WP-Robot or WP-o-Matic.
    Also, look at my old thread on creating an RSS feed from a non-RSS website:
    Code:
    http://www.blackhatworld.com/blackhat-seo/blogging/285876-guide-any-content-jacking.html
    Beny
     