
Clone content and categories from another website

Discussion in 'Blogging' started by darksider8, Nov 12, 2014.

  1. darksider8

    darksider8 Newbie

    Joined: Dec 9, 2013
    Messages: 7
    Likes Received: 0
    Location: France

    Hello,

    I want to start a new website with WordPress. It will be a kind of movie database (just info, no download links). I would like to start with a good number of movies: for my project I need about 10,000 movies (posts), so adding them one by one would take years. There is a WordPress website that has a lot of movie posts, and I would like to use its content. I read here that some software is useful for this kind of thing, but is it possible in my specific case? That website also uses 3 custom taxonomies in each post.

    I've never done this before, so I'm a bit lost about what is and isn't possible. I should add that my website will not be a clone; I will add new categories and content.
     
  2. Zwielicht

    Zwielicht Moderator in Training Jr. VIP Premium Member

    Joined: Aug 31, 2013
    Messages: 3,822
    Likes Received: 6,756
    Gender: Male
    Occupation: Liquidator
    Location: Riverside County, California

    You can certainly pilfer the content verbatim, although do not be surprised when you receive a duplicate content penalty. If you're going to do this anyway, then at least run the content through a good content spinner (or hire someone to spin it for you).
     
  3. west555

    west555 Regular Member

    Joined: Dec 4, 2011
    Messages: 326
    Likes Received: 130
    Location: /etc/passwd

    Do you want full articles on them, or just info and short reviews from IMDb?
     
  4. salmanseo982

    salmanseo982 Regular Member

    Joined: Jan 28, 2014
    Messages: 465
    Likes Received: 40

    Hey, can you keep your site from being seen as a piracy website? If you get caught through a DMCA complaint, they will sue you, so be careful, buddy. I suggest you try a different idea that would work better.
     
  5. Zwielicht

    Zwielicht Moderator in Training Jr. VIP Premium Member

    Joined: Aug 31, 2013
    Messages: 3,822
    Likes Received: 6,756
    Gender: Male
    Occupation: Liquidator
    Location: Riverside County, California

    OP is not talking about creating a video download website; OP wants to create a website with information about movies, like IMDb.
     
  6. gordop

    gordop Registered Member

    Joined: Mar 30, 2013
    Messages: 61
    Likes Received: 19
    Occupation: Part-time web developer
    Location: San Jose, CA

    Most WP cloning plugins require access to the original site, so they won't be able to help. You can clone the site with HTTrack, but I don't know how well it would work for a really huge site. Also, I don't think you can convert the download directly back into WP. Check it out, or maybe someone here with more experience can chime in.
     
  7. GreyWolf

    GreyWolf Executive VIP Jr. VIP

    Joined: Aug 17, 2009
    Messages: 1,930
    Likes Received: 5,387
    Gender: Male
    Occupation: Artist / Craftsman
    Location: sitting at my PC

    Yeah, if you don't have backend access to the original site, you aren't going to make a simple clone of it. HTTrack is a pretty cool program, but it only deals with the HTML that WordPress outputs. What you need to do is use a content scraper to get just the articles and then feed them to your new WordPress site. Then set up whatever theme you want for the appearance.

    I don't know how well it works, but you could try this plugin for the scraping:
    Code:
    http://wordpress.org/plugins/wp-web-scrapper/
    If that doesn't work, just search for "scraping articles", "scraping posts", "content scraping", or "wordpress aggregator", and you'll probably come up with other search terms after seeing the results of those. You'll find quite a few scrapers for sale, but you can probably find some free ones that work if you search enough.
     
  8. darksider8

    darksider8 Newbie

    Joined: Dec 9, 2013
    Messages: 7
    Likes Received: 0
    Location: France

    The articles are basically info (with a jacket thumbnail), but the content doesn't come from IMDb.

    If I use a content scraper, will the images come as well?
    So I can use it for each category and feed the content to my website (the posts will show up in my dashboard). Do I understand the process correctly? I'm really new to this kind of thing.
     
  9. GreyWolf

    GreyWolf Executive VIP Jr. VIP

    Joined: Aug 17, 2009
    Messages: 1,930
    Likes Received: 5,387
    Gender: Male
    Occupation: Artist / Craftsman
    Location: sitting at my PC

    It really depends on the scraper you're using and the feeds you're getting. If you set up an autoblog pulling straight from RSS feeds, there may or may not be images included; it depends on how the feed was set up. If you have a scraper that creates the feeds for you, it depends on whether the scraper is set up to pull the images.
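    Whether a given RSS feed carries images can be checked quickly with Python's standard xml.etree.ElementTree. This is a minimal sketch, assuming an RSS 2.0 feed; it only looks at three common places an image can appear (an image-typed enclosure, a Media RSS thumbnail, or an img tag inside the description or WordPress's content:encoded field), so a feed that embeds images some other way would be reported as having none.

```python
import xml.etree.ElementTree as ET

# Namespaces commonly seen in RSS 2.0 feeds (WordPress feeds use content:encoded).
MEDIA = "{http://search.yahoo.com/mrss/}"
CONTENT = "{http://purl.org/rss/1.0/modules/content/}"


def items_with_images(rss_xml):
    """Return (title, has_image) for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # 1) An <enclosure> whose MIME type is an image.
        enclosure = item.find("enclosure")
        has_enclosure = (
            enclosure is not None and enclosure.get("type", "").startswith("image/")
        )
        # 2) A Media RSS <media:thumbnail> element.
        has_thumbnail = item.find(MEDIA + "thumbnail") is not None
        # 3) An <img> tag inside the (already unescaped) HTML of the item.
        html = item.findtext("description", default="") + item.findtext(
            CONTENT + "encoded", default=""
        )
        has_inline = "<img" in html.lower()
        results.append((title, has_enclosure or has_thumbnail or has_inline))
    return results
```

    To try it on a live feed, pass it the bytes returned by urllib.request.urlopen(feed_url).read(); ET.fromstring accepts bytes as well as strings.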

    With an RSS feed, you're pretty dependent on how the original blog set up the feed. With a scraper, on the other hand, you're dealing with a script that parses the HTML output, looking for div and content tags to separate the article from the rest of the page. In that case, if you want images as well, the script also has to deal with the img src tags within the content: copy each source image to your own site folder and rewrite the img tags to point to the copy on your site. It could also just hotlink (leech) the image from the old site, but that is probably less desirable, since it will show up in the original site's logs every time one of your pages loads.
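    The parsing step GreyWolf describes (find the article's div, collect its text and img src values, then point the img tags at local copies) can be sketched with Python's standard html.parser. This is a minimal sketch, not a finished scraper: it assumes the post body sits in a div with a known class ("entry-content" in the usage below is a hypothetical example; check the real page source, it varies by theme), that src attributes are double-quoted, and that local_prefix is wherever you keep the copies on your own site. Actually downloading each image (for example with urllib.request.urlretrieve) is left out.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urljoin, urlparse


class ArticleParser(HTMLParser):
    """Collect the text and <img> src values found inside one article <div>."""

    def __init__(self, article_class):
        super().__init__()
        self.article_class = article_class  # assumed class of the post-body div
        self.depth = 0        # > 0 while inside the article <div>
        self.text = []        # text fragments from the article
        self.image_srcs = []  # raw src values, exactly as written in the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside the article
            elif tag == "img" and attrs.get("src"):
                self.image_srcs.append(attrs["src"])
        elif tag == "div" and self.article_class in (attrs.get("class") or "").split():
            self.depth = 1  # entered the article body

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.text.append(data.strip())


def local_image_paths(image_srcs, page_url, local_prefix):
    """Map each raw src to (absolute source URL, path on your own site)."""
    mapping = {}
    for src in image_srcs:
        absolute = urljoin(page_url, src)  # resolve relative src values
        filename = basename(urlparse(absolute).path)
        mapping[src] = (absolute, local_prefix + filename)
    return mapping


def rewrite_img_tags(html, mapping):
    """Point each img src at its local copy (assumes double-quoted src)."""
    for src, (_, local_path) in mapping.items():
        html = html.replace(f'src="{src}"', f'src="{local_path}"')
    return html
```

    Feeding a page through ArticleParser("entry-content"), then local_image_paths(parser.image_srcs, page_url, "/my-uploads/") and rewrite_img_tags(page_html, mapping), gives you the article text, a list of images to copy, and HTML whose img tags no longer point at the original site. A nesting counter is used instead of a single flag so that divs inside the article don't end the capture early.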

    Basically, what you're talking about is an autoblog, so do a search on BHW and on Google to learn more about how to set one up. Autoblogs are a little more problematic now because Google will mostly consider them spam, but that doesn't mean you can't be more selective and creative and set one up that doesn't shout duplicate content. Here's one article I found that discusses that in more depth: http://premium.wpmudev.org/blog/how-to-set-up-a-curated-news-aggregation-site-with-wordpress/

    Also, if you just set it up to pull RSS feeds, you'll only get the most recent articles. That's another reason I suggest looking for a scraper: with the right scraper you could pull all the archived content as well. (I'm not sure, but I think Sweetfunny has a content scraper in addition to Scrapebox. I seem to remember seeing a sales letter for it on one of the Scrapebox pages.)

    Being new to this kind of thing means you'll have to do some research to figure out how it works. We can point you in the right direction, but you aren't going to get a step-by-step guide for what you're trying to do. It is definitely doable, though you'll need to do the research yourself to make it happen. Good luck.

    Since you're talking about pulling from IMDb, I think I might have come across a scraper made specifically for that, but again, I don't remember for sure now, and no, I'm not going to search for it for you, lol. That's something else you might try searching for, though. Basically, you'll need to figure out each of the parts I described and then put it all together yourself. Once you've done that, you could maybe even come back and post your own step-by-step guide on how to do it. :)
     
    Last edited: Nov 12, 2014
  10. west555

    west555 Regular Member

    Joined: Dec 4, 2011
    Messages: 326
    Likes Received: 130
    Location: /etc/passwd

    I have a whole website that runs automatically and takes its content from IMDb; that's why I asked if the content was from them.
     
  11. darksider8

    darksider8 Newbie

    Joined: Dec 9, 2013
    Messages: 7
    Likes Received: 0
    Location: France

    Thank you very much, GreyWolf, for the very detailed answer.
     
  12. Repulsor

    Repulsor Power Member

    Joined: Jun 11, 2013
    Messages: 707
    Likes Received: 267
    Location: PHP Scripting ;)

    You will probably need a custom scraper that scrapes the articles, tags, and categories and either enters them as-is into your database or posts them using XML-RPC.

    I don't know if Wp-robot does what you're looking for. You should check it out; otherwise, you'll need to get this coded by someone.
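    The XML-RPC route Repulsor mentions can be sketched with Python's standard xmlrpc.client and WordPress's wp.newPost method. This is a minimal sketch under several assumptions: XML-RPC is enabled on your own site (the endpoint is normally /xmlrpc.php), the account is allowed to create posts, and the taxonomy names "genre", "director", and "actor" are hypothetical stand-ins for the three custom taxonomies your site actually registers. Posts are created as drafts so each one can be reviewed before it goes live.

```python
import xmlrpc.client


def build_post(title, html_body, terms):
    """Build the content struct that WordPress's wp.newPost method accepts.

    `terms` maps a taxonomy name to a list of term names; sending them as
    `terms_names` lets WordPress match terms by name (and, per the XML-RPC
    reference, create the ones that don't exist yet).
    """
    return {
        "post_type": "post",
        "post_status": "draft",  # keep as draft so each post can be reviewed
        "post_title": title,
        "post_content": html_body,
        "terms_names": terms,
    }


def publish(endpoint, username, password, post, blog_id=1):
    """Send one post over XML-RPC and return WordPress's new post ID."""
    server = xmlrpc.client.ServerProxy(endpoint)
    # Dotted attribute access calls the remote method named "wp.newPost".
    return server.wp.newPost(blog_id, username, password, post)


if __name__ == "__main__":
    # Hypothetical taxonomy names; replace them with the ones your site registers.
    example = build_post(
        "Example Movie (2014)",
        "<p>Short synopsis goes here.</p>",
        {"genre": ["Drama"], "director": ["Jane Doe"], "actor": ["John Roe"]},
    )
    # Print the XML-RPC request body without contacting any server.
    print(xmlrpc.client.dumps((1, "user", "password", example), methodname="wp.newPost"))
```

    For a bulk import you would call publish() once per scraped record, ideally with a short pause between requests so the server isn't flooded; writing straight into the database is faster but skips the checks WordPress runs when a post is created through its API.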