News website scraper

Discussion in 'Black Hat SEO' started by Montella, Aug 5, 2017.

  1. Montella

    Montella Newbie

    Joined:
    Jun 26, 2012
    Messages:
    21
    Likes Received:
    3
    Does anyone know of any bots/services that can scrape full articles from news websites such as CNN, Huffington Post, etc.?

    Not snippets that redirect to the original article (like webhose.io), but something where you can input a website and a topic you'd like and it scrapes the articles in full.
     
  2. andy1

    andy1 Junior Member

    Joined:
    Jan 15, 2009
    Messages:
    103
    Likes Received:
    17
    Try this one: HTTrack Website Copier. Hopefully it works for your needs.
     
  3. redarrow

    redarrow Elite Member

    Joined:
    Apr 1, 2013
    Messages:
    7,665
    Likes Received:
    1,972
    That won't work. It will be classed as duplicate content and the website will get banned.

    RSS is the only way.

    You can do it if you really want to, but the website will get banned.

    Copy and paste is quicker.
     
  4. patriotnews

    patriotnews Supreme Member

    Joined:
    Oct 25, 2015
    Messages:
    1,257
    Likes Received:
    841
    Location:
    under there
    Who will "ban" the website?
    And as to it being classified as duplicate content, so?

    OP: what CMS are you using? There are lots of good RSS feed autopost plugins for WordPress.
     
    • Thanks x 1
  5. Montella

    Montella Newbie

    Joined:
    Jun 26, 2012
    Messages:
    21
    Likes Received:
    3
    I'm on WordPress. Any particular plugins you would recommend?
     
  6. greatops

    greatops Jr. VIP

    Joined:
    Nov 8, 2014
    Messages:
    587
    Likes Received:
    99
    Location:
    Freedom

    Can code this for you if you want.


    Cheers
     
  7. patriotnews

    patriotnews Supreme Member

    Joined:
    Oct 25, 2015
    Messages:
    1,257
    Likes Received:
    841
    Location:
    under there
    The free version of WPeMatico works well enough to get you started. You can set up many channels and many schedules.
    With just the free version, I was running 47 different RSS news feeds, over 102 different posting schedules.
     
    • Thanks x 1
  8. bobojonathan

    bobojonathan Regular Member

    Joined:
    Sep 12, 2014
    Messages:
    292
    Likes Received:
    24
    Do you pull full feeds from the news sites, and how do you get rid of duplicate posts?
     
  9. patriotnews

    patriotnews Supreme Member

    Joined:
    Oct 25, 2015
    Messages:
    1,257
    Likes Received:
    841
    Location:
    under there
    I pull full feeds from news sites and cut out the ones that block full posts.
    I set it up so it only pulls the 1-2 most recent posts, and there is a check box in WPeMatico that lets you skip duplicate content from the sites, so I avoid pulling the same articles.
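
    For anyone who'd rather see the idea in code than in plugin settings, here is a rough Python sketch of the same approach: take only the newest couple of items from an RSS feed and skip anything already pulled. This is not how WPeMatico works internally. The function name latest_new_items, the SAMPLE_FEED text and the example.com links are all made up for illustration, and it assumes a plain RSS 2.0 feed that lists the newest items first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, newest item first (an assumption about feed order).
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example News</title>
  <item><title>Story A</title><link>https://example.com/a</link>
        <description>Full text of A</description></item>
  <item><title>Story B</title><link>https://example.com/b</link>
        <description>Full text of B</description></item>
  <item><title>Story C</title><link>https://example.com/c</link>
        <description>Full text of C</description></item>
</channel></rss>"""


def latest_new_items(rss_xml, seen_links, limit=2):
    """Return up to `limit` of the newest feed items whose link is not in seen_links.

    seen_links is a set that is updated in place, so calling this again
    with the same set will not return the same articles twice.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    # Only look at the first `limit` items, i.e. the most recent posts.
    for item in root.findall("./channel/item")[:limit]:
        link = (item.findtext("link") or "").strip()
        if not link or link in seen_links:
            continue  # no usable link, or already pulled: skip it
        seen_links.add(link)
        fresh.append({
            "title": (item.findtext("title") or "").strip(),
            "link": link,
            "body": (item.findtext("description") or "").strip(),
        })
    return fresh


if __name__ == "__main__":
    seen = set()
    for post in latest_new_items(SAMPLE_FEED, seen):
        print(post["title"], post["link"])
```

    In a real setup the seen links would have to be stored somewhere between runs (a file or a database table); the in-memory set here only shows the check itself.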
     
    • Thanks x 1
  10. itz_styx

    itz_styx Power Member

    Joined:
    May 8, 2012
    Messages:
    686
    Likes Received:
    359
    Occupation:
    CEO / Admin / Developer
    Location:
    /dev/mem
    Home Page:
    You can do that with argo content's article scraper. You can also filter out URLs and unwanted garbage with it.
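
    To give an idea of what filtering out URLs and garbage can look like, here is a small Python sketch. It has nothing to do with how that scraper actually does it; clean_article and the two regexes are just an illustrative guess at the general technique (strip HTML tags, decode entities, remove bare links, tidy whitespace).

```python
import re
from html import unescape

# Simple patterns, assumed good enough for a sketch; real pages are messier.
TAG_RE = re.compile(r"<[^>]+>")        # any HTML tag
URL_RE = re.compile(r"https?://\S+")   # bare http/https links


def clean_article(text):
    """Strip HTML tags and URLs from scraped text and collapse whitespace."""
    text = TAG_RE.sub(" ", text)   # drop tags (and any links inside attributes)
    text = unescape(text)          # turn &amp; etc. back into characters
    text = URL_RE.sub("", text)    # drop links left in the visible text
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_article("<p>Read more at https://example.com/x now &amp; later</p>"))
```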
     
  11. wangbu

    wangbu Registered Member

    Joined:
    Apr 6, 2008
    Messages:
    97
    Likes Received:
    33
    Thank you, I'll check it out as well.
     
    • Thanks x 1
  12. yellowcat

    yellowcat Regular Member

    Joined:
    Aug 27, 2015
    Messages:
    316
    Likes Received:
    196
    I can code this if you'd like.
    PM me here or on Skype @ yellowcat1771