
News website scraper

Discussion in 'Black Hat SEO' started by Montella, Aug 5, 2017.

  1. Montella

    Montella Newbie

    Joined:
    Jun 26, 2012
    Messages:
    21
    Likes Received:
    3
    Does anyone know of bots/services that can scrape full articles from news websites such as CNN, Huffington Post, etc.?

    Not snippets that redirect to the original article (webhose.io), but something where you can input a website and the topic you'd like and it scrapes the articles in full.
     
  2. andy1

    andy1 Junior Member

    Joined:
    Jan 15, 2009
    Messages:
    101
    Likes Received:
    16
    Try this one: HTTrack Website Copier. Hopefully it works for your needs.
     
  3. redarrow

    redarrow Elite Member

    Joined:
    Apr 1, 2013
    Messages:
    5,692
    Likes Received:
    1,305
    Won't work. It's classed as duplicate content and the website will get banned.

    RSS is the only way.

    Unless you really want it, but the website will get banned.

    Copy and paste is quicker.
     
  4. patriotnews

    patriotnews Senior Member

    Joined:
    Oct 25, 2015
    Messages:
    1,016
    Likes Received:
    592
    Location:
    under there
    Who will "ban" the website?
    And as to it being classified as duplicate content, so?

    OP: what CMS are you using? There are lots of good RSS feed autopost plugins for WordPress.
     
  5. Montella

    Montella Newbie

    Joined:
    Jun 26, 2012
    Messages:
    21
    Likes Received:
    3
    I'm on WordPress. Any particular plugins you would recommend?
     
  6. greatops

    greatops Jr. VIP

    Joined:
    Nov 8, 2014
    Messages:
    465
    Likes Received:
    74
    Location:
    Freedom

    Can code this for you if you want.


    Cheers
     
  7. patriotnews

    patriotnews Senior Member

    Joined:
    Oct 25, 2015
    Messages:
    1,016
    Likes Received:
    592
    Location:
    under there
    The free version of WPeMatico works well enough to get you started. You can set up many channels on many schedules.
    With just the free version, I was running 47 different RSS news feeds on over 102 different posting schedules.
     
  8. bobojonathan

    bobojonathan Regular Member

    Joined:
    Sep 12, 2014
    Messages:
    290
    Likes Received:
    22
    Do you pull full feeds from the news sites? And how do you get rid of duplicate posts?
     
  9. patriotnews

    patriotnews Senior Member

    Joined:
    Oct 25, 2015
    Messages:
    1,016
    Likes Received:
    592
    Location:
    under there
    I pull full feeds from news sites and cut out the ones that block full posts.
    I set it up so it only pulls the 1-2 most recent posts, and there is a checkbox in WPeMatico that stops it from pulling duplicate content from the sites, so I avoid pulling the same articles twice.
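    A rough sketch of the setup described above (look only at the 1-2 newest items in a feed and skip anything already pulled), assuming a standard RSS 2.0 feed where each item has a <guid> or <link> and a <pubDate>. This is not WPeMatico's code; `latest_new_items` and the in-memory `seen` set are hypothetical stand-ins for the plugin's item limit and its duplicate checkbox. A real setup would persist `seen` between runs.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def latest_new_items(rss_xml, seen, limit=2):
    """Return the newest `limit` <item>s from an RSS 2.0 string, minus any already in `seen`.

    Hypothetical helper: `seen` stands in for a "skip duplicates" option
    and is updated in place with the keys of the items returned.
    """
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        # Prefer <guid> as the post's identity; fall back to <link>.
        key = (item.findtext("guid") or item.findtext("link") or "").strip()
        pub = item.findtext("pubDate")
        # RFC 822 dates as used by RSS; undated items get -inf so they sort last.
        stamp = parsedate_to_datetime(pub).timestamp() if pub else float("-inf")
        title = (item.findtext("title") or "").strip()
        items.append({"key": key, "title": title, "stamp": stamp})

    # Newest first.
    items.sort(key=lambda i: i["stamp"], reverse=True)

    # Only consider the `limit` most recent posts, then drop ones already pulled.
    fresh = [i for i in items[:limit] if i["key"] and i["key"] not in seen]
    seen.update(i["key"] for i in fresh)
    return fresh


# Hypothetical sample feed for illustration.
SAMPLE = """<rss version="2.0"><channel>
<item><title>A</title><guid>id-a</guid><pubDate>Mon, 07 Aug 2017 08:00:00 +0000</pubDate></item>
<item><title>B</title><guid>id-b</guid><pubDate>Mon, 07 Aug 2017 10:00:00 +0000</pubDate></item>
<item><title>C</title><guid>id-c</guid><pubDate>Mon, 07 Aug 2017 09:00:00 +0000</pubDate></item>
</channel></rss>"""

seen = set()
print([i["title"] for i in latest_new_items(SAMPLE, seen)])  # ['B', 'C']
print([i["title"] for i in latest_new_items(SAMPLE, seen)])  # [] (already pulled)
```

    Keying on <guid> rather than the title means a headline that gets edited later is still treated as the same post.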
     
  10. itz_styx

    itz_styx Jr. VIP Jr. VIP

    Joined:
    May 8, 2012
    Messages:
    636
    Likes Received:
    292
    Occupation:
    CEO / Admin / Developer
    Location:
    /dev/mem
    You can do that with argo content's article scraper; you can also filter out URLs and unwanted garbage with it.
     
  11. wangbu

    wangbu Registered Member

    Joined:
    Apr 6, 2008
    Messages:
    98
    Likes Received:
    33
    Thank you, I'll check it out as well.
     
  12. yellowcat

    yellowcat Regular Member

    Joined:
    Aug 27, 2015
    Messages:
    304
    Likes Received:
    175
    Location:
    internet 24/7
    I can code this if you'd like.
    PM me here or on Skype @ yellowcat1771