
News website scraper

Discussion in 'Black Hat SEO' started by Montella, Aug 5, 2017.

  1. Montella

    Montella Newbie

    Joined:
    Jun 26, 2012
    Messages:
    17
    Likes Received:
    3
    Does anyone know of bots or services that can scrape full articles from news websites such as CNN, Huffington Post, etc.?

    Not snippets that redirect to the original article (webhose.io), but something where you can input a website and a topic you'd like and it scrapes the articles in full.
     
  2. andy1

    andy1 Junior Member

    Joined:
    Jan 15, 2009
    Messages:
    104
    Likes Received:
    18
    Try this one: HTTrack Website Copier. Hopefully it works for your needs.
     
  3. redarrow

    redarrow Elite Member

    Joined:
    Apr 1, 2013
    Messages:
    4,250
    Likes Received:
    969
    It won't work. It's classed as duplicate content, and the website will get banned.

    RSS is the only way.

    Unless you really want it, but the website will get banned.

    Copy and paste is quicker.
     
  4. patriotnews

    patriotnews Senior Member

    Joined:
    Oct 25, 2015
    Messages:
    840
    Likes Received:
    454
    Location:
    under there
    Who will "ban" the website?
    And as to it being classified as duplicate content, so?

    OP: What CMS are you using? There are lots of good RSS feed autopost plugins for WordPress.
     
    • Thanks x 1
  5. Montella

    Montella Newbie

    Joined:
    Jun 26, 2012
    Messages:
    17
    Likes Received:
    3
    I'm on WordPress. Are there any particular plugins you would recommend?
     
  6. greatops

    greatops Jr. VIP

    Joined:
    Nov 8, 2014
    Messages:
    410
    Likes Received:
    72
    Location:
    Freedom

    Can code this for you if you want.


    Cheers
     
  7. patriotnews

    patriotnews Senior Member

    Joined:
    Oct 25, 2015
    Messages:
    840
    Likes Received:
    454
    Location:
    under there
    The free version of WPeMatico works well enough to get you started. You can set up many channels and many schedules.
    With just the free version, I was running 47 different RSS news feeds on over 102 different posting schedules.
     
    • Thanks x 1
  8. bobojonathan

    bobojonathan Regular Member

    Joined:
    Sep 12, 2014
    Messages:
    237
    Likes Received:
    18
    Do you pull full feeds from the news sites, and how do you get rid of duplicate posts?
     
  9. patriotnews

    patriotnews Senior Member

    Joined:
    Oct 25, 2015
    Messages:
    840
    Likes Received:
    454
    Location:
    under there
    I pull full feeds from news sites and cut out the ones that block full posts.
    I set it up so it only pulls the 1-2 most recent posts, and there is a checkbox in WPeMatico that lets you skip duplicate content from the sites, so I avoid pulling the same articles.
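    For anyone curious what "most recent 1-2 posts, skip duplicates" looks like outside a plugin, here is a rough sketch. It is not how WPeMatico does it internally; the function name latest_new_items, the sample feed, and the example.com URLs are all made up for illustration. It assumes a plain RSS 2.0 feed that lists its newest items first, and it treats the item link as the duplicate key.

```python
import xml.etree.ElementTree as ET


def latest_new_items(rss_xml, seen_links, limit=2):
    """Return (title, link) pairs for the newest `limit` items not seen before.

    Hypothetical helper: assumes RSS 2.0 layout (rss/channel/item) with the
    newest items listed first, and uses the item link as the duplicate key.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    # Only look at the first `limit` items, i.e. the most recent ones.
    for item in root.findall("./channel/item")[:limit]:
        link = (item.findtext("link") or "").strip()
        title = (item.findtext("title") or "").strip()
        if not link or link in seen_links:
            continue  # skip items without a link and ones already pulled
        seen_links.add(link)  # remember it so the next run skips it
        fresh.append((title, link))
    return fresh


# Made-up feed for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example News</title>
<item><title>Story C</title><link>https://example.com/c</link></item>
<item><title>Story B</title><link>https://example.com/b</link></item>
<item><title>Story A</title><link>https://example.com/a</link></item>
</channel></rss>"""

seen = {"https://example.com/b"}  # pretend Story B was pulled on an earlier run
print(latest_new_items(SAMPLE_FEED, seen))
```

    With the sample feed, only Story C comes back: Story B is already in the seen set, and Story A falls outside the two most recent items. In a real setup the seen set would have to be stored somewhere between runs, for example in a file or database.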
     
    • Thanks x 1
  10. itz_styx

    itz_styx Jr. VIP

    Joined:
    May 8, 2012
    Messages:
    356
    Likes Received:
    122
    Occupation:
    CEO / Admin / Developer
    Location:
    /dev/mem
    You can do that with argo content's article scraper. You can also filter out URLs and unwanted garbage with it.
     
  11. wangbu

    wangbu Registered Member

    Joined:
    Apr 6, 2008
    Messages:
    98
    Likes Received:
    32
    Thank you, I'll check it out as well.
     
    • Thanks x 1
  12. yellowcat

    yellowcat Regular Member

    Joined:
    Aug 27, 2015
    Messages:
    295
    Likes Received:
    163
    Location:
    internet 24/7
    I can code this if you'd like.
    PM me here or on Skype @ yellowcat1771