Cleaning Scraped Articles

Discussion in 'Black Hat SEO' started by thepowaz, Jul 15, 2013.

  1. thepowaz

    thepowaz Newbie

    Joined:
    Mar 31, 2013
    Messages:
    1
    Likes Received:
    0
    What's the best way to clean articles that have been scraped from other sites? By that I mean removing the HTML, stray tags, etc.

    I'm having issues doing this before sending the content to the WordAi Turing API.
     
  2. bartosimpsonio

    bartosimpsonio Jr. VIP Premium Member

    Joined:
    Mar 21, 2013
    Messages:
    8,953
    Likes Received:
    7,573
    Occupation:
    ZLinky2Buy SEO Services
    Location:
    ⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩
    Use an HTML parser and extract the text.
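
As a rough illustration of the parser approach, here is a minimal sketch using Python's standard-library `html.parser`. The names `TextExtractor` and `html_to_text` are made up for this example, and it assumes you only want the visible text with `<script>` and `<style>` contents dropped. Text chunks are joined with spaces, so a word split by an inline tag (e.g. `wor<b>ld</b>`) would come out as two words.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into "&"
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse runs of whitespace into single spaces
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the plain text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1><p>Hello &amp; <b>world</b></p></body></html>"
    print(html_to_text(sample))
```

The cleaned string is what you would then pass on to the spinner. A third-party parser would likely cope better with badly broken markup, but the standard library is enough to show the idea.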
     
  3. ShabbySquire

    ShabbySquire Power Member

    Joined:
    Nov 30, 2011
    Messages:
    574
    Likes Received:
    122
    Location:
    UK
    At a basic level, you can use Notepad++ and its search & replace feature.

    Example: search for the HTML tag <body> and replace it with nothing (leave the replace field blank). You can also use regex (regular expressions) to strip out any old crap.
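
The regex idea can be sketched in Python as follows. This is only an illustration: the patterns and the `strip_tags` name are assumptions made for this example, and regex stripping is crude, so it can misfire on malformed markup or on a `>` inside an attribute value. The same tag pattern, `<[^>]+>`, should also work in Notepad++ with the search mode set to "Regular expression" and the replace field left blank.

```python
import html
import re

# Matches any single tag such as <body>, </p> or <a href="...">
TAG_RE = re.compile(r"<[^>]+>")

# Matches whole <script>...</script> and <style>...</style> blocks
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_tags(raw):
    """Crude tag stripper (hypothetical helper, not a full HTML parser)."""
    no_blocks = SCRIPT_STYLE_RE.sub(" ", raw)   # drop script/style contents
    no_tags = TAG_RE.sub(" ", no_blocks)        # drop the remaining tags
    decoded = html.unescape(no_tags)            # &amp; -> &, &quot; -> " etc.
    return " ".join(decoded.split())            # collapse whitespace


if __name__ == "__main__":
    print(strip_tags("<body><p>Hello &amp; <b>world</b></p></body>"))
```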