
[ASK] Can some WISE Guru tell me how to do this??

Discussion in 'Black Hat SEO' started by MarketerX, Jun 25, 2011.

  1. MarketerX

    MarketerX Regular Member

    Joined:
    Mar 7, 2010
    Messages:
    398
    Likes Received:
    120
    Hello, I use HTTrack to copy webpages.

    I want to copy a site with tons of product reviews. However, I want ONLY the Articles, none of the excess HTML that HTTrack also gathers.

    So I am guessing there is a program, and I would use some sort of Regular Expression to output only the stuff in the article container div, or whatever?

    How do you suggest I go about extracting the raw articles from a ton of HTML files...

    PLEASE HELP GUYS!! This will be a good learning experience for me :p
     
  2. artur6000

    artur6000 BANNED BANNED

    Joined:
    Jan 2, 2010
    Messages:
    188
    Likes Received:
    171
    Make a bot that extracts every article to a separate txt file in some folder. Have it start scraping text after one specific HTML line and stop at another. This way you'll get only the articles, each in its own file...

    But don't ask me to do it.. I'm not a coder... I'm...... the IDEA MAN :p
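
The idea above (start after one HTML line, stop at another, one txt file per article) could be sketched roughly as follows. This is only a minimal sketch: `START_MARKER`, `END_MARKER`, and the function names are hypothetical placeholders, not taken from any real site or tool, and it assumes the pages were saved locally as `.html` files.

```python
from pathlib import Path
from typing import Optional

# Hypothetical markers: replace them with the HTML lines that actually
# surround the article in your saved pages.
START_MARKER = '<div class="article">'
END_MARKER = '<!-- end article -->'


def extract_between(html: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` and the next `end`, or None."""
    begin = html.find(start)
    if begin == -1:
        return None
    begin += len(start)
    finish = html.find(end, begin)
    if finish == -1:
        return None
    return html[begin:finish].strip()


def extract_folder(src_dir: str, out_dir: str) -> int:
    """Write one .txt file per HTML page that contains both markers.

    Returns the number of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = 0
    for page in sorted(Path(src_dir).rglob("*.html")):
        html = page.read_text(encoding="utf-8", errors="replace")
        article = extract_between(html, START_MARKER, END_MARKER)
        if article is None:
            continue  # page has no article block, so skip it
        # The counter prefix keeps same-named pages from overwriting each other.
        (out / f"{written:04d}_{page.stem}.txt").write_text(article, encoding="utf-8")
        written += 1
    return written
```

Calling `extract_folder("mirror", "articles")` would walk a hypothetical `mirror` folder and fill `articles` with the extracted pieces. Note that the output still contains any tags that sit between the two markers; plain string slicing does not strip them.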
     
  3. talruum

    talruum Junior Member

    Joined:
    Dec 21, 2010
    Messages:
    161
    Likes Received:
    30
    If you're a coder, write your own script to do it. Perl or Python can handle that easily.

    If you're not a coder and run windows, try "web content extractor". Google for it.

    []s
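
For the "write your own" route, a parser is usually more forgiving than a regular expression on real-world HTML. Below is a rough sketch using only Python's standard `html.parser` module. It assumes the article sits in a `<div>` with a known class; the class name `article` and the names `ContainerText` and `article_text` are made up for illustration.

```python
from html.parser import HTMLParser


class ContainerText(HTMLParser):
    """Collect the text inside the first <div> whose class list has `css_class`."""

    def __init__(self, css_class):
        super().__init__(convert_charrefs=True)
        self.css_class = css_class
        self.depth = 0      # div nesting depth inside the container; 0 = outside
        self.done = False   # set once the container has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            self.depth += 1  # nested div inside the container
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1  # entered the container

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # left the container; ignore the rest

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def article_text(html, css_class="article"):
    """Return the container's text with tags dropped and whitespace collapsed."""
    parser = ContainerText(css_class)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

Two caveats with this sketch: text chunks are joined with spaces, so a word split by an inline tag comes out with a gap in it, and any `<script>` or `<style>` content inside the container is kept as text. A dedicated HTML library would handle both more gracefully.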
     
  4. MarketerX

    MarketerX Regular Member

    Thanks man, I will look into this.

    I am working on my coding skills, but I'm learning PHP/JavaScript... :p I figured someone had already made something that will get the job done...

    If anyone else has any more suggestions besides Web Content Extractor, PM me or post here
     
  5. KraftyKyle

    KraftyKyle Jr. Executive VIP Jr. VIP Premium Member

    Joined:
    Aug 13, 2008
    Messages:
    1,942
    Likes Received:
    4,610
    Gender:
    Male
    Location:
    Unknown
    There's software around the internet to rip blogs... some sort of autoblogging software seems to be what you're looking for.
     
  6. m0nster

    m0nster Senior Member

    Joined:
    Oct 20, 2010
    Messages:
    1,044
    Likes Received:
    1,003
    Occupation:
    Offline Marketing
    Location:
    USA
  7. MarketerX

    MarketerX Regular Member

    Gonna check this out, but I had problems installing .NET 4 the other day... :confused:
     
  8. MarketerX

    MarketerX Regular Member

    It is $149...is there a crack available? Or did you actually purchase it :p