How to scrape articles into text files?

Discussion in 'Black Hat SEO' started by devilived, Aug 19, 2009.

  1. devilived

    devilived Junior Member

    Joined:
    Jul 13, 2009
    Messages:
    124
    Likes Received:
    59
    I want to scrape articles from article directories into text files.

    Can anyone guide me on how to set this up?

    I want to set up a system that can scrape all articles from a particular category, for example all articles in the health category.

    Each article should be saved into its own .txt file.


    Thanks
     
  2. Spaceman

    Spaceman Regular Member

    Joined:
    Aug 8, 2009
    Messages:
    435
    Likes Received:
    53
    I would personally write a small VB6 desktop app.

    You need to learn a programming language (or at least the basics) so that you can do simple things such as:

    Open webbrowser objects
    Load webpages
    Scan the HTML
    Extract the text
    Save to a text file
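
    For illustration, the five steps above can be sketched in a few lines. This is a minimal sketch in Python rather than VB6 (a swap made purely for the example), using only the standard library and plain HTTP requests instead of a webbrowser object. The function names `load_page`, `extract_text` and `save_text` are made up for this sketch, and it grabs all visible text on the page rather than just the article.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text chunks that are not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def load_page(url):
    """Load a web page and return its HTML as a string."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text(html):
    """Scan the HTML and return the visible text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def save_text(text, path):
    """Save the extracted text to a text file."""
    Path(path).write_text(text, encoding="utf-8")
```

    A run would look something like `save_text(extract_text(load_page(url)), "article1.txt")`, where `url` is whatever article page you are after.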

    The program is only around 200 lines long at most, and that includes things like category filters and a DB of the HTML code that surrounds the articles, kept on a site-by-site basis (as the sites are all different).
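
    To give a rough idea of what that "DB of surrounding HTML" and a category filter might look like, here is a small sketch, again in Python rather than VB6. Everything in it is an assumption for illustration: the site names, the marker strings, and the helper names `cut_article` and `in_category` are invented, and a real site would need its own markers worked out by looking at its page source.

```python
# Hypothetical per-site table: the HTML that comes immediately before and
# after the article body on each site. Real sites need their own entries.
SITE_MARKERS = {
    "example-articles.test": ('<div class="article-body">', "</div>"),
    "another-directory.test": ("<!-- article start -->", "<!-- article end -->"),
}


def cut_article(html, site):
    """Return the HTML between the site's start and end markers, or None."""
    start_marker, end_marker = SITE_MARKERS[site]
    start = html.find(start_marker)
    if start == -1:
        return None  # start marker not on this page
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return None  # end marker missing after the start marker
    return html[start:end].strip()


def in_category(article_category, wanted):
    """Simple category filter: case-insensitive match on the category name."""
    return article_category.strip().lower() == wanted.strip().lower()
```

    Note that a plain end marker like `</div>` stops at the first closing tag it finds, so it only works when the article body has no nested divs. That is one reason the markers have to be chosen per site.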

    Learning VB6 would be an easy way to start. There are better languages for this app (VB.NET and C#, to name two), but VB6 is, as languages go, quite a good one to start out with.

    Spaceman
     
  3. devilived

    devilived Junior Member

    Joined:
    Jul 13, 2009
    Messages:
    124
    Likes Received:
    59
    Thanks, Spaceman, for your help.

    Do you know if it would be possible to set up something with WinAutomation or iMacros to accomplish this job?
     
  4. Spaceman

    Spaceman Regular Member

    Joined:
    Aug 8, 2009
    Messages:
    435
    Likes Received:
    53
    Devilived - I don't know those products - I only know programming languages.

    Honestly - with a good VB book and a copy of VB6, if you can get it, you would be writing all your own programs in a month.

    I personally hold the view that learning a language is (perhaps) the best investment anyone in BH IM could make. It takes some time and effort but is well worth it.

    I would think there are plenty of scrapers around already - have you googled "article scraper" for example? That may show up a load. I would think you need a desktop scraper - so maybe try searching for "desktop article scraper" or maybe "XP article scraper" or "Vista article scraper" - I am sure you get the picture there.

    If you want to learn a language I would go with any of these, listed in my order of preference (that's personal to each programmer, though). There will be others I don't know of as well, either less mainstream or a whole lot newer.

    VB6 / VB.net
    ASP.Net
    C#
    Java
    PHP

    Spaceman
     