
Content / Site Scraping

Discussion in 'Black Hat SEO' started by mystic, Jan 2, 2009.

  1. mystic

    mystic Junior Member

    I need to scrape a bunch of articles off a website. What's the best way to scrape these 1500 pages?

    Sorry if it's been posted before, but I searched for 20 minutes and haven't found anything.
     
  2. eranium

    eranium Newbie

    Use HTTrack; you can scrape the whole site with it.
     
  3. mystic

    mystic Junior Member

    sure does :)

    Thanks a lot. I wish it had an option to scrape just the text off the page rather than the whole .html page, but this will work.

    Thanks
     
  4. fatboy

    fatboy Elite Member

    You could probably write a script to rip what you're after. Are there tags around the article that are easy to pick out at all?

    I have been scraping content from pages using a couple of PHP classes and a bit of imaginative scripting!
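The tag-based approach described above (find the element that wraps the article, then keep only the text inside it) can be sketched with Python's standard-library html.parser. This is a minimal sketch, not the PHP classes mentioned in the post: the default wrapper `<div class="article">` is a hypothetical placeholder you would replace after reading the target page's source, and downloading the pages is left out.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # their contents are code, not article text


class ArticleTextExtractor(HTMLParser):
    """Collect text inside the first element matching tag + CSS class.

    The default tag/class ("div", "article") is a hypothetical example;
    inspect the real page to find the wrapper around the article body.
    """

    def __init__(self, tag="div", css_class="article"):
        super().__init__()  # convert_charrefs=True, so &amp; etc. are decoded
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # nesting depth of self.tag inside the match
        self.skip = 0       # >0 while inside <script>/<style>
        self.done = False   # stop once the matched element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == self.tag and self.css_class in classes:
                self.depth = 1
            return
        if tag == self.tag:
            self.depth += 1  # nested element with the same tag name
        if tag in SKIP_TAGS:
            self.skip += 1
        if tag in BLOCK_TAGS:
            self.chunks.append("\n")  # block boundaries become line breaks

    def handle_endtag(self, tag):
        if self.done or self.depth == 0:
            return
        if tag in SKIP_TAGS and self.skip > 0:
            self.skip -= 1
        if tag in BLOCK_TAGS:
            self.chunks.append("\n")
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0 and self.skip == 0:
            self.chunks.append(data)

    def text(self):
        # Collapse whitespace within each line and drop empty lines.
        lines = "".join(self.chunks).split("\n")
        return "\n".join(" ".join(line.split()) for line in lines if line.strip())


def extract_article(html, tag="div", css_class="article"):
    parser = ArticleTextExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.text()
```

Run against one saved page, `extract_article(html)` returns plain text with one paragraph or heading per line, and navigation, footers, and inline scripts outside (or inside) the wrapper are dropped. Only the first matching element is used, which suits pages with a single article body.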
     
  5. ubermann

    ubermann Registered Member

    This can just scrape the article content off the website. There is a free trial. And I think on this forum there is a trial reset if you do a search for it.
     
  6. mystic

    mystic Junior Member

    I tried that link, ubermann, and it didn't quite work the way I needed. Thanks though.
     
  7. mystic

    mystic Junior Member

    Anyone have any other recommendations? I would like to find something that rips the text (the article) and not the whole HTML file.
     
  8. miker99

    miker99 Registered Member

  9. mystic

    mystic Junior Member

    Joined:
    Jun 9, 2008
    Messages:
    167
    Likes Received:
    37
    Any more suggestions before I just pay someone to rip the site for me? Every scraper ends up ripping the whole site, which is over 10 GB with a shitload of useless files. I just need the 800 or so articles ;D
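One way to avoid pulling down the whole 10 GB is to whitelist article URLs before anything is downloaded, instead of mirroring every link. The Python sketch below assumes a hypothetical layout where each article lives at `/articles/<slug>.html`; the pattern, the helper name `select_article_urls`, and the example.com links are illustrations only, and collecting the candidate links in the first place is left out.

```python
import re
from urllib.parse import urlsplit

# Hypothetical URL layout: assumes each article lives at /articles/<slug>.html.
# Check the real site's links and adjust this pattern before relying on it.
ARTICLE_PATH = re.compile(r"^/articles/[\w-]+\.html?$")


def select_article_urls(urls, host):
    """Return unique article URLs on `host`, in first-seen order.

    Images, archives, category listings and links to other hosts are
    dropped before they are ever fetched.
    """
    seen = set()
    selected = []
    for url in urls:
        parts = urlsplit(url)
        if parts.netloc != host or not ARTICLE_PATH.match(parts.path):
            continue
        # Drop query strings and #fragments so one article isn't queued twice.
        clean = f"{parts.scheme}://{parts.netloc}{parts.path}"
        if clean not in seen:
            seen.add(clean)
            selected.append(clean)
    return selected
```

Filtering on the URL path is cheap and needs no downloads, but it only works when the site gives articles a recognizable URL pattern; otherwise you are back to inspecting each page's markup.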
     
  10. bhw123

    bhw123 Registered Member

    Scraped the content for mystic, 3k+ pages.
    Time-consuming work :(
     
  11. virus_1720

    virus_1720 Jr. VIP Premium Member

    Scraping can be a tedious task, but make sure you don't give up.