
Newb ?: Tool That Will Extract All Pages/Posts Text From Blog/Website

Discussion in 'Black Hat SEO Tools' started by madman, May 25, 2009.

  1. madman

    madman Newbie

    Joined:
    Apr 11, 2009
    Messages:
    17
    Likes Received:
    2
    Hi, I am looking for a tool that will crawl a website like B00k Maarkiing Deemmon, but will return the full text of all the pages/posts, not just a snippet. Sorry if this is a basic question.

    Thanks
     
  2. Rendias

    Rendias Registered Member

    Joined:
    May 14, 2009
    Messages:
    91
    Likes Received:
    34
    Do you want to scrape the whole website/blog, or just the latest blog posts?
     
  3. madman

    madman Newbie

    Joined:
    Apr 11, 2009
    Messages:
    17
    Likes Received:
    2

    Hi Rendias,

    I'm interested in extracting all the website/blog pages/posts.

    Thanks
     
  4. madman

    madman Newbie

    Joined:
    Apr 11, 2009
    Messages:
    17
    Likes Received:
    2
    Any chance that one experienced member could offer a quick answer to my question? I would truly appreciate any input.

    Thanks
     
  5. markm455

    markm455 Registered Member

    Joined:
    Apr 1, 2007
    Messages:
    92
    Likes Received:
    10
    Have you tried something like "offline explorer pro"? They have a 30-day free trial of the software if you want to give it a shot.
     
    • Thanks Thanks x 1
  6. madman

    madman Newbie

    Joined:
    Apr 11, 2009
    Messages:
    17
    Likes Received:
    2
    Thanks markm455. I did not know that there was commercial software that would do what I wanted.

    Thanks again
     
  7. nipester

    nipester Regular Member

    Joined:
    Feb 1, 2009
    Messages:
    256
    Likes Received:
    28
    Have you heard about wget? It's free and can run from VPSes.
     
    • Thanks Thanks x 1
  8. madman

    madman Newbie

    Joined:
    Apr 11, 2009
    Messages:
    17
    Likes Received:
    2
    nipester, I have not heard of wget. Is a VPS just a server like HostGator? I don't know much about running scripts if I have to set them up. If you could explain in a little more detail what this involves, I may understand better.

    Thanks
     
  9. zone69

    zone69 Junior Member

    Joined:
    Nov 24, 2008
    Messages:
    196
    Likes Received:
    1,290
    • Thanks Thanks x 1
  10. madman

    madman Newbie

    Joined:
    Apr 11, 2009
    Messages:
    17
    Likes Received:
    2
    zone69,

    Thanks for the info. Does wget have a GUI, or do you have to use command-line instructions to run the program?

    Thanks
     
  11. ru ru

    ru ru Junior Member

    Joined:
    Mar 23, 2009
    Messages:
    109
    Likes Received:
    30
    I'd rather use httrack.com than wget.
     
    • Thanks Thanks x 1
  12. niggles

    niggles Newbie

    Joined:
    May 23, 2009
    Messages:
    38
    Likes Received:
    14
    Occupation:
    Web developer
    Location:
    Melbourne Australia
    Home Page:
    Are you wanting to grab a whole page or just the articles?

    If it's just the articles, then even once you've grabbed the whole page you still need to figure out which tags the article sits between, so that you can strip out the actual article automatically, e.g. <div id="article">some text...</div>
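
    A minimal sketch of that idea in Python, using only the standard library. It assumes the article lives in an element with id="article" (taken from the example above; real sites will use other ids or classes) and that the markup is reasonably well formed, so treat it as a starting point rather than a finished tool. The names ArticleExtractor and extract_article are made up for illustration.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleExtractor(HTMLParser):
    """Collect the text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # 0 means we are outside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            # Already inside the article: track nested tags.
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            # Found the opening tag of the article container.
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_article(html, target_id="article"):
    """Return the article text with runs of whitespace collapsed."""
    parser = ArticleExtractor(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

    Calling extract_article(page_html) on a downloaded page returns only the text between the matching tags, with navigation, sidebars and footers left out. Sloppy markup (for example unclosed <p> tags) will throw off the depth counter, which is one reason a scraper usually has to be tuned per site.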

    Cheers,
    Niggles
     
    • Thanks Thanks x 1
  13. madman

    madman Newbie

    Joined:
    Apr 11, 2009
    Messages:
    17
    Likes Received:
    2
    Niggles,

    Just the articles. B00kMarkkingDeemon has a function where it crawls a website, but it only returns approx. 100-200 characters of the article in txt format, not the whole article. So I'm looking for something like BMD, except that it will return the whole article text.

    Thanks
     
  14. matapples01

    matapples01 Regular Member

    Joined:
    May 15, 2008
    Messages:
    358
    Likes Received:
    208
    There's a Jr VIP, Neta10, who has a paid scraper that may do what you're looking for. I haven't used it to pull content, but you enter the tags to search between and it pulls the content from between those tags. It's here:

    HTML:
    http://neta1o.com/index.php?/programs.html
     
    • Thanks Thanks x 1
  15. cemdev

    cemdev Newbie

    Joined:
    May 7, 2009
    Messages:
    22
    Likes Received:
    14
    x2 httrack - it'll copy an entire website.
     
    • Thanks Thanks x 1
  16. twobol2002

    twobol2002 Newbie

    Joined:
    Mar 12, 2009
    Messages:
    40
    Likes Received:
    13
    You can also use TelePort.
     
    • Thanks Thanks x 1
  17. madman

    madman Newbie

    Joined:
    Apr 11, 2009
    Messages:
    17
    Likes Received:
    2

    bobfrapples,

    You have finally described what I am looking for. Essentially, I want a content scraper that will pull content from niche blogs so that I can rewrite the content to be unique, just like rewriting an EEzineeAArticle. Are there other scrapers out there too?

    Thanks
     
  18. nirose

    nirose Senior Member

    Joined:
    Oct 24, 2008
    Messages:
    992
    Likes Received:
    439
    Location:
    somake.us
    I use httrack too. It's very useful, and you can limit the download to the file types that you want.
     
  19. madman

    madman Newbie

    Joined:
    Apr 11, 2009
    Messages:
    17
    Likes Received:
    2
    Hi nirose, I downloaded and installed httrack. During a test on one of my own blogs, I could not figure out how to download just the text of the blog posts.

    Could you explain how to configure the program to accomplish this task?

    Thanks
     
  20. cemdev

    cemdev Newbie

    Joined:
    May 7, 2009
    Messages:
    22
    Likes Received:
    14
    madman - httrack doesn't work that way. It'll make an offline copy of an entire website.

    If you want to extract just blog posts, you're going to have to write a scraper, and it's most likely going to have to be modified for the site you're scraping. You don't have to start from scratch - there are hundreds of free scraper scripts out there. Pick one that comes close to what you want and modify it for the site you want to scrape. It shouldn't take long or be very hard, but you'll need some basic programming knowledge (this is a good thing to acquire anyway).
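
    To give an idea of how small the crawling half of such a scraper can be, here is a rough Python sketch using only the standard library. It is not taken from any particular script; same_site_links, crawl and the fetch argument are hypothetical names chosen for illustration, and the per-site part (picking the post text out of each page) is deliberately left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Gather the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(html, page_url):
    """Return absolute http(s) links that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if (parts.scheme in ("http", "https") and parts.netloc == host
                and absolute not in links):
            links.append(absolute)
    return links


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first walk of one site; fetch(url) must return HTML text."""
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue
        pages[url] = fetch(url)
        queue.extend(link for link in same_site_links(pages[url], url)
                     if link not in pages)
    return pages
```

    For a real site, fetch could be something like lambda url: urllib.request.urlopen(url).read().decode("utf-8", "replace"). The returned dictionary maps each URL to its HTML, which you would then run through whatever tag-based extraction fits the site.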
     
    • Thanks Thanks x 1