
Server-Based Website-Crawler

Discussion in 'Black Hat SEO Tools' started by dddd, May 5, 2009.

  1. dddd

    dddd Jr. VIP Premium Member

    Joined:
    Jun 4, 2007
    Messages:
    164
    Likes Received:
    158
    Hey guys, does somebody of you know of a good server-based website-crawler?

    What I mean by that is:

    (a) input URL of website

    (b) crawler starts to crawl every single page on the website and saves the page as file

    result: a directory on the server with every single page the website has for offline "processing / viewing" ;)

    Getting single URLs is pretty easy, but crawling an indefinite number of levels deep is a different thing... :-( So, anybody have any suggestions please?
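
    The steps above amount to a small breadth-first crawler: fetch a page, save it to disk, extract same-site links, repeat. A minimal stdlib-only Python sketch of that idea (function and parameter names are my own, not from any tool mentioned in this thread):

    ```python
    # Minimal sketch of a recursive same-site crawler using only the
    # Python 3 standard library. crawl() fetches start_url, saves each
    # page as a file under out_dir, extracts <a href> links, and keeps
    # going until the queue is empty or max_pages is reached.
    import os
    import urllib.parse
    import urllib.request
    from html.parser import HTMLParser

    class LinkParser(HTMLParser):
        """Collects the href of every <a> tag on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(start_url, out_dir, max_pages=100):
        host = urllib.parse.urlparse(start_url).netloc
        queue, seen = [start_url], set()
        os.makedirs(out_dir, exist_ok=True)
        while queue and len(seen) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
            except OSError:
                continue  # skip pages that fail to fetch
            # Save the page under a filename derived from its URL.
            name = urllib.parse.quote(url, safe="") + ".html"
            with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
                f.write(html)
            # Extract links and queue the ones on the same host.
            parser = LinkParser()
            parser.feed(html)
            for link in parser.links:
                absolute = urllib.parse.urljoin(url, link)
                if urllib.parse.urlparse(absolute).netloc == host:
                    queue.append(absolute.split("#")[0])
        return seen
    ```

    A real tool would also need robots.txt handling, request delays, and de-duplication of query-string variants, but this is the core loop.
    
    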
     
  2. JinxY

    JinxY Regular Member

    Joined:
    May 14, 2007
    Messages:
    376
    Likes Received:
    66
    try with wget or curl
     
  3. bipingaur

    bipingaur Newbie

    Joined:
    Apr 22, 2009
    Messages:
    11
    Likes Received:
    0
    Occupation:
    Last job I held was of a Delivery Manager, I am co
    Location:
    New delhi India
    Home Page:
    Try HTTrack, it is free and fast
     
  4. iintense

    iintense BANNED

    Joined:
    May 1, 2009
    Messages:
    49
    Likes Received:
    56
    Yes, HTTrack is a great program and very fast. It also gives you options such as which images to pull and how deep you want to copy the site.
     
  5. zone69

    zone69 Junior Member

    Joined:
    Nov 24, 2008
    Messages:
    196
    Likes Received:
    1,290
    Create a directory to store the site on your server and run wget below, replacing http://somesite.com with the site you wish to download.
    Code:
     wget -r -k http://somesite.com
    
    This will recursively download the site locally.
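
    If you want a copy that actually views well offline, a few more wget flags help (a sketch; somesite.com and the output path are placeholders, and these are standard GNU Wget options, not something special to this setup):
    ```shell
    # -r    recurse into links          -l 0  no depth limit
    # -k    rewrite links for offline viewing
    # -p    also fetch images/CSS needed to render each page
    # -E    save pages with an .html extension
    # -np   never ascend above the starting directory
    # -w 1  wait a second between requests to go easy on the server
    # -P    directory to store the mirror in
    wget -r -l 0 -k -p -E -np -w 1 -P /path/to/output http://somesite.com
    ```
    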


    Or you can download wget for windows and do the same on your windows machine.
    Code:
    http://gnuwin32.sourceforge.net/packages/wget.htm
    
    Hope that helps.
     
    • Thanks Thanks x 3
  6. the_demon

    the_demon Jr. Executive VIP

    Joined:
    Nov 23, 2008
    Messages:
    3,177
    Likes Received:
    1,563
    Occupation:
    Search Engine Marketing
    Location:
    The Internet
    Why server side???

    Just Google search backstreet browser. It's a great website mirroring program.
     
  7. dddd

    dddd Jr. VIP Premium Member

    Joined:
    Jun 4, 2007
    Messages:
    164
    Likes Received:
    158
    (a) server-side because it's faster than my DSL connection
    (b) thanks zone69 for the wget idea, I didn't know it had crawling options, will try this, thanks again!