
Scraping a website from web.archive.org - is it possible?

Discussion in 'BlackHat Lounge' started by davids355, Dec 8, 2014.

  1. davids355

    davids355 Jr. VIP Jr. VIP Premium Member

    Joined:
    Apr 25, 2011
    Messages:
    8,787
    Likes Received:
    6,326
    I normally use WinHTTrack for scraping websites.
    However, I want to get a website from web.archive.org.
    The site is there, but WinHTTrack doesn't seem capable of scraping from the archive site.
    Is there any other way to do it?
     
  2. mktanny

    mktanny Regular Member

    Joined:
    Oct 22, 2009
    Messages:
    225
    Likes Received:
    62
    Occupation:
    Blog editor and IM
    Try Offline Explorer Pro. I have been using it successfully with some custom coding.

    It is not perfect without some custom work, such as removing the archive.org comments and replacing all the external links.
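
    For anyone wondering what that kind of cleanup might look like, here is a rough Python sketch. It is not the code used with Offline Explorer Pro, just an illustration under a few assumptions: that the saved pages contain the toolbar block between the "BEGIN WAYBACK TOOLBAR INSERT" and "END WAYBACK TOOLBAR INSERT" comments, and that links were rewritten to the /web/<timestamp>/<original URL> form. The function name clean_snapshot is made up for this example.

```python
import re

# Assumed marker comments around the toolbar that the archive injects into pages.
TOOLBAR = re.compile(
    r"<!--\s*BEGIN WAYBACK TOOLBAR INSERT\s*-->.*?<!--\s*END WAYBACK TOOLBAR INSERT\s*-->",
    re.DOTALL,
)

# Assumed rewritten-link prefix, e.g.
#   https://web.archive.org/web/20140101000000/http://example.com/
#   /web/20140101000000im_/http://example.com/logo.png
# The optional two-letter suffix (im_, js_, cs_ ...) marks the resource type.
WAYBACK_PREFIX = re.compile(
    r"(?:https?:)?(?://web\.archive\.org)?/web/\d{1,14}(?:[a-z]{2}_)?/(?=https?://)",
    re.IGNORECASE,
)


def clean_snapshot(html: str) -> str:
    """Strip the injected toolbar and restore the original link targets."""
    html = TOOLBAR.sub("", html)          # drop the toolbar block
    html = WAYBACK_PREFIX.sub("", html)   # keep only the original URL
    return html


if __name__ == "__main__":
    page = (
        "<!-- BEGIN WAYBACK TOOLBAR INSERT --><div>toolbar</div>"
        "<!-- END WAYBACK TOOLBAR INSERT -->"
        '<a href="https://web.archive.org/web/20140101000000/http://example.com/">Home</a>'
    )
    print(clean_snapshot(page))
```

    This only handles the two cases named above. Scripts, trailing archive comments, and links to pages that were never archived would still need separate handling.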
     
  3. accessted

    accessted Junior Member

    Joined:
    Aug 14, 2014
    Messages:
    179
    Likes Received:
    50
    Anything else available for scraping Archive.org sites? User friendly without custom coding?
     
  4. davids355

    davids355 Jr. VIP Jr. VIP Premium Member

    Joined:
    Apr 25, 2011
    Messages:
    8,787
    Likes Received:
    6,326
    I heard someone is building a tool to do just this. It would be pretty useful, but I haven't found anything else usable so far.
     
  5. laur.laurix

    laur.laurix Regular Member

    Joined:
    May 8, 2013
    Messages:
    408
    Likes Received:
    154
    Location:
    Mars
    I opened a thread about this topic not long ago. None of the services I tried convinced me. I'm still looking for a method or a piece of software that can do the scraping from the archive.