archive.org scraper needed - been searching everywhere for days

mg3hockey · May 13, 2013

Hello all,

I recently picked up about 15 domains with legit PR3-PR7 to link to my money sites, but putting the sites back together with the original content from archive.org is a complete bitch.

I have been searching for days for a script that would do this for me but have found nothing. I have tried 4-5 different tools I found on Google, from httrack.com to webscraping.com to Warrick, and nothing seems to work.

All I need is to download every page the Wayback Machine has archived in its most recent capture of the site, so I can upload it to a server.
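One possible approach, which is an assumption and not something suggested in the thread, is the Wayback Machine's public CDX API, which returns one row per capture. Given those rows parsed from JSON (the first row holds the field names), the sketch below keeps the newest HTTP 200 capture of each URL and builds a snapshot URL with the `id_` flag, which, as far as I know, asks the Wayback Machine for the page as originally archived instead of the version with its toolbar injected and links rewritten. `latest_captures` and `raw_snapshot_url` are hypothetical helper names.

```python
from typing import Dict, List


def latest_captures(rows: List[List[str]]) -> Dict[str, str]:
    """Map each original URL to the timestamp of its newest HTTP 200 capture.

    Assumes `rows` is CDX output parsed from JSON: the first row is the header
    (e.g. ["timestamp", "original", "statuscode"]), the rest are captures.
    """
    if not rows:
        return {}
    header, captures = rows[0], rows[1:]
    ts_i = header.index("timestamp")
    url_i = header.index("original")
    status_i = header.index("statuscode")
    latest: Dict[str, str] = {}
    for row in captures:
        if row[status_i] != "200":
            continue  # skip redirects, errors and other non-200 captures
        url, ts = row[url_i], row[ts_i]
        # Timestamps are fixed-width YYYYMMDDhhmmss strings, so string order is time order
        if url not in latest or ts > latest[url]:
            latest[url] = ts
    return latest


def raw_snapshot_url(timestamp: str, original: str) -> str:
    """Build a Wayback URL whose `id_` flag requests the unmodified archived page."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"
```

Picking the newest capture per URL only approximates "the most recent cache of the site", since different pages may have been crawled in different years.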

Obviously I also need the header and footer links and any mention of archive.org removed. If anyone can point me in the right direction, that would be awesome.

Michael

mikeydell · May 13, 2013

I'm not sure how advanced you are, but you can (or at least used to be able to) download the source code archive.org uses for its engine, and at one time it had an API to pull data straight from archive.org. That was a few years back, so I'm not sure about today, but I had a copy set up on a server and it was pretty straightforward and a great way to use their data.
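The API mentioned here may not be the same thing, but the Wayback Machine does expose a public CDX API that lists captures for a URL or URL prefix. The sketch below assumes the endpoint `https://web.archive.org/cdx/search/cdx` and its `url`, `output` and `fl` parameters; `cdx_query_url` and `fetch_cdx_rows` are hypothetical helper names, and the response format should be checked against the current API documentation.

```python
import json
import urllib.parse
import urllib.request
from typing import List

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str) -> str:
    """Build a CDX API query listing every capture under `domain`."""
    params = {
        "url": f"{domain}/*",  # trailing * requests a prefix match on every path
        "output": "json",  # first row of the JSON response holds the field names
        "fl": "timestamp,original,statuscode",  # restrict the returned fields
    }
    return f"{CDX_ENDPOINT}?{urllib.parse.urlencode(params)}"


def fetch_cdx_rows(domain: str, timeout: float = 60.0) -> List[List[str]]:
    """Fetch and parse the capture list (performs a network request)."""
    with urllib.request.urlopen(cdx_query_url(domain), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    # Guard against an empty response body before parsing
    return json.loads(body) if body.strip() else []
```

Large sites can return a lot of rows, so the API's own paging or limit options are worth looking at before running this against a whole domain.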

mak3r · May 13, 2013

Do you speak Russian? I came across a Russian tool the other day that I think can do that.

mg3hockey · May 13, 2013

Hmm, I didn't think about the API; I'll have to look into this. Any other replies welcome!!

Montgomery76 · May 13, 2013

Use Web Archive Downloader (Google it and download it from CNET). It is far from perfect, but it browses the web archive and downloads all pages for the years you select.

mg3hockey · May 13, 2013

^ That is a great tool for downloading sites that are currently live, but I need a tool to download sites that are down/expired, as I am purchasing expired domains.

Montgomery76 · May 13, 2013

mg3hockey said:
^ That is a great tool for downloading sites that are currently live, but I need a tool to download sites that are down/expired, as I am purchasing expired domains.

Hm, no, it crawls the Wayback Machine at the Internet Archive and downloads past content of the site, not live ones.

mg3hockey · May 13, 2013

Hmm, that's strange, because I downloaded it, selected the years 2004-2010, tried to download, and got "could not connect to remote server" errors.

So either archive.org is not accepting their API requests or something in between is not working.

Montgomery76 · May 13, 2013

Actually, it works fine on one of my computers but not so well (same problem as yours) on this one, so I would give it another try. Sorry, but that's the only software I have found, and I have used it successfully for 10+ domains.
