Help with archiving a web page

Discussion in 'Web Design' started by uregister, Feb 7, 2014.

  1. uregister

    uregister Newbie

    Joined:
    Jul 14, 2013
    Messages:
    5
    Likes Received:
    0
    I'm looking at a web page that links to 575 PDFs. The problem is that the page gives each PDF a nice, descriptive title, but the file names are just dates plus one or two extra digits. So the ideal solution would be to archive the web page itself and use it as an interactive index whenever I want to explore these PDFs. Obviously, I can't just mass-download the PDFs, save the web page, and expect the two to be linked. What I'm wondering is: can I do exactly that and then point the archived web page at the folder I downloaded the PDFs into, so that I end up with a fully functional offline copy of the page?

    Sorry, I don't know much about web design.

    By the way, I already tried HTTrack, and despite experimenting with different mirroring depths, I wasn't able to get what I want. I think it may be because the page is in a members-only section of the website.
     
  2. cyrix

    cyrix Junior Member

    Joined:
    Sep 19, 2008
    Messages:
    179
    Likes Received:
    61
    Occupation:
    Full Time Internet Marketer\Developer
    Location:
    United States
    I could write a tool to go in and download those PDFs and build an HTML index for you. Let me know if you're interested.
     
  3. Nattsurfaren

    Nattsurfaren Regular Member

    Joined:
    Apr 12, 2010
    Messages:
    409
    Likes Received:
    49
  4. uregister

    uregister Newbie

    Joined:
    Jul 14, 2013
    Messages:
    5
    Likes Received:
    0
    Someone on IRC told me that I just have to find/replace the website part of each PDF link with a forward slash, which should make the web page pull the PDFs from the same folder the web page is in. He did this for me, and I placed a PDF in the same folder as the archived web page. What happens is that the forward slash gets turned into a request for the C drive. Placing the PDF in the root of the C drive does make the index functional, but I know from other archived websites I have that a simple forward slash should not redirect to the C drive. Why is it defaulting to the C drive? I'm assuming some type of header is also coming into play.
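
For what it's worth, this behaviour does not need a header to explain it: a link that starts with a forward slash is root-relative, and when the page is opened from disk (a file:// address) the "root" is the root of the drive, which is C:\ on a typical Windows machine. A link with no leading slash (or with "./" in front) is resolved against the folder that contains the page instead. Below is a minimal sketch of the find/replace step done that way, assuming the PDF links are ordinary href attributes ending in .pdf. The function name localize_pdf_links, the pdf_dir parameter, and the example.com address in the usage note are all made up for illustration and do not come from the thread.

```python
import posixpath
import re
from urllib.parse import quote, unquote, urlsplit

# Matches href="...pdf" or href='...pdf', optionally followed by ?query or #fragment.
HREF_RE = re.compile(
    r'''href\s*=\s*(["'])([^"']+?\.pdf(?:[?#][^"']*)?)\1''',
    re.IGNORECASE,
)


def localize_pdf_links(html: str, pdf_dir: str = ".") -> str:
    """Rewrite every PDF href so it points at <pdf_dir>/<file name>.

    pdf_dir is a hypothetical parameter: the folder holding the downloaded
    PDFs, given relative to the saved web page ("." means the same folder).
    """
    def swap(match: re.Match) -> str:
        q, url = match.group(1), match.group(2)
        # Keep only the file name; drop the host, directories, query and fragment.
        name = posixpath.basename(unquote(urlsplit(url).path))
        return f"href={q}{pdf_dir}/{quote(name)}{q}"

    return HREF_RE.sub(swap, html)
```

Run over the saved page's HTML, this would turn a link such as href="http://example.com/members/20140207-1.pdf" (or the root-relative href="/20140207-1.pdf") into href="./20140207-1.pdf", and it leaves links that are not PDFs alone. It is a regex-based shortcut rather than a real HTML parser, so unusual markup (unquoted attributes, links built by scripts) would need a different approach.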

    Nattsurfaren, I tried that extension, and for some reason it didn't recognize that there were any PDF files on the web page.

    cyrix, thanks, but I think I'll just try to work this out.