
Site Extractor - Help

Discussion in 'Web Design' started by qwicksilver, Jul 19, 2012.

  1. qwicksilver

    qwicksilver Junior Member

    Joined:
    Aug 15, 2011
    Messages:
    137
    Likes Received:
    55
    There is a website I would like to get some practice tests from, but I do not want to do it manually. What I would like to do is download each page from an address like this:

    http://www.xxxx.com/test.php?do=taketest&resultsid=4501

    Where "resultsid=4501" I want to make it so that I can download not only 4501 but 0 - 40000. Something like, "resultsid=x", where the program where download all filed 0 - 40000. But I want it to only download files if they contain information (ie. no blank page or error). I wont really be downloading 40000, more like 1000 pages, as most of them are blank or contains an error.

    Summary: I am looking for a program/script that downloads all the pages by replacing "x" with each number, and only saves the ones that are not blank and do not show an error.
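
    Something like this is the rough idea (just a Python sketch to show the logic, not something I have working; the URL is the one above and the "is it blank" check is made up):

    Code:
    
    # loop over the ids, fetch each page, and only save the ones with real content
    import requests
    
    BASE = "http://www.xxxx.com/test.php?do=taketest&resultsid={}"
    
    for rid in range(0, 40001):
        try:
            resp = requests.get(BASE.format(rid), timeout=10)
        except requests.RequestException:
            continue                      # request failed, skip this id
        if resp.status_code != 200:
            continue                      # error page, skip
        if len(resp.text.strip()) < 500:  # crude guess at what counts as "blank"
            continue
        with open("test_{}.html".format(rid), "w", encoding="utf-8") as f:
            f.write(resp.text)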

    Can anyone help me out or at least point me in the right direction?

    Thanks,

    -- qwicksilver
     
  2. neomaeva

    neomaeva Newbie

    Joined:
    Sep 13, 2011
    Messages:
    25
    Likes Received:
    5
    hi qwicksilver

    I suggest using iMacros.
    I wrote a script for you:

    Code:
    
    ' keep going even if a page fails to load
    SET !ERRORIGNORE YES
    ' don't show the extract preview popup
    SET !EXTRACT_TEST_POPUP NO
    
    
    ' starting value for the loop counter
    SET !LOOP 4000
    ' open the page for the current id and save it as a .mht file named after the id
    URL GOTO=yoururl&resultsid={{!LOOP}}
    SAVEAS TYPE=MHT FOLDER=* FILE={{!LOOP}}
    
    
    Here the macro will save yoururl&resultsid=4000, then yoururl&resultsid=4001, and so on...
    so change the value in "SET !LOOP 4000" to start wherever you want.
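
    Small note: to make it actually loop, you run the macro with the "Play (Loop)" button and set the max number of rounds there; {{!LOOP}} goes up by 1 on each round (at least as far as I know).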

    But if you don't know what iMacros is, I guess it will be hard to follow ...

    Hope it helps
    good luck
     
    • Thanks x 1
  3. qwicksilver

    qwicksilver Junior Member

    Joined:
    Aug 15, 2011
    Messages:
    137
    Likes Received:
    55
    Thanks for the script, neo. But I also need a filter so that it doesn't download the pages that contain no information. They are all valid pages (they all have their CSS design), but they do not all have text on them. So I do not need to download all 40000 pages, only those that contain text. Any ideas?
     
  4. neomaeva

    neomaeva Newbie

    Joined:
    Sep 13, 2011
    Messages:
    25
    Likes Received:
    5
    oh ok ...
    however, with iMacros you can extract specific text from a page... but I don't think that is what you are looking for ...
    You can't use conditions with iMacros,
    so I don't have any (simple) idea :/ sorry
     
    Last edited: Jul 19, 2012
  5. qwicksilver

    qwicksilver Junior Member

    Joined:
    Aug 15, 2011
    Messages:
    137
    Likes Received:
    55
    Thanks for the help, neo.

    I cannot set a range with HTTrack or Website Ripper Copier. I might try Acunetix 8, but I am not sure if it will do what I want. I'd rather not install Acunetix (never too sure about cracked programs), as I do not want to install VB/VMware to test it out right now.

    Any other ideas as to how I can download the pages with a range and set conditions?

    Others are welcome to join this discussion =)
     
  6. qwicksilver

    qwicksilver Junior Member

    Joined:
    Aug 15, 2011
    Messages:
    137
    Likes Received:
    55
    Update: I went ahead and sandboxed Acunetix and got a script going that only pulls pages with a minimum word count. Thanks for the help =)
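
    For anyone who finds this later, the filter boils down to something like this (a rough Python sketch of the idea, not my exact setup; the threshold is just an example). A page only gets saved if the check returns True:

    Code:
    
    import re
    
    MIN_WORDS = 50  # example threshold, tune it to whatever counts as "has content"
    
    def has_enough_text(html):
        # crudely strip the HTML tags and count the words that are left
        text = re.sub(r"<[^>]+>", " ", html)
        return len(text.split()) >= MIN_WORDS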