Best language to use for this, how long to create, page monitoring/scraper?

Discussion in 'Black Hat SEO Tools' started by Viltedali, Jul 27, 2012.

  1. Viltedali

    Viltedali Regular Member

    Joined:
    Feb 10, 2008
    Messages:
    305
    Likes Received:
    32
    Location:
    Midwest-US
    I need to be able to monitor for new reports added to this page (there are more, but I'll start with this one):

    Code:
    http://uniontwpoh.policereports.us/
    You have to enter a date to see the reports posted for that day. For the most part, the date the code enters would be the current day.

    If a new report has been posted, an email is sent to me.

    I imagine once per hour is often enough to check. Just to see something work, searching only the current day's date is fine. Once that works, it would need to be modified to check the current day's date at each interval, plus one of the previous ten days. So, if it's July 26th, it would search July 26th plus July 25th. The next time it ran, it would search July 26th plus July 24th. Next time, the 26th and 23rd, and so on back to the 16th, and then it would start over with the 26th and 25th.
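
    The rotation described above could be sketched in Python roughly like this. It is only an illustration: `dates_to_check` and the `run_index` counter are hypothetical names, and the counter would have to be stored somewhere between runs (a small file, for example). What should happen to the rotation when the day rolls over is not specified here, so the sketch simply keeps counting.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 10  # "one day of the last ten"


def dates_to_check(today: date, run_index: int) -> list:
    """Return today's date plus one rotating date from the previous ten days.

    run_index is a hypothetical counter of completed runs (0 on the first run).
    """
    # Offsets cycle 1, 2, ..., 10 and then wrap back to 1.
    offset = run_index % LOOKBACK_DAYS + 1
    return [today, today - timedelta(days=offset)]


if __name__ == "__main__":
    # Walk through twelve consecutive runs for the July 26th example.
    for run in range(12):
        today, older = dates_to_check(date(2012, 7, 26), run)
        print(run, today.isoformat(), older.isoformat())
```

    With July 26th as the current day, run 0 pairs it with the 25th, run 9 with the 16th, and run 10 wraps around to the 25th again.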

    I guess it would just need to monitor for new content added to the page (new reports).

    What would be the best way of doing this, and with what coding language: PHP and a cron job, a macro(?), etc.?

    Any idea how long it would take to create?
     
  2. csguy

    csguy BANNED BANNED

    Joined:
    Jul 13, 2012
    Messages:
    396
    Likes Received:
    42
    I don't see any data on that page.
     
  3. Viltedali

    Viltedali Regular Member

    Thanks for the reply.

    You have to enter a date into the date field and click the button, to then see the reports (if any) for that day.
     
  4. csguy

    csguy BANNED BANNED

    Can you give an example day? I tried a few and didn't find any. You could possibly do it with Yahoo Pipes if the data is easily extractable. If not, a Ruby or Perl script would do the trick. But really, it doesn't matter what language you use as long as you can use regex and HTTP requests.
     
  5. necro

    necro Regular Member

    Joined:
    Dec 23, 2010
    Messages:
    292
    Likes Received:
    189
    example date: 07/24/12

    I think you could implement that with PHP and a crontab in a reasonable amount of time.
    Your main problem is that they don't give you a GET API, so you will need to use POST to get anything out of it.

    Greetz
     
  6. Viltedali

    Viltedali Regular Member

    Yes, 07/24/12 was the example date I had in mind as well.

    I thought this would be a relatively simple task, but no?
     
  7. necro

    necro Regular Member

    hey,

    Well, the problem is with the API they provide, which is hardly an API at all; it's just 3 fields.

    Usually you scrape results through a query API. For example, if you want to scrape Google:

    https://www.google.com/search?q=google

    Here q is the GET parameter that holds the keyword, so if you want to make another query:

    https://www.google.com/search?q=twitter

    So you can easily make a scraper and then just pick out the results you want.

    So I checked it out a little.
     
  8. csguy

    csguy BANNED BANNED

    What do you want, a spreadsheet format? Will csv do? I think you can get someone on oDesk to do that for $5-10.
     
    Last edited: Jul 27, 2012
  9. necro

    necro Regular Member

    They try to hide the parameters by using hidden fields.

    Code:
    http://uniontwpoh.policereports.us/search_post.html?action=7CrmeBcr5tY%253D&ReportID=&Date=07%2F24%2F12&Victim=&x=55&y=3

    Code:
    http://uniontwpoh.policereports.us/search_post.html
    // URL
    ?
    action=7CrmeBcr5tY%253D
    // first param
    &
    ReportID=
    // param for a report ID
    &
    Date=07%2F24%2F12
    // param for a date
    &
    Victim=
    // param for a victim
    &
    x=55
    // not given (probably the click position on the submit button)
    &
    y=3
    // not given either (see x)

    Now we know the API :)

    You can freely change the date, and it will redirect you back to the page with the results for that date :)
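
    For anyone scripting this, here is a minimal Python sketch that rebuilds the same query string for an arbitrary date. `build_search_url` is a made-up helper name. The `action` value and the `x`/`y` values are copied from the URL above, and it is an assumption that they stay valid over time and across sessions.

```python
from datetime import date
from urllib.parse import urlencode

SEARCH_URL = "http://uniontwpoh.policereports.us/search_post.html"


def build_search_url(day: date) -> str:
    """Build the search URL for one day, using the parameters seen in the thread."""
    params = [
        # Stored decoded once; urlencode turns "%" into "%25", giving "%253D" again.
        ("action", "7CrmeBcr5tY%3D"),
        ("ReportID", ""),
        ("Date", day.strftime("%m/%d/%y")),  # e.g. 07/24/12
        ("Victim", ""),
        ("x", "55"),
        ("y", "3"),
    ]
    return SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url(date(2012, 7, 24)))
```

    For 07/24/12 this produces the same URL as the one posted above. The page itself could then be fetched with something like `urllib.request.urlopen(build_search_url(date.today()))`.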
     
  10. necro

    necro Regular Member

    Why do they set up their main website with WordPress and not update it?

    WordPress 3.2.1
     
  11. Viltedali

    Viltedali Regular Member

    Thanks for the information.

     
  12. necro

    necro Regular Member

    Hmm,

    That depends on how you want to use it.

    Do you have a desktop/laptop where you would fire the program up and then want a notification?

    or

    Do you have a server where it can run forever and send you an email?

    also

    Operating system? Windows, Mac, Linux

    :) and there you go
     
  13. Viltedali

    Viltedali Regular Member

    Well, running it from cron would be nice, so I wouldn't have to think about it. Running on a server would be fine, since I guess that's how it would work with a cron job.

    Linux
     
  14. csguy

    csguy BANNED BANNED

    A cron job that runs a script and then emails the output to you.
    You can do the email part with the crontab setting MAILTO="whatever- at -foo"

    The script just needs to echo something useful and it will get emailed.
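
    A rough Python sketch of the "only echo something useful" part, so that cron has output to mail only when a report is actually new. `report_new` and the JSON state file are hypothetical choices, and how the report IDs get pulled out of the page is left open here.

```python
import json
from pathlib import Path


def report_new(found_ids, state_file):
    """Print report IDs not seen on earlier runs, remember them, and return them."""
    path = Path(state_file)
    # Load the IDs recorded by previous runs, if the state file exists yet.
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    # dict.fromkeys drops duplicates within this run while keeping the order.
    new_ids = [rid for rid in dict.fromkeys(found_ids) if rid not in seen]
    for rid in new_ids:
        print(f"New report: {rid}")  # any output is what cron would mail
    if new_ids:
        path.write_text(json.dumps(sorted(seen | set(new_ids))))
    return new_ids


if __name__ == "__main__":
    # Made-up IDs and file name, for demonstration only.
    report_new(["101", "102"], "seen_reports.json")
```

    Run hourly from cron with MAILTO set, a script like this stays silent when nothing changed, so no mail goes out for those runs.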
     
  15. necro

    necro Regular Member

    I <3 Linux on servers.

    I think the best way to do this would be to create a PHP application that lives on the server and scrapes for these kinds of PDFs.

    But if you want me to help you out, drop me a PM.

    Greetz
     
  16. Viltedali

    Viltedali Regular Member

    My regular coder said that it looks like it uses JavaScript, so not all of the parameters are visible or something. So does that mean not everything needed to get to the report pages can be determined?
     
  17. necro

    necro Regular Member

    Hey,

    The output is not based on JavaScript.

    uniontwpoh.policereports.us/search_post.html?

    action=7CrmeBcr5tY%253D

    &

    Date=07%2F24%2F12


    Those 2 are the main parameters to get the output. I think what he means is fetching/downloading the PDFs.
     
  18. necro

    necro Regular Member

    So next post, analysing the JS

    Code:
    
    function DisplayReportP2(reportid) { // function name
        // assign a variable: the name of the popup window
        var thetarget = "ReportView";

        // seems like they have a hash: q3ml0q3dpqai7i8ojap3se3qa0
        // and after that they just ask for the PDF with that report ID
        thetarget = window.open('http://uniontwpoh.policereports.us/viewreportpdfrawv.html?sid=q3ml0q3dpqai7i8ojap3se3qa0&rid=' + reportid + '&f=report.pdf', thetarget, 'screenX=200,screenY=200,width=700,height=550,toolbar=0,location=0,status=0,scrollbars=0,resizable=yes');

        thetarget.focus();
    }
    
    
     
  19. necro

    necro Regular Member

    All I want to know is, where does this hash come from?

    q3ml0q3dpqai7i8ojap3se3qa0

    If you get that question figured out, you could most likely pull down all their PDFs...
     
  20. sockpuppet

    sockpuppet Junior Member

    Joined:
    Nov 7, 2011
    Messages:
    155
    Likes Received:
    145
    The hash is your session id.
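
    If that is right, a script would not hard-code the value but read it from whatever results page it just fetched. A minimal Python sketch, assuming the sid shows up in the page source the same way as in the DisplayReportP2 snippet above; `extract_sid` and `pdf_url` are hypothetical helper names, and the server may also expect the matching session cookie.

```python
import re
from typing import Optional

PDF_URL = "http://uniontwpoh.policereports.us/viewreportpdfrawv.html"


def extract_sid(html: str) -> Optional[str]:
    """Return the first sid=... value found in the page source, or None."""
    match = re.search(r"[?&]sid=([A-Za-z0-9]+)", html)
    return match.group(1) if match else None


def pdf_url(sid: str, report_id: str) -> str:
    """Rebuild the PDF link in the same shape the page's JavaScript opens."""
    return f"{PDF_URL}?sid={sid}&rid={report_id}&f=report.pdf"


if __name__ == "__main__":
    sample = "window.open('" + PDF_URL + "?sid=q3ml0q3dpqai7i8ojap3se3qa0&rid=' + reportid"
    sid = extract_sid(sample)
    print(sid)
    print(pdf_url(sid, "123"))  # "123" is a made-up report ID
```

    Since a session id normally expires, the sid would have to be re-read on every run rather than saved.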
     