JavaScript interpreter for scraping

Discussion in 'HTML & JavaScript' started by Grizzy, Feb 5, 2009.

  Grizzy

    Nov 11, 2008
    I've never seen anyone talk about this on this forum before, but I have been using an open-source JavaScript interpreter called Cr0wbar, produced by some people at a tech school. It uses Mozilla's headless, Firefox-like environment and can interpret most of the JavaScript I have come across. So when the content you want to scrape is buried or hidden in JavaScript, you can get at it, no problem! I have made tons of bots that perform functions that would not be possible without it. It is available for Windows and Linux and is pretty easy to install, too.
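    To illustrate what "hidden in JavaScript" means, here is a minimal Python sketch using only the standard library's html.parser. The page markup, the element id "price", and the helper text_by_id are all hypothetical, and the "rendered" string is simply assumed to be what an interpreter like Cr0wbar would hand back after running the page's script. A plain fetch sees an empty element because the value is only filled in when the script executes.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not change the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the element whose id attribute matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while the parser is inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html, target_id):
    """Return the stripped text content of the element with the given id."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical raw page: the price is only written in by the script.
RAW = ('<div id="price"></div>'
       '<script>document.getElementById("price").textContent = "$9.99";</script>')
# Hypothetical page after a JavaScript interpreter has run the script.
RENDERED = '<div id="price">$9.99</div>'

print(repr(text_by_id(RAW, "price")))       # '' : nothing to scrape yet
print(repr(text_by_id(RENDERED, "price")))  # '$9.99'
```

    The parser deliberately ignores script contents outside the target element, so the value only shows up once something has actually executed the JavaScript.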

    Once you have everything up and running, this program runs as a sort of proxy on your localhost (or LAN) and acts as the interpreter for whatever tool you use to scrape (e.g. cURL).

    So basically you would use cURL to GET http://localhost:10000/?url= followed by the address of the page you want. There are a few more arguments than that, but that is basically the gist of it. Cr0wbar will pass the interpreted web page on to cURL for you to massage and manipulate in any way you wish.
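    For anyone scripting this in Python instead of cURL, the same request can be sketched with urllib from the standard library. This assumes Cr0wbar is listening on localhost:10000 and takes the target page in the url query parameter, as described above; the helper names crowbar_url and fetch_rendered are hypothetical, and the other arguments Cr0wbar accepts are not covered. The target address is percent-encoded so that a page URL with its own query string is not split into separate parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the local Cr0wbar instance from the post above.
CROWBAR = "http://localhost:10000/"


def crowbar_url(target, base=CROWBAR):
    """Build the request URL asking the local interpreter to render target."""
    # urlencode percent-encodes ':', '/', '?', '=' etc. in the target address.
    return base + "?" + urlencode({"url": target})


def fetch_rendered(target, base=CROWBAR, timeout=60):
    """GET the interpreted page from the interpreter and return it as text."""
    with urlopen(crowbar_url(target, base), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


print(crowbar_url("http://example.com/page?id=1"))
# http://localhost:10000/?url=http%3A%2F%2Fexample.com%2Fpage%3Fid%3D1
```

    With Cr0wbar running, fetch_rendered("http://example.com/") would return the interpreted page as a string, ready for whatever parsing you normally do on the cURL output.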

    You can even use a program like Proxychains or FreeCap to force Cr0wbar to communicate through a proxy so you can scrape anonymously.

    I don't know if anyone will find this useful; I've just used it so much that I thought I should share it. There isn't much documentation to speak of, but there is enough on the homepage to get you started. Happy scraping!

  ashilicious

    Aug 14, 2008
    Hey grizzy, thanks heaps for the heads up on this.

    You don't happen to know whether it's possible to do the same thing with Flash somehow, do you?

    Cheers again :)