Crawling a site that uses Ajax and XML

ortal

I am trying to crawl a site that displays data on screen that doesn't appear in the HTML source. Instead, on load it calls an Ajax JavaScript function from an external file, which performs an XML request. How can I get this data from a Perl script?
I don't know much about XML and Ajax, but I can learn whatever is needed.
 
Clarify your question a little.

If you are referring to extracting the source code from the site's PHP/Perl files, you need to forget about it and move on, or have a programmer build a clone of the features you want.

Browsers can't read PHP/Perl, and if there is an XML script involved in a full Ajax application like the one you are describing, it is most likely using JavaScript and PHP to construct the pages within the browser and then using the XML as a security/redirect feature.

As an Ajax programmer, I can say these features are pretty common practice.

Without seeing the site you are talking about, I'm not at liberty to say what it is. However, most sites that use this construction method set the site up as a secure looping process to protect their code from being ripped.

The reason this is so effective is that when you try to rip/scrape the protected area, the site application uses a redirect to lead you to the alternative XML page that you are seeing as the end result. By doing that, the site doesn't let you get anywhere close to the source code through the normal scraping access points.

Short of being literally hacked or cloned outright, you won't get the information you are looking for. I suggest you pay a programmer and have it cloned.


I will explain it better: when I browse the site manually, I see some rough data.
This is what I need, not the Perl/PHP script. In the corresponding place, the HTML source has a single <div> line, something like <div id="data">&nbsp;</div>. I am trying to scrape this data with a Perl script.
I hope it's clearer now.
 
You should be able to see which URL the Ajax data is being pulled from if you look through the source, including the external JavaScript file. Load that URL in your browser and see if the data displays, then view the source.
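
Once you have that URL, a script can request it directly and parse the XML, skipping the HTML page entirely. Here is a rough sketch of the idea in Python; the same two steps (an HTTP GET, then an XML parse) carry over to Perl with a module such as LWP::UserAgent. The endpoint URL, the `item` element name, and the function names are all made up for illustration, and the `X-Requested-With` header is only a guess at what the server might expect.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint: replace it with the URL found in the site's JavaScript file.
AJAX_URL = "http://example.com/ajax/data.xml"


def fetch_xml(url):
    """Request the Ajax endpoint directly and return the raw response bytes."""
    request = urllib.request.Request(
        url,
        headers={
            # Assumption: some endpoints check for this header, which many
            # Ajax libraries send; it may not be needed for a given site.
            "X-Requested-With": "XMLHttpRequest",
            "User-Agent": "Mozilla/5.0",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def extract_items(xml_data, tag="item"):
    """Return the stripped text of every <tag> element in the XML document.

    The tag name is a placeholder; inspect the real response to find
    which elements carry the data shown on the page.
    """
    root = ET.fromstring(xml_data)
    return [(element.text or "").strip() for element in root.iter(tag)]
```

Calling `extract_items(fetch_xml(AJAX_URL))` would then give you the values as a list. If the response turns out not to be XML (JSON or an HTML fragment, say), only the parsing step needs to change.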
 