Crawling a site that uses Ajax and XML

Discussion in 'Black Hat SEO' started by ortal, Nov 29, 2008.

  1. ortal

    ortal Junior Member

    Joined:
    May 27, 2008
    Messages:
    106
    Likes Received:
    10
    I'm trying to crawl a site that displays data on screen that doesn't appear in the HTML source. Instead, on load, the page calls an Ajax JavaScript function from an external file, which performs an XML request. How can I get this data from a Perl script?
    I don't know much about XML and Ajax, but I can learn whatever is needed.
     
  2. aftershock2020

    aftershock2020 Senior Member

    Joined:
    Oct 19, 2007
    Messages:
    981
    Likes Received:
    477
    Clarify your question a little.

    If you are referring to extracting the source code from the site's PHP/Perl files, you need to forget about it and move on, or have a programmer build a clone of the features you want.

    Browsers can't read PHP/Perl, and if there is an XML script involved in a full Ajax application like the one you are describing, it is most likely using JavaScript and PHP to construct the pages within the browser and then using the XML as a security/redirect feature.

    As an ajax programmer, these features are pretty common practice.

    Without seeing the site you are talking about, I'm not at liberty to say what it is. However, most sites that use this construction method set the site up as a secure looping process to protect their code from being ripped.

    The reason this is so effective is that when you try to rip/scrape the protected area, the site application uses a redirect to lead you to the alternative XML page that you are seeing as the end result. By doing that, the site doesn't let you get remotely close to the source code through the normal scraping access points.

    Short of being literally hacked or cloned outright, you won't get the information you are looking for. I suggest you pay a programmer and have it cloned.


     
  3. ortal

    ortal Junior Member

    Joined:
    May 27, 2008
    Messages:
    106
    Likes Received:
    10
    I'll explain it better: when I browse the site manually, I see some rough data.
    This is what I need, not the Perl/PHP script. In the corresponding place, the HTML source has a single <div> line, something like <div id="data">&nbsp;</div>. I'm trying to scrape this data with a Perl script.
    I hope it's clearer now.
     
  4. Steb

    Steb Registered Member

    Joined:
    Dec 20, 2006
    Messages:
    64
    Likes Received:
    0
    You should be able to see which URL the Ajax data is being pulled from if you look through the source. Load that URL in your browser and see if the data displays, then view the source.
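
    A rough sketch of that idea, written in Python rather than Perl purely for illustration: look through the JavaScript for the URL handed to the XMLHttpRequest open() call, resolve it against the page address, then request that URL directly and read the values out of the XML. The sample script, the sample XML, the <item> tag name, and the example.com address are all made-up placeholders; the real site will use its own names, and the regular expression only catches URLs written as plain string literals.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Matches calls such as xhr.open("GET", "/data.xml", true) and captures the URL.
# Assumption: the URL is a plain string literal, not built from variables.
OPEN_CALL = re.compile(
    r"""\.open\(\s*['"](?:GET|POST)['"]\s*,\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)


def find_ajax_urls(js_source):
    """Return the URLs passed as string literals to .open() calls."""
    return OPEN_CALL.findall(js_source)


def fetch(url):
    """Download a URL and return the body as text (assumes UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_text(xml_text, tag):
    """Return the text of every element named `tag` in the XML document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(tag)]


# Hypothetical inputs standing in for the site's external script and XML reply.
sample_js = (
    'var xhr = new XMLHttpRequest(); '
    'xhr.open("GET", "/feeds/data.xml", true); xhr.send(null);'
)
sample_xml = "<items><item>first</item><item>second</item></items>"

urls = find_ajax_urls(sample_js)
# Relative URLs must be resolved against the address of the page itself.
full_url = urljoin("http://example.com/page.html", urls[0])
values = extract_text(sample_xml, "item")

print(full_url)
print(values)
```

    Against a live site you would pass the external script's text to find_ajax_urls() and the result of fetch(full_url) to extract_text(). If the URL is assembled from variables in the script, reading the script by hand (as suggested above) is still the way to find it.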
     
  5. ortal

    ortal Junior Member

    Joined:
    May 27, 2008
    Messages:
    106
    Likes Received:
    10
    It's just like you said.
    Thanks.