
cross domain iframe scraping

Discussion in 'General Programming Chat' started by wseo12, Mar 12, 2014.

  1. wseo12

    wseo12 Regular Member

    Joined:
    May 23, 2009
    Messages:
    221
    Likes Received:
    7
    Occupation:
    FUll TIME IM
    What's up guys. I need a script that can scrape an iframe. The iframe is on my site, but its src points to another site. I have been having trouble finding a script that can do this, and I would really appreciate it if someone could provide a script or some sample code showing how to go about scraping a cross-domain iframe. Thanks.

    I tried doing it in JavaScript, but I don't think a browser-based language will work. I was thinking a server-side language such as PHP should work.
     
  2. wseo12

    I would really appreciate it if someone would contribute. I really need help; I have been stuck on this for a while.
     
  3. wseo12

    Over 80 views and not one reply. I know someone on here can help. Thanks.
     
  4. Gogol

    Gogol Elite Member

    Joined:
    Sep 10, 2010
    Messages:
    3,062
    Likes Received:
    2,872
    Gender:
    Male
    Why not just use PHP's file_get_contents / curl?

    It will be like:
    Code:
    <?php
    // Fetch the iframe's target URL on the server and print the returned HTML.
    $url = 'youriframeurl';
    echo file_get_contents($url);
    ?>
    
    instead of
    Code:
    <html>
    <iframe src="youriframeurl"></iframe>
    </html>
    
    Let me know if you need a curl request instead of the built-in function.
     
  5. gtownfunk

    gtownfunk Registered Member

    Joined:
    Jan 26, 2011
    Messages:
    99
    Likes Received:
    26
    Occupation:
    Software Developer
    Location:
    Austin, TX
    Home Page:
    Do you know how to do any programming? I've done some of this, screwing around with the browser control in C#. Browsers try to 'protect' against cross-site scripting attacks (the same-origin policy); it's one of the few protections that JavaScript has in the browser.

    Also, you might be able to disable cross-site scripting protection using PhantomJS. Check out PhantomJS.

    gtownfunk
     
  6. Schvamp

    Schvamp Power Member

    Joined:
    Feb 13, 2012
    Messages:
    684
    Likes Received:
    549
    Location:
    Hogwarts
    If you go for Gogol's solution, which I would recommend, you could use http://simplehtmldom.sourceforge.net/ to scrape a more specific element or text.
    Not the easiest task if you have no coding skills, though.
     
  7. gtownfunk

    I was answering under the assumption that it's not as simple as just requesting the iframe's URL in this case. That would clearly be the simplest approach, and if it works without a problem, then agreed, use Gogol's script. If some JavaScript needs to run in the context of the iframe or in the iframe's src URL, that's not going to work, though.

    gtownfunk
     
  8. wseo12

    Hi, yeah, I would consider myself advanced in PHP and intermediate in JavaScript. gtownfunk's was the best reply. I already used curl, but it will not work, because when I call the page with curl the browser session gets lost. My whole purpose is that I need the browser session to be active, as I am trying to scrape an email address from the iframe. The email only displays in the iframe; it does not display when I call the page with curl, because curl acts as its own browser and starts a new session.

    I want the email so I can display it on my page, for example "Welcome sampleemail@yahoo.", and another big reason I need it is so I can prepopulate my form with the user's email, if you get what I'm saying (;
     
    Last edited: Mar 18, 2014
  9. Chakeda

    Chakeda Newbie

    Joined:
    Nov 11, 2013
    Messages:
    35
    Likes Received:
    3
    Home Page:
    Use AJAX.

    Code:
    $.ajax({
        url: 'http://www.thewebsitewithiframe.com',
        type: 'POST',
        crossDomain: true,
        data: { email: email }, // `email` must already be defined
        success: function (data) {
            // Show the response in the #email element
            $("#email").html(data);
        },
        error: function (xhr, status, error) {
            alert(status);
        }
    });
    
    I'm not great with AJAX and this is an abstract example, but I think you can call file_get_contents using cross-domain AJAX. Might want to look into AJAX and check Stack Overflow.
     
  10. Gogol

    For that to work, the other domain needs to allow cross-origin calls. It will have issues with IE < 9 too.
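
    What "allow cross-origin calls" means can be sketched roughly as the check a browser makes on the response headers before letting the calling page read the response. This is a simplified model, not the full CORS algorithm (preflight requests and method/header allow-lists are left out); the function name corsAllows and the example.* hosts are made up for illustration.

```javascript
// Simplified model of the browser's CORS response check.
// Assumption: header names in `responseHeaders` are already lowercased.
function corsAllows(pageOrigin, responseHeaders, withCredentials) {
  const allowOrigin = responseHeaders['access-control-allow-origin'];
  if (allowOrigin === undefined) {
    return false; // the other site never opted in
  }
  if (withCredentials) {
    // With cookies attached, the wildcard is not accepted and the server
    // must also send Access-Control-Allow-Credentials: true.
    const allowCreds = responseHeaders['access-control-allow-credentials'];
    return allowOrigin === pageOrigin && allowCreds === 'true';
  }
  return allowOrigin === '*' || allowOrigin === pageOrigin;
}

// The other site sends no CORS headers at all:
console.log(corsAllows('http://mysite.example', {}, false));
// The other site allows anyone, but the request carries the session cookie:
console.log(corsAllows('http://mysite.example',
  { 'access-control-allow-origin': '*' }, true));
```

    Since the goal in this thread depends on the visitor's session cookie, the credentialed case is the relevant one, and it only passes if the other site explicitly names your origin.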
     
  11. gtownfunk

    Yeah, I'd look at doing it in C# with a WebBrowser control or, like I said, look into PhantomJS.

    gtownfunk
     
  12. wseo12

    Thanks gtownfunk, I will look into PhantomJS. Will PhantomJS solve cross-domain issues and also keep all session data?
     
  13. gtownfunk

    If it doesn't let you do the cross-domain scripting out of the box, there are compiled binaries floating around that have this security feature disabled.

    It'll keep session data, but it'll be a slightly different way of looking at things. Your current code might work just fine, though.
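
    One way to picture the "different way of looking at things": every HTTP client (the visitor's browser, curl, a headless browser) keeps its own cookie jar, so a session only exists inside the client that logged in. The toy model below is only an illustration; the Client class, the host othersite.example and the cookie value are all invented.

```javascript
// Toy model: each HTTP client keeps its own cookie jar, keyed by host.
class Client {
  constructor(name) {
    this.name = name;
    this.jar = new Map(); // host -> cookie string
  }
  storeCookie(host, cookie) {
    this.jar.set(host, cookie);
  }
  // Returns the cookie this client would send for `url`, or null if it has none.
  cookieFor(url) {
    const host = new URL(url).hostname;
    return this.jar.has(host) ? this.jar.get(host) : null;
  }
}

const visitorBrowser = new Client('visitor browser');
const serverSideClient = new Client('curl or headless browser on the server');

// The visitor logged in to the other site from their own browser.
visitorBrowser.storeCookie('othersite.example', 'session=abc123');

const target = 'http://othersite.example/profile';
console.log(visitorBrowser.cookieFor(target));   // the visitor's session cookie
console.log(serverSideClient.cookieFor(target)); // null: a fresh, separate session
```

    So a headless browser keeps a session across its own requests, but it would have to log in itself; it does not inherit whatever session the visitor's browser has with the other site.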

    gtownfunk
     
  14. wseo12

    Thanks gtownfunk, I appreciate it. Any idea where I can get the compiled binaries?
     
  15. Grizzy

    Grizzy Senior Member

    Joined:
    Nov 11, 2008
    Messages:
    919
    Likes Received:
    999
    Since you don't have control over the iframe, the same-origin policy will totally prevent the sort of thing you're looking to do... no ifs, ands, or buts. So outside of some super rare third-party information leakage/SOP bypass browser vulnerability, it's 100% impossible.
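
    For reference, two URLs count as the same origin only when scheme, host and port all match. A minimal sketch of that comparison, using the standard URL class (the helper name sameOrigin and the example.* hosts are made up):

```javascript
// Two URLs share an origin only if scheme, host and port are all equal.
// URL.origin already normalizes default ports (80 for http, 443 for https).
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Same scheme, host and (default) port: script access is allowed.
console.log(sameOrigin('http://mysite.example/page', 'http://mysite.example:80/other'));
// Different host: the parent page cannot read the iframe's document.
console.log(sameOrigin('http://mysite.example/page', 'http://othersite.example/profile'));
// Same host but different scheme is also a different origin.
console.log(sameOrigin('http://mysite.example/', 'https://mysite.example/'));
```

    When this comparison fails for the parent page and the iframe's src, the browser blocks script access to the iframe's document, which is the situation described in this thread.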