
Curl and regex

Discussion in 'PHP & Perl' started by Packers, Apr 10, 2011.

  1. Packers

    Packers Registered Member

    Hey guys,

    I can't see why the following isn't working on Yahoo:


    Code:
      preg_match_all('/<\s*form[^<>].*?name=[\'"]?(.*?)?[\s\'"].*>.*?<\/form>/',$data,$match);
    
    If I remove the .*?<\/form> part, it finds the right thing. Any ideas where I'm screwing up? I think it worked for Google...
     
  2. eskimo

    eskimo Regular Member

    I'm not great with regex, but I can try to help.
    What are you trying to do here?
     
  3. Packers

    Packers Registered Member

    I'm just trying to get the names of all the forms on the page. It's a little task to help me learn some regex. Really frustrating though!

    So, here's where I am

    Code:
    preg_match_all('/<\s*form[^<>].*?name=[\'"]?(.*?)?[\s\'"].*>/i',$data,$match);
    This returns

    Array
    (
    [0] => Array
    (
    [0] => <form role="search" name="sf1" method="get" id="p_13838465-searchform" class="search-form" action="linkwhichicannotpost">
    )

    [1] => Array
    (
    [0] => sf1
    )

    )

    for Yahoo, which is good.

    This on the other hand,

    Code:
    preg_match_all('/<\s*form[^<>].*?name=[\'"]?(.*?)?[\s\'"].*>(.*?)<\/form>/i',$data,$match);
    returns an empty array.

    I'm thinking I should play around with the /s modifier.
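    A likely explanation for the empty array: by default "." does not match a newline, so (.*?)<\/form> can only succeed when the whole form sits on one line. A minimal sketch with a simplified pattern (not the one from this thread) and made-up sample markup:

```php
<?php
// Made-up sample: the opening tag, the body and </form> are on different
// lines, as they usually are in real page source.
$html = "<form name=\"sf1\" method=\"get\">\n<input name=\"p\">\n</form>";

// Without /s, "." never matches "\n", so (.*?) cannot reach </form>.
$withoutS = preg_match_all('/<form[^>]*>(.*?)<\/form>/i', $html, $m1);

// With /s, "." also matches newlines, so the lazy group can span lines.
$withS = preg_match_all('/<form[^>]*>(.*?)<\/form>/is', $html, $m2);

echo $withoutS, "\n"; // 0
echo $withS, "\n";    // 1
```

    Note that [^>]* in the opening tag already crosses newlines on its own, because a negated character class is not affected by /s.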
     
  4. Packers

    Packers Registered Member

    Code:
    preg_match_all('/<\s*form[^<>].*?name=[\'"]?(.*?)?[\s\'"].*>(.*?)<\/form>/si',$data,$match);
    Still an empty array... I think there must be something wrong with (.*?) now?
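    With /s, "." can cross newlines, so if the result is still empty it is worth checking whether PCRE found nothing or gave up. On a large page, a pattern that mixes several .*? and .* can exceed PHP's pcre.backtrack_limit setting; preg_match_all() then returns false instead of 0 and preg_last_error() reports PREG_BACKTRACK_LIMIT_ERROR. That is one possible cause here, not something confirmed in the thread. A sketch, where describe_match_result() is a hypothetical helper:

```php
<?php
// describe_match_result() is a hypothetical helper: it tells "no match"
// apart from "PCRE gave up" (preg_match_all() returns false in that case).
function describe_match_result(string $pattern, string $subject): string
{
    $count = preg_match_all($pattern, $subject, $matches);
    if ($count === false) {
        // e.g. PREG_BACKTRACK_LIMIT_ERROR when pcre.backtrack_limit is exceeded
        $error = preg_last_error();
        return $error === PREG_BACKTRACK_LIMIT_ERROR
            ? 'backtrack limit exceeded'
            : "PCRE error $error";
    }
    return $count === 0 ? 'no match' : "$count match(es)";
}

$html = "<form name=\"sf1\">\n<input name=\"p\">\n</form>";
echo describe_match_result('/<form[^>]*>(.*?)<\/form>/i', $html), "\n";  // no match
echo describe_match_result('/<form[^>]*>(.*?)<\/form>/is', $html), "\n"; // 1 match(es)
```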
     
  5. eskimo

    eskimo Regular Member

    I'm asking my clever buddy on Skype quickly; give me 2 minutes. Hopefully he can help, he's a regex nerd :)

    Update: he's having a look at it.
     
    Last edited: Apr 10, 2011
  6. Packers

    Packers Registered Member

    Thanks :) Regex is so confusinggggg! :p

    EDIT: I'll have to try whatever he says when I get back from uni :( It will be a few hours. I'll let you know how it goes. Thanks again!
     
    Last edited: Apr 10, 2011
  7. eskimo

    eskimo Regular Member

    OK, he says you should just try to get it with strpos from the first result:

    Code:
    preg_match_all('/<\s*form[^<>].*?name=[\'"]?(.*?)?[\s\'"].*>/i',$data,$match);
    
    because you have the entire opening form tag in the array there. So find the position of the name=" part of the returned data, and take all the characters after it up to the next ".
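    A minimal sketch of that strpos approach, assuming the name is double-quoted (name_from_tag() is a hypothetical helper). It takes the first name=" it finds, so an attribute such as data-name=" earlier in the same tag would fool it:

```php
<?php
// Hypothetical helper: pull the value of name="..." out of one opening tag
// string (e.g. an entry of $match[0]) using strpos/substr instead of a regex.
function name_from_tag(string $tag): ?string
{
    $needle = 'name="';
    $start = strpos($tag, $needle);
    if ($start === false) {
        return null; // no double-quoted name attribute
    }
    $start += strlen($needle);
    $end = strpos($tag, '"', $start);
    if ($end === false) {
        return null; // unterminated attribute value
    }
    return substr($tag, $start, $end - $start);
}

$tag = '<form role="search" name="sf1" method="get" class="search-form">';
echo name_from_tag($tag), "\n"; // sf1
```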

    I can do it for you quickly if you need help; PM me when you're back.
     
    Last edited: Apr 10, 2011
  8. Packers

    Packers Registered Member

    The thing is, I get the right result when I just have that code above. It works. I don't follow why it doesn't work when I try to find what's in between <form ...>more stuff I want to store</form>.
    In the end I'd like to be able to grab every tag on a page, and every attribute of that tag and its values. I'm sure regex alone should be able to handle this!

    It appears that there is a mistake with >(.*?)<\/form>, and I thought this might be down to the newlines and stuff in the source code. Mmmm
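    Once you have a single opening tag (for example an entry of $match[0]), the "every attribute and its value" part is manageable with a regex. A sketch under stated assumptions: tag_attributes() is a hypothetical helper, values must be double-quoted, single-quoted or unquoted, and valueless attributes such as disabled are skipped. The (?|...) branch reset group makes every alternative fill capture group 2:

```php
<?php
// tag_attributes() is a hypothetical helper: it splits ONE opening tag into
// lower-cased attribute name => value pairs.
function tag_attributes(string $tag): array
{
    // (?|...) is a branch reset group: whichever quoting style matches,
    // its value lands in capture group 2.
    preg_match_all(
        '/([\w:-]+)\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/',
        $tag,
        $m
    );
    // $m[1] holds the attribute names, $m[2] the matching values.
    return array_combine(array_map('strtolower', $m[1]), $m[2]);
}

print_r(tag_attributes('<form role="search" NAME="sf1" method=\'get\' data-x=1>'));
```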
     
  9. tantanleblanc

    tantanleblanc Newbie

    preg_match_all('/<.*form.*name=[\'"]?([^\'"]*)[\'"]?[^>]*>/iUs',$data,$match);
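    If I trace it correctly, the U modifier in that pattern inverts greediness for every quantifier, not just .*: the ? and * on the character classes become lazy too, so the name capture can come back empty. A small sketch to check it; the tightened pattern at the end is a hypothetical alternative, not from the thread:

```php
<?php
// U inverts greediness: ".*" acts like ".*?" and ".*?" like ".*".
preg_match('/a.*b/', 'aXbYb', $greedy);
preg_match('/a.*b/U', 'aXbYb', $lazy);
echo $greedy[0], "\n"; // aXbYb
echo $lazy[0], "\n";   // aXb

// The same flip makes [\'"]? and ([^\'"]*) lazy in the pattern above, so the
// engine accepts an empty capture and lets [^>]* consume the name instead.
$tag = '<form name="sf1" method="get">';
preg_match('/<.*form.*name=[\'"]?([^\'"]*)[\'"]?[^>]*>/iUs', $tag, $m);
var_dump($m[1]); // string(0) ""

// Hypothetical tightened pattern (no U): stay inside the tag, require quotes.
preg_match('/<form\b[^>]*?\sname=[\'"]([^\'"]*)[\'"][^>]*>/i', $tag, $m2);
echo $m2[1], "\n"; // sf1
```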
     
  10. madoctopus

    madoctopus Supreme Member

    I would recommend using the simple_html_dom class when dealing with the DOM. It is less error-prone and much simpler to use, especially for what you want.
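    simple_html_dom is a third-party class; as an alternative that ships with PHP, here is a sketch of the original task (name and inner HTML of each named form) using the built-in DOM extension instead. form_contents() is a hypothetical helper, and libxml may normalize the markup, so the inner HTML is not guaranteed to be byte-for-byte identical to the page source:

```php
<?php
// form_contents() is a hypothetical helper: name => inner HTML for every
// <form> that has a name attribute, using PHP's bundled DOM extension.
function form_contents(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; keep libxml's warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('form') as $form) {
        if (!$form->hasAttribute('name')) {
            continue; // unnamed forms are skipped
        }
        $inner = '';
        foreach ($form->childNodes as $child) {
            $inner .= $doc->saveHTML($child); // serialize each child node
        }
        $result[$form->getAttribute('name')] = $inner;
    }
    return $result;
}

$html = "<form name=\"sf1\" method=\"get\">\n<input name=\"p\">\n</form>"
      . '<form action="/x"><input name="q"></form>';
print_r(form_contents($html));
```

    Newlines in the source are no problem here, because the parser works on the document tree rather than on the raw text.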
     
  11. kaidoristm

    kaidoristm BANNED BANNED

    OK, here's the answer to your first question:

    PHP:
    preg_match_all('|<form[^<]*name=[\'"](.*?)[\'"][^<]*>|', $from, $form);
    This returns all form names, for the forms that have a name attribute.
    For your Yahoo test, getting "sf1" as the only form name is correct, because there are some more forms but without names.

    For your second one, I can see that you want to get the data between the form tags for a specific form name:

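    PHP:
    // $from holds the page HTML; first collect the name of every named form
    preg_match_all('|<form[^<]*name=[\'"](.*?)[\'"][^<]*>|', $from, $form);

    foreach ($form[1] as $value)
    {
        // preg_quote() escapes any regex characters that appear in the form name
        $pattern = '|<form[^<]*' . preg_quote($value, '|') . '[^<]*>(.*?)</form>|';
        if (preg_match($pattern, $from, $forms)) {
            print_r($forms[1]);
        }
    }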
    This will display all the data between the form tags for every named form.
     
    Last edited: May 1, 2011
  12. kaidoristm

    kaidoristm BANNED BANNED

    Hm, it seems that you have to clean unwanted characters (newlines and runs of whitespace) out of your HTML, because they break the regex. OK, here's an edited version:

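    PHP:
    // Whatever you're using to grab the page (file_get_contents or cURL), its output holds your HTML
    $output = "YOUR HTML HERE";

    // Collapse runs of whitespace and any vertical whitespace (newlines) into single spaces
    $replacement = array('/\s\s+/', '/\v/');
    $from = preg_replace($replacement, ' ', $output);

    preg_match_all('|<form[^<]*name=[\'"](.*?)[\'"][^<]*>|', $from, $form);

    foreach ($form[1] as $value)
    {
        // preg_quote() escapes any regex characters that appear in the form name
        $pattern = '|<form[^<]*name=[\'"]' . preg_quote($value, '|') . '[\'"][^<]*>(.*?)</form>|';
        if (preg_match($pattern, $from, $forms)) {
            print_r($forms);
        }
    }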
    Now that's a badass solution ;)