
Can someone help with regex on Google SERPs?

Discussion in 'Other Languages' started by tpickett, Jan 31, 2013.

  1. tpickett

    I have a script that scrapes all the results from Google SERPs, and it works fine on plain result pages. But when the big "G" throws in news results or image results, my regex can't tell them apart and treats them as part of the top 10 that I am trying to capture.
    Here is what I'm doing with my regex after fetching the SERP page with cURL:
    PHP:
    preg_match_all('/<h3 class="r"><a href="([^"]+)">(.*?)<\/a><\/h3>/', $scraped, $preUrls);
    ^^This is basically just finding all the h3's with a class of "r".

    Then I'm finding the span with the actual URL in it:
    PHP:
    preg_match_all('/<span class="st">(.*?)<\/span>/', $scraped, $predesc);
    and parsing the URL out of the span:
    PHP:
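    // Strip the leading "/url?q=" from each captured href.
    $repbeg = preg_replace('/\/url\?q=/', '', $preUrls[1]);
    // Drop everything from the first "&" onward (Google's tracking parameters).
    $urls = preg_replace('/&.*/', '', $repbeg);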
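As a rough alternative to the two preg_replace calls above, the target URL can also be read out of the redirect link as a query-string parameter. This is only a sketch: the function name targetFromRedirect is made up for illustration, and it assumes the href has the "/url?q=...&..." shape shown in this post. Unlike the regex version, parse_str also URL-decodes the value.

```php
<?php
// Hypothetical helper: pull the "q" parameter out of a "/url?q=...&sa=..." href.
function targetFromRedirect(string $href): ?string
{
    // Scraped attribute values usually still contain "&amp;" instead of "&".
    $href = html_entity_decode($href);

    // Everything after the first "?" is the query string.
    $pos = strpos($href, '?');
    if ($pos === false) {
        return null; // not a redirect-style link
    }

    // parse_str splits the query string into an array and URL-decodes values.
    parse_str(substr($href, $pos + 1), $params);

    return (isset($params['q']) && is_string($params['q'])) ? $params['q'] : null;
}
```

It could be applied to every captured href with array_map('targetFromRedirect', $preUrls[1]).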
    Again, my issue is that the regex catches any H3 with a class of "r". How would I restructure my regex to grab only the H3s with a class of "r" that are inside a div with a class of "vsc"?

    Help is greatly appreciated!
     
  2. bpmik

    I would write code to take everything between "<div class=\"vsc\"" and "</div>", then run your regex on just that chunk.

    If there are divs nested inside that div, the logic gets a little more complicated, because the first "</div>" you hit closes the inner div, not the outer one.
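One way to sidestep the nested-div problem is to let an HTML parser do the scoping instead of a regex. The sketch below swaps in PHP's built-in DOMDocument and DOMXPath, which is a different technique from the substring approach described above. The function name extractVscResults is made up for illustration, and it assumes the result markup really uses a div with class "vsc" and an h3 with class "r", as described in the question.

```php
<?php
// Hypothetical helper: return href/title pairs for <h3 class="r"> links
// that sit anywhere inside a <div class="vsc">, however deeply nested.
function extractVscResults(string $html): array
{
    $doc = new DOMDocument();

    // Scraped HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match "vsc" and "r" as whole class tokens, even if other classes are present.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " vsc ")]'
           . '//h3[contains(concat(" ", normalize-space(@class), " "), " r ")]/a';

    $results = [];
    foreach ($xpath->query($query) as $a) {
        $results[] = [
            'href'  => $a->getAttribute('href'),
            'title' => $a->textContent,
        ];
    }
    return $results;
}
```

Calling extractVscResults($scraped) would then give only the links inside the "vsc" containers, so news or image blocks outside them are skipped.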