Can someone help with regex on Google SERPs?

tpickett

Newbie
Joined
Feb 1, 2012
I have a script that gets all the results from Google SERPs. It is working just fine. But when the big "G" throws in news results or image results, my regex doesn't tell them apart and counts them as part of the top 10 that I am trying to capture.
Here is what I'm doing with my regex after the curl of the SERP page:
PHP:
preg_match_all('/<h3 class="r"><a href="([^"]+)">(.*?)<\/a><\/h3>/', $scraped, $preUrls);
^^ This just finds every h3 with a class of "r" and captures the link's href and title.

Then I'm finding the span with a class of "st" (the result description):
PHP:
preg_match_all('/<span class="st">(.*?)<\/span>/', $scraped, $predesc);

and parsing the URL out of the href captured from the h3 link:
PHP:
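$repbeg = preg_replace('/\/url\?q=/', '', $preUrls[1]); // strip the "/url?q=" prefix from each href
$urls = preg_replace('/&.*/', '', $repbeg); // drop everything from the first "&" on ($urls is a placeholder name)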
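As an aside, the same URL can be pulled out without regex by treating the href as a query string. This is only a sketch: the sample href is made up, and it assumes the target is always carried in a "q" parameter, as in the "/url?q=" links above. The names $href, $decoded, $params and $url are placeholders.

```php
<?php
// Hypothetical href as it might appear in $preUrls[1]; the value is made up.
$href = '/url?q=http://example.com/page%3Fid%3D1&amp;sa=U&amp;ei=abc';

// The href comes from raw HTML, so decode entities such as &amp; first.
$decoded = html_entity_decode($href);

// Keep only the part after the first "?" (empty string if there is none).
$parts = explode('?', $decoded, 2);
$query = isset($parts[1]) ? $parts[1] : '';

// parse_str splits the parameters and URL-decodes their values.
parse_str($query, $params);

// Assumption: the real target URL is carried in the "q" parameter.
$url = isset($params['q']) ? $params['q'] : null;
echo $url, "\n";
```

Unlike cutting at the first "&", this also undoes the percent-encoding in the target URL.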

Again, my issue is that the regex catches any h3 with a class of "r". How would I restructure my regex to grab only the h3's with a class of "r" that are inside a div with a class of "vsc"?

Help is greatly appreciated!
 
I would write code to take everything between "<div class=\"vsc\"" and "</div>", then run your regex on just that chunk.
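A rough sketch of that idea, assuming each "vsc" div contains no other divs and its class attribute is exactly "vsc". The sample markup is made up to stand in for $scraped, and $blocks, $hrefs and $titles are placeholder names.

```php
<?php
// Hypothetical sample markup standing in for $scraped; real SERP HTML will differ.
$scraped = '<div class="news"><h3 class="r"><a href="/url?q=http://news.example/&amp;sa=U">News</a></h3></div>'
         . '<div class="vsc"><h3 class="r"><a href="/url?q=http://example.com/&amp;sa=U">Example</a></h3></div>';

// Capture the contents of each div.vsc, up to the first </div> that follows.
// The /s modifier lets "." match newlines in multi-line markup.
preg_match_all('/<div class="vsc"[^>]*>(.*?)<\/div>/s', $scraped, $blocks);

$hrefs  = [];
$titles = [];
foreach ($blocks[1] as $block) {
    // Same h3 pattern as in the original post, applied only inside the block.
    if (preg_match('/<h3 class="r"><a href="([^"]+)">(.*?)<\/a><\/h3>/', $block, $m)) {
        $hrefs[]  = $m[1];
        $titles[] = $m[2];
    }
}
print_r($hrefs);
```

The h3 in the "news" div is skipped because the h3 regex only ever sees text from inside a "vsc" block.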

If there are divs inside that div, the logic gets a little more complicated, because the first "</div>" you hit closes the inner div rather than the "vsc" one.
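For the nested case, one option is to skip regex for the structure and let PHP's DOM extension work out the nesting. This is a sketch, not a drop-in fix: the sample markup is made up, and the XPath assumes the class attributes are exactly "vsc" and "r". $results is a placeholder name.

```php
<?php
// Hypothetical sample markup standing in for $scraped; real SERP HTML will differ.
$scraped = '<html><body>'
    . '<div class="news"><h3 class="r"><a href="/url?q=http://news.example/&amp;sa=U">News</a></h3></div>'
    . '<div class="vsc"><div class="inner"><p>nested</p></div>'
    . '<h3 class="r"><a href="/url?q=http://example.com/&amp;sa=U">Example</a></h3></div>'
    . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // scraped HTML is rarely valid; collect warnings quietly
$dom->loadHTML($scraped);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Links inside an h3.r that sits anywhere inside a div.vsc (exact class match).
$links = $xpath->query('//div[@class="vsc"]//h3[@class="r"]/a');

$results = [];
foreach ($links as $a) {
    // getAttribute returns the href with entities such as &amp; already decoded.
    $results[] = ['href' => $a->getAttribute('href'), 'title' => $a->textContent];
}
print_r($results);
```

If the divs carry more than one class, the exact-match test [@class="vsc"] will miss them, and something like contains(concat(' ', normalize-space(@class), ' '), ' vsc ') would be needed instead.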
 