I have a script that gets all the results from Google SERPs. It is working just fine, but when the big "G" throws in news results or image results, my regex can't tell them apart and treats them as part of the top 10 that I am trying to capture.
Here is what I'm doing with my regex after a curl of the SERP page:
PHP:
preg_match_all('/<h3 class="r"><a href="([^"]+)">(.*?)<\/a><\/h3>/', $scraped, $preUrls);
^^This is basically just finding all the h3's with a class of "r".
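To make the over-matching concrete, here is a minimal sketch run against made-up markup. The `news` class and the sample links are assumptions for illustration only, not what Google actually serves.

```php
<?php
// Hypothetical markup: one organic result and one news result, both using h3.r.
$scraped = '<div class="vsc"><h3 class="r"><a href="/url?q=http://example.com/&amp;sa=U">Organic</a></h3></div>'
         . '<div class="news"><h3 class="r"><a href="/url?q=http://news.example.com/&amp;sa=U">News</a></h3></div>';

preg_match_all('/<h3 class="r"><a href="([^"]+)">(.*?)<\/a><\/h3>/', $scraped, $preUrls);

// Both links are captured, because the pattern never looks at the enclosing div.
print_r($preUrls[2]);
```

Both titles land in `$preUrls[2]`, which is the same behaviour I am seeing on the real page.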
Then I'm finding the span with the actual URL in it:
PHP:
preg_match_all('/<span class="st">(.*?)<\/span>/', $scraped, $predesc);
and parsing the URL out of the span:
PHP:
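// Strip the "/url?q=" prefix from each captured href
$repbeg = preg_replace('/\/url\?q=/', '', $preUrls[1]);
// Drop everything from the first "&" onward; $urls is just a placeholder name for the result
$urls = preg_replace('/&.*/', '', $repbeg);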
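As a worked example of those two replacements, here is a small sketch on sample hrefs. The hrefs and the `$stripped`/`$cleaned` names are made up for illustration. Note that `preg_replace` returns its result rather than modifying the input, so each step has to be assigned to a variable.

```php
<?php
// Hypothetical hrefs in the "/url?q=...&sa=..." shape captured from the h3 links.
$hrefs = [
    '/url?q=http://example.com/page&sa=U&ei=abc',
    '/url?q=http://example.org/&sa=U',
];

// Step 1: drop the leading "/url?q=" from every element (preg_replace accepts an array).
$stripped = preg_replace('/^\/url\?q=/', '', $hrefs);

// Step 2: drop everything from the first "&" to the end of the string.
$cleaned = preg_replace('/&.*/', '', $stripped);

print_r($cleaned);
```

One caveat with this approach: a target URL that itself contains an unencoded `&` would be cut short by the second step.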
Again, my issue is that the regex is catching any H3 with a class of "r". How would I restructure my regex to only grab the H3s with a class of "r" that are inside a div with a class of "vsc"?
Help is greatly appreciated!
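One direction that might fit the question above, offered as a sketch rather than a tested fix: instead of a regex, parse the page with PHP's `DOMDocument` and select the links with an XPath query that requires a `div` with class `vsc` as an ancestor. The sample markup, the `news` class, and the `$organic` variable are assumptions for illustration. The real page structure may differ and can change at any time.

```php
<?php
// Hypothetical, simplified markup standing in for the fetched results page.
$scraped = <<<HTML
<html><body>
<div class="vsc"><h3 class="r"><a href="/url?q=http://example.com/&amp;sa=U">Organic result</a></h3></div>
<div class="news"><h3 class="r"><a href="/url?q=http://news.example.com/&amp;sa=U">News result</a></h3></div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely valid; keep parser warnings quiet
$doc->loadHTML($scraped);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select <a> inside an h3 with class "r", but only when that h3 sits inside a div with class "vsc".
// The concat/normalize-space idiom matches a whole class token even if the attribute lists several.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " vsc ")]'
       . '//h3[contains(concat(" ", normalize-space(@class), " "), " r ")]/a';

$organic = [];
foreach ($xpath->query($query) as $a) {
    $organic[] = [
        'href'  => $a->getAttribute('href'),
        'title' => $a->textContent,
    ];
}

print_r($organic);
```

On the sample markup only the link inside the `vsc` div is collected, and the news link is skipped. The `href` values still carry the `/url?q=` prefix, so the same cleanup step would apply afterwards.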