
CHALLENGE: Weird issue scraping a retailer w/ curl PHP ...

Discussion in 'PHP & Perl' started by kylestyle, Jun 4, 2016.

  1. kylestyle

    kylestyle Newbie

    Joined:
    Jun 4, 2016
    Messages:
    3
    Likes Received:
    0
    Gender:
    Female
    Setup...
    I have been trying to scrape pricing information from the retailer Lowe's (the hardware store), but the site does some kind of redirect that appears to set your local store zip code based on the IP address you are coming from. The page does not show the price of the product until that cookie is set.

    How to duplicate...
    1) Disable cookies in Chrome
    2) Go to the Lowe's website
    3) Watch it try to run some automatic zip code cookie-setter code and redirect indefinitely (since cookies are disabled)
    4) You will see that the price of the product is missing from the page until you allow it to set this cookie

    Problem Trying To Solve...
    How do I set the correct cookie in PHP through a proxy so that the price shows up in the HTML response, and then delete the cookie when done? Any ideas? I have never seen this before, and I have been scraping web pages for years... I am not sure which cookie it is, or whether there is a script you can POST to in order to set it.
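One way to find out which cookie is involved is to look at the Set-Cookie headers in the response. A minimal sketch, assuming the raw response headers have already been captured (for example with CURLOPT_HEADER turned on). The function name listSetCookies and the cookie names in the sample are made up for illustration and are not taken from the site:

```php
<?php
// Hypothetical helper: collect cookie name => value pairs from raw HTTP
// response headers (e.g. the header part of a curl response fetched with
// CURLOPT_HEADER set to true).
function listSetCookies(string $rawHeaders): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep only the "name=value" pair; drop attributes such as Path or Expires.
        $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        if (strpos($pair, '=') === false) {
            continue;
        }
        [$name, $value] = explode('=', $pair, 2);
        $cookies[trim($name)] = trim($value);
    }
    return $cookies;
}

// Made-up sample headers, only to show the shape of the result.
$sample = "HTTP/1.1 302 Found\r\n"
        . "Set-Cookie: exampleStore=1234; Path=/\r\n"
        . "set-cookie: exampleZip=90210; Path=/; HttpOnly\r\n"
        . "Location: /somewhere\r\n\r\n";
print_r(listSetCookies($sample));
```

Comparing the cookie names seen with and without a store selected should narrow down which one the price depends on.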
     
  2. kylestyle

    kylestyle Newbie

    More info...
    I think this is the form that needs to be submitted before the price shows up...

    HTML:
    <div id="enterZip">
    <form method="post" action="/LowesStoreSearchCmd" name="storeSearchForm" id="storeSearchForm">
    <fieldset>
    <input type="hidden" name="URL" value="TopCategoriesDisplayView"/>
    <input type="hidden" name="storeId" value="10151" />
    <input type="hidden" name="catalogId" value="10051" />
    <input type="hidden" name="langId" value="-1"/>
    <input type="hidden" name="findStoreErrorURL" value="StoreLocatorDisplayView"/>
    <input type="hidden" name="firstReferURL" value="" />
    <input type="text" name="zipCode" id="zipCode" maxlength="20" value="Enter ZIP Code"/>
    <button type="submit" class="button secondary" id="submitZip"><span>Submit</span></button>
    </fieldset>
    </form>
    </div>
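The fields in that form can be turned into a POST body with http_build_query(), which also URL-encodes the values. A rough sketch: the function name buildZipPostFields is only for illustration, and the field values are copied from the form above, so they may stop working whenever the site changes:

```php
<?php
// Hypothetical helper: build the application/x-www-form-urlencoded body
// for the store search form, with the zip code filled in.
function buildZipPostFields(string $zipCode): string
{
    $fields = [
        'URL'               => 'TopCategoriesDisplayView',
        'storeId'           => '10151',
        'catalogId'         => '10051',
        'langId'            => '-1',
        'findStoreErrorURL' => 'StoreLocatorDisplayView',
        'firstReferURL'     => '',
        'zipCode'           => $zipCode,
    ];
    // Pass '&' explicitly so the result does not depend on the
    // arg_separator.output ini setting.
    return http_build_query($fields, '', '&');
}

echo buildZipPostFields('90210'), "\n";
// prints: URL=TopCategoriesDisplayView&storeId=10151&catalogId=10051&langId=-1&findStoreErrorURL=StoreLocatorDisplayView&firstReferURL=&zipCode=90210
```

The resulting string can be passed as-is to CURLOPT_POSTFIELDS.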
     
  3. kylestyle

    kylestyle Newbie

    I think I solved it...

    PHP:
    $url = 'http://www.urltostore.com/pagetoproductgoeshere';
    $cookiefile = str_replace('\\', '/', dirname(__FILE__)) . '/cookies/somename.txt';
    $zipcode = '90210';
    $posturl = 'http://www.urltostore.com/LowesStoreSearchCmd';
    $postfields = 'URL=TopCategoriesDisplayView&storeId=10151&catalogId=10051&langId=-1&findStoreErrorURL=StoreLocatorDisplayView&firstReferURL=&zipCode=' . $zipcode;

    // POST: Zip code to store
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile);
    curl_setopt($ch, CURLOPT_URL, $posturl);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
    ob_start();      // prevent any output
    curl_exec($ch);  // execute the curl command
    ob_end_clean();  // stop preventing output
    curl_close($ch); // closing the handle writes the cookies to $cookiefile
    unset($ch);

    // GET HTML
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiefile);
    curl_setopt($ch, CURLOPT_URL, $url);
    $html = curl_exec($ch);
    curl_close($ch);
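A possible variation, if the cookie should not be left on disk afterwards: reuse one curl handle for both requests and pass an empty string to CURLOPT_COOKIEFILE, which turns on curl's cookie handling in memory without reading any file. This is only a sketch. fetchWithZipCookie and its parameters are names made up for illustration, the optional proxy argument is an assumption about how the proxy would be supplied, and it has not been confirmed that the site accepts this flow:

```php
<?php
// Hypothetical helper: POST the zip code, then GET the product page on the
// same handle so the cookies set by the POST are sent along with the GET.
function fetchWithZipCookie(string $postUrl, string $postFields, string $pageUrl, ?string $proxy = null): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_COOKIEFILE     => '',   // empty string: in-memory cookies, nothing on disk
        CURLOPT_FOLLOWLOCATION => true, // follow the store-selection redirect
        CURLOPT_MAXREDIRS      => 5,    // but give up instead of looping forever
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ]);
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy); // e.g. 'host:port'
    }

    // Step 1: POST the zip code; the response body itself is not needed.
    curl_setopt($ch, CURLOPT_URL, $postUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
    if (curl_exec($ch) === false) {
        throw new RuntimeException('POST failed: ' . curl_error($ch));
    }

    // Step 2: switch the handle back to GET and fetch the product page.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('GET failed: ' . curl_error($ch));
    }

    // The handle, and the in-memory cookies with it, is freed when $ch
    // goes out of scope at the end of this function.
    return $html;
}

// Usage (placeholder URLs and proxy, as in the code above):
// $html = fetchWithZipCookie($posturl, $postfields, $url, 'proxyhost:8080');
```

With this approach there is no cookie file to clean up, and the redirect cap keeps a failed cookie handshake from looping indefinitely.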
     
  4. revproxy

    revproxy BANNED Jr. VIP Premium Member

    Joined:
    Nov 20, 2015
    Messages:
    396
    Likes Received:
    100
    Gender:
    Male
    Try PhantomJS or Selenium. They drive a real browser, so the DOM is fully rendered...