[ASK] Selective Scraping

Discussion in 'PHP & Perl' started by twisted_one, Mar 10, 2011.

  1. twisted_one

    twisted_one Regular Member

    Joined:
    Oct 11, 2009
    Messages:
    498
    Likes Received:
    82
    There is a basic, static HTML website. I need only selected parts of that site. The site looks somewhat like what I have attached in the Basic HTML.JPG file.

    That HTML page is made entirely of tables, p tags, and h1 tags, and nothing more. NO DIVS, NO CSS, NO dynamic content.

    How can I grab selected parts of that page? I know that in PHP file_get_contents($url) gets you the whole page, but I want only selected parts.

    Let's say, for example, that I want only the date and the name Rachel Mathews. How can it be done? Are there any tools or scripting languages that will help me?
     

  2. madoctopus

    madoctopus Supreme Member

    Joined:
    Apr 4, 2010
    Messages:
    1,249
    Likes Received:
    3,498
    Occupation:
    Full time IM
    Use simple_html_dom, phpQuery, or regex. Pick whichever you want; any of them will work.
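As a rough illustration of the DOM-parsing route, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath instead of the libraries named above. The sample markup, the "Date" and "Name" label cells, and the function name cell_after_label are all assumptions, since the real page layout is not shown in the thread.

```php
<?php
// Hypothetical markup: the real page is not shown in the thread, so this
// table layout (label cell followed by value cell) is an assumption.
$html = '<html><body><h1>Record</h1><table>'
      . '<tr><td>Date</td><td>Mar 10, 2011</td></tr>'
      . '<tr><td>Name</td><td>Rachel Mathews</td></tr>'
      . '</table></body></html>';

// Return the text of the <td> that directly follows the <td> holding $label.
// $label must not contain a single quote, because it is placed inside an
// XPath string literal.
function cell_after_label(string $html, string $label): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep warnings about sloppy HTML quiet
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        "//td[normalize-space(.)='$label']/following-sibling::td[1]"
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;                    // label not found
    }
    return trim($nodes->item(0)->textContent);
}

echo cell_after_label($html, 'Date'), "\n";
echo cell_after_label($html, 'Name'), "\n";
```

With the sample markup above, this prints the date cell and then the name cell. On the real page, the XPath expression would need to be adjusted to wherever those values actually sit.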
     
  3. twisted_one

    MadOctopus,

    Thanks for the advice. I am now googling those. I am not that good at regex, so I will look at the other two options first. In the meantime, do you mind giving me an example?

    EDIT:

    Okay, I even found examples at http://simplehtmldom.sourceforge.net/manual.htm

    I now have another question:

    Once I have that value in a variable in PHP, is there any way that I can write it to a Notepad (.txt) file?
     
    Last edited: Mar 10, 2011
  4. abantu

    abantu Newbie

    Joined:
    Oct 4, 2008
    Messages:
    11
    Likes Received:
    22
    PHP:
    // Open (or create) data.txt for writing; 'w' truncates any existing content.
    $fp = fopen('data.txt', 'w');
    fwrite($fp, '1');
    fwrite($fp, '23');
    fclose($fp);
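If one value has to be written per scraped page, appending may be more convenient than reopening the file each time. The following is only a sketch using file_put_contents(); the helper name append_line and the temporary file name are made up for illustration.

```php
<?php
// Append one value per line to a text file; returns false if the write fails.
function append_line(string $path, string $value): bool
{
    // FILE_APPEND adds to the end of the file instead of overwriting it;
    // LOCK_EX takes an exclusive lock while writing.
    return file_put_contents($path, $value . PHP_EOL, FILE_APPEND | LOCK_EX) !== false;
}

// Hypothetical output file in the system temp directory.
$path = sys_get_temp_dir() . '/scrape_demo_' . uniqid() . '.txt';

append_line($path, 'Mar 10, 2011');
append_line($path, 'Rachel Mathews');
```

The resulting file opens in Notepad like any other .txt file, with each value on its own line.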
     
  5. twisted_one

    I have a question.

    Code:
    <?php
    include('simplehtmldom/simple_html_dom.php');
    $html = file_get_html('http://somedomain.com?QryParm=1');
    echo $html;
    ?>
    Let us say, for example, that I will iterate this code block for QryParm 1 to 10000. I can do that with a for loop. But my problem is that this link
    Code:
    http://somedomain.com?QryParm=1
    is redirected to some other URL. How do I handle this URL redirect?
     
  6. zelma143

    zelma143 Power Member

    Joined:
    Jun 25, 2010
    Messages:
    571
    Likes Received:
    37
    Occupation:
    PHP programmer,Bot maker,iMacro script maker
    Try doing it with preg_match.

    It can help you out.
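For what it is worth, a preg_match approach to the extraction might look roughly like this. The HTML fragment, the label cells, and the helper name match_cell are assumptions for illustration; a regex like this only holds up while the markup stays as simple and regular as described earlier in the thread.

```php
<?php
// Hypothetical fragment; the real markup is not shown in the thread.
$html = '<tr><td>Date</td><td>Mar 10, 2011</td></tr>'
      . '<tr><td>Name</td><td>Rachel Mathews</td></tr>';

// Capture the contents of the cell that follows the cell containing $label.
function match_cell(string $html, string $label): ?string
{
    // preg_quote() escapes regex metacharacters in the label;
    // the i flag ignores case and the s flag lets . match newlines.
    $pattern = '~<td>\s*' . preg_quote($label, '~') . '\s*</td>\s*<td>(.*?)</td>~is';
    return preg_match($pattern, $html, $m) === 1 ? trim($m[1]) : null;
}

echo match_cell($html, 'Date'), "\n";
echo match_cell($html, 'Name'), "\n";
```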
     
  7. twisted_one

    Friends,

    I finally managed to find a solution using simple_html_dom :) Thanks to all those who replied and helped. I did not know about simple_html_dom until I heard about it here. I really thank you guys!

    However, I still have a question. As of now, I have:

    $html = file_get_html('URL1');

    But URL 1 redirects to another URL (URL 2) on the same domain, with some query parameters. How can this be handled in simple_html_dom? Any advice, please?
     
  8. easyroms

    easyroms Newbie

    Joined:
    Nov 5, 2009
    Messages:
    15
    Likes Received:
    1
    Your best bet is to use cURL:

    php.net/manual/en/curl.examples-basic.php

    cURL can also follow header redirects (see the CURLOPT_FOLLOWLOCATION option).
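A minimal sketch of that idea, assuming the cURL extension is installed. The function names page_url and fetch_following_redirects are made up for illustration, and the domain and QryParm parameter are simply the placeholders used earlier in the thread.

```php
<?php
// Build the URL for one value of the QryParm parameter from the thread.
function page_url(int $n): string
{
    return 'http://somedomain.com?' . http_build_query(['QryParm' => $n]);
}

// Fetch a page and follow Location redirects.
// Returns ['url' => final URL, 'body' => HTML] or null if the request fails.
function fetch_following_redirects(string $url): ?array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow header redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // give up on redirect loops
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // seconds

    $body = curl_exec($ch);
    if ($body === false) {
        return null;
    }
    // CURLINFO_EFFECTIVE_URL is the last URL reached after any redirects.
    return ['url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), 'body' => $body];
}
```

A loop over page_url(1) through page_url(10000) could call fetch_following_redirects() for each URL. The returned body can then be handed to simple_html_dom's str_get_html() in place of calling file_get_html() on the URL directly.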