
How would you scrape content from these sites?

Discussion in 'General Scripting Chat' started by Frankie4Fingers, Mar 27, 2010.

  1. Frankie4Fingers

    Frankie4Fingers Power Member

    Joined:
    Jan 8, 2009
    Messages:
    676
    Likes Received:
    214
    Basically I'd like to scrape the results of these custom search engines and put them on one of my WordPress sites:

    1) http://www.raiway.rai.it/index.php?lang=IT (click on the map of Italy, then on the city, then on the town to see the results page)

    2) http://www.mediasetpremium.mediaset.it/informazione/copertura/copertura.shtml


    Looking around in the forum I found this program suggested for a similar problem:

    http://simplehtmldom.sourceforge.net/

    Do you think it could work for my case or would you suggest another solution?

    Thank you. :)
     
  2. c0ntenth|ef

    c0ntenth|ef Power Member

    Joined:
    May 20, 2009
    Messages:
    788
    Likes Received:
    118
    Location:
    california
    Do they have RSS feeds? Just fetch their RSS and put it on your own site.
     
  3. Frankie4Fingers

    Frankie4Fingers Power Member

    Joined:
    Jan 8, 2009
    Messages:
    676
    Likes Received:
    214
    If they had feeds, I wouldn't have asked this question ;)
     
  4. Deprecated

    Deprecated Registered Member

    Joined:
    May 19, 2009
    Messages:
    78
    Likes Received:
    25
    Unfortunately that's a job for Perl. If I have to script a scraper for something like that I use Perl's LWP library and some regular expressions to get the job done. This might be something to hire a coder for.
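
For a rough idea of what the fetch-plus-regular-expressions approach looks like, here is a minimal sketch. It is written in Python's standard library rather than Perl's LWP, purely for illustration. The patterns assume the results page lists its data as plain table rows (`<tr>` with `<td>` cells); that is an assumption, not something verified against either site, and the function names `fetch` and `extract_rows` are made up for this example.

```python
import re
import urllib.request

# Assumed layout: each result is a <tr> containing plain <td> cells.
# Adjust these patterns after inspecting the real page source.
ROW_RE = re.compile(r"<tr[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<td[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def fetch(url, timeout=10):
    """Download a page and return its body as text."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_rows(html):
    """Return one list of cell strings per table row, with inner tags stripped."""
    rows = []
    for row_html in ROW_RE.findall(html):
        cells = [TAG_RE.sub("", c).strip() for c in CELL_RE.findall(row_html)]
        if cells:  # skip rows without <td> cells, such as header rows
            rows.append(cells)
    return rows
```

Calling `extract_rows(fetch(url))` on a results page would then give a list of rows to work with. Regular expressions are brittle against markup changes, so a real scraper would need checking whenever the site layout changes.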
     
  5. Frankie4Fingers

    Frankie4Fingers Power Member

    Joined:
    Jan 8, 2009
    Messages:
    676
    Likes Received:
    214
    Do you have any idea how much that would cost? And done this way, would it work like RSS feed parsing? For example, if I want only content related to a specific town on my page, could I enter the name of that town in the script as a keyword, so that the related information gets scraped and put on my page?
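
The keyword idea described above could be sketched roughly as follows, assuming the scraped results have already been turned into rows of text cells. The names `filter_by_town` and `to_html_table` are hypothetical, and this only shows the filtering and output step, not the scraping itself.

```python
from html import escape


def filter_by_town(rows, town):
    """Keep only rows where some cell contains the town name (case-insensitive)."""
    wanted = town.casefold()
    return [r for r in rows if any(wanted in cell.casefold() for cell in r)]


def to_html_table(rows):
    """Render rows as a small HTML table that can be pasted into a page."""
    lines = ["<table>"]
    for r in rows:
        cells = "".join(f"<td>{escape(c)}</td>" for c in r)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

With rows scraped from a results page, `to_html_table(filter_by_town(rows, "Roma"))` would produce markup containing only the rows that mention that town. Substring matching is the simplest choice here; it would also match town names that contain the keyword, so an exact comparison on the town column may be safer once the column layout is known.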