Can I do this with PHP

Discussion in 'PHP & Perl' started by bmstalker, Feb 1, 2012.

  1. bmstalker

    bmstalker Newbie

    Joined:
    Jan 8, 2010
    Messages:
    31
    Likes Received:
    5
    Hi all,

    I've recently been learning a bit of PHP/mySQL..... just because :)

    I'm not sure where else I can ask this so I hope the BHW community has some talented PHP coders out there. What I'd like to know is this:

    Can PHP/MySQL import data into its database from another site, say every hour, even if the site doesn't have an export API? For example, the ever-changing price of an item in an online shop, or some sort of statistic on a site. I'd want to pull this data from 20 different sites and put it all in one database to run queries against.

    Is this possible?

    PS, I don't need the script, just a yes/no and perhaps how you would do it. I'd pay a PHP developer to build the app for me.
     
  2. Narrator

    Narrator Power Member

    Joined:
    Oct 5, 2010
    Messages:
    507
    Likes Received:
    396
    Occupation:
    Internet Marketing
    Location:
    /dev/null
    Yes, you can scrape data from other sites and have it added to your database. You can schedule it to run with cron jobs.

    But if you need to scrape every product on a site it wouldn't be very efficient.
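    To make the idea concrete, here is a minimal sketch of what Narrator describes: a script a cron job could run that fetches a page, pulls out a price, and stores it. Every specific name here (the URL, the table, the regex) is a hypothetical placeholder, not something from the thread.

    ```php
    <?php
    // Minimal sketch of a cron-driven scraper. The fetch/DB part is commented
    // out so the sketch stays self-contained; the parsing part is real.

    // Extract the first price-looking value (e.g. "$19.99") from raw HTML.
    function extractPrice(string $html): ?float
    {
        if (preg_match('/\$([0-9]+(?:\.[0-9]{2})?)/', $html, $m)) {
            return (float) $m[1];
        }
        return null;
    }

    // What the hourly cron run would do (all names are placeholders):
    // $html  = file_get_contents('http://example.com/product/123');
    // $price = extractPrice($html);
    // $pdo   = new PDO('mysql:host=localhost;dbname=scraper', 'user', 'pass');
    // $pdo->prepare('INSERT INTO prices (site, price, checked_at) VALUES (?, ?, NOW())')
    //     ->execute(['example.com', $price]);
    ```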
     
  3. bmstalker

    bmstalker Newbie

    Joined:
    Jan 8, 2010
    Messages:
    31
    Likes Received:
    5
    Awesome, I've heard the term cron but never really looked into what it means. Essentially, I'd like to have an app built that will take, say, 60 events a week and scrape 3 pieces of info from 20 sites for each event, perhaps every 2 hours or so. I'll store this info in the database and then use it to run various comparison queries. Is that feasible? How long would it take a competent PHP developer to write something like that?
     
  4. maximviper

    maximviper BANNED

    Joined:
    Oct 25, 2010
    Messages:
    338
    Likes Received:
    86
    From what I understood, you need a PHP scraper script which extracts data from those pages at regular intervals using cron jobs and updates your DB if there are any changes?
     
  5. maximviper

    maximviper BANNED

    Joined:
    Oct 25, 2010
    Messages:
    338
    Likes Received:
    86
    A cron job is just something through which you can call any PHP script on the server.
    For example, I write a PHP script that posts 1 article to my website every time it runs, and I want 24 articles posted on my site every day.
    So I go to the cron job interface and set it to run my script every 1 hour.

    I think your job can easily be handled on a regular PHP 5-enabled web host, and the script creation cost won't be much since it's just a simple script.
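    The hourly schedule described above would look like this as a crontab entry (the script and log paths are hypothetical placeholders):

    ```shell
    # Run the scraper at minute 0 of every hour; append output to a log
    0 * * * * /usr/bin/php /home/user/scraper.php >> /var/log/scraper.log 2>&1
    ```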
     
  6. bmstalker

    bmstalker Newbie

    Joined:
    Jan 8, 2010
    Messages:
    31
    Likes Received:
    5
    Ah, that's great info, thanks
     
  7. HealeyV3

    HealeyV3 Power Member

    Joined:
    Mar 4, 2009
    Messages:
    521
    Likes Received:
    344
    Look into cURL scraping and parsing. Shouldn't be too horrible, depending on how many pages you have to scrape.

    If you're looking to hire a PHP programmer, let me know :)
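    A bare-bones version of the cURL fetch step mentioned above might look like this. It's a sketch, not a full scraper; a real one would add retries, a user agent, and error logging, and the example URL is a placeholder.

    ```php
    <?php
    // Fetch a URL with cURL and return the body, or null on failure.
    function curlGet(string $url): ?string
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // don't hang forever on a slow site
        $body = curl_exec($ch);
        curl_close($ch);
        return $body === false ? null : $body;
    }

    // $html = curlGet('http://example.com/product/123');
    ```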
     
  8. Bryan

    Bryan Power Member

    Joined:
    Aug 25, 2009
    Messages:
    565
    Likes Received:
    292
    Look into regular expressions; learn them, because they're probably one of the most useful things you can learn while learning PHP. I LOVE regular expressions. They're fun, and with them you can scrape pretty much anything.
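    As a small taste of the regex approach, here is a hedged example: preg_match_all pulling every link out of an HTML snippet. Regex-on-HTML is fragile on unusual markup, but fine for quick scrapes like the ones discussed here.

    ```php
    <?php
    // Pull all href values out of an HTML string with a regular expression.
    function extractLinks(string $html): array
    {
        preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $m);
        return $m[1]; // the captured href values, in document order
    }
    ```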
     
  9. randolph60

    randolph60 Junior Member

    Joined:
    May 13, 2011
    Messages:
    191
    Likes Received:
    48
    Yes, in PHP it's possible to communicate in both directions.

    You can call a PHP script through a URL with ?parameters (file_get_contents), and (depending on the parameters) you can create a result which is parsed from the root domain.

    Look at http://www.phpclasses.org for parsing classes. Then it's very easy.
     
  10. inviz

    inviz Newbie

    Joined:
    Jun 15, 2010
    Messages:
    45
    Likes Received:
    5
    I would recommend you check out the simplehtmldom library; it's very easy to learn, and in most cases you don't have to sit there for hours trying to figure out a regexp to parse the data!

    If the sites you are going to scrape are large, PHP is not recommended at all if you want to run it on a regular basis. I have scraped some large sites in the past using cURL and preg_match, simplehtmldom, and PHP's built-in XML parsers; none of those solutions are fast. So if these sites are in fact large, I would recommend looking into Node.js or Python, which handle concurrency better.
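    For contrast with simplehtmldom, here is a dependency-free sketch using PHP's built-in DOMDocument and DOMXPath, which is one of the "built-in XML parsers" mentioned above. The class name "price" and the snippet are made-up examples.

    ```php
    <?php
    // Parse HTML with PHP's built-in DOM extension instead of a regex.
    // Grabs the text of every element carrying the class "price".
    function pricesFromHtml(string $html): array
    {
        $doc = new DOMDocument();
        @$doc->loadHTML($html); // @ silences warnings on sloppy real-world HTML
        $xpath  = new DOMXPath($doc);
        $query  = '//*[contains(concat(" ", normalize-space(@class), " "), " price ")]';
        $prices = [];
        foreach ($xpath->query($query) as $node) {
            $prices[] = trim($node->textContent);
        }
        return $prices;
    }
    ```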

    Good luck, and if you have any questions about scraping, feel free to PM me, I do this kind of work every day.
     
  11. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,468
    Likes Received:
    10,143
    The problem with simplehtmldom is that it's a real memory hog. Try using Zend_Dom.
     
  12. Narrator

    Narrator Power Member

    Joined:
    Oct 5, 2010
    Messages:
    507
    Likes Received:
    396
    Occupation:
    Internet Marketing
    Location:
    /dev/null
    An awesome tool for learning regex is RegexBuddy. It's worth the money for the time it saves, at least for someone like me who doesn't work with regex that often.
     
    • Thanks Thanks x 1
  13. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,468
    Likes Received:
    10,143
    +1 for RegexBuddy - a time saver!
     
  14. bmstalker

    bmstalker Newbie

    Joined:
    Jan 8, 2010
    Messages:
    31
    Likes Received:
    5
    Thanks for all the helpful posts, guys. As always with PHP/programming in general, there's a lot of reading to be done :)

    Thanks for all the answers though.
     
  15. madsem

    madsem Junior Member

    Joined:
    Aug 23, 2010
    Messages:
    121
    Likes Received:
    40
    IMHO it would be better to use cURL in combination with a task management queue like beanstalkd, or a custom task management setup where you save each task to a DB table along with status information about the task ("active", "finished", etc.) and then have the job processor query the table at intervals, either via a cron job or by firing the script when a page on your domain is visited.

    You could also run your PHP task as a daemon, but this solution is really wacky and needs a lot of monitoring.

    PHP and long-running tasks are a real pain; I think Python in general has better options for scraping.
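    The DB-backed queue described above might look like this in MySQL. The table name, columns, and batch size are all hypothetical, just to show the status-tracking pattern:

    ```sql
    -- Hypothetical task table for a DB-backed scrape queue.
    CREATE TABLE scrape_tasks (
        id         INT AUTO_INCREMENT PRIMARY KEY,
        url        VARCHAR(255) NOT NULL,
        status     ENUM('pending', 'active', 'finished', 'failed') DEFAULT 'pending',
        updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
    );

    -- The cron-fired worker claims a batch of pending tasks...
    UPDATE scrape_tasks SET status = 'active'
    WHERE status = 'pending'
    ORDER BY updated_at
    LIMIT 10;

    -- ...scrapes each claimed URL, then marks it done (id is a placeholder).
    UPDATE scrape_tasks SET status = 'finished' WHERE id = 1;
    ```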
     
  16. xpwizard

    xpwizard Junior Member

    Joined:
    Nov 6, 2010
    Messages:
    198
    Likes Received:
    122
    Personally I use "RegExr". Great tool and it's free :)

    Code:
    http://gskinner.com/RegExr/
    @madsem -> Perl and Python are better tools for large, long-running scrapers, but PHP can be just as effective if you're logical with your coding and queries.
     
    • Thanks Thanks x 1
  17. madsem

    madsem Junior Member

    Joined:
    Aug 23, 2010
    Messages:
    121
    Likes Received:
    40
    Yeah, it can be, but it's a pain to debug until you have it running smoothly without any memory leaks :) and even then it's quite slow; even with APC installed, most scrapers run really slowly if you scrape a lot of data.
     
  18. randomnumbers

    randomnumbers Newbie

    Joined:
    Jun 18, 2011
    Messages:
    20
    Likes Received:
    2
    I'm using cURL and SimpleHtmlDom. SimpleHtmlDom is really easy to learn; it uses CSS selectors.