Content scraper with URL crawling (PHP and cron jobs)

Discussion in 'Hire a Freelancer' started by micasa001, May 20, 2012.

  1. micasa001

    Budget: €0-€250 EUR

    Project Description:
    I need a couple of PHP-based content scrapers that run automatically via cron jobs. At a minimum, the scrapers must offer the following functionality:

    1. Easy to manage: in a simple administrator dashboard I can add the URL that must be scraped. I can then enter the CSS classes on the site that I want to scrape and select, from a drop-down menu, the database column in which the content will be stored.

    * For example: I want to scrape the title of a site article. For this I enter the class 'class=title' and select the database column 'column1'.
    * Multiple columns (at least 10) must be available.
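
    To make the class-to-column idea concrete, here is a rough sketch of what I mean. It is written in Python only to keep it short; the delivered scripts must be PHP. The mapping `CLASS_TO_COLUMN`, the names `ClassTextExtractor` and `scrape_row`, and the 'price' class are my own placeholders, not a fixed design.

```python
from html.parser import HTMLParser

# Hypothetical mapping as it might be saved from the dashboard:
# CSS class on the target site -> database column name.
CLASS_TO_COLUMN = {"title": "column1", "price": "column2"}

# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collects the text of the first element that carries each mapped class."""

    def __init__(self, class_to_column):
        super().__init__()
        self.class_to_column = class_to_column
        self.row = {}         # column name -> scraped text
        self._column = None   # column currently being filled
        self._depth = 0       # nesting depth inside the matched element
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._column is not None:
            self._depth += 1  # nested tag inside the matched element
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            column = self.class_to_column.get(name)
            if column and column not in self.row:
                self._column, self._depth, self._parts = column, 1, []
                break

    def handle_endtag(self, tag):
        if self._column is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse whitespace and store the text under the mapped column.
            self.row[self._column] = " ".join("".join(self._parts).split())
            self._column = None

    def handle_data(self, data):
        if self._column is not None:
            self._parts.append(data)


def scrape_row(html, class_to_column=CLASS_TO_COLUMN):
    """Return one database row (column -> text) for a page's HTML."""
    parser = ClassTextExtractor(class_to_column)
    parser.feed(html)
    return parser.row
```

    With this sketch, `scrape_row('<h1 class="title">My article</h1>')` gives one row with 'column1' filled. Only the first element per class is taken; whether later matches should also be stored is open for discussion.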

    2. Automatically scan the whole website: when I want the script to crawl the whole website, I can select this option in the administrator panel. The script will then crawl every URL that matches the URL I have submitted and check whether there is matching data to add to my database.

    * Check for duplicate entries: the script must check for duplicate entries. Duplicate content must not be placed in the database.
    * Add the URL to the database: the URL from which the data was scraped must be stored with every row of content in the database, so the script can check whether the URL has already been crawled.
    * Check daily for new updates: the script must check the sites daily for new content. When a site has new articles or new products, the script will automatically pick up these items and add the content to my database.
    * Dates: the database must include a date column so I can see when the data was scraped.
    * Use random IP addresses and fill the list automatically: the system must have a separate database for IP addresses. A script must update the IP list daily by scraping new IP addresses from websites. Open-source IP list websites should also be used to update the list.
    ○ The scraping scripts must use the random IP addresses to scrape data.
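
    The crawl scope, the duplicate check, the stored URL and the date column could look roughly like the sketch below. Again this is Python for illustration only (the real scripts must be PHP), SQLite stands in for whatever database is used, and the table name `scraped`, the functions `in_scope`, `open_db`, `already_crawled` and `save_row`, and the rule "same host means matching URL" are all my own assumptions.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from urllib.parse import urljoin, urlparse


def in_scope(link, start_url):
    """Assumption: a link 'matches' when it is on the same host as the submitted URL."""
    absolute = urlparse(urljoin(start_url, link))
    start = urlparse(start_url)
    return (absolute.scheme in ("http", "https")
            and absolute.netloc.lower() == start.netloc.lower())


def open_db(path=":memory:"):
    """Create the (hypothetical) content table with URL, hash and date columns."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS scraped (
               id INTEGER PRIMARY KEY,
               url TEXT NOT NULL UNIQUE,           -- where the data was scraped
               content_hash TEXT NOT NULL UNIQUE,  -- blocks duplicate content
               column1 TEXT,
               scraped_at TEXT NOT NULL            -- when the data was scraped
           )"""
    )
    return conn


def already_crawled(conn, url):
    """True if a row for this URL is already stored."""
    found = conn.execute("SELECT 1 FROM scraped WHERE url = ?", (url,)).fetchone()
    return found is not None


def save_row(conn, url, text):
    """Insert the content unless the URL or the content is already stored.

    Returns True when a new row was added, False when it was a duplicate.
    """
    digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
    scraped_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    cur = conn.execute(
        "INSERT OR IGNORE INTO scraped (url, content_hash, column1, scraped_at) "
        "VALUES (?, ?, ?, ?)",
        (url, digest, text, scraped_at),
    )
    conn.commit()
    return cur.rowcount == 1
```

    The daily cron run would then skip URLs for which `already_crawled` is true and call `save_row` for the rest. Here the UNIQUE constraints do the duplicate check, so the script does not need a separate comparison step.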

    My budget is low, so don't bid more than the project budget. Also, only bid and send a PM if you can do the job in 7 days at most.

    NOTE: Only bid once you have read the project details. Don't send messages with all kinds of links to examples you built before that are not relevant. Only send project-relevant messages or I will report the messages as spam.

    Thanks in advance for the replies. I hope to find a long-term development partner.

    Best regards