
How to scrape data from webpages?

Discussion in 'General Programming Chat' started by back2black, May 12, 2009.

  1. back2black

    back2black Junior Member

    Joined:
    Dec 20, 2008
    Messages:
    115
    Likes Received:
    28
    Hi Guys,

    Does anybody know how to scrape data from websites using C++?

    What I am trying to do is load up Twitter and scrape usernames, or load up Flickr and scrape usernames.

    Does anyone know the functions and stuff you need to do this?

    Thanks very much for your help!


    P.S. The reason I haven't posted this solely in the C++ section of the programming section is that somebody may know of or recommend a different language. I've only just started learning C++, so I haven't invested so much time that I can't change languages.
     
  2. wowmanwow

    wowmanwow Registered Member

    Joined:
    Nov 7, 2008
    Messages:
    60
    Likes Received:
    12
    VB.NET my friend. You can make badass blackhat apps..
     
  3. harrisunderwork

    harrisunderwork BANNED

    Joined:
    Jan 20, 2009
    Messages:
    52
    Likes Received:
    4
    I think you have to do socket programming to get web pages.

    I am not quite sure whether regular expressions can be used with C++.

    Best Of Luck :)
     
  4. jarhead

    jarhead Junior Member

    Joined:
    Jan 17, 2009
    Messages:
    114
    Likes Received:
    45
    Location:
    OZ
    Home Page:
    easy
    winsock + some nice HTTP requests and a decent parsing function.
    Don't go with this .net crap, pure C is the hottest shit out.
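
    As a rough illustration of the "socket + HTTP request + parsing function" idea above, here is a minimal sketch. It is written in Python rather than C or Winsock, purely to keep it short. The names build_request, split_response and fetch are hypothetical, and it assumes plain HTTP/1.0 on port 80 with no TLS, redirects or chunked encoding, so treat it as a starting point rather than a finished fetcher.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal HTTP/1.0 GET request as bytes (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    header_lines = head.decode("iso-8859-1").split("\r\n")
    status_line = header_lines[0]
    headers = {}
    for line in header_lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def fetch(host, path="/", port=80, timeout=10):
    """Open a TCP socket, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the peer closed the connection
                break
            chunks.append(data)
    return split_response(b"".join(chunks))
```

    Calling fetch("example.com", "/") would return the status line, a dict of headers and the raw body, which is what the "decent parsing function" then has to dig through.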
     
  5. Rendias

    Rendias Registered Member

    Joined:
    May 14, 2009
    Messages:
    91
    Likes Received:
    34
    I wouldn't scrape using your own PC ... try uploading and running the scraping script on a free hosting server. PHP or Perl should be easy for this task.
     
  6. cemdev

    cemdev Newbie

    Joined:
    May 7, 2009
    Messages:
    22
    Likes Received:
    14
    Google is your friend. Search for 'web page scraping'. You can scrape using almost any language. Unless you're already well experienced with C++, I wouldn't recommend such a low-level language.

    PHP is probably a good choice because it's useful for other web-related tasks. If you want to use Microsoft languages, then at least consider C# over C++; you'll save yourself a lot of time.
     
  7. sikx

    sikx Registered Member

    Joined:
    Jan 4, 2009
    Messages:
    65
    Likes Received:
    166
    Location:
    Germany
    Home Page:
    The problem with scraping with C++ and C etc. is that they are still a pain in the ass to work with when it comes to string manipulation. Just to properly put integers into a string you will have to create stringstreams etc.
    Unless you are really keen on making it extremely fast (which is most of the time just premature optimization), go with scripting languages like PHP, Ruby or Python.
    There are very nice libraries like cURL, Mechanize and DOM for these that make scraping extremely easy.
    Here is a tutorial I wrote for scraping with PHP and DOM: http://nytemarez.com/scraping-with-php-and-dom/
    Even if you are not coding in PHP, it will help you understand the basic principles of how to scrape websites (and how to avoid detours).
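
    To give an idea of what scraping with a parser instead of hand-rolled string handling looks like, here is a small sketch in Python using only the standard library's html.parser. The markup is made up: it assumes usernames sit in links with a "username" class, which is not how any particular site is known to mark them up, so the class name and the SAMPLE page would need adjusting for a real target. UsernameParser and extract_usernames are hypothetical names.

```python
from html.parser import HTMLParser


class UsernameParser(HTMLParser):
    """Collect the text of <a> tags whose class list contains 'username'.

    The 'username' class is an assumption made for this example.
    """

    def __init__(self):
        super().__init__()
        self.usernames = []
        self._in_user_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            classes = (dict(attrs).get("class") or "").split()
            if "username" in classes:
                self._in_user_link = True

    def handle_data(self, data):
        # Only keep text that appears inside a matching link.
        if self._in_user_link and data.strip():
            self.usernames.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_user_link = False


def extract_usernames(html):
    """Return the usernames found in an HTML string, in document order."""
    parser = UsernameParser()
    parser.feed(html)
    parser.close()
    return parser.usernames


# Made-up page fragment standing in for a downloaded page.
SAMPLE = """
<ul>
  <li><a class="username" href="/alice">alice</a></li>
  <li><a class="nav" href="/about">About</a></li>
  <li><a class="username verified" href="/bob">bob</a></li>
</ul>
"""

print(extract_usernames(SAMPLE))  # prints ['alice', 'bob']
```

    The page itself would still have to be downloaded first (with cURL or similar); the sketch only covers the parsing half.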
     
  8. dwpg002

    dwpg002 Senior Member

    Joined:
    Dec 29, 2008
    Messages:
    917
    Likes Received:
    47
    Can you try this software, http://www.newprosoft.com/web-spider.htm, and see whether it solves your problem? I am using "Web Content Extractor" and do a lot with it; it makes my work simple.