Webpage scraper

Discussion in 'Other Languages' started by soklot, Sep 21, 2012.

  1. soklot

    soklot Newbie

    Joined:
    Aug 24, 2012
    Messages:
    19
    Likes Received:
    3
    Hi all,
    I saw a couple of threads about getting data from a webpage, so here is a simple Java example.
    You will need the jsoup jar file in your project.
    Code:
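        import java.io.IOException;
        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;
        import org.jsoup.nodes.Element;
        import org.jsoup.select.Elements;

        public class ThreadTitles {
            public static void main(String[] args) {
                // Replace the placeholder with the forum page URL, including http://
                String url = "ENTER BLACKHATWORLD URL INCLUDING HTTP";
                Document doc;
                try {
                    // Fetch the page and parse it into a DOM
                    doc = Jsoup.connect(url).get();
                } catch (IOException e) {
                    // Report the failure and stop; doc would otherwise be unusable below
                    System.err.println("Could not fetch " + url + ": " + e.getMessage());
                    return;
                }

                // Each thread row on the page carries the class "threadbit"
                Elements threads = doc.getElementsByClass("threadbit");
                for (Element thread : threads) {
                    // The title sits in an element with the class "threadtitle"
                    System.out.println(thread.getElementsByClass("threadtitle").text());
                }
            }
        }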
    

    What we are doing here is printing the titles of the threads listed on the page.
    I think the example is easy to follow, but if you have any questions, just ask :)
     
  2. Question

    Question Registered Member

    Joined:
    Aug 14, 2011
    Messages:
    51
    Likes Received:
    32
    JSoup is a nice lib, but have you tried using it in a very large app? How does its performance compare to the traditional built-in parsers? We're building a large-scale multi-threaded crawler, so it would be nice to get some idea of how it performs...
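
One way to get numbers for a workload like this is to time the parse step on a thread pool. Below is a minimal sketch under stated assumptions, not a rigorous benchmark: `ParseBenchmark`, `Result`, `run`, and the `countThreadbits` stand-in are hypothetical names made up for illustration. The stand-in only counts occurrences of the string "threadbit" so that the sketch runs without jsoup on the classpath. To measure jsoup itself, you would pass something like `html -> Jsoup.parse(html).getElementsByClass("threadbit").size()` as the parser instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParseBenchmark {

    /** Outcome of one batch: pages parsed, summed parser results, wall-clock time. */
    public static final class Result {
        public final int documents;
        public final long totalMatches;
        public final long elapsedNanos;

        Result(int documents, long totalMatches, long elapsedNanos) {
            this.documents = documents;
            this.totalMatches = totalMatches;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /**
     * Applies the parser to every page on a fixed-size thread pool and
     * measures the wall-clock time for the whole batch.
     */
    public static Result run(List<String> pages, Function<String, Integer> parser, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<Integer>> futures = new ArrayList<>();
            for (String page : pages) {
                Callable<Integer> task = () -> parser.apply(page);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // blocks until that page has been parsed
            }
            return new Result(pages.size(), total, System.nanoTime() - start);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for parse tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A parse task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in parser (an assumption, not jsoup): counts "threadbit" occurrences.
        Function<String, Integer> countThreadbits = html -> {
            int count = 0;
            int idx = 0;
            while ((idx = html.indexOf("threadbit", idx)) != -1) {
                count++;
                idx += "threadbit".length();
            }
            return count;
        };

        // Synthetic pages; a real test would use saved copies of actual pages.
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            pages.add("<li class=\"threadbit\"><a class=\"threadtitle\">Thread " + i + "</a></li>");
        }

        Result r = run(pages, countThreadbits, 4);
        System.out.println(r.documents + " pages, " + r.totalMatches + " matches, "
                + (r.elapsedNanos / 1_000_000) + " ms");
    }
}
```

A single run like this is skewed by JIT warm-up, so it is worth repeating the batch several times and discarding the first few timings. Running the same pages through each parser you are considering, with the thread count you plan to use, gives a like-for-like comparison.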