
Webpage scraper

Discussion in 'Other Languages' started by soklot, Sep 21, 2012.

  1. soklot

    soklot Newbie

    Joined:
    Aug 24, 2012
    Messages:
    19
    Likes Received:
    3
    Hi all,
    I saw a couple of threads about getting data from a webpage, so here is a simple Java example.
    You will need the jsoup JAR file in your project.
    Code:
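            import java.io.IOException;
            import org.jsoup.Jsoup;
            import org.jsoup.nodes.Document;
            import org.jsoup.nodes.Element;
            import org.jsoup.select.Elements;

            public class ThreadTitleScraper {
                public static void main(String[] args) {
                    // Replace the placeholder with the full forum URL, including http://
                    String url = "ENTER BLACKHATWORLD URL INCLUDING HTTP";
                    Document doc;
                    try {
                        doc = Jsoup.connect(url).get();
                    } catch (IOException e) {
                        // Without a document there is nothing to parse, so report and stop
                        System.err.println("Could not fetch " + url + ": " + e.getMessage());
                        return;
                    }

                    // Each thread row on the listing page carries the class "threadbit"
                    Elements threads = doc.getElementsByClass("threadbit");
                    for (Element thread : threads) {
                        // Inside each row, the title is the element with the class "threadtitle"
                        System.out.println(thread.getElementsByClass("threadtitle").text());
                    }
                }
            }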
    

    What we are doing here is fetching the page and printing the title of each thread listed on it.
    I think the example is easy to follow, but if you have any questions, just ask :)
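
If you want to try the extraction step without hitting the site, here is a minimal sketch that parses an HTML string instead. The markup is made up and only imitates the two class names used above, and `ThreadTitleSelector` and `extractTitles` are hypothetical names for illustration. It uses a CSS selector via `select`, which returns one entry per title element rather than one combined string per thread row.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ThreadTitleSelector {
    // Collects the text of every ".threadtitle" element nested inside a ".threadbit" element.
    static List<String> extractTitles(String html) {
        Document doc = Jsoup.parse(html);
        List<String> titles = new ArrayList<>();
        for (Element title : doc.select(".threadbit .threadtitle")) {
            titles.add(title.text());
        }
        return titles;
    }

    public static void main(String[] args) {
        // Made-up markup (an assumption): it only imitates the forum's class names.
        String html = "<ul>"
                + "<li class=\"threadbit\"><a class=\"threadtitle\" href=\"#\">First thread</a></li>"
                + "<li class=\"threadbit\"><a class=\"threadtitle\" href=\"#\">Second thread</a></li>"
                + "</ul>";
        extractTitles(html).forEach(System.out::println);
    }
}
```

Keeping the parsing in its own method makes it easy to feed it saved pages while you work out the right selector.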
     
  2. Question

    Question Registered Member

    Joined:
    Aug 14, 2011
    Messages:
    51
    Likes Received:
    32
    JSoup is a nice lib, but have you tried using it in a very large app? How does its performance compare to the traditional built-in parsers? We're building a large-scale multi-threaded crawler, so it would be nice to get some idea of how it performs...
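
One way to get a feel for this on your own pages is to time the parsing step on a thread pool. The following is only a rough sketch using the standard library: `ParallelParseTimer`, `parseAll` and `countMarker` are made-up names, and `countMarker` is a stand-in parser that just counts a marker string. To compare real parsers you would pass in a function that calls the parser you want to measure. There is no JVM warm-up here, so treat the timings as a rough indication only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelParseTimer {
    // Parses every page on a fixed-size pool.
    // Returns {total items found across all pages, elapsed milliseconds}.
    static long[] parseAll(List<String> pages, Function<String, Integer> parser, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<Integer>> futures = new ArrayList<>();
            for (String page : pages) {
                Callable<Integer> task = () -> parser.apply(page);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // blocks until that page has been parsed
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            return new long[] {total, elapsedMs};
        } finally {
            pool.shutdown();
        }
    }

    // Stand-in parser (an assumption): counts occurrences of the text "threadtitle".
    static int countMarker(String html) {
        String marker = "threadtitle";
        int count = 0;
        int from = 0;
        while ((from = html.indexOf(marker, from)) != -1) {
            count++;
            from += marker.length();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            pages.add("<li class=\"threadbit\"><a class=\"threadtitle\">Thread " + i + "</a></li>");
        }
        long[] result = parseAll(pages, ParallelParseTimer::countMarker, 4);
        System.out.println("items: " + result[0] + ", elapsed ms: " + result[1]);
    }
}
```

Running the same page set through each candidate parser with the same thread count gives a like-for-like comparison, as long as the pages are representative of what the crawler will actually see.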