Confused about Java bots

Discussion in 'General Programming Chat' started by hszforu, Oct 28, 2012.

  1. hszforu

    hszforu Newbie

    Dec 12, 2009
    I have a good knowledge of core Java, so I am looking to make bots using Java.
    After searching BHW for ways to make bots in Java, I found that many people recommend HtmlUnit.
    However, I have the following doubts. I tried asking in other popular Java forums but didn't get any replies.
    So this is all I want to know:

    1. Does HtmlUnit use HttpURLConnection internally?
    2. If not, which one is faster at establishing a connection, scraping data from websites, filling out forms, etc.?
    3. Is there anything that performs better for the tasks mentioned above, such as socket programming?

    I want to learn all these things, but I am super confused about what to learn. I played with Selenium
    and its Java bindings, but they are super slow. Now I am learning HtmlUnit.
    Do you know of any better solution for the tasks mentioned above?
  2. m00j99

    m00j99 Registered Member

    Oct 8, 2009
    This depends on your target site. If it's just a simple HTML site and you don't need JavaScript etc., probably the fastest and easiest solution is to use HttpURLConnection. HtmlUnit and/or Selenium would just add overhead.

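            // imports: java.io.*, java.net.HttpURLConnection, java.net.URL
            try {
                HttpURLConnection con = (HttpURLConnection) new URL("http://www.example.com").openConnection();
                try (BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()))) {
                    String inputLine;
                    while ((inputLine = in.readLine()) != null) {
                        System.out.println(inputLine); // placeholder: process each response line here
                    }
                } // the reader and its underlying stream are closed here
            } catch (IOException ex) {
                ex.printStackTrace();
            }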
    Another advantage of this solution is that you can pass a Proxy to openConnection(Proxy proxy).
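
    A minimal sketch of the proxy variant, using only the JDK. The class name ProxyFetch, the helper names, the 5-second timeouts, and the address 127.0.0.1:8080 are placeholders made up for illustration, so swap in a real proxy before running main.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

final class ProxyFetch {

    // Describes an HTTP proxy; nothing is connected at this point.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    // Creates a connection object routed through the proxy.
    // The actual network connection happens later, on the first I/O call.
    static HttpURLConnection open(String url, Proxy proxy) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection(proxy);
        con.setConnectTimeout(5_000); // assumed timeouts, tune for your target
        con.setReadTimeout(5_000);
        return con;
    }

    public static void main(String[] args) throws IOException {
        // 127.0.0.1:8080 is a placeholder proxy address (assumption).
        Proxy proxy = httpProxy("127.0.0.1", 8080);
        HttpURLConnection con = open("http://www.example.com", proxy);
        System.out.println("HTTP status: " + con.getResponseCode());
    }
}
```

    Since the Proxy is passed per connection, each request can go through a different proxy without touching JVM-wide system properties.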
  3. matessim

    matessim Junior Member

    Nov 22, 2008
    Apache HttpClient is worth looking at. Keep in mind that when you use tools that work with the actual connections, rather than huge, bulky frameworks like Selenium (which is more of a macro framework), the bottleneck is much more likely to be the network. The choice of library, HtmlUnit or HttpClient, will probably not make much of a performance difference. Remember how long a network operation takes compared with local actions, calculations, and operations.
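
    A rough way to check the network-versus-local point for yourself, sketched with only the JDK (Java 9+ because of readAllBytes). The class name FetchTiming, the regex-based extractTitle helper, the UTF-8 assumption, and the timeouts are all made up for illustration. It is not a benchmark, just a quick comparison of one fetch against one local parse.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FetchTiming {

    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Local work: pull the <title> text out of HTML that is already in memory.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    // Network work: download the page body (assumes the page is UTF-8).
    static String fetch(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setConnectTimeout(5_000);
        con.setReadTimeout(5_000);
        try (InputStream in = con.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        long t0 = System.nanoTime();
        String html = fetch("http://www.example.com");
        long t1 = System.nanoTime();
        String title = extractTitle(html);
        long t2 = System.nanoTime();
        // Print both durations side by side so the difference is visible.
        System.out.printf("fetch: %d ms, parse: %d ms, title: %s%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, title);
    }
}
```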

    As I've said a few times here, pick something you like and stick with it.