Multithreading httprequest

Discussion in 'Visual Basic .NET' started by energy, Oct 2, 2010.

  1. energy

    energy Newbie

    Joined:
    Dec 18, 2008
    Messages:
    32
    Likes Received:
    2
    I currently have a web scraper which runs well but slowly, as it only does one URL at a time.
    I am trying to get threading to work and use a couple of threads, each one with a different URL.
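    Code:
    ' (reconstructed from a garbled paste: the loop variable i and the
    ' two-argument SetMaxThreads call are best guesses)
    ThreadPool.SetMaxThreads(5, 1)
    For i = tonumber To fromnumber
        ThreadPool.QueueUserWorkItem(Function(o) get1page(url(number)))
        number += 1
    Next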
    So basically I set up an array of URLs and ask the get1page function to fill a class with data.

    The code kind of works; however, it does not always get the data, or the data is incomplete.

    Am I going down the correct route here, or am I going totally the wrong way? Would anyone mind sharing some code?
     
    Last edited: Oct 2, 2010
  2. smack

    smack Junior Member

    Joined:
    Feb 1, 2010
    Messages:
    182
    Likes Received:
    78
    Occupation:
    Software Engineer/Evil Genius
    Location:
    inside .NET
    well there could be several different issues with the code (don't take that offensively), or it might not be the code at all, but rather the site timing out or something else separate from the client causing it to fail.

    when you say that the code "does not always get the data or is incomplete", it makes me think that maybe the situation of the remote server timing out is not being handled properly in the code, thus causing it to fail to get the data from the page.

    in regards to your threading, i am not familiar with the threadpool. i don't bother to use it for my collect or send functions when multi-threading. that's not to say it can't be used or shouldn't be used, just that i leverage a different technique.

    typically what i will do is just spawn my threads and use collections with some form(s) of locking to do my coordination.

    as an over-simplistic example, let's say i know that i need to get 10 pages worth of collected data from a site and i want to use 3 threads. i would create a variable to track which page number i am on (let's use a simple integer for this example) and increment it as i go.

    here is what the pseudo code would look like:

    Dim PageNumber As Integer = 10 ' last page to collect
    Dim PageCount As Integer = 1   ' next page to hand out

    Sub SpawnMyThreads()
        For I As Integer = 0 To 2
            Dim MyNewThread As New Thread(AddressOf Collect)
            MyNewThread.Start()
        Next I
    End Sub

    Sub Collect()
        Do
            ' Interlocked.Increment returns the new value, so claim the page in one atomic step
            Dim PageNumForThisSub As Integer = Threading.Interlocked.Increment(PageCount) - 1
            If PageNumForThisSub > PageNumber Then Exit Do
            GetPageData(PageNumForThisSub)
        Loop
    End Sub

    now i know, i know. incredibly over simplistic pseudo code, but it should give you a high level view in to what i am thinking architecture wise.

    now again, i just want to say that there is nothing wrong with using threadpool, i just tend to go a more simple route and let the threads live as long as they need to then go to the GC when they're done. there are benefits and drawbacks to both methods so what you choose to do is dependent on what your needs are.

    it's hard to say without seeing the rest of the code, but this sounds like it may be more of an architecture/cycle problem instead of a threading problem.

    how are you handling errors inside the method where it requests the page? does it just fail and move on, not worrying about whether it gets a positive result? or does it retry or re-add that page to the queue until it gets the data that it wants?
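
    as a rough sketch of the retry option (not your actual code, since i haven't seen it): the helper below tries a page a few times before giving up, so the caller can decide to skip it or put it back in the queue. FetchWithRetry, maxAttempts and the fake fetch delegate are names made up for this example; in a real scraper the delegate would wrap the actual web request.

```vbnet
Imports System
Imports System.Net
Imports System.Threading

Module RetrySketch
    ' Hypothetical helper: calls fetch(url) up to maxAttempts times.
    ' Returns Nothing if every attempt fails, so the caller can re-queue or skip the URL.
    Function FetchWithRetry(url As String, fetch As Func(Of String, String), maxAttempts As Integer) As String
        For attempt As Integer = 1 To maxAttempts
            Try
                Return fetch(url)
            Catch ex As WebException
                ' Timeouts and connection failures end up here.
                Console.WriteLine("attempt {0} failed for {1}: {2}", attempt, url, ex.Status)
                Thread.Sleep(500 * attempt) ' back off a little before retrying
            End Try
        Next
        Return Nothing
    End Function

    Sub Main()
        ' Fake fetch that times out twice, then succeeds; stands in for a real request.
        Dim calls As Integer = 0
        Dim fakeFetch As Func(Of String, String) =
            Function(u)
                calls += 1
                If calls < 3 Then Throw New WebException("timed out", WebExceptionStatus.Timeout)
                Return "<xml>ok</xml>"
            End Function

        Dim result As String = FetchWithRetry("http://example.com/page1", fakeFetch, 5)
        Console.WriteLine(If(result, "gave up"))
    End Sub
End Module
```

    the growing sleep between attempts is just one way to avoid hammering a server that is already struggling; a fixed delay would work as well.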
     
    • Thanks x 3
  3. energy

    energy Newbie

    Thank you. I have changed the way I am doing it now, instead of passing the variable into the thread as a lambda, which I don't quite get...
    I am going to give threadpool another go, but if (when) I fail I'll give yours a go.
    I do like your idea of keeping tabs on where you are in the array and not just filling up the threadpool queue. I have about 1,000,000 requests (URLs) to deal with, each of which returns XML.
    Code:
    Dim newthread As New WaitCallback(AddressOf scraperoutine)
    ThreadPool.QueueUserWorkItem(newthread)
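
    A minimal sketch of how each queued work item could be handed its own URL through the state argument of QueueUserWorkItem, instead of a lambda. The routine body, the sample URLs and the CountdownEvent used to wait for completion (which needs .NET 4) are assumptions for illustration only.

```vbnet
Imports System
Imports System.Threading

Module StateSketch
    Private done As CountdownEvent

    ' Signature matches WaitCallback: the queued state object arrives as the parameter.
    Sub ScrapeRoutine(state As Object)
        Dim url As String = CStr(state)
        Try
            ' The real request and XML parsing would go here.
            Console.WriteLine("thread {0} handling {1}", Thread.CurrentThread.ManagedThreadId, url)
        Finally
            done.Signal() ' always count the item as finished, even on error
        End Try
    End Sub

    Sub Main()
        Dim urls() As String = {"http://example.com/1.xml", "http://example.com/2.xml", "http://example.com/3.xml"}
        done = New CountdownEvent(urls.Length)

        For Each u As String In urls
            ' Each work item carries its own URL, so nothing is shared between items.
            ThreadPool.QueueUserWorkItem(AddressOf ScrapeRoutine, u)
        Next

        done.Wait() ' pool threads are background threads; wait before exiting
        Console.WriteLine("all done")
    End Sub
End Module
```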
     
  4. energy

    energy Newbie

    Thanks again smack, I've gone from about 1 request a second to 6-10 :D
     
  5. smack

    smack Junior Member

    typically when i am doing a collect my biggest problem is the request timing out. i usually mitigate this issue by running the collect loop until i reach a specific amount of data, for example 100 user names.

    that way if the request for page number 3 times out, it will just skip that and keep going until i have reached my desired amount of collected information.
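
    a rough sketch of that idea: three worker threads keep claiming page numbers until a shared list holds enough items, and a page that times out is simply skipped. GetNamesFromPage is a made-up stand-in for the real request (it fakes a timeout on every third page), and the target of 100 is just the number from the example above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading

Module CollectUntilSketch
    Private Const Target As Integer = 100
    Private ReadOnly names As New List(Of String)
    Private ReadOnly namesLock As New Object()
    Private pageCounter As Integer = 0

    ' Made-up stand-in for the real page request: every third page "times out".
    Function GetNamesFromPage(page As Integer) As List(Of String)
        If page Mod 3 = 0 Then Throw New TimeoutException("page " & page & " timed out")
        Dim result As New List(Of String)
        For i As Integer = 1 To 10
            result.Add("user_" & page & "_" & i)
        Next
        Return result
    End Function

    Sub Collect()
        Do
            SyncLock namesLock
                If names.Count >= Target Then Exit Do ' enough data collected
            End SyncLock

            Dim page As Integer = Interlocked.Increment(pageCounter) ' unique page per thread
            Try
                Dim found As List(Of String) = GetNamesFromPage(page)
                SyncLock namesLock
                    names.AddRange(found)
                End SyncLock
            Catch ex As TimeoutException
                ' Skip the failed page and carry on with the next one.
            End Try
        Loop
    End Sub

    Sub Main()
        Dim workers As New List(Of Thread)
        For i As Integer = 1 To 3
            Dim t As New Thread(AddressOf Collect)
            workers.Add(t)
            t.Start()
        Next
        For Each t As Thread In workers
            t.Join()
        Next
        Console.WriteLine("collected {0} names", names.Count)
    End Sub
End Module
```

    because the threads check the count before they fetch, the final total can overshoot the target by a page or two; if you need exactly 100 you would trim the list afterwards.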

    lambdas are nifty and can be very powerful, one of my favorite uses for them is in the context of LINQ style queries in to collections. so when i am trying to find something or remove something instead of coding a loop with an iterator i can just write a query.
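
    for example, a small sketch of finding and removing items with lambdas instead of hand-written loops. the URL list is made up for the example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LinqSketch
    Sub Main()
        Dim urls As New List(Of String) From {
            "http://example.com/a.xml",
            "http://example.com/b.html",
            "http://example.com/c.xml"
        }

        ' find: the lambda is the filter, no explicit loop needed
        Dim xmlUrls As List(Of String) = urls.Where(Function(u) u.EndsWith(".xml")).ToList()
        Console.WriteLine(String.Join(", ", xmlUrls.ToArray()))
        ' prints: http://example.com/a.xml, http://example.com/c.xml

        ' remove: List.RemoveAll takes a predicate and returns how many items it removed
        Dim removed As Integer = urls.RemoveAll(Function(u) u.EndsWith(".html"))
        Console.WriteLine("removed {0}, {1} left", removed, urls.Count)
        ' prints: removed 1, 2 left
    End Sub
End Module
```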

    they also allow you to do interesting things with anonymous or inline delegates, such as how you're using them for assigning the method to the thread, or doing a thread safe, cross thread update to a form control or other type of shared member.

    my traditional way of spawning threads limits you a bit with your calls, in the way that it's not easy to pass parameters directly to the assigned delegate. however with the lambdas as you're doing, that's not a problem because you can populate the method signature directly in the call.
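
    a minimal sketch of passing a parameter through a lambda, with one caveat worth knowing: a lambda captures the variable itself, not its value at that moment. if the lambda uses a counter that the loop keeps changing, a thread can end up seeing a later value than you intended, which may or may not be related to the missing data in the first post. GetOnePage and the URLs are made up for the example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading

Module LambdaParamSketch
    ' Made-up stand-in for the real page request.
    Sub GetOnePage(url As String)
        Console.WriteLine("thread {0} fetching {1}", Thread.CurrentThread.ManagedThreadId, url)
    End Sub

    Sub Main()
        Dim urls() As String = {"http://example.com/1", "http://example.com/2", "http://example.com/3"}
        Dim workers As New List(Of Thread)

        For i As Integer = 0 To urls.Length - 1
            ' Copy the value into a variable declared inside the loop body.
            ' A lambda that used i directly would share the one loop counter
            ' with every other lambda and could see it after it has moved on.
            Dim current As String = urls(i)
            Dim t As New Thread(Sub() GetOnePage(current))
            workers.Add(t)
            t.Start()
        Next

        For Each t As Thread In workers
            t.Join()
        Next
    End Sub
End Module
```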

    from an architectural perspective it's a bit of "6 of one kind, half a dozen of another". they both accomplish the same end result, just a different means to the same end.

    good luck, synchronizing multiple threads presents all kinds of interesting challenges, but once you get it where you want it you have incredible power at your fingertips.

    glad i could help! :)
     
    • Thanks x 1