Let's get a few things clear about multithreading in software posted here

Discussion in 'General Programming Chat' started by petros, Jul 22, 2014.

  1. petros

    petros Newbie

    Joined:
    Jan 23, 2009
    Messages:
    13
    Likes Received:
    7
    A lot of the bots/tools I see posted here advertise multithreading support and even go as far as to suggest that you can run them with 100s of threads and it'll make them run ungodly fast. This is not how computers work.

    Your computer can only execute as many threads at once as you have (logical) CPU cores. That means if you have a quad-core processor without hyperthreading, at most 4 threads can be executing at the same time. Using 100 threads will most likely slow down your application!

    What happens is your operating system's scheduler is trying to balance all these threads that are asking for execution time. Since only X threads can run at the same time, it will pause active threads after they get a bit of execution time and wake up other threads that are waiting for their turn. Not only are you wasting a lot of time on expensive context switches (suspending and resuming threads), but an overwhelming majority of your unholy army of threads will spend most of their time waiting for the scheduler to put them in the game.

    On top of all this, most bots are mostly doing network requests. That means they are I/O bound, not CPU bound. Throwing more threads at it simply isn't the solution. You should be creating a fixed number of worker threads (around the same number as CPU cores) and using your platform's asynchronous I/O APIs. This way you can still make plenty of concurrent I/O requests (you don't need to wait for a previous one to finish before starting another one) without wasting system resources and time on context switching. Not to mention it's much, much, much more memory and CPU friendly.
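
    A minimal sketch of the "many concurrent requests without many threads" idea, using Python's asyncio. The `fake_request` helper and its 50 ms latency are assumptions made for illustration: `asyncio.sleep` stands in for a real non-blocking network call, and nothing here comes from an actual bot.

```python
import asyncio
import time


async def fake_request(i: int, latency: float = 0.05) -> int:
    # Stand-in for a non-blocking network call (assumption): awaiting hands
    # control back to the event loop instead of parking an OS thread.
    await asyncio.sleep(latency)
    return i


async def run_batch(n: int) -> list[int]:
    # Start all n requests at once; none waits for a previous one to finish.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def timed_batch(n: int) -> tuple[list[int], float]:
    # Run the whole batch on a single event loop and time it.
    start = time.perf_counter()
    results = asyncio.run(run_batch(n))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_batch(100)
    print(f"{len(results)} requests finished in {elapsed:.2f}s")
```

    If the requests ran one after another, 100 of them at 50 ms each would take about 5 seconds. Because the waits overlap, the batch should finish in roughly the time of a single request, without a thread per request.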


    The point of this rant being: buyers, please don't pay extra for "unlimited threads!!!!" (as I've seen in some listings).

    And bot developers: threads are for concurrently executing CPU-bound tasks (solving a captcha) or background work. Use asynchronous I/O for I/O-bound tasks (network stuff, reading/writing files, etc.).
     
  2. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,468
    Likes Received:
    10,143
    Petros,

    When a buyer talks about threads, he understands it as concurrency. How you, as a programmer, implement that concurrency is irrelevant to him.

    For example, Node.js is single-threaded and all the IO is asynchronous via the Event Loop. Spawning (say) 100 requests to Google asynchronously doesn't translate to 100 threads, but the end result is what the user understands as threading.

    As for "unlimited threads", if the buyer wants them, let him have them (again, not threads in the technical sense). You just tell him that after X "threads" things will slow down because his PC can't handle more, and let him do whatever he wants.
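
    One way a user-facing "threads" setting could map onto asynchronous I/O, as a rough sketch only. The `crawl` function and its `max_in_flight` parameter are hypothetical names, and `asyncio.sleep` is a stand-in for a real request. The setting becomes a cap on requests in flight, while everything runs on one OS thread.

```python
import asyncio
import threading


async def fake_request(i: int, limit: asyncio.Semaphore, seen: set[int]) -> int:
    async with limit:                      # at most `max_in_flight` run at once
        seen.add(threading.get_ident())    # record which OS thread ran this
        await asyncio.sleep(0.01)          # simulated network wait (assumption)
        return i


async def crawl(n_requests: int, max_in_flight: int) -> tuple[list[int], set[int]]:
    # `max_in_flight` plays the role of the "threads" number shown to the buyer.
    limit = asyncio.Semaphore(max_in_flight)
    seen: set[int] = set()
    results = await asyncio.gather(
        *(fake_request(i, limit, seen) for i in range(n_requests))
    )
    return results, seen


if __name__ == "__main__":
    results, os_threads = asyncio.run(crawl(n_requests=200, max_in_flight=100))
    print(len(results), "requests,", len(os_threads), "OS thread(s)")
```

    Every coroutine records the same thread identifier, so the "100 threads" the user asked for are 100 overlapping requests rather than 100 OS threads.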

    Edit: Moved the ... thread ... to the proper section :)
     
  3. divok

    divok Senior Member

    Joined:
    Jul 21, 2010
    Messages:
    1,015
    Likes Received:
    634
    Location:
    http://twitter.com/divok
    You must have been researching this very hard since 2009.
    Even those "unlimited threads" in most software use a single core.
    edited.
     
    Last edited: Jul 22, 2014
  4. jazzc

    @divok,

    What he wrote is technically correct.
     
  5. petros

    I'd be willing to bet that 99% of the time it's mentioned in software here, it's actually spinning up threads, since overlapped I/O on Windows isn't something a lot of people mess around with.

    How Node.js (libuv) works is actually how I suggested developers handle concurrency in my original post. It spins up 4 worker threads (by default) and polls for events to report back to the event loop (callbacks etc. are fired from the main thread while the worker threads do their thing).

    I know it seems like I'm making mountains out of molehills here. It just rubs me the wrong way that it's so commonly promoted as more threads = more speed (and I've been guilty of this kind of thinking myself; I'm pretty sure my first bot had something like that written on its release page).
     
    Last edited: Jul 23, 2014
  6. petros

    That's completely up to the scheduler. Generally it will attempt to distribute the work fairly across all available cores. Threads running on the same core will preempt each other, so they will not be executing at the same time unless one gets moved by the scheduler.
     
  7. ttrox

    ttrox Regular Member

    Joined:
    Jun 28, 2013
    Messages:
    217
    Likes Received:
    75
    I don't think it's simply a matter of more threads = more speed. But depending on the operations the program performs, more threads can indeed improve its speed.

    For example, if you have a program that processes hundreds of arithmetic operations on each of its threads non-stop, chances are that hundreds of threads won't get any more done than a handful matching the number of cores (or number of cores * 2, depending on the architecture).

    However, if you depend on network operations, threads can come in pretty handy: you have to wait for the server's responses, and while waiting you aren't using many resources (the threads sit in the "blocked" state).

    This being said, it all depends on the context.
     
  8. petros

    The problem there is that a thread won't necessarily yield because of a blocking I/O call, so you might end up with your threads doing nothing and then getting preempted before they get to do any actual work because the scheduler decided their time was up. Even if a thread does yield, once data is available you're paying the cost of a context switch and waiting until the scheduler decides it's your turn again.

    For network operations, if you want high concurrency it's probably better to have an event loop (either a single thread or some worker threads) that keeps track of open requests and checks for updates on each tick, firing callbacks on completion (this way you can make multiple requests in the same thread without having to read from a socket until there is actually data). All the fun of concurrency without the high cost of threads (setting up a brand new stack, storing/restoring registers, and all that fun setup/teardown stuff).
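
    The event loop described above can be sketched with Python's standard `selectors` module. This is a toy under stated assumptions: `socket.socketpair()` stands in for real client/server connections, the "servers" reply immediately, and `run_event_loop` and `on_readable` are illustrative names rather than part of any library.

```python
import selectors
import socket


def run_event_loop(n_requests: int) -> list[bytes]:
    sel = selectors.DefaultSelector()
    completed: list[bytes] = []
    remotes = []

    def on_readable(conn: socket.socket) -> None:
        # Only called once the selector reports data, so recv() will not block.
        completed.append(conn.recv(1024))
        sel.unregister(conn)
        conn.close()

    # "Open" all requests up front and register a callback for each one.
    for i in range(n_requests):
        local, remote = socket.socketpair()  # stand-in for a real connection
        local.setblocking(False)
        sel.register(local, selectors.EVENT_READ, on_readable)
        remotes.append((i, remote))

    # Pretend the servers answer, in reverse order of the requests.
    for i, remote in reversed(remotes):
        remote.sendall(f"response {i}".encode())
        remote.close()

    # The loop itself: each tick asks the OS which sockets are ready,
    # then fires the callback stored alongside each ready socket.
    while sel.get_map():
        for key, _mask in sel.select(timeout=1.0):
            key.data(key.fileobj)

    sel.close()
    return completed


if __name__ == "__main__":
    print(len(run_event_loop(50)), "responses handled on one thread")
```

    A single thread tracks every open request and only touches a socket when there is data on it, so no stack or context switch is spent on a request that is still waiting.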

    Mostly I'm just salty that a lot of people seem to market it as more threads = more speed, like it's some magical pill that makes your program faster. I've seen multiple products charging more for versions that use more threads when there's likely no advantage to using 50 threads vs 100 threads* when it comes down to it. The more threads you have the scheduler juggling, the more likely it's waking up threads that are still waiting on system calls to return (reading from a socket) and wasting time.


    I'm not against threads, just programs promoting the use of 100s of them. There's almost no reason to use 100s of threads in cases like this (they may be useful in a situation where you don't necessarily need a set of tasks done the fastest, because of the cost of all those threads preempting each other, but would rather have progress made concurrently on all of them vs something like the libuv work queue, where X tasks are worked on at a time).

    * these numbers were randomly chosen, your mileage may vary ;)
     
    Last edited: Jul 23, 2014
  9. eboetoyep

    eboetoyep Newbie

    Joined:
    Jul 4, 2013
    Messages:
    1
    Likes Received:
    0
    There is a great talk from a few years ago by Rob Pike (of UNIX, Plan9, and Golang fame) called "Concurrency is not Parallelism" that is a fantastic primer on the difference between concurrency and parallelism as well as high-level design considerations for concurrent systems. I can't post a link, but Google "rob pike concurrency is not parallelism" and check out the first Vimeo link.

    OP: Your excitement about Node/event-driven programming is palpable, but you haven't mentioned any of the major drawbacks of event-driven systems. You're coming dangerously close to doing to event loops what you're accusing BHW sellers of doing to threads:
     
  10. petros

    Sorry for the delay, I haven't been back to BHW in a bit. I'm not saying event loops are the be-all and end-all. I'm just saying that offering options for hundreds of threads and then charging more for that capability (often hundreds more) is silly, because just throwing threads at the script isn't going to make your network requests go through any faster and may actually slow down the program as a whole.

    I'm mostly just warning against paying an extra $100 for 100 threads, $300 for 400 threads, etc., as I've seen on some products. Only one thread per core (or twoish if we're dealing with hyperthreading) is going to be running in parallel anyway. Once you've got all these threads running concurrently, you're losing a lot of time to overhead because the scheduler has to kick in more often and deal with storing the current thread's state, tearing down the previous one, and loading the new one. In this specific case, where we are I/O bound, I personally think an event loop type deal will work best in conjunction with threads (1-2 event loop threads per core). This way we've maxed out the possible parallel threads running and can have a lot of concurrent requests executing in each one. This basically does what spawning a bajillion threads does without the huge overhead, so we spend more time doing real work. Plus, if one request in the thread starts taking a while, the same thread can still work on completing the rest of the requests in its queue without wasting any cycles. Neither a thread-only approach nor a single event loop is a silver bullet for all situations. It just happens that in this situation, where we want high concurrency in I/O-bound applications, a nice middle ground will yield the best performance.
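
    A rough sketch of the "event loop threads per core" layout, assuming Python's asyncio and simulated requests. `run_sharded`, `drain`, and `fake_request` are hypothetical names. Note that in CPython the global interpreter lock keeps these threads from running Python code in parallel, so this shows the structure rather than a speedup.

```python
import asyncio
import os
import threading
from typing import Optional


async def fake_request(i: int) -> int:
    await asyncio.sleep(0.01)   # simulated network wait (assumption)
    return i


async def drain(batch: list[int]) -> list[int]:
    # One event loop works through its whole batch concurrently.
    return await asyncio.gather(*(fake_request(i) for i in batch))


def run_sharded(n_requests: int, n_loops: Optional[int] = None) -> list[int]:
    n_loops = n_loops or os.cpu_count() or 1
    # Deal the requests out round-robin, one batch per event-loop thread.
    batches = [list(range(k, n_requests, n_loops)) for k in range(n_loops)]
    results: list[list[int]] = [[] for _ in range(n_loops)]

    def worker(k: int) -> None:
        # Each thread owns a private event loop for its batch.
        results[k] = asyncio.run(drain(batches[k]))

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_loops)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(x for part in results for x in part)


if __name__ == "__main__":
    print(len(run_sharded(1000)), "requests completed")
```

    The thread count stays tied to the core count no matter how many requests are queued. A slow request only delays its own callback, since the other requests in the same loop keep making progress.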

    Didn't mean to come off as preaching event loops etc, just warning that customers shouldn't be taken in by the idea that more threads = better performance/speed.
     
    Last edited: Sep 18, 2014
  11. WMAid

    WMAid Newbie

    Joined:
    Sep 9, 2014
    Messages:
    45
    Likes Received:
    2
    I have my own simple multithreaded software.

    On my VDS with only 1 CPU it makes:
    - with 2000 threads: ~2000 HEAD req/sec
    - with 10 threads: ~20 HEAD req/sec

    Where am I wrong?

    Multithreading is a really good option with the proper programming language.
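
    A back-of-the-envelope check on the figures above, offered as one possible reading rather than a measurement. If every thread issues one blocking request at a time and spends nearly all of it waiting on the network, throughput is roughly the thread count divided by the per-request latency (Little's law). The helper names below are made up for illustration.

```python
def implied_latency(threads: int, requests_per_sec: float) -> float:
    """Seconds each blocking thread spends per request, if all threads stay busy."""
    return threads / requests_per_sec


def expected_throughput(threads: int, latency_sec: float) -> float:
    """Requests per second for `threads` blocking workers at a given latency."""
    return threads / latency_sec


if __name__ == "__main__":
    print(implied_latency(10, 20))       # 10 threads at ~20 req/sec   -> 0.5
    print(implied_latency(2000, 2000))   # 2000 threads at ~2000 req/sec -> 1.0
```

    Both results work out to around half a second to a second per request, which is consistent with threads that are mostly blocked waiting for the server. That is the I/O-bound case, where throughput follows the number of requests in flight, however that concurrency is implemented.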
     
  12. zohar

    zohar Newbie

    Joined:
    Jun 24, 2014
    Messages:
    44
    Likes Received:
    5
    IMHO, multi-threading should be avoided like the plague. The best thing is to have a modular software design and execute CPU-intensive tasks as a separate console application that runs silently in the background. Just like in chess, you have to protect the queen, the queen being the main process. Multithreading is like having minor pieces topple your game very fast.

    Think about this: the code to stop and start a process is in all likelihood the MOST optimized code in an operating system. Your app should rely on the low-level code that is most looked after by your OS designer. Therefore, in my opinion, multi-threading is a niche from the '95 era. Why waste time and risk an unstable application? One doesn't have to code true multithreading anymore at a time when 1 GB RAM and 64-bit 3 GHz processors are becoming the standard. It's a great way to waste one's time. My personal philosophy: let the MS guy sweat in his cubicle over whether my app is stable or not; that is his job, not ours as 3rd-party coders. To guarantee this, use the OS APIs that are best optimized by the MS guy. Thanks.
     
    Last edited: Sep 24, 2014
  13. gooldude13

    gooldude13 Newbie

    Joined:
    Jun 11, 2011
    Messages:
    24
    Likes Received:
    25
    Lol. You need to be smacked.
     
  14. zohar

    LOL. You probably work at MS?
     
  15. HoNeYBiRD

    HoNeYBiRD Jr. VIP Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    5,881
    Likes Received:
    7,119
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    i'm running 4 instances of the same bot 24/7, all with 100 threads each and i don't have a very advanced config (lenovo notebook - not even a thinkpad - some newer i3 CPU, 4GB ram on win7 pro 64 bit), if i want i can play fairly new games with those running in the background or watch HD movies

    so what seems to be the problem here?

    i don't have a programmer mind, all i know: more threads=more traffic and that's what really counts
     
  16. zohar

    The 'problem' seems to be that you're running 400 threads and wondering why you can cook eggs on your laptop's battery. Get a dedicated server, man.
     
  17. jazzc

    I'll try to give an example:

    Think of a big-ass highway that can accommodate 5 lanes. If that highway only had 1 lane, it would be a waste. Now if you go and make 100 lanes on that same highway, each lane would be narrower than a car, so in practice you'd still have 5 lanes. But not only that, you'd have the overhead of cars trying to find a lane to get into. So, in reality, you get, say, 3 lanes' worth of traffic on the same highway.

    A computer has an optimal number of threads. How many is that? It's case-specific. So to find out, the best way is to test your tools with various settings and see where performance starts to go down.
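
    A small sketch of "test with various settings", assuming a made-up workload: `fake_task` sleeps for 10 ms to imitate a network wait and then does a little arithmetic. Real tools will behave differently, so the numbers only show how such a sweep could be set up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_task(_: int) -> None:
    time.sleep(0.01)                    # simulated I/O wait (assumption)
    sum(i * i for i in range(2_000))    # a little CPU work per task


def measure(workers: int, n_tasks: int = 40) -> float:
    # Time how long n_tasks take with a pool of `workers` threads.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_task, range(n_tasks)))
    return time.perf_counter() - start


def sweep(settings: list[int], n_tasks: int = 40) -> dict[int, float]:
    # Try each thread count in turn and record the elapsed seconds.
    return {w: measure(w, n_tasks) for w in settings}


if __name__ == "__main__":
    for workers, seconds in sweep([1, 2, 4, 8, 16, 32]).items():
        print(f"{workers:>3} workers: {seconds:.3f}s")
```

    The setting to keep is the one where the elapsed time stops improving. Past that point, extra threads add overhead without adding throughput.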
     
  18. HoNeYBiRD

    i don't keep the battery in, the thing is, everything is on optimal temperature with the tools running

    i don't really care about the hardware, it's just a tool, like a hammer for a carpenter, if it goes phut beyond repair - which normally shouldn't happen in years, if you buy a decent machine, even if it's running 24/7 - i just buy a new one, last one lasted 7 years

    i can do the server thing, but i'm not really a big fan of servers, i prefer things running on my machine
     
  19. botcode

    botcode Newbie

    Joined:
    Oct 30, 2014
    Messages:
    16
    Likes Received:
    1
    Creating a multithreaded application can get complicated. Multithreaded applications suffer from bugs that are hard to track down, reproduce, or fix.