
Multithreading curl

Discussion in 'PHP & Perl' started by Packers, Apr 27, 2011.

  1. Packers

    Packers Registered Member

    Joined:
    Jan 31, 2011
    Messages:
    77
    Likes Received:
    7
    Hi,

    I have read people saying that an issue with using cURL is the difficulty of multithreading everything. Where exactly does the issue lie? I recently made a (crappy) WordPress commenter in PHP and have it running now on a test run. It seems to me that I can just run multiple instances of the script, so where's the problem? (Other than PHP itself being a little slow - but for those of us who don't want to fork out for Scrapebox yet, we put up with the speed of PHP :))

    Cheers!
     
    • Thanks Thanks x 1
  2. jcbizzled

    jcbizzled Registered Member

    Joined:
    Aug 23, 2010
    Messages:
    50
    Likes Received:
    12
    Well, I think you're talking about two different things. When people say multithreading with cURL, they don't mean running the same script multiple times concurrently - they mean a single script opening many connections concurrently in one execution.

    Is it more complicated than single-threaded cURL? Sure. Is it really that complicated, though? Not really - you can find classes that will handle pretty much any perceived complication of managing multiple connections in cURL for you.

    My only real issue with multithreaded cURL was making sure you use a build of cURL that supports the timeout granularity you may need. For example, in the past I've needed timeouts on the order of milliseconds rather than seconds, but the particular build of cURL I was using at the time didn't support that.
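    For anyone curious, the single-script approach described above is PHP's `curl_multi_*` API. A minimal sketch (`multi_fetch` is a made-up name, not from any class mentioned in the thread); note the comment on `CURLOPT_TIMEOUT_MS`, which is the millisecond-timeout caveat:

```php
<?php
// Minimal sketch of concurrent fetching with PHP's curl_multi API.
// multi_fetch() is an illustrative name; the options are the usual ones.
function multi_fetch(array $urls, int $timeout_ms = 5000): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // Millisecond timeouts need libcurl 7.16.2+ (and PHP 5.2.3+);
        // older builds only honour whole-second CURLOPT_TIMEOUT.
        curl_setopt($ch, CURLOPT_TIMEOUT_MS, $timeout_ms);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive every transfer until all handles are finished.
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active && curl_multi_select($mh) === -1) {
            usleep(1000); // select can fail on some builds; avoid busy-spinning
        }
    } while ($active && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

    All the connections are in flight at once, so the wall-clock time is roughly that of the slowest URL rather than the sum of them all.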
     
  3. Packers

    Packers Registered Member

    Joined:
    Jan 31, 2011
    Messages:
    77
    Likes Received:
    7
    Thanks! I just discovered a nice C# IDE on Linux, so I might give that a shot. Slightly annoying since I literally just got my PHP script to work and comment :| Still, I'm 500 links richer, though it is bloody slow!
     
  4. Autumn

    Autumn Elite Member

    Joined:
    Nov 18, 2010
    Messages:
    2,197
    Likes Received:
    3,041
    Occupation:
    I figure out ways to make money online and then au
    Location:
    Spamville
    I use the slow but simple and reliable method of multiple PHP instances rather than multithreading. It's a bit heavy on RAM, but with 8GB on my server I can easily have 2K+ instances running at full blast. Just use a single launcher script to start your processes and a single database on the backend to store your info and you're fine. If you're working from text files, the Unix `split` utility is your friend.

    There's no real advantage IMHO in using a compiled language, because when building bots all of the bottlenecks are in the HTTP transactions rather than in computation. You can lower the memory footprint, but RAM is so cheap that it's a poor use of time.

    Server time is exceptionally cheap, but your own programming time is extremely expensive!
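    A launcher along the lines Autumn describes can be sketched in a few lines of shell. The file names and `worker.php` are hypothetical placeholders, not from the thread; the demo data line just keeps the sketch self-contained:

```shell
#!/bin/sh
# Hypothetical launcher for the multi-instance approach: split a URL
# list into chunks and start one PHP worker per chunk. worker.php is a
# made-up name for a script that takes a chunk file as its argument.

# Demo input so the sketch runs as-is; in practice urls.txt is your real list.
printf 'http://example.com/a\nhttp://example.com/b\nhttp://example.com/c\n' > urls.txt

split -l 500 urls.txt chunk_        # one chunk per worker, 500 URLs each
for f in chunk_*; do
    php worker.php "$f" &           # launch a worker per chunk, in the background
done
wait                                # block until every worker has exited
```

    Each worker writes its results to the shared database, so no coordination between processes is needed beyond the initial split.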
     
    • Thanks Thanks x 1
  5. nambooooo

    nambooooo Regular Member

    Joined:
    May 21, 2009
    Messages:
    219
    Likes Received:
    71
    Location:
    GreenMTN
    Don't know where you guys get the idea that multithreaded cURL is slow. IMHO speed is limited mainly by your Internet connection; other than that it's fine for me. It would be nice, though, to be able to filter out unwanted extensions so it doesn't download images. The one issue I've seen with it is that it's blocking: if you have a list of 100 URLs and do it in batches of 10 threads, each batch waits for its longest thread to complete, even if that's a 60-second timeout. Although if you know how, this is not a problem at all. I've seen a class before that doesn't wait once a thread is complete - it takes another URL from the stack, so it keeps the number of threads constant. Forgot what it's called; google "non-blocking multi curl class" or something, maybe you'll get lucky, I'm too lazy to look.
    I've built a bot for a social network in PHP + cURL with 10 threads, and I don't see any major slowdown compared to my regular browsing.
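    The class being described here is usually called a "rolling" cURL pattern: keep a fixed window of handles and add a fresh URL the moment one finishes, so one slow response never stalls the whole batch. A rough sketch of the idea (the function name and 10-handle window are illustrative, not from any particular class):

```php
<?php
// "Rolling" multi-curl sketch: a constant-size window of transfers,
// topped up as soon as any handle completes. rolling_fetch() is a
// made-up name for illustration.
function rolling_fetch(array $urls, int $window = 10): array
{
    $mh = curl_multi_init();
    $results = [];
    $pending = 0;

    $add = function (string $url) use ($mh, &$pending) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 60);
        curl_multi_add_handle($mh, $ch);
        $pending++;
    };

    // Prime the window with the first $window URLs.
    foreach (array_splice($urls, 0, $window) as $url) {
        $add($url);
    }

    while ($pending > 0) {
        curl_multi_exec($mh, $active);
        if (curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // avoid busy-spinning if select fails
        }
        // Harvest finished handles and immediately refill the window.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
            $results[$url] = curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            $pending--;
            if ($urls) {
                $add(array_shift($urls)); // keep the thread count constant
            }
        }
    }
    curl_multi_close($mh);
    return $results;
}
```

    Unlike fixed batches, a 60-second timeout here only ties up one slot in the window while the other transfers keep flowing.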
     
    • Thanks Thanks x 1
  6. Packers

    Packers Registered Member

    Joined:
    Jan 31, 2011
    Messages:
    77
    Likes Received:
    7
    So I gave it a shot. I ran about 20 instances of the script and my computer nearly died. I couldn't use it, but it seems to be doing the job. I've managed to SSH in from another computer and it looks like it's slowly but surely getting there...

    Question: how long does it take Scrapebox to go through, say, 5000 URLs on average?

    My script has managed to get through ~3500 URLs on 11 threads (there are some others that I'm not counting, as they were already running before the threading began!) and I've got ~1052 successful links posted in the space of about 3 hours, using an autoapprove list I found in the Downloads section here... Is that good?

    I should add that my script immediately checks whether the link has actually been posted after posting it...
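    A minimal version of that post-check might look like the following. The function name is my own, and the bare `strpos()` match is a simplification - a real commenter would search for the exact markup the blog renders:

```php
<?php
// Hypothetical after-post verification: re-fetch the page and look for
// our URL in the returned HTML. link_is_live() is an illustrative name.
function link_is_live(string $page_url, string $our_url): bool
{
    $ch = curl_init($page_url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $html = curl_exec($ch);
    curl_close($ch);
    // Only count the link if the fetch succeeded and our URL appears.
    return $html !== false && strpos($html, $our_url) !== false;
}
```

    Checking right after posting catches moderated or rejected comments early, so the success count only reflects links that are actually live.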