
If you ever have/had Scrapebox link checker problems/crashes please read this

Discussion in 'Black Hat SEO Tools' started by m3ownz, Jun 26, 2012.

  1. m3ownz

    m3ownz Regular Member

    Joined:
    Dec 12, 2009
    Messages:
    311
    Likes Received:
    135
    OK, so the built-in link checker in SB hangs/crashes for a lot of people.

    I first posted about it 1.5 years ago here:

    http://www.blackhatworld.com/blackh...-checker-kicking-my-cpus-ass.html#post2357016

    A search for "site:blackhatworld.com scrapebox linkchecker crash" returns about 700 results.

    However, when I emailed support (back in 2010), they said that nothing had really changed in the link checker, so it must be something else (bad memory etc.).

    All my VPSs (big and small, different providers) have run Server 2003 or 2008, and all have had the same crash/hang problem. When I test on my desktop (XP), it doesn't crash and CPU usage stays normal. However, as my home DSL is crap (6 Mbps), I can't really test it at speed.

    If you have link checker issues, please post your OS (and VPS provider and spec if you can) here so we might see a pattern.
    In fact, if you don't have these problems, please also post details to compare.
    If this thread gets enough responses, I will at least have enough data to try SB support again.

    I will start with my experiences since that post in 2010:

    xsserver.eu, 512MB RAM, Server 2003 - check links crashes
    xsserver.eu, 1GB RAM, Server 2003 - check links crashes
    burstnet, 1GB RAM, Server 2003 - check links crashes //slow server regardless of SB
    burstnet, 2GB RAM, Server 2008 - check links crashes //slow server regardless of SB
    Berman, 2GB RAM, Server 2008 - check links crashes //apart from link checker mode, this is fast as hell with anything I put on it
    desktop, 4GB RAM, WinXP - no apparent problems, but DSL too slow to really check

    I am also doing a bunch of tests within ScrapeBox to see if I can pinpoint issues.
     
    Last edited: Jun 26, 2012
  2. m3ownz

    m3ownz Regular Member

    OK, so after a bunch of tests, the issue is heavily spammed blogs. Running the link checker on lists that won't contain massive 4,000-comment pages (e.g. forum profiles, wikis etc.) works fine, even at the max of 500 threads. Removing URLs longer than 'xxx' characters (I tested various limits, down to 100) made no difference. Trimming the same crashing list to root before checking avoids the problem (although it obviously won't find any links), presumably because the root pages don't allow comments and so stay small.
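
    For anyone who wants to repeat the trim-to-root test outside ScrapeBox, the trimming step can be sketched in Python as below. This is not ScrapeBox's own trim function; `trim_to_root` and `trim_list_to_root` are hypothetical helper names, and dropping duplicate roots is an assumption made to keep the test list short.

```python
from urllib.parse import urlsplit


def trim_to_root(url: str) -> str:
    """Return scheme://host/ for an absolute URL, dropping path, query and fragment."""
    parts = urlsplit(url.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}/"


def trim_list_to_root(urls):
    """Trim every URL to its root, skip unparsable lines, keep first-seen order."""
    seen = set()
    roots = []
    for url in urls:
        try:
            root = trim_to_root(url)
        except ValueError:
            continue  # skip lines that are not absolute URLs
        if root not in seen:  # assumption: one check per root is enough
            seen.add(root)
            roots.append(root)
    return roots
```

    Feeding the trimmed list to the link checker should then hit only small homepages, which is the comparison described above.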

    I have emailed SB support with full details; hopefully I will get a response.
     
  3. Scritty

    Scritty Elite Member Premium Member

    Joined:
    May 1, 2010
    Messages:
    2,818
    Likes Received:
    4,511
    Occupation:
    Affiliate Marketer
    Location:
    UK
    Nice work.
    Linkchecking is a valuable tool.
    A simple addition that cuts off the links, or breaks the list into several parts, would help.

    At the end of the day, I don't really want to link from a URL that already has 6,000 OBLs on it, regardless of PR, so just the ability to "stop" counting at a certain point would probably suffice.
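
    The "stop counting at a certain point" idea can be illustrated with a small Python sketch. It says nothing about how ScrapeBox parses pages; `CappedLinkCounter` and `count_links_capped` are hypothetical names, the default cap of 500 is arbitrary, and for simplicity it counts every `<a href>` tag rather than only outbound links.

```python
from html.parser import HTMLParser


class _LimitReached(Exception):
    """Raised internally to abandon parsing once the cap is hit."""


class CappedLinkCounter(HTMLParser):
    def __init__(self, limit: int):
        super().__init__()
        self.limit = limit
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Count anchor tags that carry an href attribute.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1
            if self.count >= self.limit:
                raise _LimitReached


def count_links_capped(html: str, limit: int = 500) -> int:
    """Count links in html, giving up as soon as `limit` links have been seen."""
    counter = CappedLinkCounter(limit)
    try:
        counter.feed(html)
    except _LimitReached:
        pass  # the rest of the page is never parsed
    return counter.count
```

    A page that reaches the cap can then be discarded without parsing the remaining thousands of comments.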

    Scritty
     
  4. m3ownz

    m3ownz Regular Member

    Yeah, I don't really want links on these types of sites either; I just want to be able to check a list that might contain these big pages without a crash.
    Something like 'do not parse pages greater than xxx MB', like the option available for the slow poster, should work.

    Why the free link checker does not crash is another issue. I guess it's because it just does a simple 'if string contains' check on the source, whereas the built-in one grabs anchor text and URL as well, perhaps via some kind of regex that requires a lot more overhead. If that is the case, an option to 'use simple link checker' that works just like the free one, but with threads adjustable beyond 100, would be great.
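
    The two ideas in this post, a size cap and a plain 'if string contains' check, can be sketched together in Python. This is only a guess at the general approach, not the code of either ScrapeBox checker; `read_capped` and `simple_link_check` are hypothetical names, and the 2 MB default cap is an arbitrary assumption.

```python
import io


def read_capped(stream, max_bytes, chunk_size=64 * 1024):
    """Read a byte stream in chunks; return None as soon as it exceeds max_bytes."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            return None  # too big: stop reading, nothing gets parsed
        chunks.append(chunk)
    return b"".join(chunks)


def simple_link_check(stream, target, max_bytes=2 * 1024 * 1024):
    """Report 'found', 'not found' or 'skipped' using a plain substring test."""
    body = read_capped(stream, max_bytes)
    if body is None:
        return "skipped"
    # Case-sensitive byte match; no anchor text or URL is extracted.
    return "found" if target.encode("utf-8") in body else "not found"


# Example with an in-memory page standing in for an HTTP response body.
page = io.BytesIO(b'<a href="http://mysite.example/">my anchor</a>')
print(simple_link_check(page, "mysite.example"))
```

    Any object with a `read(n)` method works as the stream, so the same functions could be pointed at a real HTTP response.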

    I have not heard back from SB support yet, but will post when I do.

    It would be great if others could carry out this simple check and post results here: if the link checker crashes or is unstable, try trimming the same list to root and rechecking. It won't find any links (because you didn't post to the homepage), but it should run smoothly.

    This issue has been going on for a while. It would be very nice to get it fixed, and multiple people reporting the same results should ensure it gets looked at.
     
  5. Sweetfunny

    Sweetfunny Jr. VIP Jr. VIP Premium Member

    Joined:
    Jul 13, 2008
    Messages:
    1,749
    Likes Received:
    5,040
    Location:
    ScrapeBox v2.0
    Yes, when you start hitting pages with 4,000 comments, the page can be 5-10MB in size. If you are checking a heavily spammed AA list with a lot of those pages and are running, say, 50 connections, it's a huge amount of data (50 x 10MB = 1/2 GB) that the sockets are trying to download and parse. If you opened a 1/2 GB file in Notepad, it would probably hang right away, before you even tried to search it for links/text, especially on a 512MB VPS.
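
    The arithmetic above can be written out as a short Python sketch. It models only the worst case, where every connection holds a full page in memory at once; the function names are hypothetical and the figures are the ones quoted in this post.

```python
def worst_case_buffer_mb(connections, page_mb):
    """Upper bound on page data in memory if every connection holds a full page."""
    return connections * page_mb


def fraction_of_ram(buffer_mb, ram_mb):
    """Share of total RAM that the buffered pages would occupy."""
    return buffer_mb / ram_mb


# Figures from the post: 50 connections, 10 MB pages, 512 MB VPS.
buffer_mb = worst_case_buffer_mb(50, 10)
print(buffer_mb)  # 500 (MB), roughly 1/2 GB
print(fraction_of_ram(buffer_mb, 512) > 0.9)  # True: nearly all of a 512 MB VPS
```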

    Do you have a list you can upload to Mediafire or somewhere, so that I can run it and pinpoint the exact problem or URL?
     
  6. m3ownz

    m3ownz Regular Member

    Thanks a lot for replying. I included a sample list with my email to support, and will PM you the same.

    I completely understand that processing massive HTML files is resource intensive; my question is what the free standalone link checker is (or isn't) doing to avoid the issue.

    I am running that sample list now. In a few minutes I will post the time taken in the built-in link checker vs the external one, plus resources used and links found.

    EDIT: OK, so you don't seem to accept PMs, so I'll post it here; it's hardly a secret: http://pastebin.com/zavb2rCY (just a sample list of 3k URLs).
    This is the list I'm testing now.
    So far, the built-in checker has taken 24 minutes and, according to the GUI the last time it refreshed, had processed 1,300 URLs. CPU is hitting 50% most of the time (i.e. 100% of one core), and memory is at 30% of 2GB.
     
    Last edited: Jun 28, 2012
  7. m3ownz

    m3ownz Regular Member

    Ok, so an update.

    I ran this test list through the inbuilt checker at 100 threads, max timeout.
    It took 72 minutes to complete (despite SB displaying a time of 8:44; see the Windows clock in the pics) and found 1,191 links.
    CPU spikes up to 50% (100% of one core) throughout, and the GUI freezes most of the time.

    http://i1158.photobucket.com/albums/p603/m3ownz22/inbuilt_1.jpg
    http://i1158.photobucket.com/albums/p603/m3ownz22/inbuilt_2.jpg
    http://i1158.photobucket.com/albums/p603/m3ownz22/inbuilt_3.jpg

    Then I ran the free link checker at 100 threads.
    It took 12 minutes and found 2,074 links. CPU spikes a few times, but stays mostly within the 20-40% range. The GUI remains responsive.

    http://i1158.photobucket.com/albums/p603/m3ownz22/free_1.jpg
    http://i1158.photobucket.com/albums/p603/m3ownz22/free_2.jpg

    So the inbuilt checker is doing something different from the free one.

    This test result is much the same as others I have done on other VPSs from different providers, so I doubt it's a Windows issue.
    Even if it is a Windows bug, the inbuilt checker has to be doing something different from the free checker to trigger it.

    I appreciate you looking into this Sweetfunny.
     
  8. Z0mbie

    Z0mbie Regular Member

    Joined:
    Jun 24, 2012
    Messages:
    339
    Likes Received:
    151
    m3ownz, have you ever used the sick marketing link checker?
     
  9. m3ownz

    m3ownz Regular Member

    Yes, at one point. Do you have a question about it, or were you suggesting it as an alternative?
    If the latter, thank you, but my aim with this thread is to diagnose the inbuilt link checker, not find something else.
     
  10. Z0mbie

    Z0mbie Regular Member

    I haven't used it yet, as I haven't done any mass link checking since I heard about it, but I'm curious how it compares to the inbuilt and standalone link checkers.
     
  11. dowser

    dowser Power Member

    Joined:
    Jun 5, 2011
    Messages:
    685
    Likes Received:
    122
    Location:
    canada
    It's a good point, it used to drive me bananas!

    I don't use huge lists any more, and even got rid of the VPS, but I used to use the malware addon to get rid of many URLs that would cause a crash, and it helped in many cases.

    Let's hope there is a relatively simple solution to that.

    Are you happy with your current VPS and, if so, would you mind sharing? I've been without one for a month and I'm already missing it :)
     
  12. m3ownz

    m3ownz Regular Member

    OK, well, for me the inbuilt checker seems to die a horrible death if the list has spammed blogs.

    Comparing Sick vs SB Free, they are much the same; the SB one has a nicer GUI and better export options, so you may as well use that.

    Yes, I am looking forward to getting to the bottom of this. The malware check does not seem to help for me (though why a sockets-based program would be affected by malware in the first place is another question altogether), and the example posted above tests a tiny 3K list, so it's not a list size issue.
    It does look like an HTML size issue, but I'm sure Sweetfunny will get to the bottom of it.

    Regarding VPSs, I'll PM you.
     
  13. m3ownz

    m3ownz Regular Member

    Carried out a few more tests on other lists, and the results are the same: spammed lists cause the inbuilt checker to be unstable and incredibly slow, even at a few threads.

    Looking forward to hearing back from SB support/Sweetfunny.
     
  14. m3ownz

    m3ownz Regular Member

    Have been in contact with SB support. They can't replicate this issue, which is making it hard to diagnose, but they are being very thorough, so it's looking promising.

    This does lead back to my first post about this issue only affecting some setups, so I would still appreciate it if any BHW users could post their experiences.
     
  15. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,441
    Likes Received:
    1,829
    Gender:
    Male
    Just to throw my 2 cents in.

    I use multiple dedicated servers.

    Some are quad core with 8GB of memory or so; some are dual quad cores with 12-24GB of memory. They are in various countries, as well as various locations within any given country.

    What I have seen over the past couple years is this - randomness.

    So let me explain. With any given list that I link check there are lots of factors that come into play, such as:

    Data center network congestion/DNS congestion
    My server load
    Size of list (number of urls)
    Size of the actual urls when they are loaded (actual page size)
    Physical location of the urls being loaded vs the physical location of my servers
    Gravitational pull of the moon in conjunction with the migration of the hump back whales (not really) :D


    I have, at times, link checked during off-peak hours, and it will literally FLY thru them. Set it at 200 connections and it runs lightning fast. Do the same thing at another time and it's dog slow. If, say, I happen to be scraping a list of URLs for terms that specifically apply to Sweden, then it's "probable" that the sites are going to be Swedish sites hosted in Sweden. If I have a server in Sweden, then I am going to get faster results than if I have a server in Australia.

    If I am building an auto approve list and I have come across a lot of fresh domains, then link checking goes quicker than if I happen to be link checking an auto approve list I found for free on BHW, as the free BHW one is likely spammed beyond belief, so it's going to take forever for the pages to load.

    Also, sometimes when I start the link checker, I can jump to another window and then come back and it will come right back up, or come back in 3 hours etc. This typically corresponds to when it's flying along. When it's going slow, it is typically running fine in the background, but for Windows to drudge it up and paint the screen, especially over a remote desktop connection, can take 10-15 minutes. I mean, literally, I might click the icon in the start bar for the running link checker, go back to my work on my other monitor, and 10 minutes later it suddenly pops up and shows me its status.

    I consider this normal, because so much is going on in the background that Windows just doesn't have the resources to "show" you what's going on; it's putting everything into the actual processing of the link checking. When I do a link check, I hit go and expect it to be fine. 98% of the time it is, and I just leave it, even when it won't come up. It finishes normally and when it's done, the window is usable again, like normal. But while it's running, it's basically unusable. The other 2% of the time, Windows is stupid or life happens and it crashes/hangs indefinitely.

    When I link check multiple hundreds of thousands of URLs, I hit start, come back like 3 days later, and it's all good. But during those 3 days, it wasn't viewable. You just have to get used to gauging how big a list you're loading in and how long, on average, a list like that takes. Then come back and check it out about that time. If it's been a LOT longer than that time frame, I kill the process and start over. If it happens on the same list twice, then I chop the list into pieces and try it. If it's still failing, then there is something bad in the list. This is when I remove URLs with a % in them and remove URLs longer than 250 characters, and most of the time it works fine.
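
    The clean-up routine described above (drop URLs containing a %, drop URLs over 250 characters, chop the list into pieces) could be scripted roughly like this in Python. The helper names and the chunk size in the example are hypothetical; only the % rule and the 250-character limit come from the post.

```python
def drop_suspect_urls(urls, max_len=250):
    """Keep only URLs with no '%' in them and at most max_len characters."""
    return [u for u in urls if "%" not in u and len(u) <= max_len]


def split_into_chunks(items, chunk_size):
    """Chop a list into consecutive pieces of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


# Example: clean a small list, then split it into pieces of two URLs each.
urls = [
    "http://example.com/post-1",
    "http://example.com/tag/%E2%9C%93",     # contains a %
    "http://example.com/" + "a" * 300,      # longer than 250 characters
    "http://example.org/post-2",
    "http://example.net/post-3",
]
for piece in split_into_chunks(drop_suspect_urls(urls), 2):
    print(piece)
```

    Each piece can then be run through the link checker separately, so a piece that still hangs narrows down where the bad URL is.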

    Dunno if that's helpful, but that's what I got.
     
  16. m3ownz

    m3ownz Regular Member

    Thanks for posting your experiences, Matt. That you have (sometimes) experienced link checker death on machines with that much muscle at least confirms it's not just my bad luck picking dodgy servers.
    I wonder if you have done any checks with the free independent checker? That was the main thing for me: the free one performed so much better in this test (and in other spammed blog list tests), e.g. 12 minutes vs 72 minutes.

    Side note: I see your Lion list is sold out; looking forward to getting the next one.

    This may all soon be irrelevant anyway, as SB support have been excellent as usual and I am currently testing a pretty fantastic update, so watch this space!
     
    Last edited: Jul 2, 2012
  17. loopline

    loopline Jr. VIP Jr. VIP

    I actually have never done any large-scale testing with the free link checker. I got ScrapeBox only a few months after it came out, and I didn't find out about the free link checker until after I had already purchased ScrapeBox. So I just always used the built-in one, as I figured it's more than likely the same code anyway. I also like to turn the threads up higher than the free one will go.

    Sweetfunny is always awesome and puts their heart into the software. They have always only made the program better with any feedback I have given, so I am sure they will make it as optimal as possible.

    As for the list, I am actually going to link check the last round of the next list shortly. ;) So I should hopefully have it out today or, worst case, tomorrow.
     
    Last edited: Jul 2, 2012