
Why can't scrapebox be more like hrefer?

Discussion in 'Black Hat SEO Tools' started by justenjoy, Nov 30, 2012.

  1. justenjoy

    justenjoy Registered Member

    Joined:
    Feb 22, 2008
    Messages:
    50
    Likes Received:
    26
    I don't use scrapebox that much, but when I do I miss some of the superior features that hrefer has, like advanced filtering, removing duplicates and finding new proxies on the fly. I only find SB good for scraping a few k's of sites at a time; after removing duplicates you have to restart everything, while hrefer can run indefinitely. Or perhaps I'm doing something wrong. I'm only using public proxies for scraping.

    Even then, why won't the SB devs implement those features?

    Please share your experience.
     
  2. audioguy

    audioguy Power Member

    Joined:
    Jun 12, 2010
    Messages:
    609
    Likes Received:
    224
    Location:
    Anywhere in the world building WP sites.
    Yeah, experiencing the same thing. I underestimated how long a harvesting session would take. So I ran it, and now, after 7 days, it's still running, having fetched more than 10 million URLs so far. If I cancel, I have to start all over again.

    And because I use public proxies, I can only presume the rate will get slower and slower. There's no indication at all of where I am in the list of keywords and footprints. It seems like I just have to wait a couple of days.

    If SB could refresh proxies on the fly, even if I had to do it manually and paste them into a file, then it could run as long as I want it to without slowing down. An indicator of where I am in the list would also really help.

    Or is there any way I can find out?
     
  3. justenjoy

    justenjoy Registered Member

    Joined:
    Feb 22, 2008
    Messages:
    50
    Likes Received:
    26
    Yes, uploading new proxies to a file would be a big help. Some proxy scrapers can do that easily, so it shouldn't be a problem to automate.

    I wonder how many duplicates you have in there. Yesterday I scraped a list of 1kk results, but after removing duplicates I ended up with only 2k domains. Of course you can get more depending on the footprints you're using.
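
    Just to illustrate the concept, here's a rough Python sketch of that kind of domain-level dedup (the file names are only placeholders, assuming one harvested URL per line):

    Code:
    # Collapse a harvested URL list down to unique domains.
    # "harvested_urls.txt" / "unique_domains.txt" are placeholder file names.
    from urllib.parse import urlparse

    seen = set()
    with open("harvested_urls.txt") as infile, open("unique_domains.txt", "w") as outfile:
        for line in infile:
            url = line.strip()
            if not url:
                continue
            domain = urlparse(url).netloc.lower()
            if domain and domain not in seen:
                seen.add(domain)
                outfile.write(domain + "\n")

    print(len(seen), "unique domains")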
     
  4. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,383
    Likes Received:
    1,801
    Gender:
    Male
    Home Page:
    It's quite funny, I find hrefer to be quite inferior when compared to scrapebox. I know a LOT of people who use scrapebox to scrape and xrumer to post, which is what I do myself. Hrefer, while it does have some features, such as getting proxies on the fly and adding engines, lacks many others. It won't support several elements that I use when scraping with advanced footprints, and it lacks granular control on many filtering options.

    I suppose it boils down to what you use it for.

    In your case, there isn't a "grab new proxies" option per se; however, you can stop it any time you want and refresh the proxies. When you stop it, simply export the uncompleted keywords, refresh your proxies, then reimport the uncompleted keywords.

    Yes with public proxies, the majority die in 12-36 hours. I use private proxies for the majority of my scraping, and can scrape for a week straight at the same rate. With 26 proxies I can scrape well over 100 million urls in a week.

    There is no "completion" indicator of where you are in the list, but by exporting the failed keywords you can just come back and reimport them. If you wanted to spend a few bucks a month and just get some private proxies, you could set your connections to 20% of your proxy count and harvest for days straight, generally without ever getting your IPs banned. Also, even a few private proxies are considerably faster than a LOT of public proxies. Even when I've used hrefer, I still use private proxies and avoid the mess and time wasted testing and messing with public proxies. The only time I use public proxies is when I have all my private ones maxed out and I just need to do a quick scrape.

    To each his own; some people swear by scraping with public proxies, some by private, some people swear by hrefer, others by scrapebox, and others by something else still. I myself love granular control, and hrefer simply does not offer that level of control like scrapebox does. Most people who use hrefer only want to scrape to use it with xrumer, but I scrape for all manner of uses in scrapebox, xrumer and lots of other functions outside of building links.

    Curious though, if you like hrefer better, why not just use hrefer?


    What advanced filtering are you talking about? Scrapebox can filter in every way hrefer can, just not on the fly; it happens afterwards. But you can scrape "endlessly" with scrapebox, as endlessly as you want anyway. Like I can, and have many times, scraped up a hundred million URLs or more. Then I'll remove duplicates and filter them how I like in scrapebox.

    I'll admit I do have some custom apps that I've made for specific instances that neither scrapebox nor hrefer directly supports, simply because I run them in the cloud and you don't get that flexibility with a Windows app. On those I do some nice filtering on the command line, but otherwise I've not really found anything that can't be filtered with sbox (except a few filters I built that use RegEx).
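
    For what it's worth, that command-line filtering is nothing fancy; roughly this kind of thing, sketched in Python (the pattern and file names are just examples, not my actual filters):

    Code:
    # Keep only URLs that match a footprint-style RegEx (example pattern only).
    import re

    pattern = re.compile(r"/(guestbook|comment|member)\b", re.IGNORECASE)

    with open("harvested_urls.txt") as infile, open("filtered_urls.txt", "w") as outfile:
        for line in infile:
            url = line.strip()
            if url and pattern.search(url):
                outfile.write(url + "\n")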

    It doesn't find new proxies on the fly, but I use private proxies, so I never have to mess with public proxies. You could use the Automator plugin for this "concept" though: take the scrapes you want and build them into batches.

    So say you wanted to scrape 100,000 keywords. You could break that into 25K chunks (just picking numbers), then set the Automator to test proxies, then harvest. Then after the 25K are done, test proxies again and harvest the next 25K. By doing this you could work through your entire list and have it getting new proxies after every X keywords.

    Make sense? Sort of a workaround, but "essentially" the same end result.
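
    If it helps, here's the same batching concept as a rough Python sketch (test_proxies and harvest are just stand-ins for the Automator's test/harvest steps, not a real scrapebox API):

    Code:
    # Conceptual only: refresh/test proxies, harvest a chunk, repeat until done.
    CHUNK_SIZE = 25_000  # just picking numbers, same as above

    def test_proxies():
        """Placeholder for the Automator's proxy test/refresh step."""

    def harvest(keywords):
        """Placeholder for the Automator's harvest step on one chunk."""

    with open("keywords.txt") as f:
        keywords = [line.strip() for line in f if line.strip()]

    for start in range(0, len(keywords), CHUNK_SIZE):
        chunk = keywords[start:start + CHUNK_SIZE]
        test_proxies()  # fresh proxies before every chunk
        harvest(chunk)  # then harvest the next X keywords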

    I mean, I don't see any "features" other than proxies on the fly that hrefer can do that sbox can't. Filtering and such happen post-scrape rather than during the scrape, but the resulting URLs are the same (assuming the same filters are applied). What filtering are you doing? If you can define it clearly, I might be able to help you out.

    Xrumer is a phenomenal program, cleverly developed over the years; I don't think most would disagree. Hrefer, on the other hand, I simply hate, and I think that, especially compared to the immense flexibility of xrumer, hrefer lacks significant flexibility.
     
    • Thanks x 2
  5. Mex-deluxe

    Mex-deluxe Regular Member

    Joined:
    May 24, 2010
    Messages:
    252
    Likes Received:
    29
    With the latest update, Scrapebox is now more like Hrefer.