How does Scrapebox compare to domcop and RC for finding expired domains?

Discussion in 'Black Hat SEO' started by Donald Trump, Jun 11, 2016.

  1. Donald Trump

    Donald Trump Registered Member

    Joined:
    May 15, 2016
    Messages:
    98
    Likes Received:
    13
    I've been digging through the scraps at DomCop and Register Compass and found a few decent ones, but I'm about to buy Scrapebox if it will make this process a lot easier.

    Is Scrapebox just going to turn up domains that are already on DomCop? Or are there so many out there that this never happens?
     
  2. loopline

    loopline Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,727
    Likes Received:
    1,995
    Gender:
    Male
    Home Page:
    Well, it might turn up those domains, but Scrapebox is probably more like the "back end" of the places you're talking about. Scrapebox goes out and does the actual crawling through websites, checking for available domains, checking metrics and displaying them.

    Scrapebox is NOT going to just scrape the front end of a bunch of sites like Register Compass and display those results to you. Scrapebox is an actual crawler.

    I have a video here:
     
    • Thanks x 3
  3. blackbeans

    blackbeans Jr. VIP

    Joined:
    Nov 29, 2008
    Messages:
    1,366
    Likes Received:
    239
    Occupation:
    Your Secret Weapon
    Home Page:
    Scrapebox is not just going to show you stuff that you can find on DomCop.

    Scrapebox is more powerful than that.

    You have to remember that people who use Scrapebox have specific niche targets.

    So if you're looking for a very specific range of niches and you know the top domains or authority sites in your niche, Scrapebox is the way to go.

    With DomCop, on the other hand, the domains that show up are more of a hit-or-miss situation, because they focus primarily on domains that have the strongest backlink-quality indicators or demand indicators.

    While there's nothing necessarily wrong with that form of filtering, people who are looking to build niche-specific PBNs are willing to sacrifice a certain amount of desirable domain metrics in the interest of niche specificity.

    That's not to say they're going to throw metrics out completely.

    Instead, they're shooting for something that's more middle of the road.
     
  4. monetaryguy

    monetaryguy Newbie

    Joined:
    Jun 14, 2016
    Messages:
    7
    Likes Received:
    1
    Gender:
    Male
    If you do not already have Scrapebox, you are missing out on a lot more than simply an expired domain scraper. Get it and you will be able to simplify a lot of SEO-related tasks.
     
    • Thanks x 1
  5. Cshark

    Cshark Jr. VIP Premium Member

    Joined:
    Feb 25, 2011
    Messages:
    1,361
    Likes Received:
    184
    Gender:
    Male
    Occupation:
    Grinding
    Location:
    NYC
    If you have Scrapebox, you have all of the rest and more.
     
  6. immaletyoufinish

    immaletyoufinish Regular Member

    Joined:
    Mar 3, 2016
    Messages:
    219
    Likes Received:
    111
    It easily finds thousands of crappy, low-metric, zero-backlink expired domains. It also finds some good-metric domains on the .gr TLD or other TLDs that you either can't register or that are much more expensive than .com.

    It also doesn't keep a record of what it's already seen, so as a scraper it's highly inefficient.

    But... I guess it works.
     
  7. BreaknBrix

    BreaknBrix Power Member

    Joined:
    Mar 25, 2014
    Messages:
    756
    Likes Received:
    4,333
    Location:
    NE US
    I use Scrapebox, Domain Hunter Gatherer + Ahrefs & Majestic.

    Scrapebox to create my seed list of authority domains; one of the most important parts is finding good sites to scrape. Then DHG to scrape the domains and filter by metrics. Then Ahrefs to manually check backlinks (and Majestic to a lesser extent).
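    If you'd rather roll that seed-list step yourself than lean on DHG, the core of it is only a few lines of Python. This is just a sketch: the seed URL is a placeholder, and the DNS check is only a cheap first pass (an NXDOMAIN doesn't prove a domain is registrable, so you'd still confirm with WHOIS or your registrar before buying anything).

    # Rough sketch of "crawl seeds -> extract outbound domains -> pre-check
    # availability". Needs: pip install requests beautifulsoup4 (Python 3.9+)
    import socket
    from urllib.parse import urlparse

    import requests
    from bs4 import BeautifulSoup

    SEEDS = ["https://example.com/resources"]  # placeholder authority seed pages

    def outbound_domains(url):
        """Fetch a page and return the set of external domains it links to."""
        html = requests.get(url, timeout=15).text
        own = urlparse(url).netloc.lower().removeprefix("www.")
        found = set()
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            host = urlparse(a["href"]).netloc.lower().removeprefix("www.")
            if host and host != own:
                found.add(host)
        return found

    def maybe_available(domain):
        """Cheap first pass: DNS failing to resolve *suggests* the domain has
        dropped. Not proof -- confirm with WHOIS before buying."""
        try:
            socket.getaddrinfo(domain, 80)
            return False
        except socket.gaierror:
            return True

    for seed in SEEDS:
        for d in outbound_domains(seed):
            if maybe_available(d):
                print("candidate:", d)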
     
    • Thanks x 1
  8. Sweetfunny

    Sweetfunny Jr. VIP

    Joined:
    Jul 13, 2008
    Messages:
    1,785
    Likes Received:
    5,067
    Location:
    ScrapeBox v2.0
    Home Page:
    There are metric filters you can use to filter out low-quality domains if you don't want them. The plugin will find whatever expired domains exist on the target pages or domains you tell it to crawl. Or are you trying to say there are no quality expired domains on the whole internet?

    Again, there are filters you can use to return only the domain extensions you are interested in. Some people actually want to find .gr domains, so the tool supports them along with hundreds of other TLDs. If you personally don't want those, simply filter them out: you can return just com/net/org by adding those extensions to the realtime filter. Perhaps take a look at the tutorial video Loopline made to see how it works:

     
  9. immaletyoufinish

    immaletyoufinish Regular Member

    Joined:
    Mar 3, 2016
    Messages:
    219
    Likes Received:
    111
    I know. I'm trying to say that at first it looks kinda awesome, but if you look at the number of good-quality expired .coms versus the number of good-quality non-.com TLDs, you realize this is pretty saturated and it's slim pickings without crawling a very large number of pages for days. Now, I don't mind doing that (and I pretty much just leave it running), but what bothers me is that the algorithm is very naive and doesn't use a database to keep track of what it's already seen, so it is actually incredibly wasteful as a tool. Hence my frustration with it. Hands down this is the best method IMO, but a far more efficient approach to scraping can be taken, and it would result in a significantly better yield.

    It's not too terrible for me though, because I target a non-English-speaking market, and my TLD, though more expensive than .com, has significantly more quality stuff just waiting to be found.
     
  10. Donald Trump

    Donald Trump Registered Member

    Joined:
    May 15, 2016
    Messages:
    98
    Likes Received:
    13
    loopline, is there any reason to use proxies if I'm only harvesting from one search term? Won't it just search once and pull everything from that one SERP?

    I've been having trouble using the proxies from the proxy harvester.
     
  11. s14blackhat

    s14blackhat Registered Member

    Joined:
    May 16, 2014
    Messages:
    62
    Likes Received:
    4
    I don't use proxies with the Scrapebox expired domain plugin.
     
  12. bdenzer

    bdenzer BANNED

    Joined:
    Dec 23, 2014
    Messages:
    35
    Likes Received:
    0
    It easily finds thousands of zero-backlink expired domains.
     
  13. loopline

    loopline Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,727
    Likes Received:
    1,995
    Gender:
    Male
    Home Page:
    Expired domains change every minute. If it used a database and said "OK, I checked URL X today," and then came across it tomorrow and didn't load it again, URL X might link to the best expired domain you ever wanted, and you would miss it because of the "efficiency" of the database. So how often would you purge the database? Every day? Every 12 hours? Then what's the point of having one in the first place?

    I've come across many excellent domains that I went to have a look at and they had been registered literally that day. It's a competitive, hyper-changing marketplace. So yeah, it takes a lot of vigilance to get great domains, but there is an excellent quote by someone whose name escapes me, a great speaker/author, who said: "The troubles of your job account for half your pay, because if it weren't for them, someone could be found to do what you're doing for half what you're being paid."

    So it goes with expired domains: if it weren't for the trouble of finding them and always being on it, scanning and rescanning pages that have already been scanned, then everyone would be selling them everywhere, and Google would figure out a way to ignore them easily, and they would be worthless.

    I think it would be exceedingly inefficient to use a database, because that would massively increase the risk that I would miss valuable domains. The landscape just changes too quickly for a database to even be viable. Not to mention it's a waste of resources as well.

    If you're only scraping one search term, no, there is no point in using proxies, unless it's a massively complex one; you can get your IP banned in a browser in a few requests if you start running crazy queries loaded with advanced operators. But otherwise no, try it with no proxies. Worst case you get your IP banned; in 48 hours it's unbanned, and if it's banned in a browser, just enter the captcha until it's unbanned and move on. Then you will know.
     
  14. henryw1981

    henryw1981 BANNED

    Joined:
    May 30, 2015
    Messages:
    19
    Likes Received:
    0
    I don't use proxies with the expired domain plugin.
     
  15. speedylikesKJ

    speedylikesKJ BANNED

    Joined:
    Aug 18, 2015
    Messages:
    17
    Likes Received:
    0
    Is it safe to use the proxies from the proxy harvester?
     
  16. immaletyoufinish

    immaletyoufinish Regular Member

    Joined:
    Mar 3, 2016
    Messages:
    219
    Likes Received:
    111
    Completely and utterly incorrect.

    Domains go through a cycle: once a domain is registered and then not renewed, it may go to auction or get dropped, but for as long as it is registered you can't register it, even if it's not being used. So we can use the WHOIS registration and expiration dates and check only at the relevant time.
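    If you want to script that timing check, here's a minimal sketch using the python-whois package (assumes pip install python-whois; field names vary by registrar, and expiration_date sometimes comes back as a list, so treat it as a starting point, not gospel).

    # Minimal WHOIS timing check. Assumes: pip install python-whois
    from datetime import datetime

    import whois  # the python-whois package

    def expiry_of(domain):
        """Return the domain's expiration date, or None if the lookup fails
        or is empty (which often -- not always -- means it has dropped)."""
        try:
            w = whois.whois(domain)
        except Exception:  # the library raises on some unregistered domains
            return None
        exp = w.expiration_date
        if isinstance(exp, list):  # some registrars return several dates
            exp = exp[0]
        return exp

    exp = expiry_of("example.com")
    if exp is None:
        print("no expiry on record -- worth checking availability now")
    elif exp < datetime.now():
        print("past expiry -- may be in grace/redemption, recheck soon")
    else:
        print("registered until", exp, "-- no point rechecking before then")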

    The first time you encounter a domain, if it's expired and the metrics aren't even good, the backlinks likely won't be good either, so you can update the database to put it on the blacklist and never waste resources looking at it again. This is probably the number one offense SB commits by not using a database. Sure, you could use a filter, but that doesn't stop Scrapebox from rechecking the domain's availability and metrics; it simply stops SB from reporting that it found it, because it didn't match the criteria. The amount of effort wasted here is PHENOMENAL, especially if you are starting from the same seed pages (say Wikipedia). It cannot be overstated how much of a colossal waste of effort this is.

    Next, if the domain is not expired, you can check the WHOIS, store it in the database so you know when it is due to expire, and have a daily run check everything encountered so far that has just expired that day. If it has been renewed, record the new expiration date in the database; if it has since become available, re-check the metrics. Manually rechecking this hundreds of times per domain across hundreds of thousands of domains adds up fast. Those cycles should be spent crawling millions of new pages to find more expired domains.
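    Sketched out, that's one small table and a handful of queries. To be clear, the schema and names below are mine and purely illustrative -- this is not how SB or any other tool works internally.

    # Sketch of the domain-state database described above. Plain SQLite;
    # schema and names are illustrative, not from any existing tool.
    import sqlite3
    from datetime import date

    db = sqlite3.connect("domains.db")
    db.execute("""CREATE TABLE IF NOT EXISTS domains (
        domain     TEXT PRIMARY KEY,
        status     TEXT,   -- 'blacklisted' or 'registered'
        expires_on TEXT    -- ISO date from WHOIS, NULL if unknown
    )""")

    def blacklist(domain):
        """Expired + bad metrics -> never spend cycles on it again."""
        db.execute("INSERT OR REPLACE INTO domains VALUES (?, 'blacklisted', NULL)",
                   (domain,))
        db.commit()

    def record_live(domain, expires_on):
        """Live domain -> remember when it's worth rechecking."""
        db.execute("INSERT OR REPLACE INTO domains VALUES (?, 'registered', ?)",
                   (domain, expires_on.isoformat()))
        db.commit()

    def seen_before(domain):
        """Skip anything already in the table -- this is the saved effort."""
        return db.execute("SELECT 1 FROM domains WHERE domain = ?",
                          (domain,)).fetchone() is not None

    def due_for_recheck():
        """The daily run: live domains whose registration lapses by today."""
        today = date.today().isoformat()
        rows = db.execute("SELECT domain FROM domains "
                          "WHERE status = 'registered' AND expires_on <= ?",
                          (today,))
        return [r[0] for r in rows]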

    The objectives of using a database are threefold.

    1. To only expend the effort once to check the metrics of expired domains.

    2. To keep a record of when live domains are set to expire, so we expend the effort to see if they have expired only once, at the right time.

    3. To keep track of all current domains that return an HTTP 200, determine how frequently they update their content, and, once this frequency has been established, check back for new content containing fresh links at the appropriate interval.

    Number 3 is what you actually want to achieve: not missing out on scooping up fresh links that may eventually turn into expired domains (assuming a domain that a fresh article links to won't expire for a minimum of one year). Other than that, if we have seen the domain anywhere, we will have its expiration date in the database, and seeing as only the current owner can re-register it before that period is up, there is only a point in rechecking it after the expiration date.
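    Number 3 boils down to an adaptive revisit interval. Here's a naive sketch of one way to schedule it (hashing the body and halving/doubling the interval is just the simplest policy that works, not a claim about how any real crawler does it):

    # Naive adaptive-revisit policy: probe a page, compare a hash of its
    # content to last time, and tighten or widen the revisit interval.
    import hashlib
    from datetime import datetime, timedelta

    import requests

    class PageWatch:
        def __init__(self, url, interval=timedelta(days=7)):
            self.url = url
            self.interval = interval            # current revisit interval
            self.next_visit = datetime.now()    # due immediately on creation
            self.last_hash = None

        def probe(self):
            if datetime.now() < self.next_visit:
                return  # not due yet -- spend the cycles crawling new pages
            body = requests.get(self.url, timeout=15).content
            h = hashlib.sha256(body).hexdigest()
            if h != self.last_hash:
                # content changed: fresh links may have appeared; come back sooner
                self.interval = max(self.interval / 2, timedelta(hours=12))
                self.last_hash = h
                # ... extract and queue the new outbound links here ...
            else:
                # nothing new: back off, up to a cap
                self.interval = min(self.interval * 2, timedelta(days=60))
            self.next_visit = datetime.now() + self.interval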

    So... a database is actually anywhere between 30x and a trillion times more efficient than starting from a single seed page and spidering with a naive algorithm, re-checking the 80% useless shit we have already seen 10x before, on a single machine, keeping everything in memory until we have done 8 million pages, at which point we run out of memory and crash.

    That last part really gets me. It always crashes for me after around 6 million pages, because not using a database means it has to rely solely on RAM, which is far more limited.
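    And the fix for that is trivial: keep the visited set on disk instead of in RAM. Even a bare SQLite table does it (again, the names here are mine, just a sketch):

    # Disk-backed visited set: crawl dedup lives in SQLite instead of RAM,
    # so memory use stays flat no matter how many pages are crawled.
    import sqlite3

    db = sqlite3.connect("visited.db")
    db.execute("CREATE TABLE IF NOT EXISTS visited (url TEXT PRIMARY KEY)")

    def first_visit(url):
        """Returns True the first time a URL is seen, False on repeats."""
        try:
            db.execute("INSERT INTO visited VALUES (?)", (url,))
            db.commit()
            return True
        except sqlite3.IntegrityError:  # already in the table
            return False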

    Your grievances with this strategy are actually a non-issue. Google indexes tens of trillions of unique URLs, and guess what? They're not doing it without databases.
     
  17. loopline

    loopline Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,727
    Likes Received:
    1,995
    Gender:
    Male
    Home Page:
    I'm not going to even try to go into the massive number of potential problems and bottlenecks this would create. I simply don't have the time, and talking about it would just be a waste of time.

    That said, on the actual function of what does exist: I have no problem getting into millions and millions of pages, much more than 8 million. Does it actually say "out of memory" on the crash, or something else?
     
  18. Donald Trump

    Donald Trump Registered Member

    Joined:
    May 15, 2016
    Messages:
    98
    Likes Received:
    13
    So besides automating the process of moving URL lists back and forth from the harvester to the link extractor, and also checking domain availability and filtering by metrics, I can still do this without the plugin and without proxies, right?
     
  19. Donald Trump

    Donald Trump Registered Member

    Joined:
    May 15, 2016
    Messages:
    98
    Likes Received:
    13
    lol, got IP blocked. This was from one search term at a time, too. In the harvester configuration settings, testing Google brings up a 503 Service Unavailable error, and when I copy and paste the Scrapebox search query into a browser it brings up a captcha.

    I guess I'll use Yahoo and Bing for a couple of days.

    Will Google ban my IP faster now? And how can I avoid this if I wasn't already using more than one search term at a time?
     
  20. Rezky

    Rezky Registered Member

    Joined:
    May 2, 2015
    Messages:
    73
    Likes Received:
    3
    If you have Scrapebox, you have all of the rest and more....