
What type of hardware infrastructure do ahrefs, SEMRush, Moz and MajesticSEO use?

Discussion in 'Black Hat SEO' started by seodeal, Jun 30, 2014.

  1. seodeal

    I want to know what kind of hardware these backlink reporting services use. I looked through their FAQs, and here are some figures that might hint at the infrastructure:

    ahrefs claims that its crawler can index up to 6 billion pages every 24 hours.
    Moz claims that it has a database of 119 billion links/URLs.
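To put those two claims in perspective, here's a quick back-of-envelope calculation in Python. The crawl rate is plain arithmetic on the ahrefs figure. The storage figure depends entirely on BYTES_PER_LINK, which is a made-up assumption (roughly two URL IDs plus some anchor text), not anything ahrefs or Moz has published.

```python
# Back-of-envelope throughput and storage estimates for the figures quoted above.
# BYTES_PER_LINK is a hypothetical assumption, not a number from ahrefs or Moz.

SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

AHREFS_PAGES_PER_DAY = 6_000_000_000     # "up to 6 billion pages every 24 hours"
MOZ_LINKS = 119_000_000_000              # "119 billion links/URLs"
BYTES_PER_LINK = 100                     # hypothetical: source id, target id, anchor text, flags


def pages_per_second(pages_per_day: int) -> float:
    """Average sustained crawl rate needed to reach a daily page count."""
    return pages_per_day / SECONDS_PER_DAY


def storage_terabytes(num_links: int, bytes_per_link: int) -> float:
    """Raw storage for the link data, ignoring indexes, replication and compression."""
    return num_links * bytes_per_link / 1e12


if __name__ == "__main__":
    print(f"ahrefs crawl rate: ~{pages_per_second(AHREFS_PAGES_PER_DAY):,.0f} pages/s")
    print(f"Moz link store: ~{storage_terabytes(MOZ_LINKS, BYTES_PER_LINK):,.1f} TB raw")
```

Spread evenly over a day, 6 billion pages works out to roughly 69,000 pages per second, and at the assumed 100 bytes per link the Moz figure comes to about 12 TB of raw link data before indexes or replication. That's the kind of load you'd normally spread across many machines.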

    Do you think they use supercomputers?

    Also, can anybody tell me the monthly cost of maintaining this kind of hardware infrastructure?

    Thanks
     
  2. jasperq

    Rand Fishkin answered a Quora question about the running cost of SEOmoz's Linkscape: $700k per month since Aug 2012.

    It runs on Amazon AWS, so it's lots of virtual machines, each specialised for one part of their infrastructure. They also use spot instances, which essentially give you cheap spare capacity when demand is low (e.g. outside peak hours).
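A rough way to see why spot instances matter for a bill like that is a blended-cost model. The sketch below is hypothetical throughout: the hourly price, the spot discount and the fleet size are placeholders, not real AWS rates or Moz's actual setup.

```python
# Hypothetical blended-cost model for a fleet that mixes on-demand and spot instances.
# All prices and fleet sizes are made-up placeholders, not real AWS rates.

HOURS_PER_MONTH = 730  # common approximation (8,760 hours / 12 months)


def monthly_fleet_cost(instances: int, on_demand_hourly: float,
                       spot_discount: float, spot_fraction: float) -> float:
    """Monthly cost when `spot_fraction` of the fleet runs on spot capacity
    priced at (1 - spot_discount) times the on-demand rate."""
    if not (0.0 <= spot_fraction <= 1.0 and 0.0 <= spot_discount < 1.0):
        raise ValueError("spot_fraction must be in [0, 1] and spot_discount in [0, 1)")
    spot_hourly = on_demand_hourly * (1.0 - spot_discount)
    on_demand_count = instances * (1.0 - spot_fraction)
    spot_count = instances * spot_fraction
    hourly = on_demand_count * on_demand_hourly + spot_count * spot_hourly
    return hourly * HOURS_PER_MONTH


if __name__ == "__main__":
    all_on_demand = monthly_fleet_cost(100, 1.00, spot_discount=0.7, spot_fraction=0.0)
    half_spot = monthly_fleet_cost(100, 1.00, spot_discount=0.7, spot_fraction=0.5)
    print(f"all on-demand: ${all_on_demand:,.0f}/month, half on spot: ${half_spot:,.0f}/month")
```

With 100 instances at a placeholder $1.00/hour, running half the fleet on spot capacity at a 70% discount drops the monthly bill from about $73,000 to about $47,450. The catch is that AWS can reclaim spot capacity, so it suits interruptible work like crawling and batch metric jobs better than the frontend database.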

    Supercomputers are rare these days. What they most likely have is a cluster of 100+ virtual machines: a scalable MySQL database serving the data the website frontend needs, multiple web servers behind a load balancer, and a couple of high-RAM servers running memcache as an in-memory cache for page data (to reduce how often the database is hit).
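A memcache layer like that typically follows the cache-aside pattern: read from the cache, and only on a miss go to the database and then populate the cache. Below is a minimal Python sketch of that pattern, with a plain dict standing in for memcache and a hypothetical query_database function standing in for MySQL. It illustrates the idea only; it isn't how Moz actually wires things up.

```python
# Minimal cache-aside sketch. A dict with expiry timestamps stands in for memcache,
# and the hypothetical `query_database` function stands in for a MySQL lookup.
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    def __init__(self, loader: Callable[[str], str], ttl_seconds: float = 300.0):
        self._loader = loader                          # called on a cache miss (the "database")
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expiry time, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry: Optional[Tuple[float, str]] = self._store.get(key)
        if entry is not None and entry[0] > now:       # present and not yet expired
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self._loader(key)                      # only go to the database on a miss
        self._store[key] = (now + self._ttl, value)
        return value


def query_database(url: str) -> str:
    """Hypothetical stand-in for a MySQL query returning a page's link data."""
    return f"link data for {url}"


if __name__ == "__main__":
    cache = CacheAside(query_database, ttl_seconds=60)
    cache.get("example.com/a")   # miss: goes to the "database"
    cache.get("example.com/a")   # hit: served from memory
    print(cache.hits, cache.misses)  # prints: 1 1
```

The TTL bounds how stale cached page data can get; a real deployment would also need eviction and some way to invalidate entries when the underlying link data is updated.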

    The backend/crawler is then probably a box or two acting as a message queue / pipeline manager (or Amazon SQS), a dozen or two worker servers crawling URLs, and a handful of worker servers parsing the incoming HTML and writing link information to a data store that is either a graph database or a reasonable facsimile of one.
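Here's a toy version of that backend pipeline, assuming a simple breadth-first crawl. queue.Queue stands in for the message queue (or SQS), the hypothetical FAKE_WEB dict stands in for real HTTP fetches, and a dict of sets stands in for the graph database. For brevity each worker both fetches and parses, whereas the setup described above would split those jobs across separate worker pools.

```python
# Toy queue -> workers -> link extraction -> graph store pipeline.
# FAKE_WEB and all URLs are hypothetical; no real network requests are made.
import queue
import threading
from html.parser import HTMLParser
from typing import Dict, List, Set
from urllib.parse import urljoin

FAKE_WEB: Dict[str, str] = {
    "http://a.example/": '<a href="/b">B</a> <a href="http://c.example/">C</a>',
    "http://a.example/b": '<a href="http://a.example/">home</a>',
    "http://c.example/": "<p>no links</p>",
}


class LinkExtractor(HTMLParser):
    """Collects absolute href targets from <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def crawl(seeds: List[str], num_workers: int = 4) -> Dict[str, Set[str]]:
    frontier: "queue.Queue[str]" = queue.Queue()  # stand-in for the message queue
    graph: Dict[str, Set[str]] = {}               # stand-in for the graph store: source -> targets
    seen: Set[str] = set(seeds)
    lock = threading.Lock()

    def worker() -> None:
        while True:
            url = frontier.get()
            try:
                html = FAKE_WEB.get(url)          # a real crawler would do an HTTP GET here
                if html is None:
                    continue
                parser = LinkExtractor(url)
                parser.feed(html)
                with lock:
                    graph.setdefault(url, set()).update(parser.links)
                    for link in parser.links:
                        if link not in seen:      # enqueue each URL only once
                            seen.add(link)
                            frontier.put(link)
            finally:
                frontier.task_done()

    for url in seeds:
        frontier.put(url)
    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    frontier.join()                               # block until every queued URL is processed
    return graph


if __name__ == "__main__":
    for source, targets in sorted(crawl(["http://a.example/"]).items()):
        print(source, "->", sorted(targets))
```

New URLs are put on the queue before the current one is marked done, so frontier.join() can't return while work is still being discovered.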

    Finally, there are probably another couple of servers whose sole job is iterating through their algorithms to calculate and update the metrics (DA/PA/MR, i.e. Domain Authority, Page Authority and MozRank).
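Moz's actual DA/PA/MR formulas aren't described in this thread, so the sketch below uses textbook PageRank computed by power iteration as a stand-in for that kind of iterative link metric. It takes a hypothetical source -> targets adjacency dict like the one a link crawler would produce.

```python
# Power-iteration PageRank as a stand-in for iterative link metrics like DA/PA/MR.
# This is textbook PageRank, not Moz's actual formula.
from typing import Dict, Set


def pagerank(graph: Dict[str, Set[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> Dict[str, float]:
    # Every URL that appears as a source or a target gets a score.
    nodes = set(graph)
    for targets in graph.values():
        nodes |= targets
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no outlinks ("dangling") is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each page splits its damped rank equally among the pages it links to.
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:                           # stop once scores have converged
            break
    return rank


if __name__ == "__main__":
    scores = pagerank({"a": {"c"}, "b": {"c"}, "c": set()})
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

Each iteration is a full pass over every link, which is why, at over a hundred billion links, a job like this would likely run on its own machines rather than alongside the crawlers.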
     