
Small contribution for anyone who own sites with more than 500,000 pages

Discussion in 'Black Hat SEO' started by maxlogov, May 29, 2015.

  1. maxlogov

    maxlogov Junior Member

    Sep 25, 2011
    Hey ,
    Hey,
    I've been doing Google-specific SEO research for quite a long time and decided to share some findings with anyone who owns a site with half a million pages (or more). I hope you can share something back.
    What I figured out is that every page gets a rank only at the point of indexing or recrawling, and the most important part is that each page's rank depends on your DA and the number of indexed pages.
    So I think the formula looks something like this: ((DA / number of indexed pages) * content update date) * number of backlinks to the specific page. In other words, the higher your DA (Domain Authority) and the fewer pages you have, the better the chances that your inner pages rank higher in search results. BUT you need to find the golden middle between the number of indexed pages and DA: with too many indexed pages you might lose a lot of rankings, and with very few you could just waste your DA power.
    So basically, when I talk about huge sites, I assume no backlinks were built to the inner pages.
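    The rough heuristic described above could be sketched like this. All names are made up, and since "Content Update Date" is ambiguous in the post, it's modeled here as an assumed freshness decay factor:

```python
import math

def page_rank_estimate(da, indexed_pages, days_since_update, backlinks_to_page):
    """Heuristic score: (DA / indexed pages) * freshness * backlinks.

    This is a sketch of the poster's guess, not a known Google formula.
    """
    # Assumed freshness decay (not from the post): 1.0 when just updated,
    # shrinking toward 0 as the content ages.
    freshness = math.exp(-days_since_update / 365.0)
    # Pages with zero backlinks still keep the DA-per-page base score.
    return (da / indexed_pages) * freshness * max(backlinks_to_page, 1)

# Same DA spread over fewer indexed pages scores higher per page:
small_site = page_rank_estimate(da=40, indexed_pages=50_000,
                                days_since_update=30, backlinks_to_page=0)
big_site = page_rank_estimate(da=40, indexed_pages=500_000,
                              days_since_update=30, backlinks_to_page=0)
print(small_site > big_site)  # True: the "fewer indexed pages" effect
```

    With equal DA and freshness, 10x more indexed pages means a 10x lower per-page score in this sketch, which is the "golden middle" trade-off the post describes.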
    If anyone knows for sure: for testing purposes, I need to figure out the fastest way to get all pages deindexed from Google and then let it index everything again from the beginning.
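    Not an answer from the thread, but the standard documented signals for getting pages dropped on recrawl are an `X-Robots-Tag: noindex` response header or a `<meta name="robots" content="noindex">` tag (removed later to let Google reindex). A small helper to check that one of them is actually being served, given a page's header and HTML as strings:

```python
def has_noindex(x_robots_header, html):
    """True if either the response header or the markup carries noindex."""
    if "noindex" in (x_robots_header or "").lower():
        return True
    # Naive substring check; a real tool would parse the meta tag properly.
    return 'content="noindex"' in html.lower()

print(has_noindex("noindex, nofollow", ""))                        # True
print(has_noindex("", '<meta name="robots" content="NOINDEX">'))   # True
print(has_noindex("", "<html></html>"))                            # False
```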
  2. cottonwolf

    cottonwolf Regular Member

    Jan 20, 2015
    Scrapebox's link extractor can be used to extract internal links from your domain's pages. Run it multiple times until you think you've got all your pages.

    First, harvest in Google:

    site:domain.com a
    site:domain.com b
    You can append words, numbers, etc. to multiply your site:domain.com queries, since SEs only return 1000 results max per query.
    Then keep link-extracting your internal pages until nothing new turns up, and dedupe the URLs at the end.
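    The query-multiplication trick plus the final dedupe could be sketched like this ("domain.com" is a placeholder; the suffix list is just one way to slice past the ~1000-results-per-query cap):

```python
import string

def build_site_queries(domain):
    """One site: query per letter/digit suffix, to slice the index
    into many sub-1000-result queries."""
    suffixes = list(string.ascii_lowercase) + list(string.digits)
    return [f"site:{domain} {s}" for s in suffixes]

def dedupe_urls(urls):
    """Drop duplicate URLs (trailing-slash-insensitive), keep order."""
    seen = set()
    out = []
    for u in urls:
        key = u.rstrip("/")
        if key not in seen:
            seen.add(key)
            out.append(u)
    return out

queries = build_site_queries("domain.com")
print(len(queries))   # 36 queries: a-z plus 0-9
print(queries[0])     # site:domain.com a
```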

    Or try something like Xenu to crawl your site, starting from the root. However, it crashed on me after crawling a medium-sized site, and it doesn't seem to harvest external URLs in the rootdomain.com format, only the http://domain.com format. Suxx.
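    A Xenu-style crawl can also be sketched with Python's stdlib alone. This only shows the per-page step (extract same-host links); the fetching loop, delays, and robots.txt handling are left out, so it's a sketch, not a production crawler:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collects every href value from <a> tags in one HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def internal_links(base_url, html):
    """Absolute same-host links found in one page's HTML."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    out = set()
    for href in parser.links:
        absolute = urljoin(base_url, href)  # resolve relative links
        if urlparse(absolute).netloc == host:
            out.add(absolute)
    return out
```

    To crawl, you'd fetch the root, feed each page through `internal_links`, queue any URLs you haven't seen, and repeat until the queue is empty.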

    For indexing, you can put your links on sites that G crawls frequently. That's what large-scale indexing is about.