
Visible PBNs being indexed by Majestic

Discussion in 'Black Hat SEO' started by Jonaz86, Aug 7, 2016.

  1. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Hello guys,

    Despite using the WP HTACCESS plugin and double-checking (removing the plugin and re-installing it to see if the block code was still there), some of my PBNs have been indexed by Majestic. I realised that the .htaccess file didn't block anything at all, so I had to manually re-add the same code.

    My question is simple: is there anything I can do to remove them from Majestic's records completely, or am I screwed? So much time and money spent on this..

    P.S. What could be removing the code, WP updates???
     
  2. Floopa75

    Floopa75 Jr. VIP Jr. VIP

    Joined:
    Feb 6, 2014
    Messages:
    885
    Likes Received:
    759
    Gender:
    Male
    Occupation:
    Summit Rank Link Building
    Location:
    Canada
    Home Page:
    WordPress will remove the text if you put it before "# END WordPress", because that whole block gets rewritten every time WP updates.

    As far as indexing by Majestic goes, it doesn't matter. They all are; it's how any PBN builder checks links initially. If you put the code in before building out the content, then that content won't get indexed by Majestic, and neither will the outgoing links. You're fine.
     
    • Thanks Thanks x 2
  3. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Thanks for the reply, but it has failed for some reason, and I'm 100% sure it's not due to what you wrote.

    Is there any way to remove the traces now for the few PBNs that are visible?
     
  4. AleeGS

    AleeGS Regular Member

    Joined:
    Jul 15, 2015
    Messages:
    312
    Likes Received:
    85
    Location:
    ARG
    Majestic is using various user agents. That's the reason.
     
    • Thanks Thanks x 1
  5. Floopa75

    Floopa75 Jr. VIP Jr. VIP

    Joined:
    Feb 6, 2014
    Messages:
    885
    Likes Received:
    759
    Gender:
    Male
    Occupation:
    Summit Rank Link Building
    Location:
    Canada
    Home Page:
    Can you show me the code you're putting in the .htaccess? Paste it here or PM me, either works.

    Also it is possible that Majestic is using multiple agents - not something I've looked into yet.
     
    • Thanks Thanks x 1
  6. askary

    askary Regular Member

    Joined:
    Jan 6, 2015
    Messages:
    407
    Likes Received:
    89
    Maybe your hosting doesn't run on Apache but on Nginx, in which case .htaccess is ignored; use robots.txt to block the bots in that case.
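    A minimal robots.txt along those lines might look like this (a sketch only; it assumes the backlink crawlers actually honour robots.txt, which MJ12bot says it does):

    Code:
    User-agent: MJ12bot
    Disallow: /

    User-agent: AhrefsBot
    Disallow: /

    User-agent: SemrushBot
    Disallow: /

    Keep in mind that robots.txt is publicly readable, so it leaves a visible footprint.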
     
  7. LatteGrande

    LatteGrande Jr. VIP Jr. VIP Premium Member

    Joined:
    Jan 19, 2011
    Messages:
    2,202
    Likes Received:
    614
    Location:
    404 Not Found
    It's true that MJ has multiple user agents. I used to put the latest one in my robots.txt. It's v1.4.6 (new as of June 2016) according to http://mj12bot.com. They also explain there why your robots.txt might not have worked on MJ12bot.

    Also add MJ12bot to your .htaccess as well.
     
    • Thanks Thanks x 1
  8. Sieusc

    Sieusc Registered Member

    Joined:
    Mar 8, 2009
    Messages:
    71
    Likes Received:
    5
    good info, thanks!
     
  9. ridersark

    ridersark Regular Member

    Joined:
    Nov 14, 2015
    Messages:
    439
    Likes Received:
    102
    Gender:
    Male
    Location:
    before the screen
    Isn't it the other way around? They mention not to block it via .htaccess:

    "MJ12bot adheres to the robots.txt standard. If you want the bot to prevent website from being crawled then add the following text to your robots.txt:

    User-agent: MJ12bot
    Disallow: /
    Please do not block our bot via IP in htaccess - we do not use any consecutive IP blocks as we are a community based distributed crawler. Please always make sure the bot can actually retrieve robots.txt itself. If it can't then it will assume that it is okay to crawl your site."
     
    • Thanks Thanks x 2
  10. rajeshwapda

    rajeshwapda Newbie

    Joined:
    Aug 8, 2016
    Messages:
    6
    Likes Received:
    1
    Gender:
    Male
    I don't think you can remove content from Majestic, but you can definitely prevent it from recrawling.
     
  11. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Thank you guys for your contributions to this thread, first and foremost. I am using the following code:

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    SetEnvIfNoCase User-Agent .rogerbot. bad_bot
    SetEnvIfNoCase User-Agent .exabot. bad_bot
    SetEnvIfNoCase User-Agent .mj12bot. bad_bot
    SetEnvIfNoCase User-Agent .dotbot. bad_bot
    SetEnvIfNoCase User-Agent .gigabot. bad_bot
    SetEnvIfNoCase User-Agent .ahrefsbot. bad_bot
    SetEnvIfNoCase User-Agent .sitebot. bad_bot
    SetEnvIfNoCase User-Agent .semrushbot. bad_bot
    SetEnvIfNoCase User-Agent .ia_archiver. bad_bot
    SetEnvIfNoCase User-Agent .searchmetricsbot. bad_bot
    SetEnvIfNoCase User-Agent .seokicks-robot. bad_bot
    SetEnvIfNoCase User-Agent .sistrix. bad_bot
    SetEnvIfNoCase User-Agent .lipperhey spider. bad_bot
    SetEnvIfNoCase User-Agent .ncbot. bad_bot
    SetEnvIfNoCase User-Agent .backlinkcrawler. bad_bot
    SetEnvIfNoCase User-Agent .archive.org_bot. bad_bot
    SetEnvIfNoCase User-Agent .meanpathbot. bad_bot
    SetEnvIfNoCase User-Agent .pagesinventory. bad_bot
    SetEnvIfNoCase User-Agent .aboundexbot. bad_bot
    SetEnvIfNoCase User-Agent .spbot. bad_bot
    SetEnvIfNoCase User-Agent .linkdexbot. bad_bot
    SetEnvIfNoCase User-Agent .nutch. bad_bot
    SetEnvIfNoCase User-Agent .blexbot. bad_bot
    SetEnvIfNoCase User-Agent .ezooms. bad_bot
    SetEnvIfNoCase User-Agent .scoutjet. bad_bot
    SetEnvIfNoCase User-Agent .majestic-12. bad_bot
    SetEnvIfNoCase User-Agent .majestic-seo. bad_bot
    SetEnvIfNoCase User-Agent .dsearch. bad_bot
    SetEnvIfNoCase User-Agent .blekkobo. bad_bot

    <Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>
    # END WordPress
     
  12. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    I'm just afraid that I will be reported now.. I even avoided using my brand name or naked anchors to link to my webstore, purely out of fear of getting reported.. now 4-5 PBNs are visible, and probably will be visible forever through Majestic's historic index and probably in other tools as well.
     
  13. askary

    askary Regular Member

    Joined:
    Jan 6, 2015
    Messages:
    407
    Likes Received:
    89
    Delete the dots in your list before and after the bot names.
     
  14. vsching

    vsching Power Member

    Joined:
    Dec 16, 2013
    Messages:
    642
    Likes Received:
    174
    Gender:
    Male
    Occupation:
    Full Time Internet Marketer
    Home Page:
    Reported for spamming
     
  15. starki

    starki Power Member

    Joined:
    Jul 17, 2012
    Messages:
    710
    Likes Received:
    233
    Unfortunately yes, since blocking via .htaccess will stop further indexing, but it won't remove links already found. I wouldn't worry too much, since I assume your PBN sites aren't interlinked anyway and 4-5 isn't a huge number.

    There are loads of niches where you can get away with a PBN that's fully visible in the major backlink checkers for many years. From my experience, the "I report you, you report me" game is primarily played in very spammy niches dominated by affiliates anyway. Sure, you can be unlucky, but no need to be paranoid ;)
     
    • Thanks Thanks x 1
  16. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Thanks buddy, appreciate those words. Let me ask you: if I report someone, does it automatically create a risk of myself getting penalised? There are several major companies that I'm struggling to compete with, and they're all using PBNs.
     
  17. Floopa75

    Floopa75 Jr. VIP Jr. VIP

    Joined:
    Feb 6, 2014
    Messages:
    885
    Likes Received:
    759
    Gender:
    Male
    Occupation:
    Summit Rank Link Building
    Location:
    Canada
    Home Page:
    Yeah, all of the code is before "# END WordPress" and will therefore be deleted every time WP updates.

    This is what it should look like:

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    # END WordPress

    SetEnvIfNoCase User-Agent .rogerbot. bad_bot
    SetEnvIfNoCase User-Agent .exabot. bad_bot
    SetEnvIfNoCase User-Agent .mj12bot. bad_bot
    SetEnvIfNoCase User-Agent .dotbot. bad_bot
    SetEnvIfNoCase User-Agent .gigabot. bad_bot
    SetEnvIfNoCase User-Agent .ahrefsbot. bad_bot
    SetEnvIfNoCase User-Agent .sitebot. bad_bot
    SetEnvIfNoCase User-Agent .semrushbot. bad_bot
    SetEnvIfNoCase User-Agent .ia_archiver. bad_bot
    SetEnvIfNoCase User-Agent .searchmetricsbot. bad_bot
    SetEnvIfNoCase User-Agent .seokicks-robot. bad_bot
    SetEnvIfNoCase User-Agent .sistrix. bad_bot
    SetEnvIfNoCase User-Agent .lipperhey spider. bad_bot
    SetEnvIfNoCase User-Agent .ncbot. bad_bot
    SetEnvIfNoCase User-Agent .backlinkcrawler. bad_bot
    SetEnvIfNoCase User-Agent .archive.org_bot. bad_bot
    SetEnvIfNoCase User-Agent .meanpathbot. bad_bot
    SetEnvIfNoCase User-Agent .pagesinventory. bad_bot
    SetEnvIfNoCase User-Agent .aboundexbot. bad_bot
    SetEnvIfNoCase User-Agent .spbot. bad_bot
    SetEnvIfNoCase User-Agent .linkdexbot. bad_bot
    SetEnvIfNoCase User-Agent .nutch. bad_bot
    SetEnvIfNoCase User-Agent .blexbot. bad_bot
    SetEnvIfNoCase User-Agent .ezooms. bad_bot
    SetEnvIfNoCase User-Agent .scoutjet. bad_bot
    SetEnvIfNoCase User-Agent .majestic-12. bad_bot
    SetEnvIfNoCase User-Agent .majestic-seo. bad_bot
    SetEnvIfNoCase User-Agent .dsearch. bad_bot
    SetEnvIfNoCase User-Agent .blekkobo. bad_bot

    <Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>
     
    • Thanks Thanks x 3
  18. mirrorer

    mirrorer Jr. VIP Jr. VIP

    Joined:
    Jan 30, 2009
    Messages:
    1,311
    Likes Received:
    1,134
    So much bullshit and so many useless responses here.
    .htaccess works, but it must be configured properly on Apache!!!!

    OP, go to /etc/apache2/. There will be a file named apache2.conf.
    You have to edit that one (you need root permission). Change the Directory block like this:

    Code:
    <Directory /var/www/>
        Options Indexes FollowSymLinks
        AllowOverride All
        Require all granted
    </Directory>
    Then reload Apache:

    Code:
    service apache2 reload
    
    Credit: http://askubuntu.com/questions/421233/enabling-htaccess-file-to-rewrite-path-not-working

    You choose, OP: either take useless advice from noobs, or take it from someone who has been successfully blocking Majestic, Ahrefs and all other third-party link crawlers since 2014 across 100+ PBNs.

    Warning:
    Don't use robots.txt, as it is a clear footprint and anyone can access it.

    With .htaccess you're blocking them server-side, BEFORE they access your website, and no one besides you and your hosting company knows what you're blocking.
    You can block visitors by checking:
    • User Agent
    • Referer (see the sketch below)
    • Cookie
    • Host, etc.
    Here is the official documentation how it works: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html
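    As a quick illustration of the Referer criterion (a sketch only; "badcrawler\.example" is just a placeholder pattern, not a real crawler), the same SetEnvIfNoCase approach works on other request headers:

    Code:
    # flag requests whose Referer header matches the placeholder pattern
    SetEnvIfNoCase Referer .*badcrawler\.example.* bad_referer

    <Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_referer
    </Limit>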

    Here is the .htaccess code that I've been using with success:
    Code:
    SetEnvIfNoCase User-Agent .*SemrushBot.* bad_bot
    SetEnvIfNoCase User-Agent .*SemrushBot-SA.* bad_bot
    SetEnvIfNoCase User-Agent .*MJ12bot.* bad_bot
    SetEnvIfNoCase User-Agent .*AhrefsBot.* bad_bot
    SetEnvIfNoCase User-Agent .*RavenCrawler.* bad_bot
    SetEnvIfNoCase User-Agent .*Rogerbot.* bad_bot
    SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
    SetEnvIfNoCase User-Agent .*exabot.* bad_bot
    SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
    SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
    SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
    SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
    SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
    SetEnvIfNoCase User-Agent .*semrushbot.* bad_bot
    SetEnvIfNoCase User-Agent .*ia_archiver.* bad_bot
    SetEnvIfNoCase User-Agent .*searchmetricsbot.* bad_bot
    SetEnvIfNoCase User-Agent .*seokicks-robot.* bad_bot
    SetEnvIfNoCase User-Agent .*sistrix.* bad_bot
    SetEnvIfNoCase User-Agent .*lipperhey spider.* bad_bot
    SetEnvIfNoCase User-Agent .*ncbot.* bad_bot
    SetEnvIfNoCase User-Agent .*backlinkcrawler.* bad_bot
    SetEnvIfNoCase User-Agent .*archive.org_bot.* bad_bot
    SetEnvIfNoCase User-Agent .*meanpathbot.* bad_bot
    SetEnvIfNoCase User-Agent .*pagesinventory.* bad_bot
    SetEnvIfNoCase User-Agent .*aboundexbot.* bad_bot
    SetEnvIfNoCase User-Agent .*spbot.* bad_bot
    SetEnvIfNoCase User-Agent .*linkdexbot.* bad_bot
    SetEnvIfNoCase User-Agent .*nutch.* bad_bot
    SetEnvIfNoCase User-Agent .*blexbot.* bad_bot
    SetEnvIfNoCase User-Agent .*ezooms.* bad_bot
    SetEnvIfNoCase User-Agent .*scoutjet.* bad_bot
    SetEnvIfNoCase User-Agent .*majestic-12.* bad_bot
    SetEnvIfNoCase User-Agent .*majestic-seo.* bad_bot
    SetEnvIfNoCase User-Agent .*dsearch.* bad_bot
    SetEnvIfNoCase User-Agent .*blekkobo.* bad_bot
    SetEnvIfNoCase User-Agent .*screaming frog seo spider/*.* bad_bot
    
    <Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>
    
    This code works 99% of the time; the other 1% is an Apache configuration issue.
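    If you want to check whether you're in that 1%, one rough way (a sketch, not the only method; yourpbn.example stands in for your own domain) is to temporarily add an obviously invalid line to the .htaccess and request any page: if Apache is reading the file you'll get a 500 error; if you still get a 200, the file is being ignored and AllowOverride needs fixing.

    Code:
    # after adding a junk line such as "ThisIsNotADirective" to .htaccess:
    curl -I http://yourpbn.example/
    # HTTP/1.1 500 Internal Server Error -> .htaccess is being read
    # HTTP/1.1 200 OK                    -> .htaccess is ignored, check AllowOverride
    # (remove the junk line again afterwards)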

    Good luck
     
    • Thanks Thanks x 4
  19. starki

    starki Power Member

    Joined:
    Jul 17, 2012
    Messages:
    710
    Likes Received:
    233
    A pleasure! You are not at risk, since Google loves spam reports. Penalizing those who feed the webspam team with precise and extensive reports wouldn't be helpful. That doesn't mean a competitor reporting your site can't be successful; actions often result in reactions. Don't underestimate karma. People who get penalized often get into an "If I get penalized, everybody else must be penalized, too" mood. It tends to end in reporting wars with no real winner.
     
    • Thanks Thanks x 1
  20. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Floopa75, I just replaced the .htaccess code on my PBNs with what you pasted there. How can I test whether it truly works? If I force Majestic to crawl the site by submitting the link to them (whatever it's called), should that be enough?

    Or will that raise any kind of concern if the bot cannot crawl it despite a link being submitted manually to the engine?
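    A simpler check than waiting for Majestic to recrawl (just a sketch; swap yourpbn.example for your own domain) is to request a page with curl while spoofing the blocked user agent and confirm you get a 403, while a normal browser string still gets a 200:

    Code:
    # should be denied (403 Forbidden) if the bad_bot rules are active
    curl -I -A "Mozilla/5.0 (compatible; MJ12bot/v1.4.6; http://mj12bot.com/)" http://yourpbn.example/

    # should still get 200 OK for a normal visitor user agent
    curl -I -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" http://yourpbn.example/

    Note that curl -I sends a HEAD request, which is covered by the <Limit GET POST HEAD> block above, so the deny rule applies to it as well.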