
PBN: Is blocking spiders with htaccess a footprint?

Discussion in 'Black Hat SEO' started by aussiejack, Nov 26, 2014.

  1. aussiejack

    aussiejack Regular Member

    Joined:
    Sep 23, 2014
    Messages:
    493
    Likes Received:
    62
I built some PBN sites and now want to block the Majestic/Ahrefs spiders etc. via the .htaccess file. However, I'm not sure whether Googlebot sees that, and whether it's a big footprint if I do this for all my PBN sites?
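For anyone wondering what that looks like in practice, here's a minimal .htaccess sketch using mod_rewrite. The bot names are the commonly published user-agent strings (AhrefsBot, MJ12bot, etc.), so treat them as assumptions and verify the current strings before relying on this:

```apache
# Return 403 Forbidden to common backlink-checker bots.
# The bot names below are the commonly published user-agent
# strings; double-check current ones before deploying.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (ahrefsbot|mj12bot|semrushbot|blexbot|dotbot) [NC]
RewriteRule ^ - [F,L]
```

Since this runs server-side in Apache, nothing about the rule itself appears in any page that gets served.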
     
  2. iamsolo

    iamsolo Power Member

    Joined:
    Jul 13, 2014
    Messages:
    510
    Likes Received:
    306
    Gender:
    Male
You're way too paranoid about Google. I don't think that's a footprint. Why are you blocking bots on PBN sites? Blocking them on the money site is enough.
     
  3. crazedspyker

    crazedspyker Senior Member

    Joined:
    Jan 5, 2010
    Messages:
    999
    Likes Received:
    657
Because he doesn't want his competitors to easily find his entire PBN network. Blocking them on the money site doesn't really make sense in this regard.

And OP, I don't think it's a footprint the way robots.txt and SpyderSpanker can be. You should be fine; a lot of legit sites block these bots for other reasons. If you're still paranoid, just vary it up: use .htaccess, SpyderSpanker, and robots.txt at random. I don't bother with that personally.
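For contrast, the robots.txt variant mentioned here would be something like the following. Note the key difference: unlike .htaccess rules, robots.txt is publicly fetchable, so anyone (including Google and your competitors) can read exactly which bots you're excluding:

```
User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /
```

Also worth remembering that robots.txt only asks politely; it relies on the crawler choosing to honor it, whereas a server-side block actually refuses the request.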
     
    • Thanks Thanks x 2
    Last edited: Nov 26, 2014
  4. aussiejack

    aussiejack Regular Member

    Joined:
    Sep 23, 2014
    Messages:
    493
    Likes Received:
    62
Ok, thanks man.
     
  5. tb303

    tb303 Power Member

    Joined:
    Dec 18, 2011
    Messages:
    601
    Likes Received:
    280
.htaccess is server-side stuff.

Unless Google has root access to your server, they ain't seeing it!
     
    • Thanks Thanks x 1
  6. Leith

    Leith Jr. Executive VIP Jr. VIP Premium Member

    Joined:
    Oct 30, 2011
    Messages:
    5,379
    Likes Received:
    8,565
    I do that too, it should be fine.
     
  7. rogerke

    rogerke Regular Member

    Joined:
    Oct 5, 2014
    Messages:
    262
    Likes Received:
    144
They can just fetch under a different user agent than Googlebot (for example MJ12bot for Majestic or AhrefsBot for Ahrefs) and notice you're blocking these bots.

Whether it's a footprint depends on whether they actually do this and take action against such sites. One of the complicating factors was mentioned earlier in this thread: a lot of legitimate sites block these bots as well, so I don't think Google will ever take algorithmic action against them. They might be passed on for manual review, however. Nobody outside Google can give a definitive answer on this.

One way to prevent it is to block IP ranges (of those backlink crawlers) instead of user agents, but that's a bit tricky: it requires a lot of due diligence, and the IP ranges can change overnight.

All in all, it's definitely worth blocking these bots in .htaccess. Not blocking them is simply not an option; you might as well paint a huge target on your back for your competitors to shoot at.
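The IP-range variant would look roughly like this (Apache 2.2-style syntax; the CIDR ranges below are documentation placeholders, not real crawler ranges, so you'd have to look up the crawler's actual published ranges and keep them current):

```apache
# Deny by crawler IP range instead of user agent.
# 203.0.113.0/24 and 198.51.100.0/24 are placeholder
# (TEST-NET) ranges; substitute the real, current ones.
Order Allow,Deny
Allow from all
Deny from 203.0.113.0/24
Deny from 198.51.100.0/24
```

The upside is that a spoofed user agent coming from the crawler's own network still gets blocked; the downside is the maintenance burden described above.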
     
    • Thanks Thanks x 1
  8. Expertpeon

    Expertpeon Elite Member

    Joined:
    Apr 22, 2011
    Messages:
    1,959
    Likes Received:
    1,187
Google routinely testing with user agents they don't control is likely illegal (or at the very least, a nice civil suit for anyone they impersonate). Furthermore, server logs would tell you if Google did this; pretty massive risk for little reward.
     
  9. rogerke

    rogerke Regular Member

    Joined:
    Oct 5, 2014
    Messages:
    262
    Likes Received:
    144
Interesting. Any source for further reading? I'm not familiar with US law, but I'd love to read why this could be illegal. I'm pretty sure Google already uses user agents other than Googlebot to catch cloaking etc., so I'm pretty sceptical.

If you're right, though, blocking these bots in .htaccess is even safer than I thought.
     
  10. gazmo

    gazmo Junior Member

    Joined:
    Jun 1, 2013
    Messages:
    121
    Likes Received:
    66
    Occupation:
    Software Engineer
    Location:
    Bulgaria
    Thanks, OP, now you made me paranoid too.

    In all seriousness though, I've never had a problem with that. Hopefully this won't be considered a footprint in the future either (although you're right that it can raise suspicion).
     
  11. Expertpeon

    Expertpeon Elite Member

    Joined:
    Apr 22, 2011
    Messages:
    1,959
    Likes Received:
    1,187
Manipulating user agents to bypass crawler blocks, or attempting to gain access to sites in ways not permitted by design, is typically going to be seen as unlawful in the US. Google does use a few user agents, but those user agents are controlled (and likely trademarked) by Google. That said, when it's a major company, as Yelp has proven over and over, they get to play by a completely different set of rules in the US (i.e., falsifying reviews and extorting businesses is apparently legal according to the Federal Appeals court).

However, Andrew Auernheimer did indeed go to jail in part for spoofing user agents to gain access to (or "test") a website's services. That's all the guy did, and it was treated as a hacking attempt. If Google did this (granted, it's a stupid idea on their part, given how little information it would provide), I imagine it'd be lawsuit material.

    https://www.eff.org/files/2014/04/11/weev.pdf
     
    • Thanks Thanks x 1
    Last edited: Nov 26, 2014
  12. the_demon

    the_demon Jr. Executive VIP

    Joined:
    Nov 23, 2008
    Messages:
    3,177
    Likes Received:
    1,563
    Occupation:
    Search Engine Marketing
    Location:
    The Internet
The main reason Google is unlikely to test .htaccess blocking is that they would need far more IPs to avoid detection, and it would take a lot more computational resources, since they'd have to re-check sites over and over under various user agents. Also, services like CloudFlare and Distil Networks might block bots exhibiting that kind of unusual behavior.
     
  13. ThopHayt

    ThopHayt Jr. VIP Jr. VIP Premium Member

    Joined:
    Jul 25, 2011
    Messages:
    5,396
    Likes Received:
    1,644
Doing any one thing across an entire network is probably not a great idea. As others have said, I have no evidence Google checks for .htaccess blocking, but I can't say for sure they don't, nor that they won't eventually.

    My opinion: Mix it up, don't "always" do it.
     
  14. xha44a

    xha44a Power Member

    Joined:
    Dec 2, 2012
    Messages:
    532
    Likes Received:
    444
    Hi OP

Google has no way of accessing your .htaccess. You can't view it via your web browser; go ahead and try. It's not something Google can reach. It merely tells Apache what to do with certain visitors, bots, etc. I wouldn't worry about it.
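For reference, the reason the browser test fails is that stock Apache configurations ship with a rule along these lines (2.2-style syntax shown), refusing to serve any .ht* file over HTTP regardless of what's in it:

```apache
# Typical Apache default: never serve .htaccess, .htpasswd,
# etc. over HTTP, so their contents stay private.
<FilesMatch "^\.ht">
    Order allow,deny
    Deny from all
    Satisfy All
</FilesMatch>
```

So even a crawler that guessed the filename would just get a 403 back.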
     
  15. Laubster

    Laubster Senior Member Premium Member

    Joined:
    May 21, 2013
    Messages:
    1,008
    Likes Received:
    377
    Occupation:
    Self employed
    Location:
    I Travel A Lot
    Home Page:
A lot of people have asked this, but there are legitimate reasons for blocking bots outside of SEO, mainly reducing server load. At this point it's not a footprint, and most guys I know with insane PBNs block every bot imaginable. If they're doing it with their 5- and 6-figure networks, I think it's okay.