Block ChatGPT Bot?

prodon

User-agent: GPTBot
Disallow: /

What do you think?

What are the pros and cons of doing so?

A major con (IMO) is that you aren't included in opportunities to be in the SGE Snippet?
 
Considering Google does not use GPTBot or OpenAI data, it's not going to stop them using your content for SGE or their own training purposes.
 
Block Google if you don't want to be in SGE.
 
I think that as long as the bot isn't damaging your website's interests, you don't have to do this.
 
Pro: Stops bad bots from messing with your site and keeps it safe.

Con: Might accidentally block good bots, like those that help your site get noticed in search results.
 
Blocking the GPTBot could mean missing out on opportunities to be included in the SGE Snippet, which could impact your site's visibility in search results.

What does Google's SGE have to do with OpenAI's GPTBot?
 
GPT stands for Generative Pre-training Transformer, so it does not mean it only excludes OpenAI. I would not block GPTBot, since it could also help with getting source reference links in the future or even suggested within commercial terms, not sure which niche you're in tho.
 

My friend, you are quite possibly a bit confused.

Every company that harnesses web data uses a crawler (also referred to as a spider), which automatically crawls web pages.

Now, a legit company with a legit crawler always declares its bot name and the IPs it crawls from.

Google has Googlebot (and several others), Microsoft has Bingbot (and several others), DuckDuckGo has DuckDuckBot.

This is not limited to search engines. Ahrefs has AhrefsBot and AhrefsSiteAudit, and Semrush has SemrushBot.

//

Similar to the above, GPTBot is owned and operated exclusively by OpenAI:

https://platform.openai.com/docs/gptbot
The user-agent token states this explicitly:

Code:
User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
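
If you want to spot these requests in your own code, a minimal sketch is below. It only checks for the GPTBot token quoted above; the helper name is_gptbot is made up for illustration, and a User-Agent header can be spoofed, so this tells you what a client claims to be, not who it really is.

```python
import re

# Token taken from the user-agent string quoted above.
# The function name is hypothetical, not part of any library.
GPTBOT_TOKEN = re.compile(r"\bGPTBot\b", re.IGNORECASE)


def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header claims to be GPTBot."""
    return bool(GPTBOT_TOKEN.search(user_agent or ""))


ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)
print(is_gptbot(ua))
print(is_gptbot("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```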

Like every other legit firm, they also publicly list the IPs they crawl from:

JSON:
{
  "creationTime": "2023-11-30T11:51:00.000000",
  "prefixes": [
    {
      "ipv4Prefix": "52.230.152.0/24"
    },
    {
      "ipv4Prefix": "52.233.106.0/24"
    }
  ]
}
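
A rough sketch of how you could use a list like this to check whether a request IP falls inside the published ranges. The prefixes are copied from the JSON above and are only a snapshot, so treat them as possibly outdated; the function names are made up for illustration.

```python
import ipaddress
import json

# Snapshot copied from the JSON above; the live list may have changed.
PREFIXES_JSON = """
{
  "creationTime": "2023-11-30T11:51:00.000000",
  "prefixes": [
    {"ipv4Prefix": "52.230.152.0/24"},
    {"ipv4Prefix": "52.233.106.0/24"}
  ]
}
"""


def load_networks(raw: str):
    """Parse the prefix list into ip_network objects."""
    data = json.loads(raw)
    return [ipaddress.ip_network(p["ipv4Prefix"]) for p in data["prefixes"]]


def in_published_ranges(ip: str, networks) -> bool:
    """Return True if the address falls inside any listed prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


networks = load_networks(PREFIXES_JSON)
print(in_published_ranges("52.230.152.17", networks))  # inside the first /24
print(in_published_ranges("203.0.113.5", networks))    # documentation-only address
```

Combining this with the user-agent token is stricter than either check alone, since the header by itself is easy to fake.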

//

So yes, blocking GPTBot only prevents OpenAI from using your content in their training data. If you want to block ChatGPT from accessing your site, you need to block the ChatGPT-User bot (as of writing this, blocking one bot blocks the other).
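
For anyone who wants to sanity-check their rules before deploying, here is a small sketch using Python's standard urllib.robotparser. The robots.txt content lists both user agents explicitly, so it does not rely on one rule covering the other; example.com is a placeholder, and this only shows how a compliant parser reads the file, not what any crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Rules for both OpenAI user agents named in this thread.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder URL for illustration.
for agent in ("GPTBot", "ChatGPT-User", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://example.com/any-page")
    print(agent, "allowed" if allowed else "blocked")
```

Googlebot stays allowed here because no rule group names it and there is no wildcard group.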

P.S - It is Generative "Pre-Trained" Transformer and not "Pre-training". :)
 
Well, IMO, Blocking GPTBot from indexing seems like a bold move.

On the plus side, it gives website owners control over their content and can prevent potentially unwanted scraping or indexing by AI-driven bots. This might be especially appealing for those concerned about their content being used without explicit permission or in ways they hadn't intended.

However, as you pointed out, a significant drawback is missing out on the opportunities provided by being included in the SGE Snippet. The visibility and traffic that could come from being featured there are substantial. It's like a double-edged sword; you're protecting your content but potentially limiting its reach and the opportunities for engagement and growth that come with broader exposure.

I guess it boils down to what you value more: the control and security over your content or the potential benefits of increased visibility and engagement. It's a tough call and probably varies depending on the nature of your site and your goals. I'd love to hear more opinions on this, especially if anyone's seen direct impacts from making a decision either way.
 
You're right about blocking GPTBot with "Disallow: /". Here's the gist:
1. Cons: You miss out on Google's Featured Snippets (SGE Snippets) which can boost visibility.
2. Pros: You prevent your content from being used to train OpenAI's models, which might be important if you're concerned about how it's used.
 
No cons tbh, and the only benefit I can think of is that you save some server bandwidth by blocking the crawler.
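
If you want a number for that bandwidth before deciding, a rough sketch is below. It assumes your server writes the common "combined" access-log layout (the regex will need adjusting otherwise), and the sample lines are made up for illustration, not real traffic.

```python
import re

# Assumes the "combined" log layout: "request" status size "referer" "agent".
LOG_LINE = re.compile(
    r'"\S+ \S+ \S+" (?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def bytes_served_to(token: str, lines) -> int:
    """Sum response sizes for requests whose user agent contains the token."""
    total = 0
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or m.group("size") == "-":
            continue  # unparseable line or no body sent
        if token.lower() in m.group("agent").lower():
            total += int(m.group("size"))
    return total


# Made-up sample lines, for illustration only.
GPTBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)
sample = [
    f'52.230.152.17 - - [01/Dec/2023:10:00:00 +0000] "GET /a HTTP/1.1" 200 5120 "-" "{GPTBOT_UA}"',
    '198.51.100.7 - - [01/Dec/2023:10:00:01 +0000] "GET /b HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    f'52.230.152.17 - - [01/Dec/2023:10:00:02 +0000] "GET /c HTTP/1.1" 304 - "-" "{GPTBOT_UA}"',
]
print(bytes_served_to("GPTBot", sample))
```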
 
Well, IMO, Blocking GPTBot from indexing seems like a bold move.

On the plus side, it gives website owners control over their content and can prevent potentially unwanted scraping or indexing by AI-driven bots. This might be especially appealing for those concerned about their content being used without explicit permission or in ways they hadn't intended.

However, as you pointed out, a significant drawback is missing out on the opportunities provided by being included in the SGE Snippet. The visibility and traffic that could come from being featured there are substantial. It's like a double-edged sword; you're protecting your content but potentially limiting its reach and the opportunities for engagement and growth that come with broader exposure.

I guess it boils down to what you value more: the control and security over your content or the potential benefits of increased visibility and engagement. It's a tough call and probably varies depending on the nature of your site and your goals. I'd love to hear more opinions on this, especially if anyone's seen direct impacts from making a decision either way.
Or bad chatbots.
ChatGPT.
"bold move" gave it away. "IMO" and "I guess" were a nice touch but not enough :)
 
Your site's visibility in search results could be affected if you block the GPTBot.
 