Block ChatGPT Bot?

prodon · Mar 12, 2024

User-agent: GPTBot
Disallow: /

What do you think?

What are the pros and cons of doing so?

A major con (IMO) is that you aren't included in opportunities to be in the SGE Snippet?

Steptoe · Mar 12, 2024

prodon said:
A major con (IMO) is that you aren't included in opportunities to be in the SGE Snippet?

Considering Google does not use GPTBot or OpenAI data, it's not going to stop them using your content for SGE or their own training purposes.

4440 · Mar 12, 2024

Block Google if you don't want to be in SGE.

420lounge · Mar 12, 2024

It could impact your site's visibility in search results ...

danny0616 · Mar 12, 2024

I think that as long as it isn't damaging the interests of your website, you don't have to do this.

BlogPro · Mar 12, 2024

420lounge said:
It could impact your site's visibility in search results ...

How exactly?

gregstereo · Mar 12, 2024

Pro: Stops bad bots from messing with your site and keeps it safe.

Con: Might accidentally block good bots, like those that help your site get noticed in search results.

420lounge · Mar 12, 2024

BlogPro said:
How exactly?

Blocking the GPTBot could mean missing out on opportunities to be included in SGE Snippet, which could impact your site's visibility in search results.

BlogPro · Mar 12, 2024

420lounge said:
Blocking the GPTBot could mean missing out on opportunities to be included in SGE Snippet, which could impact your site's visibility in search results.

What does Google's SGE have to do with OpenAI's GPTBot?

jorun · Mar 12, 2024

BlogPro said:
What does Google's SGE have to do with OpenAI's GPTBot?

GPT stands for Generative Pre-training Transformer, so it does not mean it only excludes OpenAI. I would not block GPTBot, since it could also help with getting source reference links in the future or even suggested within commercial terms, not sure which niche you're in tho.

Steptoe · Mar 12, 2024

jorun said:
GPT stands for Generative Pre-training Transformer, so it does not mean it only excludes OpenAI.

But GPTBot is explicitly OpenAI's bot, so yes, it does: https://platform.openai.com/docs/gptbot

jorun · Mar 12, 2024

Steptoe said:
But GPTBot is explicitly OpenAI's bot, so yes, it does: https://platform.openai.com/docs/gptbot

Oh my bad, still would not block with the reasons mentioned above.

BlogPro · Mar 12, 2024

jorun said:
GPT stands for Generative Pre-training Transformer, so it does not mean it only excludes OpenAI. I would not block GPTBot, since it could also help with getting source reference links in the future or even suggested within commercial terms, not sure which niche you're in tho.

My friend, you are quite possibly a bit confused.

Every company that harnesses web data uses a crawler (also referred to as a spider), which automatically crawls web pages.

Now, a legit company with a legit crawler always declares its bot name and the IPs it crawls from.

Google has GoogleBot (and several others), Microsoft has BingBot (and several others), DuckDuckGo has DuckDuckBot.

This is not limited to search engines. Ahrefs has AhrefsBot and AhrefsSiteAudit, Semrush has SemrushBot.

//

Similar to the above, GPTBot is owned and operated exclusively by OpenAI:

https://platform.openai.com/docs/gptbot
The user-agent token states it explicitly:

Code:

User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

Like every other legit firm, they also publicly list the IPs they use to crawl:

JSON:

{
  "creationTime": "2023-11-30T11:51:00.000000",
  "prefixes": [
    {
      "ipv4Prefix": "52.230.152.0/24"
    },
    {
      "ipv4Prefix": "52.233.106.0/24"
    }
  ]
}
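
For anyone who wants to check a visitor IP from their logs against that list, here is a minimal sketch using only Python's standard library. The two prefixes are copied from the JSON above and are only a snapshot, so the published list should be treated as the source of truth. The names `load_networks` and `is_listed` are made up for this example.

```python
import ipaddress
import json

# Snapshot of the prefixes quoted above; assumed to go stale over time.
PREFIXES_JSON = """
{
  "creationTime": "2023-11-30T11:51:00.000000",
  "prefixes": [
    {"ipv4Prefix": "52.230.152.0/24"},
    {"ipv4Prefix": "52.233.106.0/24"}
  ]
}
"""


def load_networks(document: str) -> list:
    """Parse the JSON document and return its prefixes as network objects."""
    data = json.loads(document)
    return [ipaddress.ip_network(entry["ipv4Prefix"]) for entry in data["prefixes"]]


def is_listed(ip: str, networks: list) -> bool:
    """Return True if the address falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


if __name__ == "__main__":
    networks = load_networks(PREFIXES_JSON)
    for candidate in ("52.230.152.10", "8.8.8.8"):
        print(candidate, is_listed(candidate, networks))
```

A request that claims to be GPTBot in its user-agent but comes from an address outside the published ranges is probably not the real crawler.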

//

So yes, blocking GPTBot only prevents OpenAI from using your content in their training data. If you want to block ChatGPT from accessing your site, you need to block the ChatGPT-User bot (as of writing this, blocking one bot blocks the other).
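
To see how per-bot robots.txt rules get read, here is a rough sketch with Python's standard `urllib.robotparser`. The robots.txt contents, the example.com URL and the helper name `allowed_agents` are assumptions for illustration. It only shows how this particular parser interprets the rules, not how any real crawler behaves in practice.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that names only OpenAI's training crawler.
ONLY_GPTBOT = """\
User-agent: GPTBot
Disallow: /
"""

# Hypothetical robots.txt that names both OpenAI user agents.
BOTH_OPENAI_BOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
"""

AGENTS = ("GPTBot", "ChatGPT-User", "Googlebot")


def allowed_agents(robots_txt: str, url: str = "https://example.com/page") -> dict:
    """Map each user-agent token to whether the rules allow it to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


if __name__ == "__main__":
    print(allowed_agents(ONLY_GPTBOT))
    print(allowed_agents(BOTH_OPENAI_BOTS))
```

In both cases Googlebot stays allowed, because a rule group only applies to the user agent it names.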

P.S. It is Generative "Pre-trained" Transformer, not "Pre-training".

EliteDiscoveries · Mar 12, 2024

Well, IMO, Blocking GPTBot from indexing seems like a bold move.

On the plus side, it gives website owners control over their content and can prevent potentially unwanted scraping or indexing by AI-driven bots. This might be especially appealing for those concerned about their content being used without explicit permission or in ways they hadn't intended.

However, as you pointed out, a significant drawback is missing out on the opportunities provided by being included in the SGE Snippet. The visibility and traffic that could come from being featured there are substantial. It's like a double-edged sword; you're protecting your content but potentially limiting its reach and the opportunities for engagement and growth that come with broader exposure.

I guess it boils down to what you value more: the control and security over your content or the potential benefits of increased visibility and engagement. It's a tough call and probably varies depending on the nature of your site and your goals. I'd love to hear more opinions on this, especially if anyone's seen direct impacts from making a decision either way.

Somenpla · Mar 13, 2024

You're right about blocking GPTBot with "Disallow: /". Here's the gist:
1. Cons: You miss out on Google's Featured Snippets (SGE Snippets) which can boost visibility.
2. Pros: You prevent your content from being used to train OpenAI's models, which might be important if you're concerned about how it's used.

Steptoe · Mar 13, 2024

EliteDiscoveries said:
However, as you pointed out, a significant drawback is missing out on the opportunities provided by being included in the SGE Snippet.

Somenpla said:
1. Cons: You miss out on Google's Featured Snippets (SGE Snippets) which can boost visibility.

There's a severe lack of comprehension here. Or bad chatbots.

ePrime · Mar 13, 2024

No cons tbh, and the only benefit I can think of is that you save some server bandwidth by blocking the crawler.

noellarkin · May 30, 2024

EliteDiscoveries said:
Well, IMO, Blocking GPTBot from indexing seems like a bold move.

On the plus side, it gives website owners control over their content and can prevent potentially unwanted scraping or indexing by AI-driven bots. This might be especially appealing for those concerned about their content being used without explicit permission or in ways they hadn't intended.

However, as you pointed out, a significant drawback is missing out on the opportunities provided by being included in the SGE Snippet. The visibility and traffic that could come from being featured there are substantial. It's like a double-edged sword; you're protecting your content but potentially limiting its reach and the opportunities for engagement and growth that come with broader exposure.

I guess it boils down to what you value more: the control and security over your content or the potential benefits of increased visibility and engagement. It's a tough call and probably varies depending on the nature of your site and your goals. I'd love to hear more opinions on this, especially if anyone's seen direct impacts from making a decision either way.

Steptoe said:
Or bad chatbots.

ChatGPT.
"bold move" gave it away. "IMO" and "I guess" was a nice touch but not enough

Emisesary · May 30, 2024

Blocking can impact your site in many ways.

She Hulk · May 30, 2024

Your site's visibility in search results could be affected if you block the GPTBot.
