How to stop ChatGPT from Crawling your website?

sagarmalik007

ChatGPT has launched a new alpha model that can crawl a live website and scan everything on it, which is not good for website owners.
So, in order to stop ChatGPT from crawling your site, you need to update your robots.txt file.
Please add this to your robots.txt file:
Code:
User-agent: ChatGPT-User
Disallow: /

Adding this to your robots.txt file blocks ChatGPT from crawling the website (as long as the crawler respects robots.txt).
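
If you want to sanity-check the rule, here is a minimal Python sketch (standard library only; the URLs are placeholders) that parses the two lines above with urllib.robotparser and confirms everything is disallowed for the ChatGPT-User agent while other crawlers are unaffected:

Code:
from urllib import robotparser

# Parse the two-line group locally and check what it allows.
rules = """
User-agent: ChatGPT-User
Disallow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The ChatGPT browsing agent is disallowed everywhere...
print(rp.can_fetch("ChatGPT-User", "https://example.com/any-post/"))  # False
# ...while other crawlers (e.g. Googlebot) are unaffected by this group.
print(rp.can_fetch("Googlebot", "https://example.com/any-post/"))     # True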

Looking forward to more suggestions as well

 
Interesting. But why is it not good for website owners?
Man, then GPT can rewrite the content. For example, you publish a new blog post or news article, I immediately point GPT at your website, and I get the new content.
Do you want your content to get republished?
If GPT paid website owners for crawling, it would be fine.
 
Wonderful, nice share buddy!
 
Man, then GPT can rewrite the content. For example, you publish a new blog post or news article, I immediately point GPT at your website, and I get the new content.
Do you want your content to get republished?
If GPT paid website owners for crawling, it would be fine.
The question shouldn't be whether we're worried that ChatGPT can do this to OUR websites; it's how we can take advantage of this with OTHER people's websites and niches.

I happily give away the option to browse my sites, as long as I can exploit the same information from others to outperform them with my sites.

If too many people use this, ChatGPT will just use multiple user agents, and we will bitch while scraping the information from other sites through proxies! lol don't be a data whiner, be a data hustler. :cool:
 
I agree, but everything has its pros and cons.
This is just my opinion, buddy, and I know it will change.
 
User-agent: ChatGPT-User
Disallow: /
I was just about to add this to my robots.txt files, but I noticed that the code inside of these files reads like this:
Code:
User-Agent: *
Disallow: /cgi-bin
Disallow: /wp-
Disallow: /wp-includes
Disallow: /wp-admin/
Disallow: /wp-content/plugins
Disallow: /?s=
Disallow: *&s=
Disallow: /search
Disallow: /author/
Disallow: *?attachment_id=
Disallow: */feed
Disallow: */rss
Disallow: */embed
Allow: /wp-content/uploads/
Allow: /wp-content/themes/
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-*.svg
Allow: /wp-*.pdf

... which I think covers all user agents because of the *. So, do I still need to add the ChatGPT agent specifically, or am I good? :)
 
Man, then GPT can rewrite the content. For example, you publish a new blog post or news article, I immediately point GPT at your website, and I get the new content.
Do you want your content to get republished?
If GPT paid website owners for crawling, it would be fine.
I wonder how to use it for competitors' websites. ;)
 
What do you think you will gain from doing this, exactly?
 
... which I think covers all user agents because of the *. So, do I still need to add the ChatGPT agent specifically, or am I good? :)
You don't have to add it, as * already covers every user agent. However, your robots.txt does not block any engine/crawler (including ChatGPT) from reading your content; those rules only disallow a few WordPress system paths and search/feed URLs.
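
To make that concrete, here is a small sketch (Python's urllib.robotparser, with a trimmed copy of the wildcard rules and a made-up post URL) showing that those rules still let ChatGPT-User fetch a normal post, and only the dedicated ChatGPT-User group from the first post blocks it:

Code:
from urllib import robotparser

# Trimmed copy of the wildcard rules quoted above.
wordpress_rules = """
User-Agent: *
Disallow: /wp-admin/
Disallow: /search
""".splitlines()

# The same rules plus the dedicated ChatGPT-User group.
blocked_rules = wordpress_rules + ["", "User-agent: ChatGPT-User", "Disallow: /"]

for label, rules in (("wildcard only", wordpress_rules),
                     ("with ChatGPT-User group", blocked_rules)):
    rp = robotparser.RobotFileParser()
    rp.parse(rules)
    # A made-up blog post URL: allowed under the wildcard rules alone,
    # blocked once the ChatGPT-User group is added.
    print(label, rp.can_fetch("ChatGPT-User", "https://example.com/my-new-post/"))

# Prints:
# wildcard only True
# with ChatGPT-User group False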
 
I feared that this day would come. But thanks to BHW and coding... Blocking all the way!
 
You don't have to add it, as * already covers every user agent. However, your robots.txt does not block any engine/crawler (including ChatGPT) from reading your content; those rules only disallow a few WordPress system paths and search/feed URLs.
oh, I didn't know that... Is there any way to block them from reading the content?
 
Already added this to my WP blogs by adding a spiderblock plugin. I have even blocked Ahrefs lol. Thank you OP for sharing.
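
For anyone curious what such blocking looks like when it is enforced at the server/application level instead of relying on robots.txt (which crawlers only follow voluntarily), here is a rough Python sketch of a WSGI middleware that refuses requests from certain user agents; the blocked list is just an example:

Code:
# Rough sketch: return 403 for requests whose User-Agent matches a block list.
BLOCKED_AGENTS = ("chatgpt-user", "ahrefsbot")

def block_bots(app):
    """Wrap a WSGI application and refuse requests from blocked user agents."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(bot in ua for bot in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware

A WordPress plugin or an Nginx/Apache rule does the same thing at a different layer; the point is that the server refuses the request instead of hoping the bot honours robots.txt.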
 
... which I think covers all user agents because of the *. So, do I still need to add the ChatGPT agent specifically, or am I good? :)
Yeah, just add the same in your robots.txt file.
 
oh, I didn't know that... Is there any way to block them from reading the content?
You don't want to block all of the engines; how would Google/Bing/Yandex crawl and index your content then?
 
You don't want to block all of the engines; how would Google/Bing/Yandex crawl and index your content then?
I don't know, I'm not very bright :D

Although, to be honest, I wouldn't mind blocking Yandex...
 
At least you guys can try to crawl it yourselves and confirm whether the method works.
Best of luck to you all.
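
One quick way to do that: request one of your own pages with the bot's user-agent string (the URL below is a placeholder) and check the status code, or run your robots.txt through urllib.robotparser as in the earlier sketches. Keep in mind that the robots.txt rule alone will not change the HTTP response, since it only asks crawlers to stay away, so a 200 here is expected unless you also block at the server level:

Code:
import urllib.error
import urllib.request

# Placeholder URL: replace with one of your own pages.
req = urllib.request.Request(
    "https://example.com/some-post/",
    headers={"User-Agent": "ChatGPT-User"},
)
try:
    with urllib.request.urlopen(req) as resp:
        print("Status:", resp.status)   # 200 means the page was served normally
except urllib.error.HTTPError as err:
    print("Status:", err.code)          # e.g. 403 if a server-level block is in place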
 