Can Google detect/penalize GPT-3 or will it be able to do that in the future?

Cryptochick007

Hello BHW members,

I am thinking of setting up a site with 50% hand-written content and 50% AI-generated content.
Will I get into a danger zone now or in the future?

Thanks so much for helping me out!
 
Google detecting AI-generated content and you getting into a danger zone can be mutually exclusive.

That is, even if Google can detect some of your AI-generated content, it doesn't mean you will automatically get in trouble.

Just fact-check your content yourself, make sure it's readable and makes sense, and most importantly, make sure the flow of the content is continuous and coherent. That is, don't switch between writing styles and subjects frequently in a single post.
 
How can Google detect it?
Set a thief to catch a thief.

The same tools that were used to make GPT-3 can be used to detect it too.

All you have to do is train an AI on a huge dataset (really huge, as in terabytes of nothing but text) of GPT-3-generated text (or text from other transformers like GPT-3) and label it as "AI content".

The AI will find hidden patterns in AI-generated text on its own, like the factual inaccuracies that GPT-3 often makes, or certain words that might be considered "unnatural vocabulary".

A system like this would trigger a lot of false positives, but it would also weed out the "obviously AI-generated" looking content out there.
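A minimal sketch of that train-on-labelled-text idea, using a from-scratch naive Bayes classifier. All texts and labels below are made up for illustration; a real detector would train a much larger model on terabytes of labelled human vs. AI text.

```python
import math
from collections import Counter

# Toy training data: texts labelled "ai" or "human".
# (Assumption: invented examples; a real corpus would be huge.)
train = [
    ("the product is a product that is a product for products", "ai"),
    ("as a language model i can generate fluent generic text", "ai"),
    ("this solution delivers value by delivering valuable solutions", "ai"),
    ("honestly i hated the ending but the acting saved the movie", "human"),
    ("my dog chewed the charger again so i am typing on 3 percent", "human"),
    ("we argued about pizza toppings for an hour last night", "human"),
]

def train_nb(data):
    """Count word frequencies and document counts per class."""
    counts = {"ai": Counter(), "human": Counter()}
    docs = Counter()
    for text, label in data:
        docs[label] += 1
        counts[label].update(text.split())
    return counts, docs

def classify(text, counts, docs):
    """Return the label with the higher log posterior probability."""
    vocab = set(counts["ai"]) | set(counts["human"])
    best_label, best_score = None, float("-inf")
    for label in counts:
        total = sum(counts[label].values())
        # log prior + log likelihoods with add-one smoothing
        score = math.log(docs[label] / sum(docs.values()))
        for word in text.split():
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

counts, docs = train_nb(train)
print(classify("this product delivers a valuable solution", counts, docs))  # prints "ai"
```

The same scheme scales up: swap the word counts for a transformer's learned features and the toy sentences for a web-scale labelled corpus, and you have the kind of detector described above, false positives included.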

This is how GPT-2 text was detected. It gave fewer false positives, though, because GPT-2 is smaller and doesn't come close to the readability of GPT-3.

There's nothing to say that Google has not been doing this already.

In fact, there is a conspiracy theory that Google has already deployed it, and that the sites that got hit in the May update were either AI-generated or triggered false positives.
 
There is a GPT-3 detector out there already, but I'm not allowed to share the URL on here, as the guy also has other tools.

But if one dev can code this and make it work, Google could do it very easily with their resources.
 
There is a GPT-3 detector out there already, but I'm not allowed to share the URL on here, as the guy also has other tools.

But if one dev can code this and make it work, Google could do it very easily with their resources.
Huh, now that would be a cool thing to see, as I'm highly interested in practical AI and tech. :D
 
Set a thief to catch a thief.

The same tools that were used to make GPT-3 can be used to detect it too.

All you have to do is train an AI on a huge dataset (really huge, as in terabytes of nothing but text) of GPT-3-generated text (or text from other transformers like GPT-3) and label it as "AI content".

The AI will find hidden patterns in AI-generated text on its own, like the factual inaccuracies that GPT-3 often makes, or certain words that might be considered "unnatural vocabulary".

A system like this would trigger a lot of false positives, but it would also weed out the "obviously AI-generated" looking content out there.

This is how GPT-2 text was detected. It gave fewer false positives, though, because GPT-2 is smaller and doesn't come close to the readability of GPT-3.

There's nothing to say that Google has not been doing this already.

In fact, there is a conspiracy theory that Google has already deployed it, and that the sites that got hit in the May update were either AI-generated or triggered false positives.
Good observation... but in this update I saw many AI sites gain a lot of traffic, while real niche sites were hit badly.
 
For the last few months, I've been an avid believer that they cannot find AI-generated content.

However, lately I've been reading up, and I've figured they're at least making strides toward detecting the obvious articles posted by people who don't edit what the AI gives them.

It'll happen eventually, but I don't see it realistically being deployed for at least another year.
 
Good observation... but in this update I saw many AI sites gain a lot of traffic, while real niche sites were hit badly.
No algorithm that utilizes ML/DL will be 100% accurate.

Many AI sites are also huge af. We are talking thousands or tens of thousands of pages of content.

An AI-generation detector algorithm would be compute-intensive, to the point that running it on those sites at scale would be too expensive.

So Google probably just skipped scanning those sites.

Meanwhile, most niche sites are small; hardly any have more than 500 articles. That means they are gonna get scanned first.
 
A GPT-3 detector tool is already available, so Google can definitely detect it as well. But you can still give it a try as a churn-and-burn project.
 
For the last few months, I've been an avid believer that they cannot find AI-generated content.

However, lately I've been reading up, and I've figured they're at least making strides toward detecting the obvious articles posted by people who don't edit what the AI gives them.

It'll happen eventually, but I don't see it realistically being deployed for at least another year.
Technologically, there's nothing stopping engineers from making software that can detect GPT-3 text.

This is a classic supervised-learning classification problem, the class of problems ML literally began with. It has probably the largest amount of literature in ML/DL research.

This class of problems is so foundational that if you take an introductory ML course, solving something like this will be the first thing they teach you (classifying handwritten digits from the MNIST dataset, or classifying flowers from the Iris dataset).
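For a feel of how small that intro-course exercise is, here is a nearest-centroid toy on Iris-style measurements. The numbers below are a tiny hand-picked sample for illustration, not the real 150-flower dataset, and real courses would typically use a library like scikit-learn instead.

```python
import math

# A few Iris-style measurements per class:
# (sepal length, sepal width, petal length, petal width) in cm.
# Assumption: illustrative values only, not the full dataset.
samples = {
    "setosa":     [(5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2)],
    "versicolor": [(7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5)],
    "virginica":  [(6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1)],
}

def centroid(points):
    """Average each of the four features across a class's samples."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(4))

def predict(flower, samples):
    """Assign the class whose centroid is closest in Euclidean distance."""
    centroids = {label: centroid(pts) for label, pts in samples.items()}
    return min(centroids, key=lambda label: math.dist(flower, centroids[label]))

print(predict((5.0, 3.4, 1.5, 0.2), samples))  # prints "setosa"
```

Detecting AI text is the same problem shape with harder features: instead of four flower measurements, you classify over word statistics or learned embeddings, but it is still "label examples, fit a model, predict a class."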


You can bet that they have already solved identifying the obvious-looking AI text. Now it's about reducing the false positives, which I'd guess are very frequent.
 
This class of problems is so foundational that if you take an introductory ML course, solving something like this will be the first thing they teach you (classifying handwritten digits from the MNIST dataset, or classifying flowers from the Iris dataset).

You can bet that they have already solved identifying the obvious-looking AI text. Now it's about reducing the false positives, which I'd guess are very frequent.

This is specifically what I was saying would take ~1 year to implement.

For them to really drill down the fact-checking and get the false-positive ratio as small as possible.
 
I wish Google good luck in detecting content generated by fine-tuned or trained-from-scratch AI models :)

But they don't really need it, because as far as I know they're focusing a LOT on training their own AI models (some of which, by the way, are huge) to better understand what a piece of content is about, how relevant it is for a given keyword, how well it satisfies the search intent, what concepts or ideas the content is missing, or what new and relevant information it brings, etc.

Bearing in mind the multitude of AI models that have appeared on the market (and especially the multitude that will appear in the next few years) and the increased possibilities of fine-tuning and training new models, their decision to focus on upgrading their AI to better understand and evaluate content seems like a very smart move.

One example of an AI model used by Google Search is MUM (Multitask Unified Model). It's 1,000 times more powerful than BERT (an older but still very capable Google AI model).
It’s trained across 75 different languages and many different tasks at once, allowing it to develop a more comprehensive understanding of information and world knowledge than previous models. And MUM is multimodal, so it understands information across text and images and, in the future, can expand to more modalities like video and audio.
You can read more about it here: https://blog.google/products/search/introducing-mum/

In the end, if a piece of content is garbage, it doesn't matter whether it was written by a machine or by a human... it just doesn't deserve to rank or be indexed. And that's Google's top priority right now.
 
Most likely they can. Whether they do is another question. Google handles an enormous number of websites in its index, and Google's bots have plenty of work to do. Every check Google performs is a drag on its own performance. So if they used such a model, they wouldn't run it at a broad level, but only on sensitive topics or very high-traffic stuff. I would argue Google has some kind of escalation process for QA of a website anyway and wouldn't apply every metric to every site/SERP.
 
I doubt this is an issue. Google may or may not be able to detect AI content, ultimately, what they do about it depends on whether the content in question is used in a context that helps or hurts Google's business model.

If the content is garbage on a site full of other garbage that ends up having a really high bounce rate, and users don't use the site properly or move from page to page in a manner suggesting they find the content useful and/or relevant, then the site isn't really ads-worthy in a sense, so Google has a lot of incentive to find sites like these and weed them out.

If on the other hand, the content is machine generated, but is useful enough that there's a lot of traffic that stays on the site and uses it, then the fact that it's machine generated doesn't really take away from its utility. In such a case, I doubt that Google is incentivized to penalize that site.

There's definitely a grey area here, there may be many false positives, but if you design your site for user engagement, you shouldn't have too many problems, even if the content is AI generated.
 
I wouldn't worry about this. Yes, perhaps G can detect ML-generated content when presented with it; however, ML models are notoriously difficult to run in production, especially at global scale, so G probably utilizes its ML resources for other things. If that were not the case, SERPs would look quite different.
 
make sure it's readable and makes sense
Even if it's not readable, Google would still not label it as AI-generated, as I've seen so many hand-written blogs that are way worse than AI content. The worst thing Google can do is not rank you high in the SERPs because your content sucks, so review it before publishing... unless your reviewing skill is also like an AI, LOL.
 