There must be a way to crack Google's mechanism for 'quality content' detection..

forwardedlandlines

I can't believe that Google judges content at the level of a human being. There MUST BE a way to pass Google's check with generated content, and as a programmer I want to make it real. Please respond with your ideas about what the check might be based on and how it might be possible to beat it.
 
Of course there is.

Google doesn't "parse" your entire text. It looks for overt plagiarism, spelling and sentence structure above all else.
So code a bot which creates proper sentences, grammatically correct, none of it scraped (so using a dictionary, language analysis and then a grammar checking tool) and work from there.
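
Something like this toy sketch, just to show the slot-filling idea (the word lists are made up, and it's obviously nowhere near what a finished bot would need):
Code:
# Toy "dictionary + grammar template" generator: fill part-of-speech slots with
# random dictionary words so the output is grammatical, original and unscraped.
# The word lists here are invented purely for illustration.
import random

DICTIONARY = {
    "det":  ["the", "a", "every"],
    "adj":  ["reliable", "affordable", "modern"],
    "noun": ["scooter", "moped", "commuter bike"],
    "verb": ["needs", "deserves", "benefits from"],
    "obj":  ["regular maintenance", "a decent battery", "proper insurance"],
}

TEMPLATE = ["det", "adj", "noun", "verb", "obj"]  # one simple sentence pattern

def generate_sentence():
    words = [random.choice(DICTIONARY[slot]) for slot in TEMPLATE]
    return " ".join(words).capitalize() + "."

for _ in range(3):
    print(generate_sentence())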

I'm not saying it's simple - but many AI bots have already done 75% of the work for you (there's the hint - check out the AI community for this).
Synonyms are getting weaker and weaker, and Google uses "super synonyms" now.

I can swap the terms "motor scooter" and "moped" in a synonym spinner as much as I like - but Google treats them as the exact same phrase these days.
Many synonyms still get through - but Google is getting smart to them and is parsing individual blocks of words for meaning and context.

When I did some EW a couple of years back, the chat bots available then were getting pretty darn awesome. Like I said - check the AI community for a kick start.

Scritty
 
I will create a tool that auto-updates your site with keyword-targeted content that fully passes Google's checks and is SEO'd up to post-Penguin standards. I will reuse the same thing for a tool that generates spintax for all your backlinks (wikis/web 2.0s/articles...). But first some other great projects :) That doesn't mean ideas can't be discussed here! It will be really helpful for me when I actually start it.
 
I guess it's pretty hard to do now with a bot that just replaces words. You've got to change the sentence structure too and make sure it's grammatically correct.
 
If you use a dictionary and put a random verb in the place where a verb should be in the sentence, then, provided you do the same for all the words in the sentence, the sentence will have no meaning, even if it is grammatically correct.

But anyway, do you know where I can find sample code for language processing and a dictionary?
 
Did you ever wonder, if Google can "detect" this at the level people are pretending, why the most competitive niches are filled with Xrumer spam in gibberish?
 
Speaking of grammar, there are a lot of big sites (you can basically call them authority sites) that don't have the best grammar. I'm talking mostly about hip-hop/urban sites that speak in Ebonics, you could say. You don't see these sites get penalized, so there's more to it than grammar alone.
 
There you go, straight from Google's mastermind:

Detecting spam documents in a phrase based information retrieval system
Code:
http://www.google.com/patents/US7603345.pdf

Document similarity detection
Code:
http://www.google.com/patents/US7734627.pdf
 
It seems like they split sentences into two-word groups and then check how many duplicates there are (when comparing two documents).

So synonyms should work well this way, actually much better than sentences made of random words.

Regarding spam filtering, they check whether a document contains phrases that are common for its industry, and then whether it contains phrases unrelated to that industry.
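
A rough sketch of that two-word-group comparison (my reading of the patent, not Google's actual code): split each document into overlapping word pairs and see how many the two documents share.
Code:
# Two-word-group ("bigram") overlap between two documents - an illustration of the
# idea in the patent only, nothing like Google's real implementation.
import re

def bigrams(text):
    words = re.findall(r"[a-z']+", text.lower())
    return {(words[i], words[i + 1]) for i in range(len(words) - 1)}

def similarity(doc_a, doc_b):
    # Jaccard overlap of the bigram sets: 0 = nothing shared, 1 = identical
    a, b = bigrams(doc_a), bigrams(doc_b)
    return len(a & b) / len(a | b) if a and b else 0.0

original = "You should buy my product because it is the best scooter on the market."
spun     = "You should purchase my merchandise because it is the best moped on the market."
print(round(similarity(original, spun), 2))  # about 0.37 - each swapped word breaks the bigrams around it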
 
Google does have people who quality-check websites.

Check out what LionBridge does; the people who work for them basically decide whether a website is spammy or not.
 
^OP is talking about Google's algorithm, the bots that check sites, not manual reviews.
 
Well, what's the chance that your #1-ranking site for a keyword will be manually reviewed? What percentage per 1,000 searches/month? (just an idea)
 
What I've observed is that even if the sentences are not closely related, but are on the same topic and unique (given correct grammar and structure), the content still works well.
 
There is, and I'm close. I have one site bouncing hard: one day it's up to page one and the next it's down. lol I will be looking for some beta testers as soon as I can somewhat figure out what is causing the severe bouncing. All the other tests are going great though. The trick is leaving no footprint and nothing spammy.
 
Why don't you post here an example of this "non-spammy" content you use on your site?
 
As somebody who is heavily involved in natural language processing, I'll clear up some stuff for you.

1. Google's quality content detection is not as good as people think it is. They have done a good job convincing people it does a better job than it actually does.

2. Google's (accurate) quality content detection is too expensive for them to run on the vast majority of sites. It takes a lot of computational resources, so they cannot do it for most sites. They use a much less strict algorithm for the bulk of sites out there, which is why there is a lot of gibberish still out there.

3. I have a spinner (WordAi) that can automatically switch around the structure of sentences like this:
{{You should buy|You should purchase|You should obtain} my {product|merchandise}|My {product|merchandise} should be {bought|purchased} by you}.
In that case it was able to do phrase spinning that completely switched around the order of the sentence and the verb tenses, and added in new words, all while having it still make sense and mean the same thing.

Really, your goal should be to have content that looks like it was written by a native English speaker (automated systems cannot quite do this yet, but they are getting very close) and that can pass Copyscape (which can already be done), and you will be able to get around any quality check Google can have. Because if your content reads just like a human's would, there is nothing Google can do to detect whether it was automatically generated or not (since the quality of the two is the same anyway)!
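
To make that spintax format concrete, here is a minimal sketch of a nested-spintax expander (it only illustrates the syntax; it has nothing to do with how WordAi itself works):
Code:
# Minimal nested-spintax expander: repeatedly pick one random variant from each
# innermost {a|b|c} group. Illustrates the format only, not WordAi's engine.
import random
import re

INNER_GROUP = re.compile(r"\{([^{}]*)\}")  # a {...} group with no nested braces inside

def spin(text):
    while True:
        match = INNER_GROUP.search(text)
        if not match:
            return text
        choice = random.choice(match.group(1).split("|"))
        text = text[:match.start()] + choice + text[match.end():]

spintax = ("{{You should buy|You should purchase|You should obtain} my {product|merchandise}"
           "|My {product|merchandise} should be {bought|purchased} by you}.")
for _ in range(3):
    print(spin(spintax))  # e.g. "You should obtain my merchandise." / "My product should be bought by you."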
 
google is mankind's greatest engineering feat - let that sink in for a moment.

i haven't looked into any of it personally, but judging by the replies in this thread, a lot of you aren't really aware of basic spam detection, much less clustering/classification techniques. if you want to combat the enemy, first you have to understand what the enemy is capable of. look up naive bayes to get a simple idea of what is possible - P(A|B) = P(A and B) / P(B). machine learning and ai are probably being relied upon heavily for google's spam detection services. if you're going to combat it effectively, your solutions will have to employ similar techniques. a valid point was brought up - machine learning/data mining solutions can be expensive for google to employ. while this is true to a certain extent, it's not entirely true. we're talking about guys that graduated at the top of their class at ivy league schools just to sweep the floors, much less work on actual algorithms (exaggeration, but you get the point) -- accept this and you'll do a lot better.
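
here's a toy sketch of what naive bayes spam scoring looks like - the word probabilities are invented for illustration and have nothing to do with google's real models:
Code:
# Toy naive Bayes spam scorer: P(spam | words) ~ P(spam) * product of P(word | spam).
# All probabilities below are made up purely to show the mechanics.
import math

P_SPAM, P_HAM = 0.5, 0.5
P_WORD_GIVEN_SPAM = {"cheap": 0.30, "viagra": 0.20, "scooter": 0.01, "maintenance": 0.01}
P_WORD_GIVEN_HAM  = {"cheap": 0.02, "viagra": 0.001, "scooter": 0.05, "maintenance": 0.05}

def spam_probability(words):
    # work in log space to avoid underflow, then convert back to a probability
    log_spam = math.log(P_SPAM) + sum(math.log(P_WORD_GIVEN_SPAM.get(w, 0.01)) for w in words)
    log_ham  = math.log(P_HAM)  + sum(math.log(P_WORD_GIVEN_HAM.get(w, 0.01))  for w in words)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

print(spam_probability(["cheap", "viagra"]))         # close to 1 -> looks like spam
print(spam_probability(["scooter", "maintenance"]))  # close to 0 -> looks like normal content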
 
There are enough Ivy League graduates on the black hat side of the globe, so it's a fair game. :-)
 
How about this?

List of Keywords (and LSIs) ==> Instant Article Wizard ==> WordAi ==> CorrectEnglish/WhiteSmoke ==> Copyscape Check ==> Blog

If somebody looped the above process, wouldn't that qualify as unique content?

EDIT: You could also automate the article fetching from the archived caches of expired High-PR domains.
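
If somebody did wire that loop up, it might look roughly like this - every step below is a hypothetical placeholder, not a real API for any of the tools named:
Code:
# Hypothetical pipeline loop for the workflow above. Each stub stands in for one tool
# in the chain; none of these functions are real APIs - you would have to implement them.
import time

def fetch_seed_article(keyword):
    ...  # research/draft step (the Instant Article Wizard role)

def spin_article(text):
    ...  # rewriting step (the WordAi role)

def grammar_check(text):
    ...  # proofing step (the CorrectEnglish/WhiteSmoke role)

def is_unique(text):
    ...  # duplicate check (the Copyscape role), returns True or False

def post_to_blog(title, body):
    ...  # publishing step (the Blog role)

def run_pipeline(keywords, delay_seconds=3600):
    for keyword in keywords:
        article = grammar_check(spin_article(fetch_seed_article(keyword)))
        if is_unique(article):
            post_to_blog(keyword, article)
        time.sleep(delay_seconds)  # throttle so the blog doesn't update in one burst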
 
Penguin is flawed and Google will continue patching it to correct the problems. Because it is so random, and is still booting whitehat sites out of the SERPs, it will be difficult to pin down exactly what it is targeting. Obviously, more than one or two variables are at play. Time will tell, and hopefully some patterns will begin to emerge. That's pretty much how it works with all of their major algorithm updates. Penguin will definitely be put on ice; the only question is when, and how much collateral damage Google is willing to accept.
 