
There must be a way to crack Google's mechanism for 'quality content' detection.

Discussion in 'Black Hat SEO' started by forwardedlandlines, May 27, 2012.

  1. forwardedlandlines

    forwardedlandlines Jr. VIP

    Joined:
    Feb 10, 2012
    Messages:
    540
    Likes Received:
    372
    I can't believe that Google judges content at the level of a human being. There MUST BE a way to pass the Google check with generated content, and as a programmer I want to make it real. Please respond with your ideas about what the check might be based on and how it might be possible to beat it.
     
    Last edited: May 27, 2012
  2. Scritty

    Scritty Elite Member Premium Member

    Joined:
    May 1, 2010
    Messages:
    2,807
    Likes Received:
    4,496
    Occupation:
    Affiliate Marketer
    Location:
    UK
    Home Page:
    Of course there is.

    Google doesn't "parse" your entire text. It looks for overt plagirism, spelling and sentence structure above all else.
    So code a bot with creates proper sentences, grammatically correct, none scraped (so using a dictionary, language analysis and then grammar checking tool) and work from there.

    I'm not saying it's simple - but many AI bots have already done 75% of the work for you (there's the hint - check out the AI community for this).
    Synonyms are getting weaker and weaker, and Google uses "super synonyms" now.

    I can swap the terms "motor scooter" and "moped" in a synonym spinner as much as I like - but Google treats them as the exact same phrase these days.
    Many synonyms still get through - but Google is getting smart to them, parsing individual blocks of words for meaning and context.
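
    To make that concrete, here's roughly what a naive synonym spinner does - a toy sketch with a made-up synonym table, not any particular tool:
    Code:
    import random

    # Toy synonym table -- invented for illustration; real spinners ship huge ones.
    SYNONYMS = {
        "motor scooter": ["moped", "scooter"],
        "cheap": ["inexpensive", "affordable", "budget"],
        "buy": ["purchase", "get"],
    }

    def naive_spin(text):
        """Replace each known phrase with a randomly chosen synonym.

        This is exactly the word-level swapping Google now normalises away:
        'motor scooter' and 'moped' end up indexed as the same concept."""
        for phrase, alts in SYNONYMS.items():
            if phrase in text:
                text = text.replace(phrase, random.choice(alts))
        return text

    print(naive_spin("buy a cheap motor scooter"))
    # e.g. "purchase a budget moped" -- different words, same meaning, same footprint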

    When I did some EW a couple of years back, the chat bots available then were getting pretty darn awesome. Like I said - check the AI community for a kick start.

    Scritty
     
    • Thanks Thanks x 4
  3. forwardedlandlines

    forwardedlandlines Jr. VIP

    Joined:
    Feb 10, 2012
    Messages:
    540
    Likes Received:
    372
    I will create a tool that auto-updates your site with keyword-targeted content that fully passes Google's checks and is SEO'd up to post-Penguin standards. I will reuse the same engine for a tool that generates spintax for all your backlinks (wikis/web 2.0s/articles...). But first some other great projects :) That doesn't mean ideas can't be discussed here! It will be really helpful for me when I actually start.
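
    The spintax side is actually the easy half - here's a generic recursive expander, nothing tool-specific, just to pin down the format:
    Code:
    import random
    import re

    def expand_spintax(text):
        """Expand nested {a|b|c} spintax by repeatedly resolving the
        innermost group with a random choice."""
        pattern = re.compile(r"\{([^{}]*)\}")  # a group with no nested braces
        while True:
            m = pattern.search(text)
            if not m:
                return text
            choice = random.choice(m.group(1).split("|"))
            text = text[:m.start()] + choice + text[m.end():]

    print(expand_spintax("{Cheap|Budget} mopeds {are|remain} {very |}popular"))
    # e.g. "Budget mopeds remain popular"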
     
  4. BlackxHat

    BlackxHat Power Member

    Joined:
    Oct 6, 2009
    Messages:
    591
    Likes Received:
    78
    I guess it's pretty hard to do now with a bot that just replaces words. You've got to change the sentence structure too and make sure it's grammatically correct.
     
  5. moonlighsunligh

    moonlighsunligh Jr. VIP Premium Member

    Joined:
    May 1, 2010
    Messages:
    1,623
    Likes Received:
    218
    If you use a dictionary and put a random verb in the place where a verb should be in the sentence, and do the same for all the words, then the sentence has no meaning, even though it is grammatically correct.
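
    Roughly what I mean, as a Python sketch (using NLTK purely as an illustration - a real version would also have to match plural/tense inflections):
    Code:
    import random
    import nltk  # assumes: pip install nltk, plus the tokenizer/tagger data downloads

    # Tiny replacement pools -- invented for illustration
    POOLS = {
        "NN": ["table", "cloud", "engine", "pigeon"],   # nouns
        "VB": ["eat", "compile", "juggle", "paint"],    # verbs (base form)
        "JJ": ["green", "silent", "rapid", "hollow"],   # adjectives
    }

    def scramble(sentence):
        """Swap each noun/verb/adjective for a random word with the same
        part-of-speech tag: the shape stays grammatical, the meaning is gone."""
        out = []
        for word, tag in nltk.pos_tag(nltk.word_tokenize(sentence)):
            pool = POOLS.get(tag[:2])
            out.append(random.choice(pool) if pool else word)
        return " ".join(out)

    print(scramble("The quick dog can run fast"))
    # e.g. "The hollow pigeon can juggle fast" -- well-formed, meaningless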

    But anyway, do you know where I can find more complete sample code for language processing, and a decent dictionary to drive it?
     
  6. Expertpeon

    Expertpeon Elite Member

    Joined:
    Apr 22, 2011
    Messages:
    1,959
    Likes Received:
    1,187
    Did you ever wonder, if Google can "detect" this at the level people are pretending, why the most competitive niches are filled with Xrumer spam in gibberish?
     
  7. BlackxHat

    BlackxHat Power Member

    Joined:
    Oct 6, 2009
    Messages:
    591
    Likes Received:
    78
    Speaking of grammar, there are a lot of big sites (you can basically call them authority sites) that don't have the best grammar. I'm talking mostly about hip-hop/urban sites that are written in Ebonics, you could say. You don't see these sites get penalized, so there's more to it than grammar alone.
     
    • Thanks Thanks x 1
  8. lancis

    lancis Elite Member

    Joined:
    Jul 31, 2010
    Messages:
    1,632
    Likes Received:
    2,384
    Occupation:
    Entrepreneur
    Location:
    Milky Way
    Home Page:
    There you go, straight from Google's mastermind:

    Detecting spam documents in a phrase based information retrieval system
    Code:
    http://www.google.com/patents/US7603345.pdf
    Document similarity detection
    Code:
    http://www.google.com/patents/US7734627.pdf
     
    • Thanks Thanks x 3
  9. moonlighsunligh

    moonlighsunligh Jr. VIP Premium Member

    Joined:
    May 1, 2010
    Messages:
    1,623
    Likes Received:
    218
    Seems like they split sentences into 2-word groups and then check how many duplicates there are (when comparing two documents).

    So synonyms should work well this way - actually much better than sentences made of random words.

    Regarding spam filtering, they check that a document contains phrases common to its industry, and that it doesn't contain phrases unrelated to it.
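
    If I'm reading the similarity patent right, the duplicate check can be approximated like this (my own toy reading of it, obviously not Google's actual code):
    Code:
    def bigrams(text):
        """Split a document into overlapping 2-word groups ('shingles')."""
        words = text.lower().split()
        return {(a, b) for a, b in zip(words, words[1:])}

    def similarity(doc_a, doc_b):
        """Fraction of shared 2-word groups (Jaccard overlap).
        High overlap suggests duplicate or lightly spun content."""
        a, b = bigrams(doc_a), bigrams(doc_b)
        return len(a & b) / len(a | b) if a | b else 0.0

    original = "the cheap moped is a great city vehicle"
    spun     = "the inexpensive moped is a great city vehicle"
    print(similarity(original, spun))
    # ~0.56 -- one swapped word still leaves most shingles shared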
     
  10. Griller

    Griller BANNED

    Joined:
    Apr 9, 2008
    Messages:
    698
    Likes Received:
    221
    Google does have people who quality-check websites.

    Check out what LionBridge does - the people who work for them basically decide whether a website is spammy or not.
     
    • Thanks Thanks x 1
  11. BlackxHat

    BlackxHat Power Member

    Joined:
    Oct 6, 2009
    Messages:
    591
    Likes Received:
    78
    ^OP is talking about Google's algorithm - the bots that check sites - not manual reviews.
     
  12. forwardedlandlines

    forwardedlandlines Jr. VIP

    Joined:
    Feb 10, 2012
    Messages:
    540
    Likes Received:
    372
    Well, what's the chance that your #1-ranked site for a keyword will be manually reviewed? What percentage, per 1,000 searches/month? (Just an idea.)
     
  13. Four Seasons

    Four Seasons Regular Member

    Joined:
    Aug 22, 2011
    Messages:
    409
    Likes Received:
    206
    Location:
    Cottonballs
    What I've observed is that even if the sentences are not closely related, as long as they're on the same topic and unique (given correct grammar and structure), the content still works well.
     
  14. Comic

    Comic Regular Member

    Joined:
    Jun 17, 2010
    Messages:
    287
    Likes Received:
    190
    Occupation:
    Marketing & SEO
    Location:
    Phoenix, AZ
    Home Page:
    There is, and I'm close. I have one site bouncing hard: one day it's up to page one and the next it's down. lol. I will be looking for some beta testers as soon as I can somewhat figure out what is causing the severe bouncing. All the other tests are going great, though. The trick is leaving no footprint and nothing spammy.
     
  15. moonlighsunligh

    moonlighsunligh Jr. VIP Premium Member

    Joined:
    May 1, 2010
    Messages:
    1,623
    Likes Received:
    218
    Why don't you post here an example of this "non-spammy" content you use on your site?
     
  16. artificial_genius

    artificial_genius Jr. VIP

    Joined:
    Sep 27, 2011
    Messages:
    864
    Likes Received:
    314
    Home Page:
    As somebody who is heavily involved in natural language processing, I'll clear up some stuff for you.

    1. Google's quality content detection is not as good as people think it is. They have done a good job of convincing people it works better than it actually does.

    2. Google's (accurate) quality content detection is too expensive to run on the vast majority of sites. It takes a lot of computational resources, so they use a much less strict algorithm for the bulk of sites out there - which is why there is still a lot of gibberish around.

    3. I have a spinner (WordAi) that can automatically switch around the structure of sentences. In one case it did phrase spinning that completely switched around the order of the sentence and the verb tenses, and added in new words, all while the output still made sense and meant the same thing.

    Really, your goal should be content that looks like it was written by a native English speaker (automated systems can't quite do this yet, but they are getting very close) and that passes Copyscape (which can already be done); then you will be able to get around any quality check Google can run. If your content reads just like a human's would, there is nothing Google can do to detect whether it was automatically generated or not (since the quality of the two is the same anyway)!
     
    • Thanks Thanks x 3
  17. gooldude13

    gooldude13 Newbie

    Joined:
    Jun 11, 2011
    Messages:
    24
    Likes Received:
    25
    Google is mankind's greatest engineering feat - let that sink in for a moment.

    I haven't looked into any of it personally, but judging by the replies in this thread, a lot of you aren't really aware of basic spam detection, much less clustering/classification techniques. If you want to combat the enemy, first you have to understand what the enemy is capable of. Look up naive Bayes to get a simple idea of what is possible.

    Machine learning and AI are probably being relied upon heavily for Google's spam detection services. If you're going to combat it effectively, your solutions will have to employ similar techniques. A valid point was brought up - machine learning/data mining solutions can be expensive for Google to employ. While this is true to a certain extent, it's not entirely true. We're talking about guys who graduated at the top of their class at Ivy League schools just to sweep the floors, much less work on actual algorithms (exaggeration, but you get the point) -- accept this and you'll do a lot better.
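
    For the flavor of it, a bare-bones word-count version of naive Bayes fits in a few lines (toy data and no class priors - nothing like production scale):
    Code:
    import math
    from collections import Counter

    # Toy training data -- invented examples, just to show the mechanics
    spam = ["buy cheap pills now", "cheap pills cheap now"]
    ham  = ["the meeting is at noon", "see you at the meeting"]

    def train(docs):
        counts = Counter(w for d in docs for w in d.split())
        return counts, sum(counts.values())

    spam_counts, spam_total = train(spam)
    ham_counts, ham_total = train(ham)
    vocab = set(spam_counts) | set(ham_counts)

    def log_prob(text, counts, total):
        """Sum of log P(word|class), with add-one (Laplace) smoothing."""
        return sum(math.log((counts[w] + 1) / (total + len(vocab)))
                   for w in text.split())

    def is_spam(text):
        return log_prob(text, spam_counts, spam_total) > \
               log_prob(text, ham_counts, ham_total)

    print(is_spam("cheap pills"))      # True
    print(is_spam("meeting at noon"))  # False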
     
    Last edited: May 28, 2012
  18. lancis

    lancis Elite Member

    Joined:
    Jul 31, 2010
    Messages:
    1,632
    Likes Received:
    2,384
    Occupation:
    Entrepreneur
    Location:
    Milky Way
    Home Page:
    There are enough Ivy graduates on the black-hat side of the globe, so it's a fair game. :)
     
  19. moeatwa

    moeatwa Regular Member

    Joined:
    Jul 23, 2011
    Messages:
    486
    Likes Received:
    475
    Occupation:
    Material Engineering Undergrad Student/ Blackhat I
    Location:
    Land of the Nile
    How about this?

    List of keywords (and LSIs) ==> Instant Article Wizard ==> WordAi ==> CorrectEnglish/WhiteSmoke ==> Copyscape check ==> Blog

    If somebody looped the above process, wouldn't that qualify as unique content?
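
    In Python-ish pseudocode the loop would be something like this (every function here is a hypothetical stub standing in for whatever API the real tool exposes - I haven't checked any of them):
    Code:
    # All of these are hypothetical stubs -- placeholders for whatever interface
    # Instant Article Wizard, WordAi, WhiteSmoke and Copyscape actually expose.
    # The point is only the shape of the loop.

    def research_article(keyword): ...
    def spin_with_wordai(text): ...
    def grammar_check(text): ...
    def passes_copyscape(text): ...
    def post_to_blog(keyword, text): ...

    def content_loop(keywords, max_attempts=3):
        for kw in keywords:
            for _ in range(max_attempts):
                draft = research_article(kw)    # Instant Article Wizard step
                spun = spin_with_wordai(draft)  # restructure the sentences
                clean = grammar_check(spun)     # CorrectEnglish/WhiteSmoke step
                if passes_copyscape(clean):     # uniqueness gate
                    post_to_blog(kw, clean)
                    break                       # move on to the next keyword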

    EDIT: You could also automate the article fetching from the archived caches of expired High-PR domains.
     
    Last edited: May 28, 2012
  20. assphuck

    assphuck Senior Member

    Joined:
    Feb 22, 2009
    Messages:
    1,196
    Likes Received:
    905
    Penguin is flawed, and Google will continue patching it to correct the problems. Because it is so random, and still booting whitehat sites out of the SERPs, it will be difficult to pin down what exactly it is targeting. Obviously, more than one or two variables are at play. Time will tell, and hopefully some patterns will begin to emerge. That's pretty much how it works with all of their major algorithm updates. Penguin will definitely be put on ice. The only question is when, and how much collateral damage Google is willing to accept.