You don't know SH!T about SEO if you don't read and understand this!

Discussion in 'Black Hat SEO' started by SEO20, Mar 3, 2011.

  1. SEO20

    SEO20 Elite Member

    Joined:
    Mar 25, 2009
    Messages:
    2,017
    Likes Received:
    2,259
    OK, I admit the headline is a bit harsh - but it's true.
    Despite the recent changes in Google favoring the social graph, content is still relevant for your sites and backlink strategies - surprise ;-)

    This is about how the search engines look at your content so you can create/spin better content.

    I have been working on generating premium unique articles/content, based on keywords, on and off for years. A recent JV kicked it back to life at full speed again.
    Well you will hear more once it's ready and all that.

    While working on this I went through my private files and notes about this. Some of the notes are based on datasets published by the search-giants for other purposes.

    I realised this info is still not available anywhere, so because I'm a nice guy and all, I want to share parts of it in public.

    Search engines collect content via their crawlers and store it in several tables in their advanced database setup.

    When searching for content (or detecting duplicate content), search engines split the text into combinations of consecutive words, also referred to as n-grams.

    Several million n-grams are stored.

    "this is cool" is a 3-word n-gram
    "this is awesome" is also a 3-word n-gram that has the same meaning.

    Since the search engines don't really understand the content, they use a lot of n-grams to link (or glue) searches and relevance together.
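
    To make the term concrete, here is a minimal sketch of splitting text into word n-grams. The function name `word_ngrams` and the simple tokenizer are assumptions for illustration, not how any particular search engine does it.

```python
import re

def word_ngrams(text, n):
    """Return the list of n-word sequences (as tuples) found in text."""
    # Lowercase, then treat runs of letters, digits and apostrophes as words.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of n words across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(word_ngrams("this is cool", 3))     # [('this', 'is', 'cool')]
print(word_ngrams("this is awesome", 2))  # [('this', 'is'), ('is', 'awesome')]
```

    A text shorter than n words simply yields no n-grams of that size.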

    How many tiers of n-grams are stored is hard to say. I have proof that Microsoft stores n-grams of up to 7 tiers (7 words).
    So nearly every combination of up to 7 words - a LOT of data, I can tell you. You can think of this as a MONSTER spintax database on steroids.
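
    A rough sketch of what "up to 7 tiers" means in practice: every word sequence of length 1 through 7 gets its own entry. The sample sentence extends one of the phrases with made-up words, and `ngram_counts` is a hypothetical helper, not anything taken from a real search-engine dataset.

```python
from collections import Counter

def ngram_counts(tokens, max_n=7):
    """Count every n-gram of order 1..max_n in a list of words."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Made-up 9-word sentence, used only to show how the entries add up.
tokens = "as growth is unlikely to pick up this year".split()
counts = ngram_counts(tokens)

# Distinct n-grams per order: 9 single words down to 3 seven-word sequences.
for n in range(1, 8):
    print(n, sum(1 for gram in counts if len(gram) == n))
print(len(counts))  # 42 entries for a single 9-word sentence
```

    One short sentence already produces 42 entries, which gives a feel for how fast such a table grows across a whole index.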

    N-grams are just a small part of the game of building a search engine.

    Let's look at some real examples from a real database (simplified for better reading):

    Code:
    as growth is unlikely to | is set for zero growth
    as growth is unlikely | economic growth
    as growth is unlikely | growth
    as growth is unlikely | is forecasting gdp growth
    
    
    Another example:
    Code:
    bruce almighty | also generated the | bruce almighty earned
    bruce almighty | also generated the | bruce almighty
    bruce almighty | also generated the | film bruce almighty
    bruce almighty | also generated the | jim carrey comedy bruce almighty
    
    So you can see here that "bruce almighty" is also linked to "Jim Carrey", and that the engine knows it's a comedy.

    Code:
    cisco systems plans to announce %%number%% | cisco systems is to announce %%number%%
    clinton said %%day%% . | clinton said on %%day%%
    
    Note here that days and numbers can vary in a single ngram.
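
    One way to picture this is a normalization pass that swaps weekday names and numbers for placeholder tokens before the n-grams are compared. This is a sketch under that assumption. The `normalize` function and its regular expressions are illustrative, and only the `%%day%%` / `%%number%%` tokens come from the examples above.

```python
import re

DAYS = r"\b(?:monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"

def normalize(text):
    """Replace weekday names and numbers with placeholder tokens."""
    text = text.lower()
    text = re.sub(DAYS, "%%day%%", text)
    # Integers and decimals, optionally with thousands separators.
    text = re.sub(r"\b\d+(?:[.,]\d+)*\b", "%%number%%", text)
    return text

print(normalize("Clinton said on Tuesday"))  # clinton said on %%day%%
print(normalize("Clinton said on Friday"))   # clinton said on %%day%%
print(normalize("Cisco Systems plans to announce 1,500"))
# cisco systems plans to announce %%number%%
```

    After normalization, sentences that differ only in the day or the figure collapse into the same n-gram.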

    Please spare us all comments like "I knew that and have done it for years".

    Enjoy and hopefully I have helped you a bit.
     
    • Thanks Thanks x 144
    Last edited: Mar 3, 2011
  2. philionaire

    philionaire Regular Member

    Joined:
    Mar 20, 2010
    Messages:
    212
    Likes Received:
    180
    Location:
    Vanland
    Thanks for that.

    I knew there was a reason for {deep|multi level} spinning, although I never knew the reason behind it. It makes a lot more sense now, knowing about n-grams.

    That's a hell of a lot of spinning they can do automatically. As said, to get the most out of the search engines in the future everything should be spun on multiple levels, and, although it takes a heck of a lot more time and effort, the rewards should pay dividends.
     
  3. OnlineGodfather

    OnlineGodfather Senior Member

    Joined:
    Mar 3, 2010
    Messages:
    1,117
    Likes Received:
    406
    Occupation:
    Interwebs
    Location:
    Russia
    SEO20 has once again shown that he has great knowledge of this. These are the kind of threads we need :p
     
    • Thanks Thanks x 4
  4. SEO20

    SEO20 Elite Member

    Joined:
    Mar 25, 2009
    Messages:
    2,017
    Likes Received:
    2,259
    So even though you spin your articles, n-grams can still link them together.
    Hope you understand.
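
    A minimal sketch of the point, assuming a plain overlap measure (Jaccard similarity over 3-word n-grams). This is a common textbook measure, not a confirmed description of what any search engine runs, and the sample sentences are made up around phrases from the first post.

```python
import re

def shingles(text, n=3):
    """Return the set of n-word sequences in text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Share of n-grams the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

original = "the economy is set for zero growth this year"
spun = "the economy is set for no growth this year"
unrelated = "bruce almighty is a jim carrey comedy film"

print(jaccard(shingles(original), shingles(spun)))       # 0.4
print(jaccard(shingles(original), shingles(unrelated)))  # 0.0
```

    Swapping a single word still leaves 4 of the 10 distinct 3-word n-grams shared, while the unrelated sentence shares none.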
     
    • Thanks Thanks x 8
  5. kazhkaz

    kazhkaz Jr. VIP Jr. VIP Premium Member

    Joined:
    Aug 19, 2010
    Messages:
    1,238
    Likes Received:
    369
    This sentence made everything clear :D thank you for great info and making it easy to understand for newbies like me :)
     
    • Thanks Thanks x 2
  6. blight12

    blight12 Regular Member

    Joined:
    Mar 5, 2009
    Messages:
    317
    Likes Received:
    66
    I would be really interested to know what your solution would be. I tend to create 1 article by initially piecing together 4-7 articles and then using ContentBoss to do the spinning, and then additionally spinning sentences as well. I have never had a problem with this method in terms of maintaining rankings.

    Another strategy is scraping related articles into a large database and then sentence-spinning the whole 50k pages in the file. This should result in unique content.
     
    • Thanks Thanks x 1
  7. wannabie

    wannabie Elite Member

    Joined:
    Mar 11, 2009
    Messages:
    3,807
    Likes Received:
    2,954
    Occupation:
    Seo and Marketing Suprisingly
    Location:
    Your bedroom window
    I have been discussing this with a client and found to explain it in the way you have!

    Bingo!
     
  8. observer

    observer Power Member

    Joined:
    Apr 7, 2010
    Messages:
    731
    Likes Received:
    22
    Ok, so what is the solution indeed?
     
  9. satyawrat

    satyawrat Jr. VIP Jr. VIP

    Joined:
    Jul 8, 2009
    Messages:
    933
    Likes Received:
    1,186
    Occupation:
    Hustler
    Location:
    Gurgaon
    Another technique used by search engines to link related words or n-grams is singular value decomposition. Consider the n-grams to be in the form of a matrix [a really huge one]; tons of mathematical operations carried out on it help detect related words, duplicate stuff and a lot more.
    SEO is not very far from mathematics if you try to go deeper.

    The solution is to vary the content in such a way that it cannot be tracked by Google. Google has a huge matrix of n-grams on which it uses singular value decomposition and other mathematical operations to track similarity. So the easy way out is to rewrite instead of spinning :p
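
    For anyone who hasn't met singular value decomposition, here is the standard textbook form. The layout is an assumption for illustration only (rows of $A$ are n-grams, columns are documents, entries are counts); nothing here is confirmed about what Google actually computes.

```latex
A = U \Sigma V^{\top}, \qquad
A_k = U_k \Sigma_k V_k^{\top}, \qquad
\operatorname{sim}(d_i, d_j) =
  \frac{\hat{d}_i \cdot \hat{d}_j}{\lVert \hat{d}_i \rVert \, \lVert \hat{d}_j \rVert}
```

    Here $A_k$ keeps only the $k$ largest singular values, and $\hat{d}_j$ is the $j$-th column of $\Sigma_k V_k^{\top}$, i.e. document $j$ in the reduced space. Documents that use different but related n-grams tend to land close together there, which is the idea behind latent semantic analysis.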
     
    • Thanks Thanks x 12
    Last edited: Mar 3, 2011
  10. redstone.1337

    redstone.1337 BANNED BANNED Jr. VIP Premium Member

    Joined:
    Dec 30, 2009
    Messages:
    1,259
    Likes Received:
    999
    So, is it not enough if the content is passing copyscape?
     
    • Thanks Thanks x 1
  11. satyawrat

    satyawrat Jr. VIP Jr. VIP

    Joined:
    Jul 8, 2009
    Messages:
    933
    Likes Received:
    1,186
    Occupation:
    Hustler
    Location:
    Gurgaon
    Does Copyscape employ anything like this? I doubt it, because it would be very resource-demanding. They would have to pay loads of money to programmers, and there are a million other reasons.

    I have personally never used Copyscape, so I cannot go into the nuts and bolts of the method they use.
     
    • Thanks Thanks x 3
  12. reinie

    reinie Elite Member

    Joined:
    Jan 16, 2009
    Messages:
    1,574
    Likes Received:
    1,040
    Mmm... great, man. I guess I don't know shit... lol
     
  13. daveguy

    daveguy Power Member

    Joined:
    Nov 22, 2010
    Messages:
    591
    Likes Received:
    251
    Location:
    Florida
    I'm right there with you haha. I'm gonna do some googling on n-grams and singular value decomposition
     
  14. Monrox

    Monrox Power Member

    Joined:
    Apr 9, 2010
    Messages:
    615
    Likes Received:
    579
    People, don't get carried away :)
    It is true that n-grams work the way seo20 explained but this is not a cause to panic.

    An example, if you take a look at Hamlet and Othello you will get many matching n-grams. Both are from the same author and he inevitably uses some phrases and variations thereof more than others. It is true for every human brain. Still, nobody considers Othello a spun version of Hamlet lol.

    But there are always a lot of matching n-grams even between authors, because language constructs stay rigid for a generation or two at a time.


    At any given time a language only has ~10 000 words in real use. No matter how hard someone tries, he/she must eventually use an n-gram variation to continue creating text. The longer the text, the more unavoidable reusing a construct becomes.

    And since search engines feed on long texts, they are forced to ignore repetition, otherwise they would consider anything reasonably long a copy of everything else with similar length.


    Here's an even clearer example:
    Code:
    The inflammation of this segment of the gastrointestinal tract
    is related to disturbed bowel functions.
    Pain, nausea and fever is a classic presentation.
    Patients in this state describe pain in the right half of the abdomen.
    Code:
    When this particular zone of the bowel becomes inflamed,
    it is due to the digestive system not functioning correctly.
    Pain, fever and nausea are present in almost all cases.
    Individuals with this condition may present with right-sided abdominal pain.
    The amount of n-grams is stunning, isn't it?


    However, the first passage is about appendicitis, the second one about diverticulitis. They are two VERY different acute illnesses and are treated completely differently. Mixing up the treatments may result in death.

    Any search engine will fail very miserably if it decides to consider the two paragraphs duplicate content.


    If we examine music, the frequency of n-grams is even higher because there are even fewer musical notes (than words in a language) and a repetition will be hit after far fewer combinations.

    'In the Hall of the Mountain King' and 'Greensleeves' have a lot of matching n-grams. But we are not considering them spun versions of the 9th symphony. Of course music has nothing to do with text (but with YT :D ).

    N-grams are used to find out whether some content can be loosely grouped together, for example to distinguish medical texts from mathematical texts, but anything more restrictive than that will result in a huge margin of error.
     
    • Thanks Thanks x 62
    Last edited: Mar 3, 2011
  15. J0kerz

    J0kerz Supreme Member

    Joined:
    Nov 2, 2009
    Messages:
    1,415
    Likes Received:
    435
    Occupation:
    IM
    Location:
    There
    In short,

    Stop spinning old rehashed articles and start writing Unique content.
     
    • Thanks Thanks x 3
  16. Chris269

    Chris269 Newbie

    Joined:
    Apr 25, 2010
    Messages:
    26
    Likes Received:
    48
    Thanks for the informative post. So using software to spin articles is not useful after all?! I almost bought article-spinning software. Now I'm just going to skip it. Thanks!
     
  17. teeniegenie

    teeniegenie Supreme Member

    Joined:
    Aug 28, 2010
    Messages:
    1,296
    Likes Received:
    662
    Location:
    The Cool Part of Vegas
    Buy seo20's info when he decides to sell it - :D
     
  18. killer2021

    killer2021 Regular Member

    Joined:
    Sep 9, 2010
    Messages:
    229
    Likes Received:
    76
    Probably the best way to write "unique content" is to just read some other articles and then paraphrase them a lot, basically restating them in a different way. I am not talking about plopping some synonyms in there or moving some sentences down. I am talking about writing entirely new sentences that state generally the same meaning.
     
    Last edited: Mar 3, 2011
  19. cristianraiber

    cristianraiber Regular Member

    Joined:
    Nov 22, 2008
    Messages:
    293
    Likes Received:
    381
    Occupation:
    Onliner
    Location:
    Internet
    How about using Markov chains combined with a thesaurus ? :>
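
    For reference, a toy sketch of what that combination could look like: a first-order word Markov chain plus a synonym lookup. All names (`build_chain`, `generate`, `swap_synonyms`), the sample text and the tiny thesaurus are made up for illustration.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, rng):
    """Walk the chain from start, picking a random follower at each step."""
    out = [start]
    # Stop early if the last word has no recorded follower.
    while len(out) < length and chain.get(out[-1]):
        out.append(rng.choice(chain[out[-1]]))
    return out

# Toy thesaurus; the entries are made up for illustration.
THESAURUS = {"cool": ["awesome", "great"]}

def swap_synonyms(words, rng):
    """Replace any word that has a thesaurus entry with a random synonym."""
    return [rng.choice(THESAURUS[w]) if w in THESAURUS else w for w in words]

source = "this is cool and this is fun and that is cool"
rng = random.Random(0)  # fixed seed so repeated runs give the same walk
chain = build_chain(source)
walk = generate(chain, "this", 6, rng)
print(" ".join(swap_synonyms(walk, rng)))
```

    Note that every adjacent word pair in the walk already occurs in the source text, so before the synonym swap the output is built entirely from the source's own 2-word n-grams.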
     
  20. paincake

    paincake Power Member

    Joined:
    Aug 18, 2010
    Messages:
    716
    Likes Received:
    3,099
    you just defined spinning