Is Content Reading Grade Level a Google Factor? (My findings)

Discussion in 'White Hat SEO' started by validseo, Jan 24, 2014.

    In a Jan 20, 2014 article on Search Engine Watch, a claim was made that "grade level" readability is a Penguin factor. I decided to check some niches to see whether there really is a correlation with Google web search rankings.

    For niches like "Lung Cancer" I did find a correlation, although outliers were commonplace. If this is a factor, then it is easy to make up for it in the rankings by overcompensating on other factors.

    Personally I think the readability algorithm is flawed because it is based on the notion that words with fewer syllables are more readable. But if we went to a 5th grade class and asked the students to define "cartoon" and "cherub", I know which one they would get wrong more often, and they are both two-syllable words. The algorithm does not account for how commonly words are used. Also, if you went back 150 years, the results of my example would probably be reversed.
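    To make the syllable point concrete, here is a minimal Python sketch of the two standard Flesch formulas (Reading Ease and Flesch-Kincaid Grade Level). The constants are the published ones; the `count_syllables` heuristic and the sentence/word splitting are my own rough assumptions, not what any particular measurement tool does, so expect the numbers to differ somewhat from tool to tool.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return words per sentence, syllables per word, and both Flesch scores."""
    # Naive splitting: a sentence ends at ., ! or ?; a word is a run of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word

    return {
        "words_per_sentence": wps,
        "syllables_per_word": spw,
        # Flesch Reading Ease: higher means easier to read.
        "reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid Grade Level: approximate US school grade.
        "grade_level": 0.39 * wps + 11.8 * spw - 15.59,
    }


if __name__ == "__main__":
    # Both words count as two syllables, so the formulas cannot tell them apart.
    print(count_syllables("cartoon"), count_syllables("cherub"))
    print(readability("The cat sat. The dog ran."))
```

    Only sentence length and syllable count enter the formulas, so "cartoon" and "cherub" contribute exactly the same amount to the score. Word frequency never appears.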

    If reading level is a factor, then at a minimum some niches will favor a college reading level over a high school reading level. In a test case like "Lung Cancer" this appears to be the case.

    What I found even more interesting was that, across multiple niches, average words per sentence correlated more strongly with rankings than reading level did, which makes it a possible factor too.
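    For anyone who wants to run this kind of check on their own niche, one reasonable way to compare a page metric against ranking position is a Spearman rank correlation. The sketch below is only an illustration of that technique, not necessarily how the numbers in this post were produced, and the sample values in it are made up purely to show the call.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # HYPOTHETICAL numbers for illustration only, not data from this thread.
    position = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    words_per_sentence = [21.4, 19.8, 20.5, 17.2, 18.9, 15.0, 16.3, 14.1, 15.5, 12.7]
    grade_level = [12.1, 9.4, 13.0, 10.2, 11.5, 8.8, 12.4, 9.9, 10.7, 9.1]
    print("words/sentence:", spearman(position, words_per_sentence))
    print("grade level:   ", spearman(position, grade_level))
```

    A negative value here would mean the metric tends to be higher for pages nearer position 1. With only ten results per query, a single outlier can move the coefficient a lot, so it is worth comparing across several queries before reading much into it.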

    Some data:


    The Original Article Making The Claim:


    The algorithm:


    A Measurement Tool:


    Derivation of the algorithm:


    PS: For this post:

    Flesch-Kincaid Reading Ease: 66.7
    Flesch-Kincaid Grade Level: 7.2
    Average Grade Level: 8.2
    Avg Words per sentence: 13

    So... if reading level is a factor then it won't help this post. :)
    This data will be fun to play with. I'm curious about the number of sentences and how strong that correlation is, because the data seems to imply that fewer sentences are better for rankings. At face value, that sort of sounds like shorter content, but taken in the context of more words per sentence, it doesn't.

    I also wonder how tech content would be viewed - more on the higher education side of the needle or high school level. Will have to test this out with certain niches.

    Thanks for this data. I'm a fan of on-page factors - not that off-page doesn't count, but I've seen really well done content rank very well with little extra help from the latter - even in competitive B2B niches. I've also always wondered if G would get to the point where their software could start identifying spun content by analyzing grammar and such. In a way, this reminds me of a college writing class where there is a grading rubric with all of these factors determining your grade.
