
SEO Content Tuning Experiment

Discussion in 'White Hat SEO' started by validseo, Sep 9, 2015.

  1. validseo

    validseo · Jr. VIP, Premium Member · Joined: Jul 17, 2013 · Messages: 910 · Likes Received: 527 · Occupation: Professional SEO · Location: Seattle, WA

    Summary:


    • Keyword stuffing works
    • H1 is not as important as people claim
    • Keywords near the top of the page: not so important, but not unimportant
    • Nofollow links in on-page content are much stronger than dofollow links
    • Keywords in image alt text do matter quite a bit
    • Short domains do rank better
    • Keywords in your forms... yup, they matter
    • Strong, bold, em: no correlation, maybe even BAD!!!! Italic: Good???? Maybe.
    • And a zillion more things in the data... I'm giving you ALL the data.

    What I did:

    I measured 200 different content factors for each of the top 10 Google search results for 25 VERY niche, high-competition keyword searches. This post contains some of the interesting highlights as well as ALL of the data.

    About the sample:

    I chose the niche of "DUI attorney" because it is extremely high competition. I wanted to measure content tuning factors in a niche where the websites are financially motivated to invest in getting FREE organic clicks. At $50-$200 per click in most major metros, these websites are very motivated to get every free click they can. I chose 25 metros, so each search was similar to "dui attorney phoenix" or "dui attorney chicago". As a result, the sample covers many different websites with extremely similar search intent. The local block search results were ignored; for this study I am only interested in what is trending in the organic results.

    Sample Size:

    25 search terms in a very specific niche, with the top 10 organic results for each (250 ranked pages in total).

    About the correlation:

    Correlation % is the percentage of searches in the sample where the average of the first five results (positions 1-5) beat the average of the second five results (positions 6-10) in the direction that favors the factor. So for a factor like "Domain Name Length," where shorter is supposed to be better, we want to see the average domain length of results 1-5 come out lower than that of results 6-10 across all the keywords, and vice versa for factors that should increase with rankings.
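The top-5 versus bottom-5 comparison described above can be sketched in a few lines of Python. This is a minimal sketch, not the author's (unreleased) software: it assumes each search is a list of the factor's values for ranks 1-10 in rank order, and it assumes a tie between the two averages counts as not favorable, since the post doesn't say how ties were handled. `correlation_pct` is a hypothetical name.

```python
from statistics import mean


def correlation_pct(searches, higher_is_better=True):
    """Percentage of searches where the top-5 average beats the bottom-5 average.

    `searches` holds one list per search with the factor value for ranks
    1..10 in rank order (index 0 = rank 1). Ties count as not favorable
    (an assumption; the post does not say how ties were handled).
    """
    favorable = 0
    for values in searches:
        top5 = mean(values[:5])       # ranks 1-5
        bottom5 = mean(values[5:10])  # ranks 6-10
        if higher_is_better and top5 > bottom5:
            favorable += 1
        elif not higher_is_better and top5 < bottom5:
            favorable += 1
    return 100.0 * favorable / len(searches)


# Hypothetical domain-name lengths for two searches (shorter is "better").
searches = [
    [8, 10, 9, 12, 11, 14, 15, 13, 16, 12],  # top 5 shorter on average: favorable
    [15, 14, 16, 13, 12, 9, 10, 11, 8, 10],  # top 5 longer on average: not favorable
]
pct = correlation_pct(searches, higher_is_better=False)
print(f"{pct:.0f}% correlation, {pct - 50:+.0f} points vs. random")
```

Subtracting 50 from the result gives the "% better than random" figure used throughout the post.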

    Because of the way we calculate and use correlation, you should know that a correlation of 50% is random: it literally means that the factor trended properly for half of the sample. So the higher the correlation gets above 50%, the more interesting it gets. When the correlation is smaller than 50%, it most likely means that there was insufficient data on the factor in the sample (or that the factor trended against rankings more often than with them).

    If you wonder why the correlations fall into quantized amounts, it is because of the sample size: each search is worth 1/25 = 4%, so every correlation is a multiple of 4% (for example, 21 of 25 searches = 84%).
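To put those 4% steps in perspective, one can ask how often a factor with no real relationship to rankings would reach a given correlation by luck alone. This sketch assumes each search behaves like an independent fair coin flip for such a factor (a simplifying assumption), which makes the count of favorable searches binomial; `chance_of_at_least` is a hypothetical helper, not something from the post.

```python
from math import comb

N_SEARCHES = 25  # searches in this sample


def chance_of_at_least(pct, n=N_SEARCHES):
    """P(a coin-flip factor trends favorably in >= pct% of n searches)."""
    k_min = round(pct * n / 100)  # e.g. 72% of 25 searches -> 18 searches
    favorable_outcomes = sum(comb(n, k) for k in range(k_min, n + 1))
    return favorable_outcomes / 2 ** n


# Each search is worth 100 / 25 = 4 points, so correlations move in steps of 4%.
for pct in (60, 64, 72, 80, 92):
    print(f"{pct}% or better by pure chance: p = {chance_of_at_least(pct):.2g}")
```

Under this coin-flip model, a 60% correlation (15 of 25 searches) happens by luck roughly one time in five, while 80% (20 of 25) happens about twice in a thousand.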

    Correlation != Causation:

    Yes. Yes. I know. But unless you work at Google on their algorithms, correlation is the best empirical data you are going to get. Plus, the people who gripe about correlation often forget that if something doesn't correlate, it probably isn't causal either, SO correlation can vastly narrow the field of THINGS you need to guess about. Another saying anti-correlation people should remember: "If it doesn't correlate, then you aren't using data." I'd also accept "If it doesn't correlate, then you are just blind guessing."

    About the Charts and Tables:

    Most of the charts and tables in this post show the min, average, and max values of the specified factor at each ranking position (1-10) across all 25 searches. The tables show the values and the charts show the lines, but the most important number is the correlation %, which indicates how well or poorly the factor performed across the whole sample.
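The min/average/max lines in the charts could be rebuilt from the raw spreadsheets with something like the sketch below. It assumes one list of factor values per search, in rank order (index 0 = rank 1); `per_position_summary` is a hypothetical name, and the post doesn't describe how its charts were actually produced.

```python
from statistics import mean


def per_position_summary(searches):
    """Return (rank, min, average, max) of one factor for each rank position.

    `searches` holds one list per search with the factor value for each
    rank, in rank order (index 0 = rank 1).
    """
    n_ranks = min(len(values) for values in searches)
    rows = []
    for i in range(n_ranks):
        column = [values[i] for values in searches]  # every search's value at this rank
        rows.append((i + 1, min(column), mean(column), max(column)))
    return rows


# Hypothetical keyword counts for three searches, ranks 1-3 only.
for rank, lo, avg, hi in per_position_summary([[40, 12, 30], [85, 20, 10], [60, 5, 25]]):
    print(f"#{rank}: min={lo} avg={avg:.1f} max={hi}")
```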

    About Image Attachments:

    BHW only allows 8 image attachments, and I have a lot more than 8 things to show. There is also a KB size limit on attachments. So I combined them into a single image. I apologize, but you'll have to open the image in a new window so you can see the charts and data as you read the post.

    Disclaimer:

    I did use software I created and someday hope to offer. I am not selling it currently, and I fully intend to offer it as a paid membership when I do. I am just sharing the data and insights I am able to produce with the BHW community first.

    Let's Begin

    Myth: Keyword Stuffing Doesn't Work

    Saying the keyword more appears to matter... It trends 22% better than random! It makes sense. At its core, Google Web Search is a typical document indexing engine. This kind of solution has existed since the '70s, and they all suffer from the same kinds of limitations. For example, when all other things are equal, whoever says it more wins. Look at those average and max values across the samples. That's a lot of stuffing, but at $170 CPC, if it helps, is it worth it? The myth that Google solved keyword stuffing and that it doesn't work is TOTALLY BUSTED. That doesn't mean it isn't risky or that you won't get punished for it. You might.
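Counting "how many times you say it" is simple in principle. The sketch below counts whole-phrase, case-insensitive matches of a keyword in a page's text. The post doesn't say whether its software counted the full phrase, individual words, or partial matches, so the phrase-matching rule is an assumption, and `count_keyword_matches` is a hypothetical helper.

```python
import re


def count_keyword_matches(text, keyword):
    """Count case-insensitive, whole-phrase matches of `keyword` in `text`.

    Words of the phrase may be separated by any run of whitespace.
    """
    words = [re.escape(word) for word in keyword.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


page_text = "Phoenix DUI Attorney. Our dui  attorney team handles every DUI case."
print(count_keyword_matches(page_text, "dui attorney"))
```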

    Myth: H1 is the most important heading

    In terms of Bing, this myth is totally true. In terms of Google, it is totally busted. H1 tags only correlated 84% of the time. H2 correlated 88%. H4 correlated 84%. When I treated H4-H6 as a group, they correlated 96%... H1-H6 only correlated 88%, and H1-H3 only 80%. This makes sense to me. I believe Google uses Bayesian methods to identify web spam based on a training set. I bet the spam in that training set exploits the heck out of H1 tags, diminishing their value as a signal. Just my theory... but H1 being the most important... BUSTED.
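A per-heading measurement like this can be sketched with Python's standard-library `html.parser`: count keyword matches separately for each heading level, then sum levels to form groups such as H4-H6. The phrase-matching rule and the names `HeadingKeywordCounter` and `group_total` are assumptions for illustration; the post doesn't describe how its software parsed headings, and this sketch simply ignores headings that are never closed.

```python
import re
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class HeadingKeywordCounter(HTMLParser):
    """Counts keyword-phrase matches inside each heading level (h1-h6)."""

    def __init__(self, keyword):
        super().__init__()
        words = [re.escape(word) for word in keyword.split()]
        self.pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
        self.counts = {tag: 0 for tag in HEADING_TAGS}
        self._open = []  # [tag, text] for the heading(s) currently open

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._open.append([tag, ""])

    def handle_data(self, data):
        if self._open:
            self._open[-1][1] += data  # includes text of nested inline tags like <b>

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, text = self._open.pop()
            self.counts[tag] += len(self.pattern.findall(text))


def group_total(counts, tags):
    """Sum the counts for a group of heading levels, e.g. H4-H6."""
    return sum(counts[tag] for tag in tags)


parser = HeadingKeywordCounter("dui attorney")
parser.feed("<h1>DUI Attorney Phoenix</h1><h2>Why hire a DUI attorney?</h2>"
            "<h4>DUI attorney fees</h4><h6>Call a <b>DUI</b> attorney</h6>")
parser.close()
print(parser.counts)
print("H4-H6:", group_total(parser.counts, ("h4", "h5", "h6")))
```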

    Myth: It Helps To Have Keywords Near Top

    "Keywords near the top" correlates weakly... only 14% better than random. That is not a strong factor. I tested both the full HTML source and the text with HTML removed, and they produced the same result. As a factor it is plausible, but weak.

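"Near the top" needs a definition before it can be measured. One plausible reading, sketched below, is the relative position of the first keyword match (0.0 = very top), computed on both the full HTML source and the text with HTML removed. That definition, the choice to skip `<script>` and `<style>` contents when stripping HTML, and the names `strip_html` and `first_match_position` are all assumptions, not the post's documented method.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html):
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.parts)


def first_match_position(text, keyword):
    """Relative position of the first keyword match: 0.0 = top, near 1.0 = bottom.

    Returns None when the keyword does not appear.
    """
    match = re.search(re.escape(keyword), text, flags=re.IGNORECASE)
    if not text or match is None:
        return None
    return match.start() / len(text)


html = "<html><head><title>Law Firm</title></head><body><p>Call us.</p><p>DUI attorney</p></body></html>"
print(first_match_position(html, "dui attorney"))              # full source
print(first_match_position(strip_html(html), "dui attorney"))  # text with HTML removed
```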
    Links

    Holy guacamole! I guess you want a fair number of nofollow links on your pages. If we use the dofollow correlation as the control, then nofollow links are 20% more important! It is interesting that keyword matches in on-page links don't correlate as strongly as either the number of dofollow or nofollow links. Honestly, I'm not sure what this means.
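Link counts like these might be gathered as in the sketch below. It assumes a link is any `<a>` with an `href`, that a link counts as nofollow when its `rel` attribute contains `nofollow` and as dofollow otherwise, and that "keyword matches in on-page links" means phrase matches in anchor text. Those definitions and the class name `LinkCounter` are assumptions, not the post's documented method.

```python
import re
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts dofollow vs. nofollow links and keyword matches in anchor text."""

    def __init__(self, keyword):
        super().__init__()
        self.pattern = re.compile(re.escape(keyword), re.IGNORECASE)
        self.dofollow = 0
        self.nofollow = 0
        self.anchor_keyword_matches = 0
        self._anchor_text = None  # text of the <a href> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if "href" not in attrs:
            return  # named anchors like <a name="top"> are not links
        rel_values = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow += 1
        else:
            self.dofollow += 1
        self._anchor_text = ""

    def handle_data(self, data):
        if self._anchor_text is not None:
            self._anchor_text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_text is not None:
            self.anchor_keyword_matches += len(self.pattern.findall(self._anchor_text))
            self._anchor_text = None


counter = LinkCounter("dui attorney")
counter.feed('<a href="/fees">DUI attorney fees</a>'
             '<a href="https://example.com" rel="nofollow">source</a>'
             '<a href="/blog" rel="Nofollow noopener">Choosing a DUI attorney</a>'
             '<a name="top">back to top</a>')
counter.close()
print(counter.dofollow, counter.nofollow, counter.anchor_keyword_matches)
```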

    Images

    1. Keyword matches in image alt text trend 42% better than random! I guess a photo (with relevant alt text) is worth a thousand words (of content)!

    2. Clearly, keyword matches in the alt text matter about 20% more than just having images with alt text.

    3. Since it is technically an image... the favicon... look how smoothly that average transitions... As factors go, this is pretty good, trending 26% better than random. There are a lot of simple little things like this that you can and should add, like a privacy policy, terms of service, a copyright notice, and Apple touch icons: everything a typical spammy web page might opt not to include.
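The image signals above can be sketched with the standard-library HTML parser. This assumes "images with alt text" means `<img>` tags with a non-empty `alt`, counts keyword phrase matches in that alt text, and treats a page as having a favicon only if it declares one with `<link rel="icon">` or `rel="shortcut icon"` (a site can also serve `/favicon.ico` without declaring it, which this sketch would miss). The class name `ImageSignals` is hypothetical.

```python
import re
from html.parser import HTMLParser


class ImageSignals(HTMLParser):
    """Counts images with alt text, keyword matches in alt text, and a declared favicon."""

    def __init__(self, keyword):
        super().__init__()
        self.pattern = re.compile(re.escape(keyword), re.IGNORECASE)
        self.images_with_alt = 0
        self.alt_keyword_matches = 0
        self.has_favicon = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            alt = (attrs.get("alt") or "").strip()
            if alt:
                self.images_with_alt += 1
                self.alt_keyword_matches += len(self.pattern.findall(alt))
        elif tag == "link":
            # matches rel="icon" and rel="shortcut icon"
            if "icon" in (attrs.get("rel") or "").lower().split():
                self.has_favicon = True


signals = ImageSignals("dui attorney")
signals.feed('<link rel="shortcut icon" href="/favicon.ico">'
             '<img src="team.jpg" alt="Phoenix DUI attorney team">'
             '<img src="spacer.gif" alt="">'
             '<img src="office.jpg" alt="Our office">')
signals.close()
print(signals.images_with_alt, signals.alt_keyword_matches, signals.has_favicon)
```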

    Forms

    There seems to be something about having a web form that uses your keywords. I guess that demonstrates some sort of intention to address a visitor need.

    URLs & Domains

    1. Keywords in the URL... only trend 10% better than random. Lower than I thought it would be.
    2. Short domain name? Yup! There appears to be a pattern that shorter is better.
    3. And keywords in the file name (keywords.html) appear to have value too!
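The three URL measures above could be computed roughly as follows. The definitions are assumptions, since the post doesn't give them: domain length is the hostname length without a leading `www.` (TLD included), "keyword in URL" means every keyword word appears as a token in the path, and the file-name check looks for the hyphenated keyword plus `.html`. All three function names are hypothetical.

```python
import re
from urllib.parse import urlparse


def domain_name_length(url):
    """Characters in the hostname, ignoring a leading 'www.' (TLD included)."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return len(host)


def keyword_in_url_path(url, keyword):
    """True if every word of the keyword appears as a token in the URL path."""
    path = urlparse(url).path.lower()
    tokens = set(re.split(r"[^a-z0-9]+", path))
    return all(word in tokens for word in keyword.lower().split())


def keyword_file_name(url, keyword):
    """True if the page's file name is the hyphenated keyword plus .html."""
    file_name = urlparse(url).path.rsplit("/", 1)[-1].lower()
    return file_name == "-".join(keyword.lower().split()) + ".html"


url = "https://www.example.com/dui-attorney-phoenix/dui-attorney.html"
print(domain_name_length(url), keyword_in_url_path(url, "dui attorney"),
      keyword_file_name(url, "DUI attorney"))
```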

    Page Size

    Strangely, the number of words and number of sentences in the page content correlated at 50%, i.e. completely random... not a factor. But the kilobyte size of the page did trend 30% better than random. It appears you want to be right around 150 KB in page size. Weird.
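For reference, the three page-size measures mentioned above could be computed along these lines. The definitions are assumptions: size is the UTF-8 byte length of the raw HTML divided by 1024, a word is a run of word characters, and a sentence is a run of text ending in `.`, `!` or `?`. The post doesn't say how its software defined them, and the function names are hypothetical.

```python
import re


def page_size_kb(html):
    """Raw HTML size in kilobytes (UTF-8 bytes / 1024)."""
    return len(html.encode("utf-8")) / 1024


def word_count(text):
    """Number of runs of word characters."""
    return len(re.findall(r"\w+", text))


def sentence_count(text):
    """Rough count of text runs that end in '.', '!' or '?'."""
    return len(re.findall(r"[^.!?]+[.!?]", text))


text = "Arrested for DUI in Phoenix? Call now! We offer a free consultation."
print(word_count(text), sentence_count(text), round(page_size_kb("<p>" + text + "</p>"), 3))
```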

    Emphasis

    Strong tags, bold tags, and em tags failed to correlate, even though all three had values in the data. It makes me wonder if overusing these hurts your rankings. If so, it would be one of the few things I've ever measured that actively hurts rankings. The one outlier was italic tags, which did trend as a factor 42% better than random. That's huge. I can't help but bring up my Bayesian theory again here; it could explain it.

    Wrapping Up

    As promised, below is a link to a zip archive containing all the data, since I hit the attachment limit for this post. In the archive there is a spreadsheet containing the 200 measurements for each result of the 25 search terms, plus an aggregate.xlsx file that contains the correlations across the whole sample set. There is A LOT more to discover in the data, and I encourage you to take a closer look.

    Let me know what you think!

    Thanks!

    [Image: combined charts and data tables]

    All the data in a 6MB zip file: https://drive.google.com/file/d/0B9Xgfy4uuUsrdDN5b0xHV2F1WGs/view?usp=sharing
     

    Last edited: Sep 9, 2015
  2. neverquitting

    neverquitting · Regular Member · Joined: Nov 10, 2014 · Messages: 410 · Likes Received: 327 · Location: United States
    It's obvious there's a lot of hard work put into this. I'm gonna have to take a look at that zip file soon. When you say some of these things trend better than random, what exactly do you mean? What does a random amount of italicizing mean? What does random KW density mean?
     
  3. validseo

    validseo · Jr. VIP, Premium Member · Joined: Jul 17, 2013 · Messages: 910 · Likes Received: 527 · Occupation: Professional SEO · Location: Seattle, WA
    If you read the part about my correlation near the top, I go into this. I look at the top ten results for each search in the sample set. For each of the 200 factors, I calculate the average of the first five results and the average of the second five results. The correlation % is the percentage of the time the two averages trended in favor of the factor across the whole sample set.

    A correlation of 50% means that half of the time the factor trended with ranking position and half of the time it didn't: the flip of a coin, or a random outcome. A correlation of 75% is the same thing as "trending 25% better than random". It tells you how strongly or weakly a factor appears to matter in terms of rankings across all the searches.

    Not random italicizing. I was stating how strongly using keywords in I tags correlated with rankings.

    I don't think I said that... I searched the page for it and can't find it in my post.
     
    Last edited: Sep 9, 2015
  4. neverquitting

    neverquitting · Regular Member · Joined: Nov 10, 2014 · Messages: 410 · Likes Received: 327 · Location: United States
    Like I said, I'll have to look through your data. I have no idea how you'd have a large enough sample size or a meaningfully representative sample to draw these types of conclusions.


    You're misunderstanding me, and I think it stems from what you mean by better/worse than random. Saying "more italicization is better" isn't very helpful, because surely there's an upper limit. Same with KW density. I also don't think you're correctly calculating correlation in the statistical sense, but again, I'll know with more certainty when I look at the data.

    Thanks for your work.
     
  5. validseo

    validseo · Jr. VIP, Premium Member · Joined: Jul 17, 2013 · Messages: 910 · Likes Received: 527 · Occupation: Professional SEO · Location: Seattle, WA
    No worries... I am just trying to clarify.

    Actually large samples cause a lot of problems. Google handles different kinds of queries differently. You don't want to cross those boundaries in this kind of experiment.

    When you see the searches I used and read about why I chose them, it should help put things into perspective.

    I can draw these conclusions because they are empirical measures of how keywords are being used in a very specific niche. If content tuning is real then these empirical measures are how you reach competitive parity with your content tuning in that niche.

    I honestly don't care what is or isn't a factor. I just like getting more and better data that pushes them more one way or the other.


    Sure, there are upper limits... for I tags with keyword matches it was 189 times on a page. For KW stuffing it was 3,319 times in result number 6 for one of the searches.

    I think this approach is very helpful and very actionable. If you are on page 2 wondering why you're not on page 1, then you have to measure how you're different and make changes.

    But yes, you have to cherry-pick what is reasonable along with what is worth the time and risk.

    For my uses I am, but I agree it is subjective, which is why I detailed how I calculated it. All the raw measures are in the spreadsheets, so you can calculate it any way you like. :) (Note: if you subtract 50% from my correlation percentages, you get both strength and direction... which would be akin to the correlation coefficient you are probably looking for.)

    Thank you for your thoughts. I hope you share more after you dig deeper.
     
    Last edited: Sep 9, 2015
  6. validseo

    validseo · Jr. VIP, Premium Member · Joined: Jul 17, 2013 · Messages: 910 · Likes Received: 527 · Occupation: Professional SEO · Location: Seattle, WA
    UPDATE: I'm adding a proper Spearman's rank correlation coefficient to the software, so future runs will have academic value... There are something like 50 different correlation coefficients, so it is still going to be a big argument everywhere I go, and I still think my "% better than random" is far more practical to use.

    see https://statistics.laerd.com/statis...ank-order-correlation-statistical-guide-2.php

    But your point is taken and appreciated.
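For readers who want the conventional statistic, here is a minimal, dependency-free sketch of Spearman's rank correlation: rank both variables (tied values share the average of their ranks) and take the Pearson correlation of those ranks. It is not the author's implementation, and the example values are hypothetical. When correlating a factor with Google position, keep in mind that position 1 is best, so a factor that is higher for better-ranked pages produces a negative rho.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # average of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # a constant column has no rank order
    return cov / (sd_x * sd_y)


# Hypothetical: keyword matches in <i> tags for ranks 1-10 of one search.
positions = list(range(1, 11))
italic_matches = [85, 60, 72, 40, 45, 30, 12, 20, 5, 8]
print(round(spearman_rho(positions, italic_matches), 3))
```

Unlike the "% better than random" figure, rho uses every position rather than only the top-5 and bottom-5 averages, so the two numbers are not directly comparable.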
     
  7. mindmaster

    mindmaster · Jr. VIP · Joined: Sep 16, 2010 · Messages: 2,765 · Likes Received: 1,240
    Wayyy too much information.

    Thanks for sharing.
     
  8. validseo

    validseo · Jr. VIP, Premium Member · Joined: Jul 17, 2013 · Messages: 910 · Likes Received: 527 · Occupation: Professional SEO · Location: Seattle, WA
    I get that a lot. But it's actually most likely in the range of not enough to just enough. I do get that it is way too technical. I do miss what this forum used to be. :)
     
  9. wizard04

    wizard04 · Elite Member · Joined: Apr 1, 2014 · Messages: 2,700 · Likes Received: 2,538 · Location: Outside your house
    There is no such thing as way too much information; all information and data related to what we do is welcome.
     
  10. Asif WILSON Khan

    Asif WILSON Khan · Executive VIP, Jr. VIP · Joined: Nov 10, 2012 · Messages: 12,156 · Likes Received: 33,711 · Gender: Male · Occupation: Fun Lovin' Criminal · Location: London
    I am reserving judgement until I have looked at the data, but looks like it might be interesting.
     
  11. myopic1

    myopic1 · Regular Member · Joined: Mar 24, 2014 · Messages: 408 · Likes Received: 404
    Ridiculous statement. It's posts like this that make this forum great. I appreciate the effort and am looking forward to picking through your data.
     
    Last edited: Sep 10, 2015
  12. spappa

    spappa · Regular Member · Joined: Oct 21, 2014 · Messages: 287 · Likes Received: 47
    Great work. I love it when someone actually does some real analysis. I do have a couple of questions for you.

    Number of matches in web page I tags: the average for #1 is 85.
    What exactly is the 85? Is it 85 <i> tags, or is it 85 characters within <i> tags?

    With number of matches - H1 to H6 tags, the average for #1 is 33. Do these pages really have 33 headings (<h> tags) on average? That seems really high.
     
  13. myopic1

    myopic1 · Regular Member · Joined: Mar 24, 2014 · Messages: 408 · Likes Received: 404
    What's the Y-axis on the number of matches in webpage HTML source?
     
  14. phirex

    phirex · Power Member · Joined: Nov 17, 2009 · Messages: 515 · Likes Received: 259
    Thanks a lot!
    It's been so long since I read anything useful here!
    Keep up the good work, and please put me on your list of potential clients for the tool as I will be more than happy to pay you for this.
    Thanks again!
     
  15. richinca

    richinca · Registered Member · Joined: Feb 3, 2010 · Messages: 66 · Likes Received: 7
    First of all, thank you. We can't know the algorithm, but we can see how it reacts to different things. Really an excellent summary.

    I also believe that G looks at things dynamically (how things change over time) wherever possible.
     
  16. metafser

    metafser · Regular Member · Joined: Jul 20, 2014 · Messages: 449 · Likes Received: 268 · Gender: Male · Occupation: Digital Marketing Influencer
    Interesting experiment. I like your "Emphasis" point. The true word is "over-optimizing". Most people are doing this. Worth sharing, mate. :)
     
  17. validseo

    validseo · Jr. VIP, Premium Member · Joined: Jul 17, 2013 · Messages: 910 · Likes Received: 527 · Occupation: Professional SEO · Location: Seattle, WA
    The keyword was used 85 times in <i> tags.

    Yes... but keep in mind the niche I analyzed... At a CPC of around $150 per click, it is an arms race to out-optimize your competitors... If I analyzed a niche like "used cars," where the CPC is $20, the degree of tuning needed would be much less. That is a big reason why you don't use a sample size of 1,000 search terms: you would be blending niches, which would be suboptimal for competing in any one niche.
     
  18. validseo

    validseo · Jr. VIP, Premium Member · Joined: Jul 17, 2013 · Messages: 910 · Likes Received: 527 · Occupation: Professional SEO · Location: Seattle, WA
    The x-axis is the rank position in Google, and the y-axis is the measured value of the factor across all searches in the sample.
     
  19. mindmaster

    mindmaster · Jr. VIP · Joined: Sep 16, 2010 · Messages: 2,765 · Likes Received: 1,240
    What's ridiculous is that you can't spot a joke.
     
  20. SEO FOX

    SEO FOX · Jr. VIP · Joined: Apr 27, 2015 · Messages: 3,529 · Likes Received: 726 · Gender: Male · Location: Infront Of U!!
    Awesome experiment, very interesting share, buddy. A lot of hard work went into this. Cheers.