
[DATA] - W3 Validation and SERP results

Discussion in 'White Hat SEO' started by Fwiffo, Oct 13, 2010.

  1. Fwiffo

    Fwiffo Power Member

    Joined:
    Apr 7, 2010
    Messages:
    562
    Likes Received:
    325
    Occupation:
    Starship Captain
    Location:
    Pluto / Spathiwa
    Thought I'd share some data from a recent experiment.

    No "smoking gun" or anything earthshaking / sensational here, but I know there are a few people here who enjoy data like this, so I thought I'd share some of mine.

    I wanted to see whether there is any correlation between SERP results and W3 validation - i.e. does Google have a bias toward pages that pass W3 validation? Is there a "maximum error" threshold a page must stay under to rank at a certain level? Is there any correlation at all between W3 validation results and SERP rankings?

    While W3 validation is likely too expensive to run for on-the-spot SERP rankings, it could be part of a "deeper" scan analyzed in a separate process.

    I took two very competitive terms, checked W3 validation for the top 20 results, and compared them with the results at #81-100.
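    For anyone wanting to replicate the collection step, here's a rough Python sketch of counting errors and warnings from the W3C validator's JSON output. This assumes the modern Nu checker's payload shape (a `messages` list where errors have `type: "error"` and warnings are `type: "info"` with `subType: "warning"`); the original run here was done by hand, so treat the endpoint and format as assumptions to verify.

```python
def count_messages(validator_json):
    """Return (errors, warnings) from a Nu-validator-style JSON payload.
    Assumed format: errors have type 'error'; warnings are
    type 'info' with subType 'warning'."""
    msgs = validator_json.get("messages", [])
    errors = sum(1 for m in msgs if m.get("type") == "error")
    warnings = sum(1 for m in msgs
                   if m.get("type") == "info" and m.get("subType") == "warning")
    return errors, warnings

# Hypothetical sample payload in the assumed format:
sample = {"messages": [
    {"type": "error", "message": "Element br not allowed here."},
    {"type": "info", "subType": "warning", "message": "Obsolete attribute."},
    {"type": "error", "message": "Unclosed element div."},
]}
print(count_messages(sample))  # (2, 1)
```

    Fetching the payload itself would just be a GET against the checker with `out=json` and the page URL; the counting above is the part that maps onto the tables below.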

    My personal notes after reading the data:

    - To answer the original questions: no correlation or identifier that I can see at the moment
    - The results from 81-100 had fewer "extreme" W3 error counts than the top 20 had
    - The two sets of 81-100 rankings had only 1 site with more than 110 W3 errors
    - Two sets of 1-20 rankings had 4 sites with more than 110 W3 errors
    - No site had more than 1000 W3 errors (may be a threshold?)

    One thing I gained from this: the higher counts of "extreme" W3 errors among the top 20 pages may show that there is "more going on" on those pages, and that may indicate that Google prefers pages with "more going on". Not enough data to conclude this, but it opens a few questions for other tests later on.

    So why am I posting this?
    - others have posted their results from different experiments, I appreciate what they have done, and this is my way of "giving back" (even without "groundbreaking" data)
    - someone else may find other use for this data to help their projects
    - someone may see something here that I've missed, and point it out
    - this might trigger an idea for someone else to post similar data
    - to help people decide where to allocate their time

    Keep in mind - there are tons of ways this data could be flawed. These are the results of one search, from one IP, at one moment in time (this evening). My data could have been pulled from a "test mode" in Google, and the SERP results may be affected by my personal search history.

    Here is my data:

    search term "online poker"

    SERP Ranking / W3 Errors / W3 Warnings
    1/642/89
    2/passed/
    3/passed/
    4/6/6
    5/66/8
    6/2/
    7/177/318
    8/34/23
    9/11/
    10/11/
    11/928/51
    12/passed/
    13/36/4
    14/4/10
    15/14/5
    16/can't be checked/
    17/75/598
    18/96/12
    19/1/
    20/passed/

    81/can't be checked/
    82/32/
    83/can't be checked/
    84/18/8
    85/20/21
    86/47/42
    87/7/
    88/83/45
    89/37/11
    90/12/2
    91/45/
    92/38/
    93/69/5
    94/23/24
    95/36/
    96/12/1
    97/4/3
    98/8/84
    99/53/6
    100/473/138

    search term "new york dentist"

    SERP Ranking / W3 errors / W3 Warnings

    1/415/14
    2/10/
    3/43/3
    4/329/137
    5/25/21
    6/411/5
    7/22/8
    8/passed/
    9/55/54
    10/80/11
    11/1/
    12/43/7
    13/46/4
    14/434/13
    15/11/1
    16/passed/
    17/passed/
    18/70/2
    19/44/
    20/5/

    81/8/11
    82/7/
    83/107/3
    84/102/11
    85/9/7
    86/14/4
    87/3/
    88/76/42
    89/38/15
    90/51/2
    91/54/28
    92/can't be checked/
    93/15/14
    94/23/
    95/2/4
    96/50/
    97/171/32
    98/40/4
    99/17/56
    100/109/28
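    For anyone who wants to run stats over the tables above, the rows are easy to turn into structured data. A minimal Python sketch (treating "passed" as 0 errors and marking rows that can't be checked as missing):

```python
def parse_row(line):
    """Parse a 'rank/errors/warnings' row from the tables above.
    'passed' counts as 0 errors/0 warnings; "can't be checked"
    becomes None so those rows can be excluded from stats."""
    parts = line.split("/")
    rank = int(parts[0])
    if parts[1] == "passed":
        return rank, 0, 0
    if parts[1] == "can't be checked":
        return rank, None, None
    errors = int(parts[1])
    warnings = int(parts[2]) if len(parts) > 2 and parts[2] else 0
    return rank, errors, warnings

rows = [parse_row(r) for r in ["1/642/89", "2/passed/", "6/2/",
                               "16/can't be checked/"]]
checked = [(r, e, w) for r, e, w in rows if e is not None]
extreme = sum(1 for _, e, _ in checked if e > 110)
print(rows)
print("sites with >110 errors:", extreme)
```

    With all 80 rows fed in, the same counting reproduces the ">110 errors" comparison from the notes above.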
     
    • Thanks Thanks x 2
  2. ipopbb

    ipopbb Power Member

    Joined:
    Feb 24, 2008
    Messages:
    626
    Likes Received:
    844
    Occupation:
    SEO & Innovative Programming
    Location:
    Seattle
    Home Page:
    Several things are interesting in your data...

    I find the error tolerance at #1 matches my observation of increased exploitation of the #1 spot since Caffeine rolled out.

    I like your concept of error and warning counts, but let me suggest a test that might map onto Google's costs more directly and thus would be a more likely factor.

    Same test as above, but calculate the benchmark time in milliseconds for parsing the source into a DOM object. Not the network time - just a measurement of source-code efficiency as a function of DOM parsing time. I suspect that if there is a factor in this area, it would be something that projects out well onto cloud computing costs.
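    A minimal sketch of that benchmark, using Python's stdlib `html.parser` as a stand-in for a real DOM builder (the parser here just walks tags and text rather than constructing a tree, so take it as a relative analog only, which is all the proposal calls for):

```python
import time
from html.parser import HTMLParser

class TagWalker(HTMLParser):
    """Walks every tag and text node - a cheap stand-in for DOM building."""
    def handle_starttag(self, tag, attrs):
        pass
    def handle_data(self, data):
        pass

def parse_benchmark_ms(source, runs=50):
    """Median wall-clock time in ms to parse `source`; network excluded."""
    times = []
    for _ in range(runs):
        parser = TagWalker()
        t0 = time.perf_counter()
        parser.feed(source)
        parser.close()
        times.append((time.perf_counter() - t0) * 1000)
    return sorted(times)[len(times) // 2]

page = "<html><body>" + "<p>hello</p>" * 200 + "</body></html>"
print(parse_benchmark_ms(page))
```

    Running this over each fetched page, instead of (or alongside) the validator counts, would give the per-result "parsing cost" column the suggestion describes.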

    Really great experiment. You've really got me thinking.


    Thanks,

    Ted
     
    • Thanks Thanks x 3
  3. shadowbox

    shadowbox Registered Member

    Joined:
    Apr 27, 2010
    Messages:
    58
    Likes Received:
    12
    Occupation:
    Freelance Web/Graphic Designer
    Location:
    California, USA
    I am actually going to continue your test in a more in-depth fashion. How current is this data?
     
    • Thanks Thanks x 1
  4. xgnux

    xgnux Regular Member

    Joined:
    Sep 26, 2008
    Messages:
    492
    Likes Received:
    149
    Occupation:
    STudent
    Location:
    Germany
    lol, you can't test it this way. You need to create as equal an environment as you can - no backlinks and such, and at least 10 test sites.
     
  5. ipopbb

    ipopbb Power Member

    Joined:
    Feb 24, 2008
    Messages:
    626
    Likes Received:
    844
    Occupation:
    SEO & Innovative Programming
    Location:
    Seattle
    Home Page:
    Maybe, or maybe not. If you took the test prototype and expanded it to the full top 100 for, say, 3,000 different high-volume searches, and then looked for trending by calculating the min, max, and average errors and warnings per result position...

    then, if it is a factor (or related to a factor), the min/max/avg lines on the chart should approach a limit as you move from result 100 to result 1 on the X axis.

    If it is not a factor, then all 3,000 searches should produce a non-trending plot from one search to the next.
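    The aggregation step being described could be sketched like this - random numbers stand in for real crawl data, and 30 searches instead of 3,000, purely to show the shape of the reduction:

```python
import random
import statistics

random.seed(0)
positions = range(1, 101)

# 30 hypothetical searches: each maps SERP position -> error count.
# In the real test these would come from crawled + validated pages.
searches = [{pos: random.randint(0, 200) for pos in positions}
            for _ in range(30)]

# Reduce to (min, max, avg) errors per position across all searches.
summary = {}
for pos in positions:
    vals = [s[pos] for s in searches]
    summary[pos] = (min(vals), max(vals), statistics.mean(vals))

# If validation errors were a factor, the avg line should drift
# toward a limit as pos -> 1; with random data it stays flat.
print(summary[1], summary[100])
```

    Plotting the three lines of `summary` over position is the chart the post describes; a visible slope approaching result 1 would be the signal.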

    This kind of approach works fairly well for prominent factors... It does not work well for weak factors, tie-breaker conditions, and sparse factors.

    It also points out special exceptions you might not be aware of, because the outliers are usually getting special rule-breaking treatment...

    Of course, you have to start somewhere, and a simple experiment prototype is a great way to test the water before investing heavily.
     
    • Thanks Thanks x 1
  6. Fwiffo

    Fwiffo Power Member

    Joined:
    Apr 7, 2010
    Messages:
    562
    Likes Received:
    325
    Occupation:
    Starship Captain
    Location:
    Pluto / Spathiwa
    thanks ipopbb. Will look into your suggestions and post if I run the data again (not a priority at the moment - a few months out). I think you could be on the mark with DOM parsing time; however, do you mean that just as an indicator of how Google caches website data (as it's likely a similar format), or are you aware of them using the DOM format specifically?

    Another thing I thought of for a test like this, if one could develop it on a larger scale: could W3 validity (or parsing time) be used as a metric to qualify the quality of a backlink?

    One could build a fairly simple metric that says sites that take a long time to load, or load with lots of errors, are of lower quality and therefore make "lower-quality backlinks".

    While that would take way more horsepower to figure out, it might give a different picture of link quality than comparing backlinks to PR.

    appreciated! please do

    data was taken last night.

    Note the semi-disclaimer towards the end of the post - there's so much that could be wrong with how the data was obtained and measured that it's not a full experiment; however, it's a start for anyone looking to replicate it further.

    Another thing to add: I was validating against strict XHTML requirements - so, for example, putting in
    Code:
    <br> instead of <br/>
    would result in an error under the "strict" guidelines (or is it a warning? I forget...).

    For all I know, this could be repeated under non-strict with very different results.

    Perhaps, and that may give better results; however, creating a "sterile" environment for testing would probably take me more time than just bringing all my sites up to W3 standards.

    Even then, looking at how Google ranks sites "in the wild" can potentially tell quite a bit - repeating the same thing over and over (much more than I did) can potentially tell more than even a sterile test.

    Keep in mind a test like this could also show a false positive: webmasters who SEO well may also be nitpicky about code, so it could show a correlation where no causation exists. But that's just part of putting data together.
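    To put a number on "no correlation", one could compute a rank correlation between SERP position and error count. A rough Python sketch, using the "online poker" top-20 figures from above ("passed" counted as 0 errors, the unverifiable #16 dropped); note this simplified ranking doesn't average tied values, so it's an approximation of proper Spearman:

```python
def spearman(xs, ys):
    """Simplified Spearman rank correlation (ties are not averaged)."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0.0] * len(values)
        for rank, i in enumerate(order, 1):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# "online poker" top 20 from the data above (rank 16 dropped).
positions = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
             11, 12, 13, 14, 15, 17, 18, 19, 20]
errors = [642, 0, 0, 6, 66, 2, 177, 34, 11, 11,
          928, 0, 36, 4, 14, 75, 96, 1, 0]
print(round(spearman(positions, errors), 3))
```

    A coefficient near 0 would back up the "no correlation" reading; even a strong value would, as noted, only show correlation, not causation.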
     
  7. ipopbb

    ipopbb Power Member

    Joined:
    Feb 24, 2008
    Messages:
    626
    Likes Received:
    844
    Occupation:
    SEO & Innovative Programming
    Location:
    Seattle
    Home Page:
    No, nothing like that... just that DOM parsing time would be a decent relative analog for whatever parsing Google is doing on the source. The more complicated the source is to parse, the more resources and time it takes Google to analyze it. So if the DOM parse benchmark is faster than most, Google can do more of them in less time. Multiplied out across the whole internet, it just makes sense that Google would eventually reward parsing efficiency to get the world to help keep their costs down. It would probably affect things like power consumption for them too... it could be the SEO way of "going green".



    I'd have to know one link's quality versus another's to start looking for trends in possible factors... I only have PageRank and ranking numbers from SERPs; those I can measure. I haven't heard of a link quality score coming out of Google. I'm open to the idea - I just don't know how to get at that kind of data.


    It's a decent thought experiment, but if it affects placements in a meaningful way, then the individual metrics (errors, warnings, PR, load time, parse time, number of letter e's in the source, etc.) should simply trend as you approach number 1 on page 1.
     
    • Thanks Thanks x 1
  8. Fwiffo

    Fwiffo Power Member

    Joined:
    Apr 7, 2010
    Messages:
    562
    Likes Received:
    325
    Occupation:
    Starship Captain
    Location:
    Pluto / Spathiwa
    Looking back on that idea, I think it was mostly a brain fart.

    I think I meant something along the lines of building a kind of "quality" filter for backlinks, consisting of a combination of page PR, domain PR, page load speed (of the backlink), number of links pointing to the link, anchor text, and title text.

    If one set it up across a very, very large data set, it could be mined to determine whether Google judges the "value" of a link in its rankings beyond simple PR - i.e. see if the "weight" of any part of the data can be matched by layering it over ranking data. Some factors might be 0, others very high; that would tell someone what to look for when building backlinks.
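    The shape of that filter could be sketched as a weighted sum over normalized features. The weights below are pure guesses, and every feature name is hypothetical - the whole point of the proposed mining step would be to learn what the real weights are:

```python
# Hypothetical link-quality model: weights are illustrative guesses,
# not learned values. All features are assumed normalized to 0..1.
WEIGHTS = {
    "page_pr": 0.30,
    "domain_pr": 0.20,
    "load_speed": 0.15,     # 1.0 = fast, 0.0 = slow
    "inbound_links": 0.15,  # e.g. log-scaled count, normalized
    "anchor_match": 0.10,   # anchor-text relevance to target
    "title_match": 0.10,    # title-text relevance to target
}

def link_quality(features):
    """Weighted sum of normalized link features; missing features = 0."""
    return sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)

link = {"page_pr": 0.5, "domain_pr": 0.7, "load_speed": 0.9,
        "inbound_links": 0.4, "anchor_match": 1.0, "title_match": 0.6}
print(round(link_quality(link), 3))  # 0.645
```

    Fitting such weights against ranking movements over a large backlink data set is the "mining" half of the idea - a factor whose learned weight comes out near 0 would be one Google likely ignores.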

    Once again, mostly a brain fart - that's one heck of a big project, both to set up and to data-mine.