PENGUIN UPDATE: 5 NEW Factors that Google Uses to Rank Websites

PartyNeon
I manage an agency, so I do everything I can to get an edge on the algorithm. A few other SEOs and I have analyzed tons of data to figure out the breakdown of Penguin. We put a lot of time into this research, and I would love to hear your input! (For those of you who like stats: we correlated these factors using Spearman rank correlation and plotted the results against a normal distribution with an alpha level of .05, i.e. 95% confidence.)

5 New Post-Penguin Factors:
  1. Anchor Text Diversity (Google divides your number of links by the total number of different anchor texts used to link to your site)
  2. The ratio of the number of links to deep pages vs your home page
  3. The composition of your backlinks' TLDs
  4. Average number of characters per word and H2 breaks (5.1 characters per word is ideal, with one H2 per 150 words)
  5. Time on Site (the time it takes a user to return to Google, a.k.a. Dwell Time)



The New Factors - Broken Down

Factor 1: Anchor Text Variety
Google takes your total number of referring domains and divides it by the total number of different anchor texts that people use to link to your website. BUILDING LINKS TO BRANDED KEYWORDS IS NOT ENOUGH. The more links you have, the larger your anchor text variety should be. If you have 100 links, but those 100 websites link to you using only 5 different anchor texts... that is unnatural EVEN IF most of those links are branded keywords, like your site name.
Hint: Build anchor text with these types of phrases: "Visit the site", "click here", "find out more information", "on their website", etc...
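For anyone who wants to check their own profile, here is a minimal Python sketch of the ratio described above. It assumes you can export your backlinks as (referring domain, anchor text) pairs. The function name, the sample data and the extra "top anchor share" figure are illustrative assumptions, not the exact calculation Google uses.

```python
from collections import Counter

def anchor_diversity(backlinks):
    """backlinks: list of (referring_domain, anchor_text) pairs (hypothetical export format)."""
    if not backlinks:
        raise ValueError("no backlinks given")
    # Count each referring domain once, ignoring case.
    domains = {domain.lower() for domain, _ in backlinks}
    # Count how often each distinct anchor text is used.
    anchors = Counter(anchor.strip().lower() for _, anchor in backlinks)
    return {
        "referring_domains": len(domains),
        "distinct_anchors": len(anchors),
        # The ratio from the post: referring domains per distinct anchor text.
        "domains_per_anchor": len(domains) / len(anchors),
        # Share of all links that use the single most common anchor.
        "top_anchor_share": anchors.most_common(1)[0][1] / len(backlinks),
    }

# Made-up example: 100 links from 100 domains using only 5 anchor texts.
sample = [(f"site{i}.com", f"anchor {i % 5}") for i in range(100)]
stats = anchor_diversity(sample)
print(stats)
```

A lower "domains_per_anchor" value means more variety. The post does not give a target number, so treat the output as something to compare between sites rather than a pass/fail score.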

Factor 2: Ratio of Links to Deep Pages
Here is a sample breakdown of what the backlink composition should look like on a website with 100 indexed pages and 120 links.
[TABLE="width: 400"]
[TR]
[TD]Page[/TD]
[TD]Ideal # Of Links[/TD]
[/TR]
[TR]
[TD]Homepage[/TD]
[TD]20 links[/TD]
[/TR]
[TR]
[TD]Popular Page[/TD]
[TD]10 links[/TD]
[/TR]
[TR]
[TD]Pages 1-10[/TD]
[TD]5 links each[/TD]
[/TR]
[TR]
[TD]Pages 11-20[/TD]
[TD]2 links each[/TD]
[/TR]
[TR]
[TD]Pages 21-40[/TD]
[TD]1 link each[/TD]
[/TR]
[TR]
[TD]Pages 41-100[/TD]
[TD]No links[/TD]
[/TR]
[/TABLE]
This is just an example and by no means actual data to rank well. What I am trying to illustrate is this: don't build 1 link to every subpage. CHANGE IT UP. Build a ton of links to one subpage, then a decent number of links to a few more subpages, but leave half of your pages without any links.
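The sample table can be turned into a quick check of how links are spread across a site. This is a rough Python sketch. It assumes you have a links-per-page count for every indexed URL, and it reads the page ranges in the table as 1-10, 11-20, 21-40 and 41-100. The function name and URL paths are made up for illustration.

```python
def link_distribution(links_per_page, homepage="/"):
    """links_per_page: dict mapping page path -> number of inbound links."""
    total = sum(links_per_page.values())
    if total == 0:
        raise ValueError("no links in profile")
    home = links_per_page.get(homepage, 0)
    unlinked = sum(1 for n in links_per_page.values() if n == 0)
    return {
        "total_links": total,
        "homepage_share": home / total,
        "deep_share": (total - home) / total,
        "unlinked_page_share": unlinked / len(links_per_page),
    }

# Rebuild the sample table with hypothetical paths.
pages = {"/": 20, "/popular": 10}
pages.update({f"/page-{i}": 5 for i in range(1, 11)})    # pages 1-10
pages.update({f"/page-{i}": 2 for i in range(11, 21)})   # pages 11-20
pages.update({f"/page-{i}": 1 for i in range(21, 41)})   # pages 21-40
pages.update({f"/page-{i}": 0 for i in range(41, 101)})  # pages 41-100

profile = link_distribution(pages)
print(profile)
```

With the table's numbers, the homepage gets one sixth of the 120 links and more than half of the pages have none, which matches the "change it up" advice.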

Factor 3: The Composition of the TLDs of your Backlinks
If 40% of your backlinks are from sites ending in .info, this is unnatural. On the other hand, if 100% of your backlinks are from sites ending in ".com", this is ALSO unnatural. In order to see top results, make sure your backlink profile follows the structure below.
[TABLE="width: 400"]
[TR]
[TD]TLD[/TD]
[TD]Ideal % Of Links[/TD]
[/TR]
[TR]
[TD].com[/TD]
[TD]60%[/TD]
[/TR]
[TR]
[TD].net[/TD]
[TD]10%[/TD]
[/TR]
[TR]
[TD]country specific[/TD]
[TD]10%[/TD]
[/TR]
[TR]
[TD].org[/TD]
[TD]5%[/TD]
[/TR]
[TR]
[TD].co, .info, .biz[/TD]
[TD]5%[/TD]
[/TR]
[TR]
[TD]Other[/TD]
[TD]10%[/TD]
[/TR]
[/TABLE]
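To compare a real backlink list against the table, something like the following Python sketch could work. It assumes each backlink is a full URL including the scheme, takes the last label of the hostname as the TLD, and treats any two-letter TLD other than .co as country specific. The bucket names and sample URLs are illustrative assumptions.

```python
from collections import Counter
from urllib.parse import urlparse

# Target percentages copied from the table above.
TARGET = {".com": 60, ".net": 10, "country": 10,
          ".org": 5, ".co/.info/.biz": 5, "other": 10}

def tld_bucket(url):
    """Map a backlink URL to one of the table's TLD buckets."""
    host = urlparse(url).hostname or ""   # hostname is None if the scheme is missing
    tld = host.rsplit(".", 1)[-1].lower()
    if tld in ("com", "net", "org"):
        return "." + tld
    if tld in ("co", "info", "biz"):
        return ".co/.info/.biz"
    if len(tld) == 2 and tld.isalpha():
        return "country"                  # e.g. .uk, .de (simplifying assumption)
    return "other"

def tld_composition(urls):
    """Return the percentage of backlinks falling into each bucket."""
    if not urls:
        raise ValueError("no backlinks given")
    counts = Counter(tld_bucket(u) for u in urls)
    return {bucket: 100 * counts.get(bucket, 0) / len(urls) for bucket in TARGET}

# Made-up backlink sample.
urls = ["http://a.com/x", "https://b.com", "http://c.net",
        "http://d.co.uk/page", "http://e.info"]
actual = tld_composition(urls)
for bucket, target in TARGET.items():
    print(f"{bucket:15} actual {actual[bucket]:5.1f}%  target {target}%")
```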

Factor 4: Average Character Length of Content
Spun articles tend to have either a significantly higher or significantly lower average word length (characters per word) than naturally written articles. In order to see top results, your average word should be 5.1 characters long.
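Both numbers from this factor (characters per word and words per H2) are easy to measure on your own pages. Here is a small Python sketch. It assumes the content is available as an HTML string and uses a very crude tag stripper and word pattern, so treat the results as approximate. The function name and sample HTML are made up.

```python
import re

def content_stats(html):
    """Rough average word length and words-per-H2 for an HTML fragment."""
    # Count opening <h2> tags before stripping the markup.
    h2_count = len(re.findall(r"<h2\b", html, flags=re.IGNORECASE))
    # Crude tag removal; a real HTML parser would be more robust.
    text = re.sub(r"<[^>]+>", " ", html)
    # Words are runs of ASCII letters (a simplifying assumption).
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("no words found")
    avg_len = sum(len(w) for w in words) / len(words)
    return {
        "words": len(words),
        "avg_word_length": avg_len,
        "words_per_h2": len(words) / h2_count if h2_count else None,
    }

sample_html = "<h2>Blue widgets</h2><p>Cheap blue widgets shipped fast</p>"
page = content_stats(sample_html)
print(page)
```

Compare "avg_word_length" against the 5.1 figure and "words_per_h2" against the one-H2-per-150-words figure from the list at the top.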

Factor 5: Time on Site
Not everyone uses Google Analytics. That being said, Google doesn't need you to use Analytics in order to get data about a user's time on site. Dwell time captures both bounce rate and time-on-site metrics: I believe Google measures how long it takes for someone to return to a SERP after clicking on a result. This is data they can EASILY get from their own SERP data. They have already admitted to using this data in quality tests!
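To make the definition concrete, here is a tiny Python sketch of dwell time as described above: the gap between a click on a result and the return to the SERP. The log format (click time, return time or None) is purely hypothetical. Nobody outside Google knows what their logs look like.

```python
from datetime import datetime, timedelta
from statistics import median

def dwell_seconds(clicked_at, returned_at):
    """Seconds between clicking a result and returning to the SERP (None if no return)."""
    if returned_at is None:
        return None
    delta = (returned_at - clicked_at).total_seconds()
    if delta < 0:
        raise ValueError("return time is before click time")
    return delta

def median_dwell(visits):
    """Median dwell time over the visits that actually returned to the SERP."""
    times = [d for d in (dwell_seconds(c, r) for c, r in visits) if d is not None]
    return median(times) if times else None

# Made-up visits: (click time, return-to-SERP time or None).
t0 = datetime(2012, 5, 1, 12, 0, 0)
visits = [
    (t0, t0 + timedelta(seconds=8)),    # quick bounce back to Google
    (t0, t0 + timedelta(seconds=95)),   # read for a while, then returned
    (t0, None),                         # never came back to the SERP
]
result = median_dwell(visits)
print(result)
```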

Summary:
There you have it. A brief overview of 5 new factors we have discovered since Penguin. This list is far from exhaustive, but these are the ones I figured were worth mentioning. We spend hours and hours each day trying to break down the exact effects of every possible ranking factor. If there is demand to see the actual math we used to determine this, PM me. I will send you our ACTUAL data and the true Spearman correlation and Z-score data that was aggregated to come to these conclusions. THIS IS NOT SPECULATION but highly analyzed and correlated testing.
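For readers who have not used Spearman before, here is a generic, self-contained Python sketch of the rank correlation itself. It is not our actual analysis pipeline, and the numbers in the example are made up purely to show the mechanics: rank both variables (ties get the average rank), then take the Pearson correlation of the ranks.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: a factor score per site and that site's position in the SERP.
factor_score = [1.2, 3.5, 2.0, 8.0, 5.5]
serp_position = [2, 5, 3, 10, 7]
rho = spearman(factor_score, serp_position)
print(round(rho, 3))
```

A value near +1 or -1 means the two rankings move together or in opposite directions. A value near 0 means no monotonic relationship.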

Take Home Points: Make sure your backlink profile looks natural, build a lot of "garbage" anchor texts to increase diversity, keep your average word length at 5.1 characters, build links to deep pages (using the clever strategy above), and do something to make people stay on your site longer.

(Disclaimer: correlation is not causation. Just because these factors appear to affect rankings doesn't mean they are actually in the algorithm.)
 
Excellent post, I appreciate the way you actually broke down each factor and explained it rather than just stating the 5 factors in bullet points with little information to back it up.
 
Thanks for the interesting data and for posting it here. The interesting thing (to me) is that the sites I've been helping that were hit by Penguin seem to have one thing in common: onsite "quality" issues such as over-optimization of keywords and canonical issues (creating duplicate content). One site had tons of 404s as a result of deleting a directory that was no longer relevant but had many, many individual pages in it.

Anchor text variety does not seem to be as much of an issue as some have experienced and reported.

I'm seeing some positive results on the sites that had onsite quality issues as they get fixed up. Sites that had few onsite quality issues but a high ratio of keyword anchor text are still doing well and were not hit by Penguin at all.

I have come to an early conclusion that Penguin is about a variety of things. Onsite SEO was a major factor, from my observations.
 
Excellent data.

Not only is it done scientifically (gotta love Spearman rank correlation), it also makes sense and is laid out nicely.
Above all, despite what some might say here, IT'S LOGICAL, both for us in SEO and for Google.
And it certainly doesn't rule out automation. It just rules out laziness and stupidity to some extent.

Thanks and rep given

Scritty
 
Nice post, well presented, thanks given.

I've been waiting for bounce rate to get big, and I think it's only a good thing! I was also waiting for a more complex anchor text algo, which you've made a lot clearer to me. I had no idea which way it would go. Thanks again!
 
nice post... I'm not sure about the number of characters though...
 
I am very interested to see the data that you used and calculations you performed to arrive at these conclusions.
 
Factor 5: Time on Site
Not everyone uses Google Analytics. That being said, Google doesn't need you to use Analytics in order to get data about a user's time on site. Dwell time captures both bounce rate and time-on-site metrics: I believe Google measures how long it takes for someone to return to a SERP after clicking on a result. This is data they can EASILY get from their own SERP data. They have already admitted to using this data in quality tests!

Now everyone should be aware of their bounce rate. The ideal bounce rate is less than 40%, right? Does this mean bounce rate now affects rankings?
 
What do you suggest setting up as the home page: a post or a page? Or does it not matter? Should I use the Yoast WordPress plugin?
 
Great post with lots of points that have been speculated on the forum for a while but this time backed up with evidence. Thanks and rep given.

Interesting what you say about characters though. The theory of trying to identify spun articles makes sense, but doing this with an average character number or range seems a bit flimsy to me and would potentially penalise a lot of people who are doing nothing wrong. I would have thought that they would do this with spin syntaxes instead. Google Translate works like this, so why not have it in the algorithm? It seems a better way to do it, and the resources are already there.

Definitely interested in getting the data, I will send you a PM now.
 
Thanks for putting in so much effort. It's very informative and helpful. Hope you will add more on the topic soon.
 
What are your findings on negative SEO? I'm still unsure if I should start ranking a brand new site. I'm afraid all my hard work and the money invested in the site will be wasted by someone spamming Scrapebox and XRumer links at my site.
 
Second informative post...in two days. Excellent going PartyNeon. You have drilled down the factors and it makes for easy understanding. But factor 4 is a little vague. How did you derive the 5.1 character average? And how does it relate to spun content?
 
Does anchor text diversity apply only to regular domain backlinks, or also to others such as Web 2.0, SB, etc.?
Let's say I buy 100 backlinks, do I need to have 100 anchor texts?
How do you do your anchor texts? Domain name, random words, site URL, ...?
 
This is the best post which I've read this month.
 