
The Real Story Behind the Google "Dance" & "SandBox"

Discussion in 'White Hat SEO' started by crazyflx, Jun 14, 2010.

  1. crazyflx

    crazyflx Elite Member

    Joined:
    Nov 9, 2009
    Messages:
    1,674
    Likes Received:
    4,825
    Location:
    http://CRAZYFLX.COM
    Home Page:
    I've been doing a lot of thinking about this lately, and I believe I have a pretty solid hypothesis on just what the Google Dance & Sandbox actually are and what their purpose is. Think of this thread as a place to agree or disagree with what I'm about to say...just make your opinion heard by saying something. That's the best way for ideas and theories to progress (well, testing actually is, but you get the idea ;) )

    Every search engine out there has an algorithm. Simply put, an algorithm takes the variables it is programmed to process and spits out a result. Data In = Data Out.

    I believe there are two things that can cause the downfall of any search engine and Google has figured out a solution to both of them (there are more than two, but I'm using just these two so as to keep this post short :) ).

    They are: a static algorithm that doesn't change (or doesn't change often enough), and results returned to the user based solely on what I'll refer to as "hard data". Hard data meaning data that, no matter who looks at it, can only be interpreted one way. In other words, if I were to ask what's 1+1, the only possible answer, no matter who looks at it, is 2.

    The first thing mentioned above (the static algorithm) can cause the downfall of a search engine because after a while, it will be figured out. Once it is figured out, it can be manipulated. Obviously, as soon as the SERPs can be manipulated, they WILL be manipulated and the user experience for that search engine goes to crap...POOF, nobody goes to that SE any more. Google obviously updates its algorithm on a regular, if not constant, basis. So, first thing taken care of.

    Now I think Google has evolved to a whole new level by taking care of the second variable I mentioned above...only returning results to the user based on "hard data"...the data fed into the algorithm.

    Simple example: If X site has X amount of backlinks, it gets X position. That is an algorithm that is based on "hard data". There is nothing to interpret. You'll always get the same answer.
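    Purely to illustrate the difference, here's a rough sketch of a "hard data" ranker (the signals and weights are made up by me, not anything Google has published): feed it the same inputs and it always spits out the same ordering, which is exactly why it can be reverse engineered and gamed.

    ```python
    # A toy "hard data" ranker: same inputs always give the same ordering.
    # The signals and weights here are invented for illustration only.

    def hard_score(backlinks: int, keyword_in_title: bool, domain_age_days: int) -> float:
        """Deterministic score from fixed on/off-page signals."""
        return (
            1.0 * backlinks
            + 50.0 * (1 if keyword_in_title else 0)
            + 0.05 * domain_age_days
        )

    sites = {
        "site-a.example": hard_score(backlinks=120, keyword_in_title=True, domain_age_days=900),
        "site-b.example": hard_score(backlinks=300, keyword_in_title=False, domain_age_days=30),
    }

    # Rank purely on the fixed formula: whoever reverse-engineers the weights wins.
    for pos, (site, score) in enumerate(sorted(sites.items(), key=lambda kv: -kv[1]), start=1):
        print(pos, site, round(score, 1))
    ```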

    I think Google has done something brilliant.

    I believe they have added what I'll now call "soft data" to their algorithm. What is soft data? Data that changes, data that can be interpreted more than one way, data that "flows" in one direction or another.

    Let me explain what I'm saying a little bit better and in a way any IM will understand.

    I think Google is split testing!

    They're adjusting how they rank their pages based on how users browse their results. But I think it goes much deeper than that. I think there is a distinct difference between the purpose of the Sandbox and the Dance.

    Google Dance: The site owner typically sees their site fluctuate a couple of positions on Google. Maybe it's jumping around from position 1 or 2 to position 7 or 8. Maybe it's jumping around from page 3 or 4 to page 1 or 2. Why is this happening? These changes can sometimes literally happen repeatedly in less than 12 hours. Nothing about a site changes THAT drastically in less than 12 hours, so no ordinary algorithm change could cause that much movement over and over and over again.

    What I think Google is doing is allowing their users to determine your search position!

    Obviously not entirely. They have their "hard data" in place that gives everything in Google its initial blueprint, but they leave the rest up to their users. This is normally referred to as "crowdsourcing", although with crowdsourcing the "crowd" usually knows up front that they are part of the process (like when Mountain Dew releases 4 different flavors of soda, says that after 3 months they will discontinue all but 1 flavor, and then asks you to go to their website and vote on your favorite. The one with the most votes stays and the rest are out. That is an example of crowdsourcing).

    So, Google puts your site in one position, then replaces it with somebody else's site. They keep track of the clickthrough rate when your site was in X position and when the other site was in that same position. After X amount of time, that "soft data" that was just provided to Google by the users of their search engine (i.e. clickthrough rates) is turned into variable Q, and variable Q is fed into the algorithm as a newly formed bit of "hard data" (along with the rest of the actual "hard data"). I think there is a lot more than just clickthrough rate though. I think it is time spent on site after the clickthrough, pages viewed, which pages were viewed, and where users go after they leave your site (if they use the back button and show back up at Google's SERPs, that is).
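    As a rough sketch of that split-test loop (everything here is hypothetical: the signals, the weights and the way "Q" gets computed are my own invention, not Google's actual method), you could picture the engine rotating two pages through the same slot, logging behaviour, and folding the winner's engagement back in as a new variable:

    ```python
    # Hypothetical engagement logs gathered while each page occupied position 3
    # for the same query: (clicked?, seconds on site, returned to the SERP?).
    logs = {
        "your-site.example":  [(True, 95, False), (True, 140, False), (False, 0, True)],
        "other-site.example": [(True, 12, True),  (False, 0, True),   (True, 8, True)],
    }

    def engagement(samples):
        """Turn raw 'soft data' into a single number (the 'variable Q' in the post)."""
        clicks = sum(1 for clicked, _, _ in samples if clicked)
        ctr = clicks / len(samples)
        avg_dwell = sum(secs for _, secs, _ in samples) / max(clicks, 1)
        pogo = sum(1 for _, _, back in samples if back) / len(samples)  # bounced back to the SERP
        return ctr * 0.4 + min(avg_dwell / 300, 1.0) * 0.4 + (1 - pogo) * 0.2

    # The winner of the rotation gets its "Q" folded into the next ranking pass.
    q = {site: engagement(samples) for site, samples in logs.items()}
    print(max(q, key=q.get), q)
    ```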

    Think about this. What happens when a user shows up at an MFA site? More importantly, a REALLY poorly set up MFA site. One that is designed to get the user to leave the page as soon as possible via one of their AdSense ads? That user leaves! Where do these sites typically end up? The Google trash bin. You see, you just simply can't fake the kind of data that a real, quality site would generate.

    You can only get people to click your link in Google more often with an appropriately fitting title. Once they get to your site, you can only keep them there with real, genuine content. Once they stay at your site, you can only get them to come back (and not hit the back button for somebody else's results) with content that makes them want to come back. I believe Google has added some, all, or even more of these bits of "soft data" to their algorithm.

    That is (in a nutshell) what I think the Google Dance is for. They are split testing / crowdsourcing as part of how they put together their SERPs. THIS COULD NEVER BE GAMED! (I mean, it could, but it would be very, very, very difficult.) It would also be always changing. What the "crowd" deems the most relevant one week could easily be dumped as ridiculous the next.

    Google SandBox:
    This is typically a much more drastic change than the aforementioned "dance". What a site owner normally experiences here is what appears to be a complete disappearance from the SERPs. However, upon further inspection, they find that they aren't actually de-indexed, just moved from their former 1st page position to page 50 (or wherever...just far back enough that nobody would ever see it). You'll even find that Google is still indexing your newly created pages and keeping track of your newly created backlinks; basically, other than your new (and horrible) SERP position, everything else is "business as usual". Why? Why would Google suddenly throw you to the dogs, but still continue to do all the work of crawling, indexing & checking all those variables?

    Again, I think Google is Using "Crowdsourced Data" to determine your position!

    Above, that may have made sense, as you were in their SERPs where lots of people would be clicking your links, so they could easily do a "split test". But you might be wondering what I think they could possibly be basing "crowdsourced data" on if they have removed you from their SERPs (or put you somewhere that may as well be considered removed). Well, if your page was popular enough and had enough backlinks to get on the first or second page of Google in the first place, then surely your site must be able to survive for a couple of weeks without direct Google traffic.

    I mean, if Google removed Amazon.com from their SERPs would people stop visiting it? Would people stop linking to it? Would people stop talking about it? Of course not!

    I believe they have the "SandBox" as a way to tell whether your site has all the statistics that a front page site deserves because those statistics were earned legitimately, or whether they were illegitimate.

    You see, if they were illegitimate (i.e. you were the one doing all the work to make it look like a site that everybody "talks" about), then when you are removed from the SERPs, everything should theoretically come to a halt. As a site owner, you're not going to put more work into promoting a site that appears to have been de-indexed. So no Google, no backlinking, no talking about your site, no Facebook links, no nothing. Most importantly, no traffic. Everything stops. Since you made the site with the intention of gaming Google in the first place, your only REAL source of traffic is going to be Google. So when you get SandBoxed, I believe that the absolute BEST thing you can do is to make absolutely sure that you don't stop doing what you were doing before you were SandBoxed. What's tough to keep going is the traffic.

    However, if you got to the first page through legitimate means, Google SHOULD be able to remove your site from the SERPs (or move it to page 5000) and still see activity on your site. They should still see backlinks, people talking, Facebook comments, etc, etc, etc. Granted, there will obviously be a drop in stats, but everything SHOULDN'T come to a halt.

    Again, imagine that Google removed YouTube from their SERPs. Think everybody would just stop talking about YouTube? Granted, that is an exaggerated example, but I believe the basic premise still applies.

    In the end, I think Google has done something that every other company in the world has been doing for a while: crowdsourcing (at least that is what I would call it in its most basic form). They just figured out a way to do it virtually.

    It makes perfect sense. You let the crowd choose what's best and the crowd is always happy. You also can't "game" the crowd.


    To sum this up in the simplest way I can put it: imagine Google turning its SERPs into a giant version of Digg (or any social bookmarking site), except the data that is fed into it isn't a user clicking an "I like this" button but a user metaphorically saying "I like this site" by doing what any user does when they actually like a site...USING IT.

    The business model works. Look at Digg, YouTube, FaceBook, Twitter, Wikipedia...it simply works. What's most relevant is what the users SAY is most relevant. However, I think Google is working its way towards making the user tell them what's most relevant without actually saying anything (because again, saying it outright is "gameable").


    I think there will be BIG changes in how you rank in the SERPs in the future, and I think that change is going to come through "CrowdSourcing".


    EDIT: After some conversation ensued below with some other BHW members, I decided to do a little research. As it turns out, Microsoft has already filed a patent titled SYSTEM AND METHOD FOR SPAM IDENTIFICATION, portions of which are very similar to some of my opinions above. This is simply a coincidence, as I had no idea this existed prior to writing everything above.

    You can view the entire patent Microsoft filed here: http://appft.uspto.gov/netacgi/nph-P...DN/20100100564

    Here is just one very interesting excerpt from it (out of the many I could post):


    The above means that a site's spam rating will decrease (and therefore its ranking will increase) if "many users" visit the individual result. Pretty much exactly one of the things mentioned above. If you don't mind spending half an hour reading through some technical stuff, you'll find the patent Microsoft filed very interesting. It holds some very insightful information on what direction search engines are headed.
     
    • Thanks x 30
    Last edited: Jun 14, 2010
  2. callmelucid

    callmelucid Regular Member

    Joined:
    Feb 15, 2009
    Messages:
    487
    Likes Received:
    446
    So theoretically, it comes down to attracting people to your site and consistent backlinking ;)

    I think this is a very good theory and it makes sense.
     
    • Thanks x 1
  3. Ramsweb

    Ramsweb Senior Member

    Joined:
    Mar 31, 2010
    Messages:
    1,121
    Likes Received:
    658
    Occupation:
    Internet Marketer - Self Employed
    Location:
    In front of my PC
    This is a pretty nice write up. I have experienced something similar, seeing the top 10 being constantly changed around by Google. Barring the top two results, there has not been much consistency.

    One result that was on the bottom of the first page with an attractive Free **** has been moving up slowly and is now sitting at 4. If your theory is right, they have been getting a good clickthrough rate because of the word free, and Google is moving them up to cater to public demand.

    Crowd sourcing does not really make a lot of sense for results after the first page though.
     
  4. Ramsweb

    Ramsweb Senior Member

    Joined:
    Mar 31, 2010
    Messages:
    1,121
    Likes Received:
    658
    Occupation:
    Internet Marketer - Self Employed
    Location:
    In front of my PC
    Sorry, deleted the double post, browser hung up.
     
    Last edited: Jun 14, 2010
  5. bo2kmm

    bo2kmm Registered Member

    Joined:
    Oct 25, 2009
    Messages:
    70
    Likes Received:
    14
    I don't agree with the part about Google sandboxing to see if a site is real or not.

    For two reasons:
    1) Sandboxing happens all the time to sites that are ranked on, say, page 5 and suddenly drop to page 50. Even on page 5 they would be getting bugger all clickthrough, certainly not enough data to make a valid comparison with how they do at page 50.

    2) Unless the site has Google analytics installed, how do they know how many people are visiting the site after being sandboxed?

    Matt
     
  6. crazyflx

    crazyflx Elite Member

    Joined:
    Nov 9, 2009
    Messages:
    1,674
    Likes Received:
    4,825
    Location:
    http://CRAZYFLX.COM
    Home Page:
    No, it doesn't. They will always have to have a blueprint in place that does rely solely on "hard data" (title tags, anchor text, meta tags, H1 tags, etc, etc.)

    However, if they notice that more and more users are clicking through to page 2 or page 3, they'll know that they aren't serving up the most relevant results on page 1 anymore and all of a sudden...BAM! Google Dance!
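    A minimal sketch of that trigger, assuming (and this is purely my assumption) the engine logs which results page each click came from: if an unusually large share of clicks for a query land beyond page 1, the query gets flagged for re-testing, i.e. a "dance".

    ```python
    # Hypothetical click log for one query: the results page each clicked result sat on.
    clicks_by_page = [1, 1, 2, 1, 3, 2, 1, 2, 2, 3]

    def needs_reshuffle(pages, deep_share_threshold=0.35):
        """Flag a query for re-testing if too many clicks go past page 1."""
        deep = sum(1 for p in pages if p > 1)
        return deep / len(pages) > deep_share_threshold

    if needs_reshuffle(clicks_by_page):
        print("Page 1 no longer satisfies users for this query -> rotate candidates (dance).")
    ```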
     
    • Thanks x 1
  7. crazyflx

    crazyflx Elite Member

    Joined:
    Nov 9, 2009
    Messages:
    1,674
    Likes Received:
    4,825
    Location:
    http://CRAZYFLX.COM
    Home Page:
    1) Yes, you're right. But that assumes I'm saying the only way you get sandboxed is for the reasons above. I'm sure there are plenty of others, but the main point behind what I've typed out above is that I believe Google is moving closer and closer to a "you tell us" instead of a "we tell you" what's relevant business model.

    2) How does Compete.com know you're getting traffic? How do any of those sites out there that keep track of traffic stats know? I'm sure you've heard of Alexa rank before.

    There are plenty of ways to know if a site is getting traffic even if that traffic isn't coming from Google directly.
     
    Last edited: Jun 14, 2010
  8. davidzh

    davidzh Newbie

    Joined:
    Jul 24, 2009
    Messages:
    40
    Likes Received:
    5
    I've been thinking more or less the same thing. However, it gets complicated. Suppose you have a news site, where people come, read an article, and leave; you'll have a very high bounce rate even with relevant results.

    I think what really matters is the return rate. Does the user go to your site from Google, hit the "back" button, and go to another site? Do users do this more often on your site than on other sites competing with you?

    If so, I believe the site will get f*cked very quickly.
     
  9. Sylas

    Sylas Junior Member

    Joined:
    Mar 26, 2010
    Messages:
    140
    Likes Received:
    20
    Occupation:
    Wholesaler
    Location:
    Shanghai, China
    Very interesting theory. It makes sense to consistently backlink even after being sand-boxed. That way, you show Google that the down ranking of your site in the SERPs was inconsequential and that ultimately you don't need them.

    In theory, the algorithm would then correct itself and "unsandbox" you as it realizes it was mistaken in assuming you were trying to game it.
     
  10. crazyflx

    crazyflx Elite Member

    Joined:
    Nov 9, 2009
    Messages:
    1,674
    Likes Received:
    4,825
    Location:
    http://CRAZYFLX.COM
    Home Page:
    If they went to the site to read the article, and the article is relevant, they may have a high bounce rate (because the user reads the article and then leaves), but the time spent on the site will also be much longer than if they thought they were going to read one article but instead got another, or did get an article on the topic they wanted but when they got to the site they found that it was poorly written.

    What's interesting is that everything would be relative, and that relativity balances everything out. That is what I mean by "soft data". They wouldn't have it "hard coded" into the algorithm to say "if the bounce rate is X, then it isn't as good as if it had a bounce rate of Y". It would be something along the lines of "if the bounce rate is X% different from the bounce rate of all the other results on the SERP for this phrase, then it ranks differently than if it were Y% different".

    What I mean is, if users are searching for a news result and those users only want to read an article, the bounce rate is going to be high no matter what site they click on. So what it would come down to is the length of time spent. What would really put a site head and shoulders above the other first 10 results on the SERP, where all 10 have a higher bounce rate than average, would be if the user spent a long time AND browsed the site.
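    Here's a tiny sketch of that "everything is relative" idea (the metrics, numbers and weighting are invented for illustration only): each result is compared against the other results for the same phrase rather than against some fixed "good bounce rate" threshold.

    ```python
    # Hypothetical per-result behaviour for one query: (bounce_rate, avg_seconds_on_site).
    results = {
        "news-a.example": (0.82, 45),
        "news-b.example": (0.85, 210),   # bounces just as much, but people actually read it
        "news-c.example": (0.80, 50),
    }

    avg_bounce = sum(b for b, _ in results.values()) / len(results)
    avg_dwell = sum(t for _, t in results.values()) / len(results)

    # Score each result by how it compares to the *other* results for this phrase,
    # not by any absolute "good bounce rate" threshold.
    relative = {
        site: (avg_bounce - bounce) + (dwell - avg_dwell) / 60  # minutes above the pack
        for site, (bounce, dwell) in results.items()
    }
    print(max(relative, key=relative.get))  # news-b.example stands out despite its high bounce rate
    ```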

    See what I'm saying?

    I couldn't agree more with this. I think that would be one of the largest determining factors of a "crowdsourced" search engine. In fact, I think this is one of the things that they are already using and keeping track of during a "Google Dance"
     
    Last edited: Jun 14, 2010
  11. bo2kmm

    bo2kmm Registered Member

    Joined:
    Oct 25, 2009
    Messages:
    70
    Likes Received:
    14
    Agree with the above to some extent. What is pretty much a known fact though (or maybe I'm wrong) is that things like Compete are only a rough estimate, and Alexa even more so, often being downright wrong.

    Unless Google's tracking technology is far superior to Alexa's, yet tracks a much bigger sample of sites than Compete, they aren't going to have data that is of much value. Obviously Google could have that, but you would think we would have at least heard something about it if Google had a project that rivaled Alexa in scale and Compete in accuracy.

    Matt
     
  12. crazyflx

    crazyflx Elite Member

    Joined:
    Nov 9, 2009
    Messages:
    1,674
    Likes Received:
    4,825
    Location:
    http://CRAZYFLX.COM
    Home Page:
    I couldn't agree more with you about some of those sites being downright wrong. But that doesn't mean they aren't useful in relative terms.

    Imagine a site's "real" Alexa rank would be 100,000 if everything were correct. However, because of errors, its Alexa rank is 50,000. That means Alexa thinks the site is getting more traffic than it really is.

    If that site were to lose traffic, Alexa would still register that the site is losing traffic and update its rank accordingly. So now its Alexa rank is 75,000 (and if it were accurate it would be 125,000). That is a loss of 25,000 positions regardless of whether the rank number assigned to it is incorrect.

    It's the difference between the original rank and the updated rank that matters, not the rank itself.
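    A quick illustration with the hypothetical numbers above: even if the reported rank carries a constant error, the direction and size of the change still come through.

    ```python
    # Hypothetical Alexa-style ranks: lower number = more traffic.
    true_rank_before, true_rank_after = 100_000, 125_000   # what the rank "should" be
    reported_before, reported_after = 50_000, 75_000       # what the panel actually reports

    # The constant measurement error cancels out when you look at the change.
    print(true_rank_after - true_rank_before)   # 25000
    print(reported_after - reported_before)     # 25000 -> same trend, same magnitude
    ```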

    Visit this URL: EzineArticles Compete Rank and Data

    Imagine there weren't any numbers on the graph. You'd see it and know that Ezine has lost traffic. It doesn't matter what the "rank numbers" are, simply that it has lost traffic. That much is at least accurate.
     
    Last edited: Jun 14, 2010
  13. crazyflx

    crazyflx Elite Member

    Joined:
    Nov 9, 2009
    Messages:
    1,674
    Likes Received:
    4,825
    Location:
    http://CRAZYFLX.COM
    Home Page:
    @bo2kmm -> I am admittedly working on a lot of assumptions in my OP and I know that what I said is by no means an absolute fact. What you're saying has just as much legitimacy as mine.

    I'm just saying that I believe that Google is definitely moving towards a more "user generated" SERPs page than a "X + Y + Z = SERPs" SERPs page.
     
  14. crazyflx

    crazyflx Elite Member

    Joined:
    Nov 9, 2009
    Messages:
    1,674
    Likes Received:
    4,825
    Location:
    http://CRAZYFLX.COM
    Home Page:
    @bo2kmm -> Also, there are a number of things that Google has at its disposal, and you can typically find at least one of them on almost any site on the web:

    Google Analytics
    Webmaster Tools
    AdSense

    All of the above are capable of tracking the statistics from any given site.
     
  15. albertb

    albertb Registered Member

    Joined:
    Nov 18, 2008
    Messages:
    53
    Likes Received:
    9
    Google might have a bigger sample than Alexa and Compete. They have the Google Toolbar and the Chrome browser to track user browsing. And think of all those sites using Analytics or AdSense. They also track which result you click on in the SERPs. They make this clear in their privacy policy.
     
  16. crazyflx

    crazyflx Elite Member

    Joined:
    Nov 9, 2009
    Messages:
    1,674
    Likes Received:
    4,825
    Location:
    http://CRAZYFLX.COM
    Home Page:
    Here is something that is also really interesting. According to Wikipedia (another site based on the "crowdsourcing" model), the pitfalls of crowdsourcing are as follows:

    * Added costs to bring a project to an acceptable conclusion.

    * Increased likelihood that a crowdsourced project will fail due to lack of monetary motivation, too few participants, lower quality of work, lack of personal interest in the project, global language barriers, or difficulty managing a large-scale, crowdsourced project.

    * Below-market wages, or no wages at all. Barter agreements are often associated with crowdsourcing.

    * No written contracts, non-disclosure agreements, or employee agreements or agreeable terms with crowdsourced employees.

    * Difficulties maintaining a working relationship with crowdsourced workers throughout the duration of a project.

    * Susceptibility to faulty results caused by targeted, malicious work efforts.

    All of the above only apply if the "crowd" being "sourced" knows they are being "crowdsourced". If you didn't know you were being "crowdsourced" and were doing the work just by doing something you WANTED to be doing (i.e. finding something you want on the web), there would be virtually no pitfalls to crowdsourcing.
     
  17. albertb

    albertb Registered Member

    Joined:
    Nov 18, 2008
    Messages:
    53
    Likes Received:
    9
    They do change the ranking factors so that no one can completely figure them out. The theory about crowdsourcing does make sense, and I've been thinking along the same lines since I read that the Google Toolbar and Chrome spy on your browsing history by default. Making use of the data provided by the users would be a good way to keep their results relevant. They might have enough people to spy on by now to implement this into their algorithm.

    They keep saying that you will do better in the results with good content. However, Googlebot is a machine, and even though it might be a very powerful machine, it can't process natural language. The quality it calculates for a document is going to depend on factors such as keyword density, LSI keywords, document structure, inbound links, etc., and black hatters have shown that these factors can be gamed. By observing human behaviour you can get a good indication of the quality of the page. They might just be doing this.

    But if this theory checks out, they are wrong in thinking that users can't mislead them. Think about all those stupid people out there using the computer. Instead of reverse engineering the search engine algorithm, you've just got to social engineer the people.
     
  18. MarketerMac

    MarketerMac Regular Member

    Joined:
    Oct 26, 2009
    Messages:
    247
    Likes Received:
    101
    When you install the Google toolbar, it asks if you would like to 'submit anonymous usage statistics', which I believe is on by default. Those 'usage' statistics are no doubt your surfing habits, search history, etc.

    Also, if you'll remember, Google had a fairly large relationship with Firefox. I'm sure most people would argue it was to make IE lose market share (which in all honesty I'm sure it was), but what about the data that Firefox collects? If you don't think that was a part of the deal then you're kidding yourself.

    Google also acquired its analytics technology by purchasing Urchin for an 'undisclosed' amount of money. They then took this purchase and made it free to use. Why? To collect data, of course.

    There is a much longer list:
    http://en.wikipedia.org/wiki/List_of_acquisitions_by_Google

    But if you look at it, excluding mobile stuff, google most often buys:
    Other types of search engines (their core business, so...duh)
    Analytics Packages
    Advertising Platforms

    So, if you put together their toolbar, their browsers (Chrome as well as their relationship with FF), their own search engine and the search engines they have acquired, their own advertising platform (AdSense) and the ones they have bought, plus the enormous web properties they have bought (YouTube, Blogger, etc.), I'd think that of everyone, Google has the best picture of how traffic moves about the internet. Not to mention they are a multi-billion dollar business; if you think they can't afford to buy data from Alexa, Compete, and whoever else they decide they want it from, you are wrong.

    And since they've spent all this money on putting together the data, they'd be foolish not to try and use it to improve the results on Google.com.

    Now to bring it all together: what they have is a massive amount of data, and if I were Google I would put it into perspective. Each search is worth $X. Each customer (or searcher) has a lifetime value of $Y. Each customer has a set of expectations from search, 'Z'. Now, if 'Z' becomes more expensive to maintain than $Y, well, Google is in it to make money. And there is no way that every possible search query is worth that much effort on Google's part (they would lose money), but it would stand to reason that the more popular search queries (who knows what the arbitrary cutoff could be without having all the data) would get more attention than others when putting together all this data they have.

    Anywho, that's my 2 cents. I'm absolutely certain you are correct and Google is (and has been, for a number of years) moving towards this crowdsourcing option. I just don't think it's financially feasible to use it for every possible query on their SE because of the sheer amount of data, but that's just my opinion. At the end of the day, it doesn't change the process of promoting, and it certainly won't affect any projects that are just short-term; in fact it might help them, because Google will want to 'split-test' the new content. So you just have to rethink how you build your projects and keep creating new ones for Google to try out and make you a buck.
     
    Last edited: Jun 14, 2010
  19. MarketerMac

    MarketerMac Regular Member

    Joined:
    Oct 26, 2009
    Messages:
    247
    Likes Received:
    101

    Sorry, had to quote that because it made me laugh as I thought about it.

    If I had to put money on which was 'smarter', the average person or google's algo, I honestly think I'd put my money on Google. No idea how you'd compare the two, but who here would disagree with me?
     
  20. nicefirework

    nicefirework Newbie

    Joined:
    Aug 27, 2009
    Messages:
    19
    Likes Received:
    2
    Wow... very good analysis. It means, let's start hiring 1,000 people on microworkers, as an example, to search for your site with the specific keyword you target, and suddenly you become first place in the SERPs? If this works, can marketers buy the crowd? What do you think, guys...