
Insane SEO Theories Post 1: Linkwheel Detection & Link Pyramids

Discussion in 'Black Hat SEO' started by dannyhw, Dec 12, 2010.

  1. dannyhw

    dannyhw Senior Member

    I never see threads on here that really try to reverse engineer what's going on with the algorithm. It's important that people do this. SEO needs pioneers who are willing to tackle the genuinely hard questions about how the algorithm behaves.

    I hope we can start a whole new level of SEO discussion here and I'd like to help kick it off.

    DISCLAIMER: I SUSPECT THAT THIS IS ACCURATE. IT MAY NOT BE. FIND A FLAW IN IT OR PRESENT A BETTER THEORY. IT'S SCIENCE TIME.


    ARE WE READY FOR HARDCORE SEO? LET'S DO THIS!


    I always hear that link pyramids work better than link wheels. Usually there's no real explanation; other times it's some vague idea about how Google sees the shape and how it's better to be chaotic. Nice things to consider, but this explanation just isn't good enough. It's our job to reverse engineer this algorithm, people. SO LET'S DO IT. Now remember, this is just a theory and I'm not saying it's the truth. I'm just saying I can't find anything wrong with it and it's the simplest explanation I can think of. We've got important work to do and we have to start somewhere.

    So what's the difference between link wheels and link pyramids? It's the whole idea of the wheel: properties on the inner wheel not only link to our money site but also to the next property along. You get an actual loop of PR transfer flowing through X different pages, and all that juice is also passed to the money site. Great, right?

    [image: link wheel diagram]

    Yeah, great for us but bad for Google! Juice just isn't supposed to flow that way. It's too perfect. It's pure mathematical exploitation! Different sites! Different IP addresses! What could they do? Well, somehow they did something.

    When Google actually succeeds in making a popular technique less effective, something important has happened. This is where blackhats get to work. The tricks we exploit, like blog comments and forum profiles, seem like they should be easy to fix. It turns out they're not. Years ago, Google briefly devalued all links from pages containing words that could potentially identify spam, like "guestbook". Guess what: guestbook spam still works. The engineers are brilliant, but it turns out that's a REALLY hard thing to prevent. So why were they successful with link wheels?

    The link wheel takes such careful planning to do right that you'd expect Google to never pick up on it. As it turns out, detecting it is incredibly simple and can be done efficiently using basic graph theory. If you know a little graph theory, you can think of each property in the link wheel as a node. The shape that gives a link wheel its power is known in graph theory as a cycle: a path through a number of nodes that connects back on itself in a closed chain, so wherever you start, you end up right back there. Here's a lovely one!

    [image: cycle diagram]

    That's the same kind of cycle exploited by the link wheel! Unlucky for us, Google is pretty good at drawing circles.

    Wait, so Google can detect and devalue these crazy structures and not some spammy blog comments or forum profiles? Yes, because this is just a simple algorithm dealing with a data structure all computer science students are familiar with.

    Ask a second-year student to write a program to follow paths from a node on a directed graph and identify any cycles. It's easy and takes very little processing power. All the computer has to do is traverse all possible paths, which is simple (I'll let you look it up), especially with a directed graph. If at any point the computer ends up back at the starting node, all the nodes on that path are marked as part of a cycle. As far as Google is concerned, each one of these filthy nodes had to agree to be a part of the cycle, and they will all be shown no mercy. In fact, the penalty can be incredibly severe without any real risk of negatively affecting a legitimate link profile. How likely is it that one of these will occur by chance? It's almost impossible, and it gets even less likely the more nodes the cycle connects.
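
    To make that concrete, here's a minimal sketch in Python of the kind of check I'm describing. The graph and property names are made up for illustration, and this is just the textbook depth-first approach, obviously not Google's actual code:

    [code]
    # Minimal sketch of directed-cycle detection with depth-first search.
    # The graph is a dict mapping each page (node) to the pages it links to.

    def find_cycle_nodes(graph, start):
        """Return every node on a path that loops back to `start`."""
        flagged = set()

        def dfs(node, path):
            for nxt in graph.get(node, ()):
                if nxt == start:
                    # We walked from `start` all the way back to `start`:
                    # every node on this path is part of a cycle.
                    flagged.update(path + [node])
                elif nxt not in path:
                    dfs(nxt, path + [node])

        dfs(start, [])
        return flagged

    # A toy five-property wheel: each property links to the next, and the
    # last links back to the first (money-site links omitted for clarity).
    wheel = {
        "prop1": ["prop2"], "prop2": ["prop3"], "prop3": ["prop4"],
        "prop4": ["prop5"], "prop5": ["prop1"],
    }
    print(find_cycle_nodes(wheel, "prop1"))
    # -> all five properties flagged: {'prop1', ..., 'prop5'}
    [/code]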

    So OK, it's actually easy; the engineers had no problem once they figured out what was going on. Now ask that same engineer to reliably identify a guestbook that's vulnerable to spam. He'll probably do a pretty good job, but it will involve lots of text parsing, and you'll want to use enough criteria to avoid any false positives. That's a whole lot of instructions for the processor to handle, not to mention memory. Now, repeat this process to identify every possible link spam opportunity and devalue the links. It turns out it's not a realistic idea at all. In fact, it's pretty wacky that they actually tried it back in '02. This sucks for Google, because this is what blackhats live to discover. Next time you blast some profile links, you don't have to wonder "How or why is this possible?? How could those stupid engineers allow this??". Instead you get to think "HAH HAH HAH WHAT ARE YOU GOING TO DO ABOUT IT GOOGLE???".
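
    To see why that's so much heavier, here's a hypothetical sketch of what footprint-based guestbook detection might look like. Every pattern below is invented for the example; the point is how much string matching piles up for every crawled page, compared to the cheap graph walk above:

    [code]
    # Hypothetical footprint matching -- the patterns are invented, and a
    # real system would need far more of them (and smarter ones).
    import re

    FOOTPRINTS = [
        re.compile(r"sign (my|our|the) guestbook", re.I),
        re.compile(r"powered by \w+ guestbook", re.I),
        re.compile(r"add your (entry|message)", re.I),
    ]

    def looks_like_spammable_guestbook(html):
        # Demand two independent hits to cut down false positives -- and
        # this is exactly where the criteria (and CPU time) pile up.
        hits = sum(1 for pat in FOOTPRINTS if pat.search(html))
        return hits >= 2

    page = "<p>Sign my guestbook! Powered by FooBook guestbook v1.2</p>"
    print(looks_like_spammable_guestbook(page))  # True
    [/code]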

    OK, that got a bit off topic, but it's a point every real blackhat should know. Back to these link structures. Why are pyramids the big thing now? Obviously someone tried them and they worked well. Not only that, but they continued to work well. How is it that Google killed the magic wheels so easily, yet pyramids are still going strong?

    I WILL TELL YOU.

    A pyramid is that nice triangle shape everyone likes. It's a symbol of strength, and it's a lot simpler than a stupid wheel. You go in layers instead of trying to form all kinds of convoluted shapes. But when you think about the pyramid as a graph, it doesn't matter that it fits nicely in a triangle. It can just as well be visualized as a circle. ACTUALLY, it's the same as the wheel without the edges that make the actual wheel.

    [image: pyramid diagram]
    A nice shape, but WHERE IS THE DIRECTED GRAPH?

    Anyway, with a pyramid your nodes form what's known in the math world as a directed acyclic graph (DAG). There's definitely a special name for this specific type of DAG, but I have no clue what it is and I really don't care. Sure, the structure of the pyramid and the levels of links are important, but what's special about the DAG is that it's a stealthy little math creature. Here's what makes it so great, with a quick sketch after the list:

    1. By definition, there will be no cycle in a DAG.
    2. Any number of DAGs can be formed by traversing the graph from any node in a link profile. This applies to all link profiles without exception, so an intentional DAG looks just like natural linking.
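
    Here's a quick sketch of the standard acyclicity check (Kahn's algorithm: keep peeling off nodes with no incoming links; if everything peels away, there's no cycle). All the names and toy structures are made up for illustration:

    [code]
    # Sketch: testing whether a link structure is a DAG (Kahn's algorithm).
    from collections import deque

    def is_dag(graph):
        """graph: dict mapping each node to the list of nodes it links to."""
        indegree = {n: 0 for n in graph}
        for node, targets in graph.items():
            for t in targets:
                indegree[t] = indegree.get(t, 0) + 1
        queue = deque(n for n, d in indegree.items() if d == 0)
        peeled = 0
        while queue:
            node = queue.popleft()
            peeled += 1
            for t in graph.get(node, ()):
                indegree[t] -= 1
                if indegree[t] == 0:
                    queue.append(t)
        return peeled == len(indegree)  # every node peeled -> no cycle

    # A toy two-layer pyramid: profiles -> web 2.0 properties -> money site.
    pyramid = {
        "profile1": ["web2a"], "profile2": ["web2a"],
        "profile3": ["web2b"], "profile4": ["web2b"],
        "web2a": ["money"], "web2b": ["money"],
    }
    print(is_dag(pyramid))  # True: no cycle anywhere, nothing to flag

    # The closed wheel fails the same test...
    wheel = {"a": ["b"], "b": ["c"], "c": ["a"]}
    print(is_dag(wheel))    # False: the loop is a cycle

    # ...but break a single edge and it passes again.
    open_wheel = {"a": ["b"], "b": ["c"], "c": []}
    print(is_dag(open_wheel))  # True
    [/code]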

    So that cycle detection algorithm and nasty penalty? Gone. Can any single intentional DAG link structure be detected when traversing backwards from a money site? No.

    How weird is that? It's mathematically impossible for Google to ever detect a link pyramid this way. I'll bet you never thought about that.

    Now for the final fun thoughts! I'm sure some of you sat there and noticed that you could eliminate the cycle in the traditional link wheel, and make it undetectable, just by removing one edge between any two nodes. This is absolutely true! The difference is that instead of one simple path from any node in the DAG to the money node, there are potentially two.

    This doesn't create any obviously unnatural path, so feel free to *ALMOST* complete what would have been your magic PR multiplying circle if you want. It's not going to have the magic powers, but it's a more optimized structure.

    OR

    Take the time you'd spend linking from one property to the next and start on another pyramid.

    THINGS TO THINK ABOUT

    Can you think of a more effective DAG linking structure? They do exist, just make sure it's a DAG!

    If you can create a better structure, is it worth the time to build it with the tools available, or would that time be better spent starting a new pyramid?

    Can you design an algorithm that will help you automate your better structure?

    Can you design an algorithm to identify and devalue your custom structure?

    Is your algorithm efficient enough to be implemented realistically?

    I WANT FEEDBACK!
     
  2. dannyhw

    dannyhw Senior Member

    You get an extra 20 points if you figured this out!

    The cycle flagging algorithm needs a limit on how many nodes it traverses. If this is how it actually works, a link wheel big enough will conquer all!
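
    A quick sketch of that depth cap idea, with completely made-up numbers (nobody outside Google knows the real limit, if there even is one):

    [code]
    # Sketch: the same cycle walk as before, but the search gives up after
    # `max_depth` hops. A wheel longer than the cap is never flagged.

    def finds_cycle(graph, start, max_depth):
        def dfs(node, depth):
            if depth >= max_depth:
                return False  # budget exhausted, stop looking
            return any(nxt == start or dfs(nxt, depth + 1)
                       for nxt in graph.get(node, ()))
        return dfs(start, 0)

    # A 50-property wheel slips past a hypothetical 20-hop cap:
    big_wheel = {f"p{i}": [f"p{(i + 1) % 50}"] for i in range(50)}
    print(finds_cycle(big_wheel, "p0", max_depth=20))  # False: cap too low
    print(finds_cycle(big_wheel, "p0", max_depth=60))  # True: cycle found
    [/code]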
     
  3. albaniax

    albaniax Elite Member

    So, first off, I see you have a good analytical mindset, and I appreciate that :)

    It's true that Guugle can detect closed linkwheels; from what I've read here, that happened to a lot of guys. I myself only have a few linkwheels, which are open, and I didn't get penalized or anything, but I may not be getting everything out of the link juice.

    So what I'm wondering is: how can we get the most out of pyramids?
    And let me ask you, what kind of links do you put where?

    Let's say the base links are xRumer profiles. But where do they point?
    OK, to one of many articles. And the article link goes to?
    Web 2.0 properties! -> And from there, of course, to your money site.


    Would this be a possible example of each layer?!
    Should mass comment backlinks made with Scrapebox be used as base links, together with profiles?
     
  4. dannyhw

    dannyhw Senior Member

    That's a whole other matter on its own, and maybe people can chime in. I would say for the first layer, try to use content-rich pages; on authority domains is even better, ******** of course.

    Then pointing to those, I guess, just unleash the links: profiles, Scrapebox, other platforms. It doesn't really matter that much. The problem I see is finding places to put your content that won't can you when complaints start coming in.
     
  5. MarketerMac

    MarketerMac Regular Member

    I'm not so sure that a wheel big enough to escape detection would be more efficient to make than just going with a pyramid structure that you already know cannot be detected.

    I think your idea of it being 'undetectable' might be flawed though. Sure, if you have diversity in your links, you could very well be undetectable. But there is a limited number of web 2.0 properties, and an even smaller number that would be worth writing on (i.e. ones that get spidered regularly, have some sort of authority, etc.). This is why profiles work: because of the sheer depth of available spaces on the internet and their diversity, they are, as you said, hard to detect without an insane amount of processing power.

    Great post, definitely provokes some thought, however I don't see how it changes much. I'll still use Scrapebox. I'll still use XRumer. And I'll still ignore article sites / web 2.0 whatevers until the above two stop working for me on a project and I absolutely have to step outside of my proverbial box.
     
  6. turdface

    turdface Supreme Member

    I'm about 95% sure I've read that somewhere else before.
     
  7. dannyhw

    dannyhw Senior Member

    Personally, I just don't want to use web 2.0s. Maybe I'll go pyramid style some other way if the time is right, but I won't invest any serious time into BH unless I've got a situation where I can rank easily without any grunt work.

    This is purely because I just was not satisfied with any explanation of what happened to link wheels. I'm obsessive.
     
  8. dannyhw

    dannyhw Senior Member

    None of the information I could find on the big G about linkwheels being devalued touched on how it might actually work, and the info on link pyramids wasn't that great. It's not like somebody couldn't have come up with the idea before, though.
     
  9. dannyhw

    dannyhw Senior Member

    Oh yeah, and I was kidding about the gigantic link wheel.
     
  10. imperial109

    imperial109 Regular Member

    A naturally grown site would have links just about everywhere: blogs, article directories, web 2.0 sites, directories, etc. The problem with these structured links is that they still leave some sort of footprint that programs can identify. Take SB for example: you post links to a couple thousand sites. Your competitors post to those sites as well. Now, if both of your links are tracked, and they all come from SB, then it's going to be quite obvious that those sites are being spammed, which in turn will mark your site as spam.

    Even though the theory that you can just create very large linkwheels or even pyramids is very plausible, you're still going to run into trouble, because it's something that has some sort of structure. If it has structure, then it has a pattern. If it has a pattern, then it has a footprint. And we know that, for the most part, it's easy for programs to identify patterns, because they themselves are based on patterns.

    Therefore small, open linkwheels, small pyramids, small silo-style wheels, and any other pattern you can think of work best, because no matter how small a structure is, if your competitor uses it, then your footprint doubles. This is why sometimes structures work and sometimes they don't. If you diversify your linking styles, then your chances of being penalized are greatly reduced.
     
  11. dannyhw

    dannyhw Senior Member

    Except none of that is true. See the point about 2002, when Google actually tried to devalue guestbooks and failed miserably. You can still get bitchin' link juice from guestbooks. Blog comments and forum profiles have been used for that long too. They know, but they can't do anything.

    The whole point is that it's easy for programs to identify certain patterns, while detecting other patterns just isn't feasible.
     
  12. dannyhw

    dannyhw Senior Member

    Also, article directories, web 2.0 sites, and directories aren't signs of natural link building. None of those are editorial links.
     
  13. Alchemical

    Alchemical Registered Member

    Great analysis. I would add that Google could also start to see patterns in the pyramid structure, although it's a lot harder to identify. But clearly there is a pattern there. Perhaps it's good to keep in mind that whatever we do, we should throw in some randomization from time to time to help break up that pattern: have some links that break the dominant paradigm, skip layers, etc. Then it becomes much harder to detect and more organic.
     
  14. imperial109

    imperial109 Regular Member

    I'm not sure if you're agreeing with me here or not. But that's the point: diversify your techniques so that your patterns become less detectable. If you only submit to guestbooks, and you have a couple of thousand of those, then it's quite obvious that it's spam.

    How so? A backlink is a backlink, and people share them all the time with their friends. People write about other sites all of the time.
     
  15. dannyhw

    dannyhw Senior Member

    Well, in theory, if they could start using on-page clues to classify a link as being, say, from a profile or a comment, they could identify a pattern. That's just not feasible with today's technology though. The difference between a graph analysis and something like that is like the difference between a calculator and an Xbox.
     
  16. J0kerz

    J0kerz Supreme Member

    Linkwheels are new... yeah right!
     
  17. dannyhw

    dannyhw Senior Member

    I'm not agreeing. They do not recognize patterns like that. Do you know how much extra processing power that would need, and what potential impact it would have on the quality of their SERPs? This is why it pays to get an actual computer science background.

    The point I'm trying to make is that it's one thing to think about what Google should do, but it's even more important to understand what they're just incapable of. It's in that problem area that blackhat and grayhat take place.
     
  18. dannyhw

    dannyhw Senior Member

    Nope, they're dead. Open wheels are fine.
     
  19. imperial109

    imperial109 Regular Member

    I believe it is. You don't necessarily need 10 or 20 marks to identify a page as being a blog or a profile. It's true, as you're implying, that not all sites have signatures to identify them; however, some widely used scripts are easy to find. For example, a lot of forums have their "powered by abc" link at the bottom. Software that automatically posts to such sites follows a pattern to find them, so why can't G do the same to identify spam? Yes, that software uses a lot more resources to identify a wider range of sites, but there is still a pattern that can be identified to some extent.

    I have to respectfully disagree with that. I'm working on my computer science degree right now, and from what I understand, it's a core aspect of an analytical program to recognize patterns. For example, say you take 20 pages and look for the word "ace", and it shows up 13 times. Then you take the next 20 pages and find "ace" 17 times. Then 11 times. Then 5 times. That's a pattern: a majority of the pages have the word "ace" in them. So if G reads all of the text on a page, then they can identify "powered by abc".

    Also, instead of throwing insults, we'd all appreciate it if you could enlighten us. Maybe I'm missing something since you have already completed your degree, but an explanation wouldn't hurt either. We're all here to learn, calm down.
     
  20. dannyhw

    dannyhw Senior Member

    If you really want to think about filtering out pages with potential for bad links, you have to consider that every page with an outbound link has to be checked for every possible fingerprint. But even then, is it in their best interest to automatically devalue links from every page on a WordPress blog? Do you just make links not count at all on all phpBB forums?

    They tried one term in 2002 and it just backfired; they removed the penalty really quickly. Two years before that, they removed poison words.

    Since then they've successfully reduced spam with every major update, and they do it in elegant ways that actually work without compromising their results. The fact that they're still letting the same old spam methods get by doesn't matter in the big picture. A few years ago, BH was completely different. I mean, I'd buy a domain, upload a one-page script and an htaccess, and it would be getting thousands of hits the next day. They've accomplished so much that these Scrapebox comments are nothing.
     