An SEO Statistical Project I'm Working On

Discussion in 'White Hat SEO' started by Diophantus, Aug 29, 2014.

  1. Diophantus

    Diophantus Junior Member

    Aug 20, 2014
    Survey Draftsman
    So, today I came up with an idea. I am uncertain of its utility as an SEO tool, but in any case it will serve as an academic exploration into keyword research.

    Here's the plan:

    - Pick a niche, any niche. Brainstorm and come up with as many sub-niches as possible. It stands to reason that the smaller your niche becomes, the longer your key phrases become.

    - Come up with several keywords and key phrases, starting with one word and going up to 5 or 6. A nice smattering of each will do. The more the better.

    - This is the painstaking part: perform competition research, taking into account several different indicators. Each keyword will get its own investigation for each indicator. For example, I'll do research on the keyword "basketball": I'll check Google's top ten results, looking up each site's backlinks, etc.

    - Compile this information in a spreadsheet and create a graph. The independent variable (y-axis) will be the competition indicators, and the dependent variable (x-axis) will be the keyword/keyphrase word count.
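
    The compile step above can be sketched in a few lines. This is only a rough illustration, assuming plain Python and a single competition indicator per phrase; the function names and the sample rows are made up for the example and are not real research data.

```python
from collections import defaultdict


def word_count(phrase):
    """Number of whitespace-separated words in a keyword or key phrase."""
    return len(phrase.split())


def to_points(rows):
    """Turn (phrase, indicator_value) rows into (word_count, indicator_value) points."""
    return [(word_count(phrase), float(value)) for phrase, value in rows]


def average_by_word_count(points):
    """Average the indicator for each word count, ordered by word count."""
    buckets = defaultdict(list)
    for n_words, value in points:
        buckets[n_words].append(value)
    return {n: sum(vals) / len(vals) for n, vals in sorted(buckets.items())}


# Hypothetical placeholder rows: (phrase, some competition indicator).
rows = [
    ("basketball", 90),
    ("basketball shoes", 60),
    ("basketball hoops", 40),
    ("cheap basketball shoes", 30),
]
points = to_points(rows)
print(average_by_word_count(points))
```

    The list of points is what would go into the scatter plot; the per-word-count averages are just a quick sanity check before graphing.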

    We know that this scatter plot should conform to something that resembles a Pareto, or power-law, distribution (with the head, body, and long-tail keywords). It should look something like this:

    Keyword Vs Competition Graph.jpg

    Then, through trial and error, we find two power functions that bound as many of the data points as possible from above and below. We "squeeze" them together as far as possible, keeping as many of the data points inside as we can. Most of the data points should converge upon a particular line, like the black line, so we should be able to squeeze quite far with little consideration for the outliers (the crazy random data points that don't really correlate much).
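
    The squeeze could also be approximated in code instead of by eye. A minimal sketch, assuming plain Python and strictly positive values: it swaps the trial-and-error step for a least-squares fit in log-log space (my substitution, not part of the plan as described) and then scales that curve up and down until it brackets the points, after trimming a hypothetical fraction of outliers at each end. All names here are made up for illustration.

```python
import math


def fit_power_law(points):
    """Least-squares fit of y = a * x**b, done as a straight line in log-log space.

    Needs positive x and y values and at least two distinct x values.
    """
    logs = [(math.log(x), math.log(y)) for x, y in points]
    n = len(logs)
    mean_lx = sum(lx for lx, _ in logs) / n
    mean_ly = sum(ly for _, ly in logs) / n
    sxx = sum((lx - mean_lx) ** 2 for lx, _ in logs)
    sxy = sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in logs)
    b = sxy / sxx
    a = math.exp(mean_ly - b * mean_lx)
    return a, b


def squeeze_envelope(points, trim=0.1):
    """Return (a_low, a_high, b): two curves a * x**b that bracket the kept points.

    trim is the assumed fraction of points dropped at each extreme as outliers.
    """
    a, b = fit_power_law(points)
    # How far each observation sits above or below the fitted curve.
    ratios = sorted(y / (a * x ** b) for x, y in points)
    k = int(len(ratios) * trim)
    kept = ratios[k:len(ratios) - k] if k else ratios
    return a * kept[0], a * kept[-1], b
```

    Both bounding curves share the fitted exponent here, which keeps them parallel on a log-log plot; fitting separate exponents for the top and bottom curves would be closer to drawing them by hand.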

    Keyword Vs Competition Graph.jpg

    Now, these two red lines represent functions whose mean we can find. The mean would be represented by the black line in the middle, which would be another function.

    This mean would represent your Competition Vs. Keywords. At that point, you could simply plug in your keyword word count, and out would come an average indicator of Competition for your keyword in your niche.
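
    Plugging in a word count could look roughly like this. A sketch only, assuming each red line is a power function y = a * x**b described by a hypothetical (a, b) pair and that "mean" is taken as the pointwise average of the two curves; the coefficients in the usage lines are placeholders, not fitted values.

```python
def mean_curve(lower, upper):
    """Build the middle curve from two bounding power functions.

    lower and upper are (a, b) pairs for y = a * x**b.
    Returns a function mapping a word count to the averaged competition value.
    """
    (a_low, b_low), (a_high, b_high) = lower, upper

    def predict(n_words):
        low = a_low * n_words ** b_low
        high = a_high * n_words ** b_high
        return (low + high) / 2

    return predict


# Placeholder coefficients for illustration only.
predict = mean_curve((50.0, -1.5), (200.0, -1.5))
print(predict(1))  # competition estimate for one-word keywords
print(predict(4))  # competition estimate for four-word key phrases
```

    When both curves share the same exponent, the average is itself a power function; with different exponents it is still a usable function, just not a pure power law.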

    This would work out better and become more accurate the more data points you have.


    With a sufficiently large number of keywords and sub-niches covered, you could find the approximate competition of any key phrase based purely on the number of words in it. This would be custom tailored to your particular niche and sub-niches.

    Why would that be really nice to have? You could simply come up with keywords and have a good idea of what your competition looks like without doing a great deal of research on it. The legwork at first would be the hard part, but after that you could just plug in the number of words and get going on using that keyword.

    If you had the money, you could set it up so that you outsource this enormous amount of legwork through Amazon's MTurk or something similar. Then you have a fantastic (approximate) competition checker without delving much deeper into it.

    All because of statistics :)

    Of course, this entire plan was a brainstorm, so I'm open to ideas and constructive criticism.

    EDIT: Oh, forgot to add that one of the competition indicators would be SEARCH VOLUME. That's an important one to have.
    Last edited: Aug 29, 2014
  2. SEO Power

    SEO Power Elite Member

    Jul 14, 2014
    Self employed
    Houston, TX
    Not a bad idea.
  3. Reeshua

    Reeshua Power Member

    Jan 6, 2014
    The x-axis should have the independent variable while the y-axis should have the dependent. Also, how about we use DA, PA, PR, TF, CF, plus the search volume you mentioned for the competition indicators?

    Lastly, you have to come up with a formula that combines all those metrics so you can easily plot it on the x-axis. For example, the Nick Flames formula for choosing an expired domain goes like this: (PR x 10) + PA + DA + TF + CF > 80. It's a formula derived from experience, I believe.
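
    As a quick illustration, that expired-domain formula reduces to a one-line score. This is just a sketch in plain Python: the function names are made up, the metric values in the usage line are placeholders rather than real measurements, and a competition formula for keywords would likely need different weights.

```python
def composite_score(pr, pa, da, tf, cf):
    """Combine the metrics as in the quoted formula: (PR x 10) + PA + DA + TF + CF."""
    return pr * 10 + pa + da + tf + cf


def passes_threshold(pr, pa, da, tf, cf, threshold=80):
    """True when the combined score is strictly above the threshold (80 in the quoted formula)."""
    return composite_score(pr, pa, da, tf, cf) > threshold


# Placeholder metric values for illustration only.
print(composite_score(pr=3, pa=20, da=15, tf=10, cf=12))
print(passes_threshold(pr=3, pa=20, da=15, tf=10, cf=12))
```

    The single number from composite_score is what you would plot against keyword word count.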
    Last edited: Aug 29, 2014
  4. Reyone

    Reyone Elite Member

    Sep 30, 2012
    "EDIT: Oh, forgot to add that one of the competition indicators would be SEARCH VOLUME. That's an important one to have."

    I think that seals the deal, boy; search volume is the single most accurate competition indicator.

    New York plumbers has very few searches, so it must be really easy to rank...

    Right, on a serious note, search volume has nothing to do with competition; tons of kws with loads of searches bring NOTHING, while other keywords with 40 SPM might bring you 10s of thousands.

    I would personally advise you against this type of project, as the metrics are constantly changing and so are the elements in the algos.