I need "on page" footprints!

gheegh

Hey all,

I'm working on an SEO tool (which I'll share with some of you), but I need ON PAGE footprints.

On-page footprints are markers you can spot by looking at the source code of the page itself. I do NOT need ScrapeBox/XRumer footprints, which are search terms you feed to Google to find sites.

In particular, I'm using this information to reverse engineer the backlinking strategy of a website. If done well (it works fairly well now), I can tell you whether they are using article marketing, blog comments, etc. to promote their site.

So, if anyone has a good list of on-page footprints, I'd love to see whatever you have.

Thanks,

Gheegh
 
A quick example: if you look at the source of a WordPress site, there is always a wp-content directory in the URLs. :-) Stuff like that would be great!
 
Well, that's a loaded question. You could make an on-page footprint for just about anything; it really depends on what you are looking for.

However, bear in mind that ScrapeBox footprints, for instance, just query Google for on-page text. The text has to actually be on the page for Google to see it.

So if you search Google for inurl:wp-content, you would be searching for what you are asking about.

Besides, unless you're going to crawl the internet yourself, your program is going to be feeding your "on page" footprints into Google, or some variation thereof, just like ScrapeBox or XRumer. Otherwise it would be pointless to have a "footprint".

Anyway, if you can be more specific about what you are looking for, we can share more specific info.

Thx,
Matt
 
Hey Matt,

I'm writing a tool. So, just as you load footprints into ScrapeBox, I'm going to have them built into my tool.

The problem with Google-type footprints is that they catch a lot of bad stuff that isn't a perfect match. You can't check the header for a generator meta tag, for example, or look at things like scripts that might be a standard part of every page.

This is the type of stuff I need.
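To make that concrete, here is a rough sketch of the kind of check I mean, using only Python's standard html.parser. The class name, the ASSET_RULES table, and the single WordPress rule are placeholders I made up for illustration; a real list would need many more platforms and would still produce some false matches.

```python
from html.parser import HTMLParser


class FootprintParser(HTMLParser):
    """Collects the generator meta tag and script/link URLs from one page."""

    def __init__(self):
        super().__init__()
        self.generator = None
        self.asset_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "generator":
            self.generator = attrs.get("content")
        elif tag == "script" and attrs.get("src"):
            self.asset_urls.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.asset_urls.append(attrs["href"])


# Hypothetical rules: platform name -> substrings expected in asset URLs.
ASSET_RULES = {
    "wordpress": ("/wp-content/", "/wp-includes/"),
}


def detect_platforms(html):
    """Return the set of platform names whose footprints appear in the HTML."""
    parser = FootprintParser()
    parser.feed(html)
    generator = (parser.generator or "").lower()
    found = set()
    for platform, needles in ASSET_RULES.items():
        # Footprint 1: the platform names itself in the generator meta tag.
        if platform in generator:
            found.add(platform)
        # Footprint 2: a script or stylesheet is served from a known path.
        if any(n in url for url in parser.asset_urls for n in needles):
            found.add(platform)
    return found
```

Because this only looks at tag attributes, a page that merely mentions "wp-content" in its visible text is not flagged, which is the sort of mismatch a Google query can't rule out.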

I have a bunch of lists of footprints from Google. Those help, but at the end of the day they don't get me where I need to be to do a great job on this tool.

So, any detailed thoughts would help.

G
 
What I suggest is that you do a first pass using Google and pick up a ton of URLs. Then you do a second, deep pass where your software visits every URL and looks for other types of footprint that do not show up in a Google search, e.g. certain tag names, form fields, or other HTML tags.

I recently wrote something like this and it worked like a charm, although it was slow. If you do this, make sure to cache every page, so you can rerun your crawler against the cache (in case you change the code).
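A minimal sketch of the caching idea, assuming a flat directory of files named by the SHA-256 of the URL. This is not the code I actually used; the function names, the page_cache directory, and the comment-form check in the usage line are all made up for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path

CACHE_DIR = Path("page_cache")  # hypothetical cache location


def cache_path(url, cache_dir=CACHE_DIR):
    """Map a URL to a stable file name using its SHA-256 digest."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / (digest + ".html")


def _download(url):
    """Fetch a page over the network and decode it leniently as UTF-8."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_cached(url, cache_dir=CACHE_DIR, fetch=None):
    """Return the page HTML, downloading only if it is not cached yet."""
    path = cache_path(url, cache_dir)
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = (fetch or _download)(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return html


def second_pass(urls, detector, cache_dir=CACHE_DIR, fetch=None):
    """Run detector(html) on every URL, reading from the cache when possible."""
    return {url: detector(fetch_cached(url, cache_dir, fetch)) for url in urls}
```

For example, second_pass(urls, lambda html: 'name="comment"' in html) would flag pages containing a form field named "comment" (a made-up footprint). When you change the detector, the second run reads from disk instead of hitting every site again.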
 