What's the best way to determine if the traffic my site is receiving is from human visitors or bots? Are the numbers AWStats puts out in my cPanel accurate and good enough, or are there programs/sites I should be using instead?
Checking the user-agent is the most common method. On my websites I flag any sessions and IPs that request robots.txt. I also flag any IPs that create a new session with each request, and any sessions with a volume of requests that grotesquely defies the averages. In my online stores, it's possible someone might look at 500 item detail pages, but 2000??? That's a bot. The average is 4.

Some people will treat a session that loads no image, CSS, or JS files as a bot, but that labels the text-only browsers of the blind and headless Unix systems as bots too. Some people look at the time between requests: if it's very short for too many requests in a row, you can flag the session.

I'd bet the stats are fairly good for the bots AWStats detects, but if you think people might write a bot to steal your content or integrate with your services, they can spoof their user-agent and trick AWStats and other packages into grouping them with regular visitors. The easiest way to fool logging packages is to use macro software to pilot IE, or to write a VB program that pilots IE, which is how most macro software is written anyway.

The real questions are: what do you need to protect, and what options do you have to protect it? Sometimes just displaying data in graphics instead of text will make your site more trouble than it's worth to bot writers.
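Here's a rough sketch of those log-based checks in Python, assuming a combined-format Apache/Nginx access log. The thresholds and the `access.log` path are made up for illustration (tune them against your own site's averages), and a real version would track cookie-based sessions instead of bare IPs:

```python
# Sketch of the flagging heuristics above: robots.txt requesters,
# grotesque request volume, and too many back-to-back fast requests.
import re
from collections import defaultdict
from datetime import datetime

# Combined log format: ip - - [time] "METHOD path HTTP/x" status bytes "ref" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

MAX_PAGES = 2000       # way above the ~4-page average mentioned above
MIN_GAP_SECONDS = 2    # gaps shorter than this, repeatedly, look scripted
MAX_FAST_HITS = 20     # how many too-fast requests before we flag

def analyze(log_path):
    hits = defaultdict(list)   # ip -> list of (timestamp, path)
    flagged = {}               # ip -> reason it was flagged

    with open(log_path) as f:
        for line in f:
            m = LOG_RE.match(line)
            if not m:
                continue
            # Drop the timezone offset and parse the timestamp.
            ts = datetime.strptime(m['time'].split()[0], '%d/%b/%Y:%H:%M:%S')
            hits[m['ip']].append((ts, m['path']))
            if m['path'] == '/robots.txt':
                flagged[m['ip']] = 'requested robots.txt'

    for ip, requests in hits.items():
        if len(requests) > MAX_PAGES:
            flagged.setdefault(ip, f'{len(requests)} requests defies the average')
        requests.sort()
        # Count consecutive requests that came in faster than a human would click.
        fast = sum(
            1 for (a, _), (b, _) in zip(requests, requests[1:])
            if (b - a).total_seconds() < MIN_GAP_SECONDS
        )
        if fast > MAX_FAST_HITS:
            flagged.setdefault(ip, f'{fast} requests under {MIN_GAP_SECONDS}s apart')
    return flagged

if __name__ == '__main__':
    for ip, reason in analyze('access.log').items():
        print(ip, '->', reason)
```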
Google's bot will visit every page unless it finds a reason to stop, like the server response time getting slow, or seeing the same content over and over, which makes it think it's in an infinite loop where the URLs keep changing but the pages don't. MSNbot doesn't use compression, so if you have limited bandwidth and not much traffic from Bing, you can block it to save your monthly transfer for customers.

In many cases bots try to influence ratings, post comments, scrape out content, and stuff like that. Those bots are written by people like you and me and the rest of BHW. Some bots only request a single page, like a ping. It's usually the crawl-everything bots that use up resources when resources are limited, and it's usually the "sneaky" bots that mess up your data, throw off your conversion rates, and try to steal your content. "Bot" usually just means it's not a human visitor and it's there to perform a task... benign or EVIL.
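If you do decide to block Bing to save transfer, the usual way is a robots.txt rule. msnbot was the crawler token at the time, and bingbot is its successor, so you'd list both to be safe:

```
# Disallow Microsoft's crawlers site-wide
User-agent: msnbot
User-agent: bingbot
Disallow: /
```

Keep in mind this only stops well-behaved bots that actually honor robots.txt. The sneaky ones ignore it, which is exactly why the log-based flagging sketched above is still worth doing.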