Lately I've been toying with the idea of building my own private crawler. Designing and building the data storage and crawler architecture is not a problem for me, but I am wondering whether this is a futile endeavor because of bandwidth costs alone. I would love to put this on a cloud platform. Most cloud platforms seem to offer "unlimited" inbound bandwidth (Amazon EC2, Windows Azure, and HP Cloud, for instance). Has anyone tackled this problem? If you start consuming a couple hundred terabytes a month of inbound bandwidth, do they actually honor the "unlimited" inbound claim?
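
For a sense of scale, here is a rough back-of-envelope sketch of the sustained rate that a couple hundred terabytes a month implies. It assumes decimal terabytes (10^12 bytes), a 30-day month, and perfectly even traffic; the function name `sustained_mbps` is just something I made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def sustained_mbps(terabytes_per_month: float, days: int = 30) -> float:
    """Average rate in megabits per second needed to move the given volume in one month."""
    # Assumption: decimal terabytes (1 TB = 10**12 bytes), 8 bits per byte.
    bits = terabytes_per_month * 1e12 * 8
    # Spread the total evenly over the month and convert bits/s to Mbit/s.
    return bits / (days * SECONDS_PER_DAY) / 1e6


if __name__ == "__main__":
    for tb in (100, 200, 300):
        print(f"{tb} TB/month ~ {sustained_mbps(tb):.0f} Mbit/s sustained")
```

Under those assumptions, 200 TB a month works out to roughly 617 Mbit/s around the clock, so the question is really whether a provider will tolerate a substantial fraction of a gigabit link being saturated continuously.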