
Where to run a large-scale crawler and minimize bandwidth cost? Cloud with unlimited inbound?

Discussion in 'Black Hat SEO Tools' started by darkmonk, Sep 7, 2012.

  1. darkmonk, Regular Member (joined Nov 21, 2007; 226 messages; 52 likes received)

    Lately I've been toying with the idea of building my own private crawler. Designing and building the data storage and crawler architecture is not a problem for me, but I am wondering whether this is a futile endeavor because of bandwidth costs alone. I would love to put this on a cloud platform. Most cloud platforms seem to offer "unlimited" inbound bandwidth (Amazon EC2, Windows Azure and HP Cloud, for instance).

    Has anyone tackled this problem? If you start consuming a couple hundred terabytes a month of inbound bandwidth, do they actually honor the "unlimited" inbound claim?
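
For a rough sense of scale, here is a back-of-the-envelope sketch of what "a couple hundred terabytes a month" means as a sustained line rate. It assumes decimal terabytes, a 30-day month, and perfectly even traffic. The 100 KB average page size is purely an illustrative assumption, not a measured figure, and the function names are made up for this example.

```python
SECONDS_PER_DAY = 86_400


def sustained_mbps(terabytes_per_month: float, days: int = 30) -> float:
    """Average rate in megabits per second needed to move the volume in `days`."""
    total_bits = terabytes_per_month * 1e12 * 8  # decimal TB -> bits
    return total_bits / (days * SECONDS_PER_DAY) / 1e6


def pages_per_second(terabytes_per_month: float,
                     avg_page_kb: float = 100.0,  # ASSUMPTION: illustrative page size
                     days: int = 30) -> float:
    """Average fetch rate implied by the monthly volume and an assumed page size."""
    total_bytes = terabytes_per_month * 1e12
    total_pages = total_bytes / (avg_page_kb * 1e3)
    return total_pages / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    for tb in (100, 200, 300):
        print(f"{tb} TB/month ~ {sustained_mbps(tb):,.0f} Mbit/s sustained, "
              f"~ {pages_per_second(tb):,.0f} pages/s at 100 KB/page")
```

Under those assumptions, 200 TB a month works out to a bit over 600 Mbit/s around the clock, so the per-instance network throughput a provider actually delivers probably matters as much as whether inbound transfer is billed.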