Casper_T
Newbie
- Aug 3, 2016
Hello there,
I am looking for suggestions on how to properly select and use proxies for my use case:
We run a lot of scrapes of US eCommerce stores (tens of millions of requests per month) using both headless browsers and plain requests.
There are several questions I would like suggestions on:
1. Proxy type. Since we are scraping eCommerce product pages and not creating any accounts, is it enough to use shared datacenter proxies? Or, thinking long-term, should we look for dedicated residential IPs? We are trying to balance quality and price. Right now we are using static residential proxies at around $3.49 per IP (ISP IP addresses, although I believe they are shared); we couldn't find a better price for this quality on the market.
2. [...] amounts of traffic monthly, so the only option that remains is to rent unlimited-bandwidth IPs? But am I correct that this usually comes at the cost of limited proxy speed (limited concurrent requests)? I already see such signs with our current provider: we are getting many timeouts when doing headless scrapes.
3. Abstraction layer. As we plan to manage a pool of IPs from different proxy providers, we would need some sort of abstraction layer that would
let us easily combine proxies from different providers and track, manage, and replace them. Are there existing SaaS or library/framework solutions
for this specific purpose? I believe that implementing a custom solution would be reinventing the wheel.
4. Testing proxy liveness and speed. We will also need to proactively test the quality of the proxies we use. Are there any guidelines on how we
could programmatically and periodically test the liveness and speed of the proxies we rent? I heard that simple ping requests do not work with many proxy providers, so does that mean we
would need some sort of custom-written service that tries to visit a specific page through the proxy and checks whether the page loads?
5. Testing proxy quality. Do you know any sites or tools that reliably detect the majority of datacenter proxies and
could therefore serve as a good place to test whether a proxy is actually good quality (i.e. residential)? I would be very thankful for such examples.
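To make the abstraction-layer and liveness questions more concrete, here is a rough Python sketch of what I imagine such a layer could look like. It is only an illustration under my own assumptions: the names `Proxy`, `ProxyPool`, `fetch_via_proxy` and `TEST_URL` are made up and do not come from any existing library, and the check is just "fetch a page through the proxy, record whether it succeeded and how long it took".

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical test page; any stable URL you are allowed to fetch would do.
TEST_URL = "https://example.com/"


@dataclass
class Proxy:
    url: str                          # e.g. "http://user:pass@host:port"
    provider: str                     # free-form provider label
    alive: bool = True
    latency_s: Optional[float] = None


def fetch_via_proxy(proxy_url: str, test_url: str, timeout: float) -> int:
    """Fetch test_url through proxy_url and return the HTTP status code."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(test_url, timeout=timeout) as resp:
        return resp.status


class ProxyPool:
    """Combines proxies from several providers and tracks their health."""

    def __init__(self, fetch: Callable[[str, str, float], int] = fetch_via_proxy):
        self._fetch = fetch           # injectable, e.g. a headless-browser check
        self._proxies: List[Proxy] = []

    def add_provider(self, provider: str, urls: List[str]) -> None:
        self._proxies.extend(Proxy(url=u, provider=provider) for u in urls)

    def check(self, proxy: Proxy, test_url: str = TEST_URL, timeout: float = 10.0) -> None:
        start = time.monotonic()
        try:
            status = self._fetch(proxy.url, test_url, timeout)
            proxy.alive = 200 <= status < 400
        except Exception:
            # Timeouts, refused connections and HTTP errors all count as "dead".
            proxy.alive = False
        proxy.latency_s = (time.monotonic() - start) if proxy.alive else None

    def check_all(self, test_url: str = TEST_URL, timeout: float = 10.0) -> None:
        for proxy in self._proxies:
            self.check(proxy, test_url, timeout)

    def healthy(self) -> List[Proxy]:
        """Proxies that passed the last check, fastest first."""
        ok = [p for p in self._proxies if p.alive and p.latency_s is not None]
        return sorted(ok, key=lambda p: p.latency_s)
```

The idea would be to call `check_all()` on a schedule and hand only `healthy()` proxies to the scrapers. The checks here run one after another, so a real pool of this size would presumably need them to run concurrently. Is this roughly the right direction, or is there an existing tool that already does this?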
Thank you for your help in advance!