
Any ideas to prevent getting blacklisted while scraping?

Discussion in 'Scripting' started by tounsi7orr, May 3, 2017.

  1. tounsi7orr

    tounsi7orr BANNED

    Joined:
    Apr 21, 2014
    Messages:
    180
    Likes Received:
    11
    Hello,
    I'm a web scraper (not an expert), and I'm currently trying to build a bot to scrape Amazon products.
    I'm running into a serious problem: I get banned after scraping about 300-400 pages.
    I used some fake browser user agents and they helped a little, but I'm still getting banned. (Rough sketch of what I mean below.)
    I've read that I should use proxies, but it's too expensive for me to buy dozens of proxies just for testing.
    Note: my scraping speed is 2,500 pages/hour (it's kinda slow).
    My goal is to scrape 1 million pages in a week. I'll use multithreading for that, but first: how do I avoid getting blacklisted?
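    By "fake browser agents" I mean rotating the User-Agent header on every request, something like this minimal Python sketch using the requests library (the agent strings and URL are just placeholders, not my actual setup):

    import random
    import requests

    # A few example desktop user agents; a real pool should be larger and current.
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 "
        "(KHTML, like Gecko) Version/10.0.3 Safari/602.4.8",
        "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0",
    ]

    def fetch(url):
        # Pick a different user agent per request so traffic looks less uniform.
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        return requests.get(url, headers=headers, timeout=10)

    resp = fetch("https://www.amazon.com/dp/B00EXAMPLE")  # placeholder product URL
    print(resp.status_code)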
     
  2. bl4cksta

    bl4cksta Registered Member

    Joined:
    Mar 6, 2017
    Messages:
    51
    Likes Received:
    6
    Gender:
    Male
    At that kind of volume, it's good to implement bots with near-human behavior so you don't get banned. Personally, I think 1 million pages is too much for one week. See the sketch below for the kind of thing I mean.
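    "Human behavior" mostly means randomized, non-uniform timing between requests. A rough Python sketch (the URLs are placeholders, and the delay numbers are just a starting point to tune):

    import random
    import time
    import requests

    def human_pause(base=2.0, jitter=3.0):
        # Sleep a random 2-5 second interval so request timing isn't uniform.
        time.sleep(base + random.uniform(0.0, jitter))

    urls = ["https://example.com/page1", "https://example.com/page2"]  # placeholders

    for url in urls:
        resp = requests.get(url, timeout=10)
        # Occasionally take a much longer break, like a human walking away.
        if random.random() < 0.05:
            time.sleep(random.uniform(20.0, 60.0))
        human_pause()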
     
  3. JerryWoodburn

    JerryWoodburn Newbie

    Joined:
    Oct 23, 2016
    Messages:
    20
    Likes Received:
    0
    Gender:
    Male
    Well, you're obviously getting blacklisted because you're requesting too many pages in a short amount of time. So you only have two options:
    1. Do it slowly (well, yeah, that's not very useful).
    2. Use proxies. Why not look for a free list and try it just for testing, then think about buying some good ones? Something like the sketch below would work for rotating them.
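    A minimal Python sketch of rotating through a proxy pool with the requests library (the proxy addresses are fake placeholders; free proxies die constantly, so failed ones get dropped from the pool):

    import random
    import requests

    # Placeholder entries; swap in addresses from whatever free list you find.
    PROXIES = [
        "http://1.2.3.4:8080",
        "http://5.6.7.8:3128",
    ]

    def fetch_via_proxy(url, pool):
        # Try random proxies until one answers, discarding dead ones as we go.
        while pool:
            proxy = random.choice(pool)
            try:
                return requests.get(
                    url,
                    proxies={"http": proxy, "https": proxy},
                    timeout=5,
                )
            except requests.RequestException:
                pool.remove(proxy)  # dead or blocked proxy: drop it
        raise RuntimeError("no working proxies left")

    resp = fetch_via_proxy("https://www.amazon.com/dp/B00EXAMPLE", list(PROXIES))
    print(resp.status_code)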