
ScrapeBox Link Extractor failing - Thoughts?

Discussion in 'Black Hat SEO Tools' started by Swave, Jan 3, 2019.

  1. Swave

    Hi All,
    I'm trying to run the SB Link Extractor but when I do, I keep getting "Read timed out" and "HTTP/1.1 403 Forbidden" errors.

    I have the Link Extractor set to "external links" because I'm targeting specific Groupon deal pages and want to extract the company websites they link to (the external links).
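To illustrate what an "external links" filter does, here is a minimal sketch, assuming the page's HTML has already been saved locally (no fetching, proxies, or user agents involved). It collects every `<a href>` on the page, resolves relative links against the page URL, and keeps only http(s) links whose host differs from the page's own host. This is not ScrapeBox's implementation; the function `external_links` and the example.com/example.org URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links like "/deals/x" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def external_links(html, page_url):
    """Return unique absolute http(s) links whose host differs from page_url's host."""
    page_host = urlparse(page_url).hostname
    collector = LinkCollector(page_url)
    collector.feed(html)
    result = []
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, etc.
        if parsed.hostname and parsed.hostname != page_host and link not in result:
            result.append(link)
    return result


if __name__ == "__main__":
    sample = (
        '<a href="/deals/other">internal</a>'
        '<a href="https://business.example.org/">company site</a>'
        '<a href="mailto:a@b.com">mail</a>'
    )
    print(external_links(sample, "https://www.example.com/deals/some-deal"))
    # ['https://business.example.org/']
```

Comparing exact hostnames is a simplification: a link from `www.example.com` to `shop.example.com` would count as external here. A stricter filter would compare registrable domains instead.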

    For example: hxxps://http://www.groupon.cxm/deals/sky-zone-metairie-1

    Once I get the company websites, the plan is to use the SB Grab/Check to grab the emails from these sites.

    * Proxies: Using StormProxies (backconnect rotating proxies)
    * I have all the Connections, Timeouts, and Harvester settings at their default values.
    * Internet connection: 300 Mbps down / 30 Mbps up

    Does anyone have ideas about what is causing these errors, or suggestions for settings?
     
  2. 710fla

    Are you using the 3-minute rotating proxies or the ones that rotate on every HTTP request?
     
  3. Swave

    I'm using the Main Gateway proxy with threads set to 10 (only 20% of my account's maximum). StormProxies doesn't want customers to use the 3-minute proxies for scraping.

    Also, here is the feedback I received from the ScrapeBox support team, as an FYI in case anyone else is trying the same thing:

    "Hi

    Groupon is denying access, hence the 403 error. Perhaps due to the user agent or something else.

    The timeout is likely due to Storm Proxies. Does the timeout error happen without proxies?

    Edit: I did actually try to build this into the custom data grabber, and even tried adding a custom user agent, but Groupon just ignores the request as if it doesn't exist. They don't even send back any response headers. This behavior would also cause a timeout.

    It seems Groupon is working very hard to keep its data from being scraped. My assumption is that they check whether JavaScript gets executed, or base it on some other signals, and otherwise either deny access or ignore the request.

    You can experiment with different approaches, but the site may not be scrapable with ScrapeBox.

    Regards, ScrapeBox and ScrapeJet Support."
    I don't know whether it will be a waste of money, but I'm having someone from Fiverr who is supposedly well versed in regex see if he can write something that works, as a second opinion to what the ScrapeBox support team found. If that doesn't work, I'll probably have to hire a VA to collect the data manually.