ScrapeBox Link Extractor failing - Thoughts?

Swave · Jan 3, 2019

Hi All,
I'm trying to run the SB Link Extractor but when I do, I keep getting "Read timed out" and "HTTP/1.1 403 Forbidden" errors.

I have the Link Extractor set to "external links" because I'm targeting very specific Groupon pages to scrape the company websites (external links).

For example: hxxps://www.groupon.cxm/deals/sky-zone-metairie-1

Once I get the company websites, the plan is to use the SB Grab/Check to grab the emails from these sites.

* Proxies: Using StormProxies (backconnect rotating proxies)
* I have all the Connections, Timeouts, and Harvester settings at their default values.
* Internet connection: 300 Mbps down / 30 Mbps up

Anyone have any ideas of what is causing these errors or suggestions for settings?

710fla · Jan 4, 2019

You using the 3-minute rotating proxies or every HTTP request?

Swave · Jan 4, 2019

710fla said:
You using the 3-minute rotating proxies or every HTTP request?

I'm using the Main Gateway proxy and I have my threads set to 10 (which is only 20% of the maximum my proxy account allows). They don't want us to use the 3-minute proxies for scraping.

Also, I received this feedback from the ScrapeBox support team (just as an FYI in case anyone else is trying what I'm trying):

"Hi

Groupon is denying access, hence the 403 error. Perhaps due to the user agent or something else.

The timeout is most likely due to StormProxies. Does the timeout error happen without proxies?

Edit: I did actually try to build this into the custom data grabber and even tried adding a custom user agent, and Groupon just ignores the request like it doesn't exist. They don't even send a HEAD response. This behavior would also cause a timeout.

It seems Groupon is working very hard to make sure their data is not scraped. So my assumption is that they are looking for some JavaScript to execute, or basing it on some other metrics, and otherwise either denying access or ignoring the request.

So you can experiment with some different things, but it may not be possible to scrape Groupon with ScrapeBox.

Regards, ScrapeBox and ScrapeJet Support."
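The two checks support suggests (does the timeout happen without proxies, and does a different user agent change anything) can also be run outside ScrapeBox. Below is a rough sketch in Python using only the standard library. The function names `classify` and `probe` are made up for illustration, the proxy address and user agent string are whatever you choose to pass in, and the 15-second timeout is an arbitrary assumption. It tells apart "the server answered 403" from "nothing came back in time", which are the two errors from the first post.

```python
import socket
import urllib.error
import urllib.request


def classify(error):
    """Map an exception raised while fetching a URL to a short diagnosis."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(error, urllib.error.HTTPError):
        if error.code == 403:
            return "blocked: server answered 403 Forbidden"
        return f"http error {error.code}"
    if isinstance(error, (socket.timeout, TimeoutError)):
        return "timeout: no response in time"
    if isinstance(error, urllib.error.URLError):
        # Connect-phase timeouts are usually wrapped inside URLError.reason.
        if isinstance(error.reason, (socket.timeout, TimeoutError)):
            return "timeout: no response in time"
        return f"connection problem: {error.reason}"
    return f"unexpected: {error!r}"


def probe(url, proxy=None, user_agent=None, timeout=15):
    """Fetch url once, optionally via a proxy and with a custom User-Agent."""
    if proxy is not None:
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    else:
        # An empty mapping disables any proxy picked up from the environment.
        handler = urllib.request.ProxyHandler({})
    opener = urllib.request.build_opener(handler)
    request = urllib.request.Request(url)
    if user_agent:
        request.add_header("User-Agent", user_agent)
    try:
        with opener.open(request, timeout=timeout) as response:
            return f"ok: HTTP {response.status}"
    except Exception as error:
        return classify(error)
```

Calling `probe` on the same URL once with `proxy=None` and once with the gateway address shows whether the timeout follows the proxy or the site. If both calls end in a 403 or a timeout even with a browser-like user agent, that would be consistent with what support describes.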

I don't know whether it will be a waste of money, but I'm having someone from Fiverr, who is supposed to be well versed in regex, see if he can write something that will work, just as a second opinion on what the ScrapeBox support team has found. If that doesn't work, I'll probably have to hire a VA to manually collect the data.
