Cannot scrape e-mail addresses with Scrapebox

PandaBusters

Registered Member
Joined
Jun 10, 2017
Messages
71
Reaction score
6
Hi,
I had success at one time, but now nothing works. I select keywords and footprints and extract many URLs successfully (without proxies, since Loopline said it's a bad idea to use proxies for URL harvesting), but the second harvesting / e-mail extraction produces no e-mail addresses, even with a very low thread count. At one time I used China Company Data with a slower robot speed to better emulate human spider speed, but Scrapebox seems to be controlled by thread count only.

Thank you.
 

loopline

Jr. VIP
Jr. VIP
Joined
Jan 25, 2009
Messages
5,931
Reaction score
3,373
Website
contactformmarketing.com
I think you misunderstood. Using proxies for URL harvesting from search engines is not only OK, it's good and preferred.
When scraping emails, generally speaking, you do not need proxies, so long as you keep connections low.

Do you get an error when doing email scraping?
 

PandaBusters

Registered Member
Joined
Jun 10, 2017
Messages
71
Reaction score
6
Thank you, I understand now. The question is: do I have to generate new keywords/footprints (I started with a good list), or is there a chance, though I guess I can't find out until I experiment, that they only block the final octet, so a new IP assigned by my ISP means more success later? What's strange is that I now get absolutely no e-mail addresses, where before I got a lot; maybe ISPs are centralizing their anti-robot measures to adapt to increasing robot activity worldwide. I don't know if there is a way to scrape at a slower rate, as with China Company Database, in addition to reducing the thread count; emulating humans as much as possible seems necessary. Right now I am trying to recover my URL list based on "perfect" keywords/footprints. Thank you.
 

loopline

Jr. VIP
Jr. VIP
Joined
Jan 25, 2009
Messages
5,931
Reaction score
3,373
Website
contactformmarketing.com
Scraping from search engines is where you need to worry about blocked IPs.

If you have a big mix of domains, then you don't need to worry about blocked IPs. Websites don't collaborate to ban your IP; just go slow. Scraping emails just looks like you're visiting a web page in a browser, and that's the entire purpose sites exist for.

You can add a random delay in the email scraper if you like. The Detailed Harvester will let you add a delay when doing regular harvesting.
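For anyone who wants the same behavior in a standalone script rather than in Scrapebox, a minimal sketch of the idea (fetch pages one at a time with a random, human-like pause between requests, skipping pages that 404 or time out) might look like this. The regex and function names are illustrative, not anything from Scrapebox:

```python
import random
import re
import time
import urllib.request

# Loose pattern for things that look like e-mail addresses
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def extract_emails(html: str) -> set:
    """Pull anything that looks like an e-mail address out of page text."""
    return set(EMAIL_RE.findall(html))

def scrape(urls, min_delay=2.0, max_delay=6.0):
    """Fetch each URL with a random pause between requests."""
    found = set()
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                found |= extract_emails(resp.read().decode("utf-8", "ignore"))
        except OSError:
            pass  # 404s and read timeouts: skip the page and move on
        time.sleep(random.uniform(min_delay, max_delay))  # human-like delay
    return found
```

The random delay is the whole point: a fixed interval between requests is easier for a server to recognize as a bot than a jittered one.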
 

PandaBusters

Registered Member
Joined
Jun 10, 2017
Messages
71
Reaction score
6
The problem is, I am going to search engines (Bing, Yahoo, etc.; Google blocks everything) to find the URLs, then scraping those as a second tier. So while the second-tier sites may not have blocked me, and the search engines are not blocking me either, somehow I can no longer scrape the e-mail addresses from the URLs, even with a low thread count.
 

loopline

Jr. VIP
Jr. VIP
Joined
Jan 25, 2009
Messages
5,931
Reaction score
3,373
Website
contactformmarketing.com
I'm really not following what you are saying. Can you give me an example?

As for the email scraping, do you get errors? Try some pages you know have emails on them; you can just use pages you used before if you want.
 

PandaBusters

Registered Member
Joined
Jun 10, 2017
Messages
71
Reaction score
6
I get 90% 404 errors and "read time out" errors. Thank you.
 

loopline

Jr. VIP
Jr. VIP
Joined
Jan 25, 2009
Messages
5,931
Reaction score
3,373
Website
contactformmarketing.com
If you are getting 404 errors and read timeout errors when harvesting, that is probably related to proxies. Are you using public proxies? This would be common with public proxies.
 

PandaBusters

Registered Member
Joined
Jun 10, 2017
Messages
71
Reaction score
6
Hi, no, this was without any proxies. Also, I bought your private proxies, but I cannot load them as a file; all I get is the letter "N" in the first column, even though I had IP address, port number, user, and pass in a new text file. I'm not sure how to properly load them. Thank you.
 

PandaBusters

Registered Member
Joined
Jun 10, 2017
Messages
71
Reaction score
6
Hi,
I am also getting socket error 10060, mostly time-outs. CPU usage is only 0-3%, but at one time it was 60-90% when I had success. 80% of the time the URLs time out, with about a 10-second lapse, on pages that do have e-mail addresses.
 

loopline

Jr. VIP
Jr. VIP
Joined
Jan 25, 2009
Messages
5,931
Reaction score
3,373
Website
contactformmarketing.com
Hi, no, this was without any proxies. Also, I bought your private proxies, but cannot load them as a file, all I get is the letter "N" in the first column, though I had IP address, port number, user, and pass in a new text file, not sure how to properly load them. Thank you.

I don't sell proxies; which proxies are you talking about?

The format of the proxies should be

Code:
IP:PORT:USER:PASS

OR

IP:PORT
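As an illustration (a hypothetical helper, not a Scrapebox feature), a proxies.txt can be sanity-checked before loading by making sure every line matches one of those two formats. A bad line, like the letter "N" described above, would show up immediately:

```python
def check_proxy_line(line: str) -> bool:
    """Accept IP:PORT or IP:PORT:USER:PASS; reject anything else."""
    parts = line.strip().split(":")
    if len(parts) not in (2, 4):
        return False
    ip, port = parts[0], parts[1]
    octets = ip.split(".")
    # IP must be four numeric octets in 0-255
    if len(octets) != 4 or not all(o.isdigit() and 0 <= int(o) <= 255 for o in octets):
        return False
    # Port must be a number in the valid TCP range
    return port.isdigit() and 0 < int(port) <= 65535
```

Running every line of the file through a check like this tells you whether the file itself is the problem or whether the proxies are just dead.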



Hi,
I am also getting socket error 10060 errors, mostly time-outs, CPU usage is only 0-3%, but at one time it was 60-90% when I had success, 80% of the time the URL's time-out with about a 10 second time lapse for pages that do have e-mail addresses.

Go to Settings >> Connections, Timeouts and Other Settings >> Timeouts, and turn your email harvester timeout up to the max. Does that help?
 

Respenzer

BANNED
Joined
Aug 24, 2015
Messages
190
Reaction score
15
You should check out the tutorials too; Scrapebox has its own proxy harvester.
 

loopline

Jr. VIP
Jr. VIP
Joined
Jan 25, 2009
Messages
5,931
Reaction score
3,373
Website
contactformmarketing.com
You redirected me to:

https://www.myprivateproxy.net/billing/myproxies.php

where I bought 22 IPs/month, starting with:

173.245.75.139 jwood04 mQ6M9GJq 29842 San Jose, CA, USA Yes Other 136081

as an example of the first proxy.

So they should provide them in a different list format that is already set up.

So it would look something like

Code:
173.245.75.139:port:user:pass

where port is the port of the proxy, user is your username, and pass is the password for the proxy.

So based on the above info, this is your proxy:

173.245.75.139:29842:jwood04:mQ6M9GJq

I did just test and it works, but you are going to want to ask My Private Proxy to replace that proxy, because now it's out here on the internet for the world to see. That should be no issue.

But you can see from that format what your ip is and your user and pass and then they stuck the port on the end of it all.

I've never seen it come in that format; you may be able to just ask them for Scrapebox format or a more standard format as well.
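If they won't change the export, the rearranging can be scripted. A sketch, assuming the dashboard columns really are IP, user, pass, port in that order (as in the row above):

```python
def mpp_to_scrapebox(line: str) -> str:
    """Convert a dashboard row (IP USER PASS PORT ...extra columns)
    into Scrapebox's IP:PORT:USER:PASS format."""
    ip, user, password, port = line.split()[:4]  # trailing columns (location etc.) are ignored
    return f"{ip}:{port}:{user}:{password}"

# Run each exported row through the converter and save the result as proxies.txt
```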
 