Safe settings for Proxy & Email Harvesting? (Scrapebox)

Discussion in 'Black Hat SEO Tools' started by Sagat77, Jun 12, 2017.

  1. Sagat77

    Sagat77 Newbie

    Joined:
    May 11, 2017
    Messages:
    29
    Likes Received:
    2
    Gender:
    Male
    Hello guys,

    I have recently started experimenting with Scrapebox and would appreciate your kind feedback based on your experience:

    I want to be totally safe when using Scrapebox and ideally would use proxies for every one of its functions, but as I understand from what @loopline said in another post, there are certain functions of SB that don't use the proxy list even when it's active (such as proxy harvesting/testing?).

    So my questions are these:

    1. Can even basic tasks such as proxy harvesting and then proxy testing (via Google) result in an IP ban/Google ban, especially if you are testing 10k+ proxies in one go?

    2. Does it make sense to use a VPN on the PC where I run SB, so that my actual IP is hidden during tasks where SB doesn't use the active proxy list (such as the ones described in point 1)?

    3. Can anyone recommend the safest "Connections, Timeout and Other Settings" parameters for mostly email grabbing functions? Or would just moving the bars to the far left green corner be too conservative and unnecessary?

    4. Since I want to use SB mostly for email grabbing, initially I was planning to use just free proxies. Would you recommend that? Will SB do a good job with that, or should I look for a more specialized tool such as GSA proxy scraper, or even buy some private proxies? (I'm on a low budget, so ideally I would prefer to start with just free proxies if they can do the job.)

    Big thanks in advance to anyone for your help!
     
  2. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,727
    Likes Received:
    1,994
    Gender:
    Male
    Home Page:
    Proxy harvesting gets the proxies from a website, so a couple of queries to a website isn't going to get your IP banned. Bear in mind Scrapebox is seen as a web browser by end sites, and the entire point of a site existing is for web browsers to view it.

    Testing proxies uses the proxy itself. Scrapebox first tests against Scrapebox's private servers to see if a proxy passes the anonymous test. If it doesn't pass, it does NOT get tested against Google. If it does pass, it gets tested against Google. But that page load is credited to the proxy, not your IP. Meaning Google doesn't see your IP, so either the proxy is blocked or it's not.

    No, it's not necessary. Scrapebox has tens of thousands of users and has been around for more than 7 years. If it doesn't use a proxy, it's because they know it's not an issue. You can use a VPN if you want, but it's overkill. Websites were made to be viewed, and nowadays Scrapebox does actually use a proxy for most things if the use proxies box is checked, save a few things that are really pointless to use proxies with.

    Go to green for the bars and you're probably safe. If you are email grabbing all from 1 domain, keep the connections low, like 1 per proxy. If you have a nice mix of domains, you can crank up the connections.

    No, I would not recommend using free proxies for ANYTHING except scraping from the search engines. Using free proxies for scraping emails can reduce your success by 90% or more. So you may lose 90% or more of potential emails by using public proxies. You would be far better off using no proxies and 1 connection.

    You can buy 10 shared proxies for $10 or get 5 proxies on Fiverr for 5 bucks, and that's plenty for email scraping.
     
  3. Sagat77

    Sagat77 Newbie

    Thank you mate, truly an exceptional, thorough, detailed reply. Much appreciated, since I know how busy you are! (Btw, I also watched [and liked] almost all of your YouTube videos, fantastic tutorials!)

    I will also follow your advice about the shared proxies. If there is a specific proxy provider you can recommend, please let me know here or via PM!
     
  4. loopline

    loopline Jr. VIP Jr. VIP

    You're welcome mate, happy to help.

    I list all the proxy providers I use/recommend here:
    http://scrapeboxfaq.com/scrapebox-proxies

    I also keep that list up to date and take into account feedback from other people, etc.
     
  5. proxygo

    proxygo Jr. VIP Jr. VIP

    Joined:
    Nov 2, 2008
    Messages:
    15,859
    Likes Received:
    9,610
    Occupation:
    PROVIDING PROXIES FOR GSA SCRAPING.
    Location:
    BHW
    Home Page:
    I remember the days I was on the recommended proxy seller list for sbox, some 5 years.
    Then I was struck off because I wouldn't sell to new members on BHW,
    which, as most sellers know, is where 90% of scams come from on BHW.
    Still chuckle, as I've always sold that way for 10 yrs on BHW, but it
    only kicked in to matter after 5. The good old days.
    Mornin' Mat.
     
    Last edited: Jun 14, 2017
  6. Sagat77

    Sagat77 Newbie

    Thanks Mat! I checked all your sources and finally purchased 5 private proxies from myprivateproxy; all passed the Google test. However, after scraping links for 450 keywords with the Google, Yahoo, and Bing engines together, midway through Google stopped giving results, followed a little later by Yahoo. Only Bing managed to give me almost 1k results for each of the keywords until the very end. I kept all scraping to 1 connection at a time.

    Despite the issue, I still gathered around 50k relevant links to scrape for emails (this time I had the email grab limit at 5 threads; too slow to go below that). Everything was going perfectly until around 70% of the way through the email grabbing, when suddenly I was getting error 10053 for the remaining links, and that caused my SB to stop responding (even when I pressed the Stop button of the email grabber addon, the program was unresponsive, so I had to shut it down).

    Do you have any idea why Google and Yahoo blocked me so easily? Also, do you have any idea what caused the error 10053 (the SB FAQ says it may be a proxy error, but after testing my proxies they seem to be fine) and how to avoid it again?
     
    Last edited: Jun 15, 2017
  7. loopline

    loopline Jr. VIP Jr. VIP

    You can't run 1 connection with 5 proxies on Google; you would need about 100 proxies to run 1 connection continuously. You need to use the detailed harvester and add a big delay. It's been about 5 years since you could get away with 1 connection and 5 proxies. Google bans exponentially faster these days, followed by Yahoo.

    Also this should help



    As for that error, how many connections are you running? If you have a nice mix of domains, then you can just leave proxies off for email scraping, as long as you don't have connections too high. Otherwise, check with your provider to find out the max simultaneous connections limit per IP. It's probably 10, so with 5 proxies you can't run more than 50 connections, and if you were, your proxies may have been denying connections.
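
    The connection ceiling mentioned above is simple arithmetic. A minimal sketch, assuming the provider enforces a flat per-IP limit (the figure of 10 is the guess from the post, not a confirmed value, and `max_total_connections` is a hypothetical helper name, not anything in Scrapebox):

```python
def max_total_connections(proxy_count: int, per_ip_limit: int) -> int:
    """Upper bound on simultaneous connections across all proxies.

    Assumes every proxy allows the same number of simultaneous
    connections (per_ip_limit); check the real limit with the provider.
    """
    if proxy_count < 0 or per_ip_limit < 0:
        raise ValueError("counts must be non-negative")
    return proxy_count * per_ip_limit


# 5 private proxies, assumed limit of 10 simultaneous connections per IP
print(max_total_connections(5, 10))
```

    Staying at or below that number keeps each proxy within its assumed limit; the actual limit depends on the provider.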
     
  8. Sagat77

    Sagat77 Newbie

    So I did use the detailed harvester after all, and experimented with several delays using my 5 private proxies. With a delay of 90 seconds I was finally able to scrape my whole 50-keyword list via Google. Maybe I could have done it with a little less than this, but I didn't want to push it too much. I'm not sure, though, how effective that delay will be with a much larger keyword list.

    Also, I've noticed that the delay is applied only when reaching a specific number of scraped URLs (e.g. it was activated on reaching 1000 URL results via Google). That is a bit problematic, as there are instances, especially with other search engines such as Bing, where only 20 links were loading per page. Therefore SB had to reach Bing's page number 50+ (!!) to hit the 1000 URL limit before applying the delay, which I'm sure doesn't look that good in the eyes of the search engine.

    So my question is: is there any way to make the delay depend on how many different search pages are loaded (e.g. apply the delay for every 10 Bing search pages loaded), instead of depending on a fixed number of scraped links reached (in our case 1000)? Also, if there is no such option in SB, is it at least possible to control the URL scrape limit and set it lower than 1000 before SB applies the delay?

    Lastly, regarding the error I mentioned in my previous post for the email scraping settings: I was running 5 connections, but all URLs were randomized. Now I am running only 2 connections, but it takes ages to do my work :(
     
  9. loopline

    loopline Jr. VIP Jr. VIP

    The delay in the detailed harvester is per keyword. So after each keyword, regardless of how many queries it takes to finish that keyword, that's when it delays.

    For Bing don't worry, you can probably use 1 connection with zero delay, or even 2 or 3 connections with 5 proxies and no delay on Bing. Bing is very lax, they need a lot of views to post for their shareholders. hehe

    You can run multiple instances of Scrapebox as well, so use Google for one and Bing for another with different settings.

    Further, you can go to settings >> harvester engine configuration, and here you can edit the delay for each engine. The delay in this setting is a delay after every query. So if it takes 10 queries to get 1000 results from Google, it will delay 10 times. In the case of Bing and your example it would delay 50 times, but again, with Bing this isn't needed.

    Bear in mind that this delay WILL combine with the delay in the detailed harvester. So if you set delay 10 in the engine file for Google and delay 90 in the detailed harvester, it will delay 10 after EACH query and then delay 90 after the keyword is done.
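
    To see how the two delays add up, here is a minimal sketch of the arithmetic only. It assumes the per-query delay fires after every query, including the last one, and that the per-keyword delay is added on top, as described above; the function names are hypothetical and are not part of Scrapebox:

```python
def delay_per_keyword(queries: int, per_query_delay: float,
                      per_keyword_delay: float) -> float:
    """Seconds spent waiting for one keyword.

    Assumption: the engine delay applies after each query and the
    detailed harvester delay applies once after the keyword finishes.
    """
    return queries * per_query_delay + per_keyword_delay


def total_delay(keywords: int, queries: int, per_query_delay: float,
                per_keyword_delay: float) -> float:
    """Seconds spent waiting across a whole keyword list."""
    return keywords * delay_per_keyword(queries, per_query_delay,
                                        per_keyword_delay)


# Example from the post: 10 queries per keyword, engine delay 10,
# detailed harvester delay 90.
print(delay_per_keyword(10, 10, 90))   # wait time for one keyword
print(total_delay(50, 10, 10, 90))     # wait time for a 50-keyword list
```

    With those numbers each keyword adds 190 seconds of waiting, so a 50-keyword list spends 9500 seconds (a bit over two and a half hours) in delays alone, not counting the time the queries themselves take.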

    As for email scraping, if the domains are randomized it shouldn't matter, but if it's all URLs from one or a few domains and they are just randomized, then it would cause this. I guess if it works, you could get more proxies to make it go faster.
     
  10. Sagat77

    Sagat77 Newbie

    Bing is really very kind to us. I just wish its results were as accurate as Google's, but you can't have everything, I guess!

    So I've tried your advice and experimented a bit, adding several delays to see what works with Google. I did manage to get proper link scraping with a rather small set of keywords (<200), with a general delay of 300 secs and an additional Google engine delay of 10 secs (maybe overkill, but it did the job for now).

    The results from the email scraping were also good, considering the initial URL list was 2M strong (only at the very end was I getting some connection errors). Now the challenge is how to clean up/verify my final email lists.

    So far I am using SendGrid as my email delivery service, but even though my lists are supposedly targeting my service niche, I do worry about an imminent ban if there is a very large hard bounce rate on my uploaded lists. Therefore, before starting the campaigns, I was thinking of additionally purchasing email validation software to clean up my lists, and so far I'm leaning towards the Atomic Email Verifier. Would you recommend it? Also, can you recommend an email delivery service that has good delivery rates and doesn't ban accounts easily? Some people on this forum mentioned using Atomic Email Studio instead of an email delivery service, but I'm not quite sure whether such software would give me equally good delivery/not-spam results on my campaigns.

    Thank you for your insights!
     
  11. loopline

    loopline Jr. VIP Jr. VIP

    Glad you are getting your scraping sorted.

    I have no feedback to give on sending mails. I only send opt-in mails. It's just not my cup of tea.