Scrapebox and emails: how do I solve these issues?

Discussion in 'Black Hat SEO Tools' started by Jonaz86, Sep 22, 2015.

  1. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Hello my fellow Blackhatters!

    I'm trying to collect emails using Scrapebox and it's going well so far, but I have some questions I would like answered if any of you know. I'll list them in a nice structure as follows:

    1. Is there a way to filter emails (not URLs) by domain?
    - The problem is that when I scrape emails after pulling out internal links, I sometimes end up with 10-20 emails (I'm doing B2B) from a single website. Is there a way to remove all AT toyota emails, for example, and just keep AT toyota email?

    2. Negative keyword list
    - Being able to use a negative keyword list is great, but I do not see how I can use it efficiently. My last list ruined over 5000 collected emails, and it took me a while to understand that 4000 of those 5000 weren't bad; it was just the keyword list that messed things up. Are negative keywords really reliable? Is there a better way to do this? The problem occurs once again when you start pulling internal links: for some reason it does not pull only internal links for me but all links from each website, despite my checking only internal links.

    My method is as follows: Add keywords -> Harvest -> Export list of links -> Import same list (remove duplicates and same domain) -> Pull internal links using Link Extractor -> Grab emails -> MANUALLY GO THROUGH ALL EMAILS.... Yes, sadly this is how I work right now.

    3. When adding a city in front of each keyword I should in theory get 1000 new results, and not 90% the same results for Toyota for example (dealerships etc.), but I still feel that regardless of how many extra keywords I add, I end up with only a marginally bigger email list.

    Thanks a lot in advance. I hope I was clear above, and like I said, if anyone can help me with these issues I would be extremely delighted!
     
  2. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Anyone.......... ?
     
  3. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,875
    Likes Received:
    2,058
    Gender:
    Male
    Home Page:
    1. You're contradicting yourself: you can't remove all @toyota mails and keep all @toyota mails at the same time. So can you clarify what exactly it is you want to do? You want to remove emails that are [email protected]? Either way, you could just click the filter button and input what you want to filter out or what to keep.

    2.) A negative keyword list is only as good as you make it, so it just depends on what you're trying to do. There is no right or wrong answer on how to use this. This is a case of the tool being only as good as its master.

    3.) Well, you could get a larger list, but it boils down to the scraping part here. Are you getting back more unique domains? This is an easy test: just remove duplicate domains and see if the count is a lot higher. Basically I'm saying there are multiple steps in the process, so you have to identify at which point in the process you're not getting what you think you should and go from there.

    Perhaps it's the scraping, perhaps it's the pulling of internal links, perhaps it's the email grabbing.
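
The "remove duplicate domains and compare the count" test can also be run outside Scrapebox on an exported URL list. This is only a minimal sketch, not how Scrapebox does it: the function name `unique_domains` and the sample URLs are made up for illustration, and it assumes every URL includes its scheme (`http://` or `https://`).

```python
from urllib.parse import urlparse

def unique_domains(urls):
    """Return the set of distinct hostnames found in a list of URLs."""
    domains = set()
    for url in urls:
        # hostname is None for lines without a scheme; those are skipped.
        host = urlparse(url.strip()).hostname
        if host:
            # Treat www.example.com and example.com as the same site.
            domains.add(host[4:] if host.startswith("www.") else host)
    return domains

# Hypothetical harvested list, for illustration only.
harvested = [
    "http://www.example.com/contact",
    "http://example.com/about",
    "https://dealer.example.org/",
]
print(len(harvested), "URLs,", len(unique_domains(harvested)), "unique domains")
```

If adding city keywords grows the URL count but the unique domain count barely moves, the extra keywords are mostly returning the same sites.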
     
  4. TheSlug

    TheSlug Jr. VIP Jr. VIP

    Joined:
    Oct 1, 2014
    Messages:
    423
    Likes Received:
    187
    Location:
    South Florida
    Home Page:
    I'm not 100% sure what you're asking about the AT toyota emails.

    I'm guessing you mean that if you have, let's say:
    admin AT toyota
    support AT toyota
    help AT toyota

    You only want to keep one of them? To do this you could put your whole list into a spreadsheet and clear duplicate domains. If you don't know how to do this, try to google an answer or pay someone on a freelancer site like $2-3 to do it.

    For negative keywords, give it a try and then just manually look through the collected leads and get a feel for them. You should be able to draw some conclusions just by looking at what they look like.
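
If a spreadsheet is not handy, the same "clear duplicate domains" step can be sketched in a few lines of Python. This is an illustration under assumptions, not a Scrapebox feature: the function name `one_email_per_domain` and the sample addresses are hypothetical, and the rule used here is simply "keep the first address seen for each domain".

```python
def one_email_per_domain(emails):
    """Keep the first email seen for each domain, preserving input order."""
    seen = set()
    kept = []
    for email in emails:
        email = email.strip()
        if "@" not in email:
            continue  # skip lines that are not email addresses
        domain = email.rsplit("@", 1)[1].lower()
        if domain not in seen:
            seen.add(domain)
            kept.append(email)
    return kept

# Hypothetical sample list, for illustration only.
sample = [
    "admin@example.com",
    "support@example.com",
    "help@example.com",
    "info@example.org",
]
print(one_email_per_domain(sample))
```

Which address survives depends on the order of the list, so sort it first if you prefer a particular mailbox.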
     
  5. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Thanks for all the replies, and sorry for getting back so late to this thread; I entirely missed the replies.

    So I emailed Scrapebox support, which helped a lot with this matter. Toyota was just an example. The problem is that I'm selling products to the construction business, and when I scrape their websites I get a lot of links that have nothing to do with the target base I'm looking for. This is unavoidable due to bad syntax used by the webmasters of these sites. I have managed to clear out duplicate emails, but cleaning out the bad emails completely is truly impossible, because Scrapebox gives me a lot of bad links that I can never get down to 0. So I have just learned to accept that out of all results around, let's say, 10-15% are junk.

    There is no way around this, because if you look at the URLs, many look like they have nothing to do with my target audience, so I can't remove them using the filter. I only filter on general words such as facebook, news, etc., which helps me get rid of a lot of junk but not all. The only way is to check them manually, which I am doing, but I'm only able to spot the obvious ones, since the non-obvious emails are many times real potential customers.
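
One way to avoid a keyword list wiping out good results is to do a dry run first: split the list into "kept" and "removed" and look at what each keyword would remove before discarding anything. A minimal sketch, assuming a plain substring match on the URL; the function name `split_by_negative_keywords` and the sample data are hypothetical, and Scrapebox's own filter may match differently.

```python
def split_by_negative_keywords(urls, keywords):
    """Split URLs into (kept, removed); removed entries record which keywords hit."""
    lowered = [k.lower() for k in keywords]
    kept, removed = [], []
    for url in urls:
        hits = [k for k in lowered if k in url.lower()]
        if hits:
            removed.append((url, hits))
        else:
            kept.append(url)
    return kept, removed

# Hypothetical sample data, for illustration only.
urls = [
    "http://example.com/contact",
    "http://facebook.com/some-builder",
    "http://example.org/news/2015",
    "http://newsome-construction.example.net/",
]
kept, removed = split_by_negative_keywords(urls, ["facebook", "news"])
for url, hits in removed:
    print("would remove:", url, "because of", hits)
```

In this sample the keyword "news" also catches the hypothetical newsome-construction site, which is exactly the kind of false positive worth spotting before the filter is applied for real.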

    However, I have a question regarding the internal proxies: would it be better to invest in squidproxy to get better results and fewer failed scrape attempts? How many proxies would I need if I'm scraping around 1-5 million links?
     
  6. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Also, I did notice that in my target business most emails are on the landing page, so I think I would be better off skipping the Link Extractor step (sure, I'll lose some emails, but I will avoid getting so much junk).
     
  7. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,875
    Likes Received:
    2,058
    Gender:
    Male
    Home Page:
    Well, Scrapebox is limited by what the search engines give back, so yes, there is going to be stuff that's not on topic because of what some webmasters use. People are random, plain and simple.

    As for proxies, yes, you would probably get better results, but it's about how fast you go. You could pick up 10 proxies, set a delay in the detailed harvester of like 3 or maybe 5, and probably scrape endlessly, millions and millions of links; it's just going to be slower than if you use 50 proxies, for example, with no delay. So if you wanted to spend $1000 and buy hundreds and hundreds of proxies, you could go really fast, but you could spend $10, buy 10 shared proxies, use a delay, and get the same results, just a little slower.
     
  8. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Thanks for the quick reply Loopline.

    Yeah, I'm starting to realise this now; it has nothing to do with Scrapebox itself. As a matter of fact I did purchase proxies from one of the sites you recommended (myprivateproxies) and it went fine until SB stopped scraping from Google and continued on to Bing :(.. I guess they got banned (temporarily, I hope?). I started another scrape after that and could only reach 5% of the links I got on my first scrape. I did not change any settings at all, btw.

    I've spent many hours watching your videos on YouTube, loopline. Regardless of topic, those are some of the most informative videos I've ever come across, so keep up the good work!

    p.s
    I did not delay as much this time around :p
     
  9. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,875
    Likes Received:
    2,058
    Gender:
    Male
    Home Page:
    Yes, they got banned. You just need to go slower; you can use the detailed harvester and set a delay. It might seem slow at first, but when you leave it running you will be surprised how fast it adds up. But yes, it's a temp ban, 12-48 hours usually, depending on various factors. It's about testing: if you delay less and they get banned, next time delay a bit more, until you find the sweet spot of just slow enough to not get banned but still going as fast as you can.

    I'm glad the videos are helpful. :) Subscribe for more or check the link in my signature for tips in your mailbox.
     
  10. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Yes, it seems so, but I still want volume. I found this formula working out pretty well on a small scale, so I am now trying to get 10 times the emails or even 100 times (long shot), and therefore my focus is mainly on the custom harvester. If I choose shared proxies instead, is there any downside to it?

    Thanks for the tip btw, I will continue to read and study the material you have put out like I'm obsessed ;)
     
  11. Jonaz86

    Jonaz86 Junior Member

    Joined:
    Sep 16, 2015
    Messages:
    140
    Likes Received:
    5
    Loopline, a good feature for Scrapebox would be the ability to remove emails with duplicate domains while excluding the mail providers from that check. It would drop the extra emails that share the same domain, so you don't end up sending emails to every frigging person at that company lol.

    Just a thought; right now I'm using a third-party app to fix this.
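
For reference, the behaviour described here can be sketched roughly as follows. This is not a Scrapebox feature and not the third-party app mentioned above; the function name `dedupe_company_domains` and the short `FREE_PROVIDERS` list are assumptions for illustration, and a real list of mail providers would need to be much longer.

```python
# Assumed, deliberately short list of public mail providers; extend as needed.
FREE_PROVIDERS = {"gmail.com", "hotmail.com", "outlook.com", "yahoo.com"}

def dedupe_company_domains(emails, free_providers=FREE_PROVIDERS):
    """Keep one email per company domain, but every distinct free-provider address."""
    seen_domains = set()
    seen_addresses = set()
    kept = []
    for email in emails:
        email = email.strip()
        if "@" not in email or email.lower() in seen_addresses:
            continue  # skip non-emails and exact duplicates
        seen_addresses.add(email.lower())
        domain = email.rsplit("@", 1)[1].lower()
        if domain in free_providers:
            kept.append(email)  # different people, so keep them all
        elif domain not in seen_domains:
            seen_domains.add(domain)
            kept.append(email)
    return kept

# Hypothetical sample list, for illustration only.
sample_list = [
    "sales@example.com",
    "ceo@example.com",
    "builder.one@gmail.com",
    "builder.two@gmail.com",
    "builder.one@gmail.com",
]
print(dedupe_company_domains(sample_list))
```

Addresses at the listed mail providers are all kept because they belong to unrelated people, while a company domain is reduced to its first address.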

    P.S
    Is there a way to find expired domains with Scrapebox for the Scandinavian market? It only works with GoDaddy, it seems, which is a shame. I wish you had some complete tutorials for that. I know Google can be used to track down expired domains etc., but a complete guide would be frigging awesome.
     
    Last edited: Nov 26, 2015