[METHOD] How I scraped 300 000 emails within 3-4 hours.

Discussion in 'Black Hat SEO' started by todordonev, Dec 7, 2014.

  1. todordonev

    todordonev Regular Member

    Joined:
    Nov 23, 2012
    Messages:
    382
    Likes Received:
    229
    Gender:
    Male
    Location:
    Bulgaria
    Home Page:
    Hey guys I have this strange feeling that I want to share something today. Here we go.

    We are going to scrape emails from the xls files that many businesses and organisations use to organize their data.
    What I used:
    1. Licorne AIO (works faster than scrapebox on my end)
    2. Wget.
    3. Maxprog eMail extractor.

    Basically, if you don't have funds for the software above, you can download the files and extract the data manually. It also doesn't have to be xls files; alternatives I used and got results with: rar, zip, xlsx, csv and others.
    So first we will be using a google dork/footprint to scrape google for xls files.

    filetype:xls intext:mad:gmail.com <----- mark this, right click, search google for it and see that you will gain tons of results. You can add a keyword on the end.

    This is the very first dork/footprint that I used. You can play with it and see what the outcome is. It doesn't have to be "@gmail.com"; you can use @aol.com, @yahoo.com, etc. Paste this dork/footprint into Google or your favorite Google scraper and scrape the results.

    Okay, so now that we have the URLs for the xls files (they download automatically when opened), we are going to use wget. Assuming your scraped results are already saved in a txt file, one URL per line, we are going to create a .bat file for wget to work.

    Create a txt file and paste the line below:

    wget -t 1 -i file.txt

    where file.txt is the file with the scraped URLs. Save it as file.bat and close it. wget.exe, file.bat and file.txt have to be in the same folder for this to work. Now double-click the bat file and you will see a black window. That's wget; it will download the xls files into the same folder. Note: if you have a slow internet connection, change the bat file to:

    wget -t 3 -i file.txt

    -t refers to "tries" (retries): the number of times wget will try to download each file.

    Okay, so now that we have the xls files, we need to extract the emails from them. If you open the xls files you will see that they can contain far more info than just the emails. You can take advantage of this and laser-target your campaigns.
    Fire up Maxprog eMail extractor. This tool is going to get all the emails from txt/csv/xls/xlsx files.

    Edit->Preferences..-> General tab -> enable to get the bad emails too.

    do the above to squeeze maximum juice from the files.

    Drag the xls files into the email extractor and leave it to do its job. When it has finished, put the "valid" file somewhere safe: those are the email addresses. Keep the "bad" file somewhere too. Note that quite often the name of the xls file is strongly relevant to its niche, for example "somesite.com members.xls".
    When you have finished extracting, put all the "bad" files back into the software; this way it will gather some more emails. Note that this software displays the number of LINES it extracted, not the number of EMAILS. Open the "bad" file, cut everything and go to http://www.textmechanic.com, a very handy site that I use often. We are going to weed out all the bad data. Click on "remove lines that contain" and start filtering:
    remove lines that don't contain "@"
    remove lines that don't contain "."
    remove lines that contain "insert bad data here"
    Sometimes there will be lines like [email protected] or [email protected]; we are going to clean those up too. Go to "make/remove line breaks":
    make new line after ".com"
    make new line after ".net"
    make new line after ".org"
    and so on. Now copy that text, paste it over the contents of the "bad" file and save. Put that file into the email extractor to gain even more results.


    There you have it. With this method I gained more than 1 billion lines of data. You can twist this method a lot to get different results. Mods, I am aware that email threads are moderated very strictly; if this thread falls on your blacklist, feel free to delete it.
     
    • Thanks x 13
    Last edited: Dec 7, 2014
  2. socialsmartm

    socialsmartm BANNED

    Joined:
    Nov 6, 2014
    Messages:
    94
    Likes Received:
    7
    Gender:
    Male
    nice share bro, i will give a try :)
     
  3. todordonev

    todordonev Regular Member

    Don't forget to use proxies when scraping :)
     
  4. thisismymp3

    thisismymp3 Power Member

    Joined:
    Jan 6, 2010
    Messages:
    772
    Likes Received:
    291
    You can also skip all that and just extract all the emails with ScrapeBox. Not nearly as many, but I pulled about 20k emails in 5 minutes testing this method.
     
  5. Eternal1912

    Eternal1912 Power Member

    Joined:
    Dec 6, 2014
    Messages:
    616
    Likes Received:
    242
    Gender:
    Male
    Occupation:
    Freelance Writer
    Location:
    Bulgaria
    This is very useful, thanks for sharing. I'll try it when I fully develop my website!

    P.S.: Todor, it looks like we are compatriots. If you can, message me privately or by e-mail, since I can't send messages yet, so I can ask you a few things about this, if that's convenient. Thank you in advance!
     
  6. todordonev

    todordonev Regular Member

    Hey man, feel free to use it. Sending you a PM :)

    Well, I never knew you could use ScrapeBox to do this kind of stuff. Thanks for your input :)
     
  7. emailextractions

    emailextractions Jr. VIP Premium Member

    Joined:
    Jul 23, 2014
    Messages:
    119
    Likes Received:
    19
    Home Page:
    nice one, could definitely try with my tool
     
  8. Jeremy96x

    Jeremy96x Junior Member

    Joined:
    Aug 9, 2014
    Messages:
    149
    Likes Received:
    24
    Thanks for the share.. I'll have to get my hands on those tools soon
     
  9. abhi007

    abhi007 Jr. VIP

    Joined:
    Aug 31, 2010
    Messages:
    5,475
    Likes Received:
    3,784
    Location:
    snip.li/TubH
    Might give bulk emailing a go in the near future, so bookmarking this thread :)
     
  10. FastService247

    FastService247 Power Member

    Joined:
    Oct 17, 2014
    Messages:
    682
    Likes Received:
    87
    Occupation:
    Internet Marketer and Doing Affiliate
    Location:
    HOPE NEVER DIES
    yeaaaa...looking smart method.
     
  11. thisismymp3

    thisismymp3 Power Member

    My vote for one of those underrated methods that get posted without a lot of buzz around them. Good for us.
     
  12. MrBlue

    MrBlue Senior Member

    Joined:
    Dec 18, 2009
    Messages:
    954
    Likes Received:
    664
    Occupation:
    Web/Bot Developer
    Thanks for the share. I use another footprint to grab emails from Google, but this one is pretty clever. The way you extract the emails from the .xls files, however, is a little labour-intensive and could be done programmatically.

    Will automate this method from end to end using Python.
     
  13. linkking

    linkking Power Member

    Joined:
    Dec 23, 2013
    Messages:
    585
    Likes Received:
    112
    Location:
    NewYork
    What is the best way to use this number of email addresses for marketing?
     
  14. todordonev

    todordonev Regular Member

    Yes, I already have some ideas for full automation, but email marketing is not my favourite type of marketing. I got this whole idea in the last few days; it just popped into my head and I executed it to see if it works. There are tons of improvements that can be made.
    For example, I searched for how to make a Google query that selects results from the last week/month/year, or that searches in a specific place, but couldn't find any info about that.

    You can also search Google for dorks/footprints for other engines, i.e. Bing, Yahoo, Yandex, etc. Those will give different results, a.k.a. more results.

    This method can be tweaked a lot.

    Probably rent a dedi/VPS that has a high daily email sending limit, install PowerMTA and Interspire, and blast the list with offers. Or take advantage of all the information that could be inside an xls file and laser-target your campaigns to receive higher output.

    Example (im not good at this): I got my hands on list of accepted indian students (much infos about those guys). I got a high paying bisop CPA offer and made the emails look like
    subject: "hello %%fullname%% please open up. This is your %%university name%% secretary"
    body: "we give you this opportunity to pay your student taxes home debt bla bla bla bla bla"
    Here I expected a big open rate because it doesn't look like spam; it looks legit because I have so much info about the people I am sending the campaign to. Well, out of 900 sent, only one email was opened :D Turns out Indian students don't use their email addresses very often. 2/4 of the addresses didn't exist, 1/4 bounced (bullshit data) and 1/4 were sent successfully.
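    To put those numbers in perspective, here is a rough back-of-the-envelope check. It assumes the "2/4, 1/4, 1/4" split is exact, which the post only states approximately; the variable names are just for illustration.

```python
from fractions import Fraction

sent = 900    # emails sent, as stated in the post
opened = 1    # emails opened, as stated in the post

# Assumption: the rough "2/4, 1/4, 1/4" split is treated as exact.
nonexistent = sent * Fraction(2, 4)   # addresses that did not exist
bounced = sent * Fraction(1, 4)       # addresses that bounced
delivered = sent * Fraction(1, 4)     # addresses that were sent successfully

open_rate_sent = Fraction(opened, sent)   # opens relative to everything sent
open_rate_delivered = opened / delivered  # opens relative to delivered mail only

print(f"nonexistent: {int(nonexistent)}, bounced: {int(bounced)}, delivered: {int(delivered)}")
print(f"open rate vs sent: {float(open_rate_sent):.2%}")
print(f"open rate vs delivered: {float(open_rate_delivered):.2%}")
```

    Under that assumption only about 225 of the 900 addresses were reachable at all, and the single open is well under 1% even when measured against delivered mail.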

    If you have questions or suggestions, feel free to post them here.

    BTW, I don't currently use this. I got the idea, made it work and shared it here so it doesn't collect dust on my desk.
     
    • Thanks x 1
  15. kelso

    kelso Regular Member

    Joined:
    Nov 30, 2009
    Messages:
    477
    Likes Received:
    276
    Very interesting. I am curious as to what do you intend to do with them and aren't they a little niche-random?
     
  16. todordonev

    todordonev Regular Member

    I paid for a service to send 10k mails per day. They told me that I could use all my credit at once, so I imported 240 000 emails into Interspire and blew my credits. I now have only 30 000 credits left :D Didn't get much in the way of results, as I didn't research my list and just blasted a 3-liner with an adult offer URL. Had 50-100 clicks and no money from that.

    If you do research those xls files, however, you can skyrocket your conversion rate. Do a Google search without any tools and download 1-2 xls files. You'll see a lot of info there.

    So yes, they are niche-random files, but each has niche info in it. You can split them into different lists and send them different campaigns. :)
     
    • Thanks x 1
  17. DX-GENERATION

    DX-GENERATION Jr. VIP Premium Member

    Joined:
    Apr 14, 2010
    Messages:
    1,192
    Likes Received:
    263
    This is great. Scraping 300k emails is not that easy, mate!

    Posted via Topify on Android
     
  18. netcelal

    netcelal Senior Member

    Joined:
    Jul 12, 2009
    Messages:
    949
    Likes Received:
    374
    Location:
    7/24 Internet
    Great method... can you scrape emails for a specific country and keywords?
     
  19. trance92071

    trance92071 Senior Member

    Joined:
    Nov 1, 2009
    Messages:
    950
    Likes Received:
    848
    Occupation:
    Internet Marketing
    Location:
    BoosterBots.com
    Home Page:
    Pretty old method here, nothing really new, though I am sure it can help some people. However, a lot of these emails have already been blasted or are no longer live. Save your bandwidth and learn how to grab opt-ins; you won't be disappointed.
     
  20. todordonev

    todordonev Regular Member

    Old is gold, my friend.
    If you are doing a manual Google search, you can specify a date range and location in your search.
    Doing this, you can pick only the new ones and not the spammed-to-death ones. This way you also save your IPs and domains from blacklisting when you mass mail. I do have to agree with you though: opt-ins are way better, but also not free ;)