Is it possible to make this POWERFUL bot? What do you think?

Discussion in 'Black Hat SEO' started by CoyoteAssassin, Jun 8, 2012.

  1. CoyoteAssassin

    CoyoteAssassin Elite Member

    Joined:
    Jan 3, 2010
    Messages:
    1,862
    Likes Received:
    3,906
    Occupation:
    Full Time IMer
    Location:
    USA
    Fellow members.

    If what I am looking for already exists or if you know someone that can make it, PLEASE reply or PM me.


    Before I start looking for a developer, I need to know if you think it is possible to make this bot.

    Here we go..

    Let's say I have an Excel sheet with the following fields:
    Company
    Contact
    Address
    Phone
    URL
    Email

    Is there a way to have the bot look at the URL, scrape the website for an email address and post the results into the Email column (which will be blank originally)?

    I know that I can copy all the URLs and dump them into an email extractor or do a WHOIS lookup, but I need to be able to tie each email found back to its listing.

    If the only option I have is to match domain.com with email@domain.com, I'll do that, but I really do not want to, since many may have free email accounts listed on their website.
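The row-matching requirement described above can be sketched in a few lines of Python. This is only a minimal illustration, not a finished bot: `fetch_page` is a hypothetical stand-in for whatever downloads a page (here it just reads canned HTML so the example runs offline), the example.com-style URLs and sample rows are made up, the `|` separator is an arbitrary choice, and the regex is deliberately simple and will miss obfuscated addresses.

```python
import csv
import io
import re

# Deliberately simple pattern; it will miss obfuscated addresses
# such as "name at domain dot com".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return unique, lowercased email addresses in order of appearance."""
    seen = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen


def fill_email_column(rows, fetch_page):
    """Fill each row's blank Email field from the page at that row's URL.

    fetch_page is a hypothetical callable: URL in, HTML text out.
    Because the row itself is updated, the email stays tied to its listing.
    """
    for row in rows:
        if not row.get("Email"):
            row["Email"] = "|".join(extract_emails(fetch_page(row["URL"])))
    return rows


# Demo with canned pages instead of live requests (all sample data is made up).
PAGES = {
    "http://acme.example": '<a href="mailto:Owner@acme.example">Email us</a>',
    "http://bobs.example": "<p>No contact details here.</p>",
}
SHEET = (
    "Company,Contact,Address,Phone,URL,Email\n"
    "Acme,Al,1 Main St,555-0100,http://acme.example,\n"
    "Bob's,Bob,2 Oak Ave,555-0101,http://bobs.example,\n"
)
rows = fill_email_column(
    list(csv.DictReader(io.StringIO(SHEET))),
    lambda url: PAGES.get(url, ""),
)
for row in rows:
    print(row["Company"], "->", row["Email"] or "(none found)")
```

The point of the design is that nothing is ever written to a separate email-only list: the lookup result goes straight back into the row it came from.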

    So, can this be done? Any idea how it would work, or what problems I may face?

    Any comments or thoughts for improvement will be appreciated.

    -CA
     
  2. forwardedlandlines

    forwardedlandlines Jr. VIP Jr. VIP

    Joined:
    Feb 10, 2012
    Messages:
    540
    Likes Received:
    372
    Heya, yes this can be done. However, the problem you will run into is that a website may list multiple email addresses for multiple contacts. Since every website is formatted differently, you cannot know which email address belongs to which contact, unless you build something complex enough to work out visually what belongs to what.
     
    • Thanks Thanks x 1
  3. CoyoteAssassin

    CoyoteAssassin Elite Member

    Joined:
    Jan 3, 2010
    Messages:
    1,862
    Likes Received:
    3,906
    Occupation:
    Full Time IMer
    Location:
    USA
    For the type of companies I am crawling, most will only have one email as they are owner operated.

    However, I do want it to save all emails into one cell. Once finished, I'll sort and have Excel separate multiple emails into their own column. I'll then manually have someone go back to the page and get the contact information to go along with the additional email addresses and to make sure it matches.

    At best, I think only 5% of my domains would have more than one email on their website.

    So, since it is possible, are you able to build such a script or do you know someone that can? I want it to be desktop based. It will most likely be in Java, but I'm not sure.
     
  4. forwardedlandlines

    forwardedlandlines Jr. VIP Jr. VIP

    Joined:
    Feb 10, 2012
    Messages:
    540
    Likes Received:
    372
    Is the Excel sheet easily editable? Is it CSV? Or is it in some encoded format?
     
  5. CoyoteAssassin

    CoyoteAssassin Elite Member

    Joined:
    Jan 3, 2010
    Messages:
    1,862
    Likes Received:
    3,906
    Occupation:
    Full Time IMer
    Location:
    USA
    At the moment, it is .xls but I can easily save it to CSV and strip out any formatting. I would imagine that the results would save as CSV.
     
  6. HelloInsomnia

    HelloInsomnia Jr. Executive VIP Jr. VIP Premium Member

    Joined:
    Mar 1, 2009
    Messages:
    1,817
    Likes Received:
    2,913
    This is possible. Even with Ubot, you can have it save the details only if it finds an email, then save the URL in one column and the email in another. However, I don't know how to make it scrape the whole site; I'm sure somebody more advanced could whip something up for it pretty quickly.
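On the "scrape the whole site" part: one common approach is to collect the links on the home page that stay on the same host and then check those pages (contact, about, and so on) as well. The sketch below only does the link-collection step, using Python's standard `html.parser`; the `LinkCollector` class and `internal_links` function are names made up for this illustration, and the sample HTML and example.com-style URLs are invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links that stay on the same host as base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop any #fragment.
        absolute = urljoin(self.base_url, href).split("#")[0]
        same_host = urlparse(absolute).hostname == urlparse(self.base_url).hostname
        is_web = absolute.startswith(("http://", "https://"))
        if same_host and is_web and absolute not in self.links:
            self.links.append(absolute)


def internal_links(base_url, html):
    """Return same-host links found in html, in page order, without duplicates."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


SAMPLE = (
    '<a href="/contact">Contact</a>'
    '<a href="about.html#team">About</a>'
    '<a href="http://other.example/">Partner</a>'
    '<a href="mailto:x@acme.example">Mail</a>'
)
print(internal_links("http://acme.example/", SAMPLE))
```

A crawler built on this would still need a page limit and a visited set so it does not wander through a large site forever.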
     
    • Thanks Thanks x 1
  7. jairathnem

    jairathnem Power Member

    Joined:
    Oct 27, 2010
    Messages:
    550
    Likes Received:
    316
    Occupation:
    Student
    Location:
    Incredible India!
    I think this can be done using ZennoPoster.
    If you have ZennoPoster, PM me and I will get it done, for free!
     
    • Thanks Thanks x 1
  8. meatro

    meatro BANNED BANNED

    Joined:
    Nov 21, 2009
    Messages:
    568
    Likes Received:
    997
    This is a pretty easy and straightforward bot. If you have Ubot, it's even easier (and a desktop-based EXE).

    One issue that I would see is like the one above. If there are multiple emails on the page, personally I would solve that by prioritizing them. For example, if the email contains the URL, then that's probably the one you want (owner@whatever.com). Otherwise, take the one that contains the company name (company@yahoo.com), etc.
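The prioritizing idea above could look something like this in Python. Treat it as a rough sketch under assumptions: the three-tier ranking (same domain, then company name in the address, then everything else) is one reading of the suggestion, and `site_domain`, `email_rank`, and `prioritize` are hypothetical helper names.

```python
from urllib.parse import urlparse


def site_domain(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url if "//" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def email_rank(email, url, company):
    """Lower is better: 0 = same domain as the site,
    1 = company name appears in the address, 2 = anything else."""
    local, _, domain = email.lower().partition("@")
    if domain == site_domain(url):
        return 0
    slug = "".join(ch for ch in company.lower() if ch.isalnum())
    squashed = local.replace(".", "").replace("_", "").replace("-", "")
    if slug and slug in squashed:
        return 1
    return 2


def prioritize(emails, url, company):
    """Sort emails best guess first; sorted() is stable, so ties keep page order."""
    return sorted(emails, key=lambda e: email_rank(e, url, company))


found = ["bob@gmail.com", "whatever@yahoo.com", "owner@whatever.com"]
print(prioritize(found, "http://www.whatever.com", "Whatever"))
```

Taking only the first item of the sorted list gives a single "best" email per row, while keeping the whole list lets someone review the rest by hand later.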

    And an issue specific to Ubot is that I don't believe it reads/writes Excel files, but that's easy to solve. Save the XLS data to CSV format and Ubot can handle that with no problem, and so can Excel.

    If you go to the Ubot forums, somebody can probably get this done for you for a good price.
     
    • Thanks Thanks x 1
  9. CoyoteAssassin

    CoyoteAssassin Elite Member

    Joined:
    Jan 3, 2010
    Messages:
    1,862
    Likes Received:
    3,906
    Occupation:
    Full Time IMer
    Location:
    USA
    I do not have zennoposter. I looked it up - does it automate tasks like uBot?

    I need something created for me that I will run as I'll need to do it many times and in the future. It sounds like Ubot is part of the solution but I have never used it and wouldn't know where to start.

    But I guess a good programmer would.

    Prioritizing the multiple emails makes sense, or I can just have each email save to a new column. Then I can go in, copy Company Name - Web, paste it onto a new line, and bring the email address down with it. From there I can manually take John out of John@domain.com and use that as the first name, or get a VA or someone to revisit those sites to get the full name.

    Thanks again guys.

    Out of curiosity... is this type of bot of interest to anyone? I would think that it exists already. I can't be the only one who sees the potential.
     
  10. meatro

    meatro BANNED BANNED

    Joined:
    Nov 21, 2009
    Messages:
    568
    Likes Received:
    997
    Definitely not the first one to see its potential as email extractors are fairly common, I think it's mainly the format of the XLS that sets it apart. Most email extractors can import XLS, but they import only a URL list; although I'm sure with some searching you could find one that either can accept your existing format or a similar format that your XLS can be easily changed to.
     
  11. CoyoteAssassin

    CoyoteAssassin Elite Member

    Joined:
    Jan 3, 2010
    Messages:
    1,862
    Likes Received:
    3,906
    Occupation:
    Full Time IMer
    Location:
    USA
    Again, I don't care about format. I just have it in Excel because it doesn't ask me five times if I want to save it like working with CSV does.

    I can convert it to CSV. I just want it to read the domain from the CSV, find an email, and put it back in the same row that it got the domain from so that it matches up.

    I use about 12 other email scrapers but none that do this.

    At least we know (or think) that it is possible. No idea how much it will cost, but if it is too much, I'll resell and hope to recoup some costs.
     
  12. meatro

    meatro BANNED BANNED

    Joined:
    Nov 21, 2009
    Messages:
    568
    Likes Received:
    997
    That's actually what I meant. :)

    Not necessarily XLS vs. CSV (like you said, they can easily be switched), but most email scrapers are not going to use all of your information (name, company, URL, address, phone, etc.). Most only take the URL, so your results most likely wouldn't end up in the correct row.

    Sorry for the confusion, that's what I meant by them accepting your existing format.
     
    • Thanks Thanks x 1
  13. theMagicNumber

    theMagicNumber Regular Member

    Joined:
    May 13, 2010
    Messages:
    345
    Likes Received:
    195
    Not very hard to do.
    Regarding multiple emails, just add additional columns: Email1, Email2, Email3
    or separate the emails with a pipe and use one column: email@email.com|email1@email.com|email2@email.com
    You can get this done for $30-$50, maybe up to $100, at freelancer.com for example.
    I suggest making it more generic, so the output depends on the input and not on fixed columns.
    For example, I want to upload a CSV file with 3 columns (Company Name, Phone, URL) and the output will be Company Name, Phone, URL, Email
    or I want to upload a CSV file with 4 columns (Company Name, Phone, Fax, URL) and the output will be Company Name, Phone, Fax, URL, Email
    I hope that makes sense. It will be more valuable that way.
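A minimal Python sketch of that generic layout, assuming the only fixed requirement is a column named `URL` (the `url_column` parameter), with the emails pipe-separated in one appended `Email` column. `add_email_column` and `lookup_emails` are hypothetical names; `lookup_emails` stands in for whatever actually scrapes a site, and the sample rows are made up.

```python
import csv
import io


def add_email_column(csv_text, lookup_emails, url_column="URL"):
    """Return CSV text with the input columns unchanged plus an Email column.

    lookup_emails is a hypothetical callable: URL in, list of emails out.
    The output header is whatever the input header was, with Email appended.
    """
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    fields = list(reader.fieldnames or [])
    if url_column not in fields:
        raise ValueError(f"no {url_column!r} column in input")
    if "Email" not in fields:
        fields.append("Email")

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # One cell, pipe-separated, so each row keeps exactly one output line.
        row["Email"] = "|".join(lookup_emails(row.get(url_column) or ""))
        writer.writerow(row)
    return out.getvalue()


def fake_lookup(url):
    """Stand-in scraper for the demo: always 'finds' the same two addresses."""
    return ["a@acme.example", "b@acme.example"]


three_cols = "Company Name,Phone,URL\nAcme,555-0100,http://acme.example\n"
four_cols = "Company Name,Phone,Fax,URL\nAcme,555-0100,555-0199,http://acme.example\n"
print(add_email_column(three_cols, fake_lookup))
print(add_email_column(four_cols, fake_lookup))
```

Switching to separate Email1, Email2, Email3 columns would only change how the list is written out, not how the rows are matched.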
     
    • Thanks Thanks x 1
  14. IMShane

    IMShane Junior Member

    Joined:
    Sep 20, 2011
    Messages:
    131
    Likes Received:
    23
    The job is not hard at all.
    You can simply use Python + mechanize to write a script to do it. I'm doing stuff like this all the time: scraping specific pages for certain content, submitting forms, etc.
    PM me if you still need help.
     
    • Thanks Thanks x 1
  15. rulez05

    rulez05 Power Member

    Joined:
    Feb 3, 2011
    Messages:
    745
    Likes Received:
    142
    I saw something like this on imacros but I guess ubot can do it all.
     
  16. CoyoteAssassin

    CoyoteAssassin Elite Member

    Joined:
    Jan 3, 2010
    Messages:
    1,862
    Likes Received:
    3,906
    Occupation:
    Full Time IMer
    Location:
    USA
    Thanks guys for the input. As some mentioned, it sounds like I need a Ubot programmer...

    If you can do this, send me a bid via PM.
     
  17. flyingbear

    flyingbear Junior Member

    Joined:
    Mar 7, 2011
    Messages:
    195
    Likes Received:
    19
    If you need to do this on a large scale, you should consider multiple threads. Since the requests go to different sites, you do not have to use a proxy. The limit is your connection's bandwidth.
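For what it's worth, here is a small Python sketch of the multi-threaded idea using the standard library's `ThreadPoolExecutor`. `scrape_all` and `fake_scrape` are made-up names, `fake_scrape` replaces a real download so the example runs offline, and the worker count of 8 is an arbitrary assumption to tune against your own bandwidth.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, scrape_one, max_workers=8):
    """Run scrape_one over urls in a thread pool.

    pool.map returns results in input order, so result i still belongs
    to row i even though the sites finish at different times.
    """
    def safe(url):
        try:
            return scrape_one(url)
        except Exception:
            # One dead site should not abort the whole run.
            return []

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe, urls))


def fake_scrape(url):
    """Stand-in for a real page download; one site is slow, one is down."""
    time.sleep(0.05 if "slow" in url else 0)
    if "broken" in url:
        raise RuntimeError("site down")
    return ["info@" + url.split("//")[1]]


urls = ["http://slow.example", "http://broken.example", "http://fast.example"]
results = scrape_all(urls, fake_scrape, max_workers=3)
for url, emails in zip(urls, results):
    print(url, "->", emails)
```

Because the results come back in the same order as the input, they can be written straight back into the matching rows.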

    Not sure Ubot can actually do it this way.
     
    • Thanks Thanks x 1
  18. kshatriya

    kshatriya Regular Member

    Joined:
    May 17, 2010
    Messages:
    341
    Likes Received:
    98
    Location:
    Sharjah, UAE
  19. CoyoteAssassin

    CoyoteAssassin Elite Member

    Joined:
    Jan 3, 2010
    Messages:
    1,862
    Likes Received:
    3,906
    Occupation:
    Full Time IMer
    Location:
    USA
    From what I can tell, that is just an email scraper. The trick is getting the scraper to read the URL and save the results in the same row and not in a new, email-only list.

    I've hired someone to do it for me. It will be $250 and be a file that I install, point to the CSV, watch the process, and then get the results.

    I'm looking forward to it as I have some great ideas for it. :)

    Thanks everyone for the input and suggestions.

     
  20. wowhaxor

    wowhaxor Executive VIP Premium Member

    Joined:
    Apr 28, 2007
    Messages:
    2,021
    Likes Received:
    3,353
    Location:
    ?¿?
    Have you tried just scraping the WHOIS? That way they are in a nice standard format (the e-mail, name of person or business, everything is in the same spot for each site, which would make automation a snap). I know a lot of people spam/mass mail webmasters by scraping the domain registrar's info like that.