Is it possible to make this POWERFUL bot? What do you think?

CoyoteAssassin

Elite Member
Joined
Jan 3, 2010
Messages
1,868
Reaction score
3,988
Fellow members.

If what I am looking for already exists or if you know someone that can make it, PLEASE reply or PM me.


Before I start looking for a developer, I need to know if you think it is possible to make this bot.

Here we go..

Let's say I have an Excel sheet with the following fields:
Company
Contact
Address
Phone
URL
Email

Is there a way to have the bot look at the URL, scrape the website for an email address and post the results into the Email column (which will be blank originally)?

I know that I can copy all the URLs and dump them into an email extractor or do WHOIS, but I need to be able to tie the found email back to the listing it came from.

IF the only option I have is to match domain.com with [email protected], I'll do that but I really do not want to since many may have free email accounts listed on their website.

So, can this be done? Any idea how it would work, or what problems I may face?

Any comments or thoughts for improvement will be appreciated.

-CA
 
Heya, yes this can be done. However, the problem you will come across is that a website may have multiple email addresses for multiple contacts. Since every website is formatted differently, you cannot know which email address belongs to which contact, unless you build something complex enough to visually recognize what belongs to what.
 
For the type of companies I am crawling, most will only have one email as they are owner operated.

However, I do want it to save all emails into one cell. Once finished, I'll sort and have Excel separate multiple emails into their own column. I'll then manually have someone go back to the page and get the contact information to go along with the additional email addresses and to make sure it matches.

At best, I think only 5% of my domains would have more than one email on their website.

So, since it is possible, are you able to build such a script, or do you know someone who can? I want it to be desktop based. It will most likely be Java, but I'm not sure.
 
At the moment, it is .xls but I can easily save it to CSV and strip out any formatting. I would imagine that the results would save as CSV.
 
This is possible, even with Ubot you can have it only save the details if it finds an email, then save the url in one column and the email in another. However, I don't know how to make it scrape the whole site - I'm sure somebody more advanced could whip up something for it pretty quickly.
 
This is a pretty easy and straightforward bot. If you have Ubot, it's even easier (and a desktop-based EXE).

One issue that I would see is like above: there may be multiple emails on the page. Personally, I would solve that by prioritizing them. For example, if the email contains the site's domain, then that's probably the one you want ([email protected]). Otherwise, if it contains the company name ([email protected]), etc.

An issue specific to Ubot is that I don't believe it reads or writes Excel files, but that's easy to solve. Save the XLS data to CSV format; Ubot can handle that with no problem, and so can Excel.

If you go to the Ubot forums, somebody can probably get this done for you for a good price.
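
For anyone who wants to see the prioritizing idea above in code, here is a rough Python sketch. The function names `site_domain` and `prioritize` are made up for this example, and so are the sample addresses in the usage note. It assumes "contains the URL" means the email's domain matches the site's domain, and "contains the company name" means the name with punctuation stripped shows up somewhere in the address.

```python
from urllib.parse import urlparse


def site_domain(url):
    """Return the host part of a URL, lowercased and without a leading 'www.'."""
    # urlparse only fills in netloc when it sees '//', so add it for bare domains.
    host = urlparse(url if "//" in url else "//" + url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def prioritize(emails, url, company):
    """Sort emails: same domain as the site first, company-name matches second."""
    domain = site_domain(url)
    # "Joe's Plumbing" -> "joesplumbing"
    company_key = "".join(ch for ch in company.lower() if ch.isalnum())

    def rank(email):
        local, _, email_domain = email.lower().partition("@")
        if email_domain == domain:
            return 0
        squashed = (local + email_domain).replace(".", "").replace("-", "")
        if company_key and company_key in squashed:
            return 1
        return 2

    # sorted() is stable, so emails with the same rank keep their page order.
    return sorted(emails, key=rank)
```

Called as `prioritize(["joesplumbing@gmail.com", "someone@yahoo.com", "info@joes.com"], "http://www.joes.com/contact", "Joe's Plumbing")`, this puts the on-domain address first and the free-mail address that contains the company name second. Real sites will need fuzzier matching than this.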
 
I do not have zennoposter. I looked it up - does it automate tasks like uBot?

I need something created for me that I will run as I'll need to do it many times and in the future. It sounds like Ubot is part of the solution but I have never used it and wouldn't know where to start.

But I guess a good programmer would.

Prioritizing the multiple emails makes sense, or I can just have each email save to a new column. Then I can go in, copy Company Name - Web, paste it to a new line, and bring the email address down with it. From there I can manually take John out of [email protected] and use that as the first name, or get a VA or someone to revisit those sites to get the full name.

Thanks again guys.

Out of curiosity... is this type of bot of interest to anyone? I would think that it exists already. I can't be the only one who sees the potential.
 
Definitely not the first one to see its potential as email extractors are fairly common, I think it's mainly the format of the XLS that sets it apart. Most email extractors can import XLS, but they import only a URL list; although I'm sure with some searching you could find one that either can accept your existing format or a similar format that your XLS can be easily changed to.
 
Again, I don't care about format. I just have it in Excel because it doesn't ask me five times if I want to save it like working with CSV does.

I can convert it to CSV. I just want it to read the domain from the CSV, find an email, and put it back in the same row that it got the domain from so that it matches up.

I use about 12 other email scrapers, but none that do this.

At least we know (or think) that it is possible. No idea how much it will cost, but if it is too much, I'll resell it and hope to recoup some of the costs.
 
That's actually what I meant. :)

Not necessarily XLS vs. CSV (like you said, they can easily be switched) but that most email scrapers are not going to use all of your information (name, company, url, address, phone, etc.) Most only take the URL, so your stuff wouldn't end up in the correct row most likely.

Sorry for the confusion, that's what I meant by them accepting your existing format.
 
Not very hard to do.
Regarding multiple emails, just add additional columns: Email1, Email2, Email3
or separate the emails with pipe and use one column : [email protected]|[email protected]|[email protected]
You can get this done for anywhere from $30 to $100 at freelancer.com, for example.
I suggest making it more generic, so the output depends on the input and not on fixed columns.
For example, if I upload a CSV file with 3 columns (Company Name, Phone, URL), the output will be Company Name, Phone, URL, Email.
Or if I upload a CSV file with 4 columns (Company Name, Phone, Fax, URL), the output will be Company Name, Phone, Fax, URL, Email.
I hope that makes sense. It will be more valuable that way.
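
To show what I mean by generic, here is a minimal Python sketch, not a finished tool. It keeps whatever columns the input has, appends an Email column, and writes each result back into the same row, with multiple emails joined by a pipe. The name `add_email_column` and the `find_emails` callback are placeholders I made up; `find_emails` stands in for whatever scraper you end up using, and the URL column name is assumed to be "URL".

```python
import csv
import io


def add_email_column(in_file, out_file, find_emails, url_column="URL"):
    """Copy a CSV row by row, adding an Email column filled from each row's URL.

    find_emails is any function that takes a URL and returns a list of emails.
    """
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames or [])
    if "Email" not in fieldnames:
        fieldnames.append("Email")

    writer = csv.DictWriter(
        out_file, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        url = (row.get(url_column) or "").strip()
        emails = find_emails(url) if url else []
        # All emails for a listing go in one cell, pipe-separated.
        row["Email"] = "|".join(emails)
        writer.writerow(row)


if __name__ == "__main__":
    # Tiny demo with a fake scraper so it runs without touching the network.
    source = io.StringIO("Company Name,Phone,URL\nAcme,555-0100,acme.example\n")
    result = io.StringIO()
    add_email_column(source, result, lambda url: ["info@" + url])
    print(result.getvalue())
```

Because the rows are never reordered or split into a separate list, the email always lands next to the company it was scraped for.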
 
The job is not hard at all.
You can simply use python + mechanize to write a script to do it. I'm doing stuff like this all the time, scraping specific pages for certain contents, form submitting, etc.
PM me if you still need help.
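
As a rough idea of the scraping part, here is a small sketch using only the Python standard library (not mechanize). It pulls anything that looks like an email address out of a page's HTML. The name `extract_emails` is made up for this example, and the regex is deliberately simple: it will miss obfuscated addresses and can pick up false positives such as image names like `logo@2x.png`.

```python
import re
from html import unescape

# A simple pattern for address-like strings; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return the unique email-like strings in html, in order of appearance."""
    found = []
    # unescape() turns entities such as &#64; back into '@' before matching.
    for match in EMAIL_RE.findall(unescape(html)):
        email = match.lower()
        if email not in found:
            found.append(email)
    return found
```

You would feed this the HTML of the home page and probably the contact page too, then write the list back to the row the URL came from.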
 
I saw something like this on imacros but I guess ubot can do it all.
 
Thanks guys for the input. As some mentioned, it sounds like I need a Ubot programmer...

If you can do this, send me a bid via PM.
 
If you need to do this on a large scale, you should consider multiple threads. Since the requests go to different sites, you do not have to use a proxy. The limit is your connection's bandwidth.

Not sure Ubot can actually do it this way.
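
In Python, the multi-thread part could look something like the sketch below, using the standard library's thread pool. The names `fetch_html` and `fetch_all` and the worker count of 20 are just assumptions for illustration; tune the count to your connection.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download one page; return an empty string on any failure."""
    try:
        request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except Exception:
        # Dead domains, timeouts and bad URLs should not stop the whole run.
        return ""


def fetch_all(urls, fetch=fetch_html, workers=20):
    """Fetch many URLs in parallel; results come back in the same order as urls."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Since `pool.map` keeps the input order, the Nth result still belongs to the Nth row of the CSV, which is the whole point here.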
 
From what I can tell, that is just an email scraper. The trick is getting the scraper to read the URL and save the results in the same row and not in a new, email-only list.

I've hired someone to do it for me. It will be $250 and be a file that I install, point to the CSV, watch the process, and then get the results.

I'm looking forward to it as I have some great ideas for it. :-)

Thanks everyone for the input and suggestions.

I'm not sure if you need a Ubot programmer for this.

Try this out. And see if it works.

http://scriptmafia.org/modules/90353-codecanyon-simple-email-extractor-v22.html

You put your list of URLs in there, and it extracts emails from there for you.
 
Have you not just tried scraping the WHOIS? That way they are in a nice standard format (the email, name of person or business, everything is in the same spot for each site, which would make automation a snap). I know a lot of people spam/mass mail webmasters by scraping the domain registrar's info like that.
 