Develop a Search Engine Like Google

agag2

Hello,

I have an idea of developing a search engine similar to Google. I'm looking for a programmer who will be able to develop this, but I have a few questions:

- What is the cost for such a program? Is a $5,000 - $10,000 budget OK? (I'm talking about the software itself, not money for servers.)
- How long should I expect to wait for such a program to be developed?
- How do I screen applicants? What questions should I ask them? (I'll be using Freelancer to hire, and it seems to be hard to find a good freelancer.)

Thanks
 
He didn't say he wants to compete with Google; he is asking how much a website with a search engine will cost. My friend bought a script for that for less than $100, but I think you need a unique site. Good luck.
 
Hmm... With a budget of around 5k to 10k, I think you are best off finding freelancers who are experts with Lucene:

http://lucene.apache.org/core/

Your budget is simply too low to develop your very own search library, but I think you can build a reasonably good search engine with Solr, the search server built on Lucene. Here's a sample architecture:

http://wiki.apache.org/solr/SolrCloud

Elasticsearch (also built on Lucene) might be an option as well... https://foursquare.com/ for example uses this search engine...
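To give a rough idea of what Lucene, Solr and Elasticsearch do at their core, here is a minimal sketch of an inverted index in Python. It is only an illustration: the `InvertedIndex` class and `tokenize` helper are made-up names, not Lucene's API, and a real library adds ranking, stemming, on-disk segments and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index: maps each term to the set of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the sorted ids of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # Use .get so that looking up an unknown term does not create an entry.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


index = InvertedIndex()
index.add(1, "Lucene is a search library written in Java")
index.add(2, "Solr is a search server built on Lucene")
index.add(3, "CouchDB is a NoSQL database")

print(index.search("search lucene"))  # [1, 2]
print(index.search("nosql"))          # [3]
```

The point of using an existing library is that you get this structure, plus everything around it, already tuned and tested.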

As for freelancers, you should look for people with solid (preferably certified) Java know-how, as Lucene is written in Java. There's also a Python implementation, but I would let that one slide, as it's more difficult to find a capable Python programmer than a seasoned Java programmer. Since it's possible to store Lucene's indexes in a NoSQL DB like Apache CouchDB, you might also want a well-chosen freelancer with sufficient know-how in that field, or a SQL specialist for the DB you intend to use. You'll also need a Linux/Unix guru familiar with clustering - certified LPIC-3, or at least LPIC-2.

Remember that project management is EVERYTHING if you want to pull off such a project!!!

cheers olystyle
 

Thanks for the reply.

I've heard of Lucene but don't know much about it. I don't mind using it as long as it won't limit me in functionality.
 
I can try (provided you don't do chargebacks :D).

Frankly speaking, the search you see is only the tip of the iceberg. If you just think about the data collection, staging and processing behind it, you will probably cancel this project.
 
You can't beat Google with money. If you have a better idea than Google, then maybe you'll be successful... Put simply, Google is an idea, and only another new idea beats Google.
 
The critical thing is how big it has to be in order for you to make money. Making a web crawler and search functionality isn't special; any half-decent programmer will be able to deliver (actually, I remember reading a language manual in 1997 that had a bare-bones web crawler as an example).
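To show how little code a bare-bones crawler takes, here is a rough Python sketch using only the standard library. It is not the example from that 1997 manual, just an illustration; `crawl`, `fetch_http`, `LinkParser` and the `example.test` URLs are all made-up names, and the demo runs against a fake in-memory site so it works offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_http(url):
    """Download a page over HTTP (not used in the offline demo below)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_http, max_pages=100):
    """Breadth-first crawl; returns {url: html} for every page fetched."""
    pages = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link, _ = urldefrag(urljoin(url, href))  # absolute URL, no #fragment
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Offline demo: a fake three-page site stored in a dict (hypothetical URLs).
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": "no links here",
}

pages = crawl("http://example.test/", fetch=FAKE_WEB.__getitem__)
print(sorted(pages))
```

Everything that makes a real crawler expensive is missing here: robots.txt handling, politeness delays, retries, deduplication of near-identical pages, and running on many machines at once.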

What is special is scalability. That's hard to do right, and that's where the real cost of development is. I'm not talking about the server cost: things like that don't scale simply by pouring more hardware on them if you don't design for it from the very beginning.

If you are going for a lot of data harvesting and processing, the software must be designed with scalability in mind from scratch. And details are critical at this scale. For example, olystyle mentioned Elasticsearch (which is super cool! I use it a lot). Elasticsearch scales simply by adding nodes (sweet) - but one little detail you might run into if you get involved with it is that you can't just throw your data at it and forget about it, because it's not as robust as a database. If data gets corrupted, the only way to recover is to feed it again from your own copy, which means that this design decision has just doubled the cost of storage in your system :)
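A rough sketch of that "keep your own copy and re-feed the index" design, in plain Python. Nothing here is the Elasticsearch client API: `PrimaryStore`, `SearchIndex` and `save` are made-up names, the "index" is just a dict standing in for the search cluster, and the durable copy is a JSON-lines file in a temp directory.

```python
import json
import os
import tempfile


class PrimaryStore:
    """Append-only JSON-lines file acting as the durable copy of the data."""

    def __init__(self, path):
        self.path = path

    def append(self, doc):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(doc) + "\n")

    def __iter__(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)


class SearchIndex:
    """In-memory stand-in for the search cluster (not a real Elasticsearch client)."""

    def __init__(self):
        self.docs = {}

    def index(self, doc):
        self.docs[doc["id"]] = doc

    def rebuild_from(self, store):
        """Throw away the current contents and re-feed everything from the store."""
        self.docs = {}
        for doc in store:
            self.index(doc)


def save(doc, store, index):
    store.append(doc)   # durable copy first
    index.index(doc)    # searchable copy second


path = os.path.join(tempfile.mkdtemp(), "docs.jsonl")
store, index = PrimaryStore(path), SearchIndex()
save({"id": 1, "title": "first page"}, store, index)
save({"id": 2, "title": "second page"}, store, index)

index.docs.clear()          # simulate the index getting corrupted or lost
index.rebuild_from(store)   # recover by feeding it again from the primary copy
print(len(index.docs))      # 2
```

Every document lives twice, once in the store and once in the index, which is exactly where the doubled storage cost comes from.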

Apart from the development, you should be aware that everything changes when scale goes up, even the simplest things. Take the filesystem, for example. Do you think you'll be storing 100 terabytes of data in an ext3 partition? Welcome to the world of distributed filesystems. Again, design choices to be made.
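Some back-of-the-envelope arithmetic makes the point. Every figure in this sketch is an assumption picked for illustration: 16 TiB is the commonly cited ext3 volume limit with 4 KiB blocks, and the replication factor and per-node disk size are arbitrary. `nodes_needed` is a made-up helper.

```python
import math

TIB = 2**40  # bytes in one tebibyte


def nodes_needed(data_bytes, replication, usable_bytes_per_node):
    """Number of storage units needed to hold every replica of the data."""
    total = data_bytes * replication
    return math.ceil(total / usable_bytes_per_node)


# All figures below are illustrative assumptions, not recommendations.
DATA = 100 * TIB              # the 100 terabytes from the post, treated as TiB
EXT3_MAX_VOLUME = 16 * TIB    # commonly cited ext3 limit with 4 KiB blocks
REPLICATION = 3               # copies kept of every block
USABLE_PER_NODE = 8 * TIB     # usable disk per node

# Even without replication, the data does not fit in one ext3 volume.
print(nodes_needed(DATA, 1, EXT3_MAX_VOLUME))            # 7

# With three copies of everything, the node count grows quickly.
print(nodes_needed(DATA, REPLICATION, USABLE_PER_NODE))  # 38
```

Once the data is spread over dozens of machines, something has to track which node holds what and re-replicate when a disk dies, and that is the job of a distributed filesystem.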

Will $10k be enough for a big data implementation? Not by a long shot, but if you have money to burn, why not try? You might get very lucky and find someone who has the skills and will take it on for that.

Will $10k be enough for a small crawler/search implementation? Yes.
 
Developing something similar to what Google has could take you years, and you would need a very good development team. Like Jazz pointed out, it all depends on how flexible and scalable your script needs to be. For a small, decent engine, $10k would be fine, but if you are planning really big stuff then I can't even estimate the cost :)
 
If I were you, I'd have a chat with the devs of http://www.inoutscripts.com
They have the most sophisticated search engine script commercially available that I've seen. They may be able to provide custom work to fit your needs.
 
You would need to hire a gang of master programmers, and before that you would need to raise a lot of money and be prepared to spend a long time on it.
 
I think you need to try something new with the small budget you have instead of starting the next big search engine, then scale if you succeed.
Here's an example, a failed search engine made in Italy: volunia(dot)com. It didn't simply scrape the web and rank by content, PR, ... like Google does; it tried to merge social signals with a social interface for its users and rank social trends higher.
It was a flop, a start-up failure, because they tried hard to beat Google when the original goal was something else: meta-tagging social preferences.
It was an epic fail not only because of the limited budget, but also because of bad programming skills, bad marketing, and bad investors who turned the original idea into shit.
You need to overcome all this, and Google will still beat you... :)
 
1. You want to make it similar to Google, so why bother? Google already exists, so you won't have any advantage over them.
2. No, $10k is not enough, because you can't develop the system in one go. Google did thousands of iterations; you need to do thousands too.
3. You need an entire infrastructure, and you need a big-ass team just to manage it and keep it running.
4. Google doesn't have just a crawler. That's the tip of the iceberg. 99% of what they have sits behind the crawler (e.g. anti-spam), and without it your engine will be full of crap/spam.
5. Frankly, with $50k total you wouldn't even be able to run a clone of MajesticSEO or Ahrefs, and that's just a fraction of the data.
6. You do realize you will need several datacenters' worth of hardware and teams to manage them, and that 70%+ of the cost will be power and HVAC, not the hardware itself. Google has a big-ass DC cooled with river water 363 days a year. That's almost zero HVAC cost for that DC.
7. If you want to profit from this, you won't. Even if you manage to build this and run it, Google will just sink you in other ways. Think behind-the-curtain games.
8. If you want to make big money, there are tens of thousands of ideas that are more realistic than this.
9. Look for some videos of Google tech people talking about architecture and scaling and you will understand how far from it you are. They innovated in almost everything. They invented a Linux distro, they invented filesystems and daemons and all sorts of software and algorithms.
10. The post you made tells me by itself that not only are you not close, you are 1000 miles further away than you think, and there are a billion things you never thought about that will appear as problems.
11. Google was built and started to grow when the web was very, very small. Tiny compared to today. Not only that, but they were the first, and they built something that was a solution to a problem. You don't provide a solution to a problem. Your case is more like building a nice machine gun designed to fight somebody who has 1000 nukes, 20 aircraft carriers, 1000 jet fighters and 1000 tanks. It just doesn't work.
12. Google is worth like 50 billion. They have 50 billion ways to bury you alive at the very first sign that you may be a problem for them.
13. Look at wolframalpha.com - awesome concept, not nearly as successful as Google.
14. Yahoo and Bing are in a coma, and they started at about the same time as Google and had more money than Google initially.
15. Part of the problem you have is the same problem Yahoo/Bing have: patents Google owns. You can't implement certain algorithms like PR (PageRank) in your engine because Google has a patent on it. That is one of the reasons Yahoo/Bing couldn't compete with Google.
16. You have a chance, though, if you somehow get Chuck Norris to help you... Chuck Norris doesn't have to crawl the web, he already has all the information in his memory - he was born with all the info that has been, is now and will ever be on the web.
