
Develop a Search Engine Like Google

Discussion in 'BlackHat Lounge' started by agag2, Jan 27, 2013.

  1. agag2

    agag2 Supreme Member

    Joined:
    Feb 17, 2009
    Messages:
    1,308
    Likes Received:
    254
    Hello,

    I have an idea for developing a search engine similar to Google. I'm looking for a programmer who can build it, but I have a few questions:

    What does such a program cost? Is a budget of $5,000 to $10,000 OK? (I'm talking about the software itself, not money for servers.)

    How long should I expect to wait for such a program to be developed?

    How do I screen applicants? What questions should I ask them? (I'll be hiring through Freelancer, and it seems hard to find a good freelancer there.)

    Thanks
     
  2. warp0011

    warp0011 Junior Member

    Joined:
    May 13, 2008
    Messages:
    137
    Likes Received:
    66
    Occupation:
    IT Executive
    Home Page:
    Maybe you can seek advice from Microsoft.
     
  3. MadMaddy

    MadMaddy Junior Member

    Joined:
    Dec 6, 2012
    Messages:
    165
    Likes Received:
    282
    Location:
    221B Baker Street
    So you are going to compete with Google with a budget of $10,000?

    Good Luck ! :p
     
    • Thanks x 2
  4. back2basics

    back2basics Jr. VIP Jr. VIP Premium Member

    Joined:
    Nov 11, 2012
    Messages:
    540
    Likes Received:
    360
    One does not simply clone Google...
     
    • Thanks x 6
  5. lietuvis002

    lietuvis002 BANNED BANNED

    Joined:
    Aug 3, 2012
    Messages:
    2,415
    Likes Received:
    2,576
    He didn't say he would compete with Google; he is asking how much a website with a search engine will cost. My friend bought a script for that for less than $100, but I think you need a unique site. Good luck.
     
  6. bizzlewebmaster

    bizzlewebmaster Registered Member

    Joined:
    Jan 26, 2013
    Messages:
    55
    Likes Received:
    9
    Marketing the site is what you need to worry about, bro.
     
  7. olystyle

    olystyle Regular Member

    Joined:
    Jan 6, 2012
    Messages:
    238
    Likes Received:
    103
    Hmm... With a budget of around 5k to 10k, I think you are best off finding freelancers who are experts with Lucene:

    http://lucene.apache.org/core/

    For developing your very own search library your budget is simply too low, but I think you can build a reasonably good search engine with Lucene's Solr. Here's a sample architecture:

    http://wiki.apache.org/solr/SolrCloud

    Elasticsearch (also built on Lucene) might be an option as well... https://foursquare.com/, for example, uses this search engine...
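For anyone wondering what a search library like Lucene does at its core, here is a toy sketch of an inverted index, the basic structure such libraries are built around. It is not Lucene's API or implementation; the function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for illustration, and there is no ranking, stemming, or persistence.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, only for the demo.
docs = {
    1: "Lucene is a search library written in Java",
    2: "Solr is a search server built on Lucene",
    3: "CouchDB is a document database",
}
index = build_index(docs)
print(sorted(search(index, "search lucene")))  # [1, 2]
```

The hard parts a real library adds on top of this are relevance scoring, compact on-disk index formats, and incremental updates, which is why building your own from scratch is costly.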

    As for freelancers, you should look for people with solid (preferably certified) know-how in Java, as Lucene is written in Java. (There's also a Python implementation, but I would let that one slide, as it's more difficult to find a capable Python programmer than a seasoned Java programmer.) As it's possible to save Lucene's indexes in a NoSQL DB like Apache CouchDB, you might want to look for a freelancer with sufficient know-how in that field, or a SQL specialist for the DB you intend to use. You will also want a Linux/Unix guru familiar with clustering, certified LPIC-3 or at least LPIC-2.

    Remember that project management is EVERYTHING if you want to pull off such a project!!!

    cheers olystyle
     
    • Thanks x 3
    Last edited: Jan 27, 2013
  8. agag2

    agag2 Supreme Member

    Joined:
    Feb 17, 2009
    Messages:
    1,308
    Likes Received:
    254
    Thanks for reply.

    I've heard of Lucene but don't know much about it. I don't mind using it as long as it won't limit me in functionality.
     
  9. redrubies

    redrubies Supreme Member

    Joined:
    Jan 17, 2011
    Messages:
    1,424
    Likes Received:
    2,560
    Location:
    USA
    He didn't say he wanted to clone google.
     
    • Thanks x 1
  10. The SEO

    The SEO Jr. VIP Jr. VIP Premium Member

    Joined:
    Dec 14, 2011
    Messages:
    3,976
    Likes Received:
    3,115
    Occupation:
    SEO/SMM
    Location:
    BHW
    hahahaha...Good and Perfect Comment
     
  11. Gogol

    Gogol Elite Member

    Joined:
    Sep 10, 2010
    Messages:
    3,066
    Likes Received:
    2,872
    Gender:
    Male
    I can try (provided you don't do chargebacks :D)...

    Frankly speaking, the search you see is only the tip of the iceberg. If you just think about the data collection, staging, and processing behind it, you will probably cancel this project.
     
    • Thanks x 1
  12. The SEO

    The SEO Jr. VIP Jr. VIP Premium Member

    Joined:
    Dec 14, 2011
    Messages:
    3,976
    Likes Received:
    3,115
    Occupation:
    SEO/SMM
    Location:
    BHW
    You can't beat Google with money. If you have a better idea than Google's, then maybe you'll be successful... Put simply, Google is an idea, and only another new idea beats Google.
     
  13. lanbo

    lanbo Jr. VIP Jr. VIP Premium Member

    Joined:
    Aug 23, 2009
    Messages:
    3,437
    Likes Received:
    595
    Home Page:
    The servers will cost much more than the script.
     
  14. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,468
    Likes Received:
    10,155
    The critical thing is how big it has to be in order for you to make money. Building a web crawler and search functionality isn't special; any half-decent programmer will be able to deliver (I actually remember reading a language manual in 1997 that had a bare-bones web crawler as an example).
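To give an idea of how small a bare-bones crawler can be, here is a rough sketch using only the Python standard library. It is not the example from that manual; `LinkParser`, `crawl`, and the fake site are names and data invented for illustration. The `fetch` function is passed in so the demo can run against an in-memory site instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text or None."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# A fake three-page site stands in for the network (assumption for the demo).
FAKE_SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": "no links here",
}
pages = crawl("http://example.test/", FAKE_SITE.get)
print(sorted(pages))
```

Everything this sketch leaves out (robots.txt handling, rate limiting, retries, deduplication across machines, storage) is exactly where the cost starts to climb.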

    What is special is scalability. That's hard to do right, and that's where the real cost of development is. I'm not talking about the server cost: systems like this don't scale simply by throwing more hardware at them unless you design them to from the very beginning.

    If you are going for a lot of data harvesting and processing, the software must be designed with scalability in mind from scratch. And details are critical at this scale. For example, olystyle mentioned Elasticsearch (which is super cool! I use it a lot). Elasticsearch scales simply by adding nodes (sweet), but one little detail you might run into if you get involved with it is that you can't just throw your data at it and forget about it, because it's not as robust as a database. If data gets corrupted, the only way to recover is to feed it again from your own copy, which means this design decision has just doubled the cost of storage in your system :)
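The "keep your own copy and re-feed it" pattern could look roughly like the sketch below. `PrimaryStore` and `SearchIndex` are hypothetical in-memory stand-ins, not the Elasticsearch client API; the point is only that every write goes to the durable store first, so the index can always be rebuilt from it.

```python
class PrimaryStore:
    """Hypothetical durable store holding the authoritative copy of each document."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, text):
        self._docs[doc_id] = text

    def items(self):
        return self._docs.items()


class SearchIndex:
    """Hypothetical stand-in for a search cluster; treated as disposable."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, text):
        self._docs[doc_id] = text

    def count(self):
        return len(self._docs)

    def wipe(self):
        """Simulate losing the index to corruption."""
        self._docs.clear()


def save(store, index, doc_id, text):
    """Write to the durable store first, then to the search index."""
    store.put(doc_id, text)
    index.index(doc_id, text)


def rebuild(store, index):
    """Re-feed every document from the durable store into the index."""
    for doc_id, text in store.items():
        index.index(doc_id, text)


store, index = PrimaryStore(), SearchIndex()
save(store, index, 1, "first page")
save(store, index, 2, "second page")
index.wipe()           # the index is lost...
rebuild(store, index)  # ...and restored from the authoritative copy
print(index.count())   # 2
```

Since both the store and the index hold the full data set, storage is paid for twice, which is the doubling mentioned above.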

    Apart from the development, you should be aware that everything changes when scale goes up, even the simplest things. Take the filesystem, for example. Do you think you'll be storing 100 terabytes of data in an ext3 partition? Welcome to the world of distributed filesystems. Again, design choices to be made.
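A quick back-of-the-envelope calculation shows why 100 terabytes stops being a single-machine problem. All the numbers below are assumptions picked for illustration (3 copies of every block, 12 TB of disk per node, disks filled to at most 80%), not figures from any particular distributed filesystem.

```python
import math


def nodes_needed(data_tb, replication, disk_tb_per_node, max_fill=0.8):
    """Estimate how many storage nodes are needed to hold data_tb with replication."""
    raw_tb = data_tb * replication                 # total bytes actually written
    usable_per_node = disk_tb_per_node * max_fill  # leave headroom on each node
    return math.ceil(raw_tb / usable_per_node)


# Assumed: 100 TB of data, 3 replicas, 12 TB per node, 80% max fill.
print(nodes_needed(100, replication=3, disk_tb_per_node=12))  # 32
```

Change any of the assumptions and the node count moves, but it stays well beyond what one partition on one box can handle.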

    Will $10k be enough for a big data implementation? Not by a long shot, but if you have money to burn, why not try? You might get very lucky and find someone who has the skills and will take it on for that.

    Will $10k be enough for a small crawler/search implementation? Yes.
     
    • Thanks x 2
  15. ugjunk

    ugjunk Jr. VIP Jr. VIP Premium Member

    Joined:
    Jan 1, 2011
    Messages:
    2,345
    Likes Received:
    721
    Location:
    Los Angeles
    Home Page:
    Developing something similar to what Google has could take you years, and you would need a very good development team. As Jazz pointed out, it all depends on how flexible and scalable your script needs to be. For a small, decent engine, $10k would be fine, but if you are planning really big stuff, I can't even estimate the cost :)
     
    • Thanks x 1
  16. Tosmekop

    Tosmekop Supreme Member

    Joined:
    Oct 24, 2011
    Messages:
    1,208
    Likes Received:
    815
    If I were you, I'd have a chat with the devs of http://www.inoutscripts.com
    They have the most sophisticated search engine script commercially available that I've seen. They may be able to provide custom work to fit your needs.
     
  17. back2form

    back2form Jr. VIP Jr. VIP Premium Member

    Joined:
    Jul 15, 2012
    Messages:
    2,682
    Likes Received:
    1,203
    Gender:
    Male
    Location:
    in front of imac
    All the best!
     

    • Thanks x 1
  18. SnowWar

    SnowWar Power Member

    Joined:
    Mar 3, 2012
    Messages:
    595
    Likes Received:
    48
    Occupation:
    Pure student :p
    You would need to hire a team of expert programmers, and before that you would need to raise a lot of money and be prepared to spend a long time.
     
    • Thanks x 1
  19. gabmasm

    gabmasm Registered Member

    Joined:
    Feb 19, 2012
    Messages:
    60
    Likes Received:
    29
    I think you should try something new with the small budget you have instead of starting the next big search engine, then scale up if you succeed.
    Here's an example: a failed search engine made in Italy, volunia(dot)com. It doesn't simply scrape the web and rank by content, PR, etc. like Google does; it tries to merge social signals with a social interface for its users and rank social trends higher.
    It is a flop, a start-up failure, because they tried hard to beat Google when the original goal was different: meta-tagging social preferences.
    It was an epic fail not only because of the limited budget, but also because of bad programming skills, bad marketing, and bad investors who turned the original idea into shit.
    You need to overcome all this, and Google will still beat you... :)
     
    • Thanks x 1
  20. madoctopus

    madoctopus Supreme Member

    Joined:
    Apr 4, 2010
    Messages:
    1,249
    Likes Received:
    3,498
    Occupation:
    Full time IM
    1. You want to make it similar to Google, so why bother? Google already exists, so you won't have any advantage over them.
    2. No, $10k is not enough, because you can't develop the system in one go. Google did thousands of iterations; you need to do thousands too.
    3. You need an entire infrastructure, and you need a big-ass team just to manage it and keep it running.
    4. Google doesn't have just a crawler. That's the tip of the iceberg. 99% of what they have sits behind the crawler (e.g. anti-spam), and without it your engine will be full of crap/spam.
    5. Frankly, with $50k total you wouldn't even be able to run a clone of Majestic SEO or Ahrefs, and that's just a fraction of the data.
    6. You do realize you will need several datacenters' worth of hardware and teams to manage it, and 70%+ of the cost will be power and HVAC, not the hardware itself. Google has a big-ass DC cooled with river water 363 days/year. That's almost zero HVAC cost for that DC.
    7. If you want to profit from this, you won't. Even if you manage to build this and run it, Google will just sink you in other ways. Think behind-the-curtain games.
    8. If you want to make big money, there are tens of thousands of ideas that are more realistic than this.
    9. Look for some videos of Google tech people talking about architecture and scaling and you will understand how far from it you are. They innovated in almost everything. They invented a Linux distro, they invented filesystems and daemons and all sorts of software and algorithms.
    10. The post you made by itself tells me that not only are you not close, you are 1000 miles further away than you think, and there are a billion things you never thought about that will turn up as problems.
    11. Google was built and started to grow when the web was very, very small. Tiny compared to today. Not only that, but they were the first, and they built something that was a solution to a problem. You don't provide a solution to a problem. In your case it's more like building a nice machine gun to fight somebody who has 1000 nukes, 20 aircraft carriers, 1000 jet fighters and 1000 tanks. It just doesn't work.
    12. Google is worth like 50 billion. They have 50 billion ways to bury you alive at the very first sign that you may be a problem for them.
    13. Look at wolframalpha.com: awesome concept, not nearly as successful as Google.
    14. Yahoo and Bing are in a coma, and they started at about the same time as Google and initially had more money than Google.
    15. Part of the problem you have is the same problem Yahoo/Bing have: patents Google owns. You can't implement certain algorithms like PR in your engine because Google has a patent on it. That is one of the reasons Yahoo/Bing couldn't compete with Google.
    16. You have a chance, though, if you somehow get Chuck Norris to help you... Chuck Norris doesn't have to crawl the web; he already has all the information in his memory. He was born with all the info that has been, is now, and will ever be on the web.
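For readers who don't know what "PR" in point 15 refers to: it is PageRank, which scores a page by the rank of the pages linking to it. The sketch below is a textbook-style power iteration on a tiny made-up link graph, for illustration only; it says nothing about how Google actually runs it. The function name, the damping value of 0.85, and the graph are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outgoing links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page link graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # c
```

Page "c" comes out on top because three of the four pages link to it. As point 4 says, the ranking formula is the small part; keeping spam out of the link graph is the bulk of the work.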

     
    • Thanks x 3