In the plex: the beginning of search

Discussion in 'BlackHat Lounge' started by davids355, Jan 24, 2014.

  1. davids355

    davids355 Jr. VIP Jr. VIP Premium Member

    Joined:
    Apr 25, 2011
    Messages:
    8,786
    Likes Received:
    6,325
    Don't son me off for posting this, but I just started reading "In the Plex" (on Audible, because my eyes hurt :)); it's about Google and how it all started off.

    The first chapter documents when the search engine (originally just an index of the web) made its first crawl of the web using their Python crawler and early PageRank algorithm. They started comparing results to AltaVista (the biggest search company at the time), and literally no one could believe how much more accurate Google's results were (although I don't think it was Google back then, it was a university project). That was because they based the ranking on PageRank, whereas AltaVista based it on on-page metrics.

    I know PageRank is not seen to have much value these days, but the concept behind it, using what is essentially public opinion (who links to whom) to rank sites, is still the same.
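
    For anyone who wants to see the "public opinion" idea in action, here is a minimal sketch of the textbook PageRank iteration on a made-up four-page web. The page names, the 0.85 damping factor and the iteration count are assumptions for illustration only; this is not Google's production code, just the published idea in its simplest form.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power iteration. links maps page -> list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally "popular".
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

    Page "c" comes out on top because the most pages vote for it, and "a" ranks well only because the popular page "c" links to it. That second effect is exactly what link spam tries to fake.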

    I wonder what Larry and Sergey would have thought back then if someone told them that one day people would be creating hundreds of thousands of fake sites in order to fool their system into thinking a particular site was actually popular!

    Also makes me think - the traditional on-page metrics for cataloguing the web have all but died out because they can be so easily manipulated. I wonder if the concept of PageRank will be replaced eventually as well, or if it will stay for good?

    I've often thought about a system where website visitors can make a comment, or a selection, about a website they visit, and that then goes towards categorising that site - sort of like how reCAPTCHA works to solve unknown words from books.

    Anyway, as they say - know thine enemy :)
     
    • Thanks Thanks x 3
  2. davids355

    davids355 Jr. VIP Jr. VIP Premium Member

    Every now and then I do a post and it gets absolutely nothing :)
     
    • Thanks Thanks x 1
  3. DarkPixel

    DarkPixel Jr. VIP Jr. VIP Premium Member

    Joined:
    Oct 4, 2011
    Messages:
    1,328
    Likes Received:
    1,239
    Location:
    ↓↓↓↓
    Well I just thanked you twice, so you have that going for you, which is nice. :D
     
    • Thanks Thanks x 1
  4. davids355

    davids355 Jr. VIP Jr. VIP Premium Member

    Thanks mate, appreciate it :)
     
  5. Panther28

    Panther28 Elite Member

    Joined:
    May 2, 2010
    Messages:
    2,268
    Likes Received:
    3,405
    Occupation:
    Internet.
    Location:
    Internet.
    I remember always using AltaVista, and then someone said try this new thing called "Google." Yeah, whatever.
     
    • Thanks Thanks x 1
  6. davids355

    davids355 Jr. VIP Jr. VIP Premium Member

    I remember using AltaVista, but I don't really remember hearing about Google (I can't remember Google not being there).
     
    • Thanks Thanks x 1
  7. Panther28

    Panther28 Elite Member

    What!! I can remember who told me about Google, but I can't remember who I first slept with.
     
  8. davids355

    davids355 Jr. VIP Jr. VIP Premium Member

    I have a bad memory:)

    Seriously though, I would recommend the book to everyone here - obviously, being Google, they don't give away many details, but this book gives you a real insight into how Google works, and how their algorithms have progressed to tackle various issues.

    It's actually amazing how comprehensive their system is, and how much data they hold on the Internet, and how much information they can glean from it.

    This may be a really obvious point to others, although I hadn't really thought of it before, but they can measure how relevant a particular website is for a search term by how quickly (if at all) a user returns to Google after clicking the link!
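
    To make that concrete, here is a rough sketch of how such a "did they come straight back?" signal could be computed from a click log. Everything here is hypothetical: the log format, the 10-second threshold, and the URLs are made up for illustration, and nothing in the book excerpt says this is how Google actually scores it.

```python
from collections import defaultdict

# Assumed cut-off: returning to the results page faster than this
# counts as "the result did not help". Purely illustrative.
QUICK_RETURN_SECONDS = 10


def satisfaction_by_result(click_log):
    """click_log rows are (query, url, seconds_to_return).

    seconds_to_return is None when the user never came back to the results.
    Returns {(query, url): fraction of clicks that looked satisfied}.
    """
    totals = defaultdict(int)
    satisfied = defaultdict(int)
    for query, url, seconds_to_return in click_log:
        key = (query, url)
        totals[key] += 1
        # Never returning, or returning only after a long visit, counts as satisfied.
        if seconds_to_return is None or seconds_to_return > QUICK_RETURN_SECONDS:
            satisfied[key] += 1
    return {key: satisfied[key] / totals[key] for key in totals}


# Made-up log for one query and two results.
sample_log = [
    ("cheap flights", "example.com/good", None),
    ("cheap flights", "example.com/good", 120),
    ("cheap flights", "example.com/good", None),
    ("cheap flights", "example.com/spam", 3),
    ("cheap flights", "example.com/spam", 5),
    ("cheap flights", "example.com/spam", None),
]

scores = satisfaction_by_result(sample_log)
for (query, url), score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(query, url, round(score, 2))
```

    A page that users bounce off within seconds ends up with a low score for that query no matter how many links point at it, which is why this kind of signal is harder to fake with backlinks alone.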

    Also, reading about how they determine relevance on site, it's amazing how deeply they analyse words: related words, words that are and aren't often found together, how words change in meaning depending on what other words they appear with, and so on - and this is just the little info that Google releases, and based on 5-10 years ago.
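
    The "words that are and aren't often found together" part can be illustrated with a very simple co-occurrence count. This is only a toy sketch over four made-up snippets; real systems work at a vastly larger scale and with far more sophisticated statistics, and the function name is my own.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count in how many documents each pair of distinct words appears together."""
    counts = Counter()
    for text in documents:
        # Use each word once per document and sort so pairs are always (a, b) with a < b.
        words = sorted(set(text.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


# Made-up snippets for illustration.
docs = [
    "hot dog with mustard",
    "hot dog stand",
    "dog training tips",
    "hot weather tips",
]

pairs = cooccurrence_counts(docs)
print(pairs[("dog", "hot")])      # 2: "hot" and "dog" share two snippets
print(pairs[("dog", "weather")])  # 0: never seen together
```

    Even on this tiny sample, "hot" next to "dog" looks like a different thing from "hot" next to "weather", which is the kind of context the post is describing.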

    It makes me think how comprehensive their algorithms are now. And it's interesting to come up with more elaborate ways to outwit the system.

    But it's also amazing how ineffective the system is against manipulation - or maybe it isn't?
    I wonder what percentage of top 10 results (across every single search made on Google) are spam? And based on that, how much effort are they really putting into stopping it?