
Looking for a way to clean up a URL to remove any subdomain?

Discussion in 'General Programming Chat' started by tb303, Nov 18, 2012.

  1. tb303

    tb303 Power Member

    Joined:
    Dec 18, 2011
    Messages:
    601
    Likes Received:
    280
    I just wanted to knock something up that can go through a list of URLs and output just the host name and TLD, without any subdomain (if one exists).

    e.g.
    Code:
    www.google.com       becomes    google.com          (easy: strip the "www.")
    groups.google.com    becomes    google.com          (split at "." and just use the last two items)
    groups.google.co.uk  becomes    google.co.uk        (but here's where I get stuck, as the last two items would give "co.uk")
    I started this thinking it would be a quick one, but now I'm stuck.
    If I could find a complete list of all TLDs then maybe I could do it with that, but surely there must be an easier way!
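
    To make the sticking point concrete, here is a minimal Python sketch of the "split at the dots and keep the last two items" idea. The function name `last_two_labels` is made up for illustration; it is not from any library.

```python
def last_two_labels(host: str) -> str:
    """Naive approach: keep only the last two dot-separated labels."""
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


if __name__ == "__main__":
    for host in ["www.google.com", "groups.google.com", "groups.google.co.uk"]:
        # The first two hosts give "google.com"; the third gives "co.uk",
        # which is the suffix rather than the wanted "google.co.uk".
        print(host, "->", last_two_labels(host))
```

    The split has no way of knowing that "co.uk" counts as a single suffix, which is why the third case fails.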

    I've started in VB but could do it in PHP if there's a method for that. Really, I'm just trying to figure out the theory of it.

    Anyone have an idea?

    I have googled this, but what I find is not really related, and I figured someone on here must have run into this problem before :fingerscrossed: :)
     
  2. guvenor

    guvenor Junior Member

    Joined:
    Aug 11, 2009
    Messages:
    165
    Likes Received:
    89
    I don't think it can be done without a full TLD list, I'm afraid. With one, it would be easy :)
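
    As a rough Python sketch of how the lookup could work once a list exists: walk the host from left to right and stop at the first (longest) tail that appears in the list, then keep one more label in front of it. `KNOWN_SUFFIXES` and `registrable_domain` are hypothetical names, and the set below is a tiny hand-picked sample, not a real list. A maintained list such as the Public Suffix List is much longer and also has wildcard and exception rules that this sketch ignores.

```python
# Illustrative subset only (an assumption for this sketch); a real suffix
# list is far longer and has wildcard/exception rules not handled here.
KNOWN_SUFFIXES = {"com", "org", "net", "uk", "co.uk", "org.uk"}


def registrable_domain(host, suffixes=KNOWN_SUFFIXES):
    """Return the known suffix plus one label to its left, or None."""
    labels = host.lower().strip(".").split(".")
    # Candidates go from longest to shortest, so "co.uk" is tried before "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the host is a bare suffix with no name in front
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched


if __name__ == "__main__":
    for host in ["www.google.com", "groups.google.com", "groups.google.co.uk"]:
        print(host, "->", registrable_domain(host))
```

    Trying the longest tail first is the design choice that matters: checking "uk" before "co.uk" would bring back the "co.uk" result.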
     
  3. cgimaster

    cgimaster Power Member

    Joined:
    Jun 30, 2012
    Messages:
    525
    Likes Received:
    311
    Gender:
    Male
    http://stackoverflow.com/a/569219/342740

    The best way would be to have a list of ccTLDs and gTLDs, as guvenor mentions above. There are other ways to do it, but they are not 100% reliable.

    Here is a PHP sample; however, it may not be as reliable as having the list:
    Code:
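    // Example call; prints "google.com".
    print get_domain("http://google.com");
    
    // Heuristic only: no TLD list is consulted.
    // Returns the trailing "name.suffix" part of the host, or false if nothing matches.
    function get_domain($url)
    {
      $pieces = parse_url($url);
      $domain = isset($pieces['host']) ? $pieces['host'] : '';
      // One label, a dot, then a run of 2 to 6 letters/dots anchored at the end of the host.
      if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
        return $regs['domain'];
      }
      return false;
    }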
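
    To show where a pattern like this stops being reliable, here is the same regular expression ported to Python as a sketch. The port is my own, not part of the linked answer, and the test hosts are made-up examples. The suffix part allows up to six letters and dots, so a short name such as "ab.com" is itself read as a suffix, and a one-letter name such as "t" is too short for the name part.

```python
import re
from urllib.parse import urlparse

# Same pattern as the PHP sample: a label, a dot, then 2-6 letters/dots at the end.
PATTERN = re.compile(
    r"(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$", re.IGNORECASE
)


def get_domain(url):
    """Return the heuristic match for the URL's host, or None if nothing matches."""
    host = urlparse(url).hostname or ""
    match = PATTERN.search(host)
    return match.group("domain") if match else None


if __name__ == "__main__":
    for url in [
        "http://groups.google.co.uk",  # works: google.co.uk
        "http://www.ab.com",           # subdomain kept: www.ab.com
        "http://t.co",                 # no match at all: None
    ]:
        print(url, "->", get_domain(url))
```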
     
    Last edited: Nov 18, 2012