How to trim URLs to root domain? (including URLs with subdomain)

Discussion in 'Black Hat SEO Tools' started by aldis, Nov 26, 2012.

  1. aldis

    aldis Jr. VIP Jr. VIP Premium Member

    Joined:
    Apr 10, 2009
    Messages:
    1,326
    Likes Received:
    532
    Occupation:
    CSEO Founder
    I've been trying, without success, to trim some web 2.0 URLs that contain subdomains down to their root domain with Scrapebox.

    For example, if I trim this URL to root:
    Code:
    http://thav44shbu.wordpress.com/2012/11/26/outlines-for-effective-programs-of-dieta-proteica/
    It will only be trimmed to:
    Code:
    http://thav44shbu.wordpress.com/
    and I'm looking for a way to trim it directly to:
    Code:
    http://wordpress.com
    Is there a way/tool/method to do this? I've been searching around the web and playing around with SB with no luck.

    Any help is much appreciated.
     
  2. kappa84

    kappa84 Power Member

    Joined:
    May 19, 2010
    Messages:
    736
    Likes Received:
    334
    Location:
    Bath, UK
    Maybe these answers here might help?
    Code:
    http://stackoverflow.com/questions/6433799/regular-expression-to-remove-subdomain-from-root-domain-in-list-notepad-or-g
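
For anyone who would rather script the regex idea than fight with Notepad++, here is a minimal Python sketch. It is not the expression from the linked page; the pattern and the function name `trim_to_root_regex` are made up for illustration, and it assumes the root domain is always the last two labels of the host, which is wrong for suffixes such as co.uk.

```python
import re

# One host label: anything up to a dot, slash, query, fragment or port marker.
LABEL = r"[^/.?#:]+"

# scheme, any number of subdomain labels, then the last two labels of the host.
# Assumption: "root domain" means the last two labels (breaks on co.uk etc.).
ROOT_PATTERN = re.compile(
    rf"^(https?://)(?:{LABEL}\.)*({LABEL}\.{LABEL})(?:[/?#:].*)?$",
    re.IGNORECASE,
)


def trim_to_root_regex(url):
    """Return scheme + last two host labels; unmatched input is returned as-is."""
    return ROOT_PATTERN.sub(r"\1\2", url.strip())


if __name__ == "__main__":
    print(trim_to_root_regex(
        "http://thav44shbu.wordpress.com/2012/11/26/"
        "outlines-for-effective-programs-of-dieta-proteica/"
    ))
    # prints: http://wordpress.com
```

The same limitation applies to any pure-regex answer: `http://www.example.co.uk/page` comes out as `http://co.uk`, so multi-part suffixes need a suffix list rather than a pattern.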
     
  3. aldis

    aldis Jr. VIP Jr. VIP Premium Member

    Joined:
    Apr 10, 2009
    Messages:
    1,326
    Likes Received:
    532
    Occupation:
    CSEO Founder
    I can't get it to work with the Notepad++ regular expressions posted on that page. Any other suggestions?
     
  4. shailzrocks

    shailzrocks Jr. Executive VIP Jr. VIP Premium Member

    Joined:
    Jan 1, 2012
    Messages:
    1,113
    Likes Received:
    1,766
    Occupation:
    Dolphin Seller in a Black Market
    Location:
    No Man's Land
    Any more solutions to this?
     
  5. bramantya

    bramantya Regular Member

    Joined:
    Oct 23, 2010
    Messages:
    244
    Likes Received:
    55
    I think you should hire a developer for that.
     
  6. guvenor

    guvenor Junior Member

    Joined:
    Aug 11, 2009
    Messages:
    165
    Likes Received:
    89
    <?php
    // Build a cached PHP array of public-suffix rules (e.g. "co.uk") that contain fewer than $max_tl dots.
    function createTLD($cache_filename, $max_tl = 2) {
        $cache_folder = str_replace(basename($cache_filename), '', $cache_filename);
        if (!file_exists($cache_folder) || !is_writable($cache_folder)) {
            throw new Exception($cache_folder . ' is not writable!');
        }
        // feel free to use "fsockopen()" or "curl_init()" if "fopen wrappers" are disabled or "memory_limit" is too low
        // the list is published at publicsuffix.org; the mxr.mozilla.org copy this code originally fetched has been retired
        $tlds = @file('https://publicsuffix.org/list/public_suffix_list.dat');
        if ($tlds === false) {
            throw new Exception('public_suffix_list.dat is not readable!');
        }
        // remove unnecessary lines
        foreach ($tlds as $i => $tld) {
            $tlds[$i] = trim($tld);
            // drop: empty lines / comments / plain top level domains / rules with $max_tl or more dots (overboard)
            if (!$tlds[$i] || $tlds[$i][0] == '/' || strpos($tlds[$i], '.') === false || substr_count($tlds[$i], '.') >= $max_tl) {
                unset($tlds[$i]);
            }
        }
        $tlds = array_values($tlds);
        file_put_contents($cache_filename, "<?php\n" . '$tlds = ' . str_replace(array(' ', "\n"), '', var_export($tlds, true)) . ";\n?" . ">");
        // feel free to split the file into multiple and smaller files f.e. by first char of the domain-level-name to reduce memory usage
    }

    // Return the registrable ("root") domain of a URL or host name, or false if there is none.
    function getHost($dom = '', $fast = false) {
        // general
        $dom = !$dom ? $_SERVER['SERVER_NAME'] : $dom;
        // for parse_url() ftp:// http:// https://
        $dom = !isset($dom[5]) || ($dom[3] != ':' && $dom[4] != ':' && $dom[5] != ':') ? 'http://' . $dom : $dom;
        // remove "/path/file.html", "/:80", etc. (cast because parse_url() may return null or false)
        $dom = (string) parse_url($dom, PHP_URL_HOST);
        // replace absolute domain name by relative (http://www.dns-sd.org/TrailingDotsInDomainNames.html)
        $dom = trim($dom, '.');
        // for fast check
        $dom = $fast ? str_replace(array('www.', 'ww.'), '', $dom) : $dom;
        // separate domain level
        $lvl = explode('.', $dom);// 0 => www, 1 => example, 2 => co, 3 => uk
        // fast check
        if ($fast) {
            if (!isset($lvl[2])) {
                return isset($lvl[1]) ? $dom : false;
            }
        }
        // set levels
        krsort($lvl);// 3 => uk, 2 => co, 1 => example, 0 => www
        $lvl = array_values($lvl);// 0 => uk, 1 => co, 2 => example, 3 => www
        $_1st = $lvl[0];
        $_2nd = isset($lvl[1]) ? $lvl[1] . '.' . $_1st : false;
        $_3rd = isset($lvl[2]) ? $lvl[2] . '.' . $_2nd : false;
        $_4th = isset($lvl[3]) ? $lvl[3] . '.' . $_3rd : false;
        // tld check
        require('cache/tlds/all.txt'); // includes "$tlds"-Array or feel free to use this instead of the cache version:
        //$tlds = array('co.uk', 'co.jp');
        $tlds = array_flip($tlds);// needed for isset()
        // fourth level is TLD
        if ($_4th && !isset($tlds['!' . $_4th]) && (isset($tlds[$_4th]) || isset($tlds['*.' . $_3rd]))) {
            $dom = isset($lvl[4]) ? $lvl[4] . '.' . $_4th : false;
        }
        // third level is TLD
        else if ($_3rd && !isset($tlds['!' . $_3rd]) && (isset($tlds[$_3rd]) || isset($tlds['*.' . $_2nd]))) {
            $dom = $_4th;
        }
        // second level is TLD
        else if (!isset($tlds['!' . $_2nd]) && (isset($tlds[$_2nd]) || isset($tlds['*.' . $_1st]))) {
            $dom = $_3rd;
        }
        // first level is TLD
        else {
            $dom = $_2nd;
        }
        return $dom ? $dom : false;
    }

    // test input, with the expected result after each entry
    $urls = array(
        'http://www.example.com',// example.com
        'http://subdomain.example.com',// example.com
        'http://www.example.uk.com',// example.uk.com
        'http://www.example.co.uk',// example.co.uk
        'http://www.example.com.ac',// example.com.ac
        'http://example.com.ac',// example.com.ac
        'http://www.example.accident-prevention.aero',// example.accident-prevention.aero
        'http://www.example.sub.ar',// example.sub.ar
        'http://www.congresodelalengua3.ar',// congresodelalengua3.ar
        'http://congresodelalengua3.ar',// congresodelalengua3.ar
        'http://www.example.pvt.k12.ma.us',// k12.ma.us (wrong) / if $max_tl=4: example.pvt.k12.ma.us
        'http://www.example.lib.wy.us',// lib.wy.us (wrong) / if $max_tl=3: example.lib.wy.us
        'com',// false
        '.com',// false
        'http://big.uk.com',// big.uk.com
        'uk.com',// false / in fast mode: uk.com (wrong)
        'www.uk.com',// www.uk.com / in fast mode: uk.com (wrong)
        '.uk.com',// false / in fast mode: uk.com (wrong)
        'stackoverflow.com',// stackoverflow.com
    );
    if (!file_exists('cache/tlds/all.txt')) {// feel free to refresh by interval
        createTLD('cache/tlds/all.txt');
    }
    echo '<pre>';
    foreach ($urls as $url) {
        echo $url . ':' . var_export(getHost($url), true) . "\n";
    }
    echo $_SERVER['SERVER_NAME'] . ':' . var_export(getHost(), true) . "\n";
    echo '</pre>';
    ?>

    I don't take credit for this code, but it has worked for me in the past, provided you have access to a Linux server.
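
The core idea of the PHP above (find the longest known suffix, then keep one more label to its left) can be sketched in a few lines of Python. This is only an illustration: `SAMPLE_SUFFIXES` is a hypothetical, hand-picked subset rather than the real suffix list, `registrable_domain` is a made-up name, and the wildcard (`*.`) and exception (`!`) rules that the PHP handles are left out.

```python
from urllib.parse import urlsplit

# Hypothetical subset of suffix rules, for illustration only.
# A real run would load the full public suffix list instead.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk", "uk.com", "ac", "com.ac"}


def registrable_domain(url, suffixes=SAMPLE_SUFFIXES):
    """Return the suffix plus one label, or None if the host is only a suffix."""
    # urlsplit() only fills in hostname when it can see a "//" authority marker.
    host = urlsplit(url if "://" in url else "//" + url).hostname
    if not host:
        return None
    labels = host.strip(".").split(".")
    # Try the longest candidate suffix first, then shorter ones.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # e.g. "uk.com" on its own is a suffix, not a site
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched


if __name__ == "__main__":
    for u in ("http://thav44shbu.wordpress.com/2012/11/26/",
              "http://www.example.co.uk",
              "http://big.uk.com",
              "uk.com"):
        print(u, "->", registrable_domain(u))
```

With the sample set, the four inputs come out as wordpress.com, example.co.uk, big.uk.com and None, which lines up with the expected results listed in the PHP test array.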
     
  7. kvmcable

    kvmcable Supreme Member

    Joined:
    Dec 28, 2010
    Messages:
    1,355
    Likes Received:
    2,815
    Occupation:
    24 year business owner - old school dude
    Location:
    KFC - BW3
    Not the cleanest method, but it works without any special code.

    1.) Trim to root in Scrapebox as you have been doing.

    2.) Paste the list into Excel.

    3.) Use Text to Columns with a period as the delimiter.

    4.) Sort the list so all the subdomain rows end up together at the top or bottom.

    5.) Concatenate the root domain and the TLD, inserting a period between them.

    Done.
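
The spreadsheet steps above amount to splitting each host on periods and gluing the last two pieces back together. A rough Python equivalent follows, assuming every URL in the list carries a scheme such as http://; the function names are invented for this sketch, and like the Excel method it treats the last two labels as the root, so co.uk style domains come out wrong.

```python
from urllib.parse import urlsplit


def root_by_splitting(url):
    """Split the host on periods and rejoin the last two "columns"."""
    host = urlsplit(url).hostname  # None when the URL has no scheme/authority
    if not host:
        return None
    columns = host.split(".")          # the Text to Columns step
    return ".".join(columns[-2:])      # the concatenate step


def unique_roots(urls):
    """Trim every URL and drop duplicates while keeping the original order."""
    roots = (root_by_splitting(u) for u in urls)
    return list(dict.fromkeys(r for r in roots if r))


if __name__ == "__main__":
    harvested = [
        "http://thav44shbu.wordpress.com/2012/11/26/some-post/",
        "http://another.wordpress.com/",
        "http://example.com/page",
    ]
    print(unique_roots(harvested))
    # prints: ['wordpress.com', 'example.com']
```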
     
  8. iglow

    iglow Elite Member

    Joined:
    Feb 20, 2009
    Messages:
    2,080
    Likes Received:
    856
    SCRAPEBOX sry for capslock
     
  9. aldis

    aldis Jr. VIP Jr. VIP Premium Member

    Joined:
    Apr 10, 2009
    Messages:
    1,326
    Likes Received:
    532
    Occupation:
    CSEO Founder
    what?
     
  10. Flappage

    Flappage Newbie

    Joined:
    Aug 23, 2011
    Messages:
    15
    Likes Received:
    5
    I can't post the URL, but this has proved very useful for me: search Google for 'url_tools xlam' and look at the connect.icrossing page.