
Remove duplicate domain from a listbox

Discussion in 'Visual Basic .NET' started by 4wanglah, Nov 10, 2010.

  1. 4wanglah

    Hi,

    I would like to ask:

    How do I remove duplicate domains from a listbox?
    I want my code to remove duplicate domains from a listbox that has already been loaded.

    Let's say I have these URLs:

    google.com/analytics
    google.com/insight
    yahoo.com
    blackhatworld.com
    blackhatworld.com/blackhat-seo

    When I click a button, it should remove the duplicate domains, so the output will look like this:

    google.com/analytics
    yahoo.com
    blackhatworld.com

    Thanks for your help in advance.
     
  2. peterlolz

    Add all URLs to a string[] url_array and create an empty List<string> outputlist.
    Then loop over url_array with foreach and, for each URL, check whether outputlist already contains a URL with the same domain, taken as Regex.Match(currenturl, "[^/]*"); use another foreach loop over outputlist for that check. If there is no match, add the URL to outputlist.
    This will do the job for you.
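A minimal Python sketch of the same idea, assuming scheme-less entries like the ones in the first post (with a leading "http://", the regex would return "http:" as the domain). The names dedupe_by_prefix and DOMAIN_PATTERN are hypothetical. A set stands in for the inner foreach loop: the result is the same, but the membership check is faster. Lower-casing the domain is an added assumption, since host names are case-insensitive.

```python
import re

# Everything up to the first "/" is treated as the domain, mirroring
# Regex.Match(currenturl, "[^/]*"). The pattern can match an empty string,
# so match() never returns None here.
DOMAIN_PATTERN = re.compile(r"[^/]*")


def dedupe_by_prefix(urls):
    """Keep the first URL seen for each leading 'domain/' segment."""
    seen = set()    # domains already emitted (replaces the inner foreach)
    output = []     # plays the role of outputlist
    for url in urls:
        domain = DOMAIN_PATTERN.match(url).group(0).lower()
        if domain not in seen:
            seen.add(domain)
            output.append(url)
    return output


if __name__ == "__main__":
    urls = [
        "google.com/analytics",
        "google.com/insight",
        "yahoo.com",
        "blackhatworld.com",
        "blackhatworld.com/blackhat-seo",
    ]
    for url in dedupe_by_prefix(urls):
        print(url)
```

Run on the list from the first post, this prints google.com/analytics, yahoo.com and blackhatworld.com, which is the requested output.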

    peterlolz
     
  3. Monrox

    Peter's suggestion will work, but once you have it implemented, don't forget to take care of subdomains. See this list:
    Code:
    sonerkaya1985.so.funpic.de/index.php?action=profile;u=2163
    disikurtlar.di.funpic.de/forum/index.php?action=profile;u=5967
    eplavi.ep.funpic.de/viewpage.php?page_id=12
    figosport.de/index.php?option=com_sobi2
    martin-kremer.de/forum/index.php?action=profile;u=2118;sa=summary
    kapitalanlageforum.de/index.php?topic=20657.0
    tutopials.de/forum/index.php?action=profile;u=2206
    Using Regex.Match(currenturl, "[^/]*") will report no duplicates in it, but from an SEO point of view that isn't true: there are only 5 distinct domains in that list, because 'funpic.de' appears 3 times.

    So instead of 7 links that count, you would really end up with 5, plus a spammy footprint of more than one link per domain. Search engines might simply credit you with one link, but a spam-prevention mechanism on the site itself may notice this and delete all three links, wasting your effort.

    If the list is small you won't waste much work, but if it has 50,000 entries or so, this can have a big impact.
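To make the funpic.de example concrete, here is a small Python sketch (function names are hypothetical) that compares the prefix regex with a naive "keep the last two labels of the host" rule on the list above. It assumes scheme-less URLs, as in that list.

```python
import re

LINKS = [
    "sonerkaya1985.so.funpic.de/index.php?action=profile;u=2163",
    "disikurtlar.di.funpic.de/forum/index.php?action=profile;u=5967",
    "eplavi.ep.funpic.de/viewpage.php?page_id=12",
    "figosport.de/index.php?option=com_sobi2",
    "martin-kremer.de/forum/index.php?action=profile;u=2118;sa=summary",
    "kapitalanlageforum.de/index.php?topic=20657.0",
    "tutopials.de/forum/index.php?action=profile;u=2206",
]


def host_of(url):
    """Text before the first '/', as with the "[^/]*" regex; assumes no scheme."""
    return re.match(r"[^/]*", url).group(0).lower()


def last_two_labels(host):
    """Naive domain guess: keep only the last two dot-separated labels."""
    return ".".join(host.split(".")[-2:])


def dedupe_by_last_two_labels(urls):
    """Keep the first URL for each naive last-two-labels domain."""
    seen, output = set(), []
    for url in urls:
        key = last_two_labels(host_of(url))
        if key not in seen:
            seen.add(key)
            output.append(url)
    return output


if __name__ == "__main__":
    hosts = {host_of(u) for u in LINKS}
    print(len(hosts), "distinct hosts")  # what the prefix regex sees: 7
    print(len(dedupe_by_last_two_labels(LINKS)), "links kept")  # one per domain: 5
```

Only the first funpic.de link survives here. The shortcut is only safe while every suffix is a single label such as .de.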

    Just taking the last 2 parts of the host name introduces another problem. You can have:
    Code:
     
    something.co.uk
    another.us
    third.edu.cn
    fourth.edu.cn
    
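    None of these are duplicates of one another, yet keeping only the last 2 parts would reduce third.edu.cn and fourth.edu.cn to the same "edu.cn" (and every .co.uk site to "co.uk"). So you will need a lot of tweaking to get rid of duplicates without removing actual uniques.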
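One way to handle multi-label suffixes is to check the host against a list of known public suffixes. The Python sketch below uses a deliberately tiny, hypothetical suffix set (MULTI_LABEL_SUFFIXES) as a stand-in for Mozilla's Public Suffix List, and treats every other suffix as a single label. This is a simplification: the real list also has wildcard and exception rules.

```python
# Hypothetical, deliberately tiny stand-in for the Public Suffix List;
# a real deduper would load the full list instead.
MULTI_LABEL_SUFFIXES = {"co.uk", "edu.cn"}


def naive_domain(host):
    """Last two labels only: collapses third.edu.cn and fourth.edu.cn."""
    return ".".join(host.lower().split(".")[-2:])


def registered_domain(host):
    """Keep one extra label when the host ends in a known multi-label suffix."""
    labels = host.lower().split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


if __name__ == "__main__":
    for host in ["something.co.uk", "another.us", "third.edu.cn", "fourth.edu.cn"]:
        print(f"{host:18} naive={naive_domain(host):16} suffix-aware={registered_domain(host)}")
```

With the suffix check, the four hosts above map to four different keys, while subdomains such as eplavi.ep.funpic.de still collapse to funpic.de.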
     
  4. andee

    Never mind, I didn't read the OP's question properly.
     
    Last edited: Nov 12, 2010
  5. popzzz

    Last edited: Nov 13, 2010
  6. 4wanglah

    Thanks for the input, Peter.
    I'll try that out.

    Yeah, I agree with you.
    Thanks, Monrox.

    Thanks, I'll try to look at it.
    I'd prefer to have the source code as well, since I just started learning .NET.

    BTW,
    thank you very much for the help, guys.
     
  7. atomic999

    You can use the Uri class for this. In C# it would look like this:
    Code:
     
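    // Needs: using System; using System.Collections.Generic;
    // listBox1 is a placeholder name for the ListBox that holds the URLs.
    Uri Url;
    HashSet<string> AllDomains = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    List<string> UniqueDomains = new List<string>();
    foreach (object item in listBox1.Items)
    {
        string str = item.ToString();
        // Entries like "google.com/analytics" have no scheme, so they are not absolute URIs.
        string candidate = str.Contains("://") ? str : "http://" + str;
        if (Uri.TryCreate(candidate, UriKind.Absolute, out Url))
        {
            // Add returns false if this host is already in the set.
            if (AllDomains.Add(Url.Host)) { UniqueDomains.Add(str); }
        }
    }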
    
    A HashSet can only hold unique values, so Add returns false when a duplicate is detected. The UniqueDomains list then holds what you wanted ;)
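For comparison, here is a Python sketch of the same "parse the URL, keep the first entry per host" approach, using urllib.parse.urlsplit in place of .NET's Uri. The function name dedupe_by_host is hypothetical, and prepending "http://" to scheme-less entries is an assumption made only so they parse as URLs with a host. The set stores the host string itself, so two different hosts can never be confused through a shared hash code.

```python
from urllib.parse import urlsplit


def dedupe_by_host(urls):
    """Keep the first URL for each distinct host name."""
    seen_hosts = set()
    unique = []
    for url in urls:
        # Without a scheme, "google.com/analytics" would be parsed as a path,
        # so add one before parsing (assumption: http is fine for this purpose).
        candidate = url if "://" in url else "http://" + url
        try:
            host = urlsplit(candidate).hostname  # lower-cased, or None if empty
        except ValueError:
            host = None  # e.g. malformed IPv6 brackets
        if host is None:
            continue  # skip unparsable entries, like TryCreate returning false
        if host not in seen_hosts:
            seen_hosts.add(host)
            unique.append(url)
    return unique


if __name__ == "__main__":
    urls = ["google.com/analytics", "google.com/insight", "yahoo.com",
            "blackhatworld.com", "blackhatworld.com/blackhat-seo"]
    print(dedupe_by_host(urls))
```

Like the C# version, this compares full host names, so the funpic.de subdomains from the earlier list would still count as three different domains.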
     
  8. 4wanglah

    Thanks, Atomic. I'll try that.