1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

How to remove duplicate rows from a datatable based on two columns? (c# or vb?)

Discussion in 'General Programming Chat' started by tb303, Jan 1, 2013.

  1. tb303

    tb303 Power Member

    Joined:
    Dec 18, 2011
    Messages:
    601
    Likes Received:
    280
    Hi BHW,

    I have a data table with columns "name","phone","login","password" as columns

    I want to remove all the rows where "name" & "phone" are duplicated.

    But im having a hard time coming up with a google search to find my answer.

    I know I can do it based on one column like this...
    Code:
    Dim primaryKey(1) As DataColumnprimaryKey(0) = newdataset.Tables("tablename").Columns("name")
    newdataset.Tables("tablename").PrimaryKey = primaryKey
    newdataset.Tables("tablename").Merge(originaldataset.Tables("tablename"), True) 'merge in unique only
    newdataset.Tables("tablename").PrimaryKey = Nothing
    but that doesnt work if you have two people with the same name but different phone numbers.

    Any one point me in the right direction?
     
  2. zyx2000

    zyx2000 Registered Member

    Joined:
    May 6, 2011
    Messages:
    54
    Likes Received:
    5
    You could do this straight in SQL with a subquery.
     
  3. BlackHatWhiteHearth

    BlackHatWhiteHearth BANNED BANNED

    Joined:
    Dec 26, 2012
    Messages:
    24
    Likes Received:
    9
    It is not very clear what you are trying to achieve. How to handle the login and password columns ?
    John 123456 and John 123456 - obvious diplicates
    John 654321 and John 123456 - duplicates or not ? People with duplicated names, doesn't necessarily mean that they are the same person.

    John 123456 username1 password1 and John 123456 username2 password2 - duplicates, but the person has two accounts, which of the accounts should be deleted ?
     
  4. tb303

    tb303 Power Member

    Joined:
    Dec 18, 2011
    Messages:
    601
    Likes Received:
    280
    im sorry maybe it was a bad example - i rushed my post.

    Another example is a proxychecker.
    So it scrapes and imports proxys with various columns to the data table where the proxy address and port are in two columns on the datatable called "port" and "proxy"

    so this list....
    Code:
    52.20.10.92:8888
    52.19.99.30:3128
    52.20.10.92:8888
    68.82.77.20:8000
    68.82.77.20:8888
    becomes...
    Code:
    52.20.10.92:8888
    52.19.99.30:3128
    68.82.77.20:8000
    68.82.77.20:8888
    currently what im doing as a work around for this is taking the "proxy" & "port" fields joining them back together with at ":" to make a string for each proxy.
    then adding those strings to a new list of strings using distinct. Then parsing that list to split the proxy string back into proxy / port and adding that to a new datatable.
    but that seems a very long way round!

    SQL i guess as posted above would do it but I cant find an example.
     
    Last edited: Jan 5, 2013
  5. cgimaster

    cgimaster Power Member

    Joined:
    Jun 30, 2012
    Messages:
    525
    Likes Received:
    311
    Gender:
    Male
  6. BlackHatWhiteHearth

    BlackHatWhiteHearth BANNED BANNED

    Joined:
    Dec 26, 2012
    Messages:
    24
    Likes Received:
    9
    Use the GROUP BY clause:
    Code:
     
    using (SqlConnection connection = new SqlConnection("your connection string goes here"))            
    using (SqlCommand command = new SqlCommand("SELECT [IP], [Port] FROM TableName GROUP BY IP, Port", connection))
    {
      connection.Open();
      using (SqlDataReader reader = command.ExecuteReader())
      {
        while (reader.Read())
        {
           string ip = reader.GetString(0);
           string port = reader.GetString(1);
           //do whatever is needed
        }
      }
      connection.Close();
    }
    The code is in C#
     
    • Thanks Thanks x 1
    Last edited: Jan 5, 2013
  7. tb303

    tb303 Power Member

    Joined:
    Dec 18, 2011
    Messages:
    601
    Likes Received:
    280
    thanks, that pointed me in the right direction
     
  8. BlackHatWhiteHearth

    BlackHatWhiteHearth BANNED BANNED

    Joined:
    Dec 26, 2012
    Messages:
    24
    Likes Received:
    9
    You can also use:
    Code:
    SELECT [IP], MAX([Port]) FROM TableName GROUP BY IP
    assuming [Port] is a numeric value(integer)
    this will eliminate the duplicated IPs and will return the IPs with the largest [Port]
    For more information look for "SQL Aggregate Functions"
    or
    Code:
    SELECT DISTINCT [IP], [Port] FROM TableName
    which will produce the same results as grouping by two columns.
     
    • Thanks Thanks x 1