
Remove ALL duplicates (both copies) in big files?

Discussion in 'Black Hat SEO Tools' started by BigTroll, Mar 8, 2015.

  1. BigTroll

    BigTroll Jr. VIP

    Joined:
    Oct 15, 2014
    Messages:
    1,864
    Likes Received:
    725
    Occupation:
    CPA
    Location:
    ROMANIA
    There are many tools for removing duplicates, but what I need is this:

    Let's say I have a big list (3M users).
    The first lines are:
    A
    B
    C
    C
    D

    I don't want to delete only one "C"; I want to delete both copies of "C", so the final result looks like:
    A
    B
    D

    Any website/tool for this, please?
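    Outside of an editor, this is a few lines of script. A minimal sketch in Python, assuming the whole list fits in memory and that lines must match exactly (trailing spaces count): count every line, then keep only the lines that occur exactly once, in their original order. The function names are hypothetical, not from any existing tool.

```python
from collections import Counter


def remove_all_duplicates(lines):
    """Return only the lines that occur exactly once, keeping the original order.

    Unlike a normal "remove duplicates", every copy of a repeated line is
    dropped (both "C" lines in the example), not just the extra ones.
    """
    counts = Counter(lines)  # how many times each exact line appears
    return [line for line in lines if counts[line] == 1]


def remove_all_duplicates_in_file(src_path, dst_path):
    """Hypothetical helper: apply remove_all_duplicates to a UTF-8 text file."""
    with open(src_path, encoding="utf-8") as f:
        lines = f.read().splitlines()  # drops the line endings
    with open(dst_path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in remove_all_duplicates(lines))


print(remove_all_duplicates(["A", "B", "C", "C", "D"]))  # ['A', 'B', 'D']
```

    Unlike the sort-based approaches, this keeps the surviving lines in their original order, at the cost of holding a count for every distinct line.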
     
  2. Brad100

    Brad100 Supreme Member

    Joined:
    Nov 9, 2014
    Messages:
    1,348
    Likes Received:
    966
    Gender:
    Male
    In Notepad++, use the Find and Replace feature. Put "C" in the "Find" box, leave the "Replace" box empty, and you're done.
     
  3. rashedhns

    rashedhns Junior Member

    Joined:
    Aug 13, 2014
    Messages:
    127
    Likes Received:
    9
    Maybe the entries are long words, not just single letters :)
     
  4. BigTroll

    BigTroll Jr. VIP

    Joined:
    Oct 15, 2014
    Messages:
    1,864
    Likes Received:
    725
    Occupation:
    CPA
    Location:
    ROMANIA
    Dude... I said "3m lines", not just ABCD.
    And no, I don't know which lines are duplicates.

    And no, I won't go through the lines one by one with Ctrl+F and replace. It would take me about 10 years.
     
  5. Brad100

    Brad100 Supreme Member

    Joined:
    Nov 9, 2014
    Messages:
    1,348
    Likes Received:
    966
    Gender:
    Male
    Yeah, it'll handle all 3 million lines; try it. It's Notepad++, not Notepad.
     
  6. BigTroll

    BigTroll Jr. VIP

    Joined:
    Oct 15, 2014
    Messages:
    1,864
    Likes Received:
    725
    Occupation:
    CPA
    Location:
    ROMANIA
    Man, you don't understand.
    I don't know which line is "C" in this 3M list, so I don't know what to put in the "Find" box.
    I need a tool to IDENTIFY ALL duplicates and remove them.
     
  7. ttrox

    ttrox Regular Member

    Joined:
    Jun 28, 2013
    Messages:
    217
    Likes Received:
    75
    Grab the TextFX plugin for Notepad++.
    Sort the lines in ascending or descending order.
    Then use a regex to remove the duplicate lines; after sorting, identical lines sit next to each other.
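    The sort-first idea can be sketched in Python as well, assuming the output order doesn't matter (sorting loses the original order). After sorting, identical lines are neighbours, so `itertools.groupby` can walk each run of equal lines and keep only runs of length 1. This mirrors the idea in the post, not the TextFX plugin itself; the function name is hypothetical.

```python
from itertools import groupby


def remove_all_duplicates_sorted(lines):
    """Sort the lines, then keep only those whose run of identical neighbours has length 1."""
    result = []
    for line, run in groupby(sorted(lines)):  # groupby yields runs of consecutive equal items
        if sum(1 for _ in run) == 1:  # a run of 1 means the line appeared only once
            result.append(line)
    return result


print(remove_all_duplicates_sorted(["D", "C", "A", "C", "B"]))  # ['A', 'B', 'D']
```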
     
  8. kcollier63

    kcollier63 Registered Member

    Joined:
    Nov 23, 2010
    Messages:
    66
    Likes Received:
    28
    Look up TextPipe. It is a pretty good program for working with files. Saved my ass a bunch of times...
     
  9. Nitros

    Nitros Power Member

    Joined:
    Jan 30, 2009
    Messages:
    580
    Likes Received:
    298
    Brad is right, Notepad++ will do this job just fine. It supports regular expressions.
     
  10. conrulez

    conrulez Power Member

    Joined:
    Dec 29, 2009
    Messages:
    539
    Likes Received:
    426
    Gender:
    Male
    Location:
    USA
  11. HelloInsomnia

    HelloInsomnia Jr. Executive VIP Jr. VIP

    Joined:
    Mar 1, 2009
    Messages:
    1,825
    Likes Received:
    2,936
    In Notepad++

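    Go to Edit -> Line Operations -> sort ascending

    Press Ctrl + H to open Find and Replace

    In the top box (Find what), enter: ^(.*)(\r?\n\1)+$

    Leave the second box (Replace with) empty

    Select "Regular expression" (bottom left) and make sure ". matches newline" is unchecked

    Click Replace All

    Then: Edit -> Line Operations -> Remove Empty Lines (this cleans up the blank line each removed group leaves behind)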

    That's it!
     
    Last edited: Mar 9, 2015
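    To see what that regex does, here is a rough Python equivalent of the Notepad++ steps, assuming Python's `re` treats this pattern the same way Notepad++ does. With `re.MULTILINE`, `^` and `$` match at line boundaries, and `.` does not cross newlines by default (like leaving ". matches newline" unchecked). The backreference `\1` forces every following line in the match to equal the first one, so the whole run of copies is replaced with nothing. The function name is hypothetical.

```python
import re

# ^(.*) captures a line from its start; (\r?\n\1)+ then requires one or more
# following lines identical to it; $ makes sure the last copy ends at a line boundary.
DUP_RUN = re.compile(r"^(.*)(\r?\n\1)+$", re.MULTILINE)


def remove_all_duplicates_regex(text):
    """Mimic the steps: sort, regex Replace All with nothing, remove empty lines."""
    sorted_text = "\n".join(sorted(text.splitlines()))  # the "sort ascending" step
    replaced = DUP_RUN.sub("", sorted_text)  # Replace All with an empty replacement
    # "Remove Empty Lines": each wiped-out run leaves an empty line behind
    return "\n".join(line for line in replaced.splitlines() if line)


print(remove_all_duplicates_regex("D\nC\nA\nC\nB").splitlines())  # ['A', 'B', 'D']
```

    Like the Notepad++ result, the surviving lines come out in sorted order, and any blank lines that were already in the file are removed too.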
  12. HelloInsomnia

    HelloInsomnia Jr. Executive VIP Jr. VIP

    Joined:
    Mar 1, 2009
    Messages:
    1,825
    Likes Received:
    2,936
    That site doesn't do what he needs.

    And that popup tries to get you to install a virus. Don't use that site anymore, try this one next time: http://textmechanic.com/
     