
how the hell do I do that? remove duplicates from a .txt

Discussion in 'BlackHat Lounge' started by BigTroll, Jan 28, 2016.

  1. BigTroll

    BigTroll Jr. VIP Jr. VIP

    Joined:
    Oct 15, 2014
    Messages:
    1,864
    Likes Received:
    725
    Occupation:
    CPA
    Location:
    ROMANIA
    Notepad++ crashes like hell.
    I have a 26M-line .txt (476 MB). I want to remove only the duplicate lines, how can I do that? :(
     
  2. AlexIonescu

    AlexIonescu Regular Member

    Joined:
    May 20, 2015
    Messages:
    428
    Likes Received:
    71
    Gender:
    Male
    Occupation:
    SMO & SEO
    Location:
    Paradise
    Select all - TextFX - TextFX Tools - Sort lines case sensitive or insensitive. Make sure the unique-lines option in the same menu (something like "+Sort outputs only UNIQUE (at column) lines") is ticked first, so the sort also drops the duplicates.

    TextFX is a plugin for Notepad++. If you don't have it, you'll need to download it.

    Hope this helps
     
  3. BigTroll

    BigTroll Jr. VIP Jr. VIP

    Joined:
    Oct 15, 2014
    Messages:
    1,864
    Likes Received:
    725
    Occupation:
    CPA
    Location:
    ROMANIA
    Dude, I just said that Notepad++ crashes on such a large file :(
     
  4. AlexIonescu

    AlexIonescu Regular Member

    Joined:
    May 20, 2015
    Messages:
    428
    Likes Received:
    71
    Gender:
    Male
    Occupation:
    SMO & SEO
    Location:
    Paradise
    Try this

    Hopefully your browser will not crash.

    Or just try to split the file into multiple files
     
  5. BigTroll

    BigTroll Jr. VIP Jr. VIP

    Joined:
    Oct 15, 2014
    Messages:
    1,864
    Likes Received:
    725
    Occupation:
    CPA
    Location:
    ROMANIA
    It doesn't work.
    And how am I supposed to remove duplicates if all the lines aren't in the same .txt? :s
     
  6. AlexIonescu

    AlexIonescu Regular Member

    Joined:
    May 20, 2015
    Messages:
    428
    Likes Received:
    71
    Gender:
    Male
    Occupation:
    SMO & SEO
    Location:
    Paradise
    Select the first million lines, remove the duplicates from that chunk, then the next... and eventually the files will get smaller.
    Then combine them step by step. It needs some work, but it can be done. I've processed huge files this way.
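    If you'd rather script that than do it by hand, here's a rough Python sketch of the same split-then-recombine idea (the file names and chunk size below are just placeholders, and the final merge still needs enough RAM to hold all the unique lines):

    Code:
    # Rough sketch of the split -> dedupe each chunk -> recombine approach.
    # "input.txt", "deduped.txt" and CHUNK_LINES are made-up placeholders.
    CHUNK_LINES = 1_000_000  # "the first million, then the next..."

    def split_and_dedupe(src="input.txt"):
        # Pass 1: write the source out as chunk files with in-chunk duplicates removed.
        chunk_files = []
        with open(src, encoding="utf-8", errors="ignore") as fin:
            idx, eof = 0, False
            while not eof:
                seen, lines = set(), []
                for _ in range(CHUNK_LINES):
                    line = fin.readline()
                    if not line:              # end of file
                        eof = True
                        break
                    if line not in seen:      # drop duplicates inside this chunk
                        seen.add(line)
                        lines.append(line)
                if lines:
                    name = "chunk_%d.txt" % idx
                    with open(name, "w", encoding="utf-8") as fout:
                        fout.writelines(lines)
                    chunk_files.append(name)
                    idx += 1
        return chunk_files

    def merge_chunks(chunk_files, dst="deduped.txt"):
        # Pass 2: combine the chunks, skipping lines already seen in an earlier chunk.
        # This still keeps every unique line in memory, so it needs RAM roughly
        # the size of the deduped data.
        seen = set()
        with open(dst, "w", encoding="utf-8") as fout:
            for name in chunk_files:
                with open(name, encoding="utf-8", errors="ignore") as fin:
                    for line in fin:
                        if line not in seen:
                            seen.add(line)
                            fout.write(line)

    if __name__ == "__main__":
        merge_chunks(split_and_dedupe())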
     
  7. RalphSEO

    RalphSEO Newbie

    Joined:
    Aug 25, 2014
    Messages:
    37
    Likes Received:
    19
    vim text editor or command line
     
    • Thanks x 1
  8. rexstacy

    rexstacy Junior Member

    Joined:
    Oct 28, 2015
    Messages:
    170
    Likes Received:
    87
    Gender:
    Male
    Occupation:
    Full Time IM
    Location:
    Dominican Republic
    Go the programming way.
    Search Google for a Python script to remove duplicate lines; the job will be done in a couple of minutes.
    Of course, you will have to install Python first.
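    Something like this minimal sketch would do it ("input.txt" / "output.txt" are placeholders, swap in your real file names):

    Code:
    # Keep the first occurrence of each line, preserving the original order.
    seen = set()
    with open("input.txt", encoding="utf-8", errors="ignore") as fin, \
         open("output.txt", "w", encoding="utf-8") as fout:
        for line in fin:
            if line not in seen:   # only write lines we haven't seen before
                seen.add(line)
                fout.write(line)

    It holds every unique line in memory, so it will want a decent chunk of RAM on a 26M-line file, but it only makes one pass.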
     
  9. qrazy

    qrazy Senior Member

    Joined:
    Mar 19, 2012
    Messages:
    1,115
    Likes Received:
    1,723
    Location:
    Banana Republic
    Try EditPlus. It has worked well for me in the past. Just make sure you use the 64-bit version, as 32-bit applications have a memory limit of 2 GB.
     
    Last edited: Jan 28, 2016
  10. trex6187

    trex6187 Newbie

    Joined:
    Sep 15, 2013
    Messages:
    24
    Likes Received:
    5
    Try this PHP script; I don't know if it will work for such a large file...
    <?php
    // Read every line of the file into an array (this loads the whole file into memory).
    $array = file("yourfile.txt");
    // Drop duplicate lines, keeping the first occurrence of each.
    $array = array_unique($array);
    // Write what's left back out to a new file.
    file_put_contents('yourfileunique.txt', implode($array));
    ?>
     
    • Thanks x 1
  11. BigTroll

    BigTroll Jr. VIP Jr. VIP

    Joined:
    Oct 15, 2014
    Messages:
    1,864
    Likes Received:
    725
    Occupation:
    CPA
    Location:
    ROMANIA
    Just downloaded EditPlus, what a piece of shit...
     
  12. BigTroll

    BigTroll Jr. VIP Jr. VIP

    Joined:
    Oct 15, 2014
    Messages:
    1,864
    Likes Received:
    725
    Occupation:
    CPA
    Location:
    ROMANIA
    Could anyone help me, please? :(
     
  13. Automated

    Automated Regular Member

    Joined:
    Jun 7, 2012
    Messages:
    289
    Likes Received:
    123
    Location:
    Online
    Split the file into smaller lists, remove the duplicates from each of the smaller lists, then combine the lists and dedupe the main list...
     
  14. BigTroll

    BigTroll Jr. VIP Jr. VIP

    Joined:
    Oct 15, 2014
    Messages:
    1,864
    Likes Received:
    725
    Occupation:
    CPA
    Location:
    ROMANIA
    Out of 26M lines, only 3-4M are duplicates. It wouldn't work.
     
  15. avi619

    avi619 Jr. VIP Jr. VIP

    Joined:
    Apr 1, 2012
    Messages:
    1,370
    Likes Received:
    1,892
    Location:
    Somewhere out there
    Last edited: Jan 28, 2016
  16. BigTroll

    BigTroll Jr. VIP Jr. VIP

    Joined:
    Oct 15, 2014
    Messages:
    1,864
    Likes Received:
    725
    Occupation:
    CPA
    Location:
    ROMANIA
    Doesn't work.

    Help me guys please :(
     
  17. berkay1907

    berkay1907 Senior Member

    Joined:
    Mar 25, 2012
    Messages:
    1,127
    Likes Received:
    617
    Scrapebox can solve your problem if you have it.
     
  18. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,569
    Likes Received:
    11,034
    Occupation:
    Pusillanimous Knitter
    Location:
    Buenos Aires
    On a Linux console:

    cd to the folder that has the file

    Code:
    sort filename | uniq
    
    replace "filename" with the actual filename
     
  19. gashead

    gashead Junior Member

    Joined:
    Jun 15, 2014
    Messages:
    165
    Likes Received:
    50
  20. rere003

    rere003 Newbie

    Joined:
    Sep 22, 2012
    Messages:
    33
    Likes Received:
    17
    Location:
    New Java
    You can use the Scrapebox DupRemove addon.