Remove dup lines from nodes.txt

Discussion in 'Black Hat SEO' started by MariosElGreco, Nov 2, 2016.

  1. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    271
    Likes Received:
    10
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    Hi everyone, good morning.
    Does anyone know a better way to remove the duplicate lines from XRumer's nodes.txt?
    I tried some software, but it freezes or gets stuck (it's a 10 GB file).
    Maybe a Python script?
     
  2. HoNeYBiRD

    HoNeYBiRD Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    8,051
    Likes Received:
    8,884
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    There are a lot of results on Google for this: https://www.google.com/search?q=remove+duplicates+from+big+file, which lead to Stack Overflow and other places. Try a few.

    Alternatively, if you have a lot of duplicate lines, you can split the big file into smaller chunks, run your dupe remover on each chunk, join the results into one file, split it again, remove the dupes again, and repeat until you have the desired result.
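
A minimal Python sketch of a variant of this split-and-dedupe idea. Instead of cutting the file into sequential chunks, it assigns each line to a temporary "bucket" file by hash, so identical lines always land in the same bucket and a single pass per bucket is enough. The function name `dedupe_by_buckets`, the bucket count, and the file names are assumptions for illustration, and the output order differs from the input order.

```python
import os
import tempfile
import zlib


def dedupe_by_buckets(src_path, dst_path, n_buckets=64):
    """Remove duplicate lines using hash-partitioned temporary files."""
    tmp_dir = tempfile.mkdtemp(prefix="dedupe_")
    paths = [os.path.join(tmp_dir, f"bucket_{i}.txt") for i in range(n_buckets)]
    buckets = [open(p, "wb") for p in paths]
    try:
        with open(src_path, "rb") as src:
            for raw in src:
                line = raw.rstrip(b"\r\n")
                # Identical lines get the same checksum, hence the same bucket.
                idx = zlib.crc32(line) % n_buckets
                buckets[idx].write(line + b"\n")
    finally:
        for bucket in buckets:
            bucket.close()

    with open(dst_path, "wb") as dst:
        for p in paths:
            # Only one bucket's unique lines are held in memory at a time.
            seen = set()
            with open(p, "rb") as bucket:
                for line in bucket:
                    if line not in seen:
                        seen.add(line)
                        dst.write(line)
            os.remove(p)
    os.rmdir(tmp_dir)
```

With more buckets, each one is smaller and needs less memory, at the cost of more open files during the first pass. The temporary files need roughly as much free disk space as the input.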
     
  3. Nargil

    Nargil Jr. VIP

    Joined:
    May 10, 2012
    Messages:
    6,111
    Likes Received:
    3,880
    Location:
    Europe
    Home Page:
    Scrapebox ftw
     
  4. MariosElGreco

    MariosElGreco Regular Member

    I tried it, it got stuck :p
     
  5. mnunes532

    mnunes532 Elite Member

    Joined:
    Jan 21, 2014
    Messages:
    1,757
    Likes Received:
    588
    Gender:
    Male
    Location:
    Portugal
  6. Nargil

    Nargil Jr. VIP

    Haha, well, in that case, try their "dup remover" plugin and split the file into a few smaller pieces. That's how I do it.
     
  7. MariosElGreco

    MariosElGreco Regular Member

  8. MariosElGreco

    MariosElGreco Regular Member

    No luck at all: it gets stuck and the software closes itself... I will try asking the Scrapebox team whether they can handle it, but I don't believe in miracles.
     
  9. dabp

    dabp Junior Member

    Joined:
    Sep 24, 2016
    Messages:
    145
    Likes Received:
    33
    Gender:
    Male
    I don't know the exact nature of your problem, but for deleting duplicate lines I use Excel.

    [​IMG]

    Hope this helps :)
     
  10. Nargil

    Nargil Jr. VIP

    Mate, he is talking about a 10 GB .txt file. Excel won't ever handle that. :)
     
  11. ankit03

    ankit03 Jr. VIP

    Joined:
    Apr 3, 2016
    Messages:
    1,869
    Likes Received:
    163
    Try TextMechanic
     
  12. dabp

    dabp Junior Member

    Alright, as I said, I didn't know the nature of it :) Ignore my advice :)
     
  13. MariosElGreco

    MariosElGreco Regular Member

    Well, most of this advice can't help me. Yes, it's a 10 GB file, but that's not all: every 2 lines (URL and content) count as 1 record, so it needs a more advanced duplicate remover.
    I will try posting in the XRumer section. After all, they can create a script for that!
     
  14. botrockets

    botrockets Regular Member

    Joined:
    Mar 16, 2013
    Messages:
    360
    Likes Received:
    555
    I have created a tool for removing duplicates from files of around 100 GB!
    But it's not free!
     
  15. MariosElGreco

    MariosElGreco Regular Member

    Give it a shot and tell me the price.
     
  16. BlogPro

    BlogPro Jr. VIP

    Joined:
    Apr 23, 2012
    Messages:
    825
    Likes Received:
    795
    What OS are you on?

    Are the duplicate lines adjacent, or are they spread across the entire 10 GB file?
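
Why adjacency matters: if every duplicate sits directly after its original, one streaming pass that remembers only the previous line is enough, and memory use stays constant no matter how big the file is. A minimal Python sketch of that adjacent-only case follows (the function name `drop_adjacent_duplicates` and the file names are assumptions for illustration). Duplicates that are spread across the file survive this pass.

```python
def drop_adjacent_duplicates(src_path, dst_path):
    """Copy src to dst, skipping any line equal to the line just before it."""
    previous = None
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for raw in src:
            line = raw.rstrip(b"\r\n")
            # Only the immediately preceding line is compared.
            if line != previous:
                dst.write(line + b"\n")
                previous = line
```

For example, a file containing the lines a, a, b, a, a comes out as a, b, a: the second run of "a" is kept because it is not adjacent to the first.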
     
  17. Pakal

    Pakal Jr. VIP

    Joined:
    Dec 6, 2015
    Messages:
    125
    Likes Received:
    59
    Gender:
    Male
    Location:
    http://proxylte.com
    If you have access to a Linux machine it should be fairly simple, just use the command below:

    Code:
    # uniq alone only drops adjacent duplicates, so sort first (note: this reorders the lines)
    sort -u nodes.txt > nodes-unique.txt
     
  18. MariosElGreco

    MariosElGreco Regular Member

    http://www.heypasteit.com/clip/32R7
    For example, at this link there are 2 duplicate records (by record I mean the URL followed by its content). If the content is the same but the URL is different, it is not a duplicate.


    Windows 7 / Kali Linux
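
A minimal Python sketch of the record-based rule described here, assuming the file strictly alternates one URL line and one content line (the thread does not confirm the exact layout of nodes.txt). Each URL and content pair is treated as one record, and only a short digest of each record is kept in memory rather than the text itself. The function name `dedupe_records` and the file names are assumptions for illustration.

```python
import hashlib


def dedupe_records(src_path, dst_path):
    """Keep the first copy of each (URL line, content line) record."""
    seen = set()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            url = src.readline()
            if not url:
                break  # end of file
            content = src.readline()  # empty if the file ends on a URL line
            url = url.rstrip(b"\r\n")
            content = content.rstrip(b"\r\n")
            # Hash URL and content together: the same content under a
            # different URL produces a different key and is kept.
            key = hashlib.blake2b(url + b"\n" + content, digest_size=16).digest()
            if key not in seen:
                seen.add(key)
                dst.write(url + b"\n" + content + b"\n")
```

Memory use grows with the number of unique records rather than with the file size, so whether this fits in RAM for a 10 GB file depends on how many distinct records it holds. The original order of the records is preserved.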
     
  19. Aty

    Aty Jr. VIP

    Joined:
    Jan 27, 2011
    Messages:
    6,147
    Likes Received:
    4,183
    Home Page:
    Try Scrapebox, but at 10 gigs all you need is patience and a prayer so your software doesn't crash.
     
  20. Pakal

    Pakal Jr. VIP

    Perfect, if you run Kali Linux you can use the command I posted in my previous answer. That should do the trick, though it could take a while.