
Remove dup lines from nodes.txt

Discussion in 'Black Hat SEO' started by MariosElGreco, Nov 2, 2016.

  1. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    268
    Likes Received:
    8
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    Hi everyone, good morning.
    Does anyone know a good way to remove the duplicate lines from XRumer's nodes.txt?
    I tried some software, but they all freeze or get stuck (it's a 10 GB file).
    Maybe a Python script?
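    Something like this minimal Python sketch is what I have in mind (it streams the file and keeps only one 16-byte hash per unique line, so the 10 GB file never has to fit in memory; file names are just examples):

    Code:
    import hashlib

    def dedup_lines(src_path, dst_path):
        # remember a 16-byte digest per unique line instead of the line itself
        seen = set()
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            for line in src:
                digest = hashlib.md5(line).digest()
                if digest not in seen:
                    seen.add(digest)
                    dst.write(line)

    dedup_lines("nodes.txt", "nodes-unique.txt")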
     
  2. HoNeYBiRD

    HoNeYBiRD Jr. VIP Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    7,290
    Likes Received:
    8,260
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    There are a lot of results on Google for this: https://www.google.com/search?q=remove+duplicates+from+big+file, which lead to Stack Overflow and other places; try a few.

    If you have a lot of duplicate lines, alternatively you can just split the big file into smaller chunks, run your dupe remover application, join the new files into one, split them again, remove the dupes... and do this until you have the desired result.
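    One way to make the split-and-dedupe idea deterministic is to partition lines by hash first, so every copy of a given line always lands in the same chunk; each chunk can then be deduped on its own and the results joined once. A rough Python sketch (the partition count and file names are just examples, and the original line order is not preserved):

    Code:
    import hashlib
    import os

    N = 64  # number of partitions; raise it if one partition is still too big for RAM

    def external_dedup(src_path, dst_path, tmp_dir="dedup_tmp"):
        os.makedirs(tmp_dir, exist_ok=True)
        # pass 1: route every line to a partition chosen by its hash,
        # so duplicates always end up in the same partition file
        parts = [open(os.path.join(tmp_dir, "part%d.txt" % i), "wb") for i in range(N)]
        with open(src_path, "rb") as src:
            for line in src:
                i = int.from_bytes(hashlib.md5(line).digest()[:4], "big") % N
                parts[i].write(line)
        for f in parts:
            f.close()
        # pass 2: dedupe each partition in memory, append to the output
        with open(dst_path, "wb") as dst:
            for i in range(N):
                seen = set()
                with open(os.path.join(tmp_dir, "part%d.txt" % i), "rb") as f:
                    for line in f:
                        if line not in seen:
                            seen.add(line)
                            dst.write(line)

    external_dedup("nodes.txt", "nodes-unique.txt")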
     
  3. Nargil

    Nargil Jr. VIP Jr. VIP

    Joined:
    May 10, 2012
    Messages:
    4,603
    Likes Received:
    2,945
    Location:
    Europe
    Home Page:
    Scrapebox ftw
     
  4. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    268
    Likes Received:
    8
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    I tried it, it got stuck :p
     
  5. mnunes532

    mnunes532 Supreme Member

    Joined:
    Jan 21, 2014
    Messages:
    1,433
    Likes Received:
    463
    Gender:
    Male
    Location:
    Portugal
  6. Nargil

    Nargil Jr. VIP Jr. VIP

    Joined:
    May 10, 2012
    Messages:
    4,603
    Likes Received:
    2,945
    Location:
    Europe
    Home Page:
    Haha, well, in that case, try their "dup remover" plugin and split the file into a few smaller pieces. That's how I do it.
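    If the splitting itself is the part that chokes, a rough Python sketch like this can cut the file into pieces without ever loading it into memory (the chunk size is arbitrary):

    Code:
    def split_file(src_path, lines_per_chunk=5_000_000):
        # stream the source and start a new chunk every `lines_per_chunk` lines
        with open(src_path, "rb") as src:
            chunk_no, count, dst = 0, 0, None
            for line in src:
                if dst is None or count >= lines_per_chunk:
                    if dst:
                        dst.close()
                    chunk_no += 1
                    count = 0
                    dst = open("%s.part%d" % (src_path, chunk_no), "wb")
                dst.write(line)
                count += 1
            if dst:
                dst.close()

    split_file("nodes.txt")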
     
  7. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    268
    Likes Received:
    8
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
  8. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    268
    Likes Received:
    8
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    No luck at all, it got stuck and the software auto-closed... I will try to speak with the Scrapebox team to see if they can make it work, but I don't believe in miracles.
     
  9. dabp

    dabp Junior Member

    Joined:
    Sep 24, 2016
    Messages:
    143
    Likes Received:
    31
    Gender:
    Male
    I don't know the exact nature of your problem, but for deleting duplicate lines I use Excel.

    [Image: screenshot of removing duplicates in Excel]

    Hope this helps :)
     
  10. Nargil

    Nargil Jr. VIP Jr. VIP

    Joined:
    May 10, 2012
    Messages:
    4,603
    Likes Received:
    2,945
    Location:
    Europe
    Home Page:
    Mate, he is talking about a 10 GB .txt file. Excel won't ever handle that. :)
     
  11. ankit03

    ankit03 Jr. VIP Jr. VIP

    Joined:
    Apr 3, 2016
    Messages:
    1,413
    Likes Received:
    123
    Try TextMechanic.
     
  12. dabp

    dabp Junior Member

    Joined:
    Sep 24, 2016
    Messages:
    143
    Likes Received:
    31
    Gender:
    Male
    Alright, as I said, I didn't know the nature of it :) Ignore my advice :)
     
  13. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    268
    Likes Received:
    8
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    Well, most of the advice can't help me. Yes, it's a 10 GB file, and that's not all: every 2 lines (URL and content) count as 1 record, so it needs an advanced duplicate remover.
    I will try to post it in the XRumer section; after all, they can create a script for that!
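    Something that treats every two lines as one record might look like this rough Python sketch (it assumes the file is strictly a URL line followed by a content line, and dedupes on the pair, so the same content under a different URL is kept; file names are just examples):

    Code:
    import hashlib

    def dedup_records(src_path, dst_path):
        # hash each (url, content) pair so the same content under a
        # different URL is NOT treated as a duplicate
        seen = set()
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            while True:
                url = src.readline()
                content = src.readline()
                if not url:
                    break  # end of file
                digest = hashlib.md5(url + b"\x00" + content).digest()
                if digest not in seen:
                    seen.add(digest)
                    dst.write(url)
                    dst.write(content)

    dedup_records("nodes.txt", "nodes-unique.txt")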
     
  14. botrockets

    botrockets Regular Member

    Joined:
    Mar 16, 2013
    Messages:
    355
    Likes Received:
    550
    Gender:
    Male
    Occupation:
    Entrepreneur
    Location:
    BotRockets
    I have created a tool for removing duplicates from files around 100 GB in size!
    But it's not free!
     
  15. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    268
    Likes Received:
    8
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    Give it a shot and tell me the price.
     
  16. BlogPro

    BlogPro Jr. VIP Jr. VIP

    Joined:
    Apr 23, 2012
    Messages:
    551
    Likes Received:
    475
    Home Page:
    What OS are you on?

    Are the duplicate lines adjacent? Or are they spread across your entire 10 GB of file?
     
  17. Pakal

    Pakal Junior Member

    Joined:
    Dec 6, 2015
    Messages:
    116
    Likes Received:
    55
    Gender:
    Male
    Location:
    http://bit.cards
    If you have access to a Linux machine it should be fairly simple. Note that uniq on its own only removes adjacent duplicates, so the file has to be sorted first; sort -u does both in one pass:

    Code:
    sort -u nodes.txt > nodes-unique.txt
     
  18. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    268
    Likes Received:
    8
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    http://www.heypasteit.com/clip/32R7
    For example, in this link there are 2 duplicate lines (by "lines" I mean the URL and, after the URL, the content). If it has the same content but a different URL, it is not a duplicate.


    Windows 7 / Kali Linux
     
  19. Aty

    Aty Jr. VIP Jr. VIP

    Joined:
    Jan 27, 2011
    Messages:
    5,988
    Likes Received:
    4,083
    Occupation:
    SEO (Senior Erection Officer)
    Location:
    your 6 o'clock
    Home Page:
    Try Scrapebox, but at 10 gigs all you need is patience and a prayer so your software doesn't crash.
     
  20. Pakal

    Pakal Junior Member

    Joined:
    Dec 6, 2015
    Messages:
    116
    Likes Received:
    55
    Gender:
    Male
    Location:
    http://bit.cards
    Perfect, if you run Kali Linux you can use the command I posted in my previous answer. That should do the trick, though it could take a while.