
Remove dup lines from nodes.txt

Discussion in 'Black Hat SEO' started by MariosElGreco, Nov 2, 2016.

  1. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    270
    Likes Received:
    10
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    Hi everyone, good morning.
    Does anyone know a better way to remove the duplicate lines from xrumer's nodes.txt?
    I tried some software, but it keeps freezing/getting stuck (it's a 10 GB file).
    Maybe a Python script?
     
  2. HoNeYBiRD

    HoNeYBiRD Jr. VIP Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    7,496
    Likes Received:
    8,427
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    There are a lot of results on Google for this: https://www.google.com/search?q=remove+duplicates+from+big+file, which lead to stackoverflow and other places, try a few.

    If you have a lot of duplicate lines, alternatively you can just split the big file into smaller chunks, run your dupe remover application, join the new files into one, split them again, remove the dupes... and do this until you have the desired result.
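
    If splitting blindly doesn't converge, one way to make the chunking reliable is to partition lines by a hash, so every copy of a line ends up in the same chunk and a single dedupe pass per chunk is enough. A rough Python sketch of that idea (the filenames and the partition count are just examples):

    Code:
    # Partition the big file by line hash so every copy of a line lands in the
    # same chunk, then dedupe each chunk in memory and join the results.
    import hashlib

    NUM_PARTS = 64  # pick this so each chunk fits comfortably in RAM

    def partition(path="nodes.txt"):
        outs = [open("part_%02d.txt" % i, "w", encoding="utf-8", errors="replace")
                for i in range(NUM_PARTS)]
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            for line in f:
                idx = int(hashlib.md5(line.encode("utf-8")).hexdigest(), 16) % NUM_PARTS
                outs[idx].write(line)
        for o in outs:
            o.close()

    def dedupe_and_join(out_path="nodes-unique.txt"):
        with open(out_path, "w", encoding="utf-8", errors="replace") as out:
            for i in range(NUM_PARTS):
                seen = set()
                with open("part_%02d.txt" % i, "r", encoding="utf-8", errors="replace") as f:
                    for line in f:
                        if line not in seen:
                            seen.add(line)
                            out.write(line)

    if __name__ == "__main__":
        partition()
        dedupe_and_join()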
     
  3. Nargil

    Nargil Jr. VIP Jr. VIP

    Joined:
    May 10, 2012
    Messages:
    5,014
    Likes Received:
    3,180
    Location:
    Europe
    Home Page:
    Scrapebox ftw
     
  4. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    270
    Likes Received:
    10
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    I tried it, it got stuck :p
     
  5. mnunes532

    mnunes532 Supreme Member

    Joined:
    Jan 21, 2014
    Messages:
    1,438
    Likes Received:
    468
    Gender:
    Male
    Location:
    Portugal
  6. Nargil

    Nargil Jr. VIP Jr. VIP

    Joined:
    May 10, 2012
    Messages:
    5,014
    Likes Received:
    3,180
    Location:
    Europe
    Home Page:
    Haha, well, in that case, try their "dup remover" plugin and split the file into a few smaller pieces. That's how I do it.
     
  7. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    270
    Likes Received:
    10
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
  8. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    270
    Likes Received:
    10
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    No luck at all, it got stuck and the software auto-closed... I will try to ask the Scrapebox team if they can do that for me, but I don't believe in miracles.
     
  9. dabp

    dabp Junior Member

    Joined:
    Sep 24, 2016
    Messages:
    143
    Likes Received:
    31
    Gender:
    Male
    I don't know the exact nature of your problem, but for deleting duplicate lines I use Excel.


    Hope this helps :)
     
  10. Nargil

    Nargil Jr. VIP Jr. VIP

    Joined:
    May 10, 2012
    Messages:
    5,014
    Likes Received:
    3,180
    Location:
    Europe
    Home Page:
    Mate, he is talking about a 10 GB .txt file. Excel won't handle that, ever. :)
     
  11. ankit03

    ankit03 Jr. VIP Jr. VIP

    Joined:
    Apr 3, 2016
    Messages:
    1,579
    Likes Received:
    143
    try TextMechanic
     
  12. dabp

    dabp Junior Member

    Joined:
    Sep 24, 2016
    Messages:
    143
    Likes Received:
    31
    Gender:
    Male
    Alright, as I said, I didn't know the nature of it :) Ignore my advice :)
     
  13. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    270
    Likes Received:
    10
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    Well, most of the advice can't help me. Yes, it's a 10 GB file, and that's not the only problem: every 2 lines (URL and content) count as 1 entry, so it needs an advanced duplicate remover.
    I will try to post it in the xrumer section; after all, they can create a script for that!
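
    A rough Python sketch of such a record-aware dedupe, assuming every record is exactly one URL line followed by one content line (the filenames are placeholders); it keeps only a small hash per record in memory, so even a 10 GB file should stay manageable:

    Code:
    # Treat every (URL line, content line) pair as one record and drop repeated
    # records; only a 16-byte hash per record is kept in memory.
    import hashlib

    def dedupe_pairs(src="nodes.txt", dst="nodes-unique.txt"):
        seen = set()
        with open(src, "r", encoding="utf-8", errors="replace") as fin, \
             open(dst, "w", encoding="utf-8", errors="replace") as fout:
            while True:
                url = fin.readline()
                if not url:               # end of file
                    break
                content = fin.readline()  # "" if the file ends on an odd line
                key = hashlib.blake2b((url + content).encode("utf-8"),
                                      digest_size=16).digest()
                if key not in seen:
                    seen.add(key)
                    fout.write(url)
                    fout.write(content)

    if __name__ == "__main__":
        dedupe_pairs()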
     
  14. botrockets

    botrockets Regular Member

    Joined:
    Mar 16, 2013
    Messages:
    355
    Likes Received:
    551
    Gender:
    Male
    Occupation:
    Entrepreneur
    Location:
    BotRockets
    I have created a tool for removing duplicates from files around 100 GB in size!
    But it's not free!
     
  15. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    270
    Likes Received:
    10
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    Give it a shot and tell me the price.
     
  16. BlogPro

    BlogPro Jr. VIP Jr. VIP

    Joined:
    Apr 23, 2012
    Messages:
    565
    Likes Received:
    497
    Home Page:
    What OS are you on?

    Are the duplicate lines adjacent? Or are they spread across the entire 10 GB file?
     
  17. Pakal

    Pakal Junior Member

    Joined:
    Dec 6, 2015
    Messages:
    120
    Likes Received:
    57
    Gender:
    Male
    Location:
    http://bit.cards
    If you have access to a Linux machine it should be fairly simple. Note that uniq on its own only removes adjacent duplicate lines, so the file has to be sorted first (GNU sort spills to temporary files, so it copes with files bigger than RAM). Just use the command below:

    Code:
    sort -u nodes.txt > nodes-unique.txt
     
  18. MariosElGreco

    MariosElGreco Regular Member

    Joined:
    May 31, 2014
    Messages:
    270
    Likes Received:
    10
    Gender:
    Male
    Occupation:
    http://eas-seo.com
    Location:
    Greece
    Home Page:
    http://www.heypasteit.com/clip/32R7
    For example, in this link there are 2 duplicate lines (by "lines" I mean the URL and, right after the URL, the content); if it has the same content but a different URL, it is not a duplicate.


    Windows 7 / Kali Linux
     
  19. Aty

    Aty Jr. VIP Jr. VIP

    Joined:
    Jan 27, 2011
    Messages:
    5,994
    Likes Received:
    4,088
    Occupation:
    SEO (Senior Erection Officer)
    Location:
    your 6 o'clock
    Home Page:
    Try Scrapebox, but at 10 gigs all you need is patience and a prayer so your software doesn't crash.
     
  20. Pakal

    Pakal Junior Member

    Joined:
    Dec 6, 2015
    Messages:
    120
    Likes Received:
    57
    Gender:
    Male
    Location:
    http://bit.cards
    Perfect, if you run Kali Linux you can use the command I posted in my previous answer. That should do the trick, but it could take a while.