Remove dup lines from nodes.txt

MariosElGreco

Regular Member
Joined
May 31, 2014
Messages
277
Reaction score
10
Hi everyone, good morning.
Does anyone know a better way to remove the duplicate lines from XRumer's nodes.txt?
I tried some software but it freezes / gets stuck (it's a 10 GB file).
Any Python script, maybe?
 
There are a lot of results on Google for this: https://www.google.com/search?q=remove+duplicates+from+big+file, which lead to Stack Overflow and other places; try a few.

If you have a lot of duplicate lines, you can alternatively split the big file into smaller chunks, run your dupe remover on each, join the new files into one, split them again, remove the dupes... and repeat until you have the desired result.
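One way to make the split-and-dedupe idea reliable is to split by a hash of each line instead of by position, so identical lines always land in the same chunk and can never survive in two different chunks. A rough Python sketch of that, assuming plain line-level duplicates; the file names and bucket count are just placeholders:

Code:
import hashlib
import os

# Split-dedupe-join, but splitting by a HASH of each line instead of by
# position, so identical lines always land in the same bucket file.
# SRC/DST names and the bucket count are placeholder choices.
SRC = "nodes.txt"
DST = "nodes-unique.txt"
BUCKETS = 64  # more buckets = less RAM needed per bucket

# Pass 1: scatter every line into a bucket chosen by its hash.
buckets = [open("bucket_%02d.tmp" % i, "w", encoding="utf-8", errors="replace")
           for i in range(BUCKETS)]
with open(SRC, encoding="utf-8", errors="replace") as src:
    for line in src:
        idx = int(hashlib.md5(line.encode("utf-8")).hexdigest(), 16) % BUCKETS
        buckets[idx].write(line)
for b in buckets:
    b.close()

# Pass 2: each bucket is only ~1/64 of the file, so it can be deduped with
# an in-memory set; append the unique lines, then delete the bucket.
with open(DST, "w", encoding="utf-8", errors="replace") as dst:
    for i in range(BUCKETS):
        name = "bucket_%02d.tmp" % i
        seen = set()
        with open(name, encoding="utf-8", errors="replace") as bucket:
            for line in bucket:
                if line not in seen:
                    seen.add(line)
                    dst.write(line)
        os.remove(name)

The trade-off is that the output comes out grouped by bucket rather than in the original line order, which usually doesn't matter for a list like nodes.txt.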
 
Scrapebox ftw
 
Haha, well, in that case, try their "dup remover" plugin and split the file into a few smaller pieces. That's how I do it.
 
No luck at all, it got stuck and the software auto-closed... I will try to speak with the Scrapebox team to see if they can do it for me, but I don't believe in miracles.
 
I don't know the exact nature of your problem, but for deleting duplicate lines I use Excel.

Hope this helps :)
 
Mate, he is talking about a 10 GB .txt file. Excel won't ever handle that (it tops out at 1,048,576 rows). :)
 
Well, most of the advice can't help me. Yes, it's a 10 GB file, and that's not all: every 2 lines (URL and content) count as 1 record, so it needs a more advanced duplicate remover.
I will try to post it in the XRumer section; after all, they can create a script for that!
 
I have created a tool for removing duplicates from files around 100 GB in size!
But it's not free!
 
What OS are you on?

Are the duplicate lines adjacent? Or are they spread across the entire 10 GB file?
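This matters because if duplicates are only ever adjacent, they can be stripped in a single streaming pass with constant memory, no sorting or hashing needed, even on a 10 GB file. A tiny Python sketch, with file names assumed:

Code:
# Removes ADJACENT duplicate lines only (what the Unix uniq tool does):
# one streaming pass, constant memory, so the file size doesn't matter.
prev = None
with open("nodes.txt", encoding="utf-8", errors="replace") as src, \
        open("nodes-unique.txt", "w", encoding="utf-8", errors="replace") as dst:
    for line in src:
        if line != prev:
            dst.write(line)
            prev = line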
 
If you have access to a Linux machine it should be fairly simple; just note that uniq on its own only removes adjacent duplicates, so sort first (GNU sort does an external merge sort, so it copes with files bigger than RAM as long as there is enough free disk space for its temporary files):

Code:
sort nodes.txt | uniq > nodes-unique.txt
 
Try Scrapebox, but at 10 gigs all you need is patience and a prayer that your software doesn't crash.
 
http://www.heypasteit.com/clip/32R7
For example, in this link there are 2 duplicate records (by "lines" I mean the URL and the content after it); if it has the same content but a different URL, it is not a duplicate.
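Since the duplicate key is the URL plus the content line together, a plain line-based dupe remover won't do; each record has to be handled as a pair. A minimal Python sketch of the idea, hashing every two-line record so only about 16 bytes per unique record stay in memory (file names are placeholders; if even the hash set outgrows RAM, this can be combined with the hash-bucket splitting sketched earlier in the thread):

Code:
import hashlib

# One record = two lines (URL line + content line); it is a duplicate only
# if BOTH lines match an earlier record. Storing a 16-byte hash per unique
# record keeps memory bounded. File names are assumptions.
SRC = "nodes.txt"
DST = "nodes-unique.txt"

seen = set()
with open(SRC, encoding="utf-8", errors="replace") as src, \
        open(DST, "w", encoding="utf-8", errors="replace") as dst:
    while True:
        url = src.readline()
        if not url:               # end of file
            break
        content = src.readline()  # may be empty if the file ends mid-record
        key = hashlib.md5((url + content).encode("utf-8")).digest()
        if key not in seen:
            seen.add(key)
            dst.write(url + content)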


Windows 7 / Kali Linux

Perfect, if you run Kali Linux you can use the command I posted in my previous answer. That should do the trick, though it could take a while.
 