
How to compare two .txt files and delete all lines that don't contain text from the second file?

Discussion in 'Black Hat SEO Tools' started by NamenloserHeld, Feb 8, 2016.

  1. NamenloserHeld

    NamenloserHeld BANNED

    Joined:
    Nov 23, 2015
    Messages:
    76
    Likes Received:
    38
    Hey, I hope it's OK to ask about this in SEO Tools. I need a tool/program/web app that can compare two .txt files; I hope the title makes clear what I mean. It's very hard to google a solution for this. You will find many threads on stackoverflow.com or superuser.com. I tried some wordlist tools, but they can't do it the way I want, and I also tried Notepad++ and Excel and failed. Most "solutions" only let you search for ONE SINGLE WORD/STRING! That's NOT what I want! I need to check 10,000 lines against maybe 1,000 lines (OK, maybe not THAT many, but definitely more than one check for one word in one file)! I won't upload a file, because I need to do this task many, many times, so I need to learn to do it myself at any time. I hope you guys can help me with this. :)
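    For readers comfortable with a small script, the task as the thread eventually clarifies it (keep only the lines of a big list that contain at least one string from a filter list) can be sketched in Python. This is a minimal sketch, not one of the tools discussed in this thread. The function name `filter_lines` is hypothetical, and the sketch assumes both files are UTF-8 text with one entry per line and a case-sensitive substring match.

```python
from pathlib import Path


def filter_lines(big_path, filter_path, out_path):
    """Write to out_path only the lines of big_path that contain at least one
    term from filter_path (plain, case-sensitive substring match)."""
    # One filter term per line; blank lines are skipped because an empty
    # string is contained in every line and would keep everything.
    terms = [t for t in Path(filter_path).read_text(encoding="utf-8").splitlines() if t]

    kept = [
        line
        for line in Path(big_path).read_text(encoding="utf-8").splitlines()
        if any(term in line for term in terms)
    ]

    # Write the surviving lines back out, one per line, and report how many.
    Path(out_path).write_text("".join(line + "\n" for line in kept), encoding="utf-8")
    return len(kept)
```

    Usage, with hypothetical file names: `filter_lines("biglist.txt", "filterfile.txt", "output.txt")`. The sketch checks every term against every line. It is simple, but the run time grows with the number of lines times the number of terms.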
     
  2. Sheraf

    Sheraf Registered Member

    Joined:
    Jan 19, 2014
    Messages:
    61
    Likes Received:
    8
    Basically you want to delete every line from text2.txt that is in text1.txt, right?

    On Linux there is a tool called "diff"; it kind of does what you want. Google for a version with a GUI if you're not used to the command line.
    If that's not good for you, I can build this tool. PM me and I'll give you a quote.
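    Sheraf's reading of the question (remove from text2.txt every line that also appears in text1.txt) is an exact whole-line comparison. That can be sketched with a Python set. This illustrates only that interpretation; `remove_exact_matches` is a hypothetical helper, and as the rest of the thread shows, it is not what the OP actually needed.

```python
def remove_exact_matches(lines2, lines1):
    """Return lines2 minus every line that appears verbatim in lines1, keeping order."""
    to_remove = set(lines1)  # hash lookups, so each membership check is fast
    return [line for line in lines2 if line not in to_remove]


print(remove_exact_matches(["a", "b", "c", "b"], ["b"]))  # ['a', 'c']
```

    Note that a line such as `x_1` would survive a filter entry `x`, because only identical lines are removed.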
     
  3. NamenloserHeld

    NamenloserHeld BANNED

    Joined:
    Nov 23, 2015
    Messages:
    76
    Likes Received:
    38
    Hi! Thanks for your answer! It's hard to explain in this thread, because I can't make paragraphs/line breaks; they get merged (SUCKS!). It's maybe not as easy as you think. I'll send you a PM; maybe you can code a custom tool for me :)
     
  4. BlackBDO

    BlackBDO Jr. VIP

    Joined:
    Jan 4, 2016
    Messages:
    507
    Likes Received:
    321
    Challenge accepted!
    Send me a PM with what you need. If I understood correctly what you want to do, I think I can make it in C# in no time.
     
    • Thanks x 1
  5. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    585
    • Thanks x 2
  6. NamenloserHeld

    NamenloserHeld BANNED

    Joined:
    Nov 23, 2015
    Messages:
    76
    Likes Received:
    38
    Very nice! Thank you for another solution. I'll test it tomorrow :)

    //Edit: Could you zip/rar the .exe and re-upload it?
    The upload site doesn't show a hash (SHA-256), so I can't verify the file even if you provide a VirusTotal link.
    Yes, I'm very paranoid, but I'm still happy that you're trying to help me! :)
     
    Last edited: Feb 12, 2016
  7. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    585
    File rared:
    https://www.sendspace.com/file/bbxoa7

    Source code with compiled x86 and x64 binaries:
    https://www.sendspace.com/file/i6l9r8
     
    • Thanks x 1
  8. NamenloserHeld

    NamenloserHeld BANNED BANNED

    Joined:
    Nov 23, 2015
    Messages:
    76
    Likes Received:
    38
    Hmm, I have all .NET Frameworks updated, but I'm not able to run this :O
     
  9. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    585
    Then you are probably running a 32-bit system. Download the source and binaries.

    Go to Link_delete -> link_delete -> bin -> x86 -> release and run the 32-bit executable link_delete.exe
     
    • Thanks x 1
  10. NamenloserHeld

    NamenloserHeld BANNED BANNED

    Joined:
    Nov 23, 2015
    Messages:
    76
    Likes Received:
    38
    Hey, great tool! I like the GUI. But unfortunately it doesn't seem to do what I need. :(
    Here are 2 screenshots of my test and what happens:

    Filter1.png:
    Field1/List1 is the bigger list I want to filter; Field2 is the filter file.
    It filters out everything, but not the 2 lines I set for the filter; it does the job, but inverted. The filter output should be:

    ojsvejts5s5evt
    aw4jtj4watja4wf

    Filter2.png:
    I tried swapping the files/fields: Field1/List1 is the filter list, and Field2/List2 is the big list I want to apply the filter to.
    But as you can see, nothing shows up in Field 3 (filter output). I don't know why.

    Do you understand my problem? :) It's a great tool. I found many dupe-remove and list-compare tools, but I need one that checks for the word anywhere in the line and keeps the lines of the big file that contain the strings defined in the filter file. It DOES filter, but not the lines I need; it works the reverse way from what I need! Could you help me again with this? :)

    When I uncheck the [No Seperator "::"] box, I get an error on the real file I tried to filter (see 3rd screenshot).


    //Edit: The filter must apply to any word found anywhere in the line! Like this:

    Biglist=

    w56985492356g_randomTextShowingUpinTheLines
    ezewvlgirvhit_randomTextShowingUpinTheLines
    ojsvejts5s5evt_randomTextShowingUpinTheLines
    he5vth5ekhvt5e_randomTextShowingUpinTheLines
    hdyhtrvli5vehtv_randomTextShowingUpinTheLines
    iethe5vtrgwr_randomTextShowingUpinTheLines
    pwu4t9euvwt_randomTextShowingUpinTheLines
    piwv4htoiph4wt_randomTextShowingUpinTheLines
    she5v5tihaw4ith_randomTextShowingUpinTheLines
    i4tihw4tha4wl_randomTextShowingUpinTheLines
    aj5thja4wthjf4aw_randomTextShowingUpinTheLines
    aw4jtj4watja4wf_randomTextShowingUpinTheLines
    law4jtoa4wjti3_randomTextShowingUpinTheLines
    j4tihj5eipzjer_randomTextShowingUpinTheLines

    FilterFile=

    ojsvejts5s5evt
    aw4jtj4watja4wf


    The output would look like this=

    ojsvejts5s5evt_randomTextShowingUpinTheLines
    aw4jtj4watja4wf_randomTextShowingUpinTheLines

    That's because it removes all lines that don't contain a text string from the FilterFile. I hope people can understand me. :)

    Filter3.png shows the results of the 3rd try; no line was filtered out :(
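    The OP's example can be checked with a short Python sketch. It keeps a Biglist line if any FilterFile entry occurs anywhere in it. The data is copied from the post above; the helper name `keep_matching` is hypothetical.

```python
def keep_matching(biglist, filterfile):
    """Keep the lines of biglist that contain at least one filter string."""
    terms = [t for t in filterfile if t]  # skip blanks: "" is in every line
    return [line for line in biglist if any(t in line for t in terms)]


# Data copied from the Biglist/FilterFile example above.
biglist = """\
w56985492356g_randomTextShowingUpinTheLines
ezewvlgirvhit_randomTextShowingUpinTheLines
ojsvejts5s5evt_randomTextShowingUpinTheLines
he5vth5ekhvt5e_randomTextShowingUpinTheLines
hdyhtrvli5vehtv_randomTextShowingUpinTheLines
iethe5vtrgwr_randomTextShowingUpinTheLines
pwu4t9euvwt_randomTextShowingUpinTheLines
piwv4htoiph4wt_randomTextShowingUpinTheLines
she5v5tihaw4ith_randomTextShowingUpinTheLines
i4tihw4tha4wl_randomTextShowingUpinTheLines
aj5thja4wthjf4aw_randomTextShowingUpinTheLines
aw4jtj4watja4wf_randomTextShowingUpinTheLines
law4jtoa4wjti3_randomTextShowingUpinTheLines
j4tihj5eipzjer_randomTextShowingUpinTheLines
""".splitlines()

filterfile = ["ojsvejts5s5evt", "aw4jtj4watja4wf"]

for line in keep_matching(biglist, filterfile):
    print(line)
# ojsvejts5s5evt_randomTextShowingUpinTheLines
# aw4jtj4watja4wf_randomTextShowingUpinTheLines
```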
     


    Last edited: Feb 12, 2016
  11. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    585
    I did not expect to have to give tech support for such a basic program. You used the program wrong.

    Load the file with all the elements into box 1; check "Not http" and "No Seperator"; load the file that contains the elements you want filtered out of the file in box 1 into box 2; click Remove in box 3.

    [Screenshot]

    This was written for a person who wanted to remove URLs in one file that were in another file. I threw in removing duplicate keywords as well. It works for the application you described.
     
  12. NamenloserHeld

    NamenloserHeld BANNED

    Joined:
    Nov 23, 2015
    Messages:
    76
    Likes Received:
    38

    No, sorry, then you haven't read my post carefully.

    In your screenshot you filter out the exact same text line. But that's not a solution to my problem.
    If I try to filter out a word that appears in the line, your tool has zero effect.
    Here are 2 screenshots that prove I used the settings you explained without getting the results; this is what I get.

    [Screenshot]

    [Screenshot]

    Your tool can filter out lines, but only if it's exactly the same line, not a word or single characters that appear in the line.

    If the FilterFile contains this:

    ojsvejts5s5evt
    aw4jtj4watja4wf

    How can I get this result?

    ojsvejts5s5evt_randomTextShowingUpinTheLines
    aw4jtj4watja4wf_randomTextShowingUpinTheLines


    You used the wrong lines for the filter in your screenshot. I didn't add _randomTextShowingUpinTheLines, but you did in your screenshot. That's why you thought it works. It doesn't :)
     
    Last edited: Feb 12, 2016
  13. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    585
    However, I added a small modification to specifically fit what you want.

    This is file 1:
    Code:
    w56985492356g_randomTextShowingUpinTheLines
    ezewvlgirvhit_randomTextShowingUpinTheLines
    ojsvejts5s5evt_randomTextShowingUpinTheLines
    he5vth5ekhvt5e_randomTextShowingUpinTheLines
    hdyhtrvli5vehtv_randomTextShowingUpinTheLines
    iethe5vtrgwr_randomTextShowingUpinTheLines
    pwu4t9euvwt_randomTextShowingUpinTheLines
    piwv4htoiph4wt_randomTextShowingUpinTheLines
    she5v5tihaw4ith_randomTextShowingUpinTheLines
    i4tihw4tha4wl_randomTextShowingUpinTheLines
    aj5thja4wthjf4aw_randomTextShowingUpinTheLines
    aw4jtj4watja4wf_randomTextShowingUpinTheLines
    law4jtoa4wjti3_randomTextShowingUpinTheLines
    j4tihj5eipzjer_randomTextShowingUpinTheLines
    This is file 2:
    Code:
    ojsvejts5s5evt
    aw4jtj4watja4wf
    This is the output:
    [Screenshot]

    The file is here:
    https://www.sendspace.com/file/zzvuhl

    Virus total is here:
    https://www.virustotal.com/en/file/...1949df8f5a1989803b6dab6a/analysis/1455319940/

    I have no idea what the single positive detection is, and I do not have the time to rewrite that section of the program to eliminate it. If the program does not fit your needs, you will have to find another solution.

    This is the line that was changed (line 195 of Form1.cs):
    if (string.Equals(str, line, StringComparison.Ordinal) || (partialLine.Checked && line.Contains(str)))

    It was changed from this:
    if (string.Equals(str, line, StringComparison.Ordinal))
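    The effect of that one-line change can be mirrored in Python for illustration. This is not JustUs's code. The parameter `partial` stands in for the `partialLine` checkbox, and `line_matches` is a hypothetical name. The original condition only accepts identical strings. The new one also accepts a line that contains the filter string when the option is ticked.

```python
def line_matches(filter_str, line, partial):
    """Python mirror of the C# test
    string.Equals(str, line, StringComparison.Ordinal) || (partialLine.Checked && line.Contains(str)).
    """
    exact = filter_str == line      # ordinal equality: exact, case-sensitive
    contains = filter_str in line   # like line.Contains(str): str occurs inside line
    return exact or (partial and contains)


line = "ojsvejts5s5evt_randomTextShowingUpinTheLines"
print(line_matches("ojsvejts5s5evt", line, partial=False))  # False: old, exact-only behaviour
print(line_matches("ojsvejts5s5evt", line, partial=True))   # True: new partial-line option
```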
     
    Last edited: Feb 12, 2016
  14. NamenloserHeld

    NamenloserHeld BANNED BANNED

    Joined:
    Nov 23, 2015
    Messages:
    76
    Likes Received:
    38
    Thanks! I really, really appreciate the time and work you spent here helping me!
    But I think it's a misunderstanding again; I expressed what I need badly. If the filter were:

    "rvhit_randomTe"

    the resulting line would be:

    "ezewvlgirvhit_randomTextShowingUpinTheLines"

    All other lines that don't contain "rvhit_randomTe" have to be gone.

    We are very close to the solution.
    But I understand if you don't have the time to make another change to your code. Thanks for trying anyway :)
     
  15. JustUs

    JustUs Power Member

    Joined:
    May 6, 2012
    Messages:
    626
    Likes Received:
    585
    Code:
    line.Contains(str)
    str = "rvhit"

    Switch the elements in boxes 1 and 2.

    Read as: if a line in box 1 is contained in box 2, copy that box 1 line to box 3.
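    What matters here is which string has to contain the other. A quick Python check using the OP's example shows the asymmetry; the variable names are illustrative. In a `Contains`-based tool, swapping the boxes swaps these two roles.

```python
line = "ezewvlgirvhit_randomTextShowingUpinTheLines"  # long line from the big list
term = "rvhit_randomTe"                                # short string from the filter file

print(term in line)  # True: the filter string occurs inside the long line
print(line in term)  # False: a longer string cannot occur inside a shorter one
```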
     
  16. HoNeYBiRD

    HoNeYBiRD Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    7,314
    Likes Received:
    8,281
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    OK OP, I see JustUs has a great tool and is trying everything to help you, but if you still can't get it to work for some reason, the following should work:

    1. Copy-paste your text into Notepad++, or open your file with it
    2. Open the search tool (Ctrl+F) and go to the 'Mark' tab
    3. Enter your search string into the 'Find what' box: ojsvejts5s5evt|aw4jtj4watja4wf
    Separate the words you're searching for with | (alt+w)
    Check the 'Bookmark lines' box and select the 'Regular expression' search mode!
    4. Hit 'Mark All', then close the search window
    The lines that contain your search string will be bookmarked.
    5. From the menu, go to Search/Bookmark/Remove Unmarked Lines
    You should be left with the lines that contain any of the words you included in your search string.
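    The same mark-then-remove idea can be sketched with Python's `re` module. Joining the terms with `|` builds one regular expression that matches if any term occurs in a line. This is an analogue of the Notepad++ steps, not a description of how Notepad++ works internally. The `re.escape` call is an addition: in Notepad++, terms containing regex characters such as `.`, `+` or `(` would have to be escaped by hand.

```python
import re

# Sample lines taken from the Biglist example earlier in the thread.
lines = [
    "w56985492356g_randomTextShowingUpinTheLines",
    "ojsvejts5s5evt_randomTextShowingUpinTheLines",
    "aw4jtj4watja4wf_randomTextShowingUpinTheLines",
    "law4jtoa4wjti3_randomTextShowingUpinTheLines",
]
terms = ["ojsvejts5s5evt", "aw4jtj4watja4wf"]

# Same idea as typing ojsvejts5s5evt|aw4jtj4watja4wf into 'Find what' with
# 'Regular expression' selected; re.escape neutralises characters such as
# . + ( ) | that would otherwise be read as regex syntax.
pattern = re.compile("|".join(re.escape(t) for t in terms))

# 'Mark All' + 'Remove Unmarked Lines': keep lines with a match anywhere in them.
kept = [line for line in lines if pattern.search(line)]

for line in kept:
    print(line)
```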
     
    Last edited: Feb 13, 2016
  17. NamenloserHeld

    NamenloserHeld BANNED BANNED

    Joined:
    Nov 23, 2015
    Messages:
    76
    Likes Received:
    38
    I'm sorry if I seem ungrateful. I know this tool is still a good one, but it doesn't solve my problem.
    Sometimes people underestimate/misunderstand the complexity of a task.
    Like I said, I tested dozens of filter and wordlist tools; they can't do it the way I need.

    I found the Notepad++ step-by-step tutorial in the first 15 minutes after I started researching.
    The Notepad++ way is kind of what should happen, but it's limited to one single search string.

    But I have files with over 10,000 lines that I need to check against other files with maybe 100,000+ lines.
    So it would take me a lot of time to do this step again and again and again.
    I can't search for one word 10,000 times with Ctrl+F. That would take forever.

    If you know a way (maybe with regular expressions) to add ALL words of the first file to the bookmarks,
    then maybe it will work, if it's possible to check a second file against the bookmarks from List1 afterwards.
    I don't know how to manually add 1,000+ lines to the bookmarks to check against a second file.

    I don't know everything, and maybe I misunderstand what the people here are offering to help with.
    I apologize in that case, but at the moment it seems the problem is still not solved. :(
     
  18. HoNeYBiRD

    HoNeYBiRD Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    7,314
    Likes Received:
    8,281
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    A regular expression is what I'm using in the example above :)

    You need to put | (alt+w) between the words and search like that while 'Regular expression' is enabled. Did you miss this bit from my post?
    The search string has a character limit, but it's definitely not 1 or 2 words, it's many. So you may need to rinse and repeat this a few times, but certainly not 10,000 times.

    Let's say you have a list of words you want to use to search the other list; call the former list1 and the latter list2.

    Copy-paste or open list1 in Notepad++, where you want to separate the words with | (alt+w).

    from this:
    word1
    word2
    word3

    you want to make this:
    word1|word2|word3

    How do you do that?
    It's pretty easy: hit Ctrl+H to bring up the Replace tool.
    Find what: \r\n
    Replace with: | (alt+w)
    Make sure that 'Regular expression' is checked!
    Hit 'Replace All'

    Then you will have your list of words separated by | (alt+w): word1|word2|word3
    Open list2 with Notepad++ and follow the instructions from my post above to the last detail; you need to enter the search string you just generated into the 'Find what' box I highlighted in step 3.

    I made a GIF of the process using your example from above; I think this should clear things up.

    iNGoDe6.gif
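    The replace step (`\r\n` to `|`) can also be done in a few lines of Python. This is a hedged sketch with a hypothetical function name, `build_search_string`, and it differs from the Notepad++ replace in two ways. `splitlines()` handles Unix `\n` endings as well as Windows `\r\n`, while a 'Find what' of `\r\n` only matches Windows endings. Blank lines are also dropped. Replacing every line break would turn a trailing newline into a trailing `|`, and in regex engines such as Python's an empty alternative matches the empty string, so every line could end up marked.

```python
def build_search_string(list1_text):
    """Turn one-term-per-line text into term1|term2|term3 for a regex search."""
    # splitlines() treats \r\n (Windows), \n (Unix) and \r endings alike.
    # Blank lines are dropped so the result never contains an empty alternative.
    terms = [t for t in list1_text.splitlines() if t]
    return "|".join(terms)


print(build_search_string("word1\r\nword2\r\nword3\r\n"))  # word1|word2|word3
```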
     
    • Thanks x 1
    Last edited: Feb 13, 2016
  19. NamenloserHeld

    NamenloserHeld BANNED BANNED

    Joined:
    Nov 23, 2015
    Messages:
    76
    Likes Received:
    38

    THANK YOU SO MUCH! That's what I wanted! Wow, I feel really stupid now that I didn't think of the | separator.
    (It was not Alt+W on my keyboard layout.)
    Great move that you made a whole "idiot-safe" GIF animation for me <3
    Sometimes it's hard to follow if you don't work much with code and scripts. But this was really helpful!

    A BIG THANKS AGAIN TO ALL WHO TRIED TO HELP ME IN THIS THREAD! <3

    @Sheraf, BlackBDO, JustUs, HoNeYBiRD

    BHW seems to be full of people who share their knowledge and try to help others; that made my day!
     
    • Thanks x 1
    Last edited: Feb 13, 2016
  20. HoNeYBiRD

    HoNeYBiRD Jr. VIP Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    7,314
    Likes Received:
    8,281
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    I'm glad it worked :)

    One thing, though: I mentioned that the search string has a character limit. When you have separated the list of words with the | separator, just copy-paste it into the 'Find what' box and see where the string is cut off. Then you'll know which word to start the following search with, if that makes sense.
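    Instead of finding the cut-off point by trial and error, the term list can be split in advance into several search strings that each stay under a length limit. This is a minimal sketch. The thread does not say what the actual limit of the 'Find what' box is, so `max_len` is a hypothetical parameter you would set after testing, and `chunk_search_strings` is a hypothetical name. Each returned string is one search to paste into 'Find what'.

```python
def chunk_search_strings(terms, max_len):
    """Split terms into term1|term2|... strings of at most max_len characters each."""
    chunks, current = [], ""
    for term in terms:
        candidate = term if not current else current + "|" + term
        if len(candidate) <= max_len:
            current = candidate          # term still fits into the current string
        else:
            if current:
                chunks.append(current)   # close the full string, start a new one
            # A single term longer than max_len still gets a chunk of its own.
            current = term
    if current:
        chunks.append(current)
    return chunks


print(chunk_search_strings(["aaa", "bbb", "ccc"], 7))  # ['aaa|bbb', 'ccc']
```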
     
    • Thanks x 1