Remove lines in a text/csv file based on another text file

xovian · Sep 15, 2012

Problem Solved, Please close and remove thread.

godmonkee · Sep 15, 2012

I could do that for you, let me know what you are offering.

loopline · Sep 15, 2012

If the 5 million lines in file 1 are URLs, ScrapeBox will do it, assuming you have ScrapeBox or buy it for the $57.

If the 5 million rows are just random lines of data, you can use my scrapebox classroom url analyzer tool. It's free. It won't load 5 million lines at once, but you could break the file up into chunks of around 500K lines and try it. How well that works also depends on your available resources.

I had this up on a nice, easy-to-use site, but the site has a DB error at the moment and I haven't had time to fix it yet.

Vid tuts

http://www.youtube.com/watch?v=WT3fIHNd-48&feature=plcp

http://www.youtube.com/watch?v=JTUjmgZa5UY&feature=plcp

The first vid explains the concept, and the 2nd vid explains how to edit the filters. You would just have to make a filter that contains your file 2, with your couple of thousand keywords. It needs to be in RegEx format. Then you would wind up with the results you wanted. Mind you, it's going to be SLOW with a few thousand keywords in there, and you might need to go in chunks smaller than 500K. But it's a line-by-line filter, so it will do what you want.
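For anyone who would rather script it, here is a rough Python sketch of the same line-by-line filter idea. It is not how my tool works internally, just an illustration. It assumes file 2 has one keyword per line, that a line in file 1 should be dropped if it contains any keyword anywhere (case-insensitive), and that both files are UTF-8. The function names and file paths are made up for the example.

```python
import re

def build_filter(keywords):
    """Compile one case-insensitive regex that matches any keyword literally.

    Returns None when there are no usable keywords.
    """
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        return None
    # re.escape keeps characters like "." or "?" from acting as regex syntax.
    return re.compile("|".join(re.escape(k) for k in cleaned), re.IGNORECASE)

def remove_matching_lines(lines, keywords):
    """Yield only the lines that contain none of the keywords."""
    pattern = build_filter(keywords)
    for line in lines:
        if pattern is None or not pattern.search(line):
            yield line

def filter_file(data_path, keywords_path, out_path):
    """Stream data_path line by line so the whole file never sits in memory."""
    with open(keywords_path, encoding="utf-8") as kw:
        keywords = kw.read().splitlines()
    with open(data_path, encoding="utf-8", errors="replace") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(remove_matching_lines(src, keywords))
```

You would call it like filter_file("file1.txt", "file2.txt", "cleaned.txt"). Because it reads one line at a time, the 5 million rows should not need to be chunked first, though a few thousand keywords in one regex will still make it slow.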

Tool Download:
http://www.scrapestuff.com/tools/scrapebox-classroom-url-analyzer.zip

It's over 32MB, so VT won't scan it.

As for splitting it, you can use the free scrapebox dupe remove addon. The file splitter, merger and duplicate "url" removal all work line by line, so they aren't limited to URLs. They work on whatever data is in a given line.

http://www.scrapebox.com/free-dupe-remove
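If you don't want to use the addon, splitting by line count is also easy to script. This is only a sketch under my own assumptions: UTF-8 text, chunks of 500K lines, and output files named chunk_001.txt, chunk_002.txt and so on in the current folder. The names are made up for the example.

```python
from itertools import islice

def split_lines(lines, chunk_size):
    """Yield lists of at most chunk_size lines, in the original order."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def split_file(path, chunk_size=500_000, prefix="chunk"):
    """Write each chunk to its own numbered file and return the file names."""
    written = []
    with open(path, encoding="utf-8", errors="replace") as src:
        for i, chunk in enumerate(split_lines(src, chunk_size), start=1):
            name = f"{prefix}_{i:03d}.txt"
            with open(name, "w", encoding="utf-8") as dst:
                dst.writelines(chunk)
            written.append(name)
    return written
```

Only one chunk is held in memory at a time, so lower chunk_size if your machine struggles with 500K lines.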

All that said, if godmonkee can make something for you, that's probably going to be more robust. My tool was written in Python and it doesn't seem to handle large data sets very robustly. Something in C# or Delphi or something else would be more ideal. But then I'm no expert on programming either, so whatever the dev thinks is probably best.

xovian · Sep 15, 2012

Problem Solved. Please close thread.
