Remove lines in a text/csv file based on another text file

Discussion in 'Black Hat SEO' started by xovian, Sep 15, 2012.

  1. xovian

    xovian Newbie

    Joined:
    Jan 15, 2012
    Messages:
    5
    Likes Received:
    0
    Problem Solved, Please close and remove thread.
     
    Last edited: Sep 16, 2012
  2. godmonkee

    godmonkee Regular Member

    Joined:
    Jan 12, 2009
    Messages:
    396
    Likes Received:
    766
    Occupation:
    IM
    Location:
    Gallifrey
    I could do that for you, let me know what you are offering.
     
  3. loopline

    loopline Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,372
    Likes Received:
    1,799
    Gender:
    Male
    "IF" the lines in file 1 (the one with 5 million) are URLs, ScrapeBox will do it, assuming you have ScrapeBox or buy it for the $57.

    If the 5 million rows are just random lines of data, you can use my ScrapeBox Classroom URL Analyzer tool. It's free. It won't load 5 million lines at once, but you could break the file up into chunks of around 500K lines and try it. How well that works also depends on your available resources.

    I had this up on a nice, easy-to-use site, but the site has a DB error at the moment and I haven't yet had time to fix it.


    Vid tuts

    http://www.youtube.com/watch?v=WT3fIHNd-48&feature=plcp

    http://www.youtube.com/watch?v=JTUjmgZa5UY&feature=plcp

    The first vid explains the concept; the 2nd vid explains how to edit filters. You would just have to make a filter containing the couple of thousand keywords from your file 2. It would need to be in regex format, but then you would wind up with the results you wanted. Mind you, it's going to be SLOW with a few thousand keywords in there, and you might need to go in chunks smaller than 500K. But it's a line-by-line filter, so it will do what you want.
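
    For anyone who would rather script it, here is a rough Python sketch of the same line-by-line idea. It is not how the analyzer tool works internally, just an illustration. It assumes file 2 holds one keyword per line and that any line in file 1 containing a keyword should be dropped. The function names and file names are made up for the example.

```python
import re
from typing import Iterable, Iterator


def build_pattern(keywords: Iterable[str]) -> "re.Pattern[str]":
    """Combine keywords into one case-insensitive regex alternation.

    Each keyword is escaped, so characters like '.' or '?' match literally.
    Blank keyword lines are ignored.
    """
    cleaned = [re.escape(k.strip()) for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("no keywords given")
    return re.compile("|".join(cleaned), re.IGNORECASE)


def keep_lines(lines: Iterable[str], pattern: "re.Pattern[str]") -> Iterator[str]:
    """Yield only the lines that contain none of the keywords."""
    for line in lines:
        if not pattern.search(line):
            yield line


def filter_file(data_path: str, keywords_path: str, out_path: str) -> int:
    """Stream data_path into out_path, dropping lines that match a keyword.

    Only one line is held in memory at a time, so the size of the data
    file should not matter much. Returns the number of lines kept.
    """
    with open(keywords_path, encoding="utf-8", errors="replace") as kw:
        pattern = build_pattern(kw)
    kept = 0
    with open(data_path, encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in keep_lines(src, pattern):
            dst.write(line)
            kept += 1
    return kept
```

    Usage would be something like filter_file("file1.txt", "file2.txt", "cleaned.txt"), where all three names are placeholders. With a few thousand keywords in one pattern it will still be slow on millions of lines, same as described above.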

    Tool Download:
    http://www.scrapestuff.com/tools/scrapebox-classroom-url-analyzer.zip

    It's over 32 MB, so VT won't scan it.



    As for splitting it, you can use the free ScrapeBox dupe remove addon. The file splitter, merger and duplicate "url" removal all operate line by line, so they don't work only with URLs; they work on whatever data is in a given line.

    http://www.scrapebox.com/free-dupe-remove
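
    If you don't want to install the addon, splitting by line count is also easy to script. A minimal Python sketch, assuming plain text with one record per line. The function names and the ".partNNN" output naming are invented for this example and have nothing to do with how the addon does it.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_lines(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most chunk_size lines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def split_file(path: str, chunk_size: int = 500_000) -> List[str]:
    """Write path.part001, path.part002, ... and return the new file names.

    Only one chunk is held in memory at a time.
    """
    out_paths = []
    with open(path, encoding="utf-8", errors="replace") as src:
        for i, chunk in enumerate(chunk_lines(src, chunk_size), start=1):
            out_path = f"{path}.part{i:03d}"
            with open(out_path, "w", encoding="utf-8") as dst:
                dst.writelines(chunk)
            out_paths.append(out_path)
    return out_paths
```

    The 500,000 default just mirrors the chunk size suggested earlier; lower it if the tool you feed the chunks into still chokes.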


    All that said, if godmonkee can make something for you, that's probably going to be more robust. My tool was written in Python and it doesn't seem to handle large data sets very robustly. Something in C# or Delphi or something else would be more ideal. But then I'm no expert on programming either, so whatever the dev thinks is probably best.
     
    Last edited: Sep 15, 2012
  4. xovian

    xovian Newbie

    Joined:
    Jan 15, 2012
    Messages:
    5
    Likes Received:
    0
    Problem Solved. Please close thread.
     
    Last edited: Sep 16, 2012