How to subtract list of urls in file from another list?

Discussion in 'Black Hat SEO' started by bokeboke, Jan 5, 2015.

  1. bokeboke

    bokeboke Junior Member

    Joined:
    Mar 21, 2010
    Messages:
    132
    Likes Received:
    31
    Using scrapebox I have 2 files that have millions of urls in them. I want to subtract one list from the other. The Dup Remove Addon just removes duplicates but still leaves the url. I want to completely remove the urls that are in both lists. Any idea how to do this? Loopline? Anyone?
     
    Last edited: Jan 5, 2015
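
Outside ScrapeBox, the subtraction bokeboke describes is a plain set difference. A minimal Python sketch, assuming one URL per line; the file names and function name here are hypothetical, not part of any tool in the thread:

```python
# Remove from list A every URL that also appears in list B,
# writing the survivors to out_path (one URL per line).
def subtract_lists(a_path, b_path, out_path):
    with open(b_path, encoding="utf-8") as f:
        to_remove = {line.strip() for line in f if line.strip()}

    with open(a_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:  # stream list A; only list B is held in memory
            url = line.strip()
            if url and url not in to_remove:
                fout.write(url + "\n")
```

Because list A is streamed line by line, only the removal list has to fit in memory.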
  2. magicman2003

    magicman2003 Registered Member

    Joined:
    Oct 17, 2014
    Messages:
    84
    Likes Received:
    14
    lol... Loopline is not going to respond anyway :) Check whether you can do it using Send Safe List Manager. I used it to clean/subtract email addresses from multiple lists. Not sure whether it will work in your case (URLs).
     
    Last edited: Jan 5, 2015
  3. loopline

    loopline Jr. VIP Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,724
    Likes Received:
    1,993
    Gender:
    Male
    Home Page:
    If you have list A and list B and you want to remove all the URLs from list A that appear in list B, load list A into the harvested URLs grid in ScrapeBox. Then select import and compare URL list, and select list B. That will remove from list A all the URLs that are in list B. If you want to do it on a domain level instead of a URL level, do the same thing but select import and compare on domain level.

    Why won't I respond exactly?
     
    • Thanks Thanks x 4
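
The domain-level compare loopline mentions can likewise be approximated outside ScrapeBox by reducing each URL to its hostname before comparing. A hedged Python sketch; the function names are my own, not ScrapeBox's:

```python
from urllib.parse import urlparse

def domain_of(url):
    # urlparse only fills in netloc when a scheme is present,
    # so prefix bare "example.com/page" style entries first
    if "://" not in url:
        url = "http://" + url
    return urlparse(url).netloc.lower()

def subtract_on_domain_level(list_a, list_b):
    # drop every URL in list A whose domain appears anywhere in list B
    blocked = {domain_of(u) for u in list_b}
    return [u for u in list_a if domain_of(u) not in blocked]
```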
  4. bokeboke

    bokeboke Junior Member

    Joined:
    Mar 21, 2010
    Messages:
    132
    Likes Received:
    31
    Thanks for the replies. The problem is that it's well over 1,000,000 URLs, so it won't load in the harvested grid. Is there any other way to do this without splitting the files and repeating the process for each split file?
     
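
If even the smaller of the two files is too big to hold in RAM, a general technique (not something any tool in this thread does, just a common workaround) is to sort both files first, e.g. with an external sort, and then subtract in one linear merge pass:

```python
# Subtract sorted_b from sorted_a in a single streaming pass.
# Both input files must already be sorted, one URL per line;
# nothing beyond the current line of each file is kept in memory.
def merge_subtract(sorted_a, sorted_b, out_path):
    with open(sorted_a, encoding="utf-8") as fa, \
         open(sorted_b, encoding="utf-8") as fb, \
         open(out_path, "w", encoding="utf-8") as out:
        b_iter = (line.strip() for line in fb)
        b = next(b_iter, None)
        for line in fa:
            a = line.strip()
            # advance list B until it catches up with the current A entry
            while b is not None and b < a:
                b = next(b_iter, None)
            if b is None or b != a:
                out.write(a + "\n")
```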
  5. Sweetfunny

    Sweetfunny Jr. VIP Jr. VIP

    Joined:
    Jul 13, 2008
    Messages:
    1,785
    Likes Received:
    5,067
    Location:
    ScrapeBox v2.0
    Home Page:
    Yes, ScrapeBox v2: http://www.scrapebox.com/v2-beta
     
    • Thanks Thanks x 2
  6. HoNeYBiRD

    HoNeYBiRD Jr. VIP Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    6,952
    Likes Received:
    7,982
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
  7. magicman2003

    magicman2003 Registered Member

    Joined:
    Oct 17, 2014
    Messages:
    84
    Likes Received:
    14
    If you try to load millions of URLs in Notepad++, it can't handle it.
     
  8. bokeboke

    bokeboke Junior Member

    Joined:
    Mar 21, 2010
    Messages:
    132
    Likes Received:
    31
    Wow. After trying to do it with PHP code, Notepad++ and the DiffMerge tool, the new ScrapeBox did it super easily. Thumbs up for the new ScrapeBox!
     
    • Thanks Thanks x 1
  9. HoNeYBiRD

    HoNeYBiRD Jr. VIP Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    6,952
    Likes Received:
    7,982
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    that is not quite true, it can easily handle it if you have an appropriate config; with a weak config it won't, but that's not because of the program
    you can always use vim on a weaker machine to work with huge lists
     
    • Thanks Thanks x 1
  10. Repulsor

    Repulsor Power Member

    Joined:
    Jun 11, 2013
    Messages:
    766
    Likes Received:
    275
    Location:
    PHP Scripting ;)
    You could do this in PHP really quickly if you know what you are doing.

    I shared this for another person who asked for my help. He wanted me to share it publicly, so I posted it here.

    http://www.blackhatworld.com/blackh...are-two-files-find-difference-php-script.html

    Do it the usual way and it may take a long while, but by making use of the array key it won't take over a minute.
     
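
For anyone wondering why the key trick Repulsor describes is so much faster: checking membership by scanning a plain list is O(n) per lookup, while a hash-based key lookup (PHP array keys, or a set/dict in Python) is roughly O(1). A small Python illustration of the same idea, with made-up example URLs:

```python
# Build the removal list once as a hash set; each lookup is then O(1),
# so subtracting costs O(len(a) + len(b)) instead of O(len(a) * len(b)).
list_b = [f"http://example.com/page{i}" for i in range(100_000)]
set_b = set(list_b)

list_a = [f"http://example.com/page{i}" for i in range(99_990, 100_010)]

# slow: [u for u in list_a if u not in list_b]  -> rescans list_b each time
result = [u for u in list_a if u not in set_b]  # fast hash lookups
```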
  11. Sweetfunny

    Sweetfunny Jr. VIP Jr. VIP

    Joined:
    Jul 13, 2008
    Messages:
    1,785
    Likes Received:
    5,067
    Location:
    ScrapeBox v2.0
    Home Page:
    Awesome, good to hear it worked. ScrapeBox v2 is faster, and the 64-bit version means huge files are no problem. :)
     
  12. magicman2003

    magicman2003 Registered Member

    Joined:
    Oct 17, 2014
    Messages:
    84
    Likes Received:
    14
    oh, that's news to me! My bad, I never knew that!
     
  13. HoNeYBiRD

    HoNeYBiRD Jr. VIP Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    6,952
    Likes Received:
    7,982
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    well, you were partially right :), i just tried to point out the real reason: notepad++ eats a lot of RAM if you work with very huge files, but it's optimized much better than the built-in win notepad for instance
    just test it with bigger files and monitor the RAM and CPU usage; notepad starts to hang/crash a lot sooner than notepad++, and that happens when your RAM/CPU usage reaches its maximum. if the file is really big, it won't recover, it just hangs/crashes, but that's not because either notepad is an entirely bad tool, it's because of the limited resources they're given
    with really big files, vim or i guess SB is always an alternative if you're low on resources
     
    • Thanks Thanks x 1
  14. macdonjo3

    macdonjo3 Jr. VIP Jr. VIP

    Joined:
    Nov 8, 2009
    Messages:
    5,614
    Likes Received:
    4,362
    Location:
    Toronto
    Home Page:
    Why would you use a server-side language for something as simple as this?