
[Urgent] Does Scrapebox have this capability?

Discussion in 'Black Hat SEO Tools' started by keith88, Feb 10, 2014.

  1. keith88

    keith88 Regular Member

    Hey,

    I'm scraping lists, and I would like to compare my new list with the old one.

    For instance, say I scraped 1 million URLs and then a bit later scraped 2 million.

    I want to compare the new list with the old one and keep ONLY the new targets.

    I was told to do this:
    "Load your newly scraped list into SB, then use "select the url list to compare" if you want to compare lists by URL, or "select the url list to compare on domain level" if you want to compare by domain."

    I'm not sure where this option is...

     
  2. kinlee

    kinlee Supreme Member

    You can load both lists into Scrapebox (or GVIM, since your lists are huge) and remove the duplicate URLs.
     
  3. JustUs

    JustUs Power Member

    This is a process with a few steps, but I think it is what you want.

    1. Load each file into Notepad++ separately, sort the list and remove duplicates (TextFX Tools plugin required), then save each file separately.
    2. Load the first file in Code Compare from Devart, then select the second file to compare it with.

    Code Compare will show the differences and similarities between the two files in blocks.
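
If you would rather script the comparison than diff the files by eye, here is a minimal Python sketch of the same idea: drop duplicates and keep only the lines of the new list that are missing from the old one. It is not a Scrapebox or Code Compare feature; the function names and the file names in the usage note are made up for illustration, and it assumes one URL per line and that the old list fits in memory.

```python
def new_urls(old_lines, new_lines):
    """Return URLs from new_lines that are not in old_lines.

    Exact string match after trimming whitespace. Order of the new list
    is preserved and repeated URLs are emitted only once.
    """
    seen = {line.strip() for line in old_lines if line.strip()}
    result = []
    for line in new_lines:
        url = line.strip()
        if url and url not in seen:
            seen.add(url)  # also removes duplicates inside the new list
            result.append(url)
    return result


def write_new_urls(old_path, new_path, out_path):
    """File-based wrapper; all three paths are hypothetical examples."""
    with open(old_path, encoding="utf-8", errors="ignore") as old_file, \
         open(new_path, encoding="utf-8", errors="ignore") as new_file:
        fresh = new_urls(old_file, new_file)
    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write("\n".join(fresh) + "\n")
    return len(fresh)
```

Calling something like `write_new_urls("old_list.txt", "new_list.txt", "only_new.txt")` would write the new targets to a third file. The match is on the exact URL string, so `http://site.com/page` and `http://site.com/page/` count as different URLs.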
     
  4. alaltaierii

    alaltaierii Supreme Member

    I created a tool for this purpose some time ago. Here you go.


    Enter your first list and the list that you want to compare it with, then click "Start Filter". It's a fast tool, and you will get your results in a few seconds. For example, on my personal computer (Core 2 Duo T6600, Windows 7) I tested a list of over 3 million URLs (173 MB) compared on domain level with a list of almost 450k URLs (12 MB).

    The output text document was generated in about 9 seconds, so it is very fast!

    Just make sure the domains in the "old list" are trimmed to root. You can do this with Scrapebox. ;)
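
For anyone curious what a domain-level compare might look like in code, below is a rough Python sketch. It is not the tool from this post and not how Scrapebox does it; `host_of` and `filter_new_domains` are hypothetical names. It assumes "domain level" means comparing host names, with a leading `www.` ignored, so `blog.site.com` and `site.com` are treated as different domains.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase host of a URL, without a leading 'www.'."""
    url = url.strip()
    if not url:
        return ""
    if "://" not in url:
        url = "//" + url  # lets urlsplit parse scheme-less entries
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:  # malformed entry, e.g. broken IPv6 brackets
        return ""
    return host[4:] if host.startswith("www.") else host


def filter_new_domains(old_lines, new_lines):
    """Keep URLs from new_lines whose host does not appear in old_lines."""
    old_hosts = {host_of(line) for line in old_lines}
    old_hosts.discard("")
    kept = []
    for line in new_lines:
        host = host_of(line)
        if host and host not in old_hosts:
            kept.append(line.strip())
    return kept
```

Because the host is extracted from both lists, the old list does not have to be trimmed to root first for this sketch. Every URL on a new domain is kept, so several pages from the same new site can appear in the output.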