
Editing a big list, how to remove certain URLs?

Discussion in 'BlackHat Lounge' started by dgfalk, Apr 22, 2011.

  1. dgfalk

    dgfalk Power Member

    Joined:
    Apr 26, 2010
    Messages:
    687
    Likes Received:
    94
    I have a big list of URLs that I want to sort out. About 1/4 of the URLs have a specific keyword in the title that I want to remove. Is there any way, in Word or another program, to say "remove any URL that contains this keyword from the list"?
     
  2. bezopravin

    bezopravin BANNED

    Joined:
    May 11, 2010
    Messages:
    461
    Likes Received:
    3,471
    Just made a quick video on how to do this in Notepad++ (watch in HD 1080p):

    http://www.youtube.com/watch?v=TrcTcw_yipE

    It will take a long time if your list is bigger than 10 megabytes. Alternatively, you can do this easily in ScrapeBox. If this doesn't work for you, send it to me. I'll process it within a few seconds with my custom text editor! :)
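For lists too big to edit comfortably, a short script is another option. The following is only a minimal Python sketch, not something from the video: the names `drop_matching` and `filter_file` are made up for illustration, and it assumes one URL per line and a plain case-insensitive keyword match. It reads the file one line at a time, so the whole list never has to fit in memory.

```python
from typing import Iterable, Iterator


def drop_matching(lines: Iterable[str], keyword: str) -> Iterator[str]:
    """Yield only the lines that do NOT contain keyword (case-insensitive)."""
    needle = keyword.lower()
    for line in lines:
        if needle not in line.lower():
            yield line


def filter_file(src_path: str, dst_path: str, keyword: str) -> int:
    """Copy src_path to dst_path, skipping every line that contains keyword.

    Hypothetical helper: reads one line at a time and returns the
    number of lines that were kept.
    """
    kept = 0
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in drop_matching(src, keyword):
            dst.write(line)
            kept += 1
    return kept
```

Something like `filter_file("urls.txt", "urls_clean.txt", "blogspot")` would write the cleaned list to a new file and leave the original untouched (both file names are placeholders).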
     
    • Thanks x 3
  3. TapTapper

    TapTapper Junior Member

    Joined:
    Apr 15, 2009
    Messages:
    163
    Likes Received:
    138
    Occupation:
    coder, webstore mangler
    Location:
    US
    Home Page:
    If you have any version of Excel since 2003/5 or so, you can use AutoFilter (the toolbar button looks like a funnel with an equal sign) and choose "contains" or "does not contain".
     
  4. dgfalk

    dgfalk Power Member

    Joined:
    Apr 26, 2010
    Messages:
    687
    Likes Received:
    94
    Works perfectly!! Thank you sir, rep given.
     
    • Thanks x 1
  5. jb2008

    jb2008 Senior Member

    Joined:
    Jul 15, 2010
    Messages:
    1,158
    Likes Received:
    972
    Occupation:
    Scraping, Harvesting in the Corn Fields
    Location:
    On my VPS servers
    I do it with regex in Notepad++.

    Go to Search -> Replace, and make sure the Regular Expressions radio button is selected.

    If you have a keyword, for example KEYWORD1, type the following into the find box: ^.*KEYWORD1.*$

    (In regex, ^ means beginning of line, $ means end of line, and .* means any run of characters on that line)

    Leave the Replace box blank.

    Click Replace All

    After the lines containing KEYWORD1 are deleted, press CTRL+A to select all, then go to TextFx -> TextFx Edit -> Delete Blank lines

    It's quick and it works, although others may have different methods. I generally use this to clean up my harvested URL lists - even with the best of footprints for some reason I get a shitload of blogspot blogs which need to be eliminated! Peace out!
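The same two steps (blank out matching lines, then delete blank lines) can be sketched outside Notepad++ as well. This is a rough Python equivalent under a few assumptions: the function name `remove_lines_with` is hypothetical, the keyword is treated as literal text via `re.escape` (in the Notepad++ find box, characters such as `.` would be regex metacharacters), and the match is case-sensitive.

```python
import re


def remove_lines_with(text: str, keyword: str) -> str:
    """Remove every line containing keyword, then drop blank lines."""
    # Step 1: same shape as the find pattern ^.*KEYWORD1.*$ ;
    # re.MULTILINE makes ^ and $ match at the start and end of each line.
    pattern = re.compile(r"^.*" + re.escape(keyword) + r".*$", re.MULTILINE)
    blanked = pattern.sub("", text)

    # Step 2: the "Delete Blank lines" step. Note that this also removes
    # lines that were already blank in the input.
    return "\n".join(line for line in blanked.splitlines() if line.strip())


if __name__ == "__main__":
    sample = (
        "http://a.example/1\n"
        "http://x.blogspot.com/2\n"
        "http://b.example/3\n"
    )
    print(remove_lines_with(sample, "blogspot"))
```

Because `.` does not match a newline by default, `.*` stays within a single line, which is why the pattern removes only the lines that contain the keyword.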
     
    • Thanks x 1