
Filter .TLDs/.ccTLDs from a .txt url list?

Discussion in 'Black Hat SEO' started by beolion, Dec 20, 2010.

  1. beolion

    beolion Junior Member

    Joined:
    Aug 10, 2010
    Messages:
    113
    Likes Received:
    9
    I need a tool that keeps only the .com, .net, or .co.uk domains (or whichever TLDs I specify) from a list of URLs in a .txt file.

    Any ideas? Thanks.
     
  2. MaDeuce

    MaDeuce Newbie

    Joined:
    Oct 24, 2008
    Messages:
    45
    Likes Received:
    16
    Location:
    Austin, TX
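    It depends on the file format. If each line ends in the TLD,
    Code:
    grep -hE '\.(com|net|co\.uk)$' files*.txt > results.txt
    would extract the matching lines and put them in results.txt. (grep -E is the current spelling of egrep, and -h stops grep from prefixing each line with its file name when more than one file matches.)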
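
    If the lines are full URLs with paths, they will not end in the TLD and the pattern above will miss them. A minimal Python sketch for that case, assuming one URL or bare domain per line; the names WANTED, host_of, and keep_wanted are made up for illustration:

```python
from urllib.parse import urlsplit

# Suffixes to keep. These defaults come from the question; edit to taste.
WANTED = (".com", ".net", ".co.uk")


def host_of(line):
    """Return the lowercased hostname of a URL or bare domain, or '' if none."""
    text = line.strip()
    if not text:
        return ""
    # urlsplit only fills in the hostname when a scheme or '//' is present.
    if "://" not in text:
        text = "//" + text
    try:
        return (urlsplit(text).hostname or "").lower()
    except ValueError:
        # Malformed input (for example a broken IPv6 literal) is skipped.
        return ""


def keep_wanted(lines, wanted=WANTED):
    """Keep the lines whose hostname ends in one of the wanted suffixes."""
    return [ln.strip() for ln in lines if host_of(ln).endswith(tuple(wanted))]
```

    To run it over a file, something like keep_wanted(open("urls.txt", encoding="utf-8")) should do, where urls.txt is a placeholder name. Matching on the hostname rather than the whole line means a URL such as http://example.org/a.com is not kept by accident.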

    --Ma
     
  3. chris456

    chris456 Regular Member

    Joined:
    May 17, 2010
    Messages:
    281
    Likes Received:
    567
    Hello,
    Which tool or program can I use this rule in?
    I need a similar rule for UEStudio/UltraEdit, if one exists.
    Thanks in advance.
     
  4. Georgebg

    Georgebg Jr. VIP Premium Member

    Joined:
    Dec 2, 2009
    Messages:
    1,681
    Likes Received:
    774
    I need something similar...

    Basically, I have a big list of auto-approve domains and I want to extract the .edu domains. Can anyone tell me how to do this?

    I think it can be done in Excel. If I find a way to do it, I'll post it here :)
     
  5. xpleet

    xpleet Regular Member

    Joined:
    Jan 18, 2010
    Messages:
    377
    Likes Received:
    327
    Location:
    Morocco
    You can do that using Notepad++.
     
  6. chris456

    chris456 Regular Member

    Joined:
    May 17, 2010
    Messages:
    281
    Likes Received:
    567
    Thanks, I thought that was what you meant.
     
  7. bezopravin

    bezopravin BANNED

    Joined:
    May 11, 2010
    Messages:
    461
    Likes Received:
    3,471
    Hi MingS, chris456, and George! You can easily filter a list down to .edu, .gov, or any other extension using Notepad++.

    =======================================================================

    10 Simple Steps to Filter a URL List Down to .edu (or Any Other TLD)

    1. Open your list in Notepad++
    2. Select (highlight) ".edu" in any one of the URLs
    3. Press Ctrl+C to copy the ".edu" text to the clipboard
    4. Go to TextFX Menu --> TextFX Viz --> and select Hide Lines Without (Clipboard) Text
    5. Notepad++ now shows only the lines (URLs) that contain ".edu"
    6. Press Ctrl+A to Select All
    7. Go to TextFX Menu --> TextFX Viz --> and select Delete Invisible Selection
    8. All URLs with extensions other than .edu are now removed
    9. Press Ctrl+A to Select All
    10. Go to TextFX Menu --> TextFX Edit --> and select Delete Blank Lines
    That's it! :)

    Hope you enjoyed this tip...

    =======================================================================

    Note: It's recommended to back up your list file before following these instructions. If you accidentally press Ctrl+S partway through, Notepad++ will overwrite your text file with the filtered list, and you know what happens next! ;)
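
    For anyone who would rather script it, here is a rough Python sketch of the same idea as the steps above: keep the lines that contain a given text, drop blank lines, and write the result to a new file so the original list is never touched. The function names and file paths are hypothetical, and the case-insensitive match is my own choice rather than something TextFX is known to do.

```python
def lines_containing(lines, needle):
    """Keep non-blank lines that contain `needle`, ignoring case."""
    needle = needle.lower()
    kept = []
    for line in lines:
        line = line.strip()
        if line and needle in line.lower():
            kept.append(line)
    return kept


def filter_file(src_path, dst_path, needle):
    """Write the matching lines of src_path to dst_path; src_path is not modified."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        kept = lines_containing(src, needle)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(kept) + ("\n" if kept else ""))
    return len(kept)  # number of lines written
```

    A call such as filter_file("urls.txt", "edu_only.txt", ".edu") would do the whole job in one go. Like the Notepad++ method, this is a plain text match, so a URL like http://example.com/page.edu.html is also kept.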

    Download Link for Notepad++:

    Code:
    http://sourceforge.net/projects/notepad-plus/files/latest


     
  8. beolion

    beolion Junior Member

    Joined:
    Aug 10, 2010
    Messages:
    113
    Likes Received:
    9
    You should create a blog with your tips. :)
     
  9. chris456

    chris456 Regular Member

    Joined:
    May 17, 2010
    Messages:
    281
    Likes Received:
    567
    Very nice post, I like tutorial-style posts.
     
  10. chris456

    chris456 Regular Member

    Joined:
    May 17, 2010
    Messages:
    281
    Likes Received:
    567
    Hi Bezopravin,
    Do you happen to know how to batch delete the slash "/" from the end of URLs in Notepad++ or UltraEdit/UEStudio?
    When I grab several thousand URLs, I sometimes get the same URL twice, once with a slash at the end and once without. The duplicate remover doesn't treat them as duplicates, so it doesn't delete them and I end up with many duplicates.

    Hope you can help. Happy Christmas :-)
     
  11. bezopravin

    bezopravin BANNED BANNED

    Joined:
    May 11, 2010
    Messages:
    461
    Likes Received:
    3,471
    Oh Yeah, Here We Go...

    =======================================================================

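    Simple Way to Strip the Forward Slash (/) from the End of URLs

    1. Open your list in Notepad++
    2. Press Ctrl+H (Find and Replace)
    3. Select the Extended radio button under Search Mode (shortcut: Alt+X)
    4. Enter /\r in the Find what box (\r is the carriage return in Windows line endings, so this matches a slash at the very end of a line)
    5. Enter \r in the Replace with box so the line ending is kept (leaving the box blank deletes the carriage return along with the slash)
    6. Press Replace All
    That's it! :)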

    Hope you enjoyed this Tip...
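
    Since the goal was to get rid of duplicates, here is a small Python sketch that strips the trailing slash and removes duplicates in one pass. It assumes one URL per line and works with both Windows and Unix line endings; the function names are made up for illustration.

```python
def strip_trailing_slash(url):
    """Remove surrounding whitespace and any trailing '/' characters."""
    return url.strip().rstrip("/")


def dedupe_urls(lines):
    """Normalize each URL, then keep only its first occurrence, in order."""
    seen = set()
    unique = []
    for line in lines:
        url = strip_trailing_slash(line)
        if url and url not in seen:  # skip blank lines and repeats
            seen.add(url)
            unique.append(url)
    return unique
```

    For example, dedupe_urls(open("urls.txt", encoding="utf-8")) returns the cleaned list, where urls.txt is a placeholder name. The comparison is still exact text, so http://example.com and http://www.example.com count as different URLs.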

    =======================================================================


    Merry Christmas! :)
     
  12. chris456

    chris456 Regular Member

    Joined:
    May 17, 2010
    Messages:
    281
    Likes Received:
    567
    Thank you very, very much, you are absolutely the best :-) Merry Christmas to you!!! I will see if I can give you a "rep", but I think that as a newbie I can't do anything here, so at least another "thanks" sent.
     
  13. chris456

    chris456 Regular Member

    Joined:
    May 17, 2010
    Messages:
    281
    Likes Received:
    567
    Thank you, Huzizi. I need tools like this these days. I had already found it in your original post, so I will examine it thoroughly tomorrow :-)
     
  14. chris456

    chris456 Regular Member

    Joined:
    May 17, 2010
    Messages:
    281
    Likes Received:
    567
    I have to respond immediately: I just tried it. Excellent tool, I need it a lot, thank you again!!
     