
Simple One Liner To Extract URLs From Large Text File [GET]

Discussion in 'Black Hat SEO Tools' started by xrfanatic, Feb 24, 2016.

    Hi BHW,

    I thought I'd share this here since lots of us work with large files and URLs.
    Basically, if you have a file that contains URLs mixed in with other text and you want to extract them, you can run this one-liner from the command line (Windows):

    Code:
     cat INPUT.txt | grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" | sort | uniq >>OUTPUT.txt 
    This works on Windows, but you need cat, grep, sort and uniq installed. cat, sort and uniq come with the free CoreUtils package for Windows (GnuWin32); grep is a separate GnuWin32 package.
    It's a pretty robust solution for URL extraction. Tested on an i7 with 16 GB RAM, it extracts the URLs from a 4 GB text file in around 1 minute.
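
    For anyone who wants a slightly tighter variant, here is a minimal sketch of the same pipeline wrapped in a shell function. It assumes a POSIX shell (for example Git Bash or Cygwin on Windows) rather than cmd.exe. The function name extract_urls and the extra characters added to the character class (% & # : ~ +) are my own assumptions, not part of the original one-liner, so adjust them to the URLs you actually have.

```shell
# Hypothetical helper: print the unique URLs found in a text file.
# Usage: extract_urls INPUT.txt > OUTPUT.txt
extract_urls() {
    # grep reads the file directly, so cat is not needed.
    # -E enables extended regex, -o prints only the matched part of each line.
    # The character class is the original one plus % & # : ~ + (an assumption).
    # sort -u sorts and removes duplicates in one step (same result as sort | uniq).
    grep -Eo 'https?://[a-zA-Z0-9./?=_%&#:~+-]*' "$1" | sort -u
}
```

    Note that this writes to standard output, so redirecting with > overwrites OUTPUT.txt, while >> (as in the original) appends and can leave duplicates across runs. Because "." is in the character class, a URL at the end of a sentence will keep its trailing dot.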