I need to extract URLs from a piece of code, but how?

Discussion in 'Link Building' started by xthoms, Nov 21, 2010.

  1. xthoms

    xthoms Regular Member

    Joined:
    Sep 14, 2010
    Messages:
    280
    Likes Received:
    99
    Let's say I have some code like this

    Code:
    <a target="_blank" href="http://link.com">
    abc</a> </p>
    But I of course have a lot more. How can I automatically extract the URLs?
     
  2. xxxzzxxxzz

    xxxzzxxxzz Newbie

    Joined:
    Sep 9, 2010
    Messages:
    16
    Likes Received:
    1
    I hope this will help you

    http://social.msdn.microsoft.com/Forums/en/sharepointdevelopment/thread/7381457b-2538-46c5-9ff4-a82bf8a75dd9

    http://bytes.com/topic/php/answers/824906-extract-url-rss-feed-source-code-regex
     
  3. catchers

    catchers Newbie

    Joined:
    Aug 12, 2010
    Messages:
    32
    Likes Received:
    6
    Read the following text on regex, it will cover 99% of the regex you will use

    Code:
    Assuming you're using PHP:
    I suck at regex, but a good trick is to just learn this, it's almost all you ever need anyway:
    Code:
    
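    preg_match("/abc(.*)/","abcdefg",$match); $output = $match[1];
    echo $output;
    //Will return defg
    
    (.*?) will, for the most part, work for everything, as long as something follows it in the pattern. On its own at the very end of a pattern, the lazy (.*?) matches an empty string, which is why the example above uses the greedy (.*).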
    I just put the variable assignment on the same line.
    
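    If you want matching to ignore case and work across line breaks, add Uism (i ignores case, s lets . match newlines, m makes ^ and $ match at every line; U inverts greediness, so with U the (.*?) becomes greedy):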
    Code:
    
    preg_match("/abc(.*?)/Uism","abcdefg",$match); $output = $match[1];
    echo $output;
    //Will return defg
    
    Also, let's say you only want it to return "de":
    Code:
    
    preg_match("/abc(.*?)fg/Uism","abcdefg",$match); $output = $match[1];
    echo $output;
    //Will return de
    
    If you want to work with preg_match_all instead, I suggest you debug with print_r (U is dropped here because it would make (.*?) greedy and merge both matches into one):
    Code:
    
    preg_match_all("/abc(.*?)fg/ism","abcdefgabcdefg",$match);
    print_r($match);
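
To make the difference between lazy and greedy matching concrete, here is a small sketch that runs three variants of the pattern above on the same string. The variable names are only illustrative; the input string is the one from the preg_match_all example.

```php
<?php
$text = 'abcdefgabcdefg';

// Lazy: each capture stops at the nearest "fg", so two matches are found.
preg_match_all('/abc(.*?)fg/', $text, $lazy);

// Greedy: the capture runs on to the last "fg", so only one match is found.
preg_match_all('/abc(.*)fg/', $text, $greedy);

// U swaps the two meanings, so (.*?) behaves greedily here.
preg_match_all('/abc(.*?)fg/U', $text, $swapped);

print_r($lazy[1]);    // two entries: "de" and "de"
print_r($greedy[1]);  // one entry: "defgabcde"
print_r($swapped[1]); // one entry: "defgabcde"
```

When pulling several values out of one long string, the lazy form without U is usually the one you want.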
    
    
     
  4. movieman32

    movieman32 Regular Member

    Joined:
    Aug 6, 2008
    Messages:
    371
    Likes Received:
    346
    Save the code as a text file and open the text file in a spreadsheet. Use find and replace (with wildcards) to search for *href="
    and leave the replace box blank.
    Then do it again with ">*

    That should leave you with a clean list of just the URLs.
     
  5. kuzzi

    kuzzi Junior Member

    Joined:
    Jun 13, 2010
    Messages:
    198
    Likes Received:
    63
    Occupation:
    PROFESSIONAL VIDEO EDTOR/MARKETER OFFLINE SEO
    Location:
    world
  6. xthoms

    xthoms Regular Member

    Joined:
    Sep 14, 2010
    Messages:
    280
    Likes Received:
    99
    Thanks for all the answers, but unfortunately none of them does what I really want.

    I have a large block of code and I want to return all the values that sit between

    href=" and ">

    I don't know whether the regex does that; I couldn't figure out how to adapt it to my situation.
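
For the case described here (everything between href=" and the closing quote), a minimal PHP sketch might look like the following. The function name extract_hrefs and the sample $html are made up for illustration, and the pattern assumes the href values are wrapped in double quotes.

```php
<?php
// Hypothetical helper: returns every double-quoted href value found in $html.
function extract_hrefs(string $html): array
{
    // [^"]* stops at the first closing quote, so each link is captured separately.
    // The i modifier also matches HREF= written in upper case.
    preg_match_all('/href="([^"]*)"/i', $html, $matches);
    return $matches[1];
}

// Sample input (an assumption, modelled on the snippet in the first post).
$html = '<p><a target="_blank" href="http://link.com">' . "\n" . 'abc</a> </p>'
      . '<a href="http://example.org/page">def</a>';

// Print one URL per line.
foreach (extract_hrefs($html) as $url) {
    echo $url, "\n";
}
```

Because the pattern looks only at the attribute itself, it does not matter whether the links sit one per line or are spread through a whole page of source.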
     
  7. Monrox

    Monrox Power Member

    Joined:
    Apr 9, 2010
    Messages:
    615
    Likes Received:
    579
    Look for 'URL Extractor Ver 1.0' from focal media.
    If it doesn't work, find something else; the internet is full of such free tools. Just make sure they are actual apps, not some virus.

    Alternatively, get someone on the freelancer sites to make you a custom one for $2-3 or so; it's a couple of lines of code.
     
  8. sikandar

    sikandar Senior Member

    Joined:
    Mar 15, 2008
    Messages:
    1,136
    Likes Received:
    1,026
    There is a very good product called 'Web Data Extractor'. It can extract not only URLs, but many other things like metadata, phone numbers etc. You can customize it to suit your needs.

    You can download a free trial at

    Code:
    http://www.webextractor.com/download.htm 
     
  9. fatboy

    fatboy Elite Member

    Joined:
    Aug 13, 2008
    Messages:
    1,618
    Likes Received:
    3,229
    Occupation:
    Retired
    Location:
    Old Peoples Home
    Here is a quick PHP script that seems to do what you are after (I think) - download it from http://shrinkr.info/1tv1mq (my webspace).

    In the grab.php file, just put in the URL you want to grab links from.

    edit: Forgot to say, just run it from the command line - works on my Ubuntu laptop, haven't tested on Windows.
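
The download above is no longer available, so the following is not that script, only a rough sketch of how a link grabber of this kind could be written. It assumes PHP's DOM extension is installed; the function name grab_links and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: collects the href attribute of every <a> element in $html.
function grab_links(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so parser warnings are kept quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Skip anchors that have no href (for example named anchors).
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Sample markup (an assumption); a real run would pass in a downloaded page.
$html = '<p><a target="_blank" href="http://link.com">abc</a></p>'
      . "<a href='http://example.org/'>def</a>";

echo implode("\n", grab_links($html)), "\n";
```

To grab links from a live page, the HTML could be fetched first, for example with file_get_contents($url), which only works for URLs when allow_url_fopen is enabled. Using a parser instead of a regex means single-quoted attributes are handled as well.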
     
    Last edited: Nov 21, 2010
  10. haridada

    haridada Senior Member

    Joined:
    Oct 9, 2008
    Messages:
    996
    Likes Received:
    382
    Location:
    Chennai
    thread bookmarked. thanks all for providing some utilities.

    Was looking for something like this too..
     
  11. fatboy

    fatboy Elite Member

    Joined:
    Aug 13, 2008
    Messages:
    1,618
    Likes Received:
    3,229
    Occupation:
    Retired
    Location:
    Old Peoples Home
  12. haridada

    haridada Senior Member

    Joined:
    Oct 9, 2008
    Messages:
    996
    Likes Received:
    382
    Location:
    Chennai
    goes to a 404 for me. :eek:
     
  13. fatboy

    fatboy Elite Member

    Joined:
    Aug 13, 2008
    Messages:
    1,618
    Likes Received:
    3,229
    Occupation:
    Retired
    Location:
    Old Peoples Home
    Had to take it down as it was being hit a little hard from people using it!!
     
  14. mark0v

    mark0v Junior Member

    Joined:
    May 6, 2010
    Messages:
    114
    Likes Received:
    20
    I haven't tested this but it should work so long as each link is in the exact format you mentioned with one link per line:
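    PHP:
    <?php
    // Read the saved file into an array, one line per element.
    $urllist = file("yourfile.txt");
    foreach ($urllist as $urlline) {
        // Split on double quotes: in <a target="_blank" href="URL"> the URL is the fourth piece.
        $array = explode('"', $urlline);
        // Lines without a link have no fourth piece, so skip them.
        if (isset($array[3])) {
            echo $array[3] . "\n";
        }
    }
    ?>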
    then just view source and copy/paste
     
  15. dvdvids

    dvdvids BANNED BANNED

    Joined:
    Apr 5, 2010
    Messages:
    242
    Likes Received:
    188
    Download EmailSpider Gold 10, upload the file to your server and extract the emails from there...:D
     
  16. xthoms

    xthoms Regular Member

    Joined:
    Sep 14, 2010
    Messages:
    280
    Likes Received:
    99
    I got it all figured out. Thanks for everyone's contribution.
    It's not 1 per line, it's the code from a website, but I got it now :)
     
  17. haridada

    haridada Senior Member

    Joined:
    Oct 9, 2008
    Messages:
    996
    Likes Received:
    382
    Location:
    Chennai
    Will try it and reply. If I have any questions, I will post here for sure to get help from you guys.
     
  18. AsadMoeen

    AsadMoeen Newbie

    Joined:
    Nov 9, 2010
    Messages:
    23
    Likes Received:
    0
    If you search Google, there are URL extractors or rippers easily available.
     
  19. zelma143

    zelma143 Power Member

    Joined:
    Jun 25, 2010
    Messages:
    572
    Likes Received:
    37
    Occupation:
    PHP programmer,Bot maker,iMacro script maker
    Try PHP with

    PHP:
     // Lazy (.*?) stops at the first closing quote; every URL ends up in $arry[1].
     preg_match_all('/href="(.*?)"/',$source,$arry);