
Link Scraper - Remove all links from posts

Discussion in 'Blogging' started by smashedpumpkins, Mar 27, 2010.

  1. smashedpumpkins

    smashedpumpkins Regular Member

    Joined:
    Mar 3, 2010
    Messages:
    231
    Likes Received:
    84
    I'm looking for a plugin that will remove all links from my posts. I'm using WP-Robot to grab articles, but the articles come with many affiliate links. I'd rather strip out all the links and keep only the text, without messing up the formatting.

    Is there a plugin that can do this for me?

    Thanks

    EDIT: This may actually make it easier, I think. What would I add here to strip the links? Is it something simple? This is the section where it keeps all the formatting as-is. If I can change it to strip the links, that'd be great. Sadly, I just don't know enough.

    Code:
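            // Select every paragraph inside the article body container.
            $xpath = new DOMXPath($dom);
            $paras = $xpath->query("//div[@id='KonaBody']//p");

            for ($i = 0; $i < $paras->length; $i++) {

                $para = $paras->item($i);
                // textContent returns the paragraph's text without any markup.
                $paragraph = $para->textContent;

                if ($paragraph != '') {
                    // Translate the paragraph when ma_translate() exists and the ma_trans_article option is set.
                    if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {
                        $paragraph = ma_translate($paragraph);
                    }

                    $content .= $paragraph . ' ';
                    $content .= "<br/><br/>";
                }
            }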
    
     
    Last edited: Mar 27, 2010
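A minimal sketch of one way to drop the links while keeping the rest of the markup, assuming the paragraph HTML is available as a string. The function name `strip_links_keep_markup` and the `wrap` container id are made up for illustration and are not part of WP-Robot; the sketch relies only on PHP's bundled DOM extension.

```php
<?php
// Hypothetical helper (not part of WP-Robot): unwrap every <a> element,
// keeping its inner content and dropping the link itself.
function strip_links_keep_markup($html) {
    $dom = new DOMDocument();
    // Wrap the fragment in a known container so it can be read back out.
    // The XML encoding prefix asks libxml to treat the input as UTF-8;
    // @ silences warnings about sloppy markup.
    @$dom->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    $xpath = new DOMXPath($dom);

    // Copy the matches into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query('//a')) as $a) {
        // Move each child of the anchor in front of it, then remove the empty anchor.
        while ($a->firstChild) {
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        $a->parentNode->removeChild($a);
    }

    // Serialize only the children of the wrapper, not the <html>/<body> shell.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$clean = strip_links_keep_markup('<p>Read <a href="http://example.com/aff?id=1">this <b>offer</b></a> now.</p>');
```

Because the anchors are unwrapped rather than deleted, the link text and any nested tags such as `<b>` stay where they were.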
  2. Kid Shaleen

    Kid Shaleen Regular Member

    Joined:
    Oct 29, 2009
    Messages:
    250
    Likes Received:
    63
    There's a down-and-dirty brute force method that should work.

    Use a versatile text editor that can run macros on multiple files.

    Take your files.

    Search for the end-of-line character and replace it with a space followed by some string that will never appear in the original articles, like "ggggg".

    Now your article is one single line.

    Search again for the space character and replace it with the end-of-line character.

    Now your article consists of lines that all contain one word or string.

    Now search for the string "http" or the "@" character (email links) and delete every line with those.

    You've just zapped all links. Now reassemble your article with its original formatting.

    Search for the end-of-line character and replace it with the space character.

    Your article is back to a single line.

    Search for the earlier "ggggg" string and replace it with the end-of-line character.

    You've now restored your paragraphs.

    Once you've set up the macro you'll find it takes longer to read this than to actually create the stored keystrokes.

    Also, none of these search and replace commands should mess up CSS or HTML formatting.
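The macro steps above could be sketched in PHP roughly as follows, for anyone who prefers a script to an editor. The function name `drop_link_tokens` is hypothetical. Working line by line has the same effect as the "ggggg" placeholder trick: line breaks survive, and any space-separated word containing "http" or "@" is dropped. This is a sketch for plain text with bare URLs. On HTML, a tag such as `<a href="http://...">text</a>` contains a space, so it would be split and only partly removed.

```php
<?php
// Hypothetical helper illustrating the word-per-line macro idea on plain text.
function drop_link_tokens($text) {
    // Split into lines so the original line breaks can be restored afterwards.
    $lines = preg_split('/\r\n|\r|\n/', $text);
    foreach ($lines as $i => $line) {
        // Split on single spaces, as the macro does.
        $kept = array();
        foreach (explode(' ', $line) as $word) {
            // Treat any word containing "http" (case-insensitive) or "@" as a link.
            if (stripos($word, 'http') !== false || strpos($word, '@') !== false) {
                continue;
            }
            $kept[] = $word;
        }
        $lines[$i] = implode(' ', $kept);
    }
    // Reassemble the text with one "\n" per original line break.
    return implode("\n", $lines);
}

$clean = drop_link_tokens("Visit http://example.com today\nMail me@example.com now");
```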

    Hope this helps.
     
  3. smashedpumpkins

    Are you talking about coding my own little macro? I definitely want to automate the entire procedure. I can do a bit in PHP, but it'd take a lot of reading to figure it all out on my own. I was hoping there might be some type of plugin already.
     
  4. smashedpumpkins

    They're taken from an open and free article database.

    I've tried using the following with no success.
    PHP:
    $content = preg_replace('%</?a\b[^>]*>%', '', $content);
    I noticed the script has a similar option for other features. I've tried to copy it over, but I can't figure it out. I've used the code below with both $paragraph and $content, but the links still exist! Any ideas based on this code?

    PHP:
    $paragraph = ma_strip_selected_tags($paragraph, array('a','iframe','script'));
    PHP:
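    function ma_strip_selected_tags($text, $tags = array()) {
        $args = func_get_args();
        $text = array_shift($args);
        // Accept the tags either as one array or as extra string arguments.
        $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
        foreach ($tags as $tag) {
            // Replace each <tag ...>inner</tag> pair with its inner content.
            while (preg_match('/<'.$tag.'(|\W[^>]*)>(.*)<\/'. $tag .'>/iusU', $text, $found)) {
                $text = str_replace($found[0], $found[2], $text);
            }
        }
        // Remove any remaining self-closing forms of the same tags.
        return preg_replace('/(<('.join('|', $tags).')(|\W.*)\/>)/iusU', '', $text);
    }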
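For reference, a minimal sketch showing the `preg_replace()` pattern from this post applied to a string that still contains markup. The wrapper name `strip_anchor_tags` and the sample paragraph are hypothetical, and the `i` modifier is an addition so that uppercase `<A>` tags are also matched. A replacement like this only has an effect when it runs on HTML. Text read through `textContent` no longer contains any tags to match.

```php
<?php
// Hypothetical wrapper around the preg_replace() pattern from the post.
function strip_anchor_tags($html) {
    // Matches <a ...> and </a>; \b keeps it from matching <abbr>, <article>, etc.
    return preg_replace('%</?a\b[^>]*>%i', '', $html);
}

$paragraph = '<p>See <a href="http://example.com/?aff=1">this offer</a> and <abbr title="x">etc</abbr>.</p>';
$clean = strip_anchor_tags($paragraph);
// $clean is now: <p>See this offer and <abbr title="x">etc</abbr>.</p>
```

The anchor text is kept and only the opening and closing `<a>` tags are removed, so the surrounding paragraph markup is untouched.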
     
  5. bizminder

    bizminder Registered Member

    Joined:
    Mar 16, 2010
    Messages:
    68
    Likes Received:
    3
    Delete links Wizard is available and it might help you to some extent.