Yahoo pipes, regex remove html but leave some tags

Discussion in 'General Programming Chat' started by Wizzing, Nov 17, 2011.

  1. Wizzing

    Wizzing Newbie

    Joined:
    Jan 11, 2011
    Messages:
    16
    Likes Received:
    2
    Hi there. I'm setting up an autoblog and am using Yahoo Pipes to manipulate my feed. I am using
    Code:
    <[/\]?[^p][a]\s+[^>]*>
    to remove all the HTML, and this works fine, except that I would like to keep some of it.

    What I want is to remove all html except for the <p></p> tags and the <br /> tags.

    It would also be great to automatically clean up certain tags, for example:
    <p class="rtecenter">, replaced with <p>.

    Can any one of you help me with a solution?
     
  2. meatro

    meatro BANNED BANNED

    Joined:
    Nov 21, 2009
    Messages:
    568
    Likes Received:
    997
    A few more details, please. I have no idea what you're trying to do, what you're trying to do it on, etc.

    Post a sample of the code that you're talking about.
     
  3. jazzc

    jazzc Moderator Staff Member Moderator Jr. VIP

    Joined:
    Jan 27, 2009
    Messages:
    2,468
    Likes Received:
    10,143
    • Thanks Thanks x 1
  4. Wizzing

    Wizzing Newbie

    Joined:
    Jan 11, 2011
    Messages:
    16
    Likes Received:
    2
    Code:
    <div class="field field-type-computed field-field-commentary-writer-name"> <div class="field-items"> <div class="field-item odd"> <div class="field-label-inline-first"> By:  Peter Schiff <div class="field field-type-date field-field-commentary-date"> <div class="field-items"> <div class="field-item odd"> <span class="date-display-single">Friday, October 28, 2011</span> <p> Last week, I spent the afternoon visiting the Occupy <span id="lw_1319835445_1">Wall Street</span> demonstrations in <span id="lw_1319835445_2">lower Manhattan</span>. I brought a film crew and a sign that said "I Am The 1%, Let's Talk." The purpose was to understand what was motivating these protesters and try to educate them about what caused the financial crisis. I went down there with the feeling that much of their anger was justified, but broadly misdirected.</p> <p> Indeed, there were plenty of heated discussions. I did little more than ask how much of my earnings I should be allowed to keep. In return, I was called an idiot, a fool, heartless, and selfish. But when we started talking about the issues, it seemed like the protesters fell into two categories: those who generally understood and agreed that <span id="lw_1319835445_3">Washington</span> caused this mess, and those who could only recite Marxist talking points. It was the latter who usually resorted to calling names once I pointed out the hypocrisy of their positions. They might shout, "the banks have taken over the regulatory agencies, so we need more regulations!" Obviously, this is paradoxical. If they're blaming government for causing this problem, why would they suggest more government as the solution? 
</p> <p class="rtecenter"> <span style="font-size:14px;"><strong><a rel="nofollow" target="_blank" href="XXX"><u>CLICK HERE</u></a> to see Peter go head-to-head with an Occupier!</strong></span></p> <p class="rtecenter"> <a rel="nofollow" target="_blank" href="xxx"><img alt="" style="width:638px;height:389px;"/><br /> </a></p> <p> I think some of the leadership of Occupy Wall Street comes from this kind of radical Marxist background – and perhaps they're smart to intentionally keep quiet about their goals. Because the vast majority of protesters I met did believe in capitalism - they're just tired of being screwed over by <em>crony</em> capitalism. Noted school-choice activist Michael Strong calls it "crapitalism," and that's what it is. It's a rotten deal for everyone, and they know it.</p> <p> The problem is that many of these people are under the mistaken impression that Wall Street banks are to blame for this state of affairs. That's like blaming the dogs for getting into the trashcan. Sure, it's bad behavior, but the ultimate responsibility lies with the authority figures - in this case, Washington. After all, it's not the <span id="lw_1319835445_6">New York</span> metro area that has benefitted the most from this crisis. Rather, the counties around <span id="lw_1319835445_7">DC</span> are now ranking as the wealthiest in the country. And while wealthy New Yorkers have historically made their living providing essential financial services to the global economy, Washington has always made its living one way: at our expense.</p> <p> That's why I have trouble sympathizing with people calling themselves the “99%”, implying they stand in opposition to wealth no matter how it's earned. I own a brokerage firm, but I didn't receive any bailout money. In fact, I have to work twice as hard to compete with bigger financial firms that are propped up by the US government. The least I deserve is the ability to keep what I earn.</p> 
    This is a piece of the source. I want to strip it, i.e. remove all the HTML tags from it except some basic ones like <p> and <br />.
    After Yahoo Pipes removes the unwanted tags, wp_robot is going to translate it into another language and I'm going to republish it (after correcting the spelling).
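
    For reference, the "remove everything except <p> and <br />" step described above can be sketched outside Pipes. This is a minimal Python sketch, not the poster's actual setup: the names OTHER_TAGS and keep_p_and_br are made up for illustration, and whether the Pipes Regex module accepts the same negative lookahead is an assumption that would need checking. Regex-based tag stripping is also fragile on comments, scripts, or attributes that contain ">".

```python
import re

# Matches any opening or closing tag whose name is NOT exactly "p" or "br".
# (?!(?:p|br)\b) is a negative lookahead: it rejects the tag names we keep,
# while still letting names that merely start with "p" (e.g. <pre>) match.
OTHER_TAGS = re.compile(r"</?(?!(?:p|br)\b)[a-zA-Z][^>]*>", re.IGNORECASE)

def keep_p_and_br(html: str) -> str:
    """Remove every tag except <p>, </p> and <br>/<br />; text is left alone."""
    return OTHER_TAGS.sub("", html)

sample = '<div class="x"><p> Hi <span id="a">there</span><br /> <strong>you</strong></p></div>'
print(keep_p_and_br(sample))
```

    Attributes on the surviving <p> tags (such as class="rtecenter") are left untouched by this step and need a separate replacement.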
     
  5. Wizzing

    Wizzing Newbie

    Joined:
    Jan 11, 2011
    Messages:
    16
    Likes Received:
    2
    Ok guys got it!

    replace
    Code:
    <p.*?>
    with
    Code:
    <p>
    (with the g option, so every match is replaced)
    This strips the attributes from the paragraph tag.

    .*? is a non-greedy match: it consumes as few characters as possible, up to the next >.
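
    One caveat with the pattern above: <p.*?> matches any tag whose name starts with "p", so <pre> or <param> would also be rewritten to <p>. A rough Python sketch of the difference follows; the stricter pattern and the names LOOSE_P, STRICT_P and strip_p_attributes are illustrative assumptions, not something from the thread or from Pipes itself.

```python
import re

# Pattern from the post: lazy match from "<p" to the next ">".
LOOSE_P = re.compile(r"<p.*?>")

# Stricter variant (an assumption): \b requires the tag name to end right
# after "p", and [^>]* cannot run past the closing ">".
STRICT_P = re.compile(r"<p\b[^>]*>")

def strip_p_attributes(html: str) -> str:
    """Replace every opening <p ...> tag with a bare <p>."""
    return STRICT_P.sub("<p>", html)

sample = '<p class="rtecenter">Hello</p><pre>code</pre>'
print(LOOSE_P.sub("<p>", sample))   # the <pre> tag is rewritten too
print(strip_p_attributes(sample))   # only the real <p> tag is touched
```

    If the feed never contains other tags starting with "p", both patterns behave the same.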

    I had some problems with Yahoo Pipes: my changes weren't showing up in the feed output when I ran it. I think I have to refresh each module in the debug output; now it works fine.
     
    • Thanks Thanks x 1