Yahoo Pipes Tutorial - Convert an RSS summary feed to a full-content RSS feed

sean815

I noticed that many people are unfamiliar with Yahoo Pipes or can't figure it out. First off, let me explain what Pipes is. "Pipes is a powerful composition tool to aggregate, manipulate, and mash-up content from around the web." Some people use Pipes to take several RSS feeds and combine them into one feed for their website. Others use Pipes to take an RSS feed that only displays a summary and convert it to display the entire content of the article or news story. Or you can use both techniques to combine multiple summary feeds into a "content super feed" (I'm trademarking that LOL). For this tutorial, I will explain the second technique.

Step 1
Sign up if you haven't already: http://pipes.yahoo.com

Step 2
Click the "Create a pipe" button on the main page.

Step 3
On the left menu, drag the "Fetch Feed" module onto the grid. Then enter the RSS feed that you want to convert in the "URL" text box of this module. Use the XML version of your feed. For this example, I will use a health feed from the Washington Post:
Code:
feed://feeds.washingtonpost.com/wp-dyn/rss/health/index_xml

The purpose of this module is to tell Pipes what feed we are going to be working with. Simple!
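
For anyone who thinks better in code, here is a rough Python sketch of what a "Fetch Feed" step amounts to: turning RSS XML into a list of items with a title, link, and description. This is only an illustration, not how Pipes works internally. The sample feed text and the fetch_feed name are made up so the example runs without touching the network.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a summary-only RSS feed (made-up content, not the real feed).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Health Feed</title>
    <item>
      <title>Story one</title>
      <link>http://example.com/story-1.html</link>
      <description>Short summary of story one.</description>
    </item>
    <item>
      <title>Story two</title>
      <link>http://example.com/story-2.html</link>
      <description>Short summary of story two.</description>
    </item>
  </channel>
</rss>"""


def fetch_feed(rss_text):
    """Parse RSS 2.0 text into a list of item dicts (title, link, description)."""
    root = ET.fromstring(rss_text)
    items = []
    for node in root.iter("item"):
        items.append({
            "title": node.findtext("title", default=""),
            "link": node.findtext("link", default=""),
            "description": node.findtext("description", default=""),
        })
    return items


items = fetch_feed(SAMPLE_RSS)
for item in items:
    print(item["title"], "->", item["link"])
```

Each dict plays the role of one RSS item; its "link" value is what item.link refers to later on.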

Step 4
Drag the "Loop" module into the grid. The "Loop" module is located in the Operators submenu on the left menu. Click the little arrow to the left of "Operators" to display the submenu. After you have done this, you will notice there is another grid located inside of this "Loop" module. We will drag another module here in the next step.

The purpose of this module is to loop through each RSS item from the feed we specified in the previous module. For example, suppose our feed has 100 stories in it. This module is going to loop through each story, one at a time, and do what we tell it to do.

Step 5
Drag the "Fetch Page" module into the "Loop" grid from the previous step. You should see a red box outlining the grid of the "Loop" module when you are hovering correctly. Now select the first dropdown next to "URL" and select item.link or type that in exactly.

The purpose of this module is to open the source page linked from each RSS item and pull out the full story. Since we put this module inside a "Loop" module, it runs once for each RSS item (story) and grabs the full story from that item's source page.

Step 6
Drag the "Regex" module onto the main grid. This module is located in the "Operators" submenu.

The purpose of this module is to manipulate the story, link, or even title. This module is optional. Some examples I've used: stripping all links from a story, removing the season and episode number from the title on a Hulu feed, or replacing every instance of the word Blackhat with Whitehat. There are many regex examples out there on the web.
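
As a rough Python sketch of the idea, a list of regex rules can be applied to the fields of one item, using the Blackhat to Whitehat replacement mentioned above. The RULES layout and the apply_rules name are assumptions for illustration only, and Pipes' own regex flavor may differ from Python's in details.

```python
import re

# Each rule mirrors one row of a regex rule list: (field, pattern, replacement).
RULES = [
    ("title", r"\bBlackhat\b", "Whitehat"),
    ("description", r"\bBlackhat\b", "Whitehat"),
]


def apply_rules(item, rules):
    """Return a copy of item with every rule applied to its field, in order."""
    result = dict(item)
    for field, pattern, replacement in rules:
        if field in result:
            result[field] = re.sub(pattern, replacement, result[field])
    return result


item = {"title": "Blackhat tips", "description": "A Blackhat guide."}
print(apply_rules(item, RULES))
```

The original item is left untouched; only the returned copy carries the replacements.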

Step 7
Connect these all up. Currently these modules are all separate and no data flows from one module to the next. Data comes in at the top of a module, filters through it, then exits at the bottom. So click and hold the little circle at the bottom of the first module ("Fetch Feed"), drag it to the top circle of the "Loop" module, and release. You should see a connection, or pipe, between the two modules. Now connect the bottom of the "Loop" module to the top of the "Regex" module. Finally, connect the bottom of the "Regex" module to the top of the "Pipe Output" module.
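
If it helps, the wiring can be pictured as plain function chaining, where the output of one module becomes the input of the next. A minimal Python sketch follows; the connect helper and the two stage functions are hypothetical stand-ins, not real Pipes modules.

```python
from functools import reduce


def connect(*modules):
    """Chain modules so each one's output becomes the next one's input."""
    def pipe(items):
        return reduce(lambda data, module: module(data), modules, items)
    return pipe


# Hypothetical stand-ins; each takes a list of items and returns a new list.
def loop_stage(items):
    return [dict(item, description="full text of " + item["title"]) for item in items]


def regex_stage(items):
    return [dict(item, description=item["description"].upper()) for item in items]


pipeline = connect(loop_stage, regex_stage)
print(pipeline([{"title": "story one"}]))
```

Leaving a module unconnected corresponds to leaving it out of the connect(...) call: its work never reaches the output.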


Now that the design is set up, you should test your connections. If you click on the "Pipe Output" module on the grid, it should turn orange, and the bottom of the webpage should start generating the feed. After completion, you should see a list of your RSS stories in the bottom pane. If not, or if there is an error, click the Refresh button in the bottom pane a few times. If you still do not see any items, then you either forgot to enter the URL of your feed in Step 3 or you messed up your connections in Step 7. Try again or try another feed to test. Once you see your items in the bottom pane, you may continue.

Step 8
We need to find markers or characteristics on the source pages to pull out the full story. Open up your original RSS feed in a new browser window or tab. Click on the first story so that you are on the source website reading the original story. Now, in your browser, view the source of the webpage. We need to find the story in this source. We also need to find a unique marker just before and just after the story. Here's a snippet of an article from the feed:
Code:
<!-- End New Comments Box: Common -->
 
<div class="sidebarhack"><b></b></div>
<div class="sidebar">
<div class="seo-header"><div style="float:left;padding-left:7px;">Who's Blogging</div><div style="float:right;padding-right:5px;"><a href="http://www.sphere.com/" style="padding:0;"><img src="http://media3.washingtonpost.com/wp-srv/images/logo_sphere_powered101x13.gif" border="0" width="101" height="13"/></a></div><div style="clear:both;"></div></div>
<div class="sidebarcontent">
» <a class="iconsphere" title="Related Blogs & Articles" onclick="return Sphere.Widget.search();" href="http://www.sphere.com/search?q=sphereit:http://www.washingtonpost.com/wp-dyn/content/article/2010/01/04/AR2010010402752.html" rel="nofollow">Links to this article</a>
</div>
</div>
</div>
<div id="ad_links_inner" style="display:none"><script type="text/javascript" src="http://media.washingtonpost.com/wp-srv/ad/quigo/article_inner.js"></script></div>

</td></tr></table>
<FONT SIZE="2">
<div id="byline">By <a href="http://projects.washingtonpost.com/staff/articles/rachel+saslow/" title="Send an e-mail to Rachel Saslow">Rachel Saslow</a></div>
Washington Post Staff Writer
<br/>
Tuesday, January 5, 2010
</FONT><P>
</div>
<div id="article_body" style="padding-left:10px;">
<span id="aptureStartContent"></span>
<p>
Scientists may have created a vaccine against cocaine addiction: a series of shots that changes the body's chemistry so that the drug can't enter the brain and provide a high.
</p>
<div id="body_after_content_column">
<p>
The vaccine, called TA-CD, shows promise but could also be dangerous; some of the addicts participating in a study of the vaccine started doing massive amounts of cocaine in hopes of overcoming its effects, according to Thomas R. Kosten, the lead researcher on the study, which was published in the Archives of General Psychiatry in October.
</p>
<p>
"After the vaccine, doing cocaine was a very disappointing experience for them," said Kosten, a professor of psychiatry and neuroscience at Baylor College of Medicine in Houston.
</p>
<p>
Nobody overdosed, but some of them had 10 times more cocaine coursing through their systems than researchers had encountered before, according to Kosten. He said some of the addicts reported to researchers that they had gone broke buying cocaine from multiple drug dealers, hoping to find a variety that would get them high.
</p>
<p>
Of the 115 addicts in the study, 58 were given the vaccine, administered in a series of five shots over 12 weeks, while 57 received placebo injections. Six people dropped out before the end of the study. The researchers recruited the participants from a methadone-treatment program in West Haven, Conn., which made it possible to track them for the full 24 weeks of the study. The patients were addicted to cocaine and heroin; TA-CD is designed to work only on cocaine, including the crack form of the drug.
</p>
<p>
Like disease vaccines, TA-CD stimulates a person's immune system to produce antibodies. Of those who received all five vaccine injections, 38 percent reached antibody levels that were high enough to dull the effects of the drug. The antibodies stayed active for eight to 10 weeks after the last shot.
</p>
<p>
In the high-antibodies group, 53 percent stayed off cocaine more than half the time once they had built up immunity. That compares with 23 percent of those who produced fewer antibodies. The researchers monitored cocaine use through regular urinalysis.
</p>
<p>
"In this study, immunization did not achieve complete abstinence from cocaine use," Kosten said. "Previous research has shown, however, that a reduction in use is associated with a significant improvement in cocaine abusers' social functioning and thus is therapeutically meaningful."
</p>
<p>
About a quarter of those who received the vaccine did not make sufficient antibodies at all; Kosten isn't sure why.
</p>
<p>
"That's the million-dollar question," said Margaret Haney, a professor of clinical neuroscience at Columbia University Medical Center, who is also researching the cocaine vaccine though she was not involved in Kosten's study.
</p>
<p>
In October, the journal Biological Psychiatry published online an article by Haney that also tested the effects of TA-CD.
</p>
<p>
Through newspaper ads, Haney had recruited 15 cocaine-dependent men to participate in her study. (Only 10 stayed to the end.)
</p>
</div>
<span id="aptureEndContent"></span>

At the beginning of the story, you should see the text "<span id="aptureStartContent"></span>". This is unique to the page, meaning there is only one instance of it in the page source, AND it appears on every news story in this feed. This will be our beginning marker. And look at the end: Washington Post is handing us their shit on a silver platter with "<span id="aptureEndContent"></span>". This will be our end marker. Now back to Yahoo Pipes.
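
Checking both conditions by eye gets tedious, so here is a small Python sketch of the test: a marker is usable if it occurs exactly once in every story page. The page sources below are made up, and is_usable_marker is a hypothetical helper name.

```python
START = '<span id="aptureStartContent"></span>'
END = '<span id="aptureEndContent"></span>'


def is_usable_marker(marker, pages):
    """True if marker occurs exactly once in the source of every page."""
    return all(page.count(marker) == 1 for page in pages)


# Two made-up page sources standing in for real story pages.
pages = [
    "<div>nav</div>" + START + "<p>Story A</p>" + END + "<div>footer</div>",
    "<div>nav</div>" + START + "<p>Story B</p>" + END + "<div>footer</div>",
]

print(is_usable_marker(START, pages))    # the marker is present once per page
print(is_usable_marker("<div>", pages))  # "<div>" appears twice per page
```

A generic tag such as a bare div fails the check because it appears more than once, which is why a distinctive id or comment makes a better marker.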

Step 9
Within the "Fetch Page" module, you will see an area that says "Cut content from:". The first box takes the beginning marker (<span id="aptureStartContent"></span>) and the box to the right of it takes the end marker (<span id="aptureEndContent"></span>).
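
Conceptually, cutting content between two markers is a pair of string searches. The Python sketch below shows one way to do it. It assumes the markers themselves are excluded from the result, which may or may not match what Pipes does, and cut_content is a hypothetical name.

```python
def cut_content(html, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    Returns None when either marker is missing.
    """
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end]


# Made-up page source using the two markers from this example.
page = (
    '<div id="article_body">'
    '<span id="aptureStartContent"></span>'
    "<p>Full story text.</p>"
    '<span id="aptureEndContent"></span>'
    "</div>"
)

print(cut_content(
    page,
    '<span id="aptureStartContent"></span>',
    '<span id="aptureEndContent"></span>',
))
```

Returning None for a missing marker makes it easy to spot pages where the markers do not apply.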

Step 10
Within the "Fetch Page" module, ensure that "assign" is selected and NOT "emit". Ensure the dropdown says "first" and NOT "all". To the right of that, change the dropdown for "results to" to item.description. This is where the full content replaces the summary from your original RSS feed.
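
The difference between "assign" and "emit" is easier to see in code. This Python sketch reflects my understanding of the two modes: "assign" keeps each item and stores the result on it, while "emit" outputs the results in place of the items. The function names and the fake fetch function are assumptions for illustration, and the sketch stores a plain string where Pipes stores a richer object.

```python
def loop_assign(items, sub_module, target="description"):
    """'assign' with 'first': keep each item and store the first result on it."""
    output = []
    for item in items:
        results = sub_module(item)
        updated = dict(item)
        if results:
            updated[target] = results[0]
        output.append(updated)
    return output


def loop_emit(items, sub_module):
    """'emit' with 'all': drop the original items and output only the results."""
    output = []
    for item in items:
        output.extend(sub_module(item))
    return output


# Hypothetical stand-in for a page fetch: returns a list of results for one item.
def fake_fetch_page(item):
    return ["<p>Full text for " + item["link"] + "</p>"]


feed_items = [{"title": "Story one", "link": "http://example.com/1", "description": "Summary"}]

print(loop_assign(feed_items, fake_fetch_page))
print(loop_emit(feed_items, fake_fetch_page))
```

With "assign", the title and link survive and only the description changes, which is what a feed reader needs. With "emit", the title and link are gone.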

Step 11
Almost there :) This is an optional step. If you are happy with your output, skip it. But you MAY want to strip out links that were inserted into your story, such as ads or reference links. You don't want these on your blog, do you? Within the "Regex" module, add a rule by clicking the plus sign in the module. Select item.description.content in the first box. This is the field that we are editing. Paste the following into the "replace" box and leave the "with" box empty so that each match is deleted:
Code:
<[/\]?[a]\s+[^>]*>
Don't ask how to read regex because that's a whole tutorial on its own.
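
To see what this pattern does without opening Pipes, you can try it in Python. One caveat: Python's re module reads the bracket group [/\]?[a] as a single character class, so the pattern removes opening <a ...> tags but leaves the closing </a> tags behind. The Pipes regex engine may or may not behave the same way. The second pattern is my own alternative, not part of the tutorial, and removes both tags.

```python
import re

# The pattern from the tutorial, unchanged.
ORIGINAL = re.compile(r'<[/\]?[a]\s+[^>]*>')

# Alternative (an assumption, not from the tutorial): also matches closing </a> tags.
BOTH_TAGS = re.compile(r'</?a(?:\s[^>]*)?>', re.IGNORECASE)

html = '<p>Read <a href="http://example.com/x" rel="nofollow">this story</a> now.</p>'

print(ORIGINAL.sub("", html))
print(BOTH_TAGS.sub("", html))
```

In both cases the link text stays in the story; only the tags around it are removed.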

Step 12
Save your Pipe by clicking the Save button at the top right of your window, and give it a name.

Step 13
Let's get the NEW and IMPROVED feed URL. Click "Run Pipe..." at the top of the page. Now click the "Get as RSS" link and you should see your new RSS feed. Copy that URL into your favorite autoblogging plugin and you will now be ripping full news stories instead of excerpts :)
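
For reference, the final output is just your items written back out as RSS XML, with the full story in each description. Here is a minimal Python sketch of that serialization. The items_to_rss name and the sample values are made up, and a real feed would carry more fields than this.

```python
import xml.etree.ElementTree as ET


def items_to_rss(feed_title, feed_link, items):
    """Build a minimal RSS 2.0 document from item dicts and return it as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_title
    for item in items:
        node = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            ET.SubElement(node, field).text = item.get(field, "")
    return ET.tostring(rss, encoding="unicode")


full_items = [{
    "title": "Story one",
    "link": "http://example.com/story-1.html",
    "description": "<p>Full story text.</p>",
}]

print(items_to_rss("Example full feed", "http://example.com/", full_items))
```

The HTML inside the description is escaped automatically when the XML is written, which is how RSS normally carries HTML content.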


Feel free to ask questions and GOOD LUCK !!! If you like this then don't forget the Thanks :p
 
Hey sean very detailed tutorial thanks... I tried everything like you said and I keep getting this Preview failed. I'm very familiar with the basics of pipes.

I followed your instructions to a " T " but I keep getting "preview failed". WTF!?!

Very detailed post though, I appreciate it. Another question: in order to maximize my Pipes knowledge, what language is this? Regex, C#, etc.? Thanks again.
 
Pipes are useful. BTW, if you plan to access a pipe too frequently from one IP (e.g. content scraping for autoblogs every x minutes), you will be banned with error 600 or error 999 for several hours in a row. Random proxies are a must.
 
Thanks for this tutorial. I'll be tweaking this when I get time. I know this is a cool added way to implement an autoblogging site.
 
Well-constructed tutorial. The bits and pieces I learned on my own are all well documented in your points. What I'm wondering is how to get the full feed from any source feed when each one has a different HTML template. You can't define where you should start to cut the content.
 
PM me the address to your pipe and I will look at it and report back.
 
Here's my Yahoo pipe for this example.
Code:
http://pipes.yahoo.com/pipes/pipe.edit?_id=d69ccca47873d5caf25c952a1f36e1ba

Take a look and make sure yours looks the same.
 
sean815 - Awesome, man. I've had many mixed results with YP. A lot of feeds are different and need to be tailored. This tutorial helps out immensely. Thanks for helping us all out.

Note to all - There is no magic pipe that works for every RSS feed; as I said, a lot of feeds need to be tailored.
 
Hi sean815. I've been following your instructions. As you've mentioned, the markers or characteristics differ from feed to feed. In your example you used <span id="aptureStartContent"></span>. Could you post the most common characteristics here?

As far as I've tested this tutorial with other feeds, I don't see this marker in the source. And it didn't even work with divs.

It will be a good help for everyone here. Thanks and Good tainted!
 
I think maybe (just guessing here) you should look for something common in each story above the H1 tag and at the bottom of the article. Each site is going to try to stuff different things into each article (such as links, ads, etc.).
 
Query: Is there any way to extract the images from the source RSS feed?

I have followed this tutorial and got the results (excluding images).
Every site uses its own way to represent the data in the background, so there is some work to be done there. Great tutorial... BHW rocks
 
macpaulos is completely correct. That span tag was an example for this particular feed (Washington Post Health). For your feed, you need to locate a unique identifier. I'd say that 60 - 70% of feeds should have one.

Post your feed url and I can locate one for you if it exists.
 
Thanks for the great tutorial! I'm working on twisting it a little bit, but I'm having trouble getting the "assign first results" to work.
When I do this, I can see the content I'm going for (along with a few lines I'll need to regex out) but when I view the feed, it's just the title of the article and it links back to the original posting of it. What should I do?

Thanks!
 
Nice, very detailed tutorial. I have had some success in the past with Pipes. If you haven't used them before, I would recommend giving them a try. Following this guide should get you where you want to go with Pipes.

cheers
 
Maybe too many people are testing this pipe these days. Here is the result:

This Pipe ran successfully but encountered some problems:
warning Can't fetch pages that robots.txt disallow

But this is a good tutorial! Thanks again.
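
On the robots.txt warning: a site's robots.txt file lists paths that crawlers are asked to stay away from, and the warning suggests the fetch was refused for that reason. As a rough illustration, Python's standard library can run the same kind of check. The rules and URLs below are made up.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules for illustration.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

allowed = parser.can_fetch("*", "http://example.com/news/story.html")
blocked = parser.can_fetch("*", "http://example.com/private/story.html")

print(allowed)
print(blocked)
```

If the story pages of a feed fall under a disallowed path, that feed is not a good candidate for this technique.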
 
Thank you, this is by far the best information I've seen on the YA in BHW.
 
I'm using AutoBlogged. Every time I try to pull a feed from a Yahoo Pipes RSS, the article doesn't get posted, and I don't know why.
 
Killer post, and thanks a bunch for sharing this. I'm going to print out the steps until I get more used to doing it, but the amount of content this has potential for is off the charts.
 
Thanks for the prompt reply. Here's an example feed I'm trying to fetch:
HTML:
hxxp://www.goarticles.com/feeds/Insurance/popular.rss

and here's the full text

HTML:
hxxp://www.goarticles.com/cgi-bin/showa.cgi?C=2418911

Hope you could help me identify the unique characteristic of this type of feed. Thanks in advance.
 