Discussion in 'White Hat SEO' started by undeterminederror, Oct 10, 2008.
...something interesting: hxxp://labs.echoditto.com/fulltextrss
What do you think about it?
Well, I like it. More content from RSS feeds to scrape and rewrite.
I'd like to see the code for it. Pulling in full-text RSS content would make it easier to rewrite on the fly and pass the result to a program like RSS2Blog for posting. I'm sure that is one reason he's not open-sourcing the code.
How would you rewrite it on the fly? I need more ideas. I may outsource this...
What I did was grab a copy of RSS Magician from somewhere on this forum and start with that. Granted, the RSS Magician script no longer works as-is for scraping article directories, but its scripts are not encoded, so if you know a bit of PHP you can modify them to work again. The RSS scraping still works, but it doesn't pull full-text RSS unless the feed itself is full text. I think the RSS feed parser still works because feeds are built to a standard format. Article directory webmasters don't all use a standard page structure, and that structure can change, which makes the scraping obsolete.
After I got the script scraping the article directories again, it pulls whole articles from the sites and rewrites them based on a dictionary file that you also upload to your server. The good thing about the RSS Magician script is that it isn't encoded, so you're able to add new sources to scrape. You just need to know a bit of PHP, or at least be able to adapt the examples already in there to work with new article directories.
Then I use the rewritten content from RSS Magician to post to blogs via RSS2Blog.
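The point about feeds being parseable because they follow a standard can be illustrated with a minimal sketch. It assumes RSS 2.0 (Atom feeds, such as Blogger's default feeds, use different element names and an XML namespace) and uses only Python's standard-library `xml.etree.ElementTree`; the function name `parse_rss_items` and the sample feed are hypothetical. Because every RSS 2.0 item carries `title`, `link`, and `description` in the same places, one parser handles any such feed, which is not true of arbitrary article pages.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return (title, link, description) tuples for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    # RSS 2.0 nests items as <rss><channel><item>...</item></channel></rss>,
    # so the same element names work for every conforming feed.
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        ))
    return items


# Hypothetical sample feed used for illustration.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><link>http://example.com/1</link><description>Summary one</description></item>
<item><title>Second</title><link>http://example.com/2</link><description>Summary two</description></item>
</channel></rss>"""
```

Calling `parse_rss_items(SAMPLE)` yields one tuple per item; note that `description` is only a summary unless the publisher ships full-text feeds.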
What I think you'd need to get full-text RSS from feeds that aren't full text is a script that fetches the feed first, then finds the URL of each RSS entry, goes to that page, and scrapes it. The problem is that the pages referenced in an RSS feed don't all share the same structure, so the scraper has to be adapted to each site's HTML.
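The fetch-the-feed, follow-each-link approach might look roughly like the Python sketch below. It assumes you know, for each site, which element `id` wraps the article body; the `CONTENT_IDS` table, the `article-body` id, and the function names are all hypothetical, and pages are assumed to be UTF-8. The per-site table is where the maintenance burden described above lands: whenever a site changes its layout, its rule has to change too. The parser is deliberately crude and will lose track of nesting on badly formed HTML (for example, unclosed `<p>` tags).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlopen

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Hypothetical per-site rules: the id of the element wrapping the article body.
CONTENT_IDS = {"example.com": "article-body"}


class TextInElement(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while inside the target element
        self.done = False    # True once the target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_article(html, url):
    """Return the article text for url, or None if no rule exists for its host."""
    target_id = CONTENT_IDS.get(urlparse(url).hostname or "")
    if target_id is None:
        return None
    parser = TextInElement(target_id)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup.
    return " ".join(" ".join(parser.chunks).split())


def fetch_full_text(url, timeout=10):
    """Download the page an RSS entry links to and extract its article text."""
    with urlopen(url, timeout=timeout) as resp:
        # Assumes UTF-8; real pages may declare other charsets.
        html = resp.read().decode("utf-8", errors="replace")
    return extract_article(html, url)
```

In use, you would loop over the `link` of each feed entry and call `fetch_full_text(link)`, skipping entries that return `None` because no rule exists for that site yet.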
Another good thing about RSS Magician, besides being easy to change (unencoded), is that it appends random text to your title/description in the RSS feed. The text comes from a file you upload to your server containing multiple entries separated by a token (!split!), which makes it ideal for spun content. I'm sure it wouldn't be hard to change the script, or add functionality to it, so that it prepends the text to your RSS title/description instead of appending it to the end.
So a good starting point for a custom PHP script that scrapes and rewrites RSS on the fly would be the RSS Magician script found on this forum. Another feature of the script is that it caches feeds for a set amount of time. Let's say you have it build a feed by pulling 10 articles from an article site; that feed is then cached. Now let's say you use RSS2Blog to pull only 1 of those 10 entries at random. By the time the cache expires, you probably won't have pulled the same entry often, and even if you do, RSS2Blog lets you add other content alongside the feed content so the post isn't a complete duplicate.
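The time-based caching described above amounts to a simple time-to-live (TTL) cache. Here is a minimal Python sketch of the idea, not the RSS Magician implementation (which is PHP and not shown in this thread); the class name `FeedCache`, the `build` callback, and the injectable clock are assumptions made for illustration and testing.

```python
import time


class FeedCache:
    """Keep each built feed for ttl_seconds before rebuilding it."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so tests can control time
        self._store = {}        # key -> (built_at, value)

    def get(self, key, build):
        """Return the cached value for key, calling build() only if missing or expired."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = build()
        self._store[key] = (now, value)
        return value
```

For example, `cache.get(feed_url, lambda: build_feed(feed_url))` would return the same feed until the TTL expires, where `build_feed` is a hypothetical function that scrapes and assembles the feed.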
OK, thanks for these tips. Do you know how I can pull feeds into free hosted blogs that won't let me install plugins? I know it's a stupid question, but I'll press the "Thanks" button.
If the blog runs on PHP, one way would be to use PHP to pull the feed in and display it. Example code for WordPress can be found here:
The only problem I see is that you might have to include it in the blog template, so the content may not vary enough from page to page.
The other way would be to post a feed's content with a program built for that, such as RSS2Blog, Content Solution, and similar tools.
If I find a different way, I'll post it here; or if anyone else wants to chime in, please do so.
It doesn't work for many feeds.
But if you want full-text RSS, I know of two:
http://treatmentnews.blogspot.com/feeds/posts/default (Cancer Treatment News, generally unique)
http://finance.varolmak.com/feeds/posts/default (Daily finance and economy news, generally unique)
OK, I'm working on a service to provide full-text feeds right now.
But is there any plugin to auto-comment on my own articles from the feeds I provide?