
Hey, questions about grabbing rss feeds :)

Discussion in 'Blogging' started by big shoes 8, Jun 25, 2009.

  1. big shoes 8

    big shoes 8 Registered Member

    Joined:
    Apr 18, 2009
    Messages:
    68
    Likes Received:
    9
    Home Page:
    Hey, well I have a site that's using a WordPress blog. I want to grab other people's posts through their RSS feeds, like an autoblog. Could anyone suggest a plugin? I have one at the moment, but I want to know what you guys think.

    I'm also wondering: one of the websites I'm trying to grab posts from only displays part of his posts in his RSS feed. Is there any way to get the full post?

    Thanks
     
  2. mazgalici

    mazgalici Supreme Member

    Joined:
    Jan 2, 2009
    Messages:
    1,489
    Likes Received:
    881
    Home Page:
  3. Rick4691

    Rick4691 Registered Member Premium Member

    Joined:
    Feb 19, 2008
    Messages:
    70
    Likes Received:
    30
    Occupation:
    Programmer
    Location:
    Oceania
    I personally like WP-O-Matic.

    Do a search here for "autoblog" and you will find a plethora of information.

    As for partial RSS feeds, you can write a bot that grabs the HTML rather than the RSS, and parses it. Then you can have said bot run periodically under cron and feed the parsed results to WP.

    Or ... there is someone on one of the Autoblog threads who is selling a plug-in that does the same thing. The name of the plugin might be "autoblogger", but you will need to confirm that.
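
    The fetch-and-parse idea above can be sketched in a few lines of shell. This is a hypothetical example --- in a real bot the sample HTML below would be replaced by a `curl` of the target blog, and the grep/cut patterns would need tuning per site:

```shell
#!/bin/sh
# Minimal fetch-and-parse sketch. A real bot would start with something like:
#   curl -s http://example.com/blog/ | grep -o 'href="[^"]*"' | cut -d'"' -f2
# Here sample HTML stands in for the curl output so the pipeline runs offline.
printf '%s\n' '<p><a href="http://example.com/post-1">Post one</a></p>' \
  | grep -o 'href="[^"]*"' \
  | cut -d'"' -f2
```

    Each line prints one extracted URL; from there you'd fetch each page and push the parsed content into WP.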
     
  4. big shoes 8

    big shoes 8 Registered Member

    Joined:
    Apr 18, 2009
    Messages:
    68
    Likes Received:
    9
    Home Page:
    ok, thanks a lot to both of you :)
     
  5. sandycat

    sandycat Junior Member

    Joined:
    May 20, 2009
    Messages:
    189
    Likes Received:
    66
    that would be a sickly swell script...
     
  6. big shoes 8

    big shoes 8 Registered Member

    Joined:
    Apr 18, 2009
    Messages:
    68
    Likes Received:
    9
    Home Page:
    Would it? Do you think i could make the money back?
     
  7. RifqiAF

    RifqiAF Junior Member

    Joined:
    Feb 21, 2009
    Messages:
    158
    Likes Received:
    29
    It would be really cool if a plugin like that really existed... I'm using Autoblogged and FeedWordPress, but as far as I know they can't scrape full content...
     
  8. iglow

    iglow Elite Member

    Joined:
    Feb 20, 2009
    Messages:
    2,081
    Likes Received:
    856
    Home Page:
    FeedWordPress will be easiest for you to start with
     
  9. big shoes 8

    big shoes 8 Registered Member

    Joined:
    Apr 18, 2009
    Messages:
    68
    Likes Received:
    9
    Home Page:
    OK, well I have FeedWordPress and BlogSlammer, so as soon as the guy who sold me BlogSlammer answers a few of my questions I'll be good to go and start my first autoblog!
     
  10. Rick4691

    Rick4691 Registered Member Premium Member

    Joined:
    Feb 19, 2008
    Messages:
    70
    Likes Received:
    30
    Occupation:
    Programmer
    Location:
    Oceania
    Here's what I have so far:

    Code:
    #!/bin/sh
    # This is a pretty evil script that's meant for 
    # illustrative purposes only.
    # Note that the input feed is hard-coded, and the 
    # parsing conditions will 
    # only work for that feed in particular.
    # 
    # No proxies are being used, but curl and wget do 
    # have proxy capabilities.
    # If a variation of this script is being heavily used, you
    # should probably take advantage of those capabilities so as
    # to avoid detection --- the script could set off alarm bells
    # for an attentive webmaster.
    #
    
    # Get the URLs from the feed and put them in a 
    # file called urls.lst --- 
    # these are the URLs of the full articles.
    curl http://feeds2.feedburner.com/eldis-manuals?format=html \
     2>/dev/null | \
    grep 'a href' | cut -f 2 -d \" | sort -u | \
    grep feedproxy > urls.lst
    
    # Make sure we only process new URLs --- already
    # processed URLs should be listed in master_urls.lst;
    # new ones will go to new_urls.lst
    touch master_urls.lst  # so comm works on the first run
    comm -13 master_urls.lst urls.lst > new_urls.lst
    
    ##############################################################
    ##############################################################
    ##############################################################
    
    # Use wget to scrape the content from each of the new 
    # urls from the feed
    #
    # The wget -i option causes wget to iterate through all 
    # the URLs listed in the
    # given file, here new_urls.lst
    #
    # The --output-document=scraped_contents.txt option 
    # sends all the contents of 
    # each of the URLs to a file called scraped_contents.txt 
    # --- you can 
    # call it something different if you want ...
    #
    # The --random-wait option causes wget to wait a 
    # randomly chosen period of time
    # before trying to get its next file. This may help 
    # keep your 
    # scraping undetected.
    wget -i new_urls.lst --output-document=scraped_contents.txt\
     --random-wait
    
    ##############################################################
    ##############################################################
    ##############################################################
    
    # Get the Title of each new posting
    # TODO - probably with a call to a Perl script that uses 
    #        HTML::TreeBuilder
    
    # Get the full text of each new posting.
    # TODO - probably with a call to a Perl script that uses 
    #        HTML::TreeBuilder
    
    # Insert into your WP database
    # TODO - quickie MySQL connection, insert appropriate 
    #        data to appropriate WP table(s), MySQL disconnect
    
    
    ##############################################################
    ##############################################################
    ##############################################################
    # Remember to append the new_urls.lst to the 
    # master_urls.lst for next time
    cat new_urls.lst >> master_urls.lst
    
    # For the comm command to work, the contents of both 
    # files involved must be
    # sorted and have no duplicate entries
    sort -u master_urls.lst > temp_urls.lst
    mv temp_urls.lst master_urls.lst
    
    exit 0
    
    I think I went down the wrong track yesterday trying out HTML::Parser. Reading Hack #19 in "Spidering Hacks", it looks like HTML::TreeBuilder would be a better package for this sort of thing. Easier to deal with.
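
    The title-extraction TODO in the script could also be roughed out in shell rather than Perl. This is a deliberately fragile sketch --- it assumes each page has a single one-line <title> element (sample HTML stands in for scraped_contents.txt here), which is exactly why HTML::TreeBuilder is the more robust route:

```shell
#!/bin/sh
# Fragile sketch: pull the <title> out of scraped HTML with grep/sed.
# Assumes one <title> per page, on a single line; a real parser like
# HTML::TreeBuilder handles the general case.
printf '%s\n' '<html><head><title>Example Post</title></head></html>' \
  | grep -o '<title>[^<]*</title>' \
  | sed 's/<[^>]*>//g'
```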

    The unfortunate thing here is that each feed would require some custom coding. But these are things you could use dependency injection for, putting them into some sort of feeds.config file --- the feeds shouldn't really be hard-coded like I did here ...

    Also, once this sort of script is done, you just put it in cron and have it run every couple of days or so, and you're on autopilot.

    BTW, that's not my feed. :)
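
    For the cron part, a crontab entry along these lines would run the script every couple of days. The paths here are hypothetical --- point them at wherever you save the script and want the log:

```shell
# Hypothetical crontab entry: run the scraper at 3:15 AM every 2nd day.
# Install with `crontab -e`; adjust the paths to your own setup.
15 3 */2 * * /home/user/bin/grab_feed.sh >> /home/user/logs/grab_feed.log 2>&1
```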
     
    • Thanks Thanks x 1
  11. dmhf123456

    dmhf123456 Registered Member

    Joined:
    Mar 6, 2009
    Messages:
    50
    Likes Received:
    3
    Use either autoblogged or wp-o-matic plugins to get content from feeds. If you want full articles, then look at blogspot feeds - they usually have full article RSS feeds.
     
  12. najiz

    najiz Newbie

    Joined:
    Nov 23, 2008
    Messages:
    21
    Likes Received:
    1
    I always get the full article from a partial RSS feed using Yahoo Pipes and combine the output from Pipes with FeedBurner. It works with both WP-o-Matic and FeedWordPress.
    I found the pipes in this forum, you may search for them... and edit the source RSS based on what you need.
     
  13. adbox

    adbox Power Member

    Joined:
    May 1, 2009
    Messages:
    658
    Likes Received:
    107
    Home Page:
    Same as najiz,
    use Yahoo Pipes to customize your RSS. There is a learning curve, but it's not too difficult.
     
  14. tshin810

    tshin810 Junior Member

    Joined:
    Jun 16, 2009
    Messages:
    163
    Likes Received:
    668
    I am using Autoblogged; it uses too much CPU (around 60~80%), and I got a warning from the hosting engineer =P
    Anyway, Autoblogged can grab full content by changing %excerpt% to %content%
     
  15. adbox

    adbox Power Member

    Joined:
    May 1, 2009
    Messages:
    658
    Likes Received:
    107
    Home Page:

    what do you mean?
     
  16. tshin810

    tshin810 Junior Member

    Joined:
    Jun 16, 2009
    Messages:
    163
    Likes Received:
    668
    By default, Autoblogged feeds the excerpt only. If you want the full content, you have to change the code inside Feed Settings -> Advanced Settings -> Post Templates.
    Find %excerpt% and change it to %content%
     
  17. stinky_boy

    stinky_boy Junior Member

    Joined:
    Nov 21, 2008
    Messages:
    128
    Likes Received:
    35
    Occupation:
    head games
    Location:
    money street
    Don't use Caffeinated Content unless you want your posts to look like someone wrote them on crack. It used to be a great plugin, but not any more.
     
  18. teguh123

    teguh123 BANNED BANNED Premium Member

    Joined:
    Sep 23, 2008
    Messages:
    703
    Likes Received:
    105

    Can we rewrite the posts using Yahoo Pipes?

    Translate to Chinese and retranslate to English, that sort of thing?
     
  19. insider

    insider Regular Member

    Joined:
    Jul 5, 2009
    Messages:
    344
    Likes Received:
    134
    Location:
    Europe
    Yes, there is a module for that