[Method] Feed43 + IFTTT - New-Age Image Scraping

    I couldn't think of a better place for this. The closest fit is Money Making, since it's about scraping content for profit...

    So, over the past few days I've been tinkering with Feed43, IFTTT and my blog, trying to concoct a recipe that would essentially give me an image scraper I don't have to touch: a Fire and Forget type of thing. With what I've learned, it only seemed right to share some of it with you. To get moving on something like this, I'm going to assume you have all of the following (please forgive the crappy formatting; I'm not allowed to post URLs yet):

    • A Feed43 account
    • An IFTTT account
    • A working WordPress blog (I use this, but technically this could be done with any blogging software)

    A challenge many bloggers face daily is that there is a LOT of good content already on the internet, and the usual process of sourcing images is laborious and exhausting. Scraping images onto your computer and then having a program upload them to your blog (or, even worse, doing it manually :S) can be the final straw for some people.

    Feed43 is an RSS feed creator; we'll use it to generate a custom feed from a website. Please note that this is the hardest step, and getting the right information out of what you're looking at can be a pain. But it's worth it! IFTTT ("If This Then That") is a trigger/action service: it keeps checking a trigger (here, "new item in an RSS feed") and, when the trigger fires, runs an action (here, "create a WordPress post"). Think of it as a customisable Cron Job.
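IFTTT's feed trigger only fires for items it hasn't handled before, which is what makes the setup fire-and-forget. The sketch below illustrates that "if new item, then act" idea. It is not IFTTT's actual code, and it assumes each item can be identified by its link. `run_trigger` is a hypothetical name.

```python
from typing import Callable, Iterable


def run_trigger(items: Iterable[dict], seen: set, action: Callable[[dict], None]) -> int:
    """Run `action` once for each item whose link hasn't been seen yet.

    `seen` is updated in place, so calling this again on the next poll
    skips everything that was already handled. Returns how many items fired.
    """
    fired = 0
    for item in items:
        key = item["link"]  # assumption: the link uniquely identifies an item
        if key in seen:
            continue  # IF the item is old, do nothing
        seen.add(key)
        action(item)  # THEN do that, e.g. create a WordPress post
        fired += 1
    return fired


# Two polls of the same feed: the second poll only fires for the new item.
seen_links: set = set()
posted: list = []
run_trigger([{"title": "A", "link": "http://example.com/a"}], seen_links, posted.append)
run_trigger(
    [{"title": "A", "link": "http://example.com/a"},
     {"title": "B", "link": "http://example.com/b"}],
    seen_links,
    posted.append,
)
print([p["title"] for p in posted])  # ['A', 'B']
```

A real service also has to remember `seen` between runs (on disk or in a database). That persistence is left out here.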

    First things first: head to Feed43 and click "Create your own feed".

    STEP 1:

    In the first input, enter the address (URL) of the website you want to create your RSS feed from; for me, it was a pinboard website. Click "Reload".

    This will give you the HTML source of the page. You'll probably get an error saying the page is too big (or something along those lines). Just ignore it: unless you're pulling from a huge website with a massive header, you'll still get enough RSS-esque content from the top part of the page.


    STEP 2:

    This is the tricky bit, simply because the syntax is weird. Basically, you filter the HTML using Feed43's macros: {%} marks a piece of data you want to capture, and {*} matches any text you want to skip. For me, I needed the following filter:

    <div class="small_pin_box_user_info">{*}</div>{*}<a href="{%}" {*}>{*}<img src="{%}{*}192/{%}" {*} alt="{%}" {*} />{*}
    I suggest you take plenty of time here to pull the bits you want out of the HTML. Note, however, that you'll need THREE dynamic pieces of data for each "news" item in your RSS: a title, a link and content. Keep changing your filter until it's where you want it (you can check your progress by hitting "Extract" and looking at the results).
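To get a feel for what a pattern like the one above captures before you paste it into Feed43, you can approximate it with a regular expression. In this sketch, each {%} becomes a capture group and each {*} becomes a skipped wildcard. It mimics the idea only, not Feed43's own matching engine, and the sample HTML and the helper names `feed43_to_regex` and `extract` are hypothetical.

```python
import re


def feed43_to_regex(pattern: str) -> re.Pattern:
    """Approximate a Feed43-style search pattern with a regular expression.

    {%} becomes a lazy capture group, {*} a lazy wildcard that is skipped,
    and everything else has to match literally.
    """
    regex = []
    for part in re.split(r"(\{%\}|\{\*\})", pattern):
        if part == "{%}":
            regex.append("(.*?)")
        elif part == "{*}":
            regex.append(".*?")
        else:
            regex.append(re.escape(part))
    return re.compile("".join(regex), re.DOTALL)


def extract(pattern: str, html: str) -> list:
    """Return one tuple of captured {%} values per repeated item."""
    return [m.groups() for m in feed43_to_regex(pattern).finditer(html)]


html = (
    '<div class="pin"><a href="/p/1"><img src="http://img.example/1.jpg" alt="First"></a></div>'
    '<div class="pin"><a href="/p/2"><img src="http://img.example/2.jpg" alt="Second"></a></div>'
)
# Three {%} per item: the link, the image and the title.
pattern = '<div class="pin"><a href="{%}">{*}<img src="{%}" alt="{%}">'
for link, image, title in extract(pattern, html):
    print(title, link, image)
```

Because the wildcards are lazy, a pattern that *ends* in {%} would capture an empty string here. Ending your pattern on some literal text (like `">` above) keeps each capture bounded.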

    STEP 3:

    Output format!

    Your feed title, link and description aren't really important; they're mostly for identification.

    Under RSS item properties do this:

    In Item Title put {%N}, where N is the number of the captured value you want as the title of your items (these numbers are shown in the clipped data section for reference).

    Fill out the rest of the form. As a tip, if you're creating an RSS feed of images (yeah, most of us are...), set the item content to:

    <img src="{%N}" />
    Again, N is the number of the clipped value that holds the image URL.
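The output templates work by substitution: each {%N} is replaced with the Nth clipped value, and the results become the title, link and content of an RSS item. The sketch below reproduces that step with the standard library so you can see what one item ends up looking like. The helper names and sample values are hypothetical. The RSS element names (`item`, `title`, `link`, `description`) come from RSS 2.0.

```python
import re
import xml.etree.ElementTree as ET


def fill(template: str, captures: tuple) -> str:
    """Swap {%1}, {%2}, ... for the matching clipped value (numbered from 1)."""
    return re.sub(r"\{%(\d+)\}", lambda m: captures[int(m.group(1)) - 1], template)


def build_item(captures: tuple, title_tpl: str, link_tpl: str, content_tpl: str) -> ET.Element:
    """Build one RSS <item> from a tuple of clipped values and three templates."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = fill(title_tpl, captures)
    ET.SubElement(item, "link").text = fill(link_tpl, captures)
    # ElementTree escapes < and > when serialising, which is how an HTML
    # <img> tag normally travels inside an RSS <description>.
    ET.SubElement(item, "description").text = fill(content_tpl, captures)
    return item


clipped = ("/p/1", "http://img.example/1.jpg", "First")
item = build_item(clipped, "{%3}", "http://pins.example{%1}", '<img src="{%2}" />')
print(ET.tostring(item, encoding="unicode"))
```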

    Once you've finished all that, hit Preview. This saves your RSS feed, and you'll get a link to it below the preview box. That's all there is to Feed43. Now, on to IFTTT.
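Before wiring the feed into IFTTT, it can help to check that the link actually returns the items you expect. Here is a minimal sketch using only the Python standard library. The function names are hypothetical, and `fetch_rss` needs the real feed URL Feed43 gave you, so the example below parses an inline sample instead.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Union


def parse_rss(xml_data: Union[str, bytes]) -> List[dict]:
    """Pull title, link and content out of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # findtext returns unescaped text, so this is real HTML again
            "content": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def fetch_rss(feed_url: str) -> List[dict]:
    """Download a feed (e.g. the link Feed43 gives you) and parse it."""
    with urllib.request.urlopen(feed_url, timeout=30) as resp:
        return parse_rss(resp.read())


sample = (
    '<rss version="2.0"><channel><title>My pins</title>'
    "<item><title>First</title><link>http://pins.example/p/1</link>"
    '<description>&lt;img src="http://img.example/1.jpg" /&gt;</description></item>'
    "</channel></rss>"
)
for entry in parse_rss(sample):
    print(entry["title"], entry["content"])
```

If every item comes back with an empty title or content, the {%N} numbers in your output templates probably point at the wrong clipped values.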


    Log into IFTTT and "Activate" your RSS and WordPress channels through the Channels section at the top of the page. For WordPress, give it the admin username and password you use to log in to wp-admin.

    Once you've done that, go to Browse at the top of the page and search for "Rss feed to wordpress". Use the top entry (made by a user named tavito).

    Click on it to open the customization page for that recipe. There are a few things to do here. In the Feed URL input, put the link to your Feed43 RSS feed (or run it through FeedBurner first and use that link, if you prefer).

    The WordPress settings are really up to you. I put {{EntryTitle}} into Post Title and, because I'm only using this for images, {{EntryContent}} into Content. That arrives as an img tag because of how we built the RSS.
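Those {{EntryTitle}} / {{EntryContent}} placeholders are "ingredients" that IFTTT fills in from each feed item. The sketch below imitates that substitution so you can preview what a post body will look like. It is an illustration, not IFTTT's implementation, and the `render` helper is hypothetical. Unknown placeholders are deliberately left as-is so a typo stays visible.

```python
import re


def render(template: str, ingredients: dict) -> str:
    """Replace {{Name}} placeholders with values from `ingredients`.

    Names that aren't in `ingredients` are left untouched.
    """
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: ingredients.get(m.group(1), m.group(0)),
        template,
    )


entry = {"EntryTitle": "First", "EntryContent": '<img src="http://img.example/1.jpg" />'}
print(render("{{EntryTitle}}", entry))
print(render("{{EntryContent}}", entry))
```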

    Add any categories and tags you want, and choose whether the posts from this RSS feed are published, saved as drafts or made private. I save them as drafts because I use Draft Scheduler on my blog, but posting immediately isn't bad either.
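For the curious, a WordPress post can be created remotely through the XML-RPC method `wp.newPost`, served at `/xmlrpc.php`. The sketch below shows roughly what "create a draft with these tags" amounts to, assuming XML-RPC is enabled on your blog. It is not how the IFTTT recipe is guaranteed to work internally, and the site URL, credentials and helper names are placeholders. Only `make_post` runs below. `send_post` would contact a real blog.

```python
import xmlrpc.client

ALLOWED_STATUSES = ("publish", "draft", "private")  # the three choices in the recipe


def make_post(title, content, status="draft", categories=(), tags=()):
    """Build the content struct that wp.newPost expects."""
    if status not in ALLOWED_STATUSES:
        raise ValueError("status must be one of %s" % (ALLOWED_STATUSES,))
    post = {
        "post_type": "post",
        "post_status": status,
        "post_title": title,
        "post_content": content,
    }
    terms = {}
    if categories:
        terms["category"] = list(categories)
    if tags:
        terms["post_tag"] = list(tags)
    if terms:
        post["terms_names"] = terms  # taxonomy name -> list of term names
    return post


def send_post(site_url, username, password, post):
    """Create the post over XML-RPC; WordPress returns the new post's ID."""
    server = xmlrpc.client.ServerProxy(site_url.rstrip("/") + "/xmlrpc.php")
    blog_id = 1  # a standard single-site install only has one blog
    return server.wp.newPost(blog_id, username, password, post)


draft = make_post("First", '<img src="http://img.example/1.jpg" />', tags=["pins"])
print(draft["post_status"], draft["terms_names"])
# send_post("https://yourblog.example", "admin", "your-password", draft)
```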

    That's pretty much it! The possibilities are almost limitless; I've used it to send Pinterest boards straight through to my WordPress blog.

    The most important part is the IFTTT setup: when you fill in the content, you can change it on its way to WordPress, for example by adding your affiliate link or ad code.
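In IFTTT you'd do this by typing extra HTML around {{EntryContent}} in the Content field. The sketch below shows the same transformation in code: it appends an affiliate link under the scraped image before the post goes out. The affiliate URL, the label and the helper name are placeholders. `rel="sponsored nofollow"` is one common way to mark paid links, not something the recipe requires.

```python
import html


def add_affiliate(content: str, affiliate_url: str, label: str = "Get it here") -> str:
    """Append an affiliate link below the scraped content before it is posted."""
    link = '<p><a href="{}" rel="sponsored nofollow">{}</a></p>'.format(
        html.escape(affiliate_url, quote=True),  # e.g. & becomes &amp; inside the attribute
        html.escape(label),
    )
    return content + "\n" + link


print(add_affiliate('<img src="http://img.example/1.jpg" />', "https://shop.example/item?ref=YOURID"))
```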

    Wow, this came at the right moment for me. I was looking for a way to scrape images and send them to my WordPress blog. Thanks for your method. I see myself using this in the very near future.