
Scraping Kickstarter - Looking for Custom Script or Tutorial

Discussion in 'Black Hat SEO Tools' started by luketr, May 17, 2017.

  1. luketr

    luketr Newbie

    Joined:
    Mar 1, 2015
    Messages:
    6
    Likes Received:
    0
    I'm looking for either guidance on creating a custom scrape bot for Kickstarter, or offers to create one for me.

    Preferably, I would like something I can run locally from time to time (perhaps a .py script) that exports to a .csv.

    The scrape will start from a given URL (sorry, can't post them yet) that contains a string of 2 subfolders, which need to be inserted into the URL for each category/sub-category combination (see attached photo).

    Here {insert_directory} stands for one entry from a list of URL path segments defining the project's category and sub-category, such as "technology/software".
    Full list of required "directories" will be provided.
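
    The URL-building step above could be sketched roughly like this. The start URL was not posted, so `URL_TEMPLATE` below is a made-up placeholder on example.com, and `build_category_urls` is just an illustrative name; only the `{insert_directory}` slot and the "technology/software" example come from the post.

```python
from urllib.parse import quote

# Hypothetical placeholder: the real start URL was not posted in the thread.
URL_TEMPLATE = "https://example.com/{insert_directory}"


def build_category_urls(directories, template=URL_TEMPLATE):
    """Return one URL per "category/sub-category" directory string."""
    urls = []
    for directory in directories:
        # Trim stray whitespace and slashes, keep the slash between
        # category and sub-category, and percent-encode anything else.
        cleaned = directory.strip().strip("/")
        urls.append(template.format(insert_directory=quote(cleaned, safe="/")))
    return urls
```

    Calling `build_category_urls(["technology/software"])` would give a one-element list with the directory dropped into the template; swapping in the real template is the only change needed once the URL is known.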

    On each page resulting from a "directory link", cycle through and visit each link contained within the following class:
    HTML:
    <div class="project-profile-title text-truncate-xs">
        <a target="" href="/projects/johnonolan/ghost-just-a-blogging-platform?ref=category_most_funded">
            Ghost: Just a Blogging Platform
        </a>
    </div>
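
    A minimal sketch of the link-collection step, using only the Python standard library. It assumes the listing HTML has already been downloaded and that the links are present in the raw HTML (if the page builds them with JavaScript, a rendering step would be needed first). `ProjectLinkParser` and `extract_project_links` are illustrative names, not part of any existing tool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ProjectLinkParser(HTMLParser):
    """Collect href values of <a> tags nested in project-profile-title divs."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._div_depth = 0  # > 0 while inside a matching div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Start counting at a matching div; keep counting nested divs.
            if self._div_depth or "project-profile-title" in classes:
                self._div_depth += 1
        elif tag == "a" and self._div_depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1


def extract_project_links(html, base_url):
    """Return absolute project URLs found in the listing page HTML."""
    parser = ProjectLinkParser()
    parser.feed(html)
    # The hrefs are site-relative, so resolve them against the site root.
    return [urljoin(base_url, href) for href in parser.links]
```

    Each returned URL is what would go into the `url` column for that project.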
    
    On each project-profile page, I would like to scrape the following into the .csv, one column per heading:
    • url
    • project
    • goal
    • total_raised
    • total_backers
    • reward#_amount
    • reward#_backers
    url is the link followed from the "project-profile-title" class above. The rest of the data is found on the project-profile page as follows:
    HTML:
    <div class="NS_project_profile__title">
        <h2>{project}</h2>
    </div>
    
    HTML:
    <div class="type-12 medium navy-500">
        pledged of <span class="money">${goal}</span> goal
    </div>
    
    HTML:
    <div class="mb3">
        <h3 class="mb0">
            <span class="money">${total_raised}</span>
        </h3>
    </div>
    
    HTML:
    <div class="mb0">
        <h3 class="mb0">
            {total_backers}
        </h3>
    </div>
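
    The four project-level fields above could be pulled out along these lines. This is a sketch under the assumption that the class names in the snippets are stable and appear in the downloaded HTML; the helper names (`TextWithPath`, `find_text`, `to_number`, `parse_project`) are made up for illustration, and matching is descendant-style, so a path step only needs to be somewhere among a text node's ancestors.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextWithPath(HTMLParser):
    """Record each text node together with its chain of open elements."""

    def __init__(self):
        super().__init__()
        self.stack = []  # [(tag, {classes}), ...] for currently open elements
        self.nodes = []  # [(stack snapshot, text), ...]

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:  # void tags never get a closing tag
            classes = set((dict(attrs).get("class") or "").split())
            self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerate sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.nodes.append((list(self.stack), data.strip()))


def find_text(nodes, *path):
    """Return the first text whose ancestors match every (tag, class) step in order."""
    for stack, text in nodes:
        pos = 0
        for tag, classes in stack:
            if pos < len(path):
                want_tag, want_class = path[pos]
                if tag == want_tag and (want_class is None or want_class in classes):
                    pos += 1
        if pos == len(path):
            return text
    return None


def to_number(text, cast=float):
    """Turn '$1,234.50' or '1,234' into a number; None if there are no digits."""
    digits = re.sub(r"[^\d.]", "", text or "")
    return cast(digits) if digits else None


def parse_project(html, url):
    """Build the url/project/goal/total_raised/total_backers part of a row."""
    parser = TextWithPath()
    parser.feed(html)
    n = parser.nodes
    return {
        "url": url,
        "project": find_text(n, ("div", "NS_project_profile__title"), ("h2", None)),
        "goal": to_number(find_text(n, ("div", "navy-500"), ("span", "money"))),
        "total_raised": to_number(
            find_text(n, ("div", "mb3"), ("h3", "mb0"), ("span", "money"))),
        "total_backers": to_number(find_text(n, ("div", "mb0"), ("h3", "mb0")), int),
    }
```

    Any field whose markup is missing comes back as `None`, which makes it easy to spot pages where the class names have changed.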
    reward#_amount & reward#_backers:
    Each project may have multiple "Pledge Rewards", and the number varies from project to project. Allowing for around 20 should be enough, though.
    Therefore, the last columns in the .csv would be: "reward1_amount" "reward1_backers" "reward2_amount" "reward2_backers" etc.

    Info for Pledge Rewards is found in:
    HTML:
    <div class="pledge__info">
        <h2 class="pledge__amount">
            Pledge <span class="money">${reward#_amount}</span> or more
            ...
        </h2>
        ...
        <div class="pledge__backer-stats">
            <span class="pledge__backer-count">
                {reward#_backers} backers
            </span>
        </div>
    </div>
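
    The reward columns and the .csv export could be sketched as follows, again with the standard library only. It assumes each reward sits in its own `pledge__info` div as in the snippet above; `RewardParser`, `reward_columns`, `write_rows` and the cap of 20 as `MAX_REWARDS` are illustrative choices. Amounts and backer counts are kept as digit strings, and unused reward slots are left blank.

```python
import csv
import re
from html.parser import HTMLParser

MAX_REWARDS = 20  # "around 20 should be enough" per the post
BASE_FIELDS = ["url", "project", "goal", "total_raised", "total_backers"]


class RewardParser(HTMLParser):
    """Collect amount and backers for each <div class="pledge__info"> block."""

    def __init__(self):
        super().__init__()
        self.rewards = []         # [{"amount": ..., "backers": ...}, ...]
        self._depth = 0           # div nesting depth inside the current block
        self._in_amount = False   # inside <h2 class="pledge__amount">
        self._field = None        # field that the next text node fills

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            if self._depth:
                self._depth += 1
            elif "pledge__info" in classes:
                self._depth = 1
                self.rewards.append({"amount": None, "backers": None})
        elif not self._depth:
            return
        elif tag == "h2" and "pledge__amount" in classes:
            self._in_amount = True
        elif tag == "span" and "money" in classes and self._in_amount:
            self._field = "amount"
        elif tag == "span" and "pledge__backer-count" in classes:
            self._field = "backers"

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
        elif tag == "h2":
            self._in_amount = False
        elif tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and self._depth:
            digits = re.sub(r"[^\d.]", "", data)  # "$25" -> "25", "12 backers" -> "12"
            if digits and self.rewards[-1][self._field] is None:
                self.rewards[-1][self._field] = digits


def reward_columns(rewards, limit=MAX_REWARDS):
    """Flatten rewards into reward1_amount, reward1_backers, reward2_amount, ..."""
    row = {}
    for i in range(1, limit + 1):
        reward = rewards[i - 1] if i <= len(rewards) else {}
        row[f"reward{i}_amount"] = reward.get("amount") or ""
        row[f"reward{i}_backers"] = reward.get("backers") or ""
    return row


def write_rows(rows, out, limit=MAX_REWARDS):
    """Write row dicts to an open text file, base fields first, then rewards."""
    fieldnames = BASE_FIELDS + [
        f"reward{i}_{kind}" for i in range(1, limit + 1)
        for kind in ("amount", "backers")
    ]
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
```

    A row would be built by merging the project-level fields with `reward_columns(parser.rewards)` and passing the list of rows to `write_rows` along with a file opened as `open("projects.csv", "w", newline="")` (the filename is just an example).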
     

    Attached Files:

    Last edited: May 17, 2017
  2. TimelordHarry

    TimelordHarry Regular Member

    Joined:
    Apr 6, 2017
    Messages:
    234
    Likes Received:
    56
    Gender:
    Male
    Occupation:
    Tardis Engineer
    Location:
    Gallifrey
    • Thanks x 1
  3. luketr

    luketr Newbie

    Joined:
    Mar 1, 2015
    Messages:
    6
    Likes Received:
    0
    Thanks for the link. That looks like a fantastic starting point! ...Shouldn't you be guarding the vault, though?

    Whilst I understand the gist of the code, my practical Python knowledge begins and ends at: opening Terminal >> cd to the .py script >> python *.py

    Anyone here able to help me get started with / customise this? Or offer a service to the same effect?
     
  4. somethingclever

    somethingclever Newbie

    Joined:
    Nov 26, 2008
    Messages:
    24
    Likes Received:
    4
    Gender:
    Male
    Occupation:
    Anything that puts money in my pocket
    Location:
    in the Ether
    Home Page:
    Look into scrapy-splash. It hooks Scrapy up to Splash, so pages that need JavaScript can be rendered before scraping.