
Basic Scraping Guide - How to make money with scraped content

Discussion in 'Making Money' started by outscrape, Dec 22, 2016.

  1. outscrape

    outscrape Jr. VIP Jr. VIP

    Joined:
    Nov 23, 2016
    Messages:
    118
    Likes Received:
    76
    After years of web scraping and working with people who do data collection, data harvesting, data indexing, data aggregation, web crawling, screen scraping, or whatever you want to call it, I wanted to put together a very basic list of ideas on how anyone can profit from the info that’s already out there.
    First: What’s scraping? My definition is basic: scraping is intelligently, automatically taking content from somewhere, generally structured content, with the intention of reproducing it or examining it for trends or valuable information.

    Second: Why scraping?
    Because data is valuable. Knowledge is power.
    Yadayada. You know all this.

    What you might not know: scraping is often free. So, we're talking about free value. So here's what this guide is going to answer, very basically and quickly to get your mind working:

    1. Where to get the data
    2. What to do with it

    One popular cloud-based scraping product suggests these basic scraping categories:

    Method (site example)

    1. Machine learning (Google images)
    2. Price monitoring (Ebay)
    3. Lead generation (Yelp) [scraping contact info for local biz]
    4. Market research (Brewdog) [scraping types of beer and their ratings, for example]
    5. App Development (Realtor.com) [I can only assume scraping realty data and copying it]
    6. Academic Research (Techcrunch)

    Nice, but I’m going to break it down for you in terms of how to actually make money with this stuff. Here are the basic categories I could come up with:

    Duplicating sites
    Offering scraped data as a service
    Lead gen
    Offering "scraping" itself as a service
    Scraping to get around APIs




    Duplicating whole sites

    This is an obvious one. No matter what website you want to create, there’s probably already one out there that’s similar. Here are some site ideas that could benefit from reproducing scraped data:

    1. Forums
    2. Job boards
    3. Blogs
    4. Q&A Site
    5. Coupon Sites
    6. Knowledgebase/Wiki Sites
    7. Social network
    8. Review sites (think Yelp, Amazon, etc)
    9. Any site with data that you could reproduce and create a better interface/app/etc for

    Plenty of sites you already own could use scraped content like this: to look active, to get more traffic, for SEO, as part of a PBN, or as the source of the data itself (for a coupon site, for example).


    Offering scraped data as a service

    People want the info below. If you aggregate it regularly or quickly you’ve got yourself some value. Build a targeted search engine, for example, that pulls data from the top 10 or 20 providers of any kind of niche product and you’ve got something that probably doesn’t exist anywhere else. Consider:

    1. Stocks (sites often charge for data past a certain date, but you could scrape it once and then provide it for free)
    2. Niche News Aggregation (pick a niche, like celebrity news sites, scrape the top 10 sites, etc)
    3. Daily News (pay for a subscription to get past major site paywalls, then make the data free or discounted)
    4. Anything with a paywall - if you’re a student, you can grab this for free - but be careful, because that’s what got Aaron Swartz in trouble
    5. Any kind of niche content to auto-send to your mailing list, post via social media, etc (think a newsletter just for the top trends in blackhat IM, or a bot that auto-tweets when a house gets sold in a specific zipcode)
    6. Offline, intranet, or hard-to-access data - any legacy database or collection of info can be scraped and converted into a new format and put online, and I’ve seen companies pay big bucks to have this done rather than pay to have entire legacy software systems rebuilt.


    Lead Generation

    This is a goldmine, and one that could be considered less than legal, but you wouldn’t believe the number of big companies who use this data for all sorts of things (import.io SUGGESTS you use Yelp for lead gen, despite scraping Yelp being against the TOS).

    Ever get targeted by a mailer because you bought a house, had a kid, moved, went to jail, started a business, etc? A lot of this is public info. You wouldn’t believe the number of lawyers and realtors I’ve talked to who use public databases to get clients. Those two groups, for example, usually have access to a poorly designed database that doesn’t export easily and requires scraping to go through the vast datasets.

    If you have access to a unique dataset, or you’re willing to pay for it, or you can grab something that’s public and re-form it, you’re in a great position. You could collect the data and sell it, or you could use it yourself by targeting the contacts directly with offers.

    Note - Learn regex. Many places are going to have contact info like email addresses throughout that isn’t easily scrapable. With regex and the right software, you can grab any email address from any dataset and copy ONLY that.
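
    To make the regex note concrete, here is a minimal sketch in Python. The pattern is an assumption on my part: it is deliberately simple, catches common address shapes, and will miss some valid addresses. The sample text and the function name extract_emails are made up for illustration.

```python
import re

# Deliberately simple pattern: catches common addresses, not every valid one.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the unique email addresses in text, in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()  # treat Jane@ and jane@ as the same address
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


# Made-up sample text standing in for a scraped page or export.
sample = (
    "Contact Jane at jane.doe@example.com or call 555-0100. "
    "Billing: billing@example.org; duplicate: Jane.Doe@example.com."
)
print(extract_emails(sample))
```

    Everything that is not an email address (names, phone numbers, punctuation) is dropped, which is the "copy ONLY that" part.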

    Places to scrape:

    1. Social networks like Linkedin, Facebook, Twitter
    2. Public datasets/records like insurance data, criminal records and other law databases, voting records, tax records, gov’t spending databases.
    3. Realty (home foreclosures, new homes)
    4. Car / vehicle sales websites
    5. Review sites like Yelp

    “Scraping” as a service:

    This sounds like offering scraped data as a service, but it's slightly different, essentially because it's time-based. A lot of SaaS companies out there are just scrapers or content aggregators. You can be too. For instance, you could:

    1. Monitor websites for updates or changes
    2. Proxies
    3. Sales data (Amazon, Ebay, etc) or any kind of item and product listings for competitive price monitoring and market research, a price comparison portal, price arbitrage (what/when can you buy from Amazon and sell on Ebay for a profit?) or inventory tracking
    4. Locate the highest-ranking keywords of your competitors on all major search engines
    5. Automate ad buying research


    Scraping to get around APIs:

    A lot of sites that have APIs have them because people are willing to pay for the data - if that's true, then just ask yourself, why?

    APIs are awesome, but they often cost money. If you had all the money in the world and plenty of time to code, I’m sure you would use them. The great thing is that sites that have APIs usually have structured content on their site as well. If you need to get data fast and easy, and for basically free, skip the API and go straight for scraping the data directly.
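
    Here is a rough sketch of what "structured content" means in practice, using only Python's standard library. The HTML snippet, the class names "title" and "price", and the ListingParser class are all assumptions made up for this example; a real site will use its own markup.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect the text of every <span class="title"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("title", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field == "title":
            # A title starts a new row.
            self.rows.append({"title": data.strip()})
        elif self._field == "price" and self.rows:
            # A price belongs to the most recent title.
            self.rows[-1]["price"] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


# Made-up markup standing in for a product listing page.
SAMPLE_HTML = """
<ul>
  <li><span class="title">Blue Widget</span> <span class="price">$10.00</span></li>
  <li><span class="title">Red Widget</span> <span class="price">$12.50</span></li>
</ul>
"""

parser = ListingParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.rows)
```

    Because the same tags repeat for every item, one small parser pulls out every row on the page, which is roughly what the paid API would have handed you as JSON.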


    In fact, one way to get scraping site ideas is to look up sites that have APIs. Example:

    http://www.computersciencezone.org/50-most-useful-apis-for-developers/
    http://www.programmableweb.com/news/most-popular-apis-least-one-will-surprise-you/2014/01/23



    It’s all overwhelming. Where to start?


    1. Start with what you know. If you’re into old cars, build a search engine / listing site for old cars for sale. See if you can automate it and monetize it. If you’re into gov’t spending or something related to legislation, here are a few fun ideas:

    https://www.fcc.gov/licensing-databases/general/search-fcc-databases
    https://www.data.gov/
    https://www.foia.gov/search.html


    2. Play around. One reason I love scraping is that it’s fun. The programming part of it is annoying, but getting the data is fun.


    3. Grab some data and put it into a word cloud.

    This can be fun. Here’s some data on scraping jobs that I scraped earlier today. Sometimes it's useful to get a sense of what is popular.

    [Image: word cloud of scraped scraping-job listings]
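
    A word cloud is just word counts drawn big or small, so you can get the same sense of what is popular with a few lines of Python. This is a sketch: the job titles below are made up (they are not the data from the image), and the stopword list is a tiny hypothetical one.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "for", "from", "in", "of", "the", "to", "with"}


def word_counts(texts, top_n=5):
    """Count words across all texts, ignoring case and stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up job titles standing in for scraped listings.
job_titles = [
    "Python scraper for real estate listings",
    "Scrape product prices from ecommerce sites",
    "Python script to scrape emails",
    "Data entry and scraping in Python",
]
print(word_counts(job_titles, top_n=3))
```

    The counts are what you would feed into whatever word cloud tool you like.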



    4. Don’t freak out.
    Yes, there's a lot of data. Yes, there are almost always sites that already do something like what you plan to do, but usually they’re making money, and you could make some of that money, too. And if your idea is niche enough, you might actually be the first to aggregate the data or offer the service.


    Anyway, I hope this helps some of you who are looking for a method. Scraping is something pretty much anyone can do, and it’s how a lot of sites get started. (Facebook for example)

    I’m working on a larger list of ideas that I’ll share a link to once it’s ready. I'm sure there are plenty of things I missed; this is just a basic getting started guide. Thanks for reading.
     
    • Thanks Thanks x 19
  2. sandyfor80

    sandyfor80 Registered Member

    Joined:
    Aug 31, 2013
    Messages:
    62
    Likes Received:
    6
    Gender:
    Male
    Bro, nice share!
    Can you suggest any ideas to get started?
     
  3. Purush

    Purush Jr. VIP Jr. VIP

    Joined:
    Jul 12, 2016
    Messages:
    1,160
    Likes Received:
    184
    Gender:
    Male
    Any nice tool for data scraping?
     
  4. satruk

    satruk Regular Member

    Joined:
    Mar 22, 2012
    Messages:
    309
    Likes Received:
    42
    Auto Generate Content?
     
    • Thanks Thanks x 1
  5. Zeemaa

    Zeemaa BANNED BANNED

    Joined:
    Sep 24, 2015
    Messages:
    101
    Likes Received:
    11
    Gender:
    Male
    We can get content from social media and monetize it with AdSense. Can we get banned for this?
     
  6. outscrape

    outscrape Jr. VIP Jr. VIP

    Joined:
    Nov 23, 2016
    Messages:
    118
    Likes Received:
    76
    It depends on what you are doing for money right now, because I don't want to recommend you start a brand new project. It is better to do something that helps your current business. So if you are trying to improve your SEO, you could scrape your competitors' backlinks or content and use that info to build new links for yourself or post new content. If you are trying to get leads, you could scrape places that have leads specific to your business.

    Yes, definitely. I should add a section on "Duplicating content" under "Duplicating whole sites" because sometimes you don't want to duplicate the entire site.

    I'm not sure what AdSense can ban you for, but if you are automating money-making via AdSense I would mix the duplicated content with real or spun content. Anytime you bring Google in (AdSense, SEO) I would not use JUST scraped content, because they check it against the places you took it from (assuming those places are public).
     
  7. outscrape

    outscrape Jr. VIP Jr. VIP

    Joined:
    Nov 23, 2016
    Messages:
    118
    Likes Received:
    76
    There are so many tools out there. I am working on a review of them, and I am building my own. If you want to get started today and need a free tool, try import.io. If you want something that can click buttons and move through pages, you will need either to learn to code with Python or to pay for something.
     
  8. Krabtreb

    Krabtreb Newbie

    Joined:
    Sep 9, 2015
    Messages:
    49
    Likes Received:
    12
    Nice read, thanks.

    I am currently building a list and want some information from one website added to my emails. How would I go about this? Would import.io work?
     
    • Thanks Thanks x 1
  9. Jomasdf

    Jomasdf Jr. VIP Jr. VIP

    Joined:
    Jul 7, 2012
    Messages:
    423
    Likes Received:
    155
    Occupation:
    C# dev
    Location:
    Sweden
    Home Page:
    Nice guide. It could probably be complemented with the most common question I get on the topic: what language to use. Maybe in a part two!
     
    • Thanks Thanks x 1
  10. outscrape

    outscrape Jr. VIP Jr. VIP

    Joined:
    Nov 23, 2016
    Messages:
    118
    Likes Received:
    76
     
  11. outscrape

    outscrape Jr. VIP Jr. VIP

    Joined:
    Nov 23, 2016
    Messages:
    118
    Likes Received:
    76
    Whoops, only some of that should have been the quote. Trying again:

    Not sure exactly what you mean, but probably not. Import.io is great for grabbing static info once, based on a list of URLs. They've got a nice URL generator so you can scrape pages that are numbered like http://www.website.com/query?=2. If you are trying to search a site based on a big list of emails where you need to actually type the emails into a box you will need something more dynamic, unless you can create the URLs for the site via the emails and put those into a big list. (http:[email protected]).
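
    If you would rather build the URL list yourself than use a tool's generator, it only takes a few lines of Python. This is a sketch under assumptions: the numbered template reuses the example URL pattern above, while the email search template ("/search?email=") is purely hypothetical, since every site names its parameters differently.

```python
from urllib.parse import quote


def numbered_urls(template, first, last):
    """Fill {n} in the template with every page number from first to last."""
    return [template.format(n=n) for n in range(first, last + 1)]


def urls_from_values(template, values):
    """Fill {value} in the template with each value, percent-encoded so that
    characters like @ are safe inside a URL."""
    return [template.format(value=quote(v, safe="")) for v in values]


# Numbered pages, following the example URL pattern above.
pages = numbered_urls("http://www.website.com/query?={n}", 1, 3)

# Hypothetical search URL: the path and parameter name are assumptions.
lookups = urls_from_values(
    "http://www.website.com/search?email={value}",
    ["jane@example.com", "bob@example.org"],
)

for url in pages + lookups:
    print(url)
```

    This only works when the site puts the search term in the URL. If the site needs the value typed into a form, you are back to needing something more dynamic.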
     
  12. afoolio

    afoolio Newbie

    Joined:
    Jun 1, 2014
    Messages:
    11
    Likes Received:
    3
    One of the best introductions on scraping which provides the foundation for countless SaaS businesses!!
    thank you
     
    • Thanks Thanks x 1
  13. Hawkster

    Hawkster Jr. VIP Jr. VIP

    Joined:
    Jun 22, 2013
    Messages:
    3,428
    Likes Received:
    3,616
    Gender:
    Male
    Occupation:
    Listen to everyone - Follow no-one
    Location:
    UK
    Home Page:
    Interesting read that buddy - Bookmarked
     
    • Thanks Thanks x 1
  14. outscrape

    outscrape Jr. VIP Jr. VIP

    Joined:
    Nov 23, 2016
    Messages:
    118
    Likes Received:
    76
    Everyone - thanks for your questions and feedback. I'm working on more scraping guides now.

    My team and I are now starting a Private Facebook Web Scraping Mastermind group. If you're interested, info is here: https://www.facebook.com/groups/1161377563958965/
     
  15. idkanon345

    idkanon345 Newbie

    Joined:
    Jan 8, 2015
    Messages:
    23
    Likes Received:
    4
    Looking forward to it man! Thanks a bunch
     
    • Thanks Thanks x 1
  16. aminima

    aminima Newbie

    Joined:
    Aug 25, 2016
    Messages:
    30
    Likes Received:
    2
    Gender:
    Male
    Inspiring.
     
    • Thanks Thanks x 1
  17. outscrape

    outscrape Jr. VIP Jr. VIP

    Joined:
    Nov 23, 2016
    Messages:
    118
    Likes Received:
    76
  18. Javardo69

    Javardo69 Junior Member

    Joined:
    Jul 19, 2014
    Messages:
    102
    Likes Received:
    6
    I'm an experienced scraper, but I'm pretty inexperienced in how to monetize the data and have no experience in building websites. I've only been offering scraping tools. Pretty good tutorial about the potential.
     
    • Thanks Thanks x 1
  19. thetoothfairy

    thetoothfairy Regular Member

    Joined:
    Apr 24, 2016
    Messages:
    468
    Likes Received:
    81
    Hey. What scraping tools do you use? Does it require coding skills to make a database, etc.?
     
  20. outscrape

    outscrape Jr. VIP Jr. VIP

    Joined:
    Nov 23, 2016
    Messages:
    118
    Likes Received:
    76
    I'm actually working on building a scraping tool for universal use....but more about that later....