I want to build lead generation software and need help getting data from Reddit

eliteaccess

The software should pull conversations that already show buyer intent or discuss specific pain points. How should I build it to get the data from Reddit? I don't have Reddit API keys, so I'd also like to know how to get them. What do you think?
 
You don't necessarily need API keys to get started. In Redditomatic, one of my WordPress plugins, I pull subreddit data directly from Reddit's public JSON endpoints by requesting URLs like:

https://www.reddit.com/r/subreddit/.json

...and then parsing the returned JSON. It works for fetching posts, titles, content, scores, authors, images, videos, permalinks and other metadata.

Just keep in mind that Reddit changes things from time to time, so if you're building a serious SaaS around this, I'd also build the API method alongside the JSON parsing, and verify the current API terms and rate limits before investing too much development time.
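
A minimal sketch of the JSON approach, assuming the listing keeps the usual Listing shape (data -> children -> data). The User-Agent string and the set of fields kept are my own placeholders, not what Redditomatic does, and field names should be checked against a live response before relying on them.

```python
import json
import urllib.request

# Assumption: a descriptive User-Agent; the contact address is a placeholder.
USER_AGENT = "leadgen-mvp/0.1 (contact: you@example.com)"


def fetch_listing(subreddit, limit=25):
    """Download one page of a subreddit's public JSON listing."""
    url = f"https://www.reddit.com/r/{subreddit}/.json?limit={limit}"
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def parse_posts(listing):
    """Pull the fields of interest out of a Listing dictionary."""
    posts = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t3":  # t3 marks a post
            continue
        d = child["data"]
        posts.append({
            "id": d.get("id"),
            "title": d.get("title", ""),
            "body": d.get("selftext", ""),
            "author": d.get("author"),
            "score": d.get("score", 0),
            "num_comments": d.get("num_comments", 0),
            "created_utc": d.get("created_utc"),
            "permalink": "https://www.reddit.com" + d.get("permalink", ""),
        })
    return posts
```

Keeping the download and the parsing separate means the parser can be tested against a saved response, and only `fetch_listing` needs replacing if you move to the authenticated API later.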
 
Start with Reddit's official API if available to you, or use publicly accessible Reddit search results where permitted by Reddit's terms. Track keywords that indicate intent (e.g., "looking for", "recommend", "alternative to", "problem with") and score posts/comments based on relevance. Make sure your data collection complies with Reddit's policies and rate limits.
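
A rough sketch of the keyword scoring idea. The four phrases come from the examples above; the weights are made-up numbers to tune against your own data.

```python
import re

# Assumption: weights are illustrative only; tune them on real threads.
INTENT_PATTERNS = {
    r"\blooking for\b": 3,
    r"\brecommend(ation|ations|s|ed)?\b": 2,
    r"\balternatives? to\b": 3,
    r"\bproblems? with\b": 1,
}


def intent_score(text):
    """Return (score, matched_patterns) for one post or comment."""
    lowered = text.lower()
    score = 0
    hits = []
    for pattern, weight in INTENT_PATTERNS.items():
        if re.search(pattern, lowered):
            score += weight
            hits.append(pattern)
    return score, hits
```

Returning the matched patterns along with the score makes it easier to see later why a post was flagged.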
 
For the API route: Reddit gives you a personal-use script key for free at reddit.com/prefs/apps, then PRAW (Python Reddit API Wrapper) handles auth and pagination. The catch: listings return only the latest ~1000 posts per subreddit/search query, so for historical buyer-intent mining you'll need Pushshift archives or a Reddit search proxy on top (check Pushshift's current availability first, since its public access has been restricted since 2023). The actual lead-quality work isn't the scraping though; it's the classifier on top. Build a regex/keyword pre-filter for buying signals ("looking for", "anyone recommend", "tired of using X"), then run the hits through an LLM with a strict prompt that returns an intent score plus a short reason. Most lead-gen "buyer intent" tools fail because they treat scraping as the product instead of the filter pipeline.
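
The pre-filter plus LLM step could look roughly like this. It is a sketch under assumptions: `ask_llm` is a hypothetical callable you supply (it takes a prompt string and returns the model's text reply), and the prompt wording and the 0-10 scale are placeholders.

```python
import json
import re

# The three signals named above; "X" is approximated by a single word.
BUYING_SIGNALS = re.compile(
    r"looking for|anyone recommend|tired of using \w+",
    re.IGNORECASE,
)

# Assumption: prompt wording and 0-10 scale are illustrative.
PROMPT_TEMPLATE = (
    "You are rating Reddit text for buyer intent.\n"
    'Reply with JSON only: {{"intent": <0-10>, "reason": "<one sentence>"}}\n\n'
    "Text:\n{text}"
)


def classify(texts, ask_llm):
    """Run only pre-filtered texts through the LLM and keep valid verdicts."""
    results = []
    for text in texts:
        if not BUYING_SIGNALS.search(text):
            continue  # cheap rejection, no LLM call spent
        raw = ask_llm(PROMPT_TEMPLATE.format(text=text))
        try:
            verdict = json.loads(raw)
            intent = int(verdict["intent"])
            reason = str(verdict["reason"])
        except (ValueError, KeyError, TypeError):
            continue  # model ignored the format; skip rather than guess
        results.append({"text": text, "intent": intent, "reason": reason})
    return results
```

Passing the LLM call in as a function keeps the pipeline independent of any one provider and lets you test it with a fake.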
 
Bro, I tested this manually before coding anything, and the useful leads always came from comment chains, not just the OP text.
 
The mistake I see people make with this is they start scraping “reddit” as one big source. Nah. Pick 10-20 subs where your buyer actually hangs out and build around those first.

For data, public json is fine for MVP, but grab the comment json too, not only /r/sub/.json. A lot of intent is like someone replying “same issue here, did you find a tool?” and that never shows if you only score the post title.
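
A sketch of pulling the comments out, assuming the thread's .json URL returns the usual two-element list (post listing first, comment listing second), with replies nested under each comment. The output field names are my own choice.

```python
def flatten_comments(thread_json):
    """Flatten a thread's nested comment tree into a list of dictionaries.

    thread_json is assumed to be [post_listing, comment_listing].
    """
    flat = []

    def walk(listing, depth):
        for child in listing.get("data", {}).get("children", []):
            if child.get("kind") != "t1":  # skip "more" placeholders
                continue
            d = child["data"]
            flat.append({
                "id": d.get("id"),
                "author": d.get("author"),
                "body": d.get("body", ""),
                "score": d.get("score", 0),
                "created_utc": d.get("created_utc"),
                "depth": depth,
            })
            replies = d.get("replies")
            if isinstance(replies, dict):  # an empty string means no replies
                walk(replies, depth + 1)

    walk(thread_json[1], 0)
    return flat
```

Each flattened comment can then be scored the same way as a post body, so a reply like "same issue here, did you find a tool?" gets picked up even when the title scores zero.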

I’d store raw thread + comments first, then run your own scoring after. Something like keyword hit, negative/positive pain words, recent activity, number of people agreeing, and whether the person is asking for a product/tool/vendor. Don’t let the LLM read everything from scratch or it gets expensive and noisy... prefilter hard, then use AI only on the maybe-good stuff.
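
One way to do the "store raw first" part, sketched with SQLite from the standard library. The table and column names are made up for illustration.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small database holding raw thread JSON."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_threads ("
        "thread_id TEXT PRIMARY KEY, fetched_at REAL, payload TEXT)"
    )
    return conn


def save_thread(conn, thread_id, payload, fetched_at):
    """Store the untouched JSON; refetching a thread overwrites its old copy."""
    conn.execute(
        "INSERT OR REPLACE INTO raw_threads VALUES (?, ?, ?)",
        (thread_id, fetched_at, json.dumps(payload)),
    )
    conn.commit()


def iter_threads(conn):
    """Yield (thread_id, payload) so scoring can be rerun at any time."""
    rows = conn.execute(
        "SELECT thread_id, payload FROM raw_threads ORDER BY thread_id"
    )
    for thread_id, payload in rows:
        yield thread_id, json.loads(payload)
```

Because the raw payload is kept, you can change the scoring rules and rerun them over everything you have collected without fetching again.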

API keys are not hard, Reddit's apps page gives you one, but for early testing I wouldn't wait on that. Just make sure you cache, slow down requests, and expect stuff to break sometimes. Reddit is messy data; the value is in filtering, not collecting.
 
one thing nobody mentioned... the .json endpoint has a hidden gem: you can append ?limit=100 (that's the maximum per request, anything higher gets capped) and also hit /r/sub/comments/.json to get the firehose of new comments across the whole sub instead of going post by post. way faster for catching intent early.

what worked for me was watching velocity, not just keywords. a thread that's getting replies fast in the first hour usually means real pain, people pile on when something resonates. so i score recency + comment growth rate before the LLM even sees it.
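
a rough sketch of that velocity idea, assuming you poll each thread twice and keep (comment count, timestamp) snapshots. the 6 hour half-life and the way the two numbers are multiplied together are placeholder choices, not something measured.

```python
def comment_velocity(prev_count, prev_ts, curr_count, curr_ts):
    """Comments gained per hour between two snapshots (timestamps in seconds)."""
    elapsed_hours = (curr_ts - prev_ts) / 3600
    if elapsed_hours <= 0:
        return 0.0
    return max(curr_count - prev_count, 0) / elapsed_hours


def recency_weight(created_utc, now, half_life_hours=6.0):
    """Weight that halves every half_life_hours (assumed value)."""
    age_hours = max(now - created_utc, 0) / 3600
    return 0.5 ** (age_hours / half_life_hours)


def velocity_score(created_utc, prev, curr):
    """Combine growth rate and recency; prev and curr are (count, timestamp)."""
    prev_count, prev_ts = prev
    curr_count, curr_ts = curr
    growth = comment_velocity(prev_count, prev_ts, curr_count, curr_ts)
    return growth * recency_weight(created_utc, curr_ts)
```

threads can then be sorted by `velocity_score` and only the top slice passed on to the expensive step.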

on the keys thing, agree, don't wait on it for testing, but get the oauth app made anyway because the unauthenticated json limit is brutal once you scale, like 10 req per min, and they'll start 429ing you randomly. with oauth you get 100/min, which changes everything.
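
a sketch of throttling plus backing off on 429s. the per-minute number is whatever limit applies to you (the figures above should be rechecked against Reddit's current docs), `fetch` is a hypothetical callable returning (status, body), and the 5 second base delay is a guess.

```python
import time


class Throttle:
    """Keeps consecutive calls at least 60/per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_ok = 0.0  # earliest time the next call may start

    def wait(self):
        now = self.clock()
        if now < self.next_ok:
            self.sleep(self.next_ok - now)
            now = self.next_ok
        self.next_ok = now + self.interval


def fetch_with_backoff(fetch, throttle, retries=3, base_delay=5.0):
    """Call fetch() under the throttle, doubling the pause after each 429."""
    for attempt in range(retries + 1):
        throttle.wait()
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < retries:
            throttle.sleep(base_delay * 2 ** attempt)
    return status, body  # still 429 after all retries
```

so `Throttle(10)` would fit the unauthenticated case and `Throttle(100)` the oauth one, if those limits still hold.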

also old.reddit json is sometimes more stable than www for parsing, idk why but it breaks less for me.
 
You can use Gemini or Claude to learn most of this and get the answers and results you want.

All the best
Yeah, Claude is ideal for coding in my opinion. Gemini is a bit meh for this job.
 
Unfortunately the API is not available to me, how do I get it?
 
The site you mentioned doesn't give any more API keys, do you have some spare ones by any chance?
 
Which Reddit apps page is that? None of the ones I tried work anymore.
 
Do you have any idea how I can get a Reddit API Key?
 