The mistake I see people make with this is they start scraping “reddit” as one big source. Nah. Pick 10-20 subs where your buyer actually hangs out and build around those first.
For data, the public JSON is fine for an MVP, but grab each thread's comment JSON too, not only /r/sub/.json. A lot of intent shows up in replies, like someone saying “same issue here, did you find a tool?”, and you never see that if you only score the post title.
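Rough sketch of pulling the comments out of a thread payload, in case it helps. It assumes the usual shape of the public thread JSON (a two-item list: post listing, then comment listing, with replies nested inside each comment). The function names and the output dict fields are just my own picks, not anything official.

```python
from typing import Any, Iterator


def thread_json_url(permalink: str) -> str:
    """Build the .json URL for a post permalink such as /r/sub/comments/abc123/title/."""
    return "https://www.reddit.com" + permalink.rstrip("/") + ".json"


def iter_comments(listing: Any, depth: int = 0) -> Iterator[dict]:
    """Walk a comment listing depth-first and yield one flat dict per comment."""
    if not isinstance(listing, dict):
        # "replies" is an empty string when a comment has no replies
        return
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":
            # skip "more" stubs (collapsed comments that need a separate fetch)
            continue
        data = child["data"]
        yield {
            "id": data.get("id"),
            "author": data.get("author"),
            "body": data.get("body", ""),
            "score": data.get("score", 0),
            "created_utc": data.get("created_utc"),
            "depth": depth,
        }
        yield from iter_comments(data.get("replies"), depth + 1)


def flatten_thread(payload: list) -> dict:
    """payload is the parsed thread JSON: [post listing, comment listing]."""
    post = payload[0]["data"]["children"][0]["data"]
    return {"post": post, "comments": list(iter_comments(payload[1]))}
```

Store the raw payload as well as the flattened version, so you can re-parse later if you decide you want more fields.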
I’d store the raw thread + comments first, then run your own scoring afterwards. Score on things like keyword hits, pain words (negative or positive), recent activity, how many people are agreeing, and whether the person is asking for a product/tool/vendor. Don’t let the LLM read everything from scratch or it gets expensive and noisy... prefilter hard, then use AI only on the maybe-good stuff.
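Here's roughly what I mean by scoring before the LLM. Everything in it is made up for illustration: the word lists, the regexes, the weights, the 30-day window and the threshold are all placeholders you'd tune for your niche. It expects a thread dict with a "post" and a list of "comments", each comment having a "body".

```python
import re
import time
from typing import Optional

# Hypothetical word lists; replace with terms from your own niche.
KEYWORDS = ("invoicing", "billing")
PAIN_WORDS = ("frustrating", "broken", "hate", "struggling")
# Someone asking for a product/tool/vendor.
ASK = re.compile(
    r"\b(looking for a|is there a|did you find a|recommend a)\b[^.?!]*"
    r"\b(tool|app|software|vendor|product)\b",
    re.I,
)
# Someone agreeing that they have the same problem.
AGREE = re.compile(r"\b(same (issue|problem|here)|me too)\b", re.I)


def score_thread(thread: dict, now: Optional[float] = None) -> dict:
    """Cheap rule-based score; no AI involved."""
    now = time.time() if now is None else now
    post, comments = thread["post"], thread["comments"]
    texts = [post.get("title", "") + " " + post.get("selftext", "")]
    texts += [c.get("body", "") for c in comments]
    blob = " ".join(texts).lower()

    keyword_hits = sum(1 for k in KEYWORDS if k in blob)
    pain_hits = sum(1 for w in PAIN_WORDS if w in blob)
    agreeing = sum(1 for c in comments if AGREE.search(c.get("body", "")))
    asks = sum(1 for t in texts if ASK.search(t))

    # "Recent activity" here means the newest timestamp on the post or any comment.
    stamps = [post.get("created_utc")] + [c.get("created_utc") for c in comments]
    stamps = [s for s in stamps if s is not None]
    recent = bool(stamps) and (now - max(stamps)) / 86400 <= 30

    score = 2 * keyword_hits + pain_hits + agreeing + 3 * asks + (2 if recent else 0)
    return {"score": score, "keyword_hits": keyword_hits, "pain_hits": pain_hits,
            "agreeing": agreeing, "asks": asks, "recent": recent}


def needs_llm(result: dict, threshold: int = 5) -> bool:
    """Hard prefilter: only on-topic threads above the threshold go to the LLM."""
    return result["keyword_hits"] > 0 and result["score"] >= threshold
```

Because the raw threads are stored, you can change the weights and re-score everything without fetching again.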
API keys aren't hard to get, Reddit's apps page gives you one, but for early testing I wouldn't wait on that. Just make sure you cache, slow down your requests, and expect stuff to break sometimes. Reddit is messy data; the value is in the filtering, not the collecting.
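For the cache + slow down + expect breakage part, a minimal sketch using only the standard library. The class name, the 2 second gap, the 15 minute cache lifetime, the retry count and the User-Agent string are all my own assumptions, so check Reddit's current rules before settling on real numbers.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional, Tuple


def http_get_json(url: str) -> Any:
    """Plain GET that parses JSON. The User-Agent value is a placeholder."""
    req = urllib.request.Request(url, headers={"User-Agent": "lead-finder-mvp/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


class PoliteFetcher:
    """In-memory cache + minimum gap between requests + a few retries."""

    def __init__(self, fetch: Callable[[str], Any] = http_get_json,
                 min_interval: float = 2.0, ttl: float = 900.0, retries: int = 3,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.fetch, self.min_interval, self.ttl, self.retries = fetch, min_interval, ttl, retries
        self.clock, self.sleep = clock, sleep
        self._cache: Dict[str, Tuple[float, Any]] = {}
        self._last_request: Optional[float] = None

    def _throttle(self) -> None:
        # Wait until at least min_interval has passed since the last request.
        if self._last_request is not None:
            wait = self.min_interval - (self.clock() - self._last_request)
            if wait > 0:
                self.sleep(wait)
        self._last_request = self.clock()

    def get(self, url: str) -> Any:
        hit = self._cache.get(url)
        if hit is not None and self.clock() - hit[0] < self.ttl:
            return hit[1]  # fresh enough, no request made
        for attempt in range(self.retries):
            self._throttle()
            try:
                data = self.fetch(url)
            except Exception:  # deliberately broad: stuff breaks in many ways
                if attempt == self.retries - 1:
                    raise
                self.sleep(self.min_interval * 2 ** attempt)  # back off, then retry
                continue
            self._cache[url] = (self.clock(), data)
            return data
```

Usage would be something like `fetcher = PoliteFetcher()` and then `fetcher.get(url)` for each subreddit or thread URL. The cache here dies with the process; for anything longer-lived, write the raw payloads to disk or a database instead.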