Solid breakdown, especially around rate control and IP longevity, which is often underestimated. Curious though: once you start dealing with stronger JS challenges or long-lived sessions, do you still find pure HTTP + IP rotation sufficient, or do you usually end up moving to full browser-based flows?
No, you should go for headful browsing right away. It sounds counterintuitive, but headless is easily sniffed out, and you will be dealing with invisible captchas and debugging by taking screenshots and such. It's not pretty. You will consequently be facing many more captchas and IP bans. The more certain the site can graph your stuff, the further you'll be digging your account's grave.
You want to encounter as few captchas as possible. You do cover all the potential use cases, but you want to sail in a way where you can scrape unobstructed. The captcha, once presented, is trivial to resolve, except the Twitter one. That one is really special; if every website were using it, the scraping days would be well over. But that is a different topic altogether, as most websites don't use that one.
Regarding your question about the paid-for platforms: no mate, they are really not fit for large-scale scraping, and they will also refuse to scrape some websites which are known litigation starters. And they... don't work. Think of it: at best they can do something that one person does rolling their own, but it won't come close, and since they're middlemen of middlemen, the price will triple, and the proxy pools are shared.
Every single scraping request I've seen exists because these tools can't do the job or stopped working.
And even if there's a tool, it will not be doing the detective work for you: enumerating the URL structure, pagination, etc.
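To give an idea of what that detective work boils down to once you've spotted the pattern, here's a minimal sketch of enumerating paginated URLs. Everything in it is an assumption for illustration: the `example.com` URL, the `page` query parameter, and the helper name `buildPageUrls` are made up, and a real site may paginate by offset, cursor, or path segment instead.

```typescript
// Hypothetical helper: builds the list of page URLs for a site that
// paginates with a numeric query parameter (assumed pattern, not universal).
function buildPageUrls(
  baseUrl: string,
  pageParam: string,
  firstPage: number,
  lastPage: number
): string[] {
  const urls: string[] = [];
  for (let page = firstPage; page <= lastPage; page++) {
    // URL keeps any existing query parameters and encodes the new one.
    const url = new URL(baseUrl);
    url.searchParams.set(pageParam, String(page));
    urls.push(url.toString());
  }
  return urls;
}

// Example with a made-up catalog URL.
const pageUrls = buildPageUrls("https://example.com/catalog?sort=new", "page", 1, 3);
console.log(pageUrls);
```

The tool part is the easy bit; figuring out that the parameter is called `page` and where the last page is remains the part you do by hand.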
And you know what, it's actually not that hard.
People say learn a hello world in every language first, lol. Why is that? Will someone's brains fall out and their hands fall off if they learn to start up a Node.js server and console log some errors and, well, events?
There are 2 kinds of people when it comes to programming. Some think you need some sort of formal education to even get started and they're super cautious etc. I mean, it's good to understand all the concepts, but there are, I believe, millions of programmers out there, and they get to a professional level faster than most toddlers learn a language.
And the others think they can vibe code an HFT engine in the first week haha.
I come from a time when, if you wanted to host your own website, you sometimes had to go on the premises, and you had to know a bit of all of it: DB, server side, front end.
I remember when I did my first Node.js auth flow, as in sign up, forgot password etc., and a socket-based chat bot, all before these were available as libraries.
You had to ask Google or you had to go to Stack Overflow, where the older folks would roast you to the bones before you got any worthwhile responses, and you'd better come with logs and code showing what you'd tried etc.
I do not believe I learned anything faster back then because I had to debug stuff myself. I was stuck trying to debug some passport.js issues for days. Nowadays, LLMs can help a great deal; they are basically Stack Overflow minus the rude senior behaviour. You can ask them how to install Node, how to start a server, how to connect a database, and they will reply mostly correctly and always politely. You just can't let an LLM generate a full app without supervision.
Everyone is doing some sort of todo app as a first project, and they learn hardly anything doing so, and hardly anything useful for a career in particular, since once employed, it's about adding new features to todo and CRUD apps which are based on borderline legacy ware.
A scraper is actually perhaps a pretty good first project, I am not even joking.
And if you use LLM help and it tells you "now we are touching legally touchy territory", it's a decent indicator that you're doing fine.