Crawl data from expireddomains.net with Python

dechorf

I'm trying to crawl data from https://www.expireddomains.net/ using Python.
They require an account to see all the data, so I signed up and used Chrome to inspect which headers were sent to their server. I copied all of those headers into my Python script, including Cookie. I even put a random 3-to-5-minute break between requests. I got through about 10 pages before they suspended my account. How can they tell that I'm crawling their site? The only reason I can think of is that I'm paging through the data for just one type of domain (*.sg), and that's the data I need.
 
Follow these steps:

  1. Set real request headers to mimic a regular user.
  2. Use proxies to rotate IP addresses.
  3. Slow down your scraping by adding delays between requests.
  4. Adapt to changes in website layout.
  5. Consider using a headless browser.
  6. Avoid honeypot traps.
  7. Automate CAPTCHA solving if needed.

Good luck with your data extraction!
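Steps 1 and 3 above can be sketched roughly as follows. This is only a minimal illustration using the standard library: the header values, the `build_request`, `random_delay` and `fetch_all` names, and the 180 to 300 second bounds (taken from the 3 to 5 minute break mentioned in the question) are all assumptions, not anything the site documents. Check the site's terms of use before running anything like this against it.

```python
import random
import time
import urllib.request

# Hypothetical header values: copy the real ones from your own browser session.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request carrying browser-like headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=headers)


def random_delay(min_seconds, max_seconds):
    """Pick a delay uniformly between the two bounds, in seconds."""
    return random.uniform(min_seconds, max_seconds)


def fetch_all(urls, min_seconds=180, max_seconds=300,
              sleep=time.sleep, opener=urllib.request.urlopen):
    """Fetch each URL in turn, sleeping a random interval between requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # No delay before the first request, a random one before the rest.
            sleep(random_delay(min_seconds, max_seconds))
        with opener(build_request(url)) as response:
            pages.append(response.read())
    return pages
```

The `sleep` and `opener` parameters exist only so the pacing logic can be exercised without waiting or touching the network.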
 
Their protection is fairly simple.
They know what average user behavior looks like on the domain list pages.

Apply sorting, and do not go past the third page. A real user never digs through the junk shown beyond that point; they change the filter instead.

Watch the time between page transitions, too: real users can spend a long time checking the domains listed on a single page.
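The "sort, but stop at page three" advice could be expressed as a simple browsing plan. This is a rough sketch: the sort keys shown in the usage comment are hypothetical placeholders, and `browsing_plan` is an illustrative name, not something the site exposes.

```python
MAX_PAGES_PER_VIEW = 3  # from the advice above: do not go past the third page


def browsing_plan(sort_keys, max_pages=MAX_PAGES_PER_VIEW):
    """Yield (sort_key, page) pairs, visiting only the first few pages per sort."""
    for sort_key in sort_keys:
        for page in range(1, max_pages + 1):
            yield sort_key, page


# Usage with made-up sort keys:
# for sort_key, page in browsing_plan(["sort_a", "sort_b"]):
#     ...request that page, then wait before the next one...
```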
 
Thanks for the information.
I did put a 10-minute break between requests.
If I change the filter, it becomes hard to keep track of the domains I've already gathered. Do you have any suggestions for this?
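One possible way to keep track across filter changes is to key everything on the domain name itself, so it no longer matters which filter or page a domain came from. A minimal sketch, assuming each page has already been parsed into a list of domain strings (`DomainTracker` is a hypothetical helper, not part of any library):

```python
class DomainTracker:
    """Remember every domain seen so far, regardless of filter or page."""

    def __init__(self):
        self._seen = set()

    def add_batch(self, domains):
        """Record a batch and return only the domains not seen before, in order."""
        new_domains = []
        for domain in domains:
            key = domain.strip().lower()  # domain names are case-insensitive
            if key and key not in self._seen:
                self._seen.add(key)
                new_domains.append(key)
        return new_domains

    def __len__(self):
        return len(self._seen)
```

Overlapping results from different filters then collapse automatically, and the return value of `add_batch` shows how much new data each page actually contributed.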
 