How can I clone a website with 1 million images?

RatSeaExclusive

Junior Member
Joined
Dec 18, 2021
Messages
122
Reaction score
60
I want to clone a website that has 1 million images with descriptions and host it myself.
Each description needs to stay attached to its photo, e.g. stored in a database.

For example: https://www.freepik.com/photos/animals

I'm considering this: https://github.com/alexksikes/mass-scraping
or Puppeteer with Node.js
or possibly finding a hidden API on the website

How would I even start to make this happen? And what is the smartest way to do that?
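If the site loads its gallery via XHR, the hidden-API route is usually the cheapest to start with. Below is a minimal sketch, assuming a *hypothetical* paginated JSON endpoint found in the browser's DevTools Network tab; the endpoint URL, parameter names, and field names (`items`, `url`, `description`) are all assumptions you'd replace with whatever the real site actually returns. It writes each image URL and its description into SQLite, which covers the "attached to a database" requirement.

```python
# Sketch: paging through a hypothetical hidden JSON API and storing
# (image URL, description) pairs in SQLite. Inspect the real site's
# XHR traffic to find the actual endpoint and field names.
import json
import sqlite3
import urllib.request

API_URL = "https://example.com/api/photos?page={page}"  # hypothetical endpoint

def parse_items(payload):
    """Extract (image_url, description) pairs from one page of API JSON."""
    return [(item["url"], item["description"]) for item in payload.get("items", [])]

def scrape(db_path="images.db", max_pages=5):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS photos (url TEXT PRIMARY KEY, description TEXT)")
    for page in range(1, max_pages + 1):
        with urllib.request.urlopen(API_URL.format(page=page)) as resp:
            payload = json.load(resp)
        rows = parse_items(payload)
        if not rows:
            break  # empty page -> no more results
        con.executemany("INSERT OR IGNORE INTO photos VALUES (?, ?)", rows)
        con.commit()
    con.close()

if __name__ == "__main__":
    scrape()
```

Once the metadata is in the database, downloading the actual image files is a separate (and easily parallelized) step.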
 

RatSeaExclusive

Junior Member
Joined
Dec 18, 2021
Messages
122
Reaction score
60
httrack will do it.
Are proxies needed?

Get a high-powered VPS and just Ctrl+S the webpage?
How can I do this from Ubuntu VPS command line?

EDIT: And what if the images get updated over time? How can I automatically pull in new images with their descriptions as soon as something new appears?
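For the command-line part: besides httrack or wget, a short script you run from the VPS shell works fine once you have a list of image URLs. A sketch (file names are assumptions), runnable as `python3 fetch.py urls.txt images/`:

```python
# Minimal resumable downloader for an Ubuntu VPS command line.
# Reads one image URL per line and skips files it already has,
# so a killed run can simply be restarted.
import hashlib
import os
import sys
import urllib.request

def local_path(url, outdir):
    """Stable local file name for a URL (hashing avoids collisions and long names)."""
    name = hashlib.sha1(url.encode()).hexdigest()
    ext = os.path.splitext(url)[1][:8] or ".bin"
    return os.path.join(outdir, name + ext)

def fetch_all(url_file, outdir):
    os.makedirs(outdir, exist_ok=True)
    with open(url_file) as fh:
        for line in fh:
            url = line.strip()
            if not url:
                continue
            dest = local_path(url, outdir)
            if os.path.exists(dest):
                continue  # already fetched -- this is what makes the run resumable
            urllib.request.urlretrieve(url, dest)

if __name__ == "__main__":
    fetch_all(sys.argv[1], sys.argv[2])
```

The skip-if-exists check matters at 1M-image scale: you will not get through the whole set in one uninterrupted run.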
 

Ndiqi

Jr. VIP
Jr. VIP
Joined
Apr 12, 2009
Messages
726
Reaction score
331
How can I do this from Ubuntu VPS command line?

EDIT: And what if the images get updated over time? How can I automatically pull in new images with their descriptions as soon as something new appears?

That goes beyond the scope of what you initially wanted to do, so it requires more advanced steps.

You will need a cron job just to check whether content on the origin site has been updated, and you will need to set up a number of tools beyond what's needed for simply downloading the images with descriptions.

You can use either Windows or Linux.
 

uncutu

Elite Member
Joined
Aug 6, 2010
Messages
2,685
Reaction score
2,124
Since you want to keep an index of images that refreshes whenever the original host updates, you'll likely need to spend some time comparing and testing the ready-made solutions that exist in this space.
As for proxies, it varies: some (most) sites will rate-limit you after a certain number of requests.
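Whether or not proxies end up being necessary, the crawler should at minimum pause between requests and back off when it gets rate-limited. A sketch (delay values are assumptions to tune per site; if proxies are needed, urllib's `ProxyHandler` can be layered on top):

```python
# Sketch: polite fetching with a fixed delay between requests and
# exponential backoff when the server answers HTTP 429 (rate-limited).
import time
import urllib.error
import urllib.request

def backoff_delay(attempt, base=2.0, cap=60.0):
    """Exponential backoff: base * 2**attempt seconds, capped."""
    return min(base * (2 ** attempt), cap)

def polite_get(url, retries=5, pause=1.0):
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url) as resp:
                body = resp.read()
            time.sleep(pause)  # fixed delay between successful requests
            return body
        except urllib.error.HTTPError as err:
            if err.code == 429:  # rate-limited: wait longer, then retry
                time.sleep(backoff_delay(attempt))
                continue
            raise
    raise RuntimeError(f"gave up on {url} after {retries} attempts")
```

At 1M requests even a one-second pause means roughly 12 days of crawling, which is why people reach for proxy pools to parallelize.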
 