ASK ME ANYTHING about web automation | web scraping

SeasonedCode

the title says it all :)
 
any advice for puppeteer + dolphin anty browser automation?

I have no knowledge about Node.js but just started with chat gpt.
 
for Puppeteer, use the puppeteer-extra stealth plugin; it helps with the browser fingerprint stuff. also make sure to save profiles/sessions: people usually don't set a user data directory and keep creating a new session every time, which is risky
Anty does most of this stuff by default, which is why services like these exist, so just follow their docs and blog guides for your specific scraping area
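for the Anty side, the usual pattern is to start a profile through Anty's local API and attach Puppeteer to the DevTools endpoint it returns. a rough sketch below; the port and endpoint path are assumptions from memory of their docs, so verify them against the current Dolphin Anty documentation:

JavaScript:
const puppeteer = require('puppeteer-core');

// ASSUMPTION: Dolphin Anty exposes a local automation API on port 3001
// and responds with { automation: { port, wsEndpoint } } -- check their docs.
async function connectToAntyProfile(profileId) {
    const res = await fetch(`http://localhost:3001/v1.0/browser_profiles/${profileId}/start?automation=1`);
    const { automation } = await res.json();
    // Attach Puppeteer to the already-running Anty browser instead of launching one
    return puppeteer.connect({
        browserWSEndpoint: `ws://127.0.0.1:${automation.port}${automation.wsEndpoint}`
    });
}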
 
thanks! I was already using the extra stealth plugin, but do you know where the directory paths for each browser profile are located?

I want to specify the profile and use my exported cookies. The goal is to auto-post YouTube videos to my channel and then scale that.
 

sure, here is my browser function; you can define the directory in "userDataDir"


JavaScript:
const fs = require('fs');
const puppeteer = require('puppeteer-extra');
const pluginStealth = require('puppeteer-extra-plugin-stealth');

async function launchBrowser(req, proxy = null) {
    // Create the profile directory once so sessions persist between runs
    if (!fs.existsSync('./data')) fs.mkdirSync('./data');
    puppeteer.use(pluginStealth());
    return await puppeteer.launch({
        headless: true,
        ignoreHTTPSErrors: true,
        args: [
            "--window-size=1920,1080",
            "--window-position=000,000",
            '--ignore-certificate-errors',
            '--ignore-certificate-errors-spki-list',
            "--no-sandbox", "--incognito",
            "--disable-web-security",
            `--user-agent=${req.userAgent}`,
            `--proxy-server=${proxy}`
        ],
        // Persist the browser profile (cookies, localStorage) across runs
        userDataDir: "./data",
        slowMo: 15
    });
}

let browser = await launchBrowser(req)

and this is how I'm storing cookies for reuse


JavaScript:
const saveCookies = async (page, accId) => {
    const cookies = await page.cookies();
    fs.writeFileSync(`data/login_${accId}.json`, JSON.stringify(cookies));
    // Return the first two cookies as a name/value map (assumes at least two exist)
    return {
        [cookies[0].name]: cookies[0].value,
        [cookies[1].name]: cookies[1].value
    };
}

const updateCookies = async (reqCookies, accId) => {
    if (!fs.existsSync(`data/login_${accId}.json`)) return;
    const cookies = JSON.parse(fs.readFileSync(`data/login_${accId}.json`, 'utf8'));
    // Overwrite the stored values with the fresh ones from the request
    for (const cookie of cookies) cookie.value = reqCookies[cookie.name] ?? null;
    fs.writeFileSync(`data/login_${accId}.json`, JSON.stringify(cookies));
}

const useCookies = async (page, accId) => {
    if (!fs.existsSync(`data/login_${accId}.json`)) return;
    const cookies = JSON.parse(fs.readFileSync(`data/login_${accId}.json`, 'utf8'));
    await page.setCookie(...cookies);
}

modify as per your need :)
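putting it all together, a minimal sketch of the typical call order (`req` and the account id are whatever your own setup passes in; the URL is just a placeholder):

JavaScript:
// Launch with a persistent profile, restore cookies,
// do the work, then save cookies back for the next run.
const browser = await launchBrowser(req);
const page = await browser.newPage();
await useCookies(page, 'myAccount');
await page.goto('https://studio.youtube.com', { waitUntil: 'domcontentloaded' });
// ... upload logic here ...
await saveCookies(page, 'myAccount');
await browser.close();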
 
Do you really think that the extra stealth plugin is enough?
For me it stopped working on one website at some point (DataDome was blocking me), and I had to move to another library.

Do you think that running Puppeteer on the same device every time can cause, for example, our MAC address to be detected?
 
How can I increase the word count in my Python script for AI content automation using OpenAI? Currently, I am only able to generate around 700 to 800 words.
 
Do you really think that the extra stealth plugin is enough?
nothing is ever enough. add more extensions/plugins, top it up with a bunch of code, and it'll still break one day; you have to keep updating your code base as they update their detection methods. I'm just saying this should be enough for basic scraping stuff, but for tech giants like Instagram you will always have to improvise

use selenium with residential rotating proxies
@matek697 do you think this ^ is enough? probably not, but still it's something we all could get started with :)
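for anyone wanting to try that combo, a minimal sketch with the Node selenium-webdriver bindings; the gateway host is hypothetical, since residential providers usually give you one endpoint that rotates the exit IP behind the scenes:

JavaScript:
const { Builder } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

async function launchWithRotatingProxy() {
    const options = new chrome.Options();
    // Hypothetical rotating gateway from your provider. Note that Chrome's
    // --proxy-server flag ignores inline credentials, so authenticated
    // proxies need IP allowlisting or a separate auth mechanism.
    options.addArguments('--proxy-server=http://gateway.example-provider.com:7777');
    return new Builder().forBrowser('chrome').setChromeOptions(options).build();
}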
 
any advice for puppeteer + dolphin anty browser automation?

I have no knowledge about Node.js but just started with chat gpt.
I advise you to go with Selenium if you are just getting started. I know it is hard to learn, but if you learn it plus Scrapy,
believe me, nothing can stop you
 
Would like to know:
- How do you automate complicated tasks, like Cloudflare registration or adding a domain?
- Your best providers for captchas, SOCKS5, and proxies?
- How do you do the fake user agent thing?
- Your most complicated task in web scraping and automation, and how you approached it?
 
1. such tasks can be done, but tbh I never think about tasks like these :)
2. for captchas try the 2Captcha API; for proxies I use Oxylabs, as they provide a good user experience, but they are a little bit expensive
3. it is the easiest task in Selenium
here is an example using Java

Java:
public static void main(String[] args) {
    System.setProperty("webdriver.chrome.driver", "chromedriver.exe");
    ChromeOptions options = new ChromeOptions();
    options.addArguments("--user-agent=here you will put your fake user agent");
    WebDriver driver = new ChromeDriver(options);
}

and here is an example using Python

Python:
options = Options()
options.add_argument("--user-agent=here you will add your fake user agent")
driver = webdriver.Chrome(executable_path="chromedriver.exe", options=options)

4. The most complicated task is dynamically loaded pages, but you can handle those with Selenium with ease
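on point 4, the usual trick is an explicit wait for the element the page loads dynamically. a minimal sketch with the Node selenium-webdriver bindings (the URL and selector are placeholders):

JavaScript:
const { Builder, By, until } = require('selenium-webdriver');

async function scrapeDynamicPage() {
    const driver = await new Builder().forBrowser('chrome').build();
    try {
        await driver.get('https://example.com/dynamic-page');
        // Wait up to 10s for the dynamically loaded content to appear
        const el = await driver.wait(until.elementLocated(By.css('.results')), 10000);
        return await el.getText();
    } finally {
        await driver.quit();
    }
}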
 
How to make a web scraper that scrapes email addresses from Google search results?
For example, I search "mechanics in London" and can choose how many results I want the email addresses scraped from. If I choose 100, for example, then the email addresses from results 1-100 will be scraped and exported to a CSV file.
 
there is a smarter way than that
go manually and configure the search link that you want to give to the scraper: open the search settings in your Google account and turn on continuous (infinite) scroll
then get the link and put it in your scraper
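for the scraping part itself, a rough sketch of the flow in Puppeteer. Google aggressively rate-limits this, so in practice you would add proxies and delays; the result selector and the email regex are simplified assumptions, and Google changes its markup often:

JavaScript:
const fs = require('fs');
const puppeteer = require('puppeteer');

const EMAIL_RE = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

async function scrapeEmails(query, maxResults = 100) {
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.goto(`https://www.google.com/search?q=${encodeURIComponent(query)}&num=${maxResults}`);
    // Collect result links (selector is a guess against current Google markup)
    const links = await page.$$eval('a h3', hs => hs.map(h => h.closest('a').href));
    const rows = [];
    for (const url of links.slice(0, maxResults)) {
        try {
            await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 15000 });
            const emails = (await page.content()).match(EMAIL_RE) ?? [];
            for (const email of new Set(emails)) rows.push(`${url},${email}`);
        } catch (_) { /* skip pages that fail to load */ }
    }
    fs.writeFileSync('emails.csv', 'url,email\n' + rows.join('\n'));
    await browser.close();
}

scrapeEmails('mechanics in London', 100);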
 