Scraping pages and searching them with regex

billy67

Hey guys

I am trying to figure out how to scrape the URL links from a specific page, then visit each scraped link and search those pages for some regex patterns.

I tried to do this with headless Selenium in Python, but got stuck on a problem initiating the browser session.

Is there any way to do this kind of scraping and regex lookup?

Thanks for your time.
 
If the web pages aren't rendered with JavaScript, you can use direct HTTP requests instead of a third-party tool like Selenium, which is much faster.
You need to get the page source with a normal HTTP request, then search it with your regex.
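
A minimal sketch of that approach, assuming the pages are static HTML and using only the Python standard library. The helper names `fetch_source` and `find_matches` are made up for illustration, and the case-insensitive matching is just an assumption about what you want:

```python
import re
import urllib.request


def fetch_source(url, timeout=10):
    """Download a page with a plain HTTP request and return its source as text."""
    # Some sites reject urllib's default user agent, so send a browser-like one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def find_matches(source, pattern):
    """Return every substring of the page source that matches the regex."""
    return [m.group(0) for m in re.finditer(pattern, source, flags=re.IGNORECASE)]
```

Usage would be something like `find_matches(fetch_source(url), r"your\s+pattern")`, with your own URL and pattern. No browser session is involved, so it runs fine in the background.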
 
Thanks for the input.
I thought about scraping with JS selectors.
But I want to do it in the 'background', without needing to have it running in a browser.
 
https://www.pythonforbeginners.com/beautifulsoup/beautifulsoup-4-python

There's a package for parsing HTML. You don't need a browser to make HTTP requests in Python.
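
A rough sketch of the link-scraping step, assuming plain `<a href>` links in static HTML. To keep it dependency-free it uses the standard library's `html.parser` as a stand-in for BeautifulSoup; with BeautifulSoup installed, `soup.find_all("a", href=True)` gives you the same anchors. The names `LinkCollector`, `extract_links` and `search_pages` are hypothetical:

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/page" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link URLs found in the HTML, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def search_pages(pages, pattern):
    """pages maps URL -> page source; return URL -> list of regex matches."""
    regex = re.compile(pattern)
    hits = {}
    for url, source in pages.items():
        found = [m.group(0) for m in regex.finditer(source)]
        if found:
            hits[url] = found
    return hits
```

The idea is to run `extract_links` on the start page, download each returned URL with a normal HTTP request, and pass the downloaded sources to `search_pages`.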
 
Okay, thanks guys for the input.
I have decided to go with writing a browser extension that runs JS to scrape the URLs and look up entries by regex.
 
You also don't need a whole browser extension for this. Greasemonkey and similar existing extensions effectively let you run JavaScript within a browser, with the ability to access things you can't reach from the console.
 