Thinking of creating a Google scraper

2makemoney · Nov 29, 2019

I am thinking about building a Google scraper to learn programming. I am curious about what features people would want in a Google scraper.

1. Software (local install .exe) or web-based?
2. Do people want other search engines too, like Bing?

Deleted member 1333509 · Nov 29, 2019

GL, what kind of scraper?

You can make high quality scraping programs with Python/BeautifulSoup (bs4) and lxml. I'd advise that as a good way to go if you're just learning about web scraping.
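
To give a feel for the parsing step, here is a minimal sketch that pulls links out of a page. It uses Python's built-in html.parser instead of BeautifulSoup/lxml so it runs with no installs; the `LinkCollector` and `extract_links` names and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper name)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Made-up sample markup, not a real search result page.
sample = (
    '<p><a href="https://example.com/a">A</a> '
    '<a name="x">no href</a> '
    '<a href="https://example.com/b">B</a></p>'
)
print(extract_links(sample))
```

Real result pages are much messier than this, so expect to filter out navigation and tracking links afterwards.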

2makemoney · Nov 29, 2019

VSYNC said:
GL, what kind of scraper?

You can make high quality scraping programs with Python/BeautifulSoup (bs4) and lxml. I'd advise that as a good way to go if you're just learning about web scraping.

I would like to replicate ScrapeBox's result scraper without the need for proxies. Maybe take in a list of keywords and footprints and return a list of deduped links. Not sure if there is a use case for such a tool.
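
Roughly, the keyword/footprint part and the dedupe part could look like the sketch below. The function names and the normalization rules (lowercase host, drop fragment and trailing slash) are my own assumptions, not how ScrapeBox does it.

```python
from urllib.parse import urlsplit, urlunsplit


def build_queries(keywords, footprints):
    """Pair every footprint with every keyword to form search queries."""
    return [f"{fp} {kw}".strip() for fp in footprints for kw in keywords]


def normalize(url):
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def dedupe(urls):
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


# Made-up inputs for illustration.
queries = build_queries(["dog training"], ['"powered by wordpress"'])
print(queries)

scraped = [
    "https://Example.com/page/",
    "https://example.com/page",
    "https://example.com/page#top",
    "https://example.com/other",
]
print(dedupe(scraped))
```

Whether to dedupe by full URL or by domain only is a design choice; for footprint scraping a per-domain option is probably worth adding too.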

edit: added more details

MetDark · Nov 29, 2019

1. Web-based, to cater for different OSes and devices. Less complicated?

2. Don't know.
But I think people want great search engines.

A Google scraper is not a search engine?
Unless we are talking about DuckDuckGo "It emphasizes returning the best results, rather than the most results, generating those results from over 400 individual sources, including crowdsourced sites such as Wikipedia, and other search engines like Bing, Yahoo!, and Yandex."

2makemoney · Nov 29, 2019

MetDark said:
1. Web-based, to cater for different OSes and devices. Less complicated?

2. Don't know.
But I think people want great search engines.

A Google scraper is not a search engine?
Unless we are talking about DuckDuckGo "It emphasizes returning the best results, rather than the most results, generating those results from over 400 individual sources, including crowdsourced sites such as Wikipedia, and other search engines like Bing, Yahoo!, and Yandex."

Great point about catering to different OS. I really like that idea.

Never heard of DuckDuckGo. I will look that up. Thanks

theRevolt · Nov 29, 2019

2makemoney said:
I would like to replicate ScrapeBox's result scraper without the need for proxies. Maybe take in a list of keywords and footprints and return a list of deduped links. Not sure if there is a use case for such a tool.

edit: added more details

Sorry to burst your bubble, but Google with no proxies will get you blocked for anything more than the couple of lookups that you may as well do manually.

Find something more useful if you want someone to use it, or just go ahead if the main focus is just to learn by doing...

rafark · Nov 29, 2019

You'll need proxies. If you want to learn scraping, start with a site that doesn't require proxies.

2makemoney · Nov 29, 2019

theRevolt said:
Sorry to burst your bubble, but Google with no proxies will get you blocked for anything more than the couple of lookups that you may as well do manually.

Find something more useful if you want someone to use it, or just go ahead if the main focus is just to learn by doing...

rafark said:
You'll need proxies. If you want to learn scraping, start with a site that doesn't require proxies.

Interesting. Would it be useful to find ways of bypassing the proxy issue?

BassTrackerBoats · Nov 29, 2019

2makemoney said:
Interesting. Would it be useful to find ways of bypassing the proxy issue?

That is an age-old question, as we all (an absolute, I know, so kick me for saying that) use proxies when we scrape, unless we set things up to rest for silly lengths of time between scrapes, and that makes it almost useless.
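
For anyone wondering what "resting between scrapes" looks like in code, here is a minimal sketch. The `throttled` helper and the 30 to 90 second range are assumptions for illustration only; nobody outside Google knows the real thresholds.

```python
import random
import time


def throttled(items, min_delay, max_delay, sleep=time.sleep):
    """Yield items one at a time, pausing a random interval between them.

    `sleep` is injectable so the pause can be faked when experimenting.
    """
    for index, item in enumerate(items):
        if index > 0:
            sleep(random.uniform(min_delay, max_delay))
        yield item


# Demo with a fake sleep that only records the delays (no real waiting).
recorded = []
queries = ["query one", "query two", "query three"]
for query in throttled(queries, 30, 90, sleep=recorded.append):
    print("would fetch:", query)
print("pauses taken:", len(recorded))

# With a 60 second average pause, one IP manages about 60 lookups an hour.
average_delay = (30 + 90) / 2
print("lookups per hour:", 3600 / average_delay)
```

That throughput number is the whole problem: the delay that keeps a single IP safe also makes the tool too slow to be worth using.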

Novita Rizki · Nov 29, 2019

2makemoney said:
Interesting. Would it be useful to find ways of bypassing the proxy issue?

Crowdsourcing the IP list from users' mobile devices would be an idea (if possible), just like the way some free VPN providers collect IPs from their users.

2makemoney · Nov 29, 2019

Novita Rizki said:
Crowdsourcing the IP list from users' mobile devices would be an idea (if possible), just like the way some free VPN providers collect IPs from their users.

That is actually a good idea. Maybe abstract the proxies from the users.

t-machine · Nov 29, 2019

Even with proxies, you also have to deal with reCAPTCHA... Google is a serious pita for that, not just the search engine, even more niche stuff like Google Scholar or Google Flights. Literally everything is so messed up that if you want to get around the captcha, you need to simulate browser behavior, introduce action delays, accept cookie headers, etc, etc... I absolutely hate Google in that regard. Imo the effort is not worth it as a business model, but who am I to stop you.

itz_styx · Nov 29, 2019

As long as you use proxies it's fairly easy to scrape Google results. No need for browser emulation, you can use pure HTTP requests.

Deleted member 1333509 · Nov 30, 2019

+1 Selenium

turelink · Nov 30, 2019

#1 Rotating proxies (residential IPs are better than datacenter IPs)
#2 A headless browser, driven by something like Selenium or Puppeteer
#3 A good web scraper with a captcha solver, or develop a captcha solver of your own.
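
For point #1, the rotation logic itself is simple. A minimal round-robin sketch follows; the `ProxyRotator` class is hypothetical and the addresses come from a documentation-only IP range, so they are not real proxies.

```python
class ProxyRotator:
    """Hand out proxies in round-robin order and drop ones that get blocked."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._alive = list(proxies)
        self._index = 0

    def next(self):
        """Return the next proxy still considered usable."""
        if not self._alive:
            raise RuntimeError("all proxies have been marked bad")
        proxy = self._alive[self._index % len(self._alive)]
        self._index += 1
        return proxy

    def mark_bad(self, proxy):
        """Stop handing out a proxy, e.g. after it hits a captcha page."""
        if proxy in self._alive:
            self._alive.remove(proxy)


# Placeholder addresses from the 203.0.113.0/24 documentation range.
rotator = ProxyRotator(
    ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]
)
first_pass = [rotator.next() for _ in range(4)]
print(first_pass)

rotator.mark_bad("203.0.113.11:8080")
print([rotator.next() for _ in range(4)])
```

The hard part is not the rotation but the supply: this only helps if the pool is large enough that each IP sees few requests.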

FNTK · Dec 3, 2019

This seems promising.

masterofall · Dec 23, 2019

Why don't you just use the APIs Bing or Google have for this? You won't have much traffic at the start, so it should work just fine. And it's totally within terms of service.
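
On the Google side, the API I have in mind is the Custom Search JSON API. Below is a minimal sketch that only builds the request URL and reads links out of a response; `YOUR_API_KEY` and `YOUR_ENGINE_ID` are placeholders, the helper names are made up, and the sample payload is hand-written rather than a real response.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, start=1):
    """Build a request URL; `start` is the 1-based index of the first result."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def links_from_response(payload):
    """Extract result URLs from a decoded JSON response (a dict)."""
    return [item["link"] for item in payload.get("items", []) if "link" in item]


url = build_search_url("dog training", "YOUR_API_KEY", "YOUR_ENGINE_ID")
print(url)

# Hand-written stand-in for a decoded response, for illustration only.
fake_payload = {
    "items": [
        {"title": "Example A", "link": "https://example.com/a"},
        {"title": "Example B", "link": "https://example.com/b"},
    ]
}
print(links_from_response(fake_payload))
```

Fetching the URL and decoding the JSON is left out on purpose so the sketch runs without a key. Check the current quota and pricing before building anything around it.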

Mr MOrocco · Dec 24, 2019

You need a lot of proxies if you want to scrape Google! Maybe you can use one of those APIs instead of scraping Google directly.

FahadCip · Dec 24, 2019

- To scrape data from Google, you must use high-quality proxies.
- Without proxies, it may be possible by adding a captcha solving service.
- I think software will be easier to build than web-based.

wisewarden · Dec 26, 2019

I was looking to code a proxy IP generator script that rotates IPs on a local server for scraping.
