Simple Scraper or Bot Question - Pls Help!

Rich77ard
Registered Member · Joined: Mar 20, 2010 · Messages: 70 · Reaction score: 45
I'm trying to scrape or harvest .com.au websites that have their robots.txt file set to block everything (Disallow: /).

I know these sites still show up in the organic search results with "A description for this result is not available because of this site's robots.txt" where the description would normally be.

Does anyone know a simple Scrapebox search query that could harvest these site domains, or do I need to create a bot to do this?

I could probably run hundreds of URLs through Xenu or Screaming Frog and check for a blocking robots.txt that way, but that seems a bit backwards. I'm sure there's an easier way where I can just combine 'keyword' with 'the blocked robots.txt footprint' and harvest domains directly.

I'm hoping I don't have to dig up my Ubot Studio and start from scratch. Thanks in Advance.
 
Difficult. I would see if there are any search engines around that don't respect the robots.txt file and harvest from that search engine...?
 
Hmm, the problem with that is I won't know whether a site has a blocking robots.txt just by looking at the search results. With Google at least I can see that it's blocked in the results; the problem is I'd have to sort through hundreds of search results for a keyword before I found one saying "A description for this result is not available because of this site's robots.txt".
 
Well, you can try this footprint:
intext:"description for this result is not available because of this site's robots.txt"
 
It's very rough and not targeted enough. I get all sorts of mixed results, and 98% of the sites actually have a normal robots.txt file that isn't blocking the whole site.

I'm trying to use this strategy to get new SEO clients. When you call up a business whose whole site is unintentionally blocked by a robots.txt file, it's an easy way to get your foot in the door and provide some immediate assistance, which can easily lead to a monthly SEO contract.
 
Well, for that you probably need an automated bot that fetches each site's robots.txt and then scans it for its rules.
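
Something along these lines would do the checking part. This is only a rough Python sketch, assuming you already have a harvested list of domains saved one per line in domains.txt and the requests library installed (the file name and the blocks_everything helper are just placeholder names):

import requests

def blocks_everything(domain, timeout=10):
    # Fetch the site's robots.txt and return True only if it has a blanket
    # "Disallow: /" under "User-agent: *". Deliberately simplified parsing -
    # it ignores grouped user-agents, Allow lines, wildcards, etc.
    try:
        resp = requests.get(f"http://{domain}/robots.txt", timeout=timeout)
    except requests.RequestException:
        return False
    if resp.status_code != 200:
        return False
    applies_to_all = False
    for raw in resp.text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = [part.strip() for part in line.split(":", 1)]
        if field.lower() == "user-agent":
            applies_to_all = (value == "*")
        elif field.lower() == "disallow" and applies_to_all and value == "/":
            return True
    return False

# domains.txt = one harvested domain per line, e.g. somebusiness.com.au
with open("domains.txt") as f:
    for domain in (d.strip() for d in f if d.strip()):
        if blocks_everything(domain):
            print(domain)   # whole site blocked - candidate to call

Whatever it prints is a site with a blanket block, so that becomes your call list.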
 