Rich77ard
Registered Member
- Mar 20, 2010
I'm trying to scrape/harvest .com.au websites whose robots.txt is set to block everything (Disallow: /).
I know these sites still show up in organic search results with "A description for this result is not available because of this site's robots.txt" as the snippet.
Does anyone know a simple Scrapebox search query that could harvest these domains, or do I need to build a bot to do this?
I could probably run hundreds of URLs through Xenu or Screaming Frog and check for a blocked robots.txt that way, but that seems a bit backwards. I'm sure there's an easier way where I can just type 'keyword' plus 'the blocked-robots query' and harvest domains directly.
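To be clear about what I mean by checking robots.txt in bulk, here's a rough Python sketch of the "backwards" approach (the domains.txt input file is just my assumption, one domain per line; urllib.robotparser is in the standard library):

```python
import urllib.robotparser

def blocks_everything(domain: str) -> bool:
    """Return True if the domain's robots.txt disallows all crawling."""
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"http://{domain}/robots.txt")
    try:
        rp.read()  # fetch and parse robots.txt
    except Exception:
        return False  # couldn't fetch robots.txt; skip this domain
    # If a generic crawler ("*") can't fetch the homepage,
    # the site is effectively blocking all robots.
    return not rp.can_fetch("*", f"http://{domain}/")

with open("domains.txt") as f:
    for line in f:
        domain = line.strip()
        if domain and blocks_everything(domain):
            print(domain)
```

But that still means harvesting a URL list first and filtering it afterwards, which is the extra step I'm trying to avoid.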
I'm hoping I don't have to dig up my UBot Studio and start from scratch. Thanks in advance.