
Simple Scraper or Bot Question - Pls Help!

Discussion in 'Black Hat SEO' started by Rich77ard, Feb 2, 2015.

  1. Rich77ard

    Rich77ard Registered Member Premium Member

    Joined:
    Mar 20, 2010
    Messages:
    66
    Likes Received:
    43
    Gender:
    Male
    Occupation:
    Web Design - SEO - Entrepreneur
    Location:
    Australia
I'm trying to scrape or harvest .com.au websites that have their robots.txt file set to block everything (Disallow: /).

    I know these sites still show up in Google's organic results, with "A description for this result is not available because of this site's robots.txt" shown in place of a snippet.

    Does anyone know a simple Scrapebox search query that could harvest these site domains, or do I need to create a bot to do this?

    I could probably run hundreds of URLs through Xenu or Screaming Frog and check for a blocked robots.txt that way, but that seems a bit backwards. I'm sure there's an easier way where I can just type 'keyword' plus 'the blocked-robots query' and harvest domains that way.

    I'm hoping I don't have to dig up my Ubot Studio and start from scratch. Thanks in advance.
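
    For anyone wanting to script this check rather than feed URLs through Xenu or Screaming Frog one by one, here's a minimal sketch using only Python's standard library robots.txt parser. The function names and the example domain are placeholders, not anything from this thread:

    ```python
    # Minimal sketch of the robots.txt check, using only Python's stdlib.
    # The example domain below is a placeholder, not a real target.
    from urllib import robotparser

    def is_fully_blocked(robots_lines):
        """True if these robots.txt lines forbid a generic crawler site-wide."""
        rp = robotparser.RobotFileParser()
        rp.parse(robots_lines)
        return not rp.can_fetch("*", "/")

    def site_blocked(domain):
        """Fetch http://<domain>/robots.txt and test for a site-wide block."""
        rp = robotparser.RobotFileParser(f"http://{domain}/robots.txt")
        rp.read()  # a missing or empty robots.txt means "allow everything"
        return not rp.can_fetch("*", f"http://{domain}/")

    # e.g. site_blocked("example.com.au")
    ```

    Point a loop at a harvested domain list and keep only the ones where this returns True.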
     
  2. seeplusplus

    seeplusplus Power Member

    Joined:
    Aug 18, 2008
    Messages:
    517
    Likes Received:
    165
    Difficult. I would see if there are any search engines around that don't respect the robots.txt file and harvest that search engine instead...?
     
  3. Rich77ard

    Rich77ard Registered Member Premium Member

    Hmm, the problem with that is I won't know whether a site has a blocked robots.txt file just by looking at the search results. With Google at least I can see that it's blocked in the results; the problem is I'd have to sort through hundreds of search results for a keyword before finding one that says "A description for this result is not available because of this site's robots.txt".
     
  4. innocent_kid

    innocent_kid Power Member

    Joined:
    Feb 9, 2010
    Messages:
    505
    Likes Received:
    124
    Well, you can try this footprint:
    intext:"description for this result is not available because of this site's robots.txt"
     
  5. Rich77ard

    Rich77ard Registered Member Premium Member

    It's very rough and not targeted enough. I get all sorts of mixed results, and 98% of the sites actually have a normal robots.txt file that isn't blocking the whole site.

    I'm trying to use this strategy to get new SEO clients. When you call up a business whose whole site is unintentionally blocked by its robots.txt file, it's an easy way to get your foot in the door and provide some immediate assistance, which can easily lead to a monthly SEO contract.
     
  6. innocent_kid

    innocent_kid Power Member

    Well, for that you'd probably need an automated bot that fetches each site's robots.txt and then scans it for a site-wide Disallow.
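
    A rough sketch of what such a bot could look like, assuming you already have a domain list (e.g. harvested with Scrapebox). The parsing is deliberately simplified (it ignores Allow lines, for instance) and the function names are illustrative, not from any real tool:

    ```python
    # Sketch of the "automated bot" idea: fetch each domain's robots.txt
    # and flag sites that block everything with "Disallow: /" for all agents.
    import urllib.error
    import urllib.request

    def blocks_everything(robots_txt):
        """True if robots.txt has "Disallow: /" under "User-agent: *".

        Simplified on purpose: comments are stripped, Allow lines are ignored.
        """
        in_star_group = False
        for raw in robots_txt.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line or ":" not in line:
                continue
            field, _, value = line.partition(":")
            field, value = field.strip().lower(), value.strip()
            if field == "user-agent":
                in_star_group = (value == "*")
            elif field == "disallow" and in_star_group and value == "/":
                return True
        return False

    def scan(domains):
        """Return the domains whose robots.txt blocks the whole site."""
        blocked = []
        for domain in domains:
            try:
                with urllib.request.urlopen(
                    f"http://{domain}/robots.txt", timeout=10
                ) as resp:
                    text = resp.read().decode("utf-8", errors="replace")
            except (urllib.error.URLError, OSError):
                continue  # unreachable, or no robots.txt at all: skip it
            if blocks_everything(text):
                blocked.append(domain)
        return blocked
    ```

    Feed `scan()` your harvested list and you get back only the prospects worth calling.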