Quick Scrapebox Question about inurl:

Thesiege84

Regular Member
Joined
May 5, 2009
Messages
376
Reaction score
106
Sorry if this is posted somewhere, but I've just searched for 15 minutes and cannot for the life of me find the answer I'm looking for.

OK, so we know that with Scrapebox, if we want a list of websites with "dogs" in the URL, we use: inurl:dogs

Which might bring up:

www.dogsarefun.com

However, it's also bringing in results from web 2.0s, article directories, etc., which are useless to me:

www.articlespam.com/dogs-are-a-mans-best-friend.html

What is the Google search for scraping websites that only have the keyword in the domain name itself (what I'm loosely calling the TLD)?

So it would only return root domains with the keyword in them, instead of including inner-page results?

I.e.

www.*****dogs**.com
www.**dogs************.nl
www.***dogs****.co

etc etc

At the moment I have to scrape the list, then run it through Excel to grab what I want, and keep doing that. But I have a funny feeling that if I searched correctly I could get a lot more tailored results.
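For the Excel step described above, a small script can do the same filtering. This is only a minimal sketch, assuming the harvested list is available as plain strings; the function name `keyword_in_host` and the third sample URL are made up for illustration, and it checks the hostname only, so inner pages on unrelated domains are dropped.

```python
from urllib.parse import urlsplit

def keyword_in_host(url, keyword):
    """Return True if the keyword appears in the hostname, ignoring the path."""
    # urlsplit only fills in the hostname when a scheme or "//" is present,
    # so prepend "//" to bare entries such as "www.dogsarefun.com".
    if "://" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return keyword.lower() in host.lower()

# Sample harvested list; the last entry is a hypothetical URL for illustration.
harvested = [
    "www.dogsarefun.com",
    "www.articlespam.com/dogs-are-a-mans-best-friend.html",
    "http://www.bigdogsblog.nl/about",
]

matches = [u for u in harvested if keyword_in_host(u, "dogs")]
print(matches)
```

The article-directory URL is filtered out because "dogs" only appears in its path, not in its hostname.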

Hope someone can enlighten me quickly :)
 
I just had a quick try with
Code:
inurl:.*dogs*.com/
which brings up all the .coms. You could add other TLDs in the same way to make sure you only get the actual domains. You may as well add a "www" in front to exclude subdomains containing "dogs".
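If you want one footprint per TLD, something like the following could generate them for pasting into Scrapebox. It is just a sketch: `build_footprints` is a made-up helper name, the TLD list is an example, and whether Google actually honours the wildcards in each footprint is not guaranteed.

```python
def build_footprints(keyword, tlds, www_prefix=True):
    """Build one inurl: footprint per TLD, following the pattern shown above."""
    # With www_prefix the footprint starts with "www." to skip other subdomains;
    # without it, it falls back to the leading "." used in the original footprint.
    prefix = "www." if www_prefix else "."
    return [f"inurl:{prefix}*{keyword}*.{tld}/" for tld in tlds]

# Example TLDs taken from the question (.com, .nl, .co).
for footprint in build_footprints("dogs", ["com", "nl", "co"]):
    print(footprint)
```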
Hope this helps. :)
 
Thanks, that semi-worked. It's still bringing up some inner pages, but it's certainly better than the results I was getting.

I didn't know you could use wildcards in Google.

Also, how would you do it for 2 keywords?

inurl:.*dog*kennels*.com ??? (Also, is the . required after the : ?)
 
I was only looking for the same thing yesterday and came across this thread. I didn't have the patience to continue on testing with it but it seems it does work :)
 
If you only have two different keywords, I would just go for both possibilities, i.e.
Code:
inurl:*dog*kennels*.com
and
Code:
inurl:*kennels*dog*.com
as different footprints.

If you have a lot more keywords to check, a small Excel worksheet may be what you want.
There you should be able to generate all relevant combinations (i.e. every ordering of the keywords), for example using a randomize function.
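As an alternative to the worksheet, the orderings can be generated directly. A rough sketch, assuming the same `inurl:*a*b*.com` pattern as above (`ordered_footprints` is a made-up name); note that the number of footprints grows factorially with the number of keywords.

```python
from itertools import permutations

def ordered_footprints(keywords, tld="com"):
    """Return one inurl: footprint for every ordering of the keywords."""
    return [
        "inurl:*" + "*".join(order) + "*." + tld
        for order in permutations(keywords)
    ]

for footprint in ordered_footprints(["dog", "kennels"]):
    print(footprint)
# inurl:*dog*kennels*.com
# inurl:*kennels*dog*.com
```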

The . is not required. You should probably even leave it out, to be certain you also get domains without a subdomain, e.g. dogs.c0m.
 
Thanks, I took a look, and it looks like a very long-winded way of doing it, with zone files etc.

I just tried: inurl:*kennels*dog*.com

However, if you try it you will see the second result is: www.retailmenot.com › Pets › Dog Supplies › Dog Crates

I'm trying to isolate only the domains, and neither "dogs" nor "kennels" is in that URL. Maybe you could have another look for me?

I'd REALLY appreciate it!
 
Dear Thesiege84,

the following weird situation occurs for me, and I can now fully understand your problem:
If I search for
Code:
inurl:*dogs*kennels.com
I get very decent results:
dogs-kennels-g-test.gif
However, if I search for
Code:
inurl:*dogs*kennels*.com
with one additional *, I basically just get URLs for the keywords kennels and dogs. I really don't get why, but what works in all cases is using just one of the keywords, i.e.
Code:
inurl:*kennels*.com
kennels-g-test.gif

So there seems to be a problem with too many *. To solve this, I suggest you search for each keyword alone and sort out duplicates using Excel (or via Scrapebox's duplicate filter). I can understand that G puts a stop on too many wildcards etc. But what I absolutely do not understand is the following: if I put exactly the same footprint into Scrapebox and try to harvest (without proxies, to be comparable), this is what I get:
kennels-scrapeb0x-test.png

The result has absolutely nothing to do with my exact same search above. I think you are experiencing the same problem.
Maybe there is a problem with Scrapebox or my usage of it. Hopefully one of the more experienced Scrapebox users can give us a hint here?!
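For the "search each keyword alone and sort out duplicates" route, here is a minimal sketch of the merge step outside Excel. The helper names and the sample domains are made up for illustration; it treats "www.example" and "example" as the same host and keeps the first occurrence.

```python
from urllib.parse import urlsplit

def host_of(url):
    """Return the lowercase hostname without a leading "www."."""
    if "://" not in url:
        url = "//" + url  # let urlsplit recognise bare "www.site.com/page" entries
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def merge_unique_hosts(*url_lists):
    """Merge several harvested lists into one list of unique hosts, in first-seen order."""
    seen = set()
    merged = []
    for urls in url_lists:
        for url in urls:
            host = host_of(url)
            if host and host not in seen:
                seen.add(host)
                merged.append(host)
    return merged

# Hypothetical harvest results, one list per single-keyword footprint.
dogs_results = ["http://www.dogskennels.com/", "http://bestdogs.com/contact"]
kennels_results = ["http://dogskennels.com/prices", "http://www.cosykennels.com/"]

print(merge_unique_hosts(dogs_results, kennels_results))
# ['dogskennels.com', 'bestdogs.com', 'cosykennels.com']
```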
 
What if I wanted to use YouTube to do the same thing?
Hey BHMack, what do you mean by doing the same thing? Do you want to scrape YouTube video URLs?
Here you could just insert the YouTube footprint (see image, I cannot post links):
youtube-scrape.png
and you are ready to go. But be sure to disable Options -> Automatically Remove Duplicate Domains, since otherwise you will only ever get one link :)
 
Thanks for putting so much effort into helping me.

I've gone down the route of searching each single word instead of keyword combinations. After combining all the results and processing them in Excel, it actually doesn't look that bad. I'm going to use these for guest blogging, so someone will have to go through the list manually anyway.

Thanks for the help!

EDIT: Shame Google don't have intld:
 
Hey,

Has anyone noticed a change with this?

The same results don't come up anymore. I'm trying to use the same method to extract lists of domains with words ONLY in the domain name, but I get stuff like www.eurobreeder.com showing up, which doesn't have the word kennels in the domain... :(

Does anyone know of a solution?

inurl:*kennels*.com
 