
Quick Scrapebox Question about inurl:

Discussion in 'Black Hat SEO Tools' started by Thesiege84, Jan 24, 2013.

  1. Thesiege84

    Thesiege84 Regular Member

    Joined:
    May 5, 2009
    Messages:
    360
    Likes Received:
    97
    Occupation:
    Money Churner
    Location:
    Churning Money
    Sorry if this is posted somewhere, but I've just searched for 15 minutes and cannot for the life of me find the answer I'm looking for.

    OK, so we know that with Scrapebox, if we want a list of websites with "dogs" in the URL, we use: inurl:dogs

    Which might bring up:

    www.dogsarefun.com

    However, it's also bringing up results from web 2.0s, article directories, etc., which are useless to me:

    www.articlespam.com/dogs-are-a-mans-best-friend.html

    What is the Google search for scraping only websites that have the keyword in the domain name itself, so it would only return root domains containing the keyword instead of including inner-page results?

    I.e.

    www.*****dogs**.com
    www.**dogs************.nl
    www.***dogs****.co

    etc etc

    At the moment I have to scrape the list, then run it through Excel to grab what I want, and keep doing that, but I have a funny feeling that if I searched correctly I could get a lot more tailored results.
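    The Excel clean-up step described above could also be scripted. This is only a rough sketch, not a Scrapebox feature: it assumes the harvested URLs have been exported to a plain list, and the helper name host_contains and the hotdogs.nl entry are made up for illustration. It keeps a URL only when the keyword sits in the hostname rather than in the path.

```python
from urllib.parse import urlsplit

def host_contains(url, keyword):
    """Return True if the keyword appears in the hostname, ignoring the path."""
    # urlsplit only recognises the hostname when a scheme or '//' comes first,
    # so bare entries like 'www.example.com/page' get a '//' prefix.
    if not url.startswith(("http://", "https://", "//")):
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return keyword.lower() in host.lower()

# Sample harvest: the first two come from the post, the third is invented.
harvested = [
    "www.dogsarefun.com",
    "www.articlespam.com/dogs-are-a-mans-best-friend.html",
    "http://www.hotdogs.nl/contact",
]

keep = [u for u in harvested if host_contains(u, "dogs")]
print(keep)
```

    Note that this matches the whole hostname, so a subdomain such as dogs.example.com would also pass; telling the registered domain apart from subdomains would need a public-suffix list, which this sketch leaves out.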

    Hope someone can enlighten me quickly :)
     
  2. Stufferizer

    Stufferizer Regular Member

    Joined:
    Nov 6, 2012
    Messages:
    228
    Likes Received:
    68
    I just had a quick try with
    Code:
    inurl:.*dogs*.com/
    which brings up all the .coms. You could add other TLDs the same way to make sure you only get the actual domains. You could also add a "www" in front to exclude subdomains containing "dogs".
    Hope this helps. :)
     
    • Thanks x 1
  3. Thesiege84

    Thesiege84 Regular Member

    Joined:
    May 5, 2009
    Messages:
    360
    Likes Received:
    97
    Occupation:
    Money Churner
    Location:
    Churning Money
    Thanks, that semi-worked. It's still bringing up some inner pages, but it's certainly better than the results I was getting.

    I didn't know you could use wildcards in Google.

    Also, how would you do it for 2 keywords?

    inurl:.*dog*kennels*.com ??? (Also, is the . required after the : ?)
     
  4. Rua999

    Rua999 Power Member

    Joined:
    Jun 25, 2011
    Messages:
    626
    Likes Received:
    406
    I was only looking for the same thing yesterday and came across this thread. I didn't have the patience to continue on testing with it but it seems it does work :)
     
    • Thanks x 1
  5. Stufferizer

    Stufferizer Regular Member

    Joined:
    Nov 6, 2012
    Messages:
    228
    Likes Received:
    68
    If you only have two different keywords, I would just go for both possibilities, i.e.
    Code:
    inurl:*dog*kennels*.com
    and
    Code:
    inurl:*kennels*dog*.com
    as different footprints.

    If you have a lot more to check for, a little Excel worksheet may be what you want to use.
    There you should be able to generate all relevant combinations using a randomize function.
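    As an alternative to the worksheet, the combinations could be generated with a few lines of Python. This is just a sketch under my own assumptions: the function name footprints and the tlds parameter are invented here, and it uses itertools.permutations to list every keyword ordering instead of a randomize function.

```python
from itertools import permutations

def footprints(keywords, tlds=("com",)):
    """Build one inurl: footprint per keyword ordering and per TLD."""
    result = []
    for order in permutations(keywords):
        # Wildcards before, between and after the keywords, as in the thread.
        pattern = "*" + "*".join(order) + "*"
        for tld in tlds:
            result.append(f"inurl:{pattern}.{tld}")
    return result

for fp in footprints(["dog", "kennels"]):
    print(fp)
# inurl:*dog*kennels*.com
# inurl:*kennels*dog*.com
```

    The number of footprints grows quickly (n keywords give n! orderings, times the number of TLDs), so this is probably only practical for a handful of keywords.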

    The . is not required; you should probably even leave it out to be sure you also get domains without a subdomain, e.g. dogs.c0m.
     
    • Thanks x 2
  6. BlackHatMack

    BlackHatMack Newbie

    Joined:
    Nov 12, 2012
    Messages:
    3
    Likes Received:
    0
    Occupation:
    Wouldn't you like to now! lol
    Location:
    Earth
    What if I wanted to use YouTube to do the same thing?
     
  7. Thesiege84

    Thesiege84 Regular Member

    Joined:
    May 5, 2009
    Messages:
    360
    Likes Received:
    97
    Occupation:
    Money Churner
    Location:
    Churning Money
    Thanks, I took a look and it looks like a very long-winded way of doing it, with zone files etc.

    I just tried: inurl:*kennels*dog*.com

    However, if you try it you will see the second result is: www.retailmenot.com › Pets › Dog Supplies › Dog Crates

    I'm trying to isolate only the domains, and neither "dogs" nor "kennels" is in that one. Maybe you could have another look for me?

    I'd REALLY appreciate it!
     
  8. Stufferizer

    Stufferizer Regular Member

    Joined:
    Nov 6, 2012
    Messages:
    228
    Likes Received:
    68
    Dear Thesiege84,

    the following weird situation occurs for me and I can now fully understand your problem:
    If I am searching for
    Code:
    inurl:*dogs*kennels.com
    , I get very decent results:
    dogs-kennels-g-test.gif
    However, if I am searching for
    Code:
    inurl:*dogs*kennels*.com
    with one additional *, I basically get the URLs that rank for the keywords kennels and dogs. I really don't get why, but what works in all cases is using just one of the keywords, i.e.
    Code:
    inurl:*kennels*.com
    kennels-g-test.gif

    So there seems to be a problem with too many *. To solve this, I suggest you search for each keyword alone and sort out duplicates using Excel (or via Scrapebox's duplicate filter). I can understand that Google puts a stop to too many wildcards etc. But what I absolutely do not understand is the following: if I put exactly the same footprint into Scrapebox and try to harvest (without proxies, to be comparable), this is what I get:
    kennels-scrapeb0x-test.png

    The result has absolutely nothing to do with my exact same search above. I think you are experiencing the same problem.
    Maybe there is a problem with Scrapebox or with how I use it. Hopefully one of the more experienced Scrapebox users can give us a hint here?!
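    The suggestion above to search each keyword alone and sort out duplicates afterwards could be sketched roughly like this outside of Excel. The sample URLs and the names hostname and merge_unique_hosts are made up for illustration, and it assumes each single-keyword harvest has been exported as a plain list.

```python
from urllib.parse import urlsplit

def hostname(url):
    """Lower-cased hostname of a harvested URL, or '' if none can be parsed."""
    if not url.startswith(("http://", "https://", "//")):
        url = "//" + url
    return (urlsplit(url).hostname or "").lower()

def merge_unique_hosts(*harvests):
    """Combine several harvests, keeping each hostname once in first-seen order."""
    seen = {}
    for harvest in harvests:
        for url in harvest:
            host = hostname(url)
            if host:
                seen.setdefault(host, None)
    return list(seen)

# Hypothetical results of two single-keyword runs.
dogs_run = ["http://www.dogs-kennels.com/", "www.bestdogs.com/about"]
kennels_run = ["www.dogs-kennels.com/prices", "www.kennelsonline.com"]

merged = merge_unique_hosts(dogs_run, kennels_run)
# Optionally keep only hosts that contain both keywords.
both = [h for h in merged if "dogs" in h and "kennels" in h]
print(merged)
print(both)
```

    Inner pages collapse into their hostname here, so the merged list contains one entry per site regardless of how many pages each run returned.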
     
    • Thanks x 1
  9. Stufferizer

    Stufferizer Regular Member

    Joined:
    Nov 6, 2012
    Messages:
    228
    Likes Received:
    68
    Hey BHMack, what do you mean by doing the same thing? Do you want to scrape youtube vid urls?
    Here you could just insert the footprint of youtube (see image, cannot post links):
    youtube-scrape.png
    and you are ready to go. But be sure to disable Options -> Automatically Remove Duplicate Domains since you will otherwise always get only one link :)
     
  10. Thesiege84

    Thesiege84 Regular Member

    Joined:
    May 5, 2009
    Messages:
    360
    Likes Received:
    97
    Occupation:
    Money Churner
    Location:
    Churning Money
    Thanks for putting so much effort into helping me.

    I've gone down the route of searching each single word instead of the whole keyword phrase. After combining all the results and processing them in Excel, it actually doesn't look that bad. I'm going to use these for guest blogging, so someone will have to go through the list manually anyway.

    Thanks for the help!

    EDIT: Shame Google don't have intld:
     
    • Thanks x 1
    Last edited: Jan 25, 2013
  11. Thesiege84

    Thesiege84 Regular Member

    Joined:
    May 5, 2009
    Messages:
    360
    Likes Received:
    97
    Occupation:
    Money Churner
    Location:
    Churning Money
    Hey,

    Has anyone noticed a change with this?

    The same results don't come up anymore. I'm trying to use the same method to extract lists of domains with words ONLY in the domain name, but I get stuff like www.eurobreeder.com showing up, which doesn't have the word kennels in the domain... :(

    Does anyone know of a solution?

    inurl:*kennels*.com