Is it possible to know which site a bot/crawler is coming from when it follows a link?

Guybrushthepirate

Regular Member
Dec 23, 2015
280
167
Pretty much what I am asking in the title. I'd like to know if there is any way to tell which source a crawler is coming from when it reaches a URL that is linked from another site.

Example: in site1.com there is a link pointing to site2.com.

Is there any way for the owner of site2.com to know that the crawler is reaching their site by following the link on site1.com?

Of course the crawler will find and index site2.com from many different sources, but I'd like to know whether, every time a crawler visits a URL because of a link on another site pointing to that specific URL, there is a way to identify the origin/source the bot comes from.
 
If they are crawling with a full browser, then you might get lucky and have the Referer header set; otherwise no, if I understand your question correctly.
 
Entirely depends on the bot. I'm not sure what kind of bot you're thinking about but the answer is probably not.
 
We might be able to help more if you explain why you want to know this :)
 
Yes, Guybrushthepirate. In WordPress, under Tools > Redirection > Log, the log will give you the source URL and User-Agent (where the bot/crawler came from) and the IP address.
 

Attachments

  • Screen Shot 2019-11-20 at 1.32.44 PM.png
Thanks jamie3000. I'm afraid I can't express myself properly, as English is not my native language, but I'll try my best. Basically, I'd like to be able to set some sort of rule that says: if Googlebot is coming to my URL from that specific link, then I'd like to [ACTION TO BE DEFINED]. To do that, I need to know how (from which linking URL) the bot is finding my destination URL.

Thanks for taking the time to answer. I guess, as you say, that it is probably not possible. I am referring to Googlebot, anyway.
 
I just realized that I left out some probably important information. This type of verification should be done in real time, so that whenever Googlebot tries to visit a specific URL of my website, coming from a link placed on the www.example.com website, I can decide whether to let it crawl the URL or not.
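To make the rule described above concrete, here is a minimal Python sketch. It assumes the incoming request headers are available as a plain dict, and it only does anything useful if the crawler actually sends a Referer header, which is not guaranteed. The function names `came_from` and `should_block` are hypothetical, and the User-Agent substring check is only a rough way to spot Googlebot, since any client can claim that name.

```python
from urllib.parse import urlparse


def came_from(referer, source_host):
    """Return True if the Referer value points at source_host (or a subdomain)."""
    if not referer:
        return False  # no Referer sent: the origin is unknown
    host = (urlparse(referer).hostname or "").lower()
    source_host = source_host.lower()
    return host == source_host or host.endswith("." + source_host)


def should_block(headers, source_host="www.example.com"):
    """Hypothetical rule: block Googlebot only when it arrives via source_host."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    is_googlebot = "googlebot" in lowered.get("user-agent", "").lower()
    return is_googlebot and came_from(lowered.get("referer"), source_host)
```

If the request carries no Referer header, `should_block` returns False, because there is nothing to match against.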
 
Can you provide a little more information, please? How would you do that?

No, he can't. Googlebot does not send the Referer header.

If I remember rightly (it's been a while), you can technically see this by removing the page and looking at the crawl errors, since Google will tell you the pages linking to that page. It does mean breaking that page for a while, but I can't think of another way unless backlink spiders have found the link as well.

In general, most bots won't include the Referer header unless they're trying to pretend to be a real browser.
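One way to check this on your own site is to look at what the crawler actually sent. The sketch below assumes your web server writes access logs in the common "combined" format, where the Referer and User-Agent are the last two quoted fields. The regular expression and the function name `referers_for_agent` are illustrative assumptions, not part of any tool mentioned in this thread.

```python
import re
from collections import Counter

# Combined Log Format: ... "REQUEST" STATUS BYTES "REFERER" "USER-AGENT"
LINE_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referers_for_agent(lines, agent_substring="Googlebot"):
    """Count Referer values on requests whose User-Agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or agent_substring.lower() not in match["agent"].lower():
            continue  # unparseable line, or a different client
        # Servers log "-" when the request carried no Referer header.
        counts[match["referer"] or "-"] += 1
    return counts
```

You could feed it an open log file, for example `referers_for_agent(open("access.log"))`, where the file name is a placeholder. If every entry for the bot is counted under "-", the bot is not sending a Referer.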
 
Thanks SEOMadHatter, I'll do the test you suggest, but it makes sense that bots won't include the Referer header. I thought it would be difficult to do what I had in mind.
 
Unless you control the link on example.com, where adding a custom X-something header could identify what you need, this is a bad use case. As others mentioned, you cannot trust external headers.
 
That is not the case. example.com is a site that I don't control directly. I'm "simply" trying to figure out if there's a way to identify the cases in which Googlebot tries to access my URL by coming through a link on example.com.
 
Well, it's a fundamental part of how the internet works, but it's not something that would work out on a short-term basis.
 