
Log files: What kind of URL is /news.xml/16851/[category name].html?

Discussion in 'Black Hat SEO' started by Aygen, Sep 4, 2018.

    Aygen Newbie

    Sep 4, 2018
    Hi, while analyzing the log files of my website (Kibana + Excel), I noticed that the Google desktop bot most often crawls "monster URLs" that look like this: /news.xml/16851/[category name].html
    I can't explain how such URLs are created or why the bot requests them so often. The mobile bot, for example, never crawls these URLs.
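
    For anyone who wants to reproduce this kind of count outside Kibana, here is a minimal sketch. It assumes a combined-format access log and a rough desktop/mobile split based on the word "Mobile" in the user agent. The function names, the sample log lines, and the category name "sports" are made up for illustration and are not taken from the real logs.

```python
import re
from collections import Counter

# Paths like /news.xml/16851/sports.html (pattern inferred from the URL shape in the post).
MONSTER_PATH = re.compile(r"^/news\.xml/\d+/[^/]+\.html$")

# Request path and user agent in a combined-format log line (assumed log format).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def classify_bot(user_agent):
    """Rough split of Googlebot traffic by user-agent string (heuristic only)."""
    if "Googlebot" not in user_agent:
        return "other"
    return "googlebot-mobile" if "Mobile" in user_agent else "googlebot-desktop"

def count_monster_hits(lines):
    """Count requests to monster URLs, grouped by bot type."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and MONSTER_PATH.match(m.group("path")):
            counts[classify_bot(m.group("ua"))] += 1
    return counts

# Hypothetical sample lines, for illustration only.
SAMPLE_LINES = [
    '66.249.66.1 - - [04/Sep/2018:10:00:00 +0000] "GET /news.xml/16851/sports.html HTTP/1.1" '
    '404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.2 - - [04/Sep/2018:10:00:05 +0000] "GET /sports HTTP/1.1" '
    '200 9000 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 '
    '(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

print(count_monster_hits(SAMPLE_LINES))
```

    A user-agent string can be spoofed, so a count like this only shows what the log claims; it does not verify that the requests really came from Google.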

    I can only explain parts of such URLs:
    /news.xml -> The website has a news sitemap
    16851 -> A node ID (the site runs on Drupal 7); the node, a news article, was deindexed months ago and now returns 404
    [category name] -> The categories are still active
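
    The breakdown above can be written as a small parser, which makes it easy to group the logged URLs by node ID or category. This is only a sketch: the field names are my own labels, and "sports" is a hypothetical category name.

```python
import re

# One named group per URL part described above: sitemap file, node ID, category.
URL_PARTS = re.compile(
    r"^/(?P<sitemap>news\.xml)/(?P<node_id>\d+)/(?P<category>[^/]+)\.html$"
)

def split_monster_url(path):
    """Return the three parts of a monster URL, or None if the path has another shape."""
    m = URL_PARTS.match(path)
    if m is None:
        return None
    return {
        "sitemap": m.group("sitemap"),
        "node_id": int(m.group("node_id")),
        "category": m.group("category"),
    }

print(split_monster_url("/news.xml/16851/sports.html"))
```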

    As far as I can tell, the URLs are not linked anywhere internally, and they are not in the index either.

    Thanks in advance for any opinions.
    Last edited: Sep 4, 2018