
How to remove duplicate domains but treat subdomains as different domains?

Discussion in 'Black Hat SEO' started by firstnamelastname, Jun 21, 2017.

  1. firstnamelastname

    firstnamelastname Regular Member

    Joined:
    Jun 20, 2015
    Messages:
    203
    Likes Received:
    34
    I have a list of 1,000 URLs like:

    ebay.com/iue
    ebay.com/aieruf
    cnn.com/asdfh
    subs.ebay.com/sdfhsdf
    cnn.com/ahjdf

    I want to remove all duplicate domains. This is possible to do using Scrapebox. But the problem is, I want to keep all unique subdomains, meaning I want to treat subdomains as unique domains. For example, in the above list, out of
    ebay.com/iue
    ebay.com/aieruf
    subs.ebay.com/sdfhsdf

    one of the first 2 would remain and the other one would be removed. The third one would remain.

    How can I do this?
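    If you'd rather do this outside Scrapebox, here is a minimal Python sketch of the idea (my own illustration, not how Scrapebox works internally): take the hostname of each URL and keep only the first URL seen per hostname, so subdomains count as separate domains. It assumes entries may have no scheme, like the list above; `host_of` and `dedupe_by_host` are made-up helper names.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, tolerating a missing scheme."""
    # urlsplit only fills in the host when it is introduced by "//",
    # so scheme-less entries like "ebay.com/iue" get a "//" prefix first.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return (urlsplit(url).hostname or "").lower()


def dedupe_by_host(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each distinct hostname.

    Subdomains count as separate hosts, so ebay.com and subs.ebay.com
    each keep one URL (path included).
    """
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        host = host_of(url)
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


urls = [
    "ebay.com/iue",
    "ebay.com/aieruf",
    "cnn.com/asdfh",
    "subs.ebay.com/sdfhsdf",
    "cnn.com/ahjdf",
]
print(dedupe_by_host(urls))
# -> ['ebay.com/iue', 'cnn.com/asdfh', 'subs.ebay.com/sdfhsdf']
```

    Note it keeps the first occurrence per host rather than a random one, which is usually what you want when the list is already sorted by priority.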
     
  2. Nargil

    Nargil Jr. VIP

    Joined:
    May 10, 2012
    Messages:
    5,019
    Likes Received:
    3,192
    Location:
    Europe
    Subdomains are treated as unique domains by Scrapebox.

    Honestly, I read your post 3 times, but I have no idea what you want to achieve. Try to rephrase it and add an example that's easier to understand.
     
  3. firstnamelastname

    firstnamelastname Regular Member

    Joined:
    Jun 20, 2015
    Messages:
    203
    Likes Received:
    34

    Oh really? Sweet, I didn't know that. That solves my problem. Ugh... I have been sitting here removing duplicates by hand :(
    I think you must have understood my question since you answered it?

    I mean, I don't know how else to explain it:
    cnn.com/news
    cnn.com/henry
    editor.cnn.com/henry
    editor.cnn.com/mo
    fuck.cnn.com/lso

    When I remove duplicate domains, I want one of
    cnn.com/news
    cnn.com/henry

    to remain and the other one to be deleted. I want one of
    editor.cnn.com/henry
    editor.cnn.com/mo

    to remain and the other one to be deleted.

    And I want
    fuck.cnn.com/lso
    to remain.
     
  4. Nargil

    Nargil Jr. VIP

    Joined:
    May 10, 2012
    Messages:
    5,019
    Likes Received:
    3,192
    Location:
    Europe
    I see.

    And can't you just trim them to root and remove duplicate URLs?

    Do you really need that "/whatever", and not just the root of the subdomain?
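    For reference, a rough Python sketch of what "trim to root, then remove duplicate URLs" amounts to (made-up helper names, not Scrapebox's own code). It shows the trade-off being asked about: the "/whatever" part is thrown away, and only one root per host/subdomain is left.

```python
from urllib.parse import urlsplit


def trim_to_root(url: str) -> str:
    """Cut a URL back to its host, e.g. "cnn.com/news" -> "cnn.com".

    A scheme, if present, is kept ("http://cnn.com/x" -> "http://cnn.com/").
    Port, path, query and fragment are all dropped.
    """
    has_scheme = "://" in url
    parts = urlsplit(url if has_scheme else "//" + url.lstrip("/"))
    host = (parts.hostname or "").lower()
    return f"{parts.scheme}://{host}/" if has_scheme else host


def unique_roots(urls: list[str]) -> list[str]:
    """Trim every URL to root, then drop exact duplicates (first one wins)."""
    # dict.fromkeys keeps insertion order, so this is an order-preserving dedupe.
    return list(dict.fromkeys(trim_to_root(u) for u in urls))


urls = [
    "cnn.com/news",
    "cnn.com/henry",
    "editor.cnn.com/henry",
    "editor.cnn.com/mo",
    "fuck.cnn.com/lso",
]
print(unique_roots(urls))
# -> ['cnn.com', 'editor.cnn.com', 'fuck.cnn.com']
```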
     
  5. firstnamelastname

    firstnamelastname Regular Member

    Joined:
    Jun 20, 2015
    Messages:
    203
    Likes Received:
    34
    You know, I thought about that, but I would like to keep the /whatever just in case it's needed.
     
  6. Nargil

    Nargil Jr. VIP

    Joined:
    May 10, 2012
    Messages:
    5,019
    Likes Received:
    3,192
    Location:
    Europe
    I don't think you can force Scrapebox to randomly pick a URL and remove it, especially if it's not a duplicate. Either go with all the URLs or trim to root and go from there.
     
  7. loopline

    loopline Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,875
    Likes Received:
    2,058
    Gender:
    Male
    I also did not fully understand your first post, but based on your follow-up, Scrapebox already does everything you want when you click remove/filter >> remove duplicate domains.
     
  8. living2xl

    living2xl Jr. VIP

    Joined:
    Dec 9, 2011
    Messages:
    1,734
    Likes Received:
    415
    Occupation:
    Sippin dat juice - Shout it louder!
    Location:
    Not sleeping!
    But it doesn't let you keep only unique root domains;
    it always keeps the subdomains!
    I keep having to resort to another site to do this filtering.
     
  9. loopline

    loopline Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,875
    Likes Received:
    2,058
    Gender:
    Male
    Just click the "Trim to root" button to trim it to root before you remove duplicate domains.
     
  10. living2xl

    living2xl Jr. VIP

    Joined:
    Dec 9, 2011
    Messages:
    1,734
    Likes Received:
    415
    Occupation:
    Sippin dat juice - Shout it louder!
    Location:
    Not sleeping!
    That doesn't remove subdomains.

    The issue remains: Scrapebox is unable to filter a list of URLs down to just ROOT domains with no subdomains.

    Example:
    abc.blogspot.com/
    abc.com/aaff

    Trim to root:

    abc.blogspot.com
    abc.com

    Remove duplicate domains filter (no change to subdomains!!!):
    abc.blogspot.com
    abc.com

    DESIRED RESULT (not possible with SB):
    blogspot.com
    abc.com

    Not everyone wants the subdomains included!
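    The quickest hack to get that desired result is to keep only the last two labels of each hostname. A minimal Python sketch (helper names are made up): it gives exactly blogspot.com and abc.com for the example above, but it mangles two-part suffixes, e.g. news.bbc.co.uk becomes co.uk, which is why tools use a suffix list instead of counting dots.

```python
from urllib.parse import urlsplit


def naive_root_domain(url: str) -> str:
    """Keep only the last two labels of the host: abc.blogspot.com -> blogspot.com.

    Naive on purpose: it ignores multi-label suffixes such as co.uk.
    """
    if "://" not in url:
        url = "//" + url.lstrip("/")
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def unique_root_domains(urls: list[str]) -> list[str]:
    """Collapse each URL to its naive root domain, then dedupe (first one wins)."""
    return list(dict.fromkeys(naive_root_domain(u) for u in urls))


print(unique_root_domains(["abc.blogspot.com/", "abc.com/aaff"]))
# -> ['blogspot.com', 'abc.com']
print(naive_root_domain("news.bbc.co.uk/page"))
# -> co.uk  (wrong: the registrable domain is bbc.co.uk)
```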
     
  11. loopline

    loopline Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,875
    Likes Received:
    2,058
    Gender:
    Male

    The following will work, with the exception of blogspot:

    Just click remove/filter >> remove subdomains from domains

    then

    trim to root

    then remove duplicate domains

    Maybe you want to just read through all the menu items? Probably quicker than your exclamatory post. ;)


    ~~~~

    as for blogspot.com

    abc.blogspot.com

    is a domain. blogspot.com is treated like a top-level domain, because Blogspot registered it as a public suffix. At least as far as https://publicsuffix.org/ is concerned, which is what Scrapebox uses to remove subdomains from domains.
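    To see why abc.blogspot.com survives, here is a minimal Python sketch of a public-suffix style lookup: find the longest known suffix, then keep one more label in front of it. It also chains the three steps above (remove subdomains, trim to root, remove duplicate domains). SUFFIXES is a tiny hand-picked subset I made up for illustration, not the real list; the actual list has thousands of entries plus wildcard and exception rules that this sketch ignores, so in practice you would load the full file or use a library.

```python
from urllib.parse import urlsplit

# Hypothetical, tiny SUBSET of suffixes for illustration only. The real list
# from https://publicsuffix.org/ is far larger and also has wildcard (*.)
# and exception (!) rules that this sketch does not handle.
SUFFIXES = {"com", "uk", "co.uk", "blogspot.com"}


def host_of(url: str) -> str:
    """Lowercase hostname of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "//" + url.lstrip("/")
    return (urlsplit(url).hostname or "").lower()


def registrable_domain(host: str) -> str:
    """Return the longest matching suffix plus one more label.

    abc.blogspot.com -> abc.blogspot.com (blogspot.com is itself a suffix)
    news.bbc.co.uk   -> bbc.co.uk
    editor.cnn.com   -> cnn.com
    """
    labels = host.split(".")
    # Try the longest candidate suffix first; starting at index 1 guarantees
    # at least one label is left in front of the suffix.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            return ".".join(labels[i - 1:])
    return host  # no known suffix matched: leave the host unchanged


def dedupe_root_domains(urls: list[str]) -> list[str]:
    """Remove subdomains, trim to root, then remove duplicate domains."""
    return list(dict.fromkeys(registrable_domain(host_of(u)) for u in urls))


print(dedupe_root_domains([
    "abc.blogspot.com/",
    "abc.com/aaff",
    "editor.cnn.com/henry",
    "cnn.com/news",
    "news.bbc.co.uk/page",
]))
# -> ['abc.blogspot.com', 'abc.com', 'cnn.com', 'bbc.co.uk']
```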
     
  12. living2xl

    living2xl Jr. VIP

    Joined:
    Dec 9, 2011
    Messages:
    1,734
    Likes Received:
    415
    Occupation:
    Sippin dat juice - Shout it louder!
    Location:
    Not sleeping!
    I've been using SB for years; I think this was added with 2.+. Thanks for the help!
     
  13. loopline

    loopline Jr. VIP

    Joined:
    Jan 25, 2009
    Messages:
    3,875
    Likes Received:
    2,058
    Gender:
    Male
    Yes, it was added in 2015, I think, when V2 came out.