
Remove root urls from a big list

Discussion in 'Black Hat SEO' started by adward, Jan 19, 2012.

  1. adward

    adward Power Member

    Joined:
    Nov 22, 2007
    Messages:
    722
    Likes Received:
    221
    I don't think there is a way to remove root URLs using Scrapebox. Does anyone know which tool can help? Thanks for your input. :)
     
  2. philionaire

    philionaire Regular Member

    Joined:
    Mar 20, 2010
    Messages:
    212
    Likes Received:
    180
    Location:
    Vanland
    Why not paste the list into Notepad?

    Then use Find and Replace to replace the URL with nothing (leave the replacement blank).

    HTH
     
  3. adward

    adward Power Member

    Joined:
    Nov 22, 2007
    Messages:
    722
    Likes Received:
    221
    Maybe you don't get my point. Say you scrape a big list and don't want to waste time posting on root URLs. I don't think Notepad can help in this case.
     
  4. sfidirectory

    sfidirectory Senior Member

    Joined:
    Mar 29, 2010
    Messages:
    899
    Likes Received:
    483
    Occupation:
    Web developer/BTC enthusiast
    Location:
    php artisan make:migration
    If you know PHP, Java, or JavaScript, you could use a regular expression to output only the non-root URLs to the screen. If you can give me some examples of your problem, maybe I can code something up if I have time :).
     
    • Thanks x 1
  5. philionaire

    philionaire Regular Member

    Joined:
    Mar 20, 2010
    Messages:
    212
    Likes Received:
    180
    Location:
    Vanland
    Ah, sorry, not sure how to do that. I usually look at people's posts and how long they've been here before commenting, but this time I didn't. If I had, I doubt I would have replied, as I'm sure you know how to do more than most.

    Thought it was a simple problem with a simple solution > on to the programmers!
     
  6. Busrunner

    Busrunner Junior Member

    Joined:
    Nov 26, 2011
    Messages:
    130
    Likes Received:
    27
    I tried it in Excel without VBA, but it was a bit too tacky, because there could be subdomains and TLDs with longer extensions. But yeah, build a script for it. With some searching you can find a list of the most common extensions, like .com, .co.uk, .net, etc. Then the rest is easy.
     
  7. wannabie

    wannabie Elite Member

    Joined:
    Mar 11, 2009
    Messages:
    3,807
    Likes Received:
    2,954
    Occupation:
    Seo and Marketing Suprisingly
    Location:
    Your bedroom window
    Text to Columns in Excel?
     
  8. Busrunner

    Busrunner Junior Member

    Joined:
    Nov 26, 2011
    Messages:
    130
    Likes Received:
    27
    With what separator? A dot?
    What happens to:
    Google.com
    Www.google.co.uk
     
  9. SEOWhizz

    SEOWhizz Power Member

    Joined:
    Oct 22, 2011
    Messages:
    606
    Likes Received:
    432
    Location:
    Lat: 38N 43' 11.298" Long: 27W 12' 7.733"
    The Hi Speed Duplicate domain and Root url remover should help:
    Code:
    http://scrapeboxmarketplace.com/scrapebox-helper-tools
    :smokin:
     
    • Thanks x 2
  10. sfidirectory

    sfidirectory Senior Member

    Joined:
    Mar 29, 2010
    Messages:
    899
    Likes Received:
    483
    Occupation:
    Web developer/BTC enthusiast
    Location:
    php artisan make:migration
    I see SEOWhizz has provided a tool that could simply do this for you, so it might be worth checking out :). I'm not sure if I will do the script today, as I've got to do some IM work on one of my sites and then do a massive cleanup of my hard drives. I have shares from here that haven't been touched in over 3 months lol.

    Writing such a script wouldn't be too hard. All it needs to do is read lines from a file (.txt files would be simplest), check whether each line (in this case a URL) matches a regex written to catch root URLs, and then write all non-root URLs to another file or to the screen (if you are copying and pasting the URLs into a text field).
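
A rough Python sketch of the script described above, in case it helps. It assumes a "root URL" means scheme plus host with an optional trailing slash and nothing after it; the names `ROOT_URL`, `non_root_urls`, and `filter_file` are made up for illustration, and nobody in the thread posted this code.

```python
import re

# Assumption: a root URL is "http(s)://host" with an optional trailing slash
# and no path, query string, or fragment after it.
ROOT_URL = re.compile(r"^https?://[^/?#]+/?$", re.IGNORECASE)


def non_root_urls(lines):
    """Yield every non-empty line that is not a root URL."""
    for line in lines:
        url = line.strip()
        if url and not ROOT_URL.match(url):
            yield url


def filter_file(src_path, dst_path):
    """Read URLs from src_path (one per line) and write non-root URLs to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for url in non_root_urls(src):
            dst.write(url + "\n")
```

Because the pattern only looks at what follows the host, subdomains and long TLDs such as .co.uk need no special handling. Calling `filter_file("scraped.txt", "non_root.txt")` (both file names are placeholders) would write the filtered list to a second file.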
     
  11. adward

    adward Power Member

    Joined:
    Nov 22, 2007
    Messages:
    722
    Likes Received:
    221
    Thanks mate. This will be really helpful for me. Cheers.
     
  12. SEOWhizz

    SEOWhizz Power Member

    Joined:
    Oct 22, 2011
    Messages:
    606
    Likes Received:
    432
    Location:
    Lat: 38N 43' 11.298" Long: 27W 12' 7.733"
    Hey guys,

    I tried the link above and I didn't receive the tool, so I contacted Matt, aka loopline. He very kindly provided the following links:

    Scrapebox Classroom Domain Cleaner:
    Lets you keep only a certain number of URLs from a given domain, so you don't over-spam that domain. It also lets you remove root URLs from your list.

    Tutorial:
    http://www.youtube.com/watch?v=j6znY88iOqs

    Download:
    Code:
    http://www.scrapeboxclassroom.com/tools/scrapebox-classroom-domain-cleaner.zip
    Thanks Matt :)
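
For anyone who would rather script the "keep only a certain number of URLs per domain" part, here is a minimal Python sketch of the idea. It is not how the Domain Cleaner tool works internally (that is unknown here); `cap_per_domain` and `max_per_domain` are hypothetical names, and it groups by exact host name, so www.example.com and example.com are counted separately.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def cap_per_domain(urls, max_per_domain=2):
    """Keep at most max_per_domain URLs for each host, preserving input order."""
    seen = defaultdict(int)  # host -> number of URLs kept so far
    kept = []
    for raw in urls:
        url = raw.strip()
        host = urlsplit(url).netloc.lower()
        if not host:
            # Skip blank lines and anything without a scheme and host.
            continue
        if seen[host] < max_per_domain:
            seen[host] += 1
            kept.append(url)
    return kept
```

Grouping by registered domain instead of host would need a list of public suffixes (.com, .co.uk, and so on), which this sketch leaves out.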
     
    • Thanks x 1
  13. wannabie

    wannabie Elite Member

    Joined:
    Mar 11, 2009
    Messages:
    3,807
    Likes Received:
    2,954
    Occupation:
    Seo and Marketing Suprisingly
    Location:
    Your bedroom window
    Checking for the trailing slash normally works, considering SB spits out http:// and most sites structure pages with /. Then all you need to do is remove duplicates. A 5-minute job.
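
One possible reading of the slash idea, sketched in Python: once any trailing slash is stripped, a root URL contains only the two slashes from http://, so anything with more slashes has a path. This assumes every URL in the list starts with http:// or https://, and the function names are invented for illustration.

```python
def is_root_by_slashes(url):
    """Treat a URL as root if, without its trailing slash, it has exactly two slashes."""
    trimmed = url.strip().rstrip("/")
    return trimmed.count("/") == 2


def drop_roots_and_duplicates(urls):
    """Remove root URLs, then remove duplicates while keeping the original order."""
    kept = (u.strip() for u in urls if u.strip() and not is_root_by_slashes(u))
    # dict.fromkeys keeps the first occurrence of each URL in insertion order.
    return list(dict.fromkeys(kept))
```

A line without the http:// prefix (for example a bare "example.com") would not be recognised as root by this check, so the list should be normalised first.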