How to automate removal of parameters from scraped urls

Discussion in 'Link Building' started by igor1, Dec 17, 2010.

  1. igor1

    igor1 Newbie

    Joined:
    Jan 3, 2010
    Messages:
    35
    Likes Received:
    7
    I am scraping URLs with ScrapeBox and have hit a stumbling block on the road to automation...
    I have a bunch of URLs like:
    Code:
    http://site.com/forums/profile.php?mode=viewprofile&u=958
    http://site2.com/board/profile.php?mode=viewprofile&u=11319
    http://site3.com/board/profile.php?mode=viewprofile&u=2355&sid=33caba4492dc667bc5380629c6b34cde
    
    How do I remove the parameters after profile.php using Excel or some other tool?
    Thank you for your help in advance.
     
  2. givemelove

    givemelove Junior Member

    Joined:
    Feb 27, 2010
    Messages:
    107
    Likes Received:
    87
    You can use the regex replace feature in Notepad++.
     
  3. igor1

    igor1 Newbie

    Joined:
    Jan 3, 2010
    Messages:
    35
    Likes Received:
    7
    Can you help, please?
    I tried to find profile.php[^a-zA-Z0-9] and replace it with profile.php, but it only replaced one character, and I need to replace everything after profile.php on the line...
     
  4. igor1

    igor1 Newbie

    Joined:
    Jan 3, 2010
    Messages:
    35
    Likes Received:
    7
    Thanks for the guidance, I figured it out: profile.php?([a-zA-Z0-9+=+&+]+)
    BHW is a great place to learn!
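
For anyone who wants to run the same idea outside Notepad++, here is a minimal Python sketch. It assumes the `.` and `?` should be escaped so they match literally (in many regex flavors an unescaped `?` makes the preceding `p` optional instead). The character class is simplified, and the name `strip_profile_params` is made up for illustration.

```python
import re

# Literal "profile.php", then a literal "?", then a run of query-string characters.
# Group 1 captures "profile.php" so it can be kept in the replacement.
PROFILE_PARAMS = re.compile(r"(profile\.php)\?[a-zA-Z0-9=&]+")

def strip_profile_params(url: str) -> str:
    """Drop the query string that follows profile.php, if there is one."""
    return PROFILE_PARAMS.sub(r"\1", url)

urls = [
    "http://site.com/forums/profile.php?mode=viewprofile&u=958",
    "http://site3.com/board/profile.php?mode=viewprofile&u=2355&sid=33caba4492dc667bc5380629c6b34cde",
]
cleaned = [strip_profile_params(u) for u in urls]
```

The `+` after the character class is what makes the pattern consume the whole parameter string rather than a single character.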
     
  5. movieman32

    movieman32 Regular Member

    Joined:
    Aug 6, 2008
    Messages:
    371
    Likes Received:
    346
    There is an easier way in Excel.
    Use Find and Replace:

    Find: .php*
    Replace: .php

    Excel treats * as a wildcard, so this will remove everything after the .php
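
The same cut-at-`.php` behaviour can be sketched outside Excel as well. This is only an illustration in Python using `str.partition`; the helper name `cut_after_php` is hypothetical, and it assumes `.php` appears once per URL.

```python
def cut_after_php(url: str) -> str:
    """Keep everything up to and including the first ".php"; drop the rest."""
    # partition returns (before, separator, after); if ".php" is absent,
    # the separator is empty and the URL comes back unchanged.
    head, sep, _tail = url.partition(".php")
    return head + sep

example = cut_after_php("http://site2.com/board/profile.php?mode=viewprofile&u=11319")
```

Like the Excel wildcard, this does not care what follows `.php`, so it also removes anything that is not a query string.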
     
  6. madoctopus

    madoctopus Supreme Member

    Joined:
    Apr 4, 2010
    Messages:
    1,249
    Likes Received:
    3,498
    Occupation:
    Full time IM
    Regex: (\?.*)
    Replace with nothing.
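
To apply this pattern to a whole scraped list in one pass, a short Python sketch might look like the following. The function name `strip_query` and the inline sample list are assumptions for illustration; in practice the lines would come from the exported ScrapeBox file.

```python
import re

def strip_query(url: str) -> str:
    """Remove the first "?" and everything after it."""
    return re.sub(r"\?.*", "", url)

scraped = """\
http://site.com/forums/profile.php?mode=viewprofile&u=958
http://site2.com/board/profile.php?mode=viewprofile&u=11319
http://site3.com/board/profile.php?mode=viewprofile&u=2355&sid=33caba4492dc667bc5380629c6b34cde
"""

# One URL per line; skip blank lines.
cleaned = [strip_query(line.strip()) for line in scraped.splitlines() if line.strip()]
```

Because the pattern keys on the `?` rather than on `profile.php`, it works for any script name in the list.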