How to automate removal of parameters from scraped urls

Discussion in 'Link Building' started by igor1, Dec 17, 2010.

  1. igor1

    I am scraping URLs with ScrapeBox and have hit a stumbling block on the road to automation.
    I have a bunch of URLs like these:
    Code:
    http://site.com/forums/profile.php?mode=viewprofile&u=958
    http://site2.com/board/profile.php?mode=viewprofile&u=11319
    http://site3.com/board/profile.php?mode=viewprofile&u=2355&sid=33caba4492dc667bc5380629c6b34cde
    
    How do I remove the parameters after profile.php, using Excel or any other tool?
    Thank you for your help in advance.
     
  2. givemelove

    You can use the regex replace feature in Notepad++.
     
  3. igor1

    Can you help, please?
    I tried finding profile.php[^a-zA-Z0-9] and replacing it with profile.php, but that only replaced one character, and I need to replace all the characters that follow it on the line.
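A minimal Python sketch of the problem described above, assuming Python 3's standard `re` module as a stand-in for the Notepad++ regex engine (engines differ in details). A bracketed character class with no quantifier matches exactly one character, so only the `?` after profile.php gets replaced; appending `.*` lets the match run to the end of the line. The dot in `profile\.php` is escaped here so it matches a literal period.

```python
import re

url = "http://site.com/forums/profile.php?mode=viewprofile&u=958"

# A bracketed class with no quantifier matches exactly ONE character, so only
# the "?" right after profile.php is consumed and replaced.
one_char = re.sub(r"profile\.php[^a-zA-Z0-9]", "profile.php", url)

# ".*" matches every remaining character on the line, so the whole query
# string is consumed and replaced.
whole_tail = re.sub(r"profile\.php.*", "profile.php", url)

print(one_char)    # http://site.com/forums/profile.phpmode=viewprofile&u=958
print(whole_tail)  # http://site.com/forums/profile.php
```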
     
  4. igor1

    Thanks for the guidance, I figured it out: profile.php?([a-zA-Z0-9+=+&+]+)
    BHW is a great place to learn!
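The pattern above is worth checking against the regex engine you actually use. In a standard engine such as Python's `re`, `?` is a quantifier that makes the preceding `p` optional and `.` matches any character, so the pattern as written only ever matches `profile.php` itself and the replacement leaves the URL unchanged. The sketch below, assuming Python 3, compares it with a version that escapes both characters; `strip_profile_params` is a hypothetical helper name, not part of any tool mentioned in the thread.

```python
import re

# The pattern exactly as posted: unescaped "." and "?" are metacharacters.
AS_POSTED = re.compile(r"profile.php?([a-zA-Z0-9+=+&+]+)")

# Hypothetical escaped version: "\." and "\?" match a literal period and
# question mark, then the class consumes the name=value&name=value pairs.
ESCAPED = re.compile(r"profile\.php\?[a-zA-Z0-9=&]+")


def strip_profile_params(url: str) -> str:
    """Replace 'profile.php?<params>' with 'profile.php'."""
    return ESCAPED.sub("profile.php", url)


urls = [
    "http://site.com/forums/profile.php?mode=viewprofile&u=958",
    "http://site2.com/board/profile.php?mode=viewprofile&u=11319",
    "http://site3.com/board/profile.php?mode=viewprofile&u=2355&sid=33caba4492dc667bc5380629c6b34cde",
]

for url in urls:
    print(AS_POSTED.sub("profile.php", url))  # unchanged in Python's re
    print(strip_profile_params(url))          # ends at profile.php
```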
     
  5. movieman32

    There is an easier way in Excel: use Find and Replace.

    Find what: .php*
    Replace with: .php

    This removes everything after .php in each cell.
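For comparison, a minimal Python sketch of what that Find and Replace does, assuming Excel's `*` wildcard matches from the first `.php` in a cell through the end of the cell. Note that `?` is itself a single-character wildcard in Excel's Find and Replace (a literal question mark has to be written as `~?`), which is one reason searching on `.php*` is simpler than searching on the question mark. The function name `cut_after_php` and the third sample URL are hypothetical.

```python
def cut_after_php(cell: str) -> str:
    """Mimic Excel Find ".php*" / Replace ".php" on a single cell."""
    # partition splits at the FIRST ".php"; if it is absent, sep and tail are
    # empty strings and the cell comes back unchanged.
    head, sep, _tail = cell.partition(".php")
    return head + sep


cells = [
    "http://site.com/forums/profile.php?mode=viewprofile&u=958",
    "http://site3.com/board/profile.php?mode=viewprofile&u=2355&sid=33caba4492dc667bc5380629c6b34cde",
    "http://site4.com/members/42",  # hypothetical URL with no ".php"
]

for cell in cells:
    print(cut_after_php(cell))
```

Like the Excel version, this cuts at the first `.php` anywhere in the string, so a host name that contains `.php` would be truncated as well.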
     
  6. madoctopus

    Regex: (\?.*)
    Replace with: nothing (leave the replacement field empty)
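A minimal Python sketch of the same regex, assuming Python 3's standard library. `\?.*` matches the first question mark and everything after it on the line; the capturing parentheses in `(\?.*)` are not needed when the replacement is empty. As an alternative that parses the URL instead of pattern-matching it, `urllib.parse.urlsplit` and `urlunsplit` can rebuild the URL without its query string (this also drops any `#fragment`). Both function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Same effect as (\?.*) with an empty replacement; the group is dropped.
QUERY_TAIL = re.compile(r"\?.*")


def strip_query_regex(url: str) -> str:
    """Delete the first '?' and everything after it."""
    return QUERY_TAIL.sub("", url)


def strip_query_parsed(url: str) -> str:
    """Rebuild the URL from scheme, host and path, leaving out query and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


urls = [
    "http://site.com/forums/profile.php?mode=viewprofile&u=958",
    "http://site2.com/board/profile.php?mode=viewprofile&u=11319",
    "http://site3.com/board/profile.php?mode=viewprofile&u=2355&sid=33caba4492dc667bc5380629c6b34cde",
]

for url in urls:
    print(strip_query_regex(url), strip_query_parsed(url))
```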