Link scraping question

Discussion in 'Link Building' started by jackfitz, Jun 9, 2010.

  1. jackfitz

    jackfitz Newbie

    Joined:
    May 2, 2009
    Messages:
    27
    Likes Received:
    1
    I am scraping WordPress blogs. This generates a list of URLs that are not the top-level domain (e.g. http://www.someblog.com); they are individual posts (e.g. http://www.someblog.com/somepost.php). What is the best way to remove the /somepost.php part from each URL so I am left with just the domain, http://www.someblog.com?
    The reason is so that I can then run them through a PR checker.
     
  2. lincolnave

    lincolnave Regular Member

    Joined:
    Dec 5, 2008
    Messages:
    393
    Likes Received:
    184
    Occupation:
    Building Bots and Arduinos
    Location:
    Outside NYC
    I believe ScrapeBox does this, or something like it. Check it out.
     
  3. suzieq

    suzieq Junior Member

    Joined:
    Sep 2, 2009
    Messages:
    115
    Likes Received:
    16
    Occupation:
    Web promoter, director
    Location:
    UK - Paradise on earth! ;-)
    ScrapeBox will do it. I can do it for you if you want to PM me the list.
    Otherwise, you can probably manipulate the data fairly simply in a spreadsheet.
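    If a short script is easier than a spreadsheet, the same trimming can be sketched in Python. This is only a rough illustration, assuming Python 3 and a list of absolute URLs; the function names trim_to_root and unique_roots are made up for this example and have nothing to do with ScrapeBox itself.

```python
from urllib.parse import urlsplit


def trim_to_root(url):
    """Return scheme://host for a URL, dropping path, query, and fragment."""
    parts = urlsplit(url.strip())
    if not parts.scheme or not parts.netloc:
        return None  # not an absolute URL, so there is no root to keep
    return f"{parts.scheme}://{parts.netloc}"


def unique_roots(urls):
    """Trim every URL to its root and drop duplicates, keeping first-seen order."""
    seen = set()
    roots = []
    for url in urls:
        root = trim_to_root(url)
        if root and root not in seen:
            seen.add(root)
            roots.append(root)
    return roots


if __name__ == "__main__":
    # Hypothetical sample input, modeled on the URLs in the question.
    scraped = [
        "http://www.someblog.com/somepost.php",
        "http://www.someblog.com/another-post/",
        "http://www.otherblog.com/2010/06/hello-world/",
    ]
    for root in unique_roots(scraped):
        print(root)
```

    Removing duplicates at the same step keeps the list short before it goes into a PR checker, since many scraped posts usually share one domain.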
     
  4. jackzard

    jackzard Newbie

    Joined:
    Dec 23, 2009
    Messages:
    8
    Likes Received:
    1
    It may be better to use delete domain. ScrapeBox can do that.
     
  5. haverox

    haverox Regular Member

    Joined:
    Oct 6, 2009
    Messages:
    282
    Likes Received:
    151
    Occupation:
    Internet marketer/entrepreneur
    Location:
    The US Federal Reserve
    Just click on Trim to Root.