
Link scraping question

Discussion in 'Link Building' started by jackfitz, Jun 9, 2010.

  1. jackfitz

    jackfitz Newbie

    Joined:
    May 2, 2009
    Messages:
    27
    Likes Received:
    1
    I am scraping WordPress blogs. This generates a list of URLs that are not the top-level domain (e.g. http://www.someblog.com) but individual posts (e.g. http://www.someblog.com/somepost.php). What is the best way to remove the /somepost.php part from each URL so that I am left with just the domain, http://www.someblog.com?
    The reason is that I can then run them through a PR checker.
     
  2. lincolnave

    lincolnave Jr. VIP Premium Member

    Joined:
    Dec 5, 2008
    Messages:
    392
    Likes Received:
    184
    Occupation:
    Building Bots and Arduinos
    Location:
    Outside NYC
    I believe ScrapeBox does this, or something like it. Check it out.
     
  3. suzieq

    suzieq Junior Member

    Joined:
    Sep 2, 2009
    Messages:
    115
    Likes Received:
    16
    Occupation:
    Web promoter, director
    Location:
    UK - Paradise on earth! ;-)
    ScrapeBox will do it. I can do it for you if you want to PM me the list.
    Otherwise you can probably manipulate the data fairly simply in a spreadsheet.
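    If you would rather script it than use a spreadsheet, here is a rough Python sketch of the same idea. It only uses the standard library's urllib.parse; the function names trim_to_root and unique_roots are made up for this example and are not part of ScrapeBox or any other tool. It assumes every line in your list is a full URL including the http:// part.

```python
from urllib.parse import urlsplit


def trim_to_root(url):
    """Return scheme://host for a URL, dropping the path, query and fragment.

    Returns None when the input has no scheme or host (e.g. a blank line).
    """
    parts = urlsplit(url.strip())
    if not parts.scheme or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc}"


def unique_roots(urls):
    """Trim every URL to its root and drop duplicates, keeping first-seen order."""
    seen = set()
    roots = []
    for url in urls:
        root = trim_to_root(url)
        if root and root not in seen:
            seen.add(root)
            roots.append(root)
    return roots


if __name__ == "__main__":
    scraped = [
        "http://www.someblog.com/somepost.php",
        "http://www.someblog.com/anotherpost.php",
        "http://www.otherblog.com/2010/06/some-post/",
    ]
    for root in unique_roots(scraped):
        print(root)
```

    Dropping duplicates is optional, but it keeps you from checking the same domain several times when a scrape returns many posts from one blog.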
     
  4. jackzard

    jackzard Newbie

    Joined:
    Dec 23, 2009
    Messages:
    8
    Likes Received:
    1
    It may be better to use delete domain. ScrapeBox can do that.
     
  5. haverox

    haverox Regular Member

    Joined:
    Oct 6, 2009
    Messages:
    270
    Likes Received:
    144
    Occupation:
    Internet marketer/entrepreneur
    Location:
    The US Federal Reserve
    Just click on Trim to Root.