Any way to extract links from these kinds of pages?

Discussion in 'Black Hat SEO Tools' started by Knoxgates, Mar 27, 2012.

  1. Knoxgates

    Knoxgates Supreme Member

    Joined: Aug 9, 2008 | Messages: 1,266 | Likes Received: 919
    As the title says: is there any way to extract links from these kinds of pages? For example, I have links from EzineArticles like this one:
    Code:
    http://ezinearticles.com/?Find-Out-How-To-Recover-Deleted-Files&id=6951780
    When I try to extract links from this page with the ScrapeBox link extractor, I only get these links:

    Code:
    http://ezinearticles.com/
    http://ezinearticles.com/?id=6965016&The-Importance-Of-Having-A-Server-Backup=
    http://ezinearticles.com/?type=experts
    http://ezinearticles.com/about.html
    http://ezinearticles.com/advertise/
    http://ezinearticles.com/affiliates/
    http://ezinearticles.com/author-terms-of-service.html
    http://ezinearticles.com/benefits/
    http://ezinearticles.com/cartoons/
    http://ezinearticles.com/contact.html
    http://ezinearticles.com/editorial-guidelines/
    http://ezinearticles.com/endorsements/
    http://ezinearticles.com/faq/
    http://ezinearticles.com/premium/
    http://ezinearticles.com/privacy-policy.html
    http://ezinearticles.com/rss/
    http://ezinearticles.com/sitemap.html
    http://ezinearticles.com/submit/
    http://ezinearticles.com/subscribe/
    http://ezinearticles.com/terms-of-service.html
    http://ezinearticles.com/training/
    http://ezinearticles.com/videos/ 
    But there are other URLs in the following format, and these are the ones I need to extract:

    Code:
    <a href="/?Store-Sensitive-Information-Using-Windows-Server-Backup&id=6712577">Store Sensitive Information Using Windows Server Backup</a>
    <a href="/?What-to-Look-for-in-a-Server-Backup-Program&id=6169284">What to Look for in a Server Backup Program</a>
    <a href="/?Exchange-Server-Backup---A-Need-for-All-Businesses&id=6404457">Exchange Server Backup - A Need for All Businesses</a>
    <a href="/?Linux-Server-Backup---Unmatched-Server-Client-Backup&id=6368706">Linux Server Backup - Unmatched Server-Client Backup</a>
    <a href="/?Why-You-Need-a-Windows-Server-Backup&id=6617088">Why You Need a Windows Server Backup</a>
    <a href="/?Speed-Benefits-of-Image-Based-Server-Backups&id=2063515">Speed Benefits of Image Based Server Backups</a>
    <a href="/?Top-10-PC-and-Server-Backup-Features&id=5654062">Top 10 PC and Server Backup Features</a>
    <a href="/?Windows-Server-Backup-Top-Features&id=6408791">Windows Server Backup Top Features</a>
    <a href="/?How-An-Online-Server-Backup-Works?&id=4024999">How An Online Server Backup Works?</a>
    <a href="/?All-About-Server-Backup-Services&id=5572304">All About Server Backup Services</a>
    <a href="/?How-To-Prevent-Data-Loss-From-Occurring&id=6965139">How To Prevent Data Loss From Occurring</a>
    <a href="/?Todays-Password-Problems-and-Solutions&id=6956004">Today's Password Problems and Solutions</a>
    <a href="/?Effective-Backup-and-Recovery-Solution-for-Your-Business&id=6955804">Effective Backup and Recovery Solution for Your Business</a>
    <a href="/?All-About-An-Online-Backup&id=6962822">All About An Online Backup</a>
    <a href="/?The-Importance-of-Backup-Solutions&id=6950660">The Importance of Backup Solutions</a>
    <a href="/?Why-You-Need-Backup-Email-Service&id=6957007">Why You Need Backup Email Service</a>
    <a href="/?Understanding-the-Role-of-Differential-Backup&id=6956997">Understanding the Role of Differential Backup</a>
    <a href="/?How-To-Choose-A-Good-Online-Storage-Site&id=6949891">How To Choose A Good Online Storage Site</a>
    <a href="/?Find-Out-How-To-Recover-Deleted-Files&id=6951780">Find Out How To Recover Deleted Files</a>
    <a href="/?Cheap-Online-Backup-Solutions-for-Your-Computer&id=6948789">Cheap Online Backup Solutions for Your Computer</a>
    
    
    The http://ezinearticles.com part is missing from these links in the page source (they are relative links).
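    Because these hrefs are root-relative, resolving each one against the page URL restores the missing domain. A minimal Python 3 sketch using only the standard library (`html.parser` and `urllib.parse.urljoin`); the names `LinkCollector` and `extract_links` are hypothetical, and the sample HTML is two of the anchors above with `&` written as `&amp;`, as it typically appears in raw HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved to an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # HTMLParser has already decoded &amp; to &;
                # urljoin prefixes "/?..." with the page's scheme and host.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all <a href> targets in html as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = "http://ezinearticles.com/?Find-Out-How-To-Recover-Deleted-Files&id=6951780"
    sample = (
        '<a href="/?Store-Sensitive-Information-Using-Windows-Server-Backup&amp;id=6712577">'
        "Store Sensitive Information Using Windows Server Backup</a>"
        '<a href="/?All-About-An-Online-Backup&amp;id=6962822">All About An Online Backup</a>'
    )
    for link in extract_links(sample, page):
        print(link)
```

    On the sample this should print both links with http://ezinearticles.com/ in front. A real HTML parser is used instead of a regex so that anchors with extra attributes (class, title, and so on) are still picked up.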

    Is there any way to scrape these kinds of pages? I have thousands of them, and doing it manually is very time-consuming.

    Please help!
     
  2. kokoloko75

    kokoloko75 Elite Member

    Joined: Jan 1, 2011 | Messages: 1,628 | Likes Received: 1,943
    Occupation: Design director | Location: Paris (France)
    You can use a regular expression with a regex extractor, like this one:
    Code:
    http://codecanyon.net/item/regex-extractor-extract-everything-simply-/1327433
    I tested it; you should use this regular expression:
    Code:
    http://pastebin.com/b0GRy3j9
    You'll get this:

    [Screenshot]

    Export the results to a text file and open it in Notepad.
    Then use search-and-replace, like this:
    Code:
    Search : <a href="
    Replace : http://ezinearticles.com
    
    Search (without space) : & amp;
    Replace : &
    
    Search : ">
    Replace : (nothing)
    Finally, you'll get this:

    [Screenshot]

    Easy, right?
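    The same regex-then-replace steps can also be scripted, so no Notepad pass is needed. A minimal Python 3 sketch; the pattern is a hypothetical one written for the article links in the first post (it is not the pastebin regex), and it assumes each link is written exactly as `<a href="/?Some-Title&amp;id=123">` with no other attributes. `BASE` and `article_links` are illustrative names.

```python
import re

# Hypothetical pattern (not the pastebin one): captures the href of an
# anchor written exactly as <a href="/?Some-Title&amp;id=123456">.
ARTICLE_HREF = re.compile(r'<a href="(/\?[^"]*?&(?:amp;)?id=\d+)">')

BASE = "http://ezinearticles.com"


def article_links(html):
    """Apply the regex, then the same three replacements done in Notepad."""
    links = []
    for href in ARTICLE_HREF.findall(html):
        # '<a href="' -> domain: the captured group excludes the tag prefix,
        # so the domain is simply prepended.
        # '&amp;' -> '&' is done with str.replace.
        # '">' -> nothing: the group also excludes the closing quote.
        url = BASE + href.replace("&amp;", "&")
        if url not in links:  # drop duplicates, keep page order
            links.append(url)
    return links
```

    For example, calling `article_links(open("page.html", encoding="utf-8").read())` on a saved copy of the page should return the full URLs, one per article link.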

    Beny
     
  3. bulldawg88

    bulldawg88 Junior Member

    Joined: Jan 13, 2012 | Messages: 166 | Likes Received: 106
    Location: San Diego, CA
    Code:
    http://www.webmaster-toolkit.com/link-extractor.shtml
    Bulldawg
     
  4. Knoxgates

    Knoxgates Supreme Member

    Joined: Aug 9, 2008 | Messages: 1,266 | Likes Received: 919
    @kokoloko75: Yes, this should work. Rep added.

    But I have thousands of URLs, so extracting links one by one is too time-consuming.
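    For thousands of URLs, a small script can fetch the pages in parallel and pull out the article links in one pass. A minimal Python 3 sketch using only the standard library; it assumes the input is a text file with one URL per line, and that article links are the ones whose query contains `id=` followed by digits (true for the EzineArticles URLs in this thread, but an assumption in general). The script name `extract_links.py` and all function names are hypothetical.

```python
import http.client
import re
import sys
from concurrent.futures import ThreadPoolExecutor
from html import unescape
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Every double-quoted href value on the page (simple regex, not a full parser).
HREF = re.compile(r'href="([^"]+)"')
# Assumption: article URLs carry "id=<digits>" in the query string.
ARTICLE = re.compile(r"[?&]id=\d+")


def fetch(url, timeout=15):
    """Download one page as text; returns "" if the request fails."""
    try:
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        return ""


def article_links(page_url, html):
    """Absolute article URLs in html, resolved against page_url."""
    found = []
    for raw in HREF.findall(html):
        # unescape turns &amp; into &; urljoin adds the missing domain.
        url = urljoin(page_url, unescape(raw))
        if ARTICLE.search(url):
            found.append(url)
    return found


def crawl(urls, get=fetch, workers=10):
    """Fetch all pages in parallel; return links in first-seen order."""
    seen, result = set(), []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as urls.
        for page_url, html in zip(urls, pool.map(get, urls)):
            for link in article_links(page_url, html):
                if link not in seen:
                    seen.add(link)
                    result.append(link)
    return result


def main(argv):
    if not argv:
        print("usage: python extract_links.py urls.txt > links.txt")
        return
    with open(argv[0], encoding="utf-8") as f:
        pages = [line.strip() for line in f if line.strip()]
    for link in crawl(pages):
        print(link)


if __name__ == "__main__":
    main(sys.argv[1:])
```

    Run it as `python extract_links.py urls.txt > links.txt`. The `workers=10` default is arbitrary; a lower value is gentler on the site and less likely to get the IP blocked.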
     
  5. An71qu3

    An71qu3 Junior Member

    Joined: Apr 26, 2009 | Messages: 190 | Likes Received: 167
    I can make you a custom bot for that. Add me on Skype: an71qu3