
Filter links

Discussion in 'Black Hat SEO' started by rostonix, May 20, 2012.

  1. rostonix

    rostonix Senior Member

    Joined:
    Dec 20, 2009
    Messages:
    897
    Likes Received:
    1,446
    Occupation:
    Developer
    Location:
    Russia
    Home Page:
    Well, let's say I have a scraped list of 200,000 backlinks.

    I need to filter out web 2.0 sites, common profiles and free hosts.
    In the end I want a list with this kind of info:

    200 sites have *wordpress.com* in URL
    100 sites have *freehost100.com* in URL
    30 sites have *flickr.com* in URL

    Or anything similar.
    Is there any tool for this?
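    A dedicated tool isn't strictly needed for the counting part; a short script can tally domains across the list. A minimal Python sketch (the sample URLs are hypothetical stand-ins for the scraped list, and the "last two host labels" heuristic is an assumption; multi-part suffixes like .com.ua need the Public Suffix List to be handled correctly):

    ```python
    from collections import Counter
    from urllib.parse import urlparse

    # Hypothetical sample; in practice read the 200k scraped URLs from a file.
    urls = [
        "http://samantha.wordpress.com/2010/hello-here",
        "http://sub2.wordpress.com/2011/la-la-la",
        "http://www.flickr.com/photos/someone/123",
    ]

    # Rough heuristic: treat the last two host labels as the domain.
    # Suffixes such as .com.ua or .co.uk need the Public Suffix List.
    counts = Counter(
        ".".join((urlparse(u).hostname or "").split(".")[-2:]) for u in urls
    )

    for domain, n in counts.most_common():
        print(f"{n} sites have *{domain}* in URL")
    ```

    Sorting by count then gives exactly the "200 sites have *wordpress.com* in URL" style of summary described above.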
     
  2. xpleet

    xpleet Regular Member

    Joined:
    Jan 18, 2010
    Messages:
    377
    Likes Received:
    327
    Location:
    Morocco
    Use Notepad++.
     
  3. rostonix

    rostonix Senior Member

    I bet you didn't understand what I meant :)
    English is not my first language :)

    The main purpose of the task is not to count the number of wordpress links (for example), but to discover that a pretty big share of the URLs belong to some freehost blablabla.com I've never heard of :cool:

    Not filtering. Research.
     
  4. xpleet

    xpleet Regular Member

    I'm still not sure what you mean by "filter links".
    Anyway, if you want to find out which platform each URL runs on, you can try the "Blog Analyser" plugin in SB to detect the blog platform, but unfortunately that plugin only supports blog platforms.
     
  5. Dang3r81

    Dang3r81 Jr. VIP Jr. VIP Premium Member

    Joined:
    Jan 18, 2011
    Messages:
    301
    Likes Received:
    235
    Location:
    Germany
    Home Page:
    @webmadter

    If I understand you right, you want to feed in your 200k list and have the tool extract all the different domain names.

    So each group of domains would be saved to its own file:

    one file for wordpress in the URL
    one file for tumblr in the URL
    one file for fiverr in the URL
    ....

    If that's what you need, I can code you a small tool for free.

    But I have to head out with the family now; we'll be back in 2 hours.

    If you have no solution by then, I'll code it ASAP and send it to you.

    Regards,
    Manuel
     
    • Thanks x 1
  6. rostonix

    rostonix Senior Member

    @Dang3r81

    I really haven't found a solution yet.
    I'm playing with regex for now, but still no luck.

    Let's say we have a list of 6 URLs:

    Code:
    http://www.thebeautifulgirls.com/board/profile.php?id=178690
    http://thebest-foryou.com.ua/user/FernandoRC/
    http://thebestresourcesite.com/forum/profile.php?mode=viewprofile&u=256661
    http://sub1.zwp2zietek.pl/memberlist.php?mode=viewprofile&u=214
    http://samantha.wordpress.com/2010/hello-here
    http://sub2.wordpress.com/2011/la-la-la
    As output I'd like to see something like:

    Code:
    http://www.thebeautifulgirls.com/board/profile.php?id=178690   thebeautifulgirls.com
    http://thebest-foryou.com.ua/user/FernandoRC/   thebest-foryou.com.ua
    http://thebestresourcesite.com/forum/profile.php?mode=viewprofile&u=256661   thebestresourcesite.com
    http://sub1.zwp2zietek.pl/memberlist.php?mode=viewprofile&u=214   zwp2zietek.pl
    http://samantha.wordpress.com/2010/hello-here   wordpress.com
    http://sub2.wordpress.com/2011/la-la-la   wordpress.com
    I mean one txt file as the result:
    URL - [tab or a couple of spaces] - domain
    URL2 - [tab or a couple of spaces] - domain
    It will be easy to pull this data into Excel and identify the most common domains, freehosts with web 2.0 properties, and backlink opportunities.

    The main problem for me right now is finding the right regex to extract the domains :croc:
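    The URL-to-domain mapping doesn't actually need a single clever regex; Python's urlparse plus a small suffix table covers it. A sketch assuming a hand-picked set of two-part suffixes to handle cases like thebest-foryou.com.ua (a complete solution would load the full Public Suffix List instead):

    ```python
    from urllib.parse import urlparse

    # Tiny illustrative suffix table; real code should use the full
    # Public Suffix List to cover every country-code combination.
    TWO_PART_SUFFIXES = {"com.ua", "co.uk", "com.au"}

    def registrable_domain(url):
        """Return the host with subdomains stripped, e.g. wordpress.com."""
        host = urlparse(url).hostname or ""
        parts = host.split(".")
        if len(parts) >= 3 and ".".join(parts[-2:]) in TWO_PART_SUFFIXES:
            return ".".join(parts[-3:])
        return ".".join(parts[-2:]) if len(parts) >= 2 else host

    urls = [
        "http://www.thebeautifulgirls.com/board/profile.php?id=178690",
        "http://thebest-foryou.com.ua/user/FernandoRC/",
        "http://sub1.zwp2zietek.pl/memberlist.php?mode=viewprofile&u=214",
        "http://samantha.wordpress.com/2010/hello-here",
    ]

    # One tab-separated "URL<TAB>domain" line per input, ready for Excel.
    for u in urls:
        print(f"{u}\t{registrable_domain(u)}")
    ```

    This reproduces the desired output format above, including collapsing sub1/sub2-style subdomains onto their parent domain.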
     
    Last edited: May 20, 2012
  7. Dang3r81

    Dang3r81 Jr. VIP Jr. VIP Premium Member

    So, I'm back now.

    I'll start coding it right away; your example was good enough for me :)
     
    • Thanks x 1
  8. rostonix

    rostonix Senior Member

    I really appreciate your help so much! :flypig:
     
  9. Dang3r81

    Dang3r81 Jr. VIP Jr. VIP Premium Member

    Hi webmadter,

    I've PM'ed you the ListFilter tool.

    It's easy to use: paste the URLs into the left box and press the button; the output appears in the right box.

    It's nothing special, but it works. I tested it with a 160k list; it takes about 1-2 mins.

    Regards,
    Manuel
     
    • Thanks x 1