How do I clean my harvested links?

Discussion in 'BlackHat Lounge' started by peter73, Nov 6, 2010.

  1. peter73

    peter73 Regular Member

    Joined:
    Jun 27, 2010
    Messages:
    341
    Likes Received:
    90
    Location:
    X marks the spot----------------------------X
  2. ghprod

    ghprod Regular Member

    Joined:
    Mar 18, 2009
    Messages:
    230
    Likes Received:
    40
    Home Page:
    Maybe use the trim-list function built into SB to get the root domain, then append the register path string to each one :)
     
  3. peter73

    peter73 Regular Member

    Joined:
    Jun 27, 2010
    Messages:
    341
    Likes Received:
    90
    Location:
    X marks the spot----------------------------X
    I tried that, but SB trims the URL all the way down to the root domain, and some forums live in a subdirectory, e.g. http://vbulletinf0rum.net/forum/

    ScrapeBox will also remove /forum/, so if I append register.php? it'll just come out as an invalid link.

    Crap. I'm going crazy with this :(
     
  4. relaxin

    relaxin Junior Member

    Joined:
    Aug 13, 2007
    Messages:
    100
    Likes Received:
    25
    Occupation:
    CEO
    Do you know how to run PHP scripts on a server or local machine? If so, I can code a small script for you to clean your list.
     
  5. peter73

    peter73 Regular Member

    Joined:
    Jun 27, 2010
    Messages:
    341
    Likes Received:
    90
    Location:
    X marks the spot----------------------------X
    Check your pm mate :)

     
  6. peter73

    peter73 Regular Member

    Joined:
    Jun 27, 2010
    Messages:
    341
    Likes Received:
    90
    Location:
    X marks the spot----------------------------X
    Anyone?
     
  7. mrmidjam

    mrmidjam Regular Member

    Joined:
    Sep 17, 2008
    Messages:
    438
    Likes Received:
    134
    1. trim to root in SB
    2. add list to spreadsheet column A (I use openoffice calc).
    3. add /forum/register.php? to column B.
    4. fill column B down the list.
    5. copy all filled fields.
    6. paste into text editor.
    7. remove spaces using find and replace.

    DONE
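
For anyone who would rather script the steps above than use a spreadsheet, here is a minimal Python sketch of "trim to root, then append a path". The function name `root_plus_path` and the sample URLs are made up for illustration, and the `http://` prefix added to bare entries is an assumption. It only produces valid links for forums that really do live under /forum/.

```python
from urllib.parse import urlsplit

def root_plus_path(url, path="/forum/register.php"):
    """Trim a URL to scheme + host, then append a fixed path."""
    # urlsplit only recognises the host when a scheme is present,
    # so add one for bare entries such as "domain.com/forum/...".
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{path}"

# Hypothetical sample list, one URL per entry.
urls = [
    "http://domain.com/forum/member.php?u=45743",
    "forum.domain.com/member.php?u=45743",
]
rebuilt = [root_plus_path(u) for u in urls]
```

The second entry comes out as http://forum.domain.com/forum/register.php, which shows the limit of a fixed path: every URL gets the same suffix regardless of where the forum is actually installed.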
     
  8. peter73

    peter73 Regular Member

    Joined:
    Jun 27, 2010
    Messages:
    341
    Likes Received:
    90
    Location:
    X marks the spot----------------------------X
    My problem is that not all of the links have the format:
    domain.com/forum/register
    Some have...
    forum.domain.com/register
    domain.com/vbulletin/register
    etc...

    So if I trim to the root and just add "register", a large percentage will come out as invalid links.

    What I was hoping for was a way to remove the part of each link that comes after a desired word, like:

    domain.com/forum/member.php?u=45743
    forum.domain.com/member.php?u=45743

    A way to specify which part of the URL to remove; in this case, any word/number/character that appears after member.php? would be removed.

    So after I filter the links down to only:
    domain.com/forum/member.php?
    forum.domain.com/member.php?

    I could easily do a "search" for member.php? and "replace" it with register.php
    That would make my life sweeter, but I don't think it is possible :(
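
What is described here is a regular-expression replace: match the wanted word plus everything after it, and substitute the new file name. A minimal Python sketch, assuming one URL per list entry; the names `MEMBER_TAIL` and `to_register` are hypothetical, and the sample links are the two from the post.

```python
import re

# Match "member.php" plus the rest of the line (the "?u=..." part).
MEMBER_TAIL = re.compile(r"member\.php.*$")

def to_register(url):
    """Replace 'member.php' and everything after it with 'register.php'."""
    return MEMBER_TAIL.sub("register.php", url.strip())

links = [
    "domain.com/forum/member.php?u=45743",
    "forum.domain.com/member.php?u=45743",
]
cleaned = [to_register(u) for u in links]
```

Because only the tail of each URL is touched, the /forum/ directory or forum. subdomain in front of it is preserved. Entries that do not contain member.php pass through unchanged.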







     
  9. mrmidjam

    mrmidjam Regular Member

    Joined:
    Sep 17, 2008
    Messages:
    438
    Likes Received:
    134
    I think you need to search for info on search-and-replace wildcards. I did some research on them a while back, and they seem like something that could help.

    Notepad++ has some advanced search capabilities.
     
  10. peter73

    peter73 Regular Member

    Joined:
    Jun 27, 2010
    Messages:
    341
    Likes Received:
    90
    Location:
    X marks the spot----------------------------X
    Believe me, I have. If I had put the time and effort into manually cleaning the links, I would be done by now. But that would be counter-productive the next time I do this, so I'm not touching that list until I think of an easy way to do it :D
     
  11. bakxos

    bakxos Regular Member

    Joined:
    Aug 8, 2010
    Messages:
    498
    Likes Received:
    292
    Location:
    Scotland
    If you manage to get rid of the part of the URL you don't want, go to Excel or OpenOffice and do this:

    1. Column A: the list of URLs
    2. Column B: register.php or /register.php (depends)
    3. Column C: =CONCATENATE(A1;B1)
    4. Drag the formula from C1 down the whole C column
     
  12. mrmidjam

    mrmidjam Regular Member

    Joined:
    Sep 17, 2008
    Messages:
    438
    Likes Received:
    134
    Been there. Wildcards confused the hell outta me, so I gave up. Spent way too much time on it.

    Maybe if all the URLs had ?u= in them, you could do a Paste Special within Calc and split by "?u=", which would split the URLs into 2 columns.

    Anyway good luck buddy
     
  13. peter73

    peter73 Regular Member

    Joined:
    Jun 27, 2010
    Messages:
    341
    Likes Received:
    90
    Location:
    X marks the spot----------------------------X
    Thanks for the tip. But getting rid of the part of the URL that I don't want is the main thing that I'm trying to do.

    Actually, putting register.php at the end of my URLs is easy to do with Notepad's search and replace option.

    Thanks anyway mate.

     
  14. peter73

    peter73 Regular Member

    Joined:
    Jun 27, 2010
    Messages:
    341
    Likes Received:
    90
    Location:
    X marks the spot----------------------------X
    Do you mean splitting the URL? Can I do that?? For example, can I take...

    www.domain.com/forum/member.php?u=4358

    and split it at member.php?>>><<<<u=????

    If that is possible, what program can I use to do that?

    You're giving me a rush mate! Hope this one works! :D

     
  15. mrmidjam

    mrmidjam Regular Member

    Joined:
    Sep 17, 2008
    Messages:
    438
    Likes Received:
    134
    Have the list in your clipboard, then in OpenOffice Calc right-click and choose Paste Special. You get the text import dialog, where you can enter your own separator in the "Other" text field.

    Not sure how you split on a phrase such as "?u=", though.
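
Splitting on a whole phrase rather than a single character is straightforward in a script. A minimal Python sketch using `str.partition`, which cuts a string at the first occurrence of a multi-character marker; the helper name `split_on_marker` is made up, and the sample URL is the one from the earlier post.

```python
def split_on_marker(url, marker="?u="):
    """Split a URL into the text before and after the first marker."""
    # partition returns (before, marker, after); when the marker is
    # missing, the whole string lands in 'before' and 'after' is empty.
    head, _found, tail = url.partition(marker)
    return head, tail

rows = [split_on_marker(u) for u in [
    "www.domain.com/forum/member.php?u=4358",
    "www.domain.com/forum/index.php",
]]
```

Each entry in `rows` is a two-column pair, which is the same shape the spreadsheet import would give.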
     
  16. bezopravin

    bezopravin BANNED

    Joined:
    May 11, 2010
    Messages:
    461
    Likes Received:
    3,471
    1. Open your List in Notepad++
    2. Press Ctrl+F
    3. Enter \php.* in Find What Text Box and php in Replace with Text Box
    4. Press Replace All Button or Alt+A(Shortcut Key)
    5. Now replace
    member.php
    showthread.php
    index.php
    forum.php
    panel.php
    new_topic.php
    newreply.php
    newthread.php
    topic.php
    etc... with register.php
    6. Your List is Clean!

    Hope you find this useful... :)
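
The same two passes can be sketched in Python for lists too large to handle comfortably in an editor. Assumptions here: the intent of the find pattern is "php and everything after it", written as `\.php\b.*$` because Python's `re` module rejects `\p` as a bad escape; the `\b` and the leading `/` are extra guards not in the original steps; the names `SCRIPTS`, `TAIL`, `SCRIPT_AT_END` and `clean` are hypothetical.

```python
import re

# Script names from the list above; extend as needed.
SCRIPTS = [
    "member.php", "showthread.php", "index.php", "forum.php", "panel.php",
    "new_topic.php", "newreply.php", "newthread.php", "topic.php",
]

# Pass 1: ".php" followed by anything up to the end of the line.
# \b keeps host names such as "x.phpbb.example" from matching.
TAIL = re.compile(r"\.php\b.*$")

# Pass 2: one of the known script names as the last path segment.
SCRIPT_AT_END = re.compile(
    "/(?:" + "|".join(re.escape(s) for s in SCRIPTS) + ")$"
)

def clean(url):
    """Cut everything after '.php', then swap the script for register.php."""
    url = TAIL.sub(".php", url.strip())
    return SCRIPT_AT_END.sub("/register.php", url)

sample = [
    "http://domain.com/forum/showthread.php?t=12&page=2",
    "forum.domain.com/member.php?u=45743",
]
cleaned = [clean(u) for u in sample]
```

URLs that end in a script name not on the list are truncated at .php but otherwise left alone, so they are easy to spot and add to `SCRIPTS`.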

    Guide Shot :

    [​IMG]
     
  17. peter73

    peter73 Regular Member

    Joined:
    Jun 27, 2010
    Messages:
    341
    Likes Received:
    90
    Location:
    X marks the spot----------------------------X
    Thanks mate!

    But I tried doing

    3. Enter \php.* in Find What Text Box and php in Replace with Text Box
    4. Press Replace All Button or Alt+A(Shortcut Key)


    It says 0 occurrences were replaced.

    Are you sure it's \php.*

    Thank you for helping me to the next level mate.

     
  18. bezopravin

    bezopravin BANNED BANNED

    Joined:
    May 11, 2010
    Messages:
    461
    Likes Received:
    3,471

    Yep, it's \php.*

    I think you have selected the Normal or Extended option under Search Mode, which is why it reports 0 occurrences replaced.

    Make sure to select Regular Expression under Search Mode in the Replace tab, then press Replace All.

    It should Work!
    :cool:
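
The difference between the search modes can be illustrated outside the editor. A small Python sketch, assuming a pattern equivalent in intent to the one above (written `\.php.*`, since Python's `re` does not accept `\p`): a literal search looks for the backslash, dot and asterisk as ordinary characters and finds nothing, while a regex search interprets them.

```python
import re

line = "domain.com/forum/member.php?u=45743"
pattern = r"\.php.*"

# "Normal" mode: the pattern text is searched for character by character,
# so the backslash and asterisk would have to appear in the line itself.
literal_hits = line.count(pattern)

# "Regular expression" mode: the same text is interpreted as a pattern.
regex_hits = len(re.findall(pattern, line))
```

Here `literal_hits` is 0 and `regex_hits` is 1, which mirrors the "0 occurrences" result when the wrong mode is selected.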

    Yet another Guide Shot :

    [​IMG]
     
  19. peter73

    peter73 Regular Member

    Joined:
    Jun 27, 2010
    Messages:
    341
    Likes Received:
    90
    Location:
    X marks the spot----------------------------X
    Ha! What can I say? bezopravin did it. He solved the problem that has been driving me crazy for 2 days now. Thanks and +rep are not enough to thank you, Sir!

    You rock!

     
  20. relaxin

    relaxin Junior Member

    Joined:
    Aug 13, 2007
    Messages:
    100
    Likes Received:
    25
    Occupation:
    CEO