How to extract links/URLs from pdf files?

Discussion in 'Black Hat SEO' started by kkvsam, Jan 24, 2011.

  1. kkvsam

    kkvsam Senior Member

    Joined:
    Oct 11, 2009
    Messages:
    936
    Likes Received:
    569
    Occupation:
    SYS ADMIN
    Does anyone know how to extract links/URLs from PDF files?
    Any help would be greatly appreciated.
     
  2. xthoms

    xthoms Regular Member

    Joined:
    Sep 14, 2010
    Messages:
    280
    Likes Received:
    99
    You copy the whole thing into FrontPage, take the code that it gives you, and get somebody to write you a regex (I think that's what they're called) that can pull out all the link values. I used to have one, but unfortunately I accidentally deleted it.
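
A regex along those lines could be sketched in Python roughly as follows. The pattern and the `extract_urls` name are assumptions made up for illustration, not the regex mentioned in the post, and it only catches http/https URLs that appear as plain text or inside HTML attributes.

```python
import re

# Hypothetical pattern: an http(s) URL runs until whitespace, a quote,
# an angle bracket, or a closing parenthesis.
URL_RE = re.compile(r"""https?://[^\s<>"')]+""")

def extract_urls(text):
    """Return the URLs found in text, in order of appearance, without duplicates."""
    seen = set()
    urls = []
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Pasting the exported HTML or text into `extract_urls(...)` gives back a list that can be written out one URL per line.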
     
  3. kkvsam

    kkvsam Senior Member

    I have lots of PDF files, so it is difficult to do this manually. I'm looking for software that makes it easier...
    Does anyone have this type of software?
     
  4. kkvsam

    kkvsam Senior Member

    Does anyone have any software to do this?
     
  5. chicken_dev

    chicken_dev Newbie

    Joined:
    Jan 13, 2011
    Messages:
    36
    Likes Received:
    8
    Export the PDF files to Word files, then copy the URLs and links into another Word file. Save it as HTML and open it in Firefox.
     
  6. HoNeYBiRD

    HoNeYBiRD Jr. VIP Premium Member

    Joined:
    May 1, 2009
    Messages:
    5,544
    Likes Received:
    6,907
    Gender:
    Male
    Location:
    Ghosted
    almost :)

    if your pdfs are copy protected, you need to start with step 1; if they're free to copy, you can start with step 2

    step 1: convert your pdfs into word .doc: use Adobe Acrobat Pro or an online pdf to word converter:
    Code:
    http://www.pdfonline.com/pdf2word/index.asp
    step 2: copy-paste the whole document into the input window here, you can also download the lightweight html tool:
    Code:
    http://www.surf7.net/services/value-added-services/free-web-tools/email-extractor-lite/
    select 'url' as 'Type of address to extract', select your separator, hit extract and that's it
     
    • Thanks x 5
  7. kkvsam

    kkvsam Senior Member

    Thank you, bro. But does that mean there's no software that can extract URLs from PDF files?
    I have more than 100 PDF files, and I think there will be more than 100 URLs per file...
     
  8. HoNeYBiRD

    HoNeYBiRD Jr. VIP Premium Member

    i haven't seen any yet, but that doesn't mean there aren't any; i just haven't searched for one

    and i forgot to mention that you cannot convert every type of protected pdf to word with the tool above, or with any other tool that does the same job
    (a pdf can be protected in different ways and at different levels)
     
  9. Cloaks

    Cloaks Regular Member

    Joined:
    Mar 20, 2010
    Messages:
    298
    Likes Received:
    90
    I know of one. It was originally designed as a penetration testing tool, but it'll do what you want. It should be on one of DefCon's pages.
    Google for "FOCA"; it's made by some Spanish guys. Hope that helps :)
     
  10. 4i4i78

    4i4i78 Registered Member

    Joined:
    Jan 6, 2010
    Messages:
    83
    Likes Received:
    10
    The best tool for this is bareGrep. Just google it.
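
For anyone who would rather script the grep-style search, here is a minimal Python sketch that scans every PDF in a folder. It is not bareGrep itself; the pattern and the `grep_urls` name are assumptions for illustration, and it only sees URLs that are stored uncompressed in the file, so links inside compressed streams would be missed.

```python
import re
from pathlib import Path

# Hypothetical byte pattern: an http(s) URL runs until whitespace,
# a quote, or a bracket/parenthesis delimiter.
URL_BYTES_RE = re.compile(rb"https?://[^\s()<>\[\]{}\"']+")

def grep_urls(folder):
    """Map each .pdf file name in folder to the URLs visible in its raw bytes."""
    results = {}
    for path in sorted(Path(folder).glob("*.pdf")):
        data = path.read_bytes()
        found = []
        for raw in URL_BYTES_RE.findall(data):
            url = raw.decode("latin-1")  # latin-1 never fails on arbitrary bytes
            if url not in found:
                found.append(url)
        results[path.name] = found
    return results
```

Calling `grep_urls("some_folder")` on a folder of 100+ files returns one list of URLs per file, which can then be dumped to a text file.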
     
    • Thanks x 1
  11. omaigadlol

    omaigadlol Registered Member

    Joined:
    Oct 25, 2008
    Messages:
    65
    Likes Received:
    18
    I can make a tool to extract links from PDF files, if anyone is interested.
     
  12. Esgrimidor

    Esgrimidor Newbie

    Joined:
    Jun 26, 2009
    Messages:
    7
    Likes Received:
    0
    I am interested.
    Would you like to make it now?

    Best Regards
     
  13. bosmolskate

    bosmolskate Jr. VIP Premium Member

    Joined:
    Jul 27, 2010
    Messages:
    1,396
    Likes Received:
    339
    Occupation:
    None
    Location:
    California
    I have a PDF with images and hundreds of links. Is there a way to extract only the links, without the images?
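
One possible approach, as a rough sketch only: clickable links in a PDF are normally stored as `/URI (...)` entries in link annotations, separate from the image data, so searching the raw bytes for those entries skips the images entirely. The `link_targets` name and the pattern below are assumptions for illustration. The sketch will not find links stored in compressed object streams or written as hex strings, and it does not handle unescaped nested parentheses.

```python
import re

# Matches "/URI (target)" entries; the group allows backslash-escaped characters.
URI_ACTION_RE = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")

def link_targets(pdf_bytes):
    """Return the unique link targets from /URI entries in raw PDF bytes."""
    targets = []
    for raw in URI_ACTION_RE.findall(pdf_bytes):
        # Undo the backslash escapes PDF uses for \, ( and ) inside strings.
        text = re.sub(rb"\\([\\()])", rb"\1", raw).decode("latin-1")
        if text not in targets:
            targets.append(text)
    return targets
```

Usage would be something like `link_targets(open("file.pdf", "rb").read())`, where `file.pdf` stands for whichever PDF is being checked.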