Discussion in 'Black Hat SEO' started by kkvsam, Jan 24, 2011.
Does anyone know how to extract links/URLs from pdf files?
Greatly appreciated if you can help me.
You copy the whole thing into FrontPage, take the code that it gives you, and get somebody to write you a regex (I think that's what they're called) that can pull out all the link values. I used to have one, but unfortunately I accidentally deleted it.
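For anyone wondering what such a regex might look like, here is a minimal sketch in Python. It assumes the text or HTML has already been copied out of the PDF into a string; the pattern, the function name `extract_urls` and the sample string are illustrative assumptions, not a tool mentioned in this thread, and the pattern only covers plain http/https links.

```python
import re

# Assumed pattern: an http(s) URL runs until whitespace or a common delimiter.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")

def extract_urls(text):
    """Return the unique URLs in `text`, in order of first appearance."""
    seen = []
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen

# Made-up sample input standing in for text copied out of a document.
sample = ('See <a href="http://example.com/a">a</a> and '
          'https://example.org/b?x=1, then http://example.com/a.')
print(extract_urls(sample))
```

The duplicate link in the sample is reported only once, and the trailing comma and full stop are stripped before comparison.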
I have lots of PDF files, so it is difficult to do this manually. I'm searching for software that makes it easier...
Does anyone have this type of software?
Does anyone have any software to do this?????
Export the PDF files to Word files, then copy the URLs and links into another Word file. Save it as HTML and open it in Firefox.
if your pdfs are copy protected, you need to start with step 1; if they're free to copy, you can start with step 2
step 1: convert your pdfs into word .doc: use Adobe Acrobat Pro or an online pdf to word converter:
step 2: copy-paste the whole document into the input window here, you can also download the lightweight html tool:
select 'url' as 'Type of address to extract', select your separator, hit extract and that's it
Thank you, bro. But does that mean there's no software that can extract URLs from PDF files directly?
I have more than 100 PDF files, and I think there will be more than 100 URLs per file....
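With that many files, a small script may be easier than converting each one by hand. Below is a rough sketch in Python using only the standard library. It relies on the fact that clickable links in a PDF are stored as URI actions with a `/URI (...)` entry, and it simply scans the raw bytes for those entries. This is an assumption-heavy shortcut: it will miss links in PDFs that are encrypted or that keep their objects in compressed object streams, and it does not find URLs that are only printed as plain text. The function names `uris_in_pdf_bytes` and `uris_in_folder` and the sample bytes are made up for illustration.

```python
import re
from pathlib import Path

# Matches "/URI (...)" entries, allowing backslash-escaped characters inside.
URI_RE = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")

def uris_in_pdf_bytes(data):
    """Return link targets from uncompressed /URI entries, de-duplicated, in order."""
    found = []
    for raw in URI_RE.findall(data):
        # Undo PDF string escapes for parentheses and backslashes.
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        url = raw.decode("latin-1")
        if url not in found:
            found.append(url)
    return found

def uris_in_folder(folder):
    """Map each *.pdf file name in `folder` to the URIs found in that file."""
    return {p.name: uris_in_pdf_bytes(p.read_bytes())
            for p in sorted(Path(folder).glob("*.pdf"))}

# Made-up fragment shaped like a link annotation, for a quick check.
sample = b"<< /Type /Annot /Subtype /Link /A << /S /URI /URI (http://example.com/page\\(1\\)) >> >>"
print(uris_in_pdf_bytes(sample))
```

Calling `uris_in_folder("some_folder")` on a directory of PDFs returns a dictionary keyed by file name, which can then be written out to a text file. Because it reads only link entries, images in the PDF are ignored.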
I haven't seen any yet, but that doesn't mean there isn't one; I just haven't searched for one.
And I forgot to mention that you cannot convert every type of protected PDF to Word with the tool above, or with any other tool that does this kind of conversion
(a PDF can be protected in different ways and at different levels)
I know of one. It's originally a tool designed for penetration testing, but it'll do what you want. It should be on one of DefCon's pages.
Google for "FOCA" it's made by some Spanish guys. Hope that helps
The best tool for this is bareGrep. Just google it.
I can make a tool to extract links from PDF files, if anyone is interested.
I am interested.
Would you like it now?
I have a PDF with images and hundreds of links, is there a way to extract the links only without the images?