Opinion from 10 years ago, I suspect still valid:
There are a million python libraries and tools to do some overlapping subset of the things you'd want to do with a pdf.
There are no doubt another million in other languages.
These are each basically bundles of some of the transformations you'd want to make to the same underlying data structure.
So, complex pdf scripts often need two or three different libraries to get their thing done, which is wasteful at both a dev effort and computational level.
The ecosystem would be greatly improved if someone made a great (probably rust based) in-memory low level pdf reading and writing data structure.
PDF libraries in any language could switch to using that structure and library internally, with the carrot that the switch would result in needing less code, and likely being some combination of faster and safer.
And then if they just exposed get_structure_pointer() and set_structure_pointer(), they could all interoperate for free. (Another carrot for joining -- small libraries could usefully add features and be adopted without needing to pick an existing popular library to glom onto.)
Not sure what would economically cause this to happen, but it would be great.
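To make the interop idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `SharedPdfDocument`, `PageTrimmer` and `MetadataEditor` are made-up names standing in for the shared low-level structure and two small libraries built on it, and the "pointer" is just a Python object reference. Only the `get_structure_pointer()` / `set_structure_pointer()` names come from the comment above.

```python
from dataclasses import dataclass, field


@dataclass
class SharedPdfDocument:
    """Hypothetical shared in-memory structure (not a real library)."""
    objects: dict = field(default_factory=dict)   # object id -> PDF object
    page_ids: list = field(default_factory=list)  # page order, by object id


class PageTrimmer:
    """Hypothetical small library that only knows how to drop pages."""

    def __init__(self):
        self._doc = SharedPdfDocument()

    def get_structure_pointer(self):
        return self._doc

    def set_structure_pointer(self, doc):
        self._doc = doc

    def keep_pages(self, indexes):
        # Keep only the pages at the given zero-based positions.
        self._doc.page_ids = [self._doc.page_ids[i] for i in indexes]


class MetadataEditor:
    """Hypothetical small library that only knows how to edit metadata."""

    def __init__(self):
        self._doc = SharedPdfDocument()

    def get_structure_pointer(self):
        return self._doc

    def set_structure_pointer(self, doc):
        self._doc = doc

    def set_title(self, title):
        self._doc.objects.setdefault("Info", {})["Title"] = title


# One document, handed from one library to the other without re-parsing.
doc = SharedPdfDocument(
    objects={1: {"Type": "Page"}, 2: {"Type": "Page"}, 3: {"Type": "Page"}},
    page_ids=[1, 2, 3],
)
trimmer = PageTrimmer()
trimmer.set_structure_pointer(doc)
trimmer.keep_pages([0, 2])

editor = MetadataEditor()
editor.set_structure_pointer(trimmer.get_structure_pointer())
editor.set_title("Trimmed copy")
```

Neither library knows the other exists; they only agree on the structure, which is the whole point of the proposal.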
curiously poppler doesn't mention that anywhere on their website, but the library comes with a similar suite of tools, typically available in linux distributions.
i have found them very helpful.
https://en.wikipedia.org/wiki/Poppler_(software)#poppler-uti...
I use these all the time. They are great.
For low-level work, qpdf can be quite useful: https://github.com/qpdf/qpdf
Came here to say this. Qpdf is my go-to for manipulating pdf files on the command line. Encrypting, decrypting, extracting and merging pages.
It's Apache-licensed and written in C++.
How do you use qpdf for extraction when its README states “qpdf does not render PDFs or perform text extraction, and it does not contain higher-level interfaces for working with page contents.”
Not the person you're replying to, but when they said "extraction" I believe they're talking about extracting pages from a PDF (like "splitting" the PDF apart, page-wise), not text. At least that's a thing I've used qpdf for in the past.
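For anyone wondering what page-wise extraction looks like in practice, here is a rough sketch that wraps the qpdf command line from Python. The helper names (`qpdf_extract_cmd`, `extract_pages`) and the file names are made up for illustration; the underlying invocation is qpdf's `--empty --pages <file> <range> -- <output>` form, and actually running it assumes qpdf is installed and on your PATH.

```python
import subprocess


def qpdf_extract_cmd(src, page_range, dest):
    """Build the qpdf argument list that copies `page_range` of `src` into `dest`."""
    # --empty starts from a blank document, so only the selected pages end up in dest.
    return ["qpdf", "--empty", "--pages", src, page_range, "--", dest]


def extract_pages(src, page_range, dest):
    """Run qpdf (must be installed); raises CalledProcessError on failure."""
    subprocess.run(qpdf_extract_cmd(src, page_range, dest), check=True)


# Hypothetical file names; this only builds the command, it does not run qpdf.
cmd = qpdf_extract_cmd("statement.pdf", "1-5", "first-five.pdf")
```

Note this moves whole pages around. It does not pull text out of them, which matches what the README quote says.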
There is also: https://pdfcpu.io/
That said, if you're looking for a GUI app to do simple PDF mutations it's often hard to find a simple solid open source cross platform app.
At least I haven't found one :)
I found PDF SAM basic ("split and merge") well done: https://pdfsam.org/en/pdfsam-basic/. That one is open-source and multi-platform, they have more features in a paying superset project.
Pdfsam and pdfxchange are my gotos
If self hosting is an option, I've found Signature PDF to be quite good.
https://github.com/24eme/signaturepdf?tab=readme-ov-file#sig...
i've tried 'pdfcpu images list' on a random pdf i've had lying around and the tool unexpectedly started downloading some font from unspecified internet location to my local disk.
sorry, too spooky even for october. :-)
I had to bash my head against the wall and submit myself to paying for a creative cloud license. At least acrobat just works. Although I wish there was a reasonable alternative.
How about this: https://tools.pdf24.org/en
It allows installation for offline use too.
TIL: there are numerous swiss army knives for pdf files available already
Not the same thing but just want to shoutout https://www.pdfgear.com/ as one of the only viable alternatives to adobe for intermediate level PDF tinkering. It’s free and available for everything except Linux.
I found it suspicious: they formerly sent stuff to their cloud without it being obvious, and the company seems to mod their own subreddit.
Is this an alternative to acrobat?
I’m curious: what good would automating signing a PDF through a utility do?
The whole purpose of a signature is that a person signed and agreed to something. That cannot be done automatically.
Why wouldn’t a company sign documents they create automatically? This is about a cryptographic signature that lets the user verify authorship, not a visual signature in the PDF, right? So it would still be useful to be able to verify that a bank statement is really from my bank, even if it was generated without human interaction.
Also allowing you to detect whether any changes have been made since the signature was applied.
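A toy illustration of the tamper-detection point, using only the Python standard library. Loud caveat: this uses an HMAC with a shared secret as a stand-in. Real PDF signatures are certificate-based (public-key), so the recipient can verify without ever holding the signer's secret. The key and document bytes below are made up; the only thing this shows is that changing the signed bytes makes verification fail, with no human involved in the signing step.

```python
import hashlib
import hmac


def sign(document: bytes, key: bytes) -> str:
    """Return a hex tag bound to the exact bytes of the document."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def verify(document: bytes, key: bytes, tag: str) -> bool:
    """True only if the document is byte-for-byte what was signed."""
    return hmac.compare_digest(sign(document, key), tag)


# Hypothetical key and statement contents, for illustration only.
bank_key = b"demo-key-not-for-real-use"
statement = b"Balance on 2025-10-01: 1,000.00"
tag = sign(statement, bank_key)

untouched_ok = verify(statement, bank_key, tag)
tampered_ok = verify(b"Balance on 2025-10-01: 9,000.00", bank_key, tag)
```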
CEOs often need to sign changes to employment terms or options/vesting terms and have hundreds if not thousands of employees. They don't have the time to go through and sign all of those contracts.
It's no different than the analog ages where a secretary would go through and stamp all the contracts with the CEO's signature.
Those don’t need certified signatures. They just need pdf stamps.
It depends on jurisdiction
Pdf stamps have zero security.
Signing can be cryptographic.
Imagine you need to sign 25 pdf documents. You read them on the screen and then batch-sign them (instead of signing them with the viewing software). This is just an example.
My bank can issue a signed certificate for any of your account movements if you need to provide proof of them. They come signed both digitally and handwritten by the branch's director. But you wouldn't expect the director to be there sitting and signing all certificate requests that arrive, right?
In addition to the already mentioned, there is also pdfcpu[0], "a Go PDF processor and CLI"
[0]: https://github.com/pdfcpu/pdfcpu
I thought the Swiss Army knife for PDF was Didier Stevens' PDF tools:
https://blog.didierstevens.com/programs/pdf-tools/
What about https://www.ghostscript.com ?
https://github.com/LibrePDF/OpenPDF
Pdftk has been around for many years, and does exactly the same things. Why reinvent the wheel?
https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
> Every time someone reinvents the wheel, it becomes a little rounder.
Not sure if this particular library is an improvement, but even if it serves nothing but the author’s enjoyment, or education, it’s a win.
It makes sense to me. They made a PDF library for Python first. Having a PDF library for your preferred language is a good thing.
And it’s natural to then build a cli tool on top of the library they already made.
It’s not open-source, so practically the question is equivalent to “why reinvent the wheel by creating libreoffice when there’s a perfectly good Microsoft office suite out there”
The server component is under GNU GPL: https://www.pdflabs.com/docs/pdftk-license/
This was my first thought, but after reading the comments here, I see I had no idea how many other alternatives already existed, so why not add another one.
due to the nature of PDF, none of the tools mentioned here can do things as simple as detecting tables on pages with high accuracy
PDF is absolutely mint for display but it really suffers when parsing is involved
Yeah, I've been expecting someone to work up a system where:
- source file is .md
- file is compiled to .pdf _and_ the .md source file is included as an attachment
- when working with the file beyond viewing as a .pdf the .md is extracted and used instead of the .pdf
The LaTeX folks had a similar system ages ago where the .tex source would be included in a .pdf made from a .tex file for embedding in documents so that it could be sent in say an e-mail and then edited by the recipient --- absolutely awesome for discussing math via e-mail.
This is totally an aside, but I wonder how long the "Swiss army knife" metaphor will hang on in popular culture. People generally use it to indicate that something does a variety of things, but I'd say many of the younger generation have never touched, if even seen, such a knife in their life, and even among older generations it doesn't have a positive connotation.
Like when I hear something is the Swiss army knife of something, my take is that it does a lot of things poorly and there are better specific tools for every need. Like if you need a really terrible knife or bottle opener or screwdriver or saw, a Swiss Army knife has you covered. But it should be a tool of last resort when you have no other options.
And thus the Leatherman(tm) was born from its ashes.
And too quickly smothered in copycats for its name to become the new metaphor.
Swiss Army knives seem to be as popular as ever. What do you mean, doesn't have a positive connotation?
They're great for hiking, camping, traveling, in backpacks and bags.
What's wrong with it as a knife? It's perfectly sharp. Obviously it's not a full-sized chef's knife, but it will cut your apple or twine or packing tape. It's a multitool. It does lots of things. A tool of "last resort" seems to miss the point -- it's not meant to use at home, when you have a full-size screwdriver and bottle opener and corkscrew. It's for traveling with you. And it's great at that.
SAKs are iconic. I don't think your take is a common one.
Be serious. If someone in 2025 has a pocket multitool, there's about a 1% chance it is red with a white cross on it.
??
Obviously it's not the only game in town ever since Leatherman made the pliers-style tool popular as well.
But you can just look up the various brands on Amazon to see that SAKs continue to sell very well, by "x bought in the last month."
It's nowhere near 1%, I don't know where you're getting that.
Edit: according to [1] Victorinox has the #1 spot in market share in multitools. The share is a bit higher than it is for SOG and Leatherman, though they're both close.
[1] https://www.marketreportanalytics.com/reports/swiss-army-kni...
Lots of cheap (and good) Chinese alternatives entered the market recently but I'd say Victorinox is still going strong. In Poland it's sold everywhere and the brand is very recognizable.
>Swiss Army knives seem to be as popular as ever.
It isn't as popular as ever, at least not in the Western world. I don't know what your frame of reference is, but it is positively non-existent compared to a couple of decades ago. Approximately zero kids, give or take a few, put one on their Christmas list, whereas when I was a kid it was many kids' dream item. I would say the most common buyers today are middle-aged men who buy it just as a thing to own because they remember how desirable they were when they were in Scouts in their teens.
>A tool of "last resort" seems to miss the point
It is quite literally a tool of last resort, and in practice people who actually own one (such as myself) have often never, ever actually used any of the options available on it because they're terrible options and we always have something better available.
Like a legitimate folding camping knife, which we all have in our camping supplies. An infinitely better knife. A tiny multi-screwdriver kit. The Leatherman brand went big by making a legitimately good, well constructed pair of pliers to which they add some "in a pinch" options.
Serious campers who portage and go deep country have a proper assortment of gear and never lean on their SAK. The rest of us usually get there in a car and have a...proper assortment of gear.
But again, if you're in a situation where you have to use one of the tools on a SAK, you probably screwed up and it's a serious compromise. It just isn't a compelling metaphor for software tooling.
See my other comment for its popularity statistics. Victorinox is literally the #1 multitool brand by market share. These are facts.
Your take is idiosyncratic. Using a SAK doesn't mean "you probably screwed up". That's truly a bizarre thing to say.
A SAK is a perfectly fine metaphor. That's why it's a popular one. It's a small tool that does lots of things. I think you're overthinking this.
>Victorinox is literally the #1 multitool brand by market share
This doesn't repudiate anything I said, and it's a particularly weird canard.
>That's why it's a popular one
Increasingly the only ones I see leveraging the metaphor are English as a second language writers who perhaps came across it somewhere. I would hardly call it "popular", and I pointed out the reality that many readers, such as myself, find it a negative description, similar to someone calling themselves a "jack of all trades". Your defensiveness of SAK does not change this, and your attempts at invalidating my statement border on bizarre.
Feel free to continue. I'm done here.
> are English as a second language writers who perhaps came across it somewhere
Your prejudice is showing. Where would you even get an idea like that?
I hope you understand that people whose first language isn't English also use SAKs. It's not just an English thing. They're not trying to repeat some unknown object they've only encountered in metaphor. The tools are literally Swiss. And popular around the entire world.
9/11 killed them. They used to be sold in airports.