
Tuesday, May 24, 2011

A simple way to extract specific PDF pages

Today, I received a humongous PDF with about 300 pages of documentation which had to be shared with lots of people who had to review each section independently.  Instead of simply forwarding the entire document to them and asking them to wade through it themselves, I thought I'd split the pages out and send only the relevant bits.

That should be easy, right?

Well, I forgot what the tool was.  A few minutes of Google searching turned up pdftk, which was what I was looking for.  It turned out that I had installed it a long time ago, and when I tried it, it dumped core on the Cygwin installation I had.

This happens to me.  A lot.  Just when I have a deadline and I think of the solution, the rug gets pulled out from under me.  :-)

Wait, I did remember doing something like this using LaTeX, and another quick search revealed pdfpages on CTAN.  I downloaded and installed it, read the documentation, and it was a breeze to get things sorted.  The smallest example that I can create to get a specific set of pages is shown below.

\documentclass[a4paper]{scrartcl}
\usepackage{pdfpages}
\begin{document}
\includepdf[pages={9-14,27}]{RFP.pdf}
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:


That's it.  LaTeXing the file gave me just the pages I needed.  If you have a TeX installation, this works for most cases.  Please read the documentation if you want to do something fancy, but the above is enough to get the pages you need.
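
For a taste of the "something fancy", here is a rough sketch of a few other pdfpages options.  It is only an illustration and not something I needed for this job: the page ranges are made up, RFP.pdf is the same file as above, and the options used (nup, landscape, frame, and the "last" keyword in a page range) are the ones described in the pdfpages documentation, so check there for the exact behaviour on your installation.

```latex
\documentclass[a4paper]{scrartcl}
\usepackage{pdfpages}
\begin{document}
% Pages 9-14, two source pages side by side on each landscape sheet,
% with a thin frame drawn around every source page.
\includepdf[pages={9-14}, nup=2x1, landscape, frame]{RFP.pdf}
% Everything from page 27 to the end of the file, one page per sheet.
% (The page range here is an arbitrary example.)
\includepdf[pages={27-last}]{RFP.pdf}
\end{document}
```

Each \includepdf call is independent, so a single wrapper file can mix layouts or pull ranges from several PDFs.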

4 comments:

Joost said...

See also pdfjam, which is actually a shell script front-end to pdfpages:

http://freshmeat.net/projects/pdfjam/

So instead of creating latex files, you can just type something like:

# pdfjam 2-3 -o file-p2-3.pdf

:-)

Joost said...

Ok, that should really be:

# pdfjam file.pdf 2-3 -o file-p2-3.pdf

Anonymous said...

http://sourceforge.net/projects/pdfshuffler/

Peter said...

Or you can just use ghostscript, like this:
http://pastebin.com/uPgxdf9E