Convert PDF to Images From the Linux Command Line


Converting a PDF file to images can be done at the Linux command line with a single command. Discover how to install the utility, how to use it, and how to automate the process.

What Is poppler-utils?

As mentioned in the introduction, we need to install a small set of utilities named poppler-utils to convert PDF files to images.

The poppler-utils set is a collection of command-line tools for working with PDF files. It includes pdftoppm, which renders PDF pages to images, along with tools such as pdftotext and pdfimages.

Installing poppler-utils

To install poppler-utils on a Debian/Apt-based Linux distribution (like Ubuntu and Mint), run:

sudo apt install poppler-utils

To install poppler-utils on a Red Hat/Yum-based Linux distribution (like Red Hat and Fedora), run:

sudo yum install poppler-utils
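
After installation, a quick check can confirm that the pdftoppm binary is on the PATH. The following is a minimal sketch of such a check; the function name have_pdftoppm is only illustrative and is not part of poppler-utils.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if pdftoppm is on the PATH.
have_pdftoppm() {
    command -v pdftoppm >/dev/null 2>&1
}

if have_pdftoppm; then
    echo "pdftoppm found at: $(command -v pdftoppm)"
else
    echo "pdftoppm not found; install poppler-utils first" >&2
fi
```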

Converting PDF to Images

The command required is simple and straightforward:

pdftoppm -png test.pdf test

With the pdftoppm command we can convert PDF to images. We specify that we want a PNG file for the output format (by using -png) and that our input file is test.pdf.

We specify test as the root of the output file name. pdftoppm automatically adds a page number suffix (like -1) and an extension (based on the -png option passed earlier).

The output file name will thus be test-1.png, as we can verify next:

ls test-1.png
eog test-1.png

Any subsequent pages would be test-2.png and so on. For documents with ten or more pages, pdftoppm zero-pads the page number (for example, test-01.png). The eog command (if eog is installed) opens the file so you can review the output, though you can use any other image viewer you like.
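
pdftoppm also accepts options that control which pages are rendered and at what resolution. As a minimal sketch, the helper below renders only the first page of a PDF at 300 DPI. The function name render_first_page and the 300 DPI value are illustrative assumptions; the -r, -f, -l and -singlefile options are standard pdftoppm options.

```shell
#!/bin/sh
# Hypothetical helper: render only page 1 of a PDF to <root>.png.
#   $1 = input PDF, $2 = output root (file name without extension)
render_first_page() {
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    # -f 1 -l 1   : first and last page to convert (page 1 only)
    # -r 300      : resolution in DPI (illustrative value)
    # -singlefile : write <root>.png without a page-number suffix
    pdftoppm -png -r 300 -f 1 -l 1 -singlefile "$1" "$2"
}

# Example (would create test.png from test.pdf):
# render_first_page test.pdf test
```

Checking that the input file exists before calling pdftoppm gives a clearer error message when a file name is mistyped.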

Batch Processing of PDF Files to Images

We can write a one-liner to batch convert all PDF files whose names match a given pattern. We could then add this line to a small .sh script to automate it further, or just use it at the command line whenever we need to convert a large number of PDF files to images.

ls --color=never test*.pdf | sed 's|\.pdf$||' | xargs -I{} pdftoppm {}.pdf -png {}

In this command, we first obtain a directory listing of all PDF files whose names start with test and end with .pdf, using ls --color=never test*.pdf.

The --color=never option is a safeguard: if ls is set up to always emit color codes (for example through an alias that uses --color=always), the escape sequences would be mixed into the file names and confuse xargs.

Next we use a simple sed substitute command to replace a literal dot followed by pdf at the end of each name with nothing. In other words, we remove the .pdf file extension.

This lets us add the extension back only where it is needed, i.e. when specifying the input file for pdftoppm, but not when specifying the output file for the same pdftoppm command, much like in our earlier example.

Finally, we use xargs to send each PDF file name (minus the .pdf) to pdftoppm one by one. The -I option to xargs lets us refer to each input item received (i.e. the shortened PDF file names) by writing {} in the command that follows.

As you can see, our pdftoppm command now looks much like the first example, with each individual PDF file name as input (with .pdf appended again), and the PDF file name without .pdf as output.

Let’s execute it:

This worked fine: the three PDF files, each one page long, were converted to three individual .png files (one image per page), each named after its source PDF with the page number suffix added.
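
The same batch conversion can also be written as a shell loop, which avoids parsing the output of ls and copes with spaces in file names. The following is a sketch of that alternative, not the one-liner used above; the function name convert_pdfs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert every PDF in the current directory whose
# name starts with the given prefix (e.g. "test") to PNG images.
convert_pdfs() {
    prefix=$1
    for pdf in "$prefix"*.pdf; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$pdf" ] || continue
        # ${pdf%.pdf} strips the trailing .pdf to form the output root.
        pdftoppm -png "$pdf" "${pdf%.pdf}"
    done
}

# Example: convert all test*.pdf files in the current directory.
# convert_pdfs test
```

Saved in a small .sh file, the function can be sourced or extended with a call at the end of the script.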

As an alternative to the -png option, you can use -jpeg to generate JPEG files instead. Use pdftoppm --help or man pdftoppm to see a full list of options.
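
When the output format needs to vary, a script can pick the pdftoppm flag from a format name. The sketch below shows one way to do this; the function name format_flag is hypothetical, and only the -png, -jpeg and -tiff flags of pdftoppm are assumed.

```shell
#!/bin/sh
# Hypothetical helper: map a format name to the matching pdftoppm flag.
format_flag() {
    case "$1" in
        png)      printf '%s\n' "-png" ;;
        jpg|jpeg) printf '%s\n' "-jpeg" ;;
        tiff)     printf '%s\n' "-tiff" ;;
        *)        echo "unsupported format: $1" >&2; return 1 ;;
    esac
}

# Example: convert test.pdf to JPEG images.
# pdftoppm "$(format_flag jpeg)" test.pdf test
```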

Wrapping Up

In this article we saw how straightforward it can be to convert PDF files to image files directly from the Linux command line. We also looked at a simple way to automate this process. Enjoy!