Aug 02, 2011
 

A few days ago, I had what I expect is a common problem: I needed to print some PDFs, sign them, and then scan them back into PDF format to return. However, I didn’t have any software for scanning directly to PDF, and most solutions on the internet appear to be either sketchy or paid. The scanner functionality that Windows provides for TWAIN devices can save scanner output as TIFF images, so all I really needed was a way to create a PDF and insert a TIFF image into it. Luckily, there is a PECL package for creating PDF files from a PHP script, so I wrote a small command line PHP script that takes a series of TIFF images and creates a PDF with one image per page.

Setting up PHP’s PDFLib Front End

Because the PHP front end for PDFlib is (now) a PECL package, it is not bundled with PHP, so you must retrieve it using the PEAR tool. Unfortunately, the PHP front end requires a separate installation of PDFlib itself, so I began by obtaining a copy of PDFlib Lite, which is free for non-commercial use. As a home user only interested in converting a few scans to email to a friend, I believe that I qualify. Because there is no RPM of PDFlib Lite for Amazon’s Basic Linux AMI, I had to build it from source, which was routine except that the initial image does not include gcc for some reason, so I had to install that as well. For reference, or for those new to Linux:

# wget http://www.pdflib.com/binaries/PDFlib/705/PDFlib-Lite-7.0.5.tar.gz
# tar -zxvf PDFlib-Lite-7.0.5.tar.gz
# cd PDFlib-Lite-7.0.5
# ./configure
# make
# make install
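
Since the stock image shipped without gcc, it can be worth checking for the build tools before running ./configure. The following is a minimal sketch and not part of my original setup: the helper name missing_tools is hypothetical, and the yum hint assumes the packages carry the same names as the commands, which may not hold on every distribution.

```shell
#!/bin/sh
# missing_tools (hypothetical helper): print each named command that is
# not found on the PATH, one per line.
missing_tools() {
	for tool in "$@"; do
		command -v "$tool" >/dev/null 2>&1 || echo "$tool"
	done
}

# Tools used by the build steps above.
missing=$(missing_tools wget tar gcc make)
if [ -n "$missing" ]; then
	# Assumption: package names match the command names.
	echo "Missing:" $missing
	echo "Try: yum install" $missing
else
	echo "All build tools found."
fi
```

The check only reports what is absent; it does not install anything, so it is safe to run as an ordinary user.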

Now that PDFlib Lite is installed to /usr/local, the next step is to download and install the PECL extension that exposes it to PHP. On an Amazon virtual server instance, the php-devel and php-pear packages must first be installed with yum: php-devel provides the sources and headers needed to build extensions, and php-pear provides the tools to access PEAR. Others may need to install them as well, depending on their setup. After that, installing new PHP extensions via PEAR is easy:

# pear install pecl/pdflib

Finally, add the extension to your configuration file, /etc/php.ini. Some configurations recommend putting extensions in their own files, which I think is kind of silly, but I’ll play along, so an alternative is to create a file in /etc/php.d/ named pdf.ini with only one line:

extension=pdf.so

If the extension ends up with a different name after you install pdflib, PEAR tells you which name to use once the installation finishes.
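
To confirm that PHP actually picked up the extension, the module list printed by php -m can be checked. This is a small sketch: the helper name module_loaded is hypothetical, and it assumes the extension registers under the name pdf (matching pdf.so); if yours has a different name, substitute it.

```shell
#!/bin/sh
# module_loaded (hypothetical helper): read a module list such as the
# output of `php -m` on stdin and succeed if the named module appears
# as a whole line, ignoring case.
module_loaded() {
	grep -qixF -- "$1"
}

# Assumption: the extension registers itself under the name "pdf".
if command -v php >/dev/null 2>&1; then
	if php -m | module_loaded pdf; then
		echo "pdf extension is loaded"
	else
		echo "pdf extension is NOT loaded; check php.ini"
	fi
else
	echo "php not found on PATH"
fi
```

Matching the whole line keeps a module with a longer name that merely contains "pdf" from producing a false positive.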

Using PHP to Create PDFs

With all the libraries installed, I just needed a simple script that would create a new PDF, load some TIFF images, and insert each one as a full page. So I wrote this small command line script, which accepts a list of input files and produces a PDF with one image per page, each image scaled to fit the 8.5″x11″ page. It runs on Windows or Linux with the PHP CLI, provided the appropriate software is installed, because it is meant to be invoked as php -f pdf_convert.php [arguments ...] from any prompt rather than relying on a #!/bin/php line at the top of the file.

<?php
/**
 * Convert a series of TIFF files given on the command line to pages in
 * a single PDF
 */

if ($argc < 3) {
	echo ('Convert a series of TIFF images to individual pages in a PDF');
	echo ("\n".'Usage: php -f pdf_convert.php <output name> <input 1> <input 2> ...'."\n");
	exit(1);
}

// Get file names from command line
$output_file = $argv[1];
$input_files = array();
for ($i = 2; $i < $argc; ++$i) {
	$input_files[] = $argv[$i];
}

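// Initialize the PDF; an empty file name keeps the document in memory so
// that it can be fetched with get_buffer() once it is finished
$pdf = new PDFLib();
if ($pdf->begin_document('', '') == 0) {
	die('Error: '.$pdf->get_errmsg()."\n");
}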

// For each input image, create an 8.5" x 11" page (612 x 792 points at 72 points
// per inch), load the image, fit it to the full page, and then close the page
foreach ($input_files as $in_file) {
	echo ('Now processing '.$in_file.'...'."\n");
	$pdf->begin_page_ext(612, 792, '');
	$image = $pdf->load_image('auto', $in_file, '');
	$pdf->fit_image($image, 0, 0, 'boxsize {612 792} fitmethod meet');
	$pdf->close_image($image);
	$pdf->end_page_ext('');
}

$pdf->end_document('');

$raw_pdf = $pdf->get_buffer();

file_put_contents($output_file, $raw_pdf);

echo ('Done!'."\n");

?>

This doesn’t do much in the way of error checking or proper handling, because that didn’t seem necessary for something that I’m only going to use once or twice and that isn’t really a packaged product. Feel free to use this or modify it for whatever you like, under the terms of the GPL.
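
Since the script itself does little error checking, one low-effort safeguard is to verify the input files from the shell before invoking it. This is a sketch under assumptions: the helper name check_inputs and the file names below are made up for illustration.

```shell
#!/bin/sh
# check_inputs (hypothetical helper): succeed only if every argument is
# a readable regular file; report each one that is not.
check_inputs() {
	status=0
	for f in "$@"; do
		if [ ! -f "$f" ] || [ ! -r "$f" ]; then
			echo "Cannot read input: $f" >&2
			status=1
		fi
	done
	return "$status"
}

# Example with made-up file names: only run the conversion when every
# scanned page is present.
if check_inputs page1.tif page2.tif; then
	php -f pdf_convert.php signed.pdf page1.tif page2.tif
fi
```

This only catches missing or unreadable files; a file that exists but is not a valid image would still have to be caught inside the PHP script.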