PDF (Portable Document Format) was developed to provide a document
format that can be presented independently of software and system. Because of
this, it is often used as a pre-print document exchange format.
The document component can generate PDF documents from all other input formats
and offers a language very similar to CSS to apply custom styling to the
generated output. Additionally, it supports adding custom parts, like footers
and headers, to the PDF document.
Writing PDF basically works like writing any other format supported by the
document component, as the basic example shows:
<?php

require 'tutorial_autoload.php';

// Convert an RST input file to Docbook
$document = new ezcDocumentRst();
$document->loadFile( './article/introduction.txt' );

$pdf = new ezcDocumentPdf();
$pdf->options->errorReporting = E_PARSE | E_ERROR | E_WARNING;
$pdf->createFromDocbook( $document->getAsDocbook() );

// Writing the object to a file converts it to a string
file_put_contents( __FILE__ . '.pdf', $pdf );

?>
First we load an RST file and create a Docbook document from it because, as
described before, Docbook is the central conversion format.
Afterwards the Docbook document is loaded by the PDF class and saved. When
the document is converted to a string, the PDF is rendered using the default
options and the default driver. The result of this rendering call can be
viewed here: 04_01_create_pdf.pdf.
Output writers
Since there are numerous PDF renderers in the PHP world and the
available ones might depend on the current environment, the document component
supports different PDF drivers, as wrappers around different existing libraries.
For now two implementations exist, for pecl/haru and TCPDF, but it is fairly
easy to write another one for another PDF class.
Haru
libharu is an open source PDF generation library, written in C, and wrapped
by the haru PHP extension, available from PECL. If PEAR is set up correctly
on your machine, it should install as easily as:
pear install pecl/haru
The Haru driver is pretty fast, but currently has issues with some special
characters. It is the default driver, but can be explicitly used by setting
the driver option on the PDF class, like:
$pdf = new ezcDocumentPdf();
$pdf->options->driver = new ezcDocumentPdfHaruDriver();
TCPDF
TCPDF is a PDF generation library written purely in PHP, available from
tcpdf.org. To use the TCPDF driver you need to download and include its
main class before rendering the PDF. It supports all aspects of PDF rendering
required by the document component, but follows some bad coding practices:
- Throws lots of warnings and notices, which you might want to silence by
temporarily changing the error reporting level
- Reads and writes several global variables, which might or might not
interfere with your application code
- Uses eval() in several places, which results in non-cacheable opcodes
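The first point, temporarily changing the error reporting level, can be handled with a small helper. The following is only a minimal sketch and not part of the document component: withErrorReporting() is a hypothetical name, the helper relies solely on PHP's own error_reporting() function, and it assumes PHP 5.5 or later because of the finally block.

```php
<?php

/**
 * Run a callback with a temporary error reporting level.
 *
 * The previous level is restored afterwards, even if the callback throws.
 * Hypothetical helper, not part of the document component.
 */
function withErrorReporting( $level, $callback )
{
    // error_reporting() returns the level that was active before the change
    $previous = error_reporting( $level );

    try
    {
        return call_user_func( $callback );
    }
    finally
    {
        error_reporting( $previous );
    }
}

// Hypothetical usage: report only errors while the TCPDF driver renders.
// $output = withErrorReporting( E_ERROR | E_PARSE, function () use ( $pdf, $document )
// {
//     $pdf->createFromDocbook( $document->getAsDocbook() );
//     return (string) $pdf;
// } );
```

Restoring the level in a finally block keeps a failed rendering run from leaving the rest of the application with a reduced error reporting level.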
The TCPDF driver can be used after including the TCPDF source code, using:
$pdf = new ezcDocumentPdf();
$pdf->options->driver = new ezcDocumentPdfTcpdfDriver();
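Since the available renderers depend on the environment, an application might pick the driver at runtime. The following is a rough sketch under stated assumptions: choosePdfDriverClass() is a hypothetical helper, and it assumes that the PECL extension registers itself under the name "haru" and that the TCPDF main class is called TCPDF. Only the two driver class names are taken from this tutorial.

```php
<?php

/**
 * Return the name of a PDF driver class that should be usable in the
 * current environment, or null if neither library seems to be available.
 *
 * Hypothetical helper; the extension name "haru" and the class name
 * "TCPDF" are assumptions about how those libraries register themselves.
 */
function choosePdfDriverClass()
{
    if ( extension_loaded( 'haru' ) )
    {
        return 'ezcDocumentPdfHaruDriver';
    }

    // Only true after the TCPDF source code has been included
    if ( class_exists( 'TCPDF' ) )
    {
        return 'ezcDocumentPdfTcpdfDriver';
    }

    return null;
}

$driverClass = choosePdfDriverClass();
if ( $driverClass === null )
{
    echo "No supported PDF library found\n";
}
else
{
    echo "Would use driver: $driverClass\n";
    // With the document component loaded, the driver could then be set with:
    // $pdf->options->driver = new $driverClass();
}
```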
Styling the PDF
The PDF output can be styled using a CSS-like language, which assigns styles
based on the Docbook XML structure. The default styling rules are defined in
the default.css.
The first and most relevant part is the set of general layout options, which
can be defined for the common article root node in the Docbook XML file. You
can set global font options there, like:
article {
// Basic font style definitions
font-size: "12pt";
font-family: "serif";
font-weight: "normal";
font-style: "normal";
line-height: "1.4";
text-align: "left";
// Basic page layout definitions
text-columns: "1";
text-column-spacing: "10mm";
// General text layout options
orphans: "3";
widows: "3";
}
The meaning of the first set of options should be obvious from CSS. We require
each value to be wrapped in quotes for easier parsing, though.
The second set of options defines options for multi-column layouts, which are
not available on the web, but quite common in generated PDF documents. You can
specify the number of text columns, as well as the distance between the text
columns here.
The third set in this example defines lesser-known text layout options,
orphans and widows. They specify how many lines of a paragraph have to stay
together at the bottom and at the top of a page when the paragraph is split
by a page wrap.
You can, of course, apply those styles to any elements in your document, using
the common CSS addressing rules, like:
// Emphasis node anywhere in the document
emphasis { ... }
// Title element directly below a section element
section > title { ... }
// Title element anywhere below a section element
section title { ... }
// Title element with the ID "first_title"
title#first_title { ... }
// Title element with the class "foo"
title.foo { ... }
// emphasis node directly below a title with class "foo", anywhere in a
// section with the ID "first"
section#first title.foo > emphasis { ... }
The values and measures for the properties are very similar to those in CSS.
For example, the margin and padding properties accept one- to four-tuples of
values, with the same respective meaning as in CSS.
Another central formatting element, which is special to the PDF generation, is
the virtual element "page":
page {
page-size: "A4";
page-orientation: "portrait";
padding: "22mm 16mm";
}
The page-size property accepts several known page size identifiers and the
page-orientation defines the orientation of a page. You can also address any
page directly by its ID, which will be 'page_1' for the first page, or its
class, which will be "right", or "left", depending on the current page number.
A detailed description of all available PDF style options is available
here.
Measures
The properties in the PDF component accept different measures, which are:
- "mm", Millimeters, the default measure, if none is specified
- "pt", Points, 72 points per inch
- "px", Pixel, depends on the set resolution, by default also 72 points per
inch
- "in", Inch
The unit "Points" is most common for font sizes, while millimeters or inches
will probably be more useful for page paddings. You are free to choose any of
them and can even combine different units in one tuple, like:
para {
// Top margin: 12 mm; Right margin: .1 inch; Bottom margin: 10 points,
// Left margin: 1 pixel
margin: "12 .1in 10pt 1px";
}
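To make the relation between the units concrete, the following sketch converts measure strings into millimeters. It is not the component's own parser: toMillimeters() and parseMeasureTuple() are hypothetical names, and the conversion only assumes the rules listed above (millimeters as the default, 72 points per inch, pixels depending on a resolution that defaults to 72 per inch) plus the fact that one inch equals 25.4 millimeters.

```php
<?php

/**
 * Convert a single measure like "10pt" or ".1in" into millimeters.
 *
 * Hypothetical helper for illustration, not the component's parser.
 * $resolution is the number of pixels per inch used for "px" values.
 */
function toMillimeters( $value, $resolution = 72 )
{
    if ( !preg_match( '/^\s*([0-9]*\.?[0-9]+)\s*(mm|pt|px|in)?\s*$/', $value, $match ) )
    {
        throw new InvalidArgumentException( "Cannot parse measure: $value" );
    }

    $number = (float) $match[1];
    // Millimeters are the default if no unit is given
    $unit = isset( $match[2] ) && $match[2] !== '' ? $match[2] : 'mm';

    switch ( $unit )
    {
        case 'in':
            return $number * 25.4;
        case 'pt':
            return $number * 25.4 / 72;
        case 'px':
            return $number * 25.4 / $resolution;
        default:
            return $number;
    }
}

/**
 * Convert a whitespace separated tuple of measures into millimeters.
 */
function parseMeasureTuple( $tuple, $resolution = 72 )
{
    $result = array();
    foreach ( preg_split( '/\s+/', trim( $tuple ) ) as $value )
    {
        $result[] = toMillimeters( $value, $resolution );
    }
    return $result;
}

// The margin tuple from the example above, converted to millimeters
print_r( parseMeasureTuple( '12 .1in 10pt 1px' ) );
```

With the default resolution, a pixel and a point end up with the same size, which matches the description of the "px" unit above.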
PDF parts
PDF parts are additional parts in a rendered document, like headers and
footers. You can implement and register them yourself, and they are activated
by different triggers, like:
- on document creation
- on page creation
- when a document has been finished
The default implementation for headers and footers is triggered on page
creation and renders the title of the document, its author and a page number
in the header or the footer. To develop a custom PDF part you should extend
the ezcDocumentPdfPart class.
For the following document we are using a set of custom styles, as well as a
header and a footer to customize the rendered PDF document. The additional
custom CSS changes the default font and the page padding:
article {
font-family: "sans-serif";
font-size: "10pt";
}
page {
padding: "15mm 30mm";
}
The code using the custom CSS and headers and footers then looks like:
<?php

require 'tutorial_autoload.php';

// Convert an RST input file to Docbook
$document = new ezcDocumentRst();
$document->loadFile( './article/introduction.txt' );

// Load the Docbook document and create a PDF from it
$pdf = new ezcDocumentPdf();
$pdf->options->errorReporting = E_PARSE | E_ERROR | E_WARNING;

// Load a custom style sheet
$pdf->loadStyles( 'custom.css' );

// Add a customized footer
$pdf->registerPdfPart( new ezcDocumentPdfFooterPdfPart(
    new ezcDocumentPdfFooterOptions( array(
        'showDocumentTitle'  => false,
        'showDocumentAuthor' => false,
        'height'             => '10mm',
    ) )
) );

// Add a customized header
$pdf->registerPdfPart( new ezcDocumentPdfHeaderPdfPart(
    new ezcDocumentPdfFooterOptions( array(
        'showPageNumber' => false,
        'height'         => '10mm',
    ) )
) );

$pdf->createFromDocbook( $document->getAsDocbook() );
file_put_contents( __FILE__ . '.pdf', $pdf );

?>
The first part, the creation of a Docbook document from an RST document, is
just the same as in the first example.
Afterwards we load the above-mentioned custom.css as an additional style. You
can load as many styles as you want. If multiple styles are loaded, the later
ones always (partly) redefine the earlier ones.
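What "partly redefine" means can be pictured with plain PHP arrays. This is only a conceptual sketch, not the way the component stores its styles: it assumes that a later style sheet overrides just the properties it names and leaves all others untouched. The values are the article rules from the default styles and from custom.css shown above.

```php
<?php

// Some of the "article" properties from the default styles
$default = array(
    'font-family' => 'serif',
    'font-size'   => '12pt',
    'line-height' => '1.4',
);

// The "article" properties from custom.css, loaded later
$custom = array(
    'font-family' => 'sans-serif',
    'font-size'   => '10pt',
);

// Later definitions win, properties which are not redefined are kept
$effective = array_replace( $default, $custom );

print_r( $effective );
```

In this picture the font family and the font size come from custom.css, while the line height keeps its default value.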
After that, two custom PDF parts are registered, each using its option
class to configure its appearance. The footer should only show the page number,
while the header should display everything (title and author) except the page
number.
At the end of the example the document is created as usual, and looks like
this: 04_02_create_pdf_styled.pdf. Since the source document does not
include any author information, this information is not rendered in the
header either.
Hyphenating
Proper hyphenation is crucial for nice text rendering, especially for justified
paragraph formatting. Since hyphenation is highly language dependent, you can
create and use your own custom hyphenator. The default one does not do any
hyphenation, but just keeps every word as it is.
Custom hyphenators can be implemented by extending the abstract class
ezcDocumentPdfHyphenator. You only need to implement one method,
`splitWord()`, which should return possible splitting points of the given
word, as documented in the ezcDocumentPdfHyphenator class.
The custom hyphenator can be configured in the ezcDocumentPdfOptions class,
like this:
$pdf = new ezcDocumentPdf();
$pdf->options->hyphenator = new myHyphenator();
The hyphenator will then be used by all text renderers during the rendering
process.
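A custom hyphenator could look roughly like the following dictionary-based sketch. Several things here are assumptions and should be checked against the documentation of the ezcDocumentPdfHyphenator class: that splitWord() returns a list of two-element arrays, one per possible split, with the hyphen already appended to the first part, and that an empty array is acceptable for words which cannot be split. The syllable dictionary is made up for illustration, and the stand-in class at the top exists only so that the sketch runs without the document component loaded.

```php
<?php

// Stand-in for the real abstract class, so that this sketch also runs
// without the document component. Not needed in a real application.
if ( !class_exists( 'ezcDocumentPdfHyphenator' ) )
{
    abstract class ezcDocumentPdfHyphenator
    {
        abstract public function splitWord( $word );
    }
}

class myHyphenator extends ezcDocumentPdfHyphenator
{
    /**
     * Tiny illustrative dictionary: lower case word => syllables.
     */
    protected $syllables = array(
        'hyphenation' => array( 'hy', 'phen', 'ation' ),
        'document'    => array( 'doc', 'u', 'ment' ),
    );

    /**
     * Return all possible splits of the given word.
     *
     * Assumed return format: array( array( 'first-', 'rest' ), ... ).
     * Works on single byte strings only, since it uses strlen() and substr().
     */
    public function splitWord( $word )
    {
        $key = strtolower( $word );
        if ( !isset( $this->syllables[$key] ) )
        {
            // Unknown word: report no possible split (assumption)
            return array();
        }

        $parts = $this->syllables[$key];
        // There is no split point after the last syllable
        array_pop( $parts );

        $splits = array();
        $offset = 0;
        foreach ( $parts as $syllable )
        {
            $offset  += strlen( $syllable );
            $splits[] = array(
                substr( $word, 0, $offset ) . '-',
                substr( $word, $offset ),
            );
        }

        return $splits;
    }
}

$hyphenator = new myHyphenator();
print_r( $hyphenator->splitWord( 'hyphenation' ) );
```

A real hyphenator would replace the dictionary with language specific hyphenation patterns. Splitting the original word by offsets, instead of joining the dictionary syllables, keeps the capitalization of the input intact.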