The DocumentInformation Class

class PyPDF2.pdf.DocumentInformation

A class representing the basic document metadata provided in a PDF file. Instances of this class are returned by PdfFileReader.getDocumentInfo().

Each text property of the document metadata comes in two forms, e.g. author and author_raw. The non-raw property returns a TextStringObject whenever the field is present (or None when it is not), which makes it the better choice when the metadata is being displayed. The raw property can return a ByteStringObject if PyPDF2 was unable to decode the string’s text encoding; callers must handle that case themselves, so the raw properties are accessed less often.
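The extra care needed for the raw properties can be sketched as a small helper. This is a minimal illustration, not part of PyPDF2: the name metadata_to_text and the latin-1 fallback encoding are assumptions made for the example, and it relies on ByteStringObject being a bytes subclass and TextStringObject being a str subclass under Python 3.

```python
def metadata_to_text(value, fallback_encoding="latin-1"):
    """Return a display-safe str for a metadata value, or None if it is absent.

    Hypothetical helper, not a PyPDF2 function. `value` is expected to be
    None, a str (such as TextStringObject), or bytes (such as ByteStringObject).
    """
    if value is None:
        # The field is not specified in the document.
        return None
    if isinstance(value, bytes):
        # The text encoding could not be decoded upstream, so decode
        # leniently. The fallback encoding is an assumption of this sketch.
        return value.decode(fallback_encoding, errors="replace")
    # Already text; normalise to a plain str.
    return str(value)
```

With a DocumentInformation instance named docInfo, a call such as metadata_to_text(docInfo.author_raw) would then yield either a string or None, regardless of which object type the raw property returned.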

author

Read-only property accessing the document’s author. Returns a unicode string (TextStringObject) or None if the author is not specified.

author_raw

The “raw” version of author; can return a ByteStringObject.

creator

Read-only property accessing the document’s creator. If the document was converted to PDF from another format, this is the name of the application (e.g. OpenOffice) that created the original document from which it was converted. Returns a unicode string (TextStringObject) or None if the creator is not specified.

creator_raw

The “raw” version of creator; can return a ByteStringObject.

producer

Read-only property accessing the document’s producer. If the document was converted to PDF from another format, this is the name of the application (for example, OSX Quartz) that converted it to PDF. Returns a unicode string (TextStringObject) or None if the producer is not specified.

producer_raw

The “raw” version of producer; can return a ByteStringObject.

subject

Read-only property accessing the document’s subject. Returns a unicode string (TextStringObject) or None if the subject is not specified.

subject_raw

The “raw” version of subject; can return a ByteStringObject.

title

Read-only property accessing the document’s title. Returns a unicode string (TextStringObject) or None if the title is not specified.

title_raw

The “raw” version of title; can return a ByteStringObject.

Example usage:

>>> from PyPDF2 import PdfFileReader
>>> inputPdf = PdfFileReader(open("test.pdf", "rb"))
>>> docInfo = inputPdf.getDocumentInfo()
>>> docInfo.author
'Anonymous'
>>> docInfo.creator
'Hewlett Packard MFP'
>>> docInfo.producer
'Acrobat Distiller 10.0.0 (Windows)'
>>> docInfo.title
'A Test'
>>> docInfo.subject
'testing'
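
Because every property returns None when its field is not specified, code that displays metadata usually needs a fallback. The following is a rough sketch of one way to collect the five text properties into a dictionary. The names FIELDS and summarize_info, the placeholder text, and the SimpleNamespace stand-in object are all assumptions made for this example and are not part of PyPDF2.

```python
from types import SimpleNamespace

# The five non-raw text properties documented above.
FIELDS = ("author", "creator", "producer", "subject", "title")

def summarize_info(info, placeholder="(not specified)"):
    """Map each field name to its text, or to a placeholder when it is None."""
    summary = {}
    for name in FIELDS:
        value = getattr(info, name, None)
        summary[name] = placeholder if value is None else str(value)
    return summary

# Stand-in for a DocumentInformation object so the sketch runs without a PDF;
# the creator field is deliberately left unspecified.
fake_info = SimpleNamespace(
    author="Anonymous",
    creator=None,
    producer="Acrobat Distiller 10.0.0 (Windows)",
    subject="testing",
    title="A Test",
)
summary = summarize_info(fake_info)
for name, text in summary.items():
    print(f"{name}: {text}")
```

In real use, the object returned by getDocumentInfo() would be passed in place of fake_info.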