Public API

The main factory

The main factory function you should use in your application to get an instance of an openxmllib.document.Document subclass:

  • openxmllib.wordprocessing.WordprocessingDocument typically built from MS Word.
  • openxmllib.presentation.PresentationDocument typically built from MS PowerPoint.
  • openxmllib.spreadsheet.SpreadsheetDocument typically built from MS Excel.

If you misuse this factory, you’ll get a ValueError exception explaining what’s wrong.

openxmllib.openXmlDocument(path=None, file_=None, data=None, url=None, mime_type=None)

Factory function

Guesses which document type is best suited and returns an instance of the appropriate class. The caller must provide one of the path, file_, data or url parameters.

Parameters:
  • path – file path in the local filesystem to a document.
  • file_ – a file-like object for a document (must be opened in ‘rb’ mode)
  • data – the binary data of a document
  • url – the URL of a document
  • mime_type – the MIME type, if known; one of the MIME types defined in openxmllib.contenttypes.

Note that the mime_type parameter must be provided if you pass the Open XML document through the data parameter. Otherwise, if you omit it, the factory tries to guess the most appropriate type from the file extension.

Returns: An instance of an openxmllib.document.Document subclass.
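The dispatch rules described above can be sketched with the standard library alone. This is not openxmllib's code: the three classes are empty stand-ins, the extension table covers only .docx, .pptx and .xlsx, pick_document_class is a hypothetical name, and the sketch assumes that exactly one source parameter must be given.

```python
import os

# Hypothetical stand-ins for the real classes in openxmllib.wordprocessing,
# openxmllib.presentation and openxmllib.spreadsheet.
class WordprocessingDocument: ...
class PresentationDocument: ...
class SpreadsheetDocument: ...

# Standard MIME types of the three main Open XML formats.
EXT_TO_MIME = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}
MIME_TO_CLASS = {
    EXT_TO_MIME[".docx"]: WordprocessingDocument,
    EXT_TO_MIME[".pptx"]: PresentationDocument,
    EXT_TO_MIME[".xlsx"]: SpreadsheetDocument,
}


def pick_document_class(path=None, file_=None, data=None, url=None, mime_type=None):
    """Sketch of the factory's dispatch rules (assumption: one source only)."""
    sources = [s for s in (path, file_, data, url) if s is not None]
    if len(sources) != 1:
        raise ValueError("provide exactly one of path, file_, data or url")
    if mime_type is None:
        if data is not None:
            # Raw bytes carry no file name, so there is nothing to guess from.
            raise ValueError("mime_type is required when using data")
        # Guess from the extension of a path, a URL or a file object's name.
        name = path or url or getattr(file_, "name", "")
        mime_type = EXT_TO_MIME.get(os.path.splitext(name)[1].lower())
    try:
        return MIME_TO_CLASS[mime_type]
    except KeyError:
        raise ValueError(f"unsupported MIME type: {mime_type!r}") from None
```

For example, pick_document_class(path="report.docx") selects the word processing stand-in, while pick_document_class(data=b"...") raises ValueError because no MIME type was given.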

The document classes

Base class

All document classes inherit from openxmllib.document.Document.

class openxmllib.document.Document(file_, mime_type=None)

Base class for handling Open XML documents (all types).

Must be subclassed for each type of document (word processing, ...).

Parameters:
  • file_ – an open file-like object for the document; it must be opened in ‘rb’ mode
  • mime_type – the MIME type of the file, possibly determined by openxmllib.openXmlDocument()

allProperties

Helper that merges core, extended and custom properties

Returns: mapping of all properties
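A minimal sketch of such a merge, assuming the three property sets are plain dicts and that, on a key clash, core properties win over extended ones, which win over custom ones. The function name all_properties is hypothetical, and the library's actual precedence is not stated on this page.

```python
from collections import ChainMap


def all_properties(core, extended, custom):
    """Merge three property mappings into a single dict (hypothetical helper)."""
    # ChainMap looks keys up left to right, so the first mapping holding a
    # key wins; dict() then flattens the chain into an ordinary dict.
    return dict(ChainMap(core, extended, custom))
```

Using ChainMap avoids copying each mapping into an intermediate dict before the final flatten and makes the precedence order explicit in the argument order.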

classmethod canProcessFilename(filename)

Checks whether this class can process a file, based on its name.

Parameters:
  • filename – a file name such as ‘mydoc.docx’
Returns: True if this class can process such a file
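One possible shape for this check, assuming it matches the file name against the glob keys of a table like Document._extpattern_to_mime (see Other attributes). SketchDocument and its two-entry table are hypothetical, not openxmllib's actual classes.

```python
import fnmatch


class SketchDocument:
    # Hypothetical table in the {glob-expr: mime-type} shape that subclasses
    # are expected to provide through _extpattern_to_mime.
    _extpattern_to_mime = {
        "*.docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "*.docm": "application/vnd.ms-word.document.macroEnabled.12",
    }

    @classmethod
    def canProcessFilename(cls, filename):
        # Lower-case the name so 'MyDoc.DOCX' matches '*.docx' on every OS
        # (fnmatch is only case-insensitive on Windows).
        name = filename.lower()
        return any(fnmatch.fnmatch(name, pattern)
                   for pattern in cls._extpattern_to_mime)
```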

classmethod canProcessMime(mime_type)

Checks whether this class can process the given MIME type.

Parameters:
  • mime_type – a MIME type such as ‘application/xxx’
Returns: True if this class can process such a MIME type
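A matching sketch for MIME types, under the same assumption that the check consults the values of an _extpattern_to_mime-style table. SketchDocument and its spreadsheet entries are hypothetical.

```python
class SketchDocument:
    # Hypothetical {glob-expr: mime-type} table for a spreadsheet class.
    _extpattern_to_mime = {
        "*.xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        "*.xlsm": "application/vnd.ms-excel.sheet.macroEnabled.12",
    }

    @classmethod
    def canProcessMime(cls, mime_type):
        # A MIME type is accepted if it appears among the table's values.
        return mime_type in cls._extpattern_to_mime.values()
```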

content_types

An openxmllib.contenttypes.ContentTypes object for this document.
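Every Open XML package declares the content type of its parts in a [Content_Types].xml part. The sketch below parses that part with xml.etree.ElementTree to show what it holds: Default entries keyed by file extension and Override entries keyed by part name. It does not reproduce the ContentTypes API; parse_content_types and content_type_of are hypothetical helpers, and part names are compared case-sensitively for simplicity, although OPC treats them case-insensitively.

```python
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml"
            ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""


def parse_content_types(xml_bytes):
    """Return ({extension: type}, {part name: type}) from [Content_Types].xml."""
    root = ET.fromstring(xml_bytes)
    defaults = {el.get("Extension").lower(): el.get("ContentType")
                for el in root.iter(CT_NS + "Default")}
    overrides = {el.get("PartName"): el.get("ContentType")
                 for el in root.iter(CT_NS + "Override")}
    return defaults, overrides


def content_type_of(part_name, defaults, overrides):
    # An Override for the exact part wins over the Default for its extension.
    if part_name in overrides:
        return overrides[part_name]
    return defaults.get(part_name.rsplit(".", 1)[-1].lower())
```

With SAMPLE, /word/document.xml resolves through its Override, while /docProps/core.xml falls back to the Default registered for the xml extension.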

coreProperties

Document core properties (author, ...), similar to Dublin Core.

Returns: mapping of standard metadata like {'title': 'blah', 'language': 'fr-FR', ...}
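In an Open XML package, core properties are conventionally stored in docProps/core.xml as Dublin Core elements. The sketch builds a tiny in-memory package and flattens that part into a mapping shaped like the one above. It assumes the part sits at that conventional path (a robust reader would locate it through _rels/.rels), and core_properties is a hypothetical helper, not openxmllib's implementation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>blah</dc:title>
  <dc:language>fr-FR</dc:language>
  <dc:creator>Jane Doe</dc:creator>
</cp:coreProperties>"""


def core_properties(zip_file):
    """Map each child of docProps/core.xml to its text, keyed by local name."""
    root = ET.fromstring(zip_file.read("docProps/core.xml"))
    props = {}
    for el in root:
        local_name = el.tag.split("}", 1)[-1]  # drop the '{namespace}' prefix
        props[local_name] = (el.text or "").strip()
    return props


# Build a minimal in-memory package holding only the core properties part.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", CORE_XML)

with zipfile.ZipFile(buf) as zf:
    props = core_properties(zf)
print(props)
```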

customProperties

Document custom properties added by the document author.

We cannot convert property values to the types declared with the http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes namespace.

Returns: mapping of metadata
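Custom properties are conventionally stored in docProps/custom.xml, where each property element wraps one typed value from the docPropsVTypes namespace (vt:lpwstr, vt:bool, ...). This hypothetical custom_properties helper keeps every value as raw text, which is one way to live with the conversion limitation noted above; it is a sketch, not the library's code.

```python
import xml.etree.ElementTree as ET

CUSTOM_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"

CUSTOM_XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
            xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" name="Client">
    <vt:lpwstr>ACME</vt:lpwstr>
  </property>
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="3" name="Reviewed">
    <vt:bool>true</vt:bool>
  </property>
</Properties>"""


def custom_properties(xml_bytes):
    """Return {name: text}; the vt:* value is kept as an unconverted string."""
    root = ET.fromstring(xml_bytes)
    props = {}
    for prop in root.iter(CUSTOM_NS + "property"):
        value_el = next(iter(prop), None)  # the single vt:* child, if any
        props[prop.get("name")] = value_el.text if value_el is not None else None
    return props
```

Note that the vt:bool value comes back as the string 'true', not the Python value True.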

extendedProperties

Additional properties set automatically by the office application.

Returns: mapping of metadata like {'Pages': '14', ...}
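Extended properties are conventionally stored in docProps/app.xml. A minimal sketch (hypothetical helper extended_properties) keeps simple text elements such as Pages and skips structured entries like HeadingPairs, which hold vt:vector values; whether openxmllib treats those entries the same way is an assumption.

```python
import xml.etree.ElementTree as ET

APP_XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
            xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
  <Application>Microsoft Office Word</Application>
  <Pages>14</Pages>
  <Words>3021</Words>
  <HeadingPairs>
    <vt:vector size="2" baseType="variant">
      <vt:variant><vt:lpstr>Title</vt:lpstr></vt:variant>
      <vt:variant><vt:i4>1</vt:i4></vt:variant>
    </vt:vector>
  </HeadingPairs>
</Properties>"""


def extended_properties(xml_bytes):
    """Return {local name: text} for the leaf elements of docProps/app.xml."""
    root = ET.fromstring(xml_bytes)
    props = {}
    for el in root:
        if len(el):  # has children: a structured entry such as HeadingPairs
            continue
        props[el.tag.split("}", 1)[-1]] = (el.text or "").strip()
    return props
```

As in the mapping above, numeric values such as Pages stay strings ('14').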
filename

The file name of the document.

indexableText(include_properties=True)

Words found in the various texts of the document.

Parameters:
  • include_properties – if true, also adds words from the document properties
Returns: space-separated words of the document.
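For a word processing document, a rough equivalent can be sketched by reading the w:t text runs of word/document.xml. The helper indexable_text and its properties argument are hypothetical; runs are concatenated per paragraph so that a word split across runs stays whole, and the \w+ regular expression drops punctuation.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

DOCUMENT_XML = b"""<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t xml:space="preserve"> world, </w:t></w:r></w:p>
    <w:p><w:r><w:t>second paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def indexable_text(document_xml, properties=None, include_properties=True):
    """Collect the words of all paragraphs (and optionally property values)."""
    root = ET.fromstring(document_xml)
    # Join the runs of one paragraph without separators, then keep
    # paragraphs apart with a space.
    chunks = ["".join(t.text or "" for t in p.iter(W_NS + "t"))
              for p in root.iter(W_NS + "p")]
    if include_properties and properties:
        chunks.extend(str(value) for value in properties.values())
    return " ".join(re.findall(r"\w+", " ".join(chunks)))
```

Here indexable_text(DOCUMENT_XML, {"title": "Quarterly report"}) yields the paragraph words followed by the title words, and passing include_properties=False leaves only the paragraph words.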

mimeType

The official MIME type for this document, guessed from the extension of the openxmllib.document.Document.filename attribute, as opposed to the openxmllib.document.Document.mime_type attribute.

Returns: application/xxx for this file

mime_type

The MIME type of the document

Other attributes

Document._extpattern_to_mime

A mapping like {glob-expr: mime-type, ...}; must be overridden by subclasses.

Document._text_extractors

A sequence of extractor objects for text extraction; must be overridden by subclasses.
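The two hooks could be filled in by a subclass along these lines. Everything here is hypothetical: BaseDocument stands in for openxmllib.document.Document, and TextExtractor is an invented (part glob, text element) pair, since the shape of openxmllib's real extractor objects is not described on this page.

```python
import fnmatch
from collections import namedtuple

# Hypothetical extractor description: which package parts to read and which
# XML element holds their text.
TextExtractor = namedtuple("TextExtractor", ["part_glob", "text_tag"])


class BaseDocument:
    """Stand-in for openxmllib.document.Document, reduced to the two hooks."""
    _extpattern_to_mime = {}
    _text_extractors = ()

    @classmethod
    def extractors_for(cls, part_name):
        # Select the extractors whose glob matches a package part name.
        return [e for e in cls._text_extractors
                if fnmatch.fnmatch(part_name, e.part_glob)]


class SpreadsheetSketch(BaseDocument):
    # Each concrete subclass supplies its own tables.
    _extpattern_to_mime = {
        "*.xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    }
    _text_extractors = (
        # In SpreadsheetML most cell text lives in the shared strings part.
        TextExtractor("xl/sharedStrings.xml",
                      "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}t"),
    )
```

Keeping both tables as class attributes lets the base class implement the generic logic once while each subclass only declares data.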

Hint

Metadata

The various metadata provided by openxmllib.document.Document.coreProperties, openxmllib.document.Document.extendedProperties and openxmllib.document.Document.customProperties depend on the application used to build the document. To see which properties / metadata are set on your document, run the openxmlinfo command-line tool (see Command line: openxmlinfo): openxmlinfo -vv metadata your-file.
