The main factory function you should use in your application to get an instance of an openxmllib.document.Document subclass:
If you misuse this factory, you’ll get a ValueError exception that says what’s wrong.
Factory function
Guesses which document type is best suited and returns a document of the appropriate type. You must provide one of the path, file_, data or url parameters.
Note that the mime_type parameter must be provided if you pass the Open XML document through the data parameter. In the other cases, if you don’t provide one, we’ll try to guess the most appropriate MIME type from the file extension.
Returns: | A subclass of openxmllib.document.Document. |
---|
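The argument rules described above can be sketched as follows. This is a minimal illustration and not the library's code: `resolve_mime_type` and `EXTENSION_TO_MIME` are hypothetical names, and the exact checks and error messages of the real factory may differ.

```python
import os

# Hypothetical extension table, for illustration only.
EXTENSION_TO_MIME = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def resolve_mime_type(path=None, file_=None, data=None, url=None, mime_type=None):
    """Return the MIME type to use, or raise ValueError on misuse (sketch)."""
    if path is None and file_ is None and data is None and url is None:
        raise ValueError("Provide one of: path, file_, data or url")
    if mime_type is not None:
        # An explicit MIME type always wins over guessing.
        return mime_type
    if data is not None:
        # Raw data has no file name, so there is no extension to guess from.
        raise ValueError("mime_type is required when the document is passed as data")
    # Otherwise guess from the extension of whichever named source we have.
    name = path or url or getattr(file_, "name", "")
    extension = os.path.splitext(name)[1].lower()
    try:
        return EXTENSION_TO_MIME[extension]
    except KeyError:
        raise ValueError("Cannot guess a MIME type for %r" % name)
```

Raising ValueError for every misuse mirrors the behaviour stated above, so callers only need a single `except ValueError` around the factory call.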
All document classes inherit from openxmllib.document.Document.
Base class for handling Open XML documents (all types)
Must be subclassed for various types of documents (word processing, ...)
Helper that merges core, extended and custom properties
Returns: | mapping of all properties |
---|
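As a rough illustration of what such a merge amounts to, the sketch below combines three plain dictionaries. The function name `merge_properties` is hypothetical, and the precedence on key clashes (later mappings win) is an assumption, not something this page states.

```python
def merge_properties(core, extended, custom):
    """Merge three property mappings into a new dict (illustrative sketch).

    Assumption: on a key clash, extended overrides core and custom overrides both.
    """
    merged = {}
    for mapping in (core, extended, custom):
        merged.update(mapping)
    return merged


core = {"title": "blah", "language": "fr-FR"}
extended = {"Pages": "14"}
custom = {"Reviewed": "true"}
all_properties = merge_properties(core, extended, custom)
```

The inputs are left untouched, so the individual mappings remain usable after merging.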
Check whether we can process a file, based on its name
Returns: | True if we can process the file |
---|
Check whether we can process a given MIME type
Returns: | True if we can process the MIME type |
---|
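Both checks can be pictured as lookups against a class-level mapping of glob patterns to MIME types. The sketch below is a stand-alone illustration: the class `WordprocessingSketch`, its attribute `pattern_to_mime` and the method names are hypothetical and do not reproduce the library's own names.

```python
import fnmatch


class WordprocessingSketch:
    """Hypothetical document class holding a {glob-expr: mime-type} mapping."""

    pattern_to_mime = {
        "*.docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "*.dotx": "application/vnd.openxmlformats-officedocument.wordprocessingml.template",
    }

    @classmethod
    def can_process_filename(cls, filename):
        # Compare case-insensitively so "REPORT.DOCX" matches "*.docx".
        name = filename.lower()
        return any(fnmatch.fnmatchcase(name, pattern) for pattern in cls.pattern_to_mime)

    @classmethod
    def can_process_mime(cls, mime_type):
        return mime_type in cls.pattern_to_mime.values()
```

Keeping both checks as class methods means a caller can probe each document class without first opening the file.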
An openxmllib.contenttypes.ContentTypes object for this document
Document core properties (author, ...) similar to DublinCore
Returns: | mapping of standard metadata like {'title': 'blah', 'language': 'fr-FR', ...} |
---|
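To show where such a mapping can come from, here is a minimal standard-library sketch that reads the core properties part of an Open XML package. It assumes the part lives at the conventional location `docProps/core.xml`; the function `read_core_properties` is hypothetical and the library's own lookup and parsing may work differently.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_core_properties(package_bytes, part_name="docProps/core.xml"):
    """Map each core property's local name to its text (illustrative sketch)."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read(part_name))
    properties = {}
    for element in root:
        # Tags look like "{namespace}title"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if element.text and element.text.strip():
            properties[local_name] = element.text.strip()
    return properties


# Build a tiny in-memory package so the sketch runs without a real document.
CORE_XML = (
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>blah</dc:title><dc:language>fr-FR</dc:language>"
    "</cp:coreProperties>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("docProps/core.xml", CORE_XML)

core = read_core_properties(buffer.getvalue())
```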
Document custom properties added by the document author.
We cannot convert the properties to the value types indicated by the http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes namespace
Returns: | mapping of metadata |
---|
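The sketch below illustrates what leaving values unconverted means in practice: each custom property wraps its value in a typed element from the docPropsVTypes namespace, and the text is returned as a plain string whatever that type is. `read_custom_properties` is a hypothetical helper written for this example, not the library's implementation.

```python
import xml.etree.ElementTree as ET

CUSTOM_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"


def read_custom_properties(custom_xml):
    """Map each custom property name to its raw text value (illustrative sketch)."""
    root = ET.fromstring(custom_xml)
    properties = {}
    for prop in root.findall("{%s}property" % CUSTOM_NS):
        # The single child is the typed value (vt:bool, vt:i4, vt:lpwstr, ...).
        value = prop[0].text if len(prop) else None
        properties[prop.get("name")] = value
    return properties


CUSTOM_XML = (
    '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties" '
    'xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">'
    '<property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" name="Reviewed">'
    "<vt:bool>true</vt:bool></property>"
    '<property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="3" name="Revision">'
    "<vt:i4>7</vt:i4></property>"
    "</Properties>"
)
custom = read_custom_properties(CUSTOM_XML)
```

Here the boolean and the integer both come back as strings, so any type conversion is left to the caller.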
Additional document automatic properties provided by the office app
Returns: | mapping of metadata like {'Pages': '14', ...} |
---|
The file name of the document
Words found in the various texts of the document.
Returns: | Space separated words of the document. |
---|
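For a word processing document, the idea can be sketched by collecting the text runs of the main document part and joining the words with spaces. This is an assumption-laden illustration: `indexable_text` is a hypothetical function, the `\w+` tokenization is a choice made for this example, and whether the library deduplicates or otherwise normalizes words is not stated here.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def indexable_text(document_xml):
    """Return the words of all w:t text runs, separated by spaces (sketch)."""
    root = ET.fromstring(document_xml)
    words = []
    for text_node in root.iter("{%s}t" % W_NS):
        # Split each run into words, dropping punctuation.
        words.extend(re.findall(r"\w+", text_node.text or ""))
    return " ".join(words)


DOCUMENT_XML = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
    "<w:p><w:r><w:t>Hello, world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Open XML</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
text = indexable_text(DOCUMENT_XML)
```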
The official MIME type for this document, guessed from the extension of the openxmllib.document.Document.filename attribute, as opposed to the openxmllib.document.Document.mime_type attribute.
Returns: | application/xxx for this file |
---|
The MIME type of the document
A mapping like {glob-expr: mime-type, ...}. Must be overridden by subclasses.
A sequence of extractor objects for text extraction. Must be overridden by subclasses.
Hint
Metadata
The metadata provided by openxmllib.document.Document.coreProperties, openxmllib.document.Document.extendedProperties and openxmllib.document.Document.customProperties depend on the application used to build the document. To see which properties / metadata are set on your document, use the openxmlinfo tool (see Command line: openxmlinfo) with the command: openxmlinfo -vv metadata your-file.