Do various operations on HWPv5 files.
Usage:
hwp5proc <command> [<args>...]
hwp5proc [--version]
hwp5proc [--help]
hwp5proc [--help-commands]
Options:
--version Show version and copyright information.
-h --help Show this help message.
--help-commands Show available commands.
Print the HWP file format version of <hwp5file>.
Usage:
hwp5proc version [options] <hwp5file>
hwp5proc version --help
Options:
-h --help Show this screen
--loglevel=<level> Set log level.
--logfile=<file> Set log file.
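To survey the format versions of many files at once, `hwp5proc version` can be called in a loop. The following is a minimal POSIX sh sketch. The function name `hwp5_versions` is hypothetical, and it assumes that `hwp5proc version` prints only the version string on a single line and exits non-zero on failure.

```sh
# Hypothetical helper: print "<file><TAB><version>" for each HWPv5 file given.
# Assumes `hwp5proc version` writes just the version string to stdout and
# returns a non-zero status when the file cannot be read.
hwp5_versions() {
    for f in "$@"; do
        v=$(hwp5proc version "$f") || v="ERROR"
        printf '%s\t%s\n' "$f" "$v"
    done
}
```

Usage: $ hwp5_versions samples/*.hwp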
Print the file header of <hwp5file>.
Usage:
hwp5proc header [options] <hwp5file>
hwp5proc header -h
Options:
-h --help Show this screen
--loglevel=<level> Set log level.
--logfile=<file> Set log file.
Print summary information of <hwp5file>.
Usage:
hwp5proc summaryinfo [options] <hwp5file>
hwp5proc summaryinfo --help
Options:
-h --help Show this screen
--loglevel=<level> Set log level.
--logfile=<file> Set log file.
List the streams in <hwp5file>.
Usage:
hwp5proc ls [--loglevel=<loglevel>] [--logfile=<logfile>]
            [--vstreams | --ole]
            <hwp5file>
hwp5proc ls --help
Options:
-h --help Show this screen
--loglevel=<level> Set log level.
--logfile=<file> Set log file.
--vstreams Process with virtual streams (i.e., parsed/converted
    forms of real streams).
--ole Treat <hwp5file> as an OLE Compound File. As a result,
    some streams are presented as-is (i.e., not decompressed).
Example: List without virtual streams:
$ hwp5proc ls samples/sample-5017.hwp
\x05HwpSummaryInformation
BinData/BIN0002.jpg
BinData/BIN0002.png
BinData/BIN0003.png
BodyText/Section0
DocInfo
DocOptions/_LinkDoc
FileHeader
PrvImage
PrvText
Scripts/DefaultJScript
Scripts/JScriptVersion
Example: List virtual streams too:
$ hwp5proc ls --vstreams samples/sample-5017.hwp
\x05HwpSummaryInformation
\x05HwpSummaryInformation.txt
BinData/BIN0002.jpg
BinData/BIN0002.png
BinData/BIN0003.png
BodyText/Section0
BodyText/Section0.models
BodyText/Section0.records
BodyText/Section0.xml
BodyText.xml
DocInfo
DocInfo.models
DocInfo.records
DocInfo.xml
DocOptions/_LinkDoc
FileHeader
FileHeader.txt
PrvImage
PrvText
PrvText.utf8
Scripts/DefaultJScript
Scripts/JScriptVersion
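Comparing the two listings above shows which entries are virtual. The following is a minimal POSIX sh sketch using `sort` and `comm`; the helper name `hwp5_vstreams_only` is hypothetical and not part of hwp5proc.

```sh
# Hypothetical helper: print only the virtual streams of an HWPv5 file,
# i.e. the entries listed with --vstreams but not without it.
hwp5_vstreams_only() {
    tmp=$(mktemp -d) || return 1
    hwp5proc ls "$1" | LC_ALL=C sort > "$tmp/real"
    hwp5proc ls --vstreams "$1" | LC_ALL=C sort > "$tmp/all"
    # comm -13 hides lines unique to "real" and lines common to both files,
    # leaving the lines that appear only in the --vstreams listing.
    LC_ALL=C comm -13 "$tmp/real" "$tmp/all"
    rm -rf "$tmp"
}
```

Usage: $ hwp5_vstreams_only samples/sample-5017.hwp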
Extract the specified stream from <hwp5file> to the standard output.
Usage:
hwp5proc cat [--loglevel=<loglevel>] [--logfile=<logfile>]
             [--vstreams | --ole]
             <hwp5file> <stream>
hwp5proc cat --help
Options:
-h --help Show this screen
--loglevel=<level> Set log level.
--logfile=<file> Set log file.
--vstreams Process with virtual streams (i.e., parsed/converted
    forms of real streams).
--ole Treat <hwp5file> as an OLE Compound File. As a result,
    some streams are presented as-is (i.e., not decompressed).
Example:
$ hwp5proc cat samples/sample-5017.hwp BinData/BIN0002.jpg | file -
$ hwp5proc cat samples/sample-5017.hwp BinData/BIN0002.jpg > BIN0002.jpg
$ hwp5proc cat samples/sample-5017.hwp PrvText | iconv -f utf-16le -t utf-8
$ hwp5proc cat --vstreams samples/sample-5017.hwp PrvText.utf8
$ hwp5proc cat --vstreams samples/sample-5017.hwp FileHeader.txt
ccl: 0
cert_drm: 0
cert_encrypted: 0
cert_signature_extra: 0
cert_signed: 0
compressed: 1
distributable: 0
drm: 0
history: 0
password: 0
script: 0
signature: HWP Document File
version: 5.0.1.7
xmltemplate_storage: 0
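Because FileHeader.txt is printed as "key: value" lines, a single field can be picked out with `sed`. This is a minimal POSIX sh sketch; the helper name `hwp5_header_field` is hypothetical, and it assumes the field name is a plain identifier such as `version` or `compressed` (it is used unescaped inside a sed pattern).

```sh
# Hypothetical helper: print the value of one "key: value" field from the
# FileHeader.txt virtual stream, e.g. `hwp5_header_field file.hwp compressed`.
# The key is inserted into the sed pattern as-is, so it should be a plain name.
hwp5_header_field() {
    hwp5proc cat --vstreams "$1" FileHeader.txt |
        sed -n "s/^$2: //p"
}
```

Usage: $ hwp5_header_field samples/sample-5017.hwp version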
Extract the streams in <hwp5file> into a directory.
Usage:
hwp5proc unpack [--loglevel=<loglevel>] [--logfile=<logfile>]
                [--vstreams | --ole]
                <hwp5file> [<out-directory>]
hwp5proc unpack --help
Options:
-h --help Show this screen
--loglevel=<level> Set log level.
--logfile=<file> Set log file.
--vstreams Process with virtual streams (i.e., parsed/converted
    forms of real streams).
--ole Treat <hwp5file> as an OLE Compound File. As a result,
    some streams are presented as-is (i.e., not decompressed).
Example:
$ hwp5proc unpack samples/sample-5017.hwp
$ ls sample-5017
Example:
$ hwp5proc unpack --vstreams samples/sample-5017.hwp
$ cat sample-5017/PrvText.utf8
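Several files can be unpacked side by side by passing an explicit <out-directory> for each one. This is a minimal POSIX sh sketch; the helper name `hwp5_unpack_all` is hypothetical, and it assumes that `hwp5proc unpack` creates <out-directory> itself when it does not exist (the sketch only creates the common parent).

```sh
# Hypothetical helper: unpack several HWPv5 files, each into its own
# subdirectory of <out-root> named after the file without its ".hwp" suffix.
# Assumes `hwp5proc unpack` creates the per-file directory if needed.
hwp5_unpack_all() {
    outroot=$1
    shift
    mkdir -p "$outroot" || return 1
    for f in "$@"; do
        name=$(basename "$f" .hwp)
        hwp5proc unpack "$f" "$outroot/$name" || return 1
    done
}
```

Usage: $ hwp5_unpack_all unpacked samples/*.hwp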
Print the record structure of <record-stream> in <hwp5file>.
Usage:
hwp5proc records [--simple | --json | --raw | --raw-header | --raw-payload]
                 [--treegroup=<treegroup> | --range=<range>]
                 [--loglevel=<loglevel>] [--logfile=<logfile>]
                 <hwp5file> <record-stream>
hwp5proc records [--simple | --json | --raw | --raw-header | --raw-payload]
                 [--treegroup=<treegroup> | --range=<range>]
                 [--loglevel=<loglevel>] [--logfile=<logfile>]
hwp5proc records --help
Options:
-h --help Show this screen
--loglevel=<level> Set log level.
--logfile=<file> Set log file.
--simple Print records as a simple tree.
--json Print records as JSON.
--raw Print records as-is.
--raw-header Print record headers as-is.
--raw-payload Print record payloads as-is.
--range=<range> Print the records specified by <range>.
--treegroup=<treegroup>
    Print the records in <treegroup>.
<hwp5file> HWPv5 files (*.hwp)
<record-stream> Record-structured internal streams
    (e.g. DocInfo, BodyText/*).
<range> Specifies the range of the records:
    N-M means from record N to record M-1 (M is excluded);
    N means record N only.
<treegroup> Specifies the N-th subtree of the record structure.
Example:
$ hwp5proc records samples/sample-5017.hwp DocInfo
Example:
$ hwp5proc records samples/sample-5017.hwp DocInfo --range=0-2
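The <range> rule is a half-open interval, so --range=0-2 selects records 0 and 1. The following POSIX sh function is only an illustration of that rule; `expand_range` is a hypothetical name, it is not how hwp5proc parses the option internally, and it does no input validation.

```sh
# Illustration only: list the record numbers selected by a <range> value.
# "N-M" is half-open (N up to M-1); a bare "N" selects record N alone.
expand_range() {
    case $1 in
        *-*) start=${1%-*}; end=${1#*-} ;;
        *)   start=$1; end=$(( $1 + 1 )) ;;
    esac
    i=$start
    while [ "$i" -lt "$end" ]; do
        printf '%s\n' "$i"
        i=$(( i + 1 ))
    done
}
```

Usage: $ expand_range 0-2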
If neither <hwp5file> nor <record-stream> is specified, the record stream is read from the standard input.
Example:
$ hwp5proc records --raw samples/sample-5017.hwp DocInfo --range=0-2 > tmp.rec
$ hwp5proc records < tmp.rec
Print parsed binary models in the specified <record-stream>.
Usage:
hwp5proc models [--simple | --json | --format=<format> | --events]
                [--treegroup=<treegroup> | --seqno=<seqno>]
                [--loglevel=<loglevel>] [--logfile=<logfile>]
                (<hwp5file> <record-stream> | -V <version>)
hwp5proc models --help
Options:
-h --help Show this screen
--loglevel=<level> Set log level.
--logfile=<file> Set log file.
--simple Print records as a simple tree.
--json Print records as JSON.
--format=<format> Print each record according to <format>
    (e.g. '%(level)s %(tagname)s\n').
--treegroup=<treegroup>
    Print the records in <treegroup>.
    <treegroup> specifies the N-th subtree of the
    record structure.
--seqno=<seqno> Print the model of the <seqno>-th record.
-V <version>, --file-format-version=<version>
    Specifies the HWPv5 file format version.
<hwp5file> HWPv5 files (*.hwp)
<record-stream> Record-structured internal streams
    (e.g. DocInfo, BodyText/*).
Example:
$ hwp5proc models samples/sample-5017.hwp DocInfo
$ hwp5proc models samples/sample-5017.hwp BodyText/Section0
$ hwp5proc models samples/sample-5017.hwp docinfo
$ hwp5proc models samples/sample-5017.hwp bodytext/0
Example:
$ hwp5proc models --simple samples/sample-5017.hwp bodytext/0
$ hwp5proc models --format='%(level)s %(tagname)s\n' \
      samples/sample-5017.hwp bodytext/0
Example:
$ hwp5proc models --simple --treegroup=1 samples/sample-5017.hwp bodytext/0
$ hwp5proc models --simple --seqno=4 samples/sample-5017.hwp bodytext/0
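The --format option combines well with standard text tools. This minimal POSIX sh sketch counts how often each tag name occurs in a record stream, most frequent first; the helper name `hwp5_tag_counts` is hypothetical, and it relies only on the `%(tagname)s` field and the `\n` escape used in the --format example above.

```sh
# Hypothetical helper: count occurrences of each record tag name in a
# record stream (most frequent first), using the documented --format option.
hwp5_tag_counts() {
    hwp5proc models --format='%(tagname)s\n' "$1" "$2" |
        sort | uniq -c | sort -rn
}
```

Usage: $ hwp5_tag_counts samples/sample-5017.hwp bodytext/0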
If neither <hwp5file> nor <record-stream> is specified, the record stream is read from the standard input, and the input is assumed to be in the file format version given by the -V option.
Example:
$ hwp5proc cat samples/sample-5017.hwp BodyText/Section0 > Section0.bin
$ hwp5proc models -V 5.0.1.7 < Section0.bin
Find record models with specified predicates.
Usage:
hwp5proc find [--model=<model-name> | --tag=<hwptag>]
              [--incomplete] [--dump] [--format=<format>]
              [--loglevel=<loglevel>] [--logfile=<logfile>]
              (--from-stdin | <hwp5files>...)
hwp5proc find --help
Options:
-h --help Show this screen
--loglevel=<level> Set log level.
--logfile=<file> Set log file.
--from-stdin Read filenames from the standard input.
--model=<model-name> Filter by record model name.
--tag=<hwptag> Filter by record tag (an HWPTAG_* name or its number).
--incomplete Filter by incompletely parsed content.
--format=<format> Record output format, e.g.
    %(filename)s %(stream)s %(seqno)s %(type)s
--dump Dump the matching records.
<hwp5files>... HWPv5 files (*.hwp)
Example: Find paragraphs:
$ hwp5proc find --model=Paragraph samples/*.hwp
$ hwp5proc find --tag=HWPTAG_PARA_TEXT samples/*.hwp
$ hwp5proc find --tag=66 samples/*.hwp
Example: Find and dump HWPTAG_LIST_HEADER records that are parsed incompletely:
$ hwp5proc find --tag=HWPTAG_LIST_HEADER --incomplete --dump samples/*.hwp
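With --from-stdin, the file list can come from another command instead of a shell glob. This is a minimal POSIX sh sketch that feeds every *.hwp file under a directory tree to `hwp5proc find`; the helper name `hwp5_find_tree` is hypothetical, and it assumes --from-stdin reads one filename per line, so names containing newlines are not supported.

```sh
# Hypothetical helper: run `hwp5proc find` over every *.hwp file below a
# directory, passing the filenames on stdin via --from-stdin.
# Assumes one filename per line; remaining arguments are passed through.
hwp5_find_tree() {
    dir=$1
    shift
    find "$dir" -type f -name '*.hwp' | hwp5proc find --from-stdin "$@"
}
```

Usage: $ hwp5_find_tree samples --model=Paragraph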
Transform an HWPv5 file into an XML document.
Note
This command is experimental. Its output format is subject to change at any time.
Usage:
hwp5proc xml [--embedbin]
             [--no-xml-decl]
             [--output=<file>]
             [--format=<format>]
             [--loglevel=<loglevel>] [--logfile=<logfile>]
             <hwp5file>
hwp5proc xml --help
Options:
-h --help Show this screen
--loglevel=<level> Set log level.
--logfile=<file> Set log file.
--embedbin Embed BinData/* streams in the output XML.
--no-xml-decl Do not output the <?xml ... ?> XML declaration.
--output=<file> Write the output to <file> instead of the standard output.
<hwp5file> HWPv5 files (*.hwp)
<format> Either "flat" or "nested" (default: "nested").
Example:
$ hwp5proc xml samples/sample-5017.hwp > sample-5017.xml
$ xmllint --format sample-5017.xml
With the --embedbin option, the BinData/* streams are embedded in the output XML as base64-encoded data.
Example:
$ hwp5proc xml --embedbin samples/sample-5017.hwp > sample-5017.xml
$ xmllint --format sample-5017.xml
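To convert many files, the --output option avoids one shell redirect per file. This is a minimal POSIX sh sketch; the helper name `hwp5_xml_all` is hypothetical, and it writes foo.xml next to each foo.hwp.

```sh
# Hypothetical helper: convert each HWPv5 file to XML next to it
# (foo.hwp -> foo.xml) using the documented --output option.
# Stops at the first file that fails to convert.
hwp5_xml_all() {
    for f in "$@"; do
        hwp5proc xml --output="${f%.hwp}.xml" "$f" || return 1
    done
}
```

Usage: $ hwp5_xml_all samples/*.hwp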