hwp5proc: HWPv5 processor

Performs various operations on HWPv5 files.

Usage:

hwp5proc <command> [<args>...]
hwp5proc [--version]
hwp5proc [--help]
hwp5proc [--help-commands]

   --version        Show version and copyright information.
-h --help           Show help messages.
   --help-commands  Show available commands.

Command: version

Prints the HWP file format version of <hwp5file>.

Usage:

hwp5proc version [options] <hwp5file>
hwp5proc version --help

Options:

-h --help              Show this screen
   --loglevel=<level>  Set log level.
   --logfile=<file>    Set log file.

Command: header

Prints the HWP file header of <hwp5file>.

Usage:

hwp5proc header [options] <hwp5file>
hwp5proc header -h

Options:

-h --help              Show this screen
   --loglevel=<level>  Set log level.
   --logfile=<file>    Set log file.

Command: summaryinfo

Prints the summary information of <hwp5file>.

Usage:

hwp5proc summaryinfo [options] <hwp5file>
hwp5proc summaryinfo --help

Options:

-h --help              Show this screen
   --loglevel=<level>  Set log level.
   --logfile=<file>    Set log file.

Command: ls

Prints the list of streams in <hwp5file>.

Usage:

hwp5proc ls [--loglevel=<loglevel>] [--logfile=<logfile>]
            [--vstreams | --ole]
            <hwp5file>
hwp5proc ls --help

Options:

-h --help               Show this screen
   --loglevel=<level>   Set log level.
   --logfile=<file>     Set log file.

   --vstreams           Process with virtual streams (i.e. parsed/converted
                        form of real streams)
   --ole                Treat <hwp5file> as an OLE Compound File. As a
                        result, some streams will be presented as-is. (i.e.
                        not decompressed)

Example: list the streams, excluding virtual streams:

$ hwp5proc ls samples/sample-5017.hwp

\x05HwpSummaryInformation
BinData/BIN0002.jpg
BinData/BIN0002.png
BinData/BIN0003.png
BodyText/Section0
DocInfo
DocOptions/_LinkDoc
FileHeader
PrvImage
PrvText
Scripts/DefaultJScript
Scripts/JScriptVersion

Example: list the streams, including virtual streams:

$ hwp5proc ls --vstreams samples/sample-5017.hwp

\x05HwpSummaryInformation
\x05HwpSummaryInformation.txt
BinData/BIN0002.jpg
BinData/BIN0002.png
BinData/BIN0003.png
BodyText/Section0
BodyText/Section0.models
BodyText/Section0.records
BodyText/Section0.xml
BodyText.xml
DocInfo
DocInfo.models
DocInfo.records
DocInfo.xml
DocOptions/_LinkDoc
FileHeader
FileHeader.txt
PrvImage
PrvText
PrvText.utf8
Scripts/DefaultJScript
Scripts/JScriptVersion
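
Comparing the two listings above shows which entries exist only as virtual streams. The following Python sketch is not part of hwp5proc; it copies the two outputs verbatim from the examples and computes their difference. The helper names `stream_names` and `virtual_only` are made up for illustration.

```python
# Stream names copied as printed text from the two `hwp5proc ls` examples.
# Raw strings keep the leading "\x05" exactly as it appears in the listing.
PLAIN_LISTING = r"""
\x05HwpSummaryInformation
BinData/BIN0002.jpg
BinData/BIN0002.png
BinData/BIN0003.png
BodyText/Section0
DocInfo
DocOptions/_LinkDoc
FileHeader
PrvImage
PrvText
Scripts/DefaultJScript
Scripts/JScriptVersion
"""

VSTREAMS_LISTING = r"""
\x05HwpSummaryInformation
\x05HwpSummaryInformation.txt
BinData/BIN0002.jpg
BinData/BIN0002.png
BinData/BIN0003.png
BodyText/Section0
BodyText/Section0.models
BodyText/Section0.records
BodyText/Section0.xml
BodyText.xml
DocInfo
DocInfo.models
DocInfo.records
DocInfo.xml
DocOptions/_LinkDoc
FileHeader
FileHeader.txt
PrvImage
PrvText
PrvText.utf8
Scripts/DefaultJScript
Scripts/JScriptVersion
"""


def stream_names(listing):
    """Split a listing into stream names, one per non-empty line."""
    return [line.strip() for line in listing.splitlines() if line.strip()]


def virtual_only(plain, with_vstreams):
    """Return the names that appear only in the --vstreams listing."""
    plain_set = set(plain)
    return [name for name in with_vstreams if name not in plain_set]


if __name__ == "__main__":
    names = virtual_only(stream_names(PLAIN_LISTING),
                         stream_names(VSTREAMS_LISTING))
    for name in names:
        print(name)
```

For this sample the difference consists of the `.txt`, `.utf8`, `.models`, `.records` and `.xml` entries, which matches the description of virtual streams as parsed or converted forms of real streams.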

Command: cat

Writes the specified stream of <hwp5file> to standard output.

Usage:

hwp5proc cat [--loglevel=<loglevel>] [--logfile=<logfile>]
             [--vstreams | --ole]
             <hwp5file> <stream>
hwp5proc cat --help

Options:

-h --help               Show this screen
   --loglevel=<level>   Set log level.
   --logfile=<file>     Set log file.

   --vstreams           Process with virtual streams (i.e. parsed/converted
                        form of real streams)
   --ole                Treat <hwp5file> as an OLE Compound File. As a
                        result, some streams will be presented as-is. (i.e.
                        not decompressed)

Example:

$ hwp5proc cat samples/sample-5017.hwp BinData/BIN0002.jpg | file -

$ hwp5proc cat samples/sample-5017.hwp BinData/BIN0002.jpg > BIN0002.jpg

$ hwp5proc cat samples/sample-5017.hwp PrvText | iconv -f utf-16le -t utf-8

$ hwp5proc cat --vstreams samples/sample-5017.hwp PrvText.utf8

$ hwp5proc cat --vstreams samples/sample-5017.hwp FileHeader.txt

ccl: 0
cert_drm: 0
cert_encrypted: 0
cert_signature_extra: 0
cert_signed: 0
compressed: 1
distributable: 0
drm: 0
history: 0
password: 0
script: 0
signature: HWP Document File
version: 5.0.1.7
xmltemplate_storage: 0
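
The FileHeader.txt output above is a list of `key: value` lines, so it is easy to consume from a script. The sketch below parses that text into a dictionary. It assumes the one-pair-per-line layout seen in this sample holds in general, which the documentation does not state; `parse_file_header_text` is a hypothetical helper, not a pyhwp API.

```python
def parse_file_header_text(text):
    """Parse 'key: value' lines into a dict of strings.

    Assumption: every non-empty line has the form 'key: value',
    as in the sample output above.
    """
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


# Sample text copied from the example output above.
SAMPLE = """\
ccl: 0
cert_drm: 0
cert_encrypted: 0
cert_signature_extra: 0
cert_signed: 0
compressed: 1
distributable: 0
drm: 0
history: 0
password: 0
script: 0
signature: HWP Document File
version: 5.0.1.7
xmltemplate_storage: 0
"""

header = parse_file_header_text(SAMPLE)
print(header["version"])            # 5.0.1.7
print(header["compressed"] == "1")  # True
```

In practice the text would come from the standard output of the `cat --vstreams` command shown above rather than from a literal string.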

Command: unpack

Extracts the streams of <hwp5file> into a directory.

Usage:

hwp5proc unpack [--loglevel=<loglevel>] [--logfile=<logfile>]
                [--vstreams | --ole]
                <hwp5file> [<out-directory>]
hwp5proc unpack --help

Options:

-h --help               Show this screen
   --loglevel=<level>   Set log level.
   --logfile=<file>     Set log file.

   --vstreams           Process with virtual streams (i.e. parsed/converted
                        form of real streams)
   --ole                Treat <hwp5file> as an OLE Compound File. As a
                        result, some streams will be presented as-is. (i.e.
                        not decompressed)

Example:

$ hwp5proc unpack samples/sample-5017.hwp
$ ls sample-5017

Example:

$ hwp5proc unpack --vstreams samples/sample-5017.hwp
$ cat sample-5017/PrvText.utf8

Command: records

Prints the record structure.

Usage:

hwp5proc records [--simple | --json | --raw | --raw-header | --raw-payload]
                 [--treegroup=<treegroup> | --range=<range>]
                 [--loglevel=<loglevel>] [--logfile=<logfile>]
                 <hwp5file> <record-stream>
hwp5proc records [--simple | --json | --raw | --raw-header | --raw-payload]
                 [--treegroup=<treegroup> | --range=<range>]
                 [--loglevel=<loglevel>] [--logfile=<logfile>]
hwp5proc records --help

Options:

-h --help               Show this screen
   --loglevel=<level>   Set log level.
   --logfile=<file>     Set log file.

   --simple             Print records as simple tree
   --json               Print records as json
   --raw                Print records as is
   --raw-header         Print record headers as is
   --raw-payload        Print record payloads as is

   --range=<range>      Print records specified in the <range>.
   --treegroup=<treegroup>
                        Print records specified in the <treegroup>.

<hwp5file>              HWPv5 files (*.hwp)
<record-stream>         Record-structured internal streams.
                        (e.g. DocInfo, BodyText/*)
<range>                 Specifies the range of the records.
                         N-M means "from the record N to M-1 (excluding M)"
                         N means just the record N
<treegroup>             Specifies the N-th subtree of the record structure.
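
The <range> syntax described above is half-open: `N-M` covers records N through M-1, and a bare `N` selects a single record. The following Python sketch illustrates those semantics only; it is not how hwp5proc parses the option. `parse_range` and `select_records` are hypothetical names, and zero-based numbering is an assumption based on the `--range=0-2` example.

```python
def parse_range(spec):
    """Turn 'N' or 'N-M' into a half-open (start, stop) pair."""
    if "-" in spec:
        start_text, stop_text = spec.split("-", 1)
        start, stop = int(start_text), int(stop_text)
    else:
        start = int(spec)
        stop = start + 1  # a bare N means just the record N
    return start, stop


def select_records(records, spec):
    """Return the records covered by the range spec (M itself is excluded)."""
    start, stop = parse_range(spec)
    return records[start:stop]


# Placeholder records, made up for illustration.
records = ["rec0", "rec1", "rec2", "rec3"]
print(select_records(records, "0-2"))  # ['rec0', 'rec1']
print(select_records(records, "3"))    # ['rec3']
```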

Example:

$ hwp5proc records samples/sample-5017.hwp DocInfo

Example:

$ hwp5proc records samples/sample-5017.hwp DocInfo --range=0-2

If <hwp5file> and <record-stream> are not given, the record stream is read from standard input. In that case, the format version of the input is assumed to be the value given with the -V option.

Example:

$ hwp5proc records --raw samples/sample-5017.hwp DocInfo --range=0-2 > tmp.rec
$ hwp5proc records < tmp.rec

Command: models

Prints the binary models parsed from the specified <record-stream>.

Usage:

hwp5proc models [--simple | --json | --format=<format> | --events]
                [--treegroup=<treegroup> | --seqno=<seqno>]
                [--loglevel=<loglevel>] [--logfile=<logfile>]
                (<hwp5file> <record-stream> | -V <version>)
hwp5proc models --help

Options:

-h --help               Show this screen
   --loglevel=<level>   Set log level.
   --logfile=<file>     Set log file.

   --simple             Print records as simple tree
   --json               Print records as json
   --format=<format>    Print records as formatted

   --treegroup=<treegroup>
                        Print records in the <treegroup>.
                        <treegroup> specifies the N-th subtree of the
                        record structure.
   --seqno=<seqno>      Print a model of <seqno>-th record

-V <version>, --file-format-version=<version>
                        Specifies HWPv5 file format version

<hwp5file>              HWPv5 files (*.hwp)
<record-stream>         Record-structured internal streams.
                        (e.g. DocInfo, BodyText/*)

Example:

$ hwp5proc models samples/sample-5017.hwp DocInfo
$ hwp5proc models samples/sample-5017.hwp BodyText/Section0

$ hwp5proc models samples/sample-5017.hwp docinfo
$ hwp5proc models samples/sample-5017.hwp bodytext/0

Example:

$ hwp5proc models --simple samples/sample-5017.hwp bodytext/0
$ hwp5proc models --format='%(level)s %(tagname)s\n' \
        samples/sample-5017.hwp bodytext/0
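
The `--format` value in the example above has the shape of a Python printf-style format string with named fields. The sketch below shows how such a template expands against a mapping. Treating the option as a Python `%` template is an assumption drawn from its syntax, and the two model dictionaries are made-up placeholders, not real output.

```python
# Same template as in the example; here "\n" is a real newline character.
FORMAT = "%(level)s %(tagname)s\n"

# Hypothetical model entries, invented for illustration only.
models = [
    {"level": 0, "tagname": "HWPTAG_LIST_HEADER"},
    {"level": 1, "tagname": "HWPTAG_PARA_TEXT"},
]


def render(models, fmt):
    """Apply a %-style template with named fields to each model."""
    return "".join(fmt % model for model in models)


print(render(models, FORMAT), end="")
# 0 HWPTAG_LIST_HEADER
# 1 HWPTAG_PARA_TEXT
```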

Example:

$ hwp5proc models --simple --treegroup=1 samples/sample-5017.hwp bodytext/0
$ hwp5proc models --simple --seqno=4 samples/sample-5017.hwp bodytext/0

If <hwp5file> and <record-stream> are not given, the record stream is read from standard input. In that case, the format version of the input is assumed to be the value given with the -V option.

Example:

$ hwp5proc cat samples/sample-5017.hwp BodyText/Section0 > Section0.bin
$ hwp5proc models -V 5.0.1.7 < Section0.bin

Command: find

Finds record models that satisfy the given conditions.

Usage:

hwp5proc find [--model=<model-name> | --tag=<hwptag>]
              [--incomplete] [--dump] [--format=<format>]
              [--loglevel=<loglevel>] [--logfile=<logfile>]
              (--from-stdin | <hwp5files>...)
hwp5proc find --help

Options:

-h --help               Show this screen
   --loglevel=<level>   Set log level.
   --logfile=<file>     Set log file.

   --from-stdin         get filenames from stdin

   --model=<model-name> filter with record model name
   --tag=<hwptag>       filter with record HWPTAG
   --incomplete         filter with incompletely parsed content

   --format=<format>    record output format
                        %(filename)s %(stream)s %(seqno)s %(type)s
   --dump               dump record

<hwp5files>...          HWPv5 files (*.hwp)

Example: find paragraphs:

$ hwp5proc find --model=Paragraph samples/*.hwp
$ hwp5proc find --tag=HWPTAG_PARA_TEXT samples/*.hwp
$ hwp5proc find --tag=66 samples/*.hwp

Example: find and dump HWPTAG_LIST_HEADER records that were not completely parsed:

$ hwp5proc find --tag=HWPTAG_LIST_HEADER --incomplete --dump samples/*.hwp

Command: xml (experimental)

Converts an HWPv5 file to XML.

Note

This command is experimental. The output format may change at any time.

Usage:

hwp5proc xml [--embedbin]
             [--no-xml-decl]
             [--output=<file>]
             [--format=<format>]
             [--loglevel=<loglevel>] [--logfile=<logfile>]
             <hwp5file>
hwp5proc xml --help

Options:

-h --help               Show this screen
   --loglevel=<level>   Set log level.
   --logfile=<file>     Set log file.

   --embedbin           Embed BinData/* streams in the output XML.
   --no-xml-decl        Don't output <?xml ... ?> XML declaration.
   --output=<file>      Output filename.

<hwp5file>              HWPv5 files (*.hwp)
<format>                "flat", "nested" (default: "nested")

Example:

$ hwp5proc xml samples/sample-5017.hwp > sample-5017.xml
$ xmllint --format sample-5017.xml

With the --embedbin option, the files under BinData/* are base64-encoded and embedded in the output XML.

Example:

$ hwp5proc xml --embedbin samples/sample-5017.hwp > sample-5017.xml
$ xmllint --format sample-5017.xml
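
To make the base64 embedding concrete, here is a minimal Python sketch of the general technique: binary data is encoded as base64 text, stored as element content, and decoded again on the reading side. The element and attribute names (`Document`, `BinaryData`, `name`) are hypothetical; the actual XML layout produced by hwp5proc is not described here and may differ.

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(parent, name, data):
    """Attach `data` to `parent` as base64 text in a child element.

    The 'BinaryData' element name is made up for this illustration.
    """
    elem = ET.SubElement(parent, "BinaryData", {"name": name})
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


def extract_binary(elem):
    """Decode the base64 text of an element back into bytes."""
    return base64.b64decode(elem.text)


root = ET.Element("Document")
payload = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature, used as sample data
elem = embed_binary(root, "BIN0003.png", payload)

print(ET.tostring(root, encoding="unicode"))
print(extract_binary(elem) == payload)  # True
```

Base64 keeps the XML well-formed regardless of the bytes in the embedded file, at the cost of roughly a one-third increase in size.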
