==================
Unicode Compliance
==================

*jsre* provides level 1 support for Unicode, compliant with
`Unicode Technical Standard #18, UNICODE REGULAR EXPRESSIONS `_, version 1.7.
The module supports all of the following:

* Binary properties (e.g. *\\p{Alphabetic}*).
* General Category properties (e.g. *\\pP*).
* Scripts and Script Extensions.
* Line_Break properties (e.g. *\\p{line_break=hyphen}*).
* Numeric_Type properties (e.g. *\\p{numeric_type=decimal}*).

Property specification within a regular expression pattern is flexible: case
does not matter, '-' and '_' are interchangeable, and general categories and
scripts may be referenced by name alone (e.g. *\\p{greek}* as well as
*\\p{script=greek}*).

Some special properties are also supported. Appendix C of UTS #18 recommends a
set of properties for use in regular expressions, which provide extensions and
combinations of the standard character classes. These are:

*lower, upper, punct, digit, xdigit, alnum, space, blank, cntrl, graph, print, word*

*word* is defined as in UTS #18 and includes digits; *\\w* uses the same
definition. The zero-width tests *\\b* and *\\B* also use this definition to
determine word boundaries. (Note that the more extensive word break algorithm
given in Unicode Standard Annex #29 is not used.)

The *\\X* test for extended grapheme cluster boundaries implements the extended
(rather than legacy) grapheme cluster specification given in Unicode Standard
Annex #29.

Some additional properties defined in sections 1.2.1 and 1.6 of UTS #18 are
also supported:

*any, assigned, ascii*

Note that *any* matches every code point, unlike '.', which excludes newline
characters unless the DOTALL flag is set.

The property *newline* is provided to specify the set of newline characters
given in UTS #18 section 1.6, i.e. the familiar \\u000A, \\u000B, etc., as well
as Unicode characters such as \\u2028.

The actual support for Unicode properties depends on the encoding used for
searching: for non-Unicode encodings (e.g. *CP1250*), a property is interpreted
as the subset of its code points that can be represented in that encoding.
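To make the encoding-dependent behaviour concrete, the following sketch uses only Python's standard ``unicodedata`` module and built-in codecs, not *jsre* itself, to list the CP1250 byte values whose decoded character has General Category *Lu* (the set *\\p{Lu}* could reduce to under that encoding). The helper ``bytes_in_category`` is hypothetical and only handles single-byte encodings.

.. code-block:: python

    import unicodedata


    def bytes_in_category(encoding: str, category: str) -> list[int]:
        """Return byte values whose decoded character has the given
        General Category (hypothetical helper, single-byte encodings only)."""
        matches = []
        for value in range(256):
            try:
                char = bytes([value]).decode(encoding)
            except UnicodeDecodeError:
                # Some code pages leave byte values undefined (e.g. 0x81 in CP1250).
                continue
            if unicodedata.category(char) == category:
                matches.append(value)
        return matches


    cp1250_upper = bytes_in_category("cp1250", "Lu")
    print([hex(b) for b in cp1250_upper])

Besides the ASCII capitals, the result includes code-page-specific letters such as 0x8A (Š) and 0xC8 (Č), while byte values that CP1250 does not define are simply skipped.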
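The flexible property specification described earlier (case-insensitive, '-' equivalent to '_', bare general category or script names) can be pictured as a normalisation step applied before lookup. The Python sketch below is hypothetical and is not *jsre*'s code: ``normalize``, ``resolve_property`` and the two tiny lookup tables are illustrative assumptions.

.. code-block:: python

    # Tiny illustrative tables (assumption); a real implementation covers all values.
    GENERAL_CATEGORIES = {"p", "punctuation", "lu", "uppercase_letter"}
    SCRIPTS = {"greek", "latin", "cyrillic"}


    def normalize(name: str) -> str:
        """Ignore case and treat '-' and '_' as the same character."""
        return name.lower().replace("-", "_")


    def resolve_property(spec: str) -> tuple[str, str]:
        """Map the text inside the braces of a property escape to (property, value)."""
        if "=" in spec:
            prop, value = spec.split("=", 1)
            return normalize(prop), normalize(value)
        value = normalize(spec)
        if value in GENERAL_CATEGORIES:
            return "general_category", value
        if value in SCRIPTS:
            return "script", value
        # Otherwise treat the name as a binary property such as Alphabetic.
        return value, "true"


    print(resolve_property("GREEK"), resolve_property("Line-Break=Hyphen"))

Trying general categories before scripts is an arbitrary choice in this sketch; any scheme works as long as the two namespaces do not collide.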
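The Numeric_Type values selectable with *\\p{numeric_type=...}* can be explored with Python's ``unicodedata`` module, independently of *jsre*. In this sketch the hypothetical helper ``numeric_type`` derives the value from the decimal, digit and numeric fields that ``unicodedata`` exposes.

.. code-block:: python

    import unicodedata


    def numeric_type(char: str) -> str:
        """Classify a character's Numeric_Type (hypothetical helper)."""
        # Checked from most to least specific: a Decimal character also has
        # digit and numeric values, and a Digit character has a numeric value.
        if unicodedata.decimal(char, None) is not None:
            return "decimal"
        if unicodedata.digit(char, None) is not None:
            return "digit"
        if unicodedata.numeric(char, None) is not None:
            return "numeric"
        return "none"


    # ASCII seven, ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF, 'x'
    for c in ["7", "\u0663", "\u00b2", "\u00bd", "x"]:
        print(repr(c), numeric_type(c))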
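Appendix C of UTS #18 defines *word* as the union of alpha, General Category Mark, digit (General Category Nd), General Category Connector_Punctuation and Join_Control. The Python sketch below approximates that definition with ``unicodedata`` and derives the simple *\\b* test from it: a boundary exists wherever word status differs on the two sides of a position. It is not *jsre*'s implementation; the function names are hypothetical, and ``str.isalpha()`` only approximates the Alphabetic property (it covers letters but not, for example, letter numbers).

.. code-block:: python

    import unicodedata

    JOIN_CONTROL = {"\u200c", "\u200d"}  # ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER


    def is_word_char(char: str) -> bool:
        category = unicodedata.category(char)
        return (
            char.isalpha()               # approximates alpha (letters only)
            or category.startswith("M")  # General Category Mark
            or category == "Nd"          # digit
            or category == "Pc"          # Connector_Punctuation
            or char in JOIN_CONTROL      # Join_Control
        )


    def word_boundaries(text: str) -> list[int]:
        """Offsets where word status differs between the two sides."""
        boundaries = []
        for i in range(len(text) + 1):
            before = i > 0 and is_word_char(text[i - 1])
            after = i < len(text) and is_word_char(text[i])
            if before != after:
                boundaries.append(i)
        return boundaries


    print(word_boundaries("na\u00efve_x 42"))

Because digits and '_' count as word characters, "naïve_x" is a single word here, and the only interior boundaries are on either side of the space.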
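UTS #18 section 1.6 lists U+000A through U+000D, U+0085, U+2028 and U+2029 as line separators (with CR LF also treated as a single break). The hypothetical Python sketch below encodes that set and contrasts *newline*, a '.'-style test and *any*. It assumes that '.' excludes exactly this set unless DOTALL is given, which matches the description above but is not taken from *jsre*'s source.

.. code-block:: python

    # Newline code points from UTS #18 section 1.6: LF, VT, FF, CR, NEL, LS, PS.
    NEWLINES = frozenset(
        ["\u000a", "\u000b", "\u000c", "\u000d", "\u0085", "\u2028", "\u2029"]
    )


    def matches_newline(char: str) -> bool:
        return char in NEWLINES


    def matches_dot(char: str, dotall: bool = False) -> bool:
        # Assumption: '.' rejects exactly the newline set unless DOTALL is set.
        return dotall or char not in NEWLINES


    def matches_any(_char: str) -> bool:
        # *any* accepts every code point, independent of any flag.
        return True


    for ch in ["a", "\n", "\u2028"]:
        print(repr(ch), matches_newline(ch), matches_dot(ch), matches_any(ch))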