.. _es-guide-reference-index-modules-analysis-pattern-analyzer:

================
Pattern Analyzer
================

An analyzer of type **pattern** that can flexibly separate text into terms
via a regular expression. It accepts the following settings:

=============== ==========================================================
Setting         Description
=============== ==========================================================
**lowercase**   Whether terms are lowercased. Defaults to **true**.
**pattern**     The regular expression pattern. Defaults to ``\W+``.
**flags**       The regular expression flags.
=============== ==========================================================

*IMPORTANT*: The regular expression should match the *token separators*,
not the tokens themselves.

Flags should be pipe-separated, e.g. ``CASE_INSENSITIVE|COMMENTS``. Check the
Java Pattern API documentation (``java.util.regex.Pattern``) for more details
about the available **flags** options.

Pattern Analyzer Examples
-------------------------

To try out these examples, delete the **test** index before running each
example:

.. code-block:: js

    curl -XDELETE localhost:9200/test

Whitespace tokenizer
""""""""""""""""""""

.. code-block:: js

    curl -XPUT 'localhost:9200/test' -d '
    {
        "settings": {
            "analysis": {
                "analyzer": {
                    "whitespace": {
                        "type": "pattern",
                        "pattern": "\\s+"
                    }
                }
            }
        }
    }'

    curl 'localhost:9200/test/_analyze?pretty=1&analyzer=whitespace' -d 'foo,bar baz'
    # "foo,bar", "baz"

Non-word character tokenizer
""""""""""""""""""""""""""""

.. code-block:: js

    curl -XPUT 'localhost:9200/test' -d '
    {
        "settings": {
            "analysis": {
                "analyzer": {
                    "nonword": {
                        "type": "pattern",
                        "pattern": "[^\\w]+"
                    }
                }
            }
        }
    }'

    curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'foo,bar baz'
    # "foo,bar baz" becomes "foo", "bar", "baz"

    curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'type_1-type_4'
    # "type_1", "type_4"

CamelCase tokenizer
"""""""""""""""""""

.. code-block:: js

    curl -XPUT 'localhost:9200/test?pretty=1' -d '
    {
        "settings": {
            "analysis": {
                "analyzer": {
                    "camel": {
                        "type": "pattern",
                        "pattern": "([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])"
                    }
                }
            }
        }
    }'

    curl 'localhost:9200/test/_analyze?pretty=1&analyzer=camel' -d '
        MooseX::FTPClass2_beta
    '
    # "moose", "x", "ftp", "class", "2", "beta"

The regex above is easier to understand when split into its alternatives:

.. code-block:: js

      ([^\\p{L}\\d]+)               # swallow runs of characters that are neither letters nor digits,
    | (?<=\\D)(?=\\d)               # or a non-digit followed by a digit,
    | (?<=\\d)(?=\\D)               # or a digit followed by a non-digit,
    | (?<=[\\p{L}&&[^\\p{Lu}]])     # or a lowercase letter
      (?=\\p{Lu})                   #   followed by an uppercase letter,
    | (?<=\\p{Lu})                  # or an uppercase letter
      (?=\\p{Lu}                    #   followed by an uppercase letter
        [\\p{L}&&[^\\p{Lu}]]        #   and then a lowercase letter
      )
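
Because the patterns use Java regular expression syntax, the "match the separators, not the tokens" rule can be checked locally with ``java.util.regex.Pattern`` without a running cluster. The sketch below is a hypothetical helper (``PatternAnalyzerSketch`` is an illustrative name, not an Elasticsearch class). It assumes the analyzer behaves like ``Pattern.split`` followed by dropping empty pieces and optional lowercasing. This is a simplified model of the documented settings, not Elasticsearch's own implementation.

.. code-block:: java

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Locale;
    import java.util.regex.Pattern;

    // Hypothetical sketch of the documented "pattern" analyzer semantics:
    // the regex matches the separators, and the text between matches becomes
    // the tokens. This is NOT Elasticsearch's actual implementation.
    final class PatternAnalyzerSketch {

        // Mirrors the documented default pattern, \W+.
        static final String DEFAULT_PATTERN = "\\W+";

        // The CamelCase pattern from the example above, one alternative per line.
        static final String CAMEL_CASE = "([^\\p{L}\\d]+)"
                + "|(?<=\\D)(?=\\d)"
                + "|(?<=\\d)(?=\\D)"
                + "|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})"
                + "|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])";

        static List<String> tokenize(String text, String regex, int flags, boolean lowercase) {
            Pattern separator = Pattern.compile(regex, flags);
            List<String> tokens = new ArrayList<>();
            // split() returns the pieces *between* separator matches
            // (trailing empty pieces are already removed by split itself).
            for (String piece : separator.split(text)) {
                if (piece.isEmpty()) {
                    continue; // a separator at the very start yields an empty first piece
                }
                tokens.add(lowercase ? piece.toLowerCase(Locale.ROOT) : piece);
            }
            return tokens;
        }

        // Convenience overload: no flags, lowercase = true (the documented default).
        static List<String> tokenize(String text, String regex) {
            return tokenize(text, regex, 0, true);
        }

        public static void main(String[] args) {
            System.out.println(tokenize("foo,bar baz", "\\s+"));                  // [foo,bar, baz]
            System.out.println(tokenize("type_1-type_4", "[^\\w]+"));             // [type_1, type_4]
            System.out.println(tokenize(" MooseX::FTPClass2_beta ", CAMEL_CASE)); // [moose, x, ftp, class, 2, beta]
            // In Java, flags are combined with |, e.g. Pattern.CASE_INSENSITIVE | Pattern.COMMENTS.
            System.out.println(tokenize("salt AND pepper and oil", "\\s*and\\s*",
                    Pattern.CASE_INSENSITIVE, true));                              // [salt, pepper, oil]
        }
    }

Java string literals need the same doubled backslashes as the JSON bodies above, so a pattern can be copied between the two unchanged. Requires Java 8 or later.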