.. _es-guide-reference-index-modules-analysis-standard-tokenizer:

==================
Standard Tokenizer
==================

A tokenizer of type **standard** providing a grammar based tokenizer
that is suitable for most European language documents. It splits words
at punctuation characters and removes the punctuation, except that a
dot that is not followed by whitespace is kept as part of the token. It
also splits words at hyphens, unless there is a number in the token, in
which case the whole token is interpreted as a product number and is
not split. It recognizes email addresses and internet hostnames as
single tokens.

The following settings can be set for a **standard** tokenizer type:

====================== ==================================================================================================================
Setting                Description
====================== ==================================================================================================================
**max_token_length**   The maximum token length. If a token is seen that exceeds this length then it is discarded. Defaults to **255**.
====================== ==================================================================================================================
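
As a minimal sketch of how the setting might be applied, the request
below creates an index with a custom analyzer built on a **standard**
tokenizer whose **max_token_length** is lowered to 100. The index name
``my_index`` and the names ``my_tokenizer`` and ``my_analyzer`` are
illustrative, not part of this reference:

.. code-block:: js

    // Hypothetical index with a standard tokenizer capped at 100 characters
    curl -XPUT 'http://localhost:9200/my_index/' -d '
    {
        "settings" : {
            "analysis" : {
                "tokenizer" : {
                    "my_tokenizer" : {
                        "type" : "standard",
                        "max_token_length" : 100
                    }
                },
                "analyzer" : {
                    "my_analyzer" : {
                        "type" : "custom",
                        "tokenizer" : "my_tokenizer"
                    }
                }
            }
        }
    }
    '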