A set of analyzers aimed at analyzing text in a specific language. The following types are supported: arabic, armenian, basque, bulgarian, brazilian, catalan, chinese, cjk, czech, danish, dutch, english, finnish, french, galician, german, greek, persian, hindi, hungarian, indonesian, italian, norwegian, portuguese, romanian, russian, spanish, swedish, turkish, thai.
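As a sketch of how one of these analyzers is typically wired up, the mapping below assigns the french analyzer to a field. The field name is hypothetical, and the exact mapping syntax varies across Elasticsearch versions (older releases nest mappings under a document type and use the string field type):

```json
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "french"
      }
    }
  }
}
```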
All analyzers support setting custom stopwords, either inline in the config or from an external stopwords file referenced via the stopwords_path setting, as shown below.
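A minimal sketch of both options follows; the analyzer names, stopword list, and file path are illustrative, and stopwords_path is resolved relative to the Elasticsearch config directory:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_inline_stop": {
          "type": "english",
          "stopwords": ["a", "an", "the"]
        },
        "english_file_stop": {
          "type": "english",
          "stopwords_path": "analysis/english_stop.txt"
        }
      }
    }
  }
}
```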
The arabic analyzer is built on top of arabic_letter tokenizer, and lowercase, stop, arabic_normalization and arabic_stem filters.
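Because each of these language analyzers is a fixed tokenizer/filter chain, it can be reproduced as a custom analyzer and then extended. A sketch for arabic is below; the analyzer name is hypothetical, it assumes the listed tokenizer and filters are registered under these names in the version in use, and the stop filter would additionally need the Arabic stopword list to match the built-in behavior exactly:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_arabic": {
          "type": "custom",
          "tokenizer": "arabic_letter",
          "filter": ["lowercase", "stop", "arabic_normalization", "arabic_stem"]
        }
      }
    }
  }
}
```

The same pattern applies to the other analyzers described below: substitute the tokenizer and filter chain given for that language.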
The brazilian analyzer is built on top of standard tokenizer, and lowercase, standard, stop, and brazilian_stem filters.
The chinese analyzer is built on top of chinese tokenizer and chinese filter.
The cjk analyzer is built on top of cjk tokenizer and stop filter.
The czech analyzer is built on top of standard tokenizer, and standard, lowercase, stop and czech_stem filters. It comes with a default set of stopwords that can be overridden, as shown below.
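Overriding the defaults follows the same stopwords pattern shown earlier. A sketch with an illustrative (not exhaustive) Czech stopword list and a hypothetical analyzer name:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "czech_custom_stop": {
          "type": "czech",
          "stopwords": ["a", "aby", "ale", "ani"]
        }
      }
    }
  }
}
```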
The dutch analyzer is built on top of standard tokenizer, and standard, stop and dutch_stem filters.
The french analyzer is built on top of standard tokenizer, and standard, stop, french_stem and lowercase filters.
The german analyzer is built on top of standard tokenizer, and standard, lowercase, stop and german_stem filters.
The greek analyzer is built on top of standard tokenizer, and greek_lowercase and stop filters.
The persian analyzer is built on top of arabic_letter tokenizer and lowercase, arabic_normalization, persian_normalization and stop filters.
The russian analyzer is built on top of russian_letter tokenizer, and lowercase, stop and russian_stem filters. It comes with a default set of stopwords that can be overridden in the same way.
The thai analyzer is built on top of standard tokenizer, and standard, thai_word and stop filters.
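Any of these analyzers can be spot-checked with the _analyze API; the request shape varies across Elasticsearch versions, but POSTing a body like the one below to the _analyze endpoint should run the English sample through the english analyzer:

```json
{
  "analyzer": "english",
  "text": "The quick brown foxes jumped"
}
```

The response lists the tokens the analyzer produced, which makes it easy to confirm stemming and stopword removal for a given language.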