Fix Thai being skipped from language detection (#13989)
Thai does not separate words with spaces, so I figured it should be
in the 'reliable characters' regexp, which denotes languages that do the same.
Related: #13891.
closed-social-v3
Sasha Sorokin, 4 years ago, committed by GitHub
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 1 addition and 1 deletion
app/lib/language_detector.rb
@@ -4,7 +4,7 @@ class LanguageDetector
   include Singleton
 
   WORDS_THRESHOLD = 4
-  RELIABLE_CHARACTERS_RE = /[\p{Hebrew}\p{Arabic}\p{Syriac}\p{Thaana}\p{Nko}\p{Han}\p{Katakana}\p{Hiragana}\p{Hangul}]+/m
+  RELIABLE_CHARACTERS_RE = /[\p{Hebrew}\p{Arabic}\p{Syriac}\p{Thaana}\p{Nko}\p{Han}\p{Katakana}\p{Hiragana}\p{Hangul}\p{Thai}]+/m
 
   def initialize
     @identifier = CLD3::NNetLanguageIdentifier.new(1, 2048)
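
The effect of the one-line change can be checked in isolation. The following is a minimal sketch, not code from the repository: the two regexps are copied from the diff, while the constant names `OLD_RE` and `NEW_RE`, the helper `reliable_chars`, and the sample string are made up for illustration. How `LanguageDetector` actually consumes the regexp is not shown in this diff, so the sketch only demonstrates what each pattern matches.

```ruby
# Regexp before the change (copied from the removed line of the diff).
OLD_RE = /[\p{Hebrew}\p{Arabic}\p{Syriac}\p{Thaana}\p{Nko}\p{Han}\p{Katakana}\p{Hiragana}\p{Hangul}]+/m

# Regexp after the change: the same character class plus \p{Thai}.
NEW_RE = /[\p{Hebrew}\p{Arabic}\p{Syriac}\p{Thaana}\p{Nko}\p{Han}\p{Katakana}\p{Hiragana}\p{Hangul}\p{Thai}]+/m

# Hypothetical helper (not part of LanguageDetector): returns every
# run of characters matched by the given pattern, joined together.
def reliable_chars(text, pattern)
  text.scan(pattern).join
end

# A Thai greeting written without spaces (illustrative sample only).
thai = 'สวัสดี'

puts OLD_RE.match?(thai)                  # prints "false"
puts NEW_RE.match?(thai)                  # prints "true"
puts reliable_chars(thai, NEW_RE) == thai # prints "true"
```

With the old pattern, a post written only in Thai contains no "reliable" characters at all, which is consistent with the behaviour the commit message describes.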