Handling Japanese text

Hello, I'm very happy with Algolia and have been using it for a lot of my projects lately.

Most of my projects use Japanese text, and I was wondering how you handle two-byte Japanese text.

There is an area called Ebisu (恵比寿) near my work, and I often search for it.

I think Algolia behaves as follows as I enter the text:

a) 恵 -> any text that has 恵
b) 恵比 -> any text that has 恵比 or 恵
c) 恵比寿 -> any text that has 恵比寿 or 恵比 or 恵

Questions

  1. Is the above assumption correct?
  2. For c), does text containing 恵比寿 score higher than text containing only 恵, so that results with 恵比寿 appear higher in the search results?

I couldn’t find much about searching in Asian two-byte languages. Many Asian languages, including Chinese and Japanese, use two-byte characters and don’t have spaces between words.

If you can guide me on any of this, it would be very much appreciated.

Thanks

Hi there,

Regarding the handling of Japanese inputs, we have two different cases:

  • If no decomposition of the input is found in our dictionary, the whole input must be found, in sequence. This is actually the case here: for the query 恵比寿, we will only show results containing 恵比寿 exactly.

  • If some part of the input can be found in our Japanese dictionary, the whole input does not have to appear in sequence, only the words that were found. Note that all words still have to appear by default (see the sketch below).
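For reference, here is a minimal sketch (not official guidance) of how you could set a Japanese index and run this query with the JavaScript/TypeScript API client. The application ID, API key, and index name are placeholders, and the indexLanguages/queryLanguages settings assume your records are in Japanese:

```ts
// Sketch: configuring an index for Japanese and running the query discussed above,
// using the algoliasearch v4 client. Credentials and index name are placeholders.
import algoliasearch from 'algoliasearch';

const client = algoliasearch('YourApplicationID', 'YourAdminAPIKey');
const index = client.initIndex('your_index_name');

async function run() {
  // Declaring the language helps Algolia apply its Japanese dictionary
  // when segmenting CJK input (assumes the records are Japanese).
  await index.setSettings({
    indexLanguages: ['ja'],
    queryLanguages: ['ja'],
  });

  // 恵比寿 has no decomposition in the dictionary (the first case above),
  // so only records containing 恵比寿 in sequence are returned.
  const { hits } = await index.search('恵比寿');
  console.log(hits.length, 'hits for 恵比寿');
}

run().catch(console.error);
```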

Regards,