Typos for Unsupported Languages/Alphabets

Hello!

So I’m currently working on a project with text in Ge’ez, a language written in the Ethiopic script (the same script used for Amharic and Tigrinya). A handful of letters are homophonous (e.g. ሰ and ሠ), and people frequently use homophonous letters interchangeably (e.g. ሰላም and ሠላም for the same word).

What is the best way to go about handling these? Adding every possible spelling of every word would be far too much work, but I can’t think of any other way to handle single-character substitutions like these.

Thanks!

Hi Augustine,

In your estimation, is this something our Typo Tolerance feature can take care of? For the short word in your example, lowering the minimum word length for one typo to 3 (the default is 4) allows it to be found. Or is it the case that a word might be spelled with either homophonous letter in several places, resulting in many variations? If so, can you offer some harder examples?
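The length threshold described above can be modeled roughly as follows. This is a simplified sketch, not Algolia's engine: it assumes a plain Levenshtein distance over Unicode code points stands in for the typo count, and the keyword arguments merely mirror the `minWordSizefor1Typo` (default 4) and `minWordSizefor2Typos` (default 8) index settings. The helper names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def allowed_typos(word: str, min_size_1: int = 4, min_size_2: int = 8) -> int:
    """Hypothetical model: how many typos a query word of this length may have."""
    if len(word) >= min_size_2:
        return 2
    if len(word) >= min_size_1:
        return 1
    return 0


def matches(query: str, record_word: str, min_size_1: int = 4, min_size_2: int = 8) -> bool:
    """True if the query is within its typo allowance of the record word."""
    return levenshtein(query, record_word) <= allowed_typos(query, min_size_1, min_size_2)


# ሠላም is three code points, so by default it gets no typo allowance.
print(matches("ሠላም", "ሰላም"))                # False
print(matches("ሠላም", "ሰላም", min_size_1=3))  # True: one substitution allowed
print(matches("ሠለም", "ሰላም", min_size_1=3))  # False: two substitutions
```

The last call shows that a single-typo allowance still rejects a word in which two syllables were changed.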


Typo Tolerance Articles:



Thanks!
Jason

Hi Jason,

So, firstly, part of the difficulty is that a single character represents a consonant plus a vowel. Even ignoring the issue of homophonous letters, if someone uses the wrong consonant or the wrong vowel (even though the accompanying vowel or consonant is correct), Algolia counts the entire character as a typo. So, if I write ሠለም (śa-la-m) instead of ሰላም (sa-lā-m), that’s counted as two typos in a three-character word, though strictly speaking only the “a” in place of “ā” is a typo: s and ś are homophonous, and the second consonant is still an “l”, just “la” instead of “lā”.
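Because the basic Ethiopic Unicode block (U+1200 to U+137F) lays out most consonants as rows of eight code points, one per vowel order, a character can be split back into its consonant and vowel arithmetically. A minimal sketch, assuming that regular row-of-eight layout (a few labialized rows have unassigned gaps, and the Ethiopic Supplement and Extended blocks are ignored), shows that the two differences between ሠለም and ሰላም fall on different components. The function names are hypothetical.

```python
ETHIOPIC_START = 0x1200
ETHIOPIC_END = 0x1380  # exclusive end of the basic Ethiopic block


def decompose(ch: str) -> tuple[int, int]:
    """Return (consonant row, vowel order) for a basic Ethiopic syllable.

    Assumes rows of eight code points: slots 0-6 are the seven vowel
    orders and slot 7 holds a labialized or extra form.
    """
    cp = ord(ch)
    if not ETHIOPIC_START <= cp < ETHIOPIC_END:
        raise ValueError(f"{ch!r} is not in the basic Ethiopic block")
    offset = cp - ETHIOPIC_START
    return offset // 8, offset % 8


def component_diffs(a: str, b: str) -> list[tuple[int, str]]:
    """List (position, component) pairs where two same-length words differ."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares words of equal length")
    diffs = []
    for i, (x, y) in enumerate(zip(a, b)):
        (row_x, order_x), (row_y, order_y) = decompose(x), decompose(y)
        if row_x != row_y:
            diffs.append((i, "consonant"))
        if order_x != order_y:
            diffs.append((i, "vowel"))
    return diffs


print(component_diffs("ሠለም", "ሰላም"))  # [(0, 'consonant'), (1, 'vowel')]
```

A character-level comparison sees two wrong characters; splitting them shows one is a consonant swap (a homophone) and the other a vowel slip.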

On homophonous letters, here’s a full explanation of possible homophones:

  1. there are five sets of homophonous consonants: ሀ/ሐ/ኀ, ሠ/ሰ, አ/ዐ, ጸ/ፀ, and ጰ/ፐ
  2. for the consonants ሀ, ሐ, ኀ, አ, and ዐ, the vowels “a” and “ā” are homophonous (so ሀ/ሃ, ሐ/ሓ, ኀ/ኃ, አ/ኣ, ዐ/ዓ, but recall the pairings above as well)
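The two rules above are regular enough to express as a folding function that maps every homophonous spelling to one canonical form. The sketch below is an assumption about how one might do this, not an existing Algolia feature: it relies on the row-of-eight layout of the basic Ethiopic Unicode block, and the canonical letter chosen for each set (ሀ, ሰ, አ, ጸ, ፐ) is arbitrary. Applying the same hypothetical function to both the indexed text and incoming queries would let homophones compare equal before any typo tolerance is applied.

```python
BLOCK_START, BLOCK_END = 0x1200, 0x1380  # basic Ethiopic block

# Homophone row -> canonical row, keyed by each row's first-order syllable.
FOLD_ROWS = {ord(src): ord(dst) for src, dst in [
    ("ሐ", "ሀ"), ("ኀ", "ሀ"),  # ሀ/ሐ/ኀ
    ("ሠ", "ሰ"),              # ሠ/ሰ
    ("ዐ", "አ"),              # አ/ዐ
    ("ፀ", "ጸ"),              # ጸ/ፀ
    ("ጰ", "ፐ"),              # ጰ/ፐ
]}

# Canonical rows whose 1st order (a) and 4th order (ā) are homophonous.
A_AA_ROWS = {ord("ሀ"), ord("አ")}


def fold_char(ch: str) -> str:
    cp = ord(ch)
    if not BLOCK_START <= cp < BLOCK_END:
        return ch  # not a basic Ethiopic syllable: leave it alone
    order = (cp - BLOCK_START) % 8
    if order == 7:
        return ch  # labialized/extra slot: not covered by the rules above
    row = FOLD_ROWS.get(cp - order, cp - order)
    if row in A_AA_ROWS and order == 3:
        order = 0  # fold ā into a for ሀ/ሐ/ኀ/አ/ዐ
    return chr(row + order)


def fold_word(word: str) -> str:
    return "".join(fold_char(ch) for ch in word)


print(fold_word("ሠላም") == fold_word("ሰላም"))  # True
print(fold_word("ኃጢዓት"))                     # ሀጢአት
```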

So, for example, the possible spellings of the word ኃጢአት (ḫā-ṭi-ʾa-t) are: ኃጢአት, ኀጢአት, ኃጢኣት, ኀጢኣት, ኃጢዐት, ኃጢዓት, ኀጢዐት, ኀጢዓት, ሓጢአት, ሐጢአት, ሓጢኣት, ሐጢኣት, ሓጢዐት, ሓጢዓት, ሐጢዐት, ሐጢዓት, ሃጢአት, ሀጢአት, ሃጢኣት, ሀጢኣት, ሃጢዐት, ሃጢዓት, ሀጢዐት, ሀጢዓት. Properly, these should all be treated as synonyms. But, as I said above, once true typos (say ቲ “ti” or ጤ “ṭe” instead of ጢ “ṭi”) are layered on top of these homophones, the number of variants becomes unmanageable.
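The arithmetic behind the list is 6 × 4 = 24: six spellings of the first syllable (ሀ/ሃ/ሐ/ሓ/ኀ/ኃ) times four of the third (አ/ኣ/ዐ/ዓ). The self-contained sketch below repeats a hypothetical homophone-folding step built from the two rules above (an assumption, not an Algolia feature). It enumerates the variants, checks that they all fold to one form, and compares the plain edit distance for a homophone variant that also contains a true typo (ቲ for ጢ) with the distance after folding.

```python
from itertools import product

# Hypothetical folding tables: homophone row -> canonical row.
FOLD_ROWS = {ord(s): ord(d) for s, d in
             [("ሐ", "ሀ"), ("ኀ", "ሀ"), ("ሠ", "ሰ"), ("ዐ", "አ"), ("ፀ", "ጸ"), ("ጰ", "ፐ")]}
A_AA_ROWS = {ord("ሀ"), ord("አ")}  # rows where a and ā are homophonous


def fold_word(word: str) -> str:
    """Map homophonous spellings to one canonical form (basic Ethiopic block only)."""
    out = []
    for ch in word:
        cp = ord(ch)
        order = (cp - 0x1200) % 8
        if not 0x1200 <= cp < 0x1380 or order == 7:
            out.append(ch)  # outside the block or labialized slot: unchanged
            continue
        row = FOLD_ROWS.get(cp - order, cp - order)
        if row in A_AA_ROWS and order == 3:
            order = 0  # ā -> a
        out.append(chr(row + order))
    return "".join(out)


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Every spelling of ḫā-ṭi-ʾa-t: 6 choices for the first syllable x 4 for the third.
variants = ["".join(p) for p in product("ሀሃሐሓኀኃ", "ጢ", "አኣዐዓ", "ት")]
print(len(variants))                     # 24
print({fold_word(v) for v in variants})  # {'ሀጢአት'}

# A homophone variant plus a true typo (ቲ "ti" for ጢ "ṭi").
print(levenshtein("ሐጢዓት", "ኃቲአት"))                         # 3
print(levenshtein(fold_word("ሐጢዓት"), fold_word("ኃቲአት")))  # 1
```

The design idea is to collapse homophones first, so that only the genuine misspelling is left for typo tolerance to absorb; whether that fits a particular search setup would need testing.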

I hope that all makes sense!

Cheers,

Augustine