Word and Phrase Completion

I’m building an assistive speaking application for users with voice disorders. We have a stock library of about 3000 phrases which we use to seed a personal phrase database. It is trivial to find phrases that match a few key words, but I would like to build new phrases through prediction without an index, based on spoken English. Is this possible?

@spero welcome to the community and what an awesome-sounding project! Without an Index, this won’t really be possible using Algolia, and even with one there are some heavy caveats. Algolia only really works with an Index, and our recommendation engine works off of items that exist within an Index.

Algolia can’t form a sentence (at least not coherently). If we returned the words that the engine felt were recommended, it would essentially just be a stream of words. It wouldn’t have correct grammar, punctuation, or many of the other requirements.

However, you could potentially have entire sentences recommended to the users instead. It still won’t be completely fool-proof, and depending on how the recommendation events are generated it may not really work from conversation to conversation. This is because our conversations vary wildly: we could be talking about summer in one conversation and winter in another. In that case, our engine may recommend a winter-related sentence when it’s actually summer. The engine can be customized around this exact issue, but it’s just one of the many caveats in my opinion.

Happy to help further, let us know! Thanks!

Thanks Michael. I think you may be taking things a level beyond what we need.

In our use case, the end-user has a vocal disorder. They may also have other physical disabilities, making positioning and clicking a mouse more difficult than for the average user. Our goal is to provide a personalized phrase library, with use counts associated with every phrase. A crude prototype, nowhere near fully functional, can be seen at Peri.

In the prototype, dwelling the mouse over “I” then “want” quickly gets me down to 33 available phrases to choose between. Adding “a” knocks me down to “hug.” and “kiss.” because those are the phrases in our library. If what the user really wants is a “turkey sandwich”, they need to type it out, after which the new phrase will become part of their phrase library.

What I am looking for is to automatically generate potential completions once the user has exhausted the personal library. Continuing the example above, “I want a t” should generate completions including “turkey”, “tomato”, “tricycle”, “toboggan”, … based on frequency of occurrence in spoken English.
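As a rough illustration of the behaviour being asked for, here is a minimal Python sketch that ranks candidate words for a typed prefix by frequency. The counts are made-up placeholders, not real spoken-English statistics, and `complete_prefix` is a hypothetical helper name used only for this example.

```python
from typing import Dict, List

# Placeholder counts for illustration only; a real table would have to come
# from a spoken-English corpus.
WORD_COUNTS: Dict[str, int] = {
    "turkey": 120,
    "tomato": 95,
    "tricycle": 12,
    "toboggan": 5,
    "hug": 300,
    "kiss": 250,
}


def complete_prefix(prefix: str, counts: Dict[str, int], limit: int = 5) -> List[str]:
    """Return up to `limit` words starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    matches = [word for word in counts if word.startswith(prefix)]
    # Sort by descending count, then alphabetically so ties are stable.
    matches.sort(key=lambda word: (-counts[word], word))
    return matches[:limit]


if __name__ == "__main__":
    # Candidates for the partial word in "I want a t".
    print(complete_prefix("t", WORD_COUNTS))
```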

Is this possible? I guess what I’m asking is for a virtual index corresponding to “spoken English”

@spero I see! Pinging @chuck.meyer to see if he has any ideas as well. I’ll give this some thought and see if I can think of a way to make it work potentially.

This capability exists in the market, but I have other reasons for using Algolia and would prefer to keep it all in the family if possible.

Hi @spero

Algolia Search uses prefix-based keyword matching, so it’s not well suited to the more semantic search task you’re describing. However, our recent acquisition of search.io was specifically to help us provide the kinds of functionality you are describing using weighted graphs and semantic search capabilities (they call it Neuralsearch).

The plan is to incorporate these semantic capabilities into the existing Algolia APIs early next year. I’m not sure if that’s helpful to you for this project though.

@dustin is our resident in-house expert on semantic search (he wrote this article on the Algolia Blog: Semantic Search: How It Works & Who It’s For). He might be able to chime in with a few more details.

Hi @spero this is a challenging one for Algolia, and I don’t think semantic search would help. But I do think there’s something you might try if we get creative.

It also requires that I correctly understand what you’re looking to do, so if I restate:

  • A user types a word (or uses autocomplete to complete that word)
  • The user then types a character or more, and you want to provide some options of the next word to autocomplete
    • This next option should be ranked based on the word’s usage rate within English
    • I’m also assuming you do not care about the usage rate relative to the previous word (“by the company it keeps”)
  • Finally, I’m also assuming that you only want to go back one word when choosing the next word to suggest

If so, here’s one possible approach:

  • Index every word that you may want to autocomplete, with the following attributes:
    • The word
    • An array of the most common N words that come before that word
    • An integer related to either raw usage or rank, your choice

Then you will set your searchable attributes to be only the attribute with the word that you may want to autocomplete, set your word popularity as your custom ranking, and set your array of the most common N preceding words as a facet.
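To make the record shape and settings concrete, here is a minimal Python sketch using plain dictionaries. The attribute names `word`, `preceding` and `popularity`, the `build_record` helper, and the sample values are assumptions for illustration; `searchableAttributes`, `customRanking` and `attributesForFaceting` are the Algolia index settings that correspond to the description above.

```python
from typing import Dict, List


def build_record(word: str, preceding: List[str], popularity: int) -> Dict[str, object]:
    """Build one index record for a word that may be suggested."""
    return {
        "objectID": word,          # one record per word
        "word": word,              # the text that prefix search runs against
        "preceding": preceding,    # most common N words seen before `word`
        "popularity": popularity,  # raw usage count or a rank-derived score
    }


# Placeholder data for illustration only.
RECORDS = [
    build_record("turkey", ["a", "the", "some"], 120),
    build_record("tomato", ["a", "the"], 95),
    build_record("tricycle", ["a", "my"], 12),
]

# Index settings matching the description above.
INDEX_SETTINGS = {
    "searchableAttributes": ["word"],        # search only the word itself
    "customRanking": ["desc(popularity)"],   # more common words rank first
    "attributesForFaceting": ["preceding"],  # lets queries filter or boost on it
}
```

These dictionaries would then be sent with whichever Algolia API client you use; the client calls are left out here because they differ between clients and versions.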

Then, at query time, you will provide a boost on the preceding word using optional filters, and you will search with prefixing on the word values.
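A query for this setup could be assembled roughly as follows. This is a sketch only: `build_request` is a hypothetical helper, the facet attribute name `preceding` is an assumption, and it assumes the preceding words were indexed in lowercase. `optionalFilters`, `queryType` and `hitsPerPage` are standard Algolia search parameters.

```python
from typing import Dict


def build_request(text: str, hits: int = 5) -> Dict[str, object]:
    """Turn the text typed so far into search parameters for the next word."""
    words = text.split()
    if not words:
        return {"query": "", "hitsPerPage": hits}

    if text[-1].isspace():
        # The last word is finished, so nothing of the next word is typed yet.
        partial, previous = "", words[-1]
    else:
        partial = words[-1]
        previous = words[-2] if len(words) > 1 else ""

    params: Dict[str, object] = {
        "query": partial,           # prefix of the word being typed
        "queryType": "prefixLast",  # treat the last query word as a prefix
        "hitsPerPage": hits,
    }
    if previous:
        # Boost, without requiring, words commonly seen after `previous`.
        params["optionalFilters"] = [f"preceding:{previous.lower()}"]
    return params


if __name__ == "__main__":
    print(build_request("I want a t"))
```

Because the filter is optional, words that never follow the previous word are still returned, just ranked lower.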

If you want to use more fine-grained ranking of word suggestions (i.e., how common it is to follow the previous word) or N-grams where N is greater than 2, then Algolia won’t be the option to go with. But this is an Algolia-specific implementation that might work.
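For comparison, the finer-grained ranking mentioned here (how often a word follows the previous word) amounts to a bigram model. Below is a minimal sketch of that idea outside Algolia; the sample sentences are placeholders, and `train_bigrams` and `suggest` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigrams(sentences: List[str]) -> Dict[str, Counter]:
    """Count, for each word, which words follow it and how often."""
    following: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for previous, current in zip(words, words[1:]):
            following[previous][current] += 1
    return following


def suggest(following: Dict[str, Counter], previous: str, prefix: str,
            limit: int = 5) -> List[str]:
    """Rank words starting with `prefix` by how often they follow `previous`."""
    candidates = following.get(previous.lower(), Counter())
    matches = [(word, count) for word, count in candidates.items()
               if word.startswith(prefix.lower())]
    # Most frequent followers first, alphabetical order on ties.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matches[:limit]]


if __name__ == "__main__":
    # Placeholder corpus for illustration only.
    model = train_bigrams([
        "I want a turkey sandwich",
        "I want a turkey sandwich please",
        "I want a tomato",
        "give me the toboggan",
    ])
    print(suggest(model, "a", "t"))
```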
