Searching through Algolia hierarchically with version filters

Dear Algolia-team and users,

Within our company, we’re trying to make our documentation searchable through Algolia. So far we’re in love with how this works and how accurate results are.

For one of the requirements, I made an implementation I’m not satisfied with. It works, but it should/could be more efficient. Our use-case is as follows: we have an X number of products, where each product has versions. So, for example, we have Product1 in versions 1, 1.1, 1.2, 2.0, 2.1 and Latest.

On the search page, below the text input, we show one row per product, each with a dropdown of versions and a checkbox that enables or disables that product within the search. To render this, I need to retrieve from Algolia which products exist and which versions each one has.

At the moment, when the page opens, I run an empty search, which gives me the products as hierarchical facets. I then loop through them, running one refined search per product, which gives me that product’s versions as facets.

I feel this could/should be done within one single (initial) search instead of 7 (for 7 products).

Any suggestions on how this should be done? Please feel free to ask for clarification.

Thanks in advance.

Is it possible to see the actual implementation? Or maybe see the code?

I’m not awfully proud of this code, so bear with me, since this is also the first time I’m actually working with EmberJS. I’ve left out some code that isn’t relevant to what I’m trying to achieve. The idea is to get the tree of hierarchical facets, so we can show them as I described above.


const algolia = algoliasearch(Discourse.SiteSettings.algolia_application_id, Discourse.SiteSettings.algolia_search_key);
const helper = algoliasearchHelper(algolia, Discourse.SiteSettings.algolia_index_name, {
  hierarchicalFacets: [{
    name: 'categories',
    attributes: ['hierarchicalCategories.lvl0', 'hierarchicalCategories.lvl1'],
    separator: ' > ',
    showParentLevel: false
  }],
  disjunctiveFacets: ['hierarchicalCategories.lvl1'],
  hitsPerPage: 35,
  maxValuesPerFacet: 5
});

export default Ember.Controller.extend({
  hits: [],
  results: {},
  productArray: [],
  versions: [],
  parentsRetrieved: false,
  childrenRetrieved: false,
  numParentsRetrieved: 0,
  consideredParents: [],

  handleParents (hits) {
    this.set('parentsRetrieved', true);

    if (hits.length > 0) {
      for (let hit of hits) {
        if (this.get('consideredParents').indexOf(hit.name) === -1) {
          this.get('consideredParents').pushObject(hit.name);
        }
        hit.versions = [];
        hit.selected = hit.name + ' > latest';
        this.get('productArray').pushObject(hit);
      }
    }
  },

  handleVersions (hits) {
    var parentName = hits[0].path.split('>').map(x => x.trim())[0];

    const index = this.get('productArray').findIndex(x => x.name == parentName); 
    const parent = this.get('productArray').objectAt(index);
    
    // I don't know how to properly add an array to an object in Ember
    // So forgive me for what I'm about to do...
    let newObj = Object.assign({} , parent); // Clone the parent
    newObj['versions'] = hits; // Add the versions

    this.get('productArray').removeAt(index); // Remove the original node
    this.get('productArray').pushObject(newObj) // Add the new one
  },

  handleHits: (function(res) {
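    const results = this.get('results');

    // Guard before touching results.hits (the original check misspelled "length")
    if (!results || !results.hits) return;
    this.set('hits', results.hits);

    if (results.hits.length === 0) return;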

    if (this.get('parentsRetrieved') === false) {
      this.handleParents(results.getFacetValues('categories').data);
      helper.toggleFacetRefinement('categories', this.get('consideredParents')[0]).search();
    }
    else if (this.get('childrenRetrieved') === false) {
      let that = this;
      for (let item of results.getFacetValues('categories').data) {
          if (item.isRefined) {
            if (item.data) {
              this.set('numParentsRetrieved', this.get('numParentsRetrieved')+1);
              this.handleVersions(item.data);
            }

            if (this.get('numParentsRetrieved') < this.get('consideredParents').length) {
              helper.toggleFacetRefinement('categories', this.get('consideredParents')[this.get('numParentsRetrieved')]).search();
            } else {
              this.set('childrenRetrieved', true);
              this.refineSearch();
            }
          }
      };
    }

  }).observes('results'),

  refineSearch: function(product=null) {
      helper.clearRefinements();

      helper.search();
  },

  init: function () {
    this._super( ...arguments );

    // Bind the search results
    helper.on('result', (results) => this.set('results', results));

    // Do a first search to trigger the retrieval of products and versions
    helper.search();
  }

});

So, what I tried to do is as follows:

  • On init, I perform an initial search (init function)
  • Results are stored in “results”, which is being observed by the “handleHits”-function
  • This function checks whether we’ve handled Products and/or Versions. Since it’s init, we have not
  • On the first run, it will see that the Products have not been filled yet, triggering the handleParents() function
  • It will retrieve the parents from the facet values, and push all parents (so the lvl0 of the hierarchical facets) into an array
  • When done, it will trigger a search with the first parent (product), triggering handleHits() again, and checking for all children (versions) of that parent.
  • Afterwards, it removes all filters and searches again.

So what I end up with is an array of all Products (hierarchicalCategories.lvl0), each with a child node “versions” containing all its versions (hierarchicalCategories.lvl1). These categories are saved in Algolia as hierarchical facets in the form “product > version”.

There must be a better way to achieve this.

Please don’t hesitate to ask for clarification.

Sorry for the late answer @Arjen

Let’s just talk about your use case as I feel like the implementation is misleading me. I understand that your records are products and each product can have multiple versions. Do you have all the different versions in a single product record or do you have a record per product + version?

Then in terms of display you want to display all the versions for each product in your results? Then you’d like to be able to filter out the results per version?

Cheers,

Dear @Bobylito,

Thanks for your reply.
Your interpretation is correct. I have documentation of our products, but the documentation can differ (slightly) per version of a product, so our records consist of documents (or rather sections as described here). Each document/record has a “product” and a “version” attribute indicating which product and version the record belongs to.

A document-record generally looks like this:

{
  "title": "Document Title",
  "content": "...truncated content...",
  "itemType": "documentation",
  "recordType": "section",
  "recordTypeScore": 3,
  "product": "Product 1",
  "version": "2.10.1",
  "sourceArticle": "unique-product-identifier",
  "categories": [
    "Product 1",
    "2.10.1"
  ],
  "hierarchicalCategories": {
    "lvl0": "Product 1",
    "lvl1": "Product 1 > 2.10.1"
  },
  "objectID": "6776451"
}
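
As a rough illustration of how the “hierarchicalCategories” attribute in the record above relates to the flat “product” and “version” fields, here is a minimal sketch. The function name toHierarchicalCategories is hypothetical and not part of any Algolia library; it only assumes the “ > ” separator used in this thread.

```javascript
// Hypothetical helper (not an Algolia API): derives the hierarchical facet
// attributes from the flat "product" and "version" fields of a record.
function toHierarchicalCategories(product, version, separator = ' > ') {
  return {
    lvl0: product,
    lvl1: `${product}${separator}${version}`,
  };
}

// Example record, trimmed down from the one shown above.
const record = {
  title: 'Document Title',
  product: 'Product 1',
  version: '2.10.1',
};

record.hierarchicalCategories = toHierarchicalCategories(
  record.product,
  record.version
);
```

Because lvl1 repeats the product name, every lvl1 value identifies a product/version pair on its own.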

Let me show you a screenshot of how it should look and function. I think that will make everything a bit clearer.

So we have a search field and a list of filters. The idea is that you can select which product, and which specific version of it, to search in. You’re also able to toggle a product on or off. For example, in this screenshot, a search will look in the documentation of “Product 1 > latest”, “Product 2 > 2.10.0”, “Product 4 > Latest”, and “Product 5 > Latest” (as shown in the Applied Filters at the bottom). Product 3 is toggled off.
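
The filter state described here could be translated into helper refinements roughly as follows. This is only a sketch: the “enabled” and “selected” property names on each product row are assumptions for illustration, and it assumes “hierarchicalCategories.lvl1” is declared as a disjunctive facet so that several product/version pairs can be combined with OR.

```javascript
// Sketch: apply one disjunctive refinement per enabled product row.
// `helper` is expected to be an algoliasearch-helper instance; `products`
// is a hypothetical array of rows like
//   { name: 'Product 1', selected: 'Product 1 > latest', enabled: true }
function applyVersionFilters(helper, products, facet = 'hierarchicalCategories.lvl1') {
  // Start from a clean slate for this facet only.
  helper.clearRefinements(facet);

  for (const product of products) {
    if (product.enabled) {
      // Each enabled row contributes its currently selected "product > version".
      helper.addDisjunctiveFacetRefinement(facet, product.selected);
    }
  }
  return helper;
}
```

Calling helper.search() afterwards would run the query with the applied filters; disabled products simply contribute no refinement.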

As I mentioned before, we figured we’d save this versioning hierarchically in “hierarchicalCategories” with “>” as separator (as you can see in the algoliasearchHelper configuration in my previous post).

Hope this clarifies the use case.

Thanks for your explanations, it really helps a lot.

From what I see, hierarchicalCategories.lvl1 encodes both the product and the version. Therefore, I don’t understand why you need more than one query to get all the products and their versions. Can you explain?
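
To make the point concrete: since each lvl1 value has the form “product > version”, the facet values from a single search can be grouped on the client. The sketch below assumes the values arrive as an array of objects with a “name” property (the shape returned by the helper’s getFacetValues for a disjunctive facet); groupVersionsByProduct is a hypothetical name, and the sample data is made up.

```javascript
// Sketch: turn flat "product > version" facet values into one entry per
// product, each carrying its list of versions.
function groupVersionsByProduct(facetValues, separator = ' > ') {
  const byProduct = new Map();

  for (const { name } of facetValues) {
    const idx = name.indexOf(separator);
    if (idx === -1) continue; // ignore values that are not "product > version"

    const product = name.slice(0, idx);
    const version = name.slice(idx + separator.length);

    if (!byProduct.has(product)) byProduct.set(product, []);
    byProduct.get(product).push(version);
  }

  // Map preserves insertion order, so products appear in first-seen order.
  return Array.from(byProduct, ([name, versions]) => ({ name, versions }));
}

// Made-up sample in the assumed shape.
const sampleFacetValues = [
  { name: 'Product 1 > 1.0', count: 12 },
  { name: 'Product 1 > latest', count: 15 },
  { name: 'Product 2 > 2.10.0', count: 7 },
];

const products = groupVersionsByProduct(sampleFacetValues);
```

With a real helper, the input would come from something like results.getFacetValues('hierarchicalCategories.lvl1') inside the 'result' handler.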

After spending too much time on this issue, and fixing the code to fit our needs, I am now mainly in the process of smacking myself senselessly against a wall.

Your last reply got me thinking why I didn’t retrieve just the hierarchicalCategories.lvl1 as you suggested, since it seems really obvious. I tested the response from Algolia, and noticed my disjunctiveFacets only contained 5 entries (rather than 15 or so). I’m pretty sure I ran into this earlier, and figured that it was because those 5 are the only products/versions present in the initial search results.

Apparently, I missed this line in the initial configuration: maxValuesPerFacet: 5, meaning I would obviously only see those 5. After removing the hierarchicalFacets from the configuration and the maxValuesPerFacet, I indeed ended up with a nice list of all products and versions.
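
For reference, the adjusted helper options might look something like the sketch below. This is a reconstruction from the description above, not the actual refactored code. When maxValuesPerFacet is left out, Algolia falls back to its own default (100 values per facet, as far as I know), which is enough for the 15 or so product/version pairs mentioned here.

```javascript
// Sketch of the simplified options: no hierarchicalFacets and no
// maxValuesPerFacet, only the disjunctive facet on lvl1.
const helperOptions = {
  disjunctiveFacets: ['hierarchicalCategories.lvl1'],
  hitsPerPage: 35,
};

// Usage (requires the algoliasearch and algoliasearch-helper libraries,
// plus the client and index name from the original snippet):
// const helper = algoliasearchHelper(algolia, indexName, helperOptions);
// helper.on('result', (results) => { /* read the lvl1 facet values here */ });
// helper.search();
```

If a catalogue grows beyond the default limit, maxValuesPerFacet can be set explicitly to a higher value instead of being removed.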

After a small round of refactoring, with far fewer lines of code and much less horrible logic, I ended up with exactly what we need in a way that actually makes sense.

@Bobylito, thank you for your replies and help. You released me from one of those moments where you’re stuck inside your own head and can’t see the whole picture anymore.

Happy to hear that I was able to help :slight_smile:
