Using Cloud Functions to index subcollections in an Algolia index

Hello everyone,

This question refers to integrating Algolia with Firebase Firestore to search through Firebase documents and subcollections as described over here: https://firebase.google.com/docs/firestore/solutions/search

The challenge which I am trying to solve is how to update the Algolia index for a document with subcollections when either the document fields, or any of the subcollections or documents within them change/are added. To my knowledge, there isn’t a described solution on either the Algolia or Firestore side. After checking in with both support teams (thank you Paul-Louis and Gianluca!) I gained some valuable insights which I hope will help us reach a solution. I am sharing them below.

For reference, my Firestore DB currently looks like this:
Universities -> (have) Departments -> (have) Programs

Departments is a subcollection found in every university document, and Programs is a subcollection found in every department document.

What I want to achieve: Index the whole university object with all its subcollections (departments and programs within departments). Listen for document changes and update the Algolia index in exactly the right place, depending on whether a document or a subcollection within a document changed (at varying nesting levels).

What I have tried:

The third code block in the tutorial referenced above describes writing Cloud Functions that listen to document changes and update the Algolia index accordingly.

So right now when I just write:

const university = event.data.data();

I only get the fields (key:value) pairs of that university object stored in the index.

I have been reading the docs here (https://firebase.google.com/docs/reference/functions/functions.Event), here (https://firebase.google.com/docs/reference/functions/functions.firestore.DeltaDocumentSnapshot) and here (https://cloud.google.com/nodejs/docs/reference/firestore/0.8.x/DocumentReference) and it seems there is a way to get all subcollections of a document by calling something along the lines of:

let documentDepartmentRef = functions.firestore.doc('universities/' + university.objectID);

documentDepartmentRef.getCollections().then(collections => {
  for (let collection of collections) {
    console.log(`Found subcollection with id: ${collection.id}`);
  }
});

But how do I add these subcollections to the specific university in the Algolia index?

This is the thread which I posted in Google Groups: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/google-cloud-firestore-discuss/Tt9qYEPEkFI/Oc4ZHaAZBAAJ

If you go through it you could see how the issue evolved by taking into account some new information:

  • The getCollections() method isn’t really built for this purpose according to Sam

  • Listening for changes to a document (in this case functions.firestore.document("universities/{ID}").onWrite(...)) does not catch any events associated with changes to its subcollections. We will need a separate listener for every nesting level (subcollection), as shown in the code examples below.

  • We can use wildcards to get the ID, path, and parent of the document which fired the event. We can potentially use this information to store the event in the right place in the Algolia index.

  • When updating the index, we don’t always have to build up an entire University object and fill it out with all of its subcollections before sending it to the index. It would be much better to send just the piece of information which changed to exactly the right place.
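
A minimal sketch of the wildcard idea from the list above, assuming the index holds one record per university with departments and programs nested inside it as maps keyed by Firestore ID. The helper name locateInRecord and that record layout are assumptions made for illustration; only the wildcard names come from the triggers in this post.

```javascript
// Maps the wildcard values captured by a trigger path to a location inside
// the university record. Assumed layout (hypothetical):
// record.departments[departmentID].programs[programID]
function locateInRecord(params) {
  const path = [];
  if (params.departmentID) {
    path.push('departments', params.departmentID);
  }
  if (params.programID) {
    path.push('programs', params.programID);
  }
  // The university trigger in this post names its wildcard {ID},
  // the nested triggers name it {universityID}.
  return { objectID: params.universityID || params.ID, path };
}
```

With the event API used in this thread, the wildcard values should be available as event.params, so a program trigger would call locateInRecord(event.params) and get back the university objectID plus the path of the nested entry to touch.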

I also ran into several major errors when implementing the solution for looping through subcollections found here: https://cloud.google.com/nodejs/docs/reference/firestore/0.8.x/DocumentReference#getCollections
Although we may not need to loop through subcollections for this solution (I am not 100% sure) I think it would be useful to raise the question now and let you guys know what I found.

let documentRef = firestore.doc('col/doc');

documentRef.getCollections().then(collections => {

  for (let collection of collections) {
    console.log(`Found subcollection with id: ${collection.id}`);
  }
});

The errors below are taken from the Cloud Functions error log.

  • I got a TypeError: functions.firestore.doc is not a function error for the first line. I changed it to functions.firestore.document('col/doc') and it works fine.

  • I got a TypeError: documentDepartmentRef.getCollections is not a function error for the documentRef.getCollections() function call. When I change it to a property access (which seems to be present for some reason?) such as documentRef.getCollections, the rest of the function does not compile because of the .then(collections => {...}); structure. So I am quite certain that there is a .getCollections() function, as it’s mentioned in the docs (the same ones I linked above), but it is not being recognized.

  • If I try to directly iterate through the collections with something like this:
    for (let department of documentDepartmentRef.getCollections) { ... }
    then I get a TypeError: Cannot read property 'Symbol(Symbol.iterator)' of undefined error (and rightfully so…)

Here is my full code below (Don’t worry, I regenerated my API key as soon as I saw my silly mistake of directly copy-pasting it into the Google Group).

const functions = require('firebase-functions');
var algoliasearch = require('algoliasearch');
var client = algoliasearch("YP4YNQ0I01", "XXXXXXXXXXXXXX");
var index = client.initIndex('universities');

// Listening to all university document changes
exports.onUniversityCreated = functions.firestore.document("universities/{ID}").onWrite(event => {
 console.log(`Object that called this is: ${event.resource}`);
 
 const university = event.data.data();
 
 // Give the Algolia record the same ID as the Firestore document. 
 university.objectID = event.data.id;

 // Old attempt to loop through subcollections (kept for reference, see the errors above)
 /*
 let documentDepartmentRef = functions.firestore.doc('universities/' + university.objectID);

 documentDepartmentRef.getCollections().then(collections => {
   for (let department of collections) {
     console.log(`Found subcollection with id: ${department.id}`);

     let documentProgramRef = functions.firestore.doc('universities/' + department.id + '/programs');

     documentProgramRef.getCollections().then(collections => {
       for (let program of collections) {
         console.log(`Found subcollection with id: ${program.id}`);
       }
     });
   }
 });
 */

 // Write to the algolia index
 return index.saveObject(university);
});

// Listening to all department document changes
exports.onDepartmentWrite = functions.firestore.document("universities/{universityID}/departments/{departmentID}").onWrite(event => {
  // React to changes in departments...
});

// Listening to all program document changes
exports.onProgramWrite = functions.firestore.document("universities/{universityID}/departments/{departmentID}/programs/{programID}").onWrite(event => {
  // React to changes in programs...
});

Thank you all for your help in advance! I hope we can find a robust and clear solution which we can later include in the docs for everyone to see and easily implement :slight_smile:

Hello and thanks for the detailed question!

I have a bit of experience with Firestore, but never had to synchronize multi-deep objects like this, so I’ll try to answer as best as I can.

The way I would do it is to listen to changes on universities first. If I catch a change, it means that one of the attributes of the university changed. As you mentioned, changes on subcollections are not intercepted, so at that point I’m sure that only root-level attributes of my university have changed, and not the underlying departments and programs. I will just update the matching object in Algolia with the new version.

Then I would listen to universities/*/departments. From there, I would walk up the tree to the parent university object, using the .parent property of the document reference, to get its ID. With that ID I would get a local copy of this university (including its subnodes). You can do that using Firebase, but if you already have the data in Algolia, you can actually find it on the Algolia side by its ID.

From this local copy, I would simply update the department child entry (the one that has triggered the change), and push back this local copy to Algolia.
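
To make the department step more concrete, here is a minimal sketch under a few assumptions: the Algolia record stores its departments in a departments attribute as a map keyed by Firestore ID, and a deleted department is signalled by passing null. The names applyDepartmentChange and syncDepartment are hypothetical; getObject and saveObject are the Algolia index methods, used in their promise form.

```javascript
// Returns a copy of the university record with one department added,
// replaced, or (when departmentData is null) removed. The input record is
// left untouched. The `departments` map layout is an assumption.
function applyDepartmentChange(university, departmentID, departmentData) {
  const departments = Object.assign({}, university.departments);
  if (departmentData === null) {
    delete departments[departmentID];
  } else {
    departments[departmentID] = departmentData;
  }
  return Object.assign({}, university, { departments });
}

// Fetches the current record from Algolia, patches the one department that
// triggered the event, and pushes the whole record back.
// `index` is expected to expose Algolia's getObject/saveObject methods.
function syncDepartment(index, universityID, departmentID, departmentData) {
  return index.getObject(universityID).then(record =>
    index.saveObject(applyDepartmentChange(record, departmentID, departmentData))
  );
}
```

Inside the onDepartmentWrite trigger from the question, this would presumably be called as syncDepartment(index, event.params.universityID, event.params.departmentID, event.data.exists ? event.data.data() : null), with the returned promise handed back to Cloud Functions. Using event.params also avoids having to walk up the reference tree to find the university ID.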

Now I would listen to universities/*/departments/*/programs and apply a similar logic, except that I’ll have to walk all the way up the tree back to the university (two levels) to get the ID. I will then have to update the program based on its ID, and push the new version to Algolia.

This is of course all theoretical because I haven’t tried it for real, but this is how I would approach the problem. The main point is that if it’s too hard to get a multi-level deep tree of data using Firebase, and you actually just want to update an Algolia object, then maybe it’s easier to grab the Algolia object and update it instead.

You can potentially go even more granular by not updating the whole Algolia object every time, but only the subpart that needs updating. But as you will need to walk all the way up the tree to the university and will have to get a copy of your record from Algolia anyway, it’s pretty much the same trouble.
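
For root-level university fields, the more granular variant could look like the sketch below. It assumes you have the field values from before and after the write, and it builds a payload that carries only the attributes that changed. The helper names changedFields and buildPartialUpdate are hypothetical; the payload is meant for Algolia's partialUpdateObject.

```javascript
// Returns the root-level fields of `after` whose values differ from `before`.
// Values are compared by their JSON text, which is a simplification: key
// order inside nested objects would count as a difference.
function changedFields(before, after) {
  const previous = before || {}; // no previous data when the document is new
  const changed = {};
  Object.keys(after).forEach(key => {
    if (JSON.stringify(previous[key]) !== JSON.stringify(after[key])) {
      changed[key] = after[key];
    }
  });
  return changed;
}

// Builds a partial-update payload: the objectID plus only the changed
// attributes. Fields removed from the document are not handled here.
function buildPartialUpdate(objectID, before, after) {
  return Object.assign({ objectID }, changedFields(before, after));
}
```

In the university trigger this would presumably be used as index.partialUpdateObject(buildPartialUpdate(event.params.ID, event.data.previous.data(), event.data.data())), assuming the event API shown in the question. As far as I know, a partial update replaces whole top-level attributes, so a change to a single nested department would still mean sending the full departments attribute.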

Hope that helps :slight_smile:


Hi,

I am encountering a similar issue: I need to index a collection and its subcollection, and then keep both updated.

Do you have any feedback on initial indexing of collections with subcollections?

For example ‘collection1/subcollection’

I currently have these files…

.env

ALGOLIA_APP_ID=xxxxx
ALGOLIA_API_KEY=xxxxx
ALGOLIA_INDEX_NAME='collection1'
FIREBASE_DATABASE_URL=https://xxx.firebaseio.com
FIREBASE_PROJECT_ID=xxx

Then index2agolia.js

// bring in all the modules we need
const algoliasearch = require('algoliasearch');
const dotenv = require('dotenv');
const firebase = require('firebase');
require('firebase/firestore'); // side-effect import that adds firebase.firestore()

// load values from the .env file in this directory into process.env
dotenv.config();

// initialize the Firebase app
firebase.initializeApp({
  projectId: process.env.FIREBASE_PROJECT_ID,
  databaseURL: process.env.FIREBASE_DATABASE_URL
});
const db = firebase.firestore();

// configure Algolia
const algolia = algoliasearch(
  process.env.ALGOLIA_APP_ID,
  process.env.ALGOLIA_API_KEY
);
const index = algolia.initIndex(process.env.ALGOLIA_INDEX_NAME);

// the Firestore collection has the same name as the Algolia index
const records = [];
db.collection(process.env.ALGOLIA_INDEX_NAME).get()
  .then((snapshot) => {
    snapshot.forEach((doc) => {
      // get the key and data from the snapshot
      const childKey = doc.id;
      const childData = doc.data();
      // use the Firestore document ID as the Algolia objectID
      childData.objectID = childKey;
      // add the object for indexing
      records.push(childData);
      console.log(doc.id, '=>', doc.data());
    });

    // Add or update new objects
    index.saveObjects(records).then(() => {
      console.log('Documents imported into Algolia');
      process.exit(0);
    })
    .catch((error) => {
      console.error('Error when importing documents into Algolia', error);
      process.exit(1);
    });
  })
  .catch((error) => {
    console.error('Error getting documents', error);
  });

Thanks,

Rog.