[Search] How to retrieve specific post and author?

I have 2 indices: Post and User.
Each post has one user, and each user has many posts
My goal is to search for entries in the Post index, but also return the user information of the author from the User index.

How can I achieve this in instantsearch.js/algolia?

Thank you!

1 Like

Hi @jontan

you might want look at this topic Display data from another indice

I think the problem / solution was relatively similar. Short version : algolia indices doesn’t allow JOIN like you could use in a traditional relational DB but there are workaround.

Cheers :wink:

The answer from @pierre.aurele.martin is perfectly accurate!

Don’t hesitate to reach out if you have more issues!

Thanks for the fast replies!
I’m still having a little trouble understanding the solution for my use case.

Are you suggesting that i include my users in the post index like this:

{
“title”: “XXX”,
“body”: “XXX”,
“author”: “author_full_name”
}

If the author decides to change their name, this means that I will have to change all the posts from the author.

Please let me know what you think.

Indeed, this is the way to proceed!
You would have to do so, but there are some ways that would make it not as painful as it seems at first.

Algolia has a browse method that will allow you, along with some filtering on the author name, to easily update all the corresponding articles.

You can also think about maintaining a list of the objectID (the primary keys of an object for Algolia) of the posts of an author inside the author record, in order to easily find and update the posts if the name of the author changes

Either of those solution should help you updating without too much issue your records in case of a change.

Let me know what you think!

Ah! I think I understand your suggestion a lot more now.
I actually tried to implement the second version where the posts were part of the user object.
Since a post can be quite long, one test user actually had 1000 posts and exceeded the allowed size of the entry.

If I implement option one, editing the author’s name would cost n operations, where n is the number of posts a user has. is that correct?

In fact it probably cost n+1, as you will have the browse operation that will be used to retrieve the actual post before performing the partial update. The browse operation can retrieve up to 1000 items at a time, so if a user as more than a 1000 posts, then it could even be n+2. But the overhead of browse should still stay pretty low.

aww man, that might get pretty expensive.

I have an idea in regard to option 2:
Since i’m limited to the size of a user’s posts, if a user exceeds say, 100 posts, i will create another object with the same username. Two objects with the same username.

If you don’t mind me asking, is changing the author name an operation that is that common? Especially the intersection of having a lot of posts AND changing your name often?

To be sure I understand what you mean by “putting posts inside the user record”, are you talking about inlining the whole content of each post inside the user record?

Nah it’s probably not a very common occurrence. It will mostly be quite rare actually. However, if that is the best way, then i’ll stick to that.

Yes. I was toying with the idea of having the entire post as field in the user record.

I suspect this may be detrimental to the search, can you tell me to what extent?

And you are suspecting right! The issue is that several posts will be considered as the same record, which means that you won’t be able to find post from the same author separately, you would get them all at once as soon as one matches. You would also not be able to rank them one against the other, as they would be considered the same record. I would not recommend going this way :slight_smile:

What I was hinting at was to just put the objectID of the posts inside the author record. This way instead of performing a browse to retrieve the objects and then modifying them with partialUpdate, you would be able to directly perform the partialUpdate as you would already have the objectID thanks to them being inside the author record.
However, if you have almost always less than 1000 posts per user, this will hardly save any operation as you would only have 1 operation by 1000 posts coming from the browse

Thanks Leo,
That was also the conclusion I came to. Thank you so much for the help!

Hi - I’m running into the same relational issue and have read this and similar threads for how to “do a lookup” across what are 2+ source tables. I’d like to summarize my understanding. Can you please confirm that this is true of what you’d recommend myself and others with this same use case should do?

Following the above scenario:
@jontan has two source tables:

  1. Users (info about users including user’s name)
  2. Posts (info about posts users have authored)

The use case: Do a search for a subset of Posts by User name.

Method #1 (Recommended method if <1000 Posts per User)
Have 2 indexes: Users and Posts. Add array-type field to Users index containing a list of the objectIDs for each Post corresponding to that user.

For example:

Each time the user writes a new post, a record is 1st added to the Post Index which creates a new Post_ObjectID. Then a subsequent operation is needed to update the Post_ObjectIDs in the users index to include the new Post_ObjectID.

If the user updates her name, an operation is needed to do a partial update on the Users index for name.

Method #2 (Could handle scenarios where there are >1000 Posts per User)
Have only one index that includes both the Post info and User info.

In this scenario, all new POST records need to include first and last name of the User. If the user changes her name, a combination of browse/filter on users name is needed to update ALL posts for the user.

Is this correct? And is your recommendation to…?:

Go with method #1 if:

  • There are <1000 Posts or whatever is being arrayed
  • The field to be updated, in this case name, is something that will change frequently. I realize in this specific example name changed infrequently, but were there another frequently changing users field that was used for the search, then it would be preferable to store the Post_ObjectIDs in the users table

Go with method #2 if:

  • There are >1000 Post per user and thus it doesn’t work to store the Post_ObjectIDs as an array in the Users table.
  • Or, the search field, in this case name, is unlikely to change. In the event the user updates her name, the mass browse update for all historic posts is costly, so this method is not recommended if method #1 is possible.
  • If name, for some reason, changed frequently, the suggestion would be to limit the frequency of updates (similar to what is suggested here)

Is this summary correct? Am I missing any new developments as the thread is from 2017?

PS - I suspect this is a very common use case. It would be nice to see an example a tutorial for it in docs.