[Search] How to retrieve specific post and author?

I have 2 indices: Post and User.
Each post has one user, and each user has many posts
My goal is to search for entries in the Post index, but also return the user information of the author from the User index.

How can I achieve this in instantsearch.js/algolia?

Thank you!

1 Like

Hi @jontan

you might want look at this topic Display data from another indice

I think the problem / solution was relatively similar. Short version : algolia indices doesn’t allow JOIN like you could use in a traditional relational DB but there are workaround.

Cheers :wink:

The answer from @pierre.aurele.martin is perfectly accurate!

Don’t hesitate to reach out if you have more issues!

Thanks for the fast replies!
I’m still having a little trouble understanding the solution for my use case.

Are you suggesting that i include my users in the post index like this:

{
“title”: “XXX”,
“body”: “XXX”,
“author”: “author_full_name”
}

If the author decides to change their name, this means that I will have to change all the posts from the author.

Please let me know what you think.

Indeed, this is the way to proceed!
You would have to do so, but there are some ways that would make it not as painful as it seems at first.

Algolia has a browse method that will allow you, along with some filtering on the author name, to easily update all the corresponding articles.

You can also think about maintaining a list of the objectID (the primary keys of an object for Algolia) of the posts of an author inside the author record, in order to easily find and update the posts if the name of the author changes

Either of those solution should help you updating without too much issue your records in case of a change.

Let me know what you think!

Ah! I think I understand your suggestion a lot more now.
I actually tried to implement the second version where the posts were part of the user object.
Since a post can be quite long, one test user actually had 1000 posts and exceeded the allowed size of the entry.

If I implement option one, editing the author’s name would cost n operations, where n is the number of posts a user has. is that correct?

In fact it probably cost n+1, as you will have the browse operation that will be used to retrieve the actual post before performing the partial update. The browse operation can retrieve up to 1000 items at a time, so if a user as more than a 1000 posts, then it could even be n+2. But the overhead of browse should still stay pretty low.

aww man, that might get pretty expensive.

I have an idea in regard to option 2:
Since i’m limited to the size of a user’s posts, if a user exceeds say, 100 posts, i will create another object with the same username. Two objects with the same username.

If you don’t mind me asking, is changing the author name an operation that is that common? Especially the intersection of having a lot of posts AND changing your name often?

To be sure I understand what you mean by “putting posts inside the user record”, are you talking about inlining the whole content of each post inside the user record?

Nah it’s probably not a very common occurrence. It will mostly be quite rare actually. However, if that is the best way, then i’ll stick to that.

Yes. I was toying with the idea of having the entire post as field in the user record.

I suspect this may be detrimental to the search, can you tell me to what extent?

And you are suspecting right! The issue is that several posts will be considered as the same record, which means that you won’t be able to find post from the same author separately, you would get them all at once as soon as one matches. You would also not be able to rank them one against the other, as they would be considered the same record. I would not recommend going this way :slight_smile:

What I was hinting at was to just put the objectID of the posts inside the author record. This way instead of performing a browse to retrieve the objects and then modifying them with partialUpdate, you would be able to directly perform the partialUpdate as you would already have the objectID thanks to them being inside the author record.
However, if you have almost always less than 1000 posts per user, this will hardly save any operation as you would only have 1 operation by 1000 posts coming from the browse

Thanks Leo,
That was also the conclusion I came to. Thank you so much for the help!

Hi - I’m running into the same relational issue and have read this and similar threads for how to “do a lookup” across what are 2+ source tables. I’d like to summarize my understanding. Can you please confirm that this is true of what you’d recommend myself and others with this same use case should do?

Following the above scenario:
@jontan has two source tables:

  1. Users (info about users including user’s name)
  2. Posts (info about posts users have authored)

The use case: Do a search for a subset of Posts by User name.

Method #1 (Recommended method if <1000 Posts per User)
Have 2 indexes: Users and Posts. Add array-type field to Users index containing a list of the objectIDs for each Post corresponding to that user.

For example:

Each time the user writes a new post, a record is 1st added to the Post Index which creates a new Post_ObjectID. Then a subsequent operation is needed to update the Post_ObjectIDs in the users index to include the new Post_ObjectID.

If the user updates her name, an operation is needed to do a partial update on the Users index for name.

Method #2 (Could handle scenarios where there are >1000 Posts per User)
Have only one index that includes both the Post info and User info.

In this scenario, all new POST records need to include first and last name of the User. If the user changes her name, a combination of browse/filter on users name is needed to update ALL posts for the user.

Is this correct? And is your recommendation to…?:

Go with method #1 if:

  • There are <1000 Posts or whatever is being arrayed
  • The field to be updated, in this case name, is something that will change frequently. I realize in this specific example name changed infrequently, but were there another frequently changing users field that was used for the search, then it would be preferable to store the Post_ObjectIDs in the users table

Go with method #2 if:

  • There are >1000 Post per user and thus it doesn’t work to store the Post_ObjectIDs as an array in the Users table.
  • Or, the search field, in this case name, is unlikely to change. In the event the user updates her name, the mass browse update for all historic posts is costly, so this method is not recommended if method #1 is possible.
  • If name, for some reason, changed frequently, the suggestion would be to limit the frequency of updates (similar to what is suggested here)

Is this summary correct? Am I missing any new developments as the thread is from 2017?

PS - I suspect this is a very common use case. It would be nice to see an example a tutorial for it in docs.

How would searching be done on here?

Hi mo1,

Can you clarify what you mean by searching on here? Do you mean the community forum in general? Or are you looking for specifics about something in this thread?

Hi Kevin,

This would be specific to the example given by @whitney. I am actually running into the same problem where I have a user has_many posts. I’d like to search posts and return the user that matches the query on posts AND the posts.

So if I search for ‘sneakers’, I’d like to find all users that have posts with ‘sneakers’ in their title AND return all the posts for the given user that match the query.

I looked into usage of distinct to group but it’s not great. I’d still have to do the grouping on the frontend and it’s limited by the number assigned to distinct.

Following @whitney’s suggestions, I am still unsure how I can search for posts when I have 2 indexed that are linked by Post_ObjectIDs field.

Hello @mo1,

If I understand correctly, you’d like to perform a search operation with a sneakers query which returns:

  1. all users who have authored at least one post with sneakers in its title
  2. all posts of a specific user containing sneakers in their title or content

Is that correct?

If so, I think the best way for you to do this would be to perform multiple queries:

  • for 1., you could perform a search against the posts index with distinct=1, restrictSearchableAttributes=["title"] and attributeForDistinct set in your settings to authorId. The response you’ll get will contain all posts containing sneakers in their title, but will be limited to 1 post per author. From there, you can list all the authorIds and then perform a getObjects() single call to the authors index to retrieve all the users you were looking for. Keep in mind that the initial search query against the posts index is paginated with hitsPerPage/page parameters, so you may want to iterate over all the results and/or use a high value of hitsPerPage to limit the number of calls.

  • for 2., this is a bit easier, you can simply perform the sneakers search query against the posts index with a filters="authorId:XXX" to limit the results to the posts authored by a specific user.

For reference, here are all the parameters and methods I’ve mentioned here:

Let me know if something is still unclear or if this cannot fit your use-case, happy to help.

1 Like