Documents not showing up in search after update

Hi there,

I’m having a really weird problem with search results. My employer has a large index with around 35 million documents, and sometimes after a document is updated it seems to disappear from search results for a little while. While it is missing, it does not appear in searches by either text or objectID. After a while, it spontaneously shows up again. Sometimes it shows up by objectID first, and then in text search a few minutes later. Other times it shows up in text search first, and then by objectID a few minutes later.

Is this an expected behavior? My understanding was that a document would not go missing when updated, and the older version of the document would persist until a new one appears.

If this is not an expected behavior, do you have any advice for troubleshooting?

Hello @rebecca,

Could you tell us a bit more about your indexing process, i.e. how do your documents get updated? It is not normal for a document to be unavailable, whether for a few minutes or even a few seconds. You should always see either the previous version or the new version. The gap you describe may be related to an issue in the indexing logic itself, and we would be happy to help you there.

Hi Anthony,

We sync our documents via a Python service that uses the index.save_objects method to push documents. My understanding is that this will overwrite any existing document with the same objectID and create a new document if none exists. We are assigning our internal vendorVariantID as the objectID and this has been working fine for years.

Is there a way to attach a video to this post? I captured a video of attempting to search for an object by objectID, and retrieving nothing, then searching for the object by title and finding it, then returning to search by objectID, pasting the same id number, and finding the document that had been missing less than a minute before. I don’t understand how this could be caused by an indexing error.

Your understanding is correct: the .save_objects() method will indeed overwrite any existing object or create a new one if it doesn’t already exist, provided that you correctly set the objectID field, which you did.
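To illustrate the replace-or-create behavior described above, here is a minimal sketch. It assumes the v2 Python client, where `save_objects` returns a response object with a `wait()` method. The helper `upsert_records` and the `InMemoryIndex` class are hypothetical names used only for illustration; `InMemoryIndex` is a stand-in that mimics the semantics without any network call.

```python
from typing import Any, Dict, List


def upsert_records(index: Any, records: List[Dict[str, Any]]) -> Any:
    """Push records so each one replaces the record with the same objectID.

    `index` is assumed to be a v2 SearchIndex; any object exposing
    save_objects(list_of_dicts) works for this sketch.
    """
    missing = [r for r in records if "objectID" not in r]
    if missing:
        # Without an objectID there is nothing to match an existing record on.
        raise ValueError(f"{len(missing)} record(s) have no objectID")
    response = index.save_objects(records)
    # Assumption: the response exposes wait(), which blocks until the
    # indexing task has been processed.
    if hasattr(response, "wait"):
        response.wait()
    return response


class InMemoryIndex:
    """Hypothetical stand-in for a real index, used only to show the semantics."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, Any]] = {}

    def save_objects(self, objects: List[Dict[str, Any]]) -> "InMemoryIndex":
        for obj in objects:
            # Replace the record if the objectID exists, create it otherwise.
            self.records[str(obj["objectID"])] = dict(obj)
        return self

    def wait(self) -> "InMemoryIndex":
        return self
```

Pushing the same objectID twice against the stand-in leaves a single record holding the latest values. At no point in this model is the record absent, which is why the gap described above is unexpected.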

It’s odd that the problem only started happening recently, though.

As a next step, could you share the video you recorded (by uploading it to YouTube or Vimeo, since Discourse does not support video uploads), as well as the exact indexing code that you use? It would also help us if you could share your application ID and index name.

Hi Anthony,

Here is a link to a video of a document being returned inconsistently via search by objectID: https://youtu.be/6vyiv0Zpmg0 . This is not a new document, but one that has been found in that index before.

Here is the Python function that we use to push documents to Algolia:

from datetime import datetime
from typing import Any, Dict, List, Tuple

# SearchIndex, schema, and log are defined elsewhere (not shown).


async def algolia_push(
    indices: List[SearchIndex], payloads: List[schema.ProductData]
) -> Tuple[List, List]:
    """Push a list of payloads to multiple Algolia indices."""
    json_payloads: List[Dict[str, Any]] = []
    exceptions: List[Exception] = []
    # One sync timestamp shared by the whole batch.
    synced_to_algolia = int(datetime.utcnow().timestamp())
    for result in payloads:
        # Upstream failures arrive as Exception instances; collect them
        # instead of pushing them.
        if isinstance(result, Exception):
            exceptions.append(result)
        else:
            result.syncedToAlgolia = synced_to_algolia
            json_payloads.append(result.asdict())
    responses = []
    for index in indices:
        # Push the same batch to every index.
        responses.append(index.save_objects(json_payloads))
    log.info(f"Pushed {len(json_payloads)} records to Algolia")
    return exceptions, responses

Documents are queued for resyncing in a variety of places throughout our codebase, triggered by updates to the underlying products. There is no automated mechanism for deleting documents from Algolia and we do this manually very rarely.

Any ideas?

Thanks!
Rebecca

Oh, also, the index we care about is https://www.algolia.com/apps/2R4ALUQMAE/explorer/browse/prod_PRODUCTS

Thanks for this video and additional information. We’ve flagged our dashboard team to take a deeper look.

I’m currently not able to reproduce it for the given objectID (81372771), but should this occur again for this item or any other, would you be able to use the getObject method or endpoint to see whether the object is retrieved? This would help us understand whether it is a display issue or something deeper.
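A minimal sketch of that check, assuming the v2 Python client, where `get_object` and `search` are methods on the index object and `search` returns a dict with a `hits` list. The helper name `diagnose_record` and its `query` argument are illustrative and not part of the client.

```python
from typing import Any, Dict


def diagnose_record(index: Any, object_id: str, query: str) -> Dict[str, bool]:
    """Report whether a record is reachable by direct fetch and by search."""
    try:
        index.get_object(object_id)
        found_by_id = True
    except Exception:
        # Assumption: the client raises when the record does not exist.
        # This broad except also hides network errors; narrow it in real code.
        found_by_id = False

    # Run a text query and look for the same objectID among the hits.
    hits = index.search(query).get("hits", [])
    found_by_search = any(str(h.get("objectID")) == object_id for h in hits)
    return {"found_by_id": found_by_id, "found_by_search": found_by_search}
```

If the direct fetch succeeds while the dashboard shows nothing, that points to a display issue. If the direct fetch itself fails for a record that was never deleted, the problem is deeper than the dashboard.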

Regardless, we are investigating and will let you know what we find. Thanks for bringing this to our attention.

Thank you! Right now, requests seem to be working using either the Python client or the dashboard, which doesn’t really give us much information. I will definitely try submitting a request with the Python client if I notice a missing search result again.

Cheers,
Rebecca
