I am indexing transcripts of interviews, and they are very long text documents. As recommended in your docs, I’ve broken each transcript up into smaller chunks for the index. Each interview has an ID, so chunks from the same interview all have the same value for the “interviewID” field.
Each interview, which is in a sense the “parent” of its transcript chunks, has data for the categories, participants, location, date, etc. of the interview.
Say I am faceting on the city of the interview, and I have 1 interview from Brooklyn. Since the Brooklyn transcript was broken into 344 chunks, the city facet shows the number 344 next to Brooklyn. This is highly misleading, as there is only 1 Brooklyn interview.
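To make the problem concrete, here is a minimal sketch of the inflated count versus the count I actually want. The record shape and the "city" field name are just illustrations of my setup (only "interviewID" is a real field name from my index):

```python
# One interview split into many chunk records; every chunk repeats the
# parent interview's metadata, so facet counts tally chunks, not interviews.
# Field names other than "interviewID" are hypothetical.
chunks = (
    [{"interviewID": "bk-001", "city": "Brooklyn"}] * 344  # 1 interview, 344 chunks
    + [{"interviewID": "qn-002", "city": "Queens"}] * 120   # 1 interview, 120 chunks
)

# What the facet currently does: count chunk records per city.
naive = {}
for c in chunks:
    naive[c["city"]] = naive.get(c["city"], 0) + 1
print(naive)  # {'Brooklyn': 344, 'Queens': 120}

# What I want it to show: distinct interviewIDs per city.
ids_by_city = {}
for c in chunks:
    ids_by_city.setdefault(c["city"], set()).add(c["interviewID"])
distinct = {city: len(ids) for city, ids in ids_by_city.items()}
print(distinct)  # {'Brooklyn': 1, 'Queens': 1}
```

So I need the facet to behave like the second count, deduplicating on interviewID, while still searching at the chunk level.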
I am puzzled over how best to organize the data. If I did not chunk the transcripts this would not be an issue, but they are very long. Thanks for any ideas!