Tutorial: Indexing PDF Or Other File Contents For Searching

I wrote a step-by-step tutorial on using Tika to split a PDF into paragraphs, parsing the resulting HTML with Nokogiri, and indexing the paragraphs in Algolia:

https://medium.com/@obahareth/indexing-pdf-or-other-file-contents-for-searching-b2499c23568f

Thanks for posting @omar! There’s a lot of potential for tackling different content types with Tika. If anyone has experience using it, please chime in.