How do you index static content?

Hi folks,

We’re currently looking into Algolia to replace Google Search Engine and so far everything is going well for the records in our application’s databases. The idea is to create a multi-site search like our existing implementation with Google.

However, there is one question I haven’t been able to find an answer to: how to index static content.

One of our apps, our marketing website, has a bunch of hardcoded content that changes via a commit -> deploy workflow. That means that deploying would need to trigger a re-index. That is the only “change event” we have to index new content, or re-index existing content.

How can we best achieve such a thing? Has this been done before?

Hey Hannes,

Not sure if this will help, but it could be a good place to start: our Jekyll plugin allows static sites that are powered by the static site generator Jekyll to index/sync to Algolia for powerful search.

If you’re using Jekyll for your site, then “yay!” - if not, it might be a good start to figuring out how to integrate Algolia with your own site.

Let me know if that helps,

-Liam

Hi Liam,

While I haven’t dug into the code of that plugin yet, I’m not sure if it’ll be of much help. Jekyll posts/pages are quite simple in structure - a single file often means a single page, with all content in one place, and in a simple format - markdown. It’s easy to translate that to something you can index.

Our site has a less consistent structure, with different partials making up some pages. Even the structure within a page can vary - for instance, we might use anything from an h2 to an h5 for different purposes.

We’re dealing with a Rails app with some database-backed content, but also a lot of hardcoded pages with more complex markup than your average blog post for example.

Can you think of any other ways to achieve this?

Hey,

Given what you’re looking at, it sounds like a crawler is going to be the best route - for that, Algolia DocSearch is the best solution we have available. It sounds like you’re already talking with our DocSearch team - based on the scope and complexity of what you’re looking to implement, we should be able to provide you the best guidance on whether it is possible with Algolia (I suspect yes, but possibility and financial feasibility are two different things) as well as how we would approach it.


Hello Hannes and Liam,

I just wanted to clarify that our Jekyll plugin only cares about the generated HTML; it doesn’t parse the Markdown or your layouts at all. Each time you change your content, you can run bundle exec jekyll algolia push after the build step, and the content will be extracted from your generated HTML pages (according to your configuration and our defaults) and pushed to your Algolia index.
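To give a rough idea of what "working from the generated HTML" can look like for a site that isn't built with Jekyll, here is a minimal Python sketch using only the standard library. It is not the plugin's actual implementation: the names RecordExtractor and extract_records and the record fields (url, headings, content) are made up for illustration. It creates one record per paragraph and attaches whichever headings (h1 to h6) currently sit above it, so it doesn't matter whether a page uses an h2 or an h5. Pushing the resulting records to an index is left out on purpose.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class RecordExtractor(HTMLParser):
    """Collect one record per <p>, tagged with the headings above it."""

    def __init__(self, url):
        super().__init__()
        self.url = url
        self.hierarchy = {}  # heading tag -> text of the latest heading at that level
        self.records = []
        self._tag = None     # tag currently being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS or tag == "p":
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing tag of an inline element such as <b> or <a>
        text = " ".join("".join(self._buffer).split())
        self._tag = None
        if not text:
            return
        if tag in HEADINGS:
            # A new heading invalidates any deeper headings seen so far.
            level = HEADINGS.index(tag)
            for deeper in HEADINGS[level + 1:]:
                self.hierarchy.pop(deeper, None)
            self.hierarchy[tag] = text
        else:
            self.records.append({
                "url": self.url,
                "headings": dict(self.hierarchy),
                "content": text,
            })


def extract_records(html, url):
    """Return a list of search records for one rendered HTML page."""
    parser = RecordExtractor(url)
    parser.feed(html)
    parser.close()
    return parser.records
```

In a commit -> deploy setup, something like this could run over the rendered pages as a post-deploy step, with the records then sent to the index by whatever client you use.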

I hope it helps!
