Searching content of source-code

I’m looking for best practices for building a search engine for code.
I’d like to replicate GitHub search, because of the low quota GitHub gives for searching the content of repositories.
What should I consider when indexing source code into Algolia? Is there anything that would prevent me from indexing source code successfully?

Algolia Search can be a tricky tool to use for code search. Keep in mind that our search is optimized for prefix-matching keyword search. We don’t have a tokenizer for code or any sort of semantic graph behind the results. You can disable some of the prefix-matching features, but at a cost in speed and relevance.

You could simulate some of this by pre-tokenizing your code and building more complex data structures in your index records, but the results aren’t going to be as good as what GitHub can offer.
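
To make the pre-tokenizing idea a bit more concrete, here is a minimal sketch in Python. It splits a source file into fixed-size chunks and stores the identifiers and their sub-words (camelCase and snake_case parts) as separate fields, so a plain keyword query such as "http" can match `parseHttpRequest`. The record shape and the field names (`identifiers`, `subwords`, `start_line`, and so on) are assumptions made up for illustration, not a schema Algolia prescribes, and the chunk size is arbitrary.

```python
import re

# Matches identifiers such as parseHttpRequest or parse_http_request.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Matches acronyms (HTTP), capitalized or lowercase words, and digit runs.
_SUBWORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(identifier):
    """Return the lowercase sub-words of a camelCase or snake_case identifier."""
    return [part.lower() for part in _SUBWORD.findall(identifier)]


def build_records(path, source, lines_per_record=20):
    """Chunk a source file into records with pre-tokenized identifier fields.

    All field names here are hypothetical; adapt them to your own index.
    """
    lines = source.splitlines()
    records = []
    for start in range(0, len(lines), lines_per_record):
        text = "\n".join(lines[start:start + lines_per_record])
        identifiers = sorted(set(_IDENTIFIER.findall(text)))
        subwords = sorted(
            {word for ident in identifiers for word in split_identifier(ident)}
        )
        records.append({
            "objectID": f"{path}#L{start + 1}",  # unique per chunk
            "path": path,
            "start_line": start + 1,
            "code": text,                # raw chunk, useful for display
            "identifiers": identifiers,  # whole names, for exact matches
            "subwords": subwords,        # name parts, for keyword matches
        })
    return records
```

This sketch treats language keywords like `def` or `return` as identifiers too; a real pipeline would probably filter those out or use a proper parser for each language.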

Algolia is adding more graph-based, semantic search options as a direct result of our acquisition of and I suspect this is going to open much more interesting options for source code search, but those won’t be available until next year.

Thanks Chuck!
A follow-up question while I have you here :) How did you guys implement this behavior on your docs site, where you display a code snippet in the page preview? I see it’s not searchable, but is it part of the metadata you have on every page?

I want to build a very similar search experience on my docs site, where I show the first paragraph of the page and any code snippet that may be relevant. I also tried to extract the surrounding paragraphs for every entry the crawler extracts, to show the user more context for a search result, but the crawler logic became really messy.
Can you share any hints on how you implemented such an immersive experience?

Yeah, our doc search is pretty sweet. Sarah Dayan actually wrote a blog post about it that covers the implementation in broad strokes.

Check out the section on “Showing detailed information in a preview panel” – it doesn’t explain the specifics, but should give you a blueprint to building your own preview panel.
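
As a rough illustration of the metadata approach asked about above (not a description of how our docs site actually does it), the sketch below builds one record per paragraph and attaches page-level preview data plus the neighboring paragraphs to each record. The input shape (`blocks` as a list of `(kind, text)` tuples) and every field name are assumptions for the example. `searchableAttributes` is a real index setting; listing only `title` and `content` there should keep the preview fields out of matching while they still come back with each hit.

```python
# Index settings: only these attributes are searched; everything else
# on the record is returned for display but not matched against.
SETTINGS = {"searchableAttributes": ["title", "content"]}


def build_doc_records(url, title, blocks):
    """Turn one docs page into records that carry preview metadata.

    `blocks` is an ordered list of (kind, text) tuples, where kind is
    "paragraph" or "code". This input shape and all field names are
    hypothetical.
    """
    paragraphs = [text for kind, text in blocks if kind == "paragraph"]
    snippets = [text for kind, text in blocks if kind == "code"]

    # Page-level preview, computed once and copied onto every record.
    preview = {
        "first_paragraph": paragraphs[0] if paragraphs else None,
        "code_snippet": snippets[0] if snippets else None,
    }

    records = []
    for i, text in enumerate(paragraphs):
        records.append({
            "objectID": f"{url}#p{i}",
            "url": url,
            "title": title,
            "content": text,
            "preview": preview,
            # Neighboring paragraphs give the result a bigger context.
            "context": {
                "previous": paragraphs[i - 1] if i > 0 else None,
                "next": paragraphs[i + 1] if i + 1 < len(paragraphs) else None,
            },
        })
    return records
```

Computing the context from the page's ordered paragraph list after crawling, rather than inside the extraction step, may keep the crawler logic simpler. The trade-off is larger records, since the preview is duplicated on every record of the page.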