Indexing 1000s of product variations

I have a holiday bookings site with 1000s of rental properties. Each property may or may not be available on any given day (up to ~18 months into the future), and each day can have a different nightly price.

I am currently using Elasticsearch with Nested objects to represent pre-calculated available date/duration/price objects (e.g. Jan 1st, N nights, €X000).

If a property is available for the entire period, this could mean 10,000s of possible date/duration combinations per property.

In reality, we restrict the duration to up to 30 days, and not all properties are available for the full ~500 days. This still results in ~5000 date/duration combinations to index.

Along with the date, duration and price, I am indexing useful bits of information to aid searching, such as name, short description, images, location names, geo locations, etc.

Should I be indexing all this information alongside every single date/duration? Or is there a way to use parent/child relationships somehow? Or am I worrying about nothing, since we’re talking about tens of millions of records to search, and this is something that can be handled “fairly simply” using distinct and so on?

Hello Guy,

Thanks for the detailed breakdown of information and sorry for the late answer!

Your approach is good. The difference with Algolia is that instead of storing nested objects with the date/duration/price information, you would have separate records. By doing that you will be able to filter the results by date using numeric filters (just make sure to store the date as a timestamp!)
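As a rough illustration of the numeric date filter mentioned above, here is a minimal Python sketch. It assumes each record carries a numeric `date` attribute holding a Unix timestamp at midnight UTC, and that the resulting string is passed as the `filters` search parameter; the helper names `to_timestamp` and `date_range_filter` are made up for this example.

```python
from datetime import date, datetime, timezone


def to_timestamp(day: date) -> int:
    """Return the Unix timestamp (seconds) for midnight UTC on the given day."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(midnight.timestamp())


def date_range_filter(first_night: date, last_night: date) -> str:
    """Build a filter string matching records whose `date` lies in the inclusive range."""
    return f"date >= {to_timestamp(first_night)} AND date <= {to_timestamp(last_night)}"


# Filter for the nights of 29 September to 6 October 2017.
week_filter = date_range_filter(date(2017, 9, 29), date(2017, 10, 6))
print(week_filter)
```

Storing every night at the same time of day (midnight UTC here) keeps the comparison predictable; that convention is an assumption of this sketch, not a requirement.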

A single record might look something like this:

{
    "name": "Countryside cottage",
    "desc": "...",
    "img": "...",
    "price": 320,
    "date": 1506700441 // unix timestamp here
}

With the above record, you would have around 550 records in total per property since there is one for each day. You could also have:

{
    "name": "Countryside cottage",
    "desc": "...",
    "img": "...",
    "price": 320,
    "start_date": 1506700441, // unix timestamp here
    "end_date": 1506702454 // unix timestamp here
}

Just as is the case today, you would only index locations that are available for a given night.
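The one-record-per-night layout could be generated along these lines. This is only a sketch: `property_id`, the `nightly_records` helper and the sample prices are hypothetical, and `objectID` is used as the unique record identifier. Nights on which the property is unavailable are simply left out.

```python
from datetime import date, datetime, timezone


def nightly_records(property_info: dict, prices_by_night: dict) -> list:
    """Create one flat record per available night, copying the property context into each."""
    records = []
    for night, price in sorted(prices_by_night.items()):
        midnight = datetime(night.year, night.month, night.day, tzinfo=timezone.utc)
        record = dict(property_info)  # name, desc, img, ... repeated on every record
        # Hypothetical ID scheme: property id plus ISO date keeps each record unique.
        record["objectID"] = f"{property_info['property_id']}-{night.isoformat()}"
        record["price"] = price
        record["date"] = int(midnight.timestamp())
        records.append(record)
    return records


cottage = {"property_id": "cottage-1", "name": "Countryside cottage", "desc": "...", "img": "..."}
prices = {date(2017, 9, 29): 320, date(2017, 9, 30): 350}  # unavailable nights are absent
records = nightly_records(cottage, prices)
print(len(records), records[0]["objectID"])
```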

In your case, it’s probably easier to index all the context information alongside each record, especially since the overhead isn’t too big. As a side note, another possibility is to have a second index that only stores information on each location, without price or date, and to query it each time a search is performed on the booking index. You could then merge the results on the frontend. It adds complexity, and the benefits would only show if you had much bigger records.
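For the two-index alternative, the merge step might look roughly like the sketch below. It is written in Python for brevity, although in practice this logic would live in the frontend. The `property_id` join key, the `merge_hits` helper and the sample data are all assumptions made for illustration.

```python
def merge_hits(booking_hits: list, properties_by_id: dict) -> list:
    """Attach property details to each slim booking hit."""
    merged = []
    for hit in booking_hits:
        details = properties_by_id.get(hit["property_id"], {})
        merged.append({**details, **hit})  # booking fields win on key clashes
    return merged


# Slim booking records: only the join key, price and date.
booking_hits = [
    {"property_id": "cottage-1", "price": 320, "date": 1506643200},
    {"property_id": "cottage-1", "price": 350, "date": 1506729600},
]
# Property records fetched from the second index, keyed by the join key.
properties_by_id = {"cottage-1": {"name": "Countryside cottage", "desc": "...", "img": "..."}}

merged = merge_hits(booking_hits, properties_by_id)
print(merged[0]["name"], merged[0]["price"])
```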

By the way, storing millions of records on Algolia is no problem! And yes, you can also use the distinct feature if you only want to retrieve distinct rental property names.
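To make the distinct idea concrete, here is a small sketch. The settings dictionary assumes de-duplication on the `name` attribute through the `attributeForDistinct` and `distinct` settings. The `emulate_distinct` function is not part of Algolia; it only imitates the effect locally, keeping the best-ranked hit per property, so that the behaviour is easy to see.

```python
# Assumed index settings: de-duplicate results on the property name.
distinct_settings = {"attributeForDistinct": "name", "distinct": True}


def emulate_distinct(hits: list, attribute: str) -> list:
    """Keep only the first (best-ranked) hit for each value of `attribute`."""
    seen = set()
    unique = []
    for hit in hits:
        key = hit[attribute]
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique


hits = [
    {"name": "Countryside cottage", "price": 320, "date": 1506643200},
    {"name": "Countryside cottage", "price": 350, "date": 1506729600},
    {"name": "Seaside flat", "price": 210, "date": 1506643200},
]
unique_hits = emulate_distinct(hits, distinct_settings["attributeForDistinct"])
print([hit["name"] for hit in unique_hits])
```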

I hope that helps. Don’t hesitate to ask if you have further questions :)