What's the recommended approach to recreating DocSearch records?

I’m using DocSearch and the crawler, but I’m interested in upgrading my plan to white-label the widget and take advantage of some better features. I’ve read a lot of the documentation, and it’s primarily focused on ecommerce and media sites (not technical documentation sites), so I’m trying to bridge the gap between those examples and how DocSearch works.

I’m trying to understand the best approach to generating the index myself, but it’s not clear to me whether it’s feasible to recreate what the crawler is doing with my static site.
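By “generating the index myself” I mean building the records during my static site build and pushing them with the JavaScript API client, roughly like this (the app ID, API key, and index name are placeholders, not anything DocSearch prescribes):

```ts
import algoliasearch from "algoliasearch";

// Placeholders: my real application ID, admin API key, and index name.
const client = algoliasearch("MY_APP_ID", "MY_ADMIN_API_KEY");
const index = client.initIndex("docs");

// saveObjects adds or replaces records in the index, keyed by objectID.
async function pushRecords(records: Array<Record<string, any>>): Promise<void> {
  await index.saveObjects(records);
}
```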

The crawler scrapes my site and creates a record for every URL path, every h2/h3/h4/h5 on the page, and every paragraph on the page. Here’s an example of a record for a paragraph of content:

```json
{
  "version": "",
  "tags": [],
  "url": "https://example.com/path1/path2/path3/#heading",
  "url_without_variables": "https://example.com/path1/path2/path3/#heading",
  "url_without_anchor": "https://example.com/path1/path2/path3/",
  "anchor": "heading",
  "content": "Lorem ipseum blah blah...",
  "content_camel": "Lorem ipseum blah blah...",
  "lang": "en-us",
  "language": "en-us",
  "type": "content",
  "no_variables": false,
  "weight": {
    "pageRank": "0",
    "level": 0,
    "position": 71
  },
  "hierarchy": {
    "lvl0": "path2",
    "lvl1": "path3",
    "lvl2": "heading",
    "lvl3": null,
    "lvl4": null,
    "lvl5": null,
    "lvl6": null
  },
  "recordVersion": "v2",
  "hierarchy_radio": {
    "lvl0": null,
    "lvl1": null,
    "lvl2": null,
    "lvl3": null,
    "lvl4": null,
    "lvl5": null,
    "lvl6": null
  },
  "hierarchy_camel": [
    {
      "lvl0": "path2",
      "lvl1": "path3",
      "lvl2": "heading",
      "lvl3": null,
      "lvl4": null,
      "lvl5": null,
      "lvl6": null
    }
  ],
  "hierarchy_radio_camel": {
    "lvl0": null,
    "lvl1": null,
    "lvl2": null,
    "lvl3": null,
    "lvl4": null,
    "lvl5": null,
    "lvl6": null
  },
  "objectID": "71-https://example.com/path1/path2/path3/"
}
```
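To make question 1.i (below) concrete, here’s a sketch of what I imagine I’d have to run over each rendered page to reproduce these records myself. Cheerio is just my choice of HTML parser, and everything other than the field names copied from the record above is guesswork on my part:

```ts
import * as cheerio from "cheerio";

// Walk a rendered page and emit one record per paragraph, tracking the
// heading trail the way the crawler appears to. (The crawler also emits
// a record per h2-h5 heading; omitted here for brevity.)
function recordsForPage(html: string, pageUrl: string, lvl0: string, lvl1: string) {
  const $ = cheerio.load(html);
  const hierarchy: Record<string, string | null> = {
    lvl0, lvl1, lvl2: null, lvl3: null, lvl4: null, lvl5: null, lvl6: null,
  };
  const records: Array<Record<string, any>> = [];
  let anchor = "";
  let position = 0;

  $("h2, h3, h4, h5, p").each((_, el) => {
    const text = $(el).text().trim();
    if (!text) return;
    const tag = el.tagName.toLowerCase();

    if (tag === "p") {
      records.push({
        objectID: `${position}-${pageUrl}`, // the fragile part (question 1.ii)
        url: anchor ? `${pageUrl}#${anchor}` : pageUrl,
        url_without_anchor: pageUrl,
        anchor,
        type: "content",
        content: text,
        hierarchy: { ...hierarchy },
        weight: { pageRank: "0", level: 0, position },
      });
    } else {
      // h2 -> lvl2, h3 -> lvl3, ...; reset any deeper levels.
      const level = Number(tag[1]);
      hierarchy[`lvl${level}`] = text;
      for (let deeper = level + 1; deeper <= 6; deeper++) {
        hierarchy[`lvl${deeper}`] = null;
      }
      anchor = $(el).attr("id") ?? "";
    }
    position++;
  });

  return records;
}
```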

In contrast to the crawler’s records, here’s a sample record from the Algolia docs:

```json
[
  {
    "objectID": 42,
    "title": "Breaking Bad",
    "episodes": [
      "Crazy Handful of Nothin'",
      "Gray Matter"
    ],
    "like_count": 978,
    "avg_rating": 1.23456,
    "air_date": 1356846157,
    "featured": true,
    "lead_role": {
      "name": "Walter White",
      "portrayed_by": "Bryan Cranston"
    },
    "_tags": ["tv series", "drugs"]
  }
]
```

My questions:

  1. To generate an index for a content site, is it recommended to create a record for every paragraph of content on every page?
    i. How would this be done with a static site generator? The Algolia Crawler seems particularly well suited to this kind of scraping, whereas these per-paragraph records seem especially difficult to generate myself.
    ii. The objectIDs of these records seem problematic because each is simply an incrementing number prepended to the URL, which means adding a paragraph to a page produces all-new records. For example, if a paragraph were added higher on the page than my example above ("position": 71), every record below it would be renumbered (and reindexed?) to "position": 72 and so on. What is the impact of this? Does it affect search results, or only the count of “search operations”? (I sketch a possible workaround after this list.)
  2. What is the purpose of all the _camel and _radio variations of the hierarchy and content attributes?
    • hierarchy_radio_camel
    • hierarchy_camel
    • hierarchy_radio
    • content_camel
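For 1.ii, the workaround I’ve been considering (my own idea, not anything from the DocSearch docs) is deriving the objectID from the record’s contents rather than its position on the page:

```ts
import { createHash } from "node:crypto";

// Derive the objectID from stable record contents instead of the record's
// position, so unrelated records keep their IDs when a paragraph is
// inserted elsewhere on the page across rebuilds.
function stableObjectID(url: string, anchor: string, content: string): string {
  return createHash("sha1")
    .update(`${url}#${anchor}\n${content}`)
    .digest("hex")
    .slice(0, 16);
}
```

Whether that plays nicely with how DocSearch deduplicates and ranks records is part of what I’m asking.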

Perhaps I’m thinking about this all wrong? Was DocSearch built as a stand-alone product and never intended to be “migrated” or upgraded to the regular Algolia search product? Any guidance would be appreciated.