Algolia Community

SEO-friendly instantsearch with infinite scrolling


#1

Has anyone successfully implemented an SEO-friendly algolia-driven search page utilising instantsearch.js and infinite scrolling?

When I say SEO-friendly, I mean as close to Google recommendations for infinite scrolling pages as possible. Google recommendations are here: https://webmasters.googleblog.com/2014/02/infinite-scroll-search-friendly.html

The Algolia demo site for instantsearch.js/infinite scroll does not seem very SEO friendly. Here is the demo site: https://community.algolia.com/demo-infinite-scroll/

Specifically, it doesn’t do the following:

  • It doesn’t load any product-specific content in the base page markup - initial products are loaded into the DOM on load via Javascript.
  • It uses hash-based url parameters rather than query strings. Hash-based is OK by Google but is considered “less optimal” than query strings.
  • Even better would be user-friendly URLs for faceted results like /search/brand/Sony but I think this would be down to individual implementations and URL-rewrites.
  • Hash-based URLs are not in the href of any tags on the page, so there is no chance Google will reach these anyway.
  • There is no change in the URL to update which (virtual) page you are on each time the infinite scroll loads more products.
  • Whilst you can manually update the p parameter in the query string to begin the products at a specific page, infinite scrolling only then works in a downward direction, not an upward direction, so it’s not possible to get to products on previous pages.
  • There is no pagination in the <head> using rel=next and rel=prev

In a nutshell, the page should operate very much like the demo recommended by Google at http://scrollsample.appspot.com/items

Getting this all working seems like a big challenge so I’d prefer not to re-invent the wheel. Any resources or links to successful implementations would be appreciated!

Cheers!


#2

Thanks for sharing that article, there are indeed some good takeaways.
As you’ve spotted our implementation of the infinite scroll on a result page, as it is today, is not optimised for SEO. It would require a few tweaks to make it Google-friendly.

It doesn’t load any product-specific content in the base page markup - initial products are loaded into the DOM on load via Javascript.

That’s right, all the content is loaded dynamically via javascript once the DOM is ready. That being said, we’ve recently noticed that Google now indexes pages with content loaded asynchronously, like on that implementation searchstone.io (see indexed pages site:searchstone.io)

It uses hash-based url parameters rather than query strings. Hash-based is OK by Google but is considered “less optimal” than query strings.

That’s something that InstantSearch.js lets you configure easily. For now, looking at the InfiniteScroll implementation code, I can see that we passed useHash: true (probably for extended compatibility with IE9), but you don’t have to. By default InstantSearch.js uses the History API and doesn’t use a hash in the URL.

Even better would be user-friendly URLs for faceted results like /search/brand/Sony but I think this would be down to individual implementations and URL-rewrites.

That’s something you could add to your implementation by creating an InstantSearch.js CustomWidget

Hash-based URLs are not in the href of any tags on the page, so there is no chance Google will reach these anyway.

Right, that’s something we’d need to investigate for future evolutions of InstantSearch.js, and SEO friendliness. Another way to get robots crawling those pages is to feed them with sitemaps composed of urls pointing to results pages for a given: brand, brand + type, category/sub-category. That’s what has been done on searchstone.io, and it worked well.

There is no change in the URL to update which (virtual) page you are on each time the infinite scroll loads more products.

Good point, I’ve created a github issue on the project repo, for future addition. Feel free to contribute.

Whilst you can manually update the p parameter in the query string to begin the products at a specific page, infinite scrolling only then works in a downward direction, not an upward direction, so it’s not possible to get to products on previous pages.

Good point, tracked in that github issue

There is no pagination in the <head> using rel=next and rel=prev

That’s probably something you could add to your InfiniteScroll implementation, using the InstantSearch.js pagination widget (I haven’t tested it in that use case)

@basilisab Have you already started any implementation on your side?


#3

Hi Alex,

Thanks for all the responses. I’ve started an implementation but it’s currently working as per your demo sites. Overall it’s working great from a usability perspective - it’s just the SEO that I’m concerned about.

That being said, we’ve recently noticed that Google now indexes pages with content loaded asynchronously, like on that implementation searchstone.io (see indexed pages site:searchstone.io)

I don’t have a lot of confidence in this at the moment. I’m sure Google are working on improving their engine in this regard but I’m not going to rely on it especially given all the recommendations.

What are your thoughts on using PHP to load the initial 20 products as part of the initial page markup, while all searches/filters after that are loaded by AJAX (instantsearch)? Is this something that instantsearch and your infinitescroll demo code could support?

Hash-based URLs are not in the href of any tags on the page, so there is no chance Google will reach these anyway.

Right, that’s something we’d need to investigate for future evolutions of InstantSearch.js, and SEO friendliness.

I think that as long as you had the other points addressed, this one should be relatively easy. The refinement checkbox labels for ‘brand’ for example could actually be an a tag with an href to ?brand=x or /brand/x. You would disable default behavior on the a tag so that clicking it just refines the results rather than following the href. Only thing is you would need to load the refinement list in the html body as well (similar to PHP approach above). This would have the added benefit of allowing a user to right-click the refinement label and “open in new tab”.

Would be great to hear your thoughts.


#4

To give an update on where I’m at so far…

It doesn’t load any product-specific content in the base page markup - initial products are loaded into the DOM on load via Javascript.

I am now conducting the search and presenting initial results via PHP. I’ve had to put together code to build up the filters and page number based on the query string and URL rewrites (more later).

I now have product markup in the base HTML of the page, making it Google-friendly.

I’m still using instantsearch to populate the filtering options on the page - a couple of sliders and a couple of refinement lists. Once the page is loaded, all product filtering, search and infinitescroll is handled by instantsearch.

The downside of this is that I’m now conducting the same search twice - once in PHP and then once via instantsearch. This also means I have to manage two sets of code that do basically the same thing. Not ideal but I’m not sure how else to manage it.

It uses hash-based url parameters rather than query strings. Hash-based is OK by Google but is considered “less optimal” than query strings.

Switched this to query strings by changing the useHash setting to false.

Even better would be user-friendly URLs for faceted results like /search/brand/Sony but I think this would be down to individual implementations and URL-rewrites.

I have implemented URL rewrites such that:

/brand/Sony -> ?brand=Sony
/brand/Sony/2 -> ?brand=Sony&page=2

From there I build up filters in PHP and apply them to both the PHP search and the instantsearch (by outputting Javascript from PHP). I’ve applied the page number to instantsearch via {searchParameters: {page: x}} which seems to be an undocumented feature.

Brand is really the only facet I have implemented into URL rewrites as it’s the only facet I see SEO value in.
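
The rewrite rules above can be sketched in JavaScript as follows. This is only an illustration, not the PHP code used on the site: the function names, the placeholder credentials and the index name are hypothetical, and nesting useHash under urlSync is an assumption about the InstantSearch.js version in use. The thread does not say whether URL page numbers are zero- or one-based, so the sketch passes the number through unchanged and treats a missing page as 0.

```javascript
// Parse a rewritten path such as "/brand/Sony/2" into { brand, page }.
// Returns null for paths that do not match the scheme.
function parseSearchPath(path) {
  const segments = path.split('/').filter(Boolean).map(decodeURIComponent);
  const route = { brand: null, page: 0 };
  let i = 0;
  if (segments[i] === 'brand' && segments.length > i + 1) {
    route.brand = segments[i + 1];
    i += 2;
  }
  if (i < segments.length && /^\d+$/.test(segments[i])) {
    route.page = parseInt(segments[i], 10);
    i += 1;
  }
  // Any leftover segment means the path is not part of the scheme.
  return i === segments.length ? route : null;
}

// Build the query string the rewrite maps to, e.g. "?brand=Sony&page=2".
function toQueryString(route) {
  const params = new URLSearchParams();
  if (route.brand !== null) params.append('brand', route.brand);
  if (route.page > 0) params.append('page', String(route.page));
  const query = params.toString();
  return query ? '?' + query : '';
}

// Options object for the client side (assumed shape, placeholder values).
function buildSearchOptions(route) {
  return {
    appId: 'YOUR_APP_ID',
    apiKey: 'YOUR_SEARCH_ONLY_API_KEY',
    indexName: 'products',
    urlSync: { useHash: false },              // query strings instead of the hash
    searchParameters: { page: route.page },   // starting page, as described above
  };
}

// In the browser, with the library loaded:
// const search = instantsearch(buildSearchOptions(parseSearchPath(location.pathname)));
```

Keeping the parsing in one small function makes it easier to keep the server-side and client-side interpretations of a URL in agreement.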

Hash-based URLs are not in the href of any tags on the page, so there is no chance Google will reach these anyway.

Within PHP I have output a list of all brands in the place where my instantsearch brand refinement list loads. The output is something like: <label for="Sony"><input id="Sony" type="checkbox" value="Sony" /> <a href="/brand/Sony">Sony</a></label>. I can’t use these to drive instantsearch at the moment, but at least they are present in the base html of the page so that Google can definitely see them. This section is replaced by the instantsearch refinement list upon load. I may look into getting them to work with instantsearch so that I don’t have to replace it upon load - I’m not sure if this is possible though.

I have updated options.templates.item in my brand refinement list so that the markup becomes similar to above - an a tag within the label.

What I’ve discovered is that it isn’t possible to have a regular a tag within a label act like a label. Clicking the a tag still goes to the href. You can disable the default a behaviour in Javascript, which stops the click from visiting the href, but still doesn’t make it act like a label. The best way I could find to do this is via CSS, by applying pointer-events: none; to all the relevant a tags. I’m not sure how Google will treat this but hope the fact that there are a tags with an href attribute means that Google will crawl them.

A minor downside to disabling the a tags via CSS is that users can’t right click the a tag and open in new window/tab in order to get to /brand/Sony as I had hoped. I can live with this. I will likely also put a brand listing somewhere else on the site or in a sitemap.
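
As a rough sketch of the markup described above, the function below builds one brand item with a crawlable a tag inside the label. The function names, the brand-link class and the brand- id prefix are made up for the example; the same string could be produced in PHP for the initial markup or adapted into the refinement list item template.

```javascript
// Escape text for use in HTML content and double-quoted attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render a checkbox label whose text is a real link to /brand/<name>.
function brandItemHtml(brand) {
  const slug = encodeURIComponent(brand);
  const id = 'brand-' + slug;
  return (
    '<label for="' + id + '">' +
    '<input id="' + id + '" type="checkbox" value="' + escapeHtml(brand) + '" /> ' +
    '<a class="brand-link" href="/brand/' + slug + '">' + escapeHtml(brand) + '</a>' +
    '</label>'
  );
}

// CSS that lets clicks fall through the link to the label, as described above.
const brandLinkCss = '.brand-link { pointer-events: none; }';
```

The href stays in the markup for crawlers, while the CSS rule stops the link from intercepting clicks meant for the checkbox.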

There is no change in the URL to update which (virtual) page you are on each time the infinite scroll loads more products.

Not implemented yet - hopefully the github issue results in an update here. It’s probably not that difficult but I think implementing it may do more harm than good if the next point is not also implemented (see below).

Whilst you can manually update the p parameter in the query string to begin the products at a specific page, infinite scrolling only then works in a downward direction, not an upward direction, so it’s not possible to get to products on previous pages.

Not implemented yet - hopefully the github issue results in an update here. I could attempt it but to be honest I get a bit lost when trying to understand how the infinitescroll code works.

If the p parameter is updated in the URL as you scroll, but upward infinitescroll is not implemented, you will run into issues when the user scrolls down and then clicks the refresh button on the browser. On the refreshed page, the products will begin as at the point the user had scrolled to before refreshing, with no way to get to the previous products.

It’s also important because if users find my site via a Google result linking to /brand/Sony/2 (i.e. a page number > 0 is specified), they won’t be able to get to any of the previous products either.
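
To make the two-directional requirement concrete, here is a minimal sketch of the bookkeeping such a scroller would need. It is not how the infinitescroll demo works; createPageWindow and its methods are hypothetical names, pages are assumed to be zero-based, and the actual fetching and DOM insertion are left out.

```javascript
// Track which contiguous range of result pages is currently in the DOM
// when the visitor lands on an arbitrary page (e.g. /brand/Sony/2).
function createPageWindow(startPage, lastPage) {
  let first = startPage;
  let last = startPage;
  return {
    // Page to fetch when the user reaches the bottom, or null at the end.
    nextPage() {
      return last < lastPage ? last + 1 : null;
    },
    // Page to fetch when the user reaches the top, or null at page 0.
    previousPage() {
      return first > 0 ? first - 1 : null;
    },
    // Record that a page has been appended or prepended.
    markLoaded(page) {
      if (page === last + 1) last = page;
      else if (page === first - 1) first = page;
      else if (page < first || page > last) {
        throw new RangeError('page ' + page + ' is not adjacent to the loaded range');
      }
    },
    loadedRange() {
      return [first, last];
    },
  };
}
```

With this in place, a refresh at page 2 would start with the range [2, 2] and scrolling up would request page 1 and then page 0, which is the behaviour missing from the current demo.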

There is no pagination in the <head> using rel=next and rel=prev

This is now output via PHP. It adds /[next page number] & /[prev page number] to the existing path (e.g. /2 or /brand/Sony/2). I.e. it points to URLs that are handled by the URL re-writes as described above.
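
A sketch of that output, written in JavaScript rather than PHP for illustration. The paginationLinks name is hypothetical, pages are assumed to be zero-based with page 0 living at the bare path, and basePath is assumed to be URL-safe already.

```javascript
// Build the <link rel="prev"> / <link rel="next"> tags for a results page.
// basePath is the path without a page number, e.g. "/" or "/brand/Sony".
function paginationLinks(basePath, page, lastPage) {
  const prefix = basePath.replace(/\/+$/, '');
  // Page 0 is served at the base path itself; other pages append "/<n>".
  const urlFor = (n) => (n === 0 ? prefix || '/' : prefix + '/' + n);
  const links = [];
  if (page > 0) {
    links.push('<link rel="prev" href="' + urlFor(page - 1) + '">');
  }
  if (page < lastPage) {
    links.push('<link rel="next" href="' + urlFor(page + 1) + '">');
  }
  return links;
}
```

For example, paginationLinks('/brand/Sony', 2, 4) yields a prev link to /brand/Sony/1 and a next link to /brand/Sony/3, both of which are handled by the URL rewrites.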

That’s about it. I would really appreciate the experts’ thoughts on this approach! Would you do anything differently? Can you recommend any solutions to the challenges I still face?

Cheers
Baz


#5

Are there any updates to this? In my opinion this is by far the biggest drawback of Instantsearch today. I was planning to use Algolia for multiple sites, because I love the UX and how simple it is to get started. But the sites I’m planning all depend on SEO for traffic, so the current implementation is simply not good enough. Do you have any plans to address these issues, and if so, when do you think they’ll be done?


#6

Would the same issues apply if you had just been using normal pagination, and not infinite scroll?


#7

Most issues would still apply even without infinite scroll. With standard instantsearch all result information is loaded via js and not as part of the original page source.

If you didn’t have infinite scroll, you would still need to do all the steps above, except for putting pagination in the <head> - however this is probably good practice anyway.

I believe I have overcome all SEO issues now, using the techniques described above. My site is yet to launch, so it remains to be seen how Google will interpret it.

As far as I can tell, the two github issues haven’t been resolved by the team as yet. These don’t really affect SEO, but they do affect user experience.

Thanks for posting, it’s good to see others showing interest in this. As you can see I wasn’t getting much out of the Algolia team in response to my last couple of posts :)


#8

Hello Baz,
Not sure if you are a developer or a store owner with TONS of coding knowledge, but we have implemented Algolia in our LIVE store and we are also concerned about the SEO impact, as we have seen a slight decline in traffic. This decline may or may not have anything to do with Algolia Search, but better safe than sorry, right? In this niche of ours, SEO is King and generic search results are all we can count on for traffic. If you are a developer, we would like to know if you would be interested in implementing the SEO fixes for our store. We are not using infinite scroll at the moment, and in our ignorance we figured this would eliminate any SEO pitfalls. But as I can see from reading your post, we weren’t even close. You will be paid for your services of course. :) Please let us know if this is something you would be interested in doing. Thanks. Quick question: Does having canonical URLs implemented site-wide alleviate any of the SEO woes of instantsearch?


#9

majesty1418 - I’ll send you a PM.


#10

If you are looking for the best SEO, we highly recommend submitting a sitemap to Google that you know has all of your content in it and then making sure you have proper server-side markup for those pages. That is more robust than making too many assumptions about Google’s crawler and whether it can find your links and handle your page transitions. We see good out-of-the-box SEO with sites like searchstone.io but every mostly JS site is subject to being handled differently until there are more standards.

It’s good to keep in mind that instantsearch.js (or the React version) is client-side only. There is no server component that could bake the markup into your page. What @baz did is the way to do that if you need to.

Here’s info about generating sitemaps for Google. The third-party tools make it pretty easy.


#11

Hello dzello, thanks for chiming in. I understand what you’re saying, but Algolia does affect how your site gets crawled. For instance, selecting “No” under the Algolia settings for template directives was blocking many CMS pages from being crawled and was generating Crawl Errors on Google Search Console. That has been since rectified, but we are new to Magento and we are slightly worried about how Algolia has affected our site.


#12

@majesty1418 Ah ok that’s good to know, thanks for sharing back.


#13

Hey everyone, thanks for all the feedback in this thread! Since InstantSearch.js is a client-side library, the crawling/indexing issues are similar to those of Single Page Apps or Progressive Web Apps. Here is a good resource that shows the dos and don’ts in terms of SEO. Official Google Webmaster Central Blog: Building Indexable Progressive Web Apps

It’s pretty much what you did @baz! The idea is to provide results from the first payload for content you want to be indexed, such as category pages. We are currently working on an easy way to achieve that using React-InstantSearch, and will be releasing an example soon. The idea is to avoid any duplication in terms of code or of requests to Algolia.

As it’s already been mentioned in this thread, sitemaps are crucial for SEO, especially with dynamic content. We worked on a tool that lets you generate a sitemap based on your Algolia index, you can find it there: GitHub - algolia/algolia-sitemap: a node library allowing you to generate sitemaps from an Algolia index.

Use this tool to generate sitemaps that contain:

  • Category pages, or all “search-engine landing pages”, any place in your website you would like your users to find from a search engine
  • All your products
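
For readers who want to see what such a sitemap contains, here is a small hand-rolled sketch. It does not use the algolia-sitemap library or reproduce its API; buildSitemap and brandPaths are hypothetical helpers, and the /brand/<name> paths follow the URL scheme described earlier in this thread.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Turn a list of brand names into landing-page paths.
function brandPaths(brands) {
  return brands.map((brand) => '/brand/' + encodeURIComponent(brand));
}

// Build a sitemap document for the given site origin and paths.
function buildSitemap(origin, paths) {
  const urls = paths.map(
    (path) => '  <url><loc>' + escapeXml(origin + path) + '</loc></url>'
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}
```

In practice the list of brands or products would come from the index itself, which is the part the tool mentioned above takes care of.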

#14

Hi Michael,

Thanks for the update. Looking forward to seeing the work you’re doing with React instantsearch.

If you haven’t seen my post on the other forum, my implementation can now be seen at http://allthedressses.com.au

Within 2-3 days of making the site public I was already getting about 100 visitors a day from Google. Mind you, the majority of landing pages were product detail pages, which don’t use instantsearch. Google has been slow to re-index, so any adjustments I’ve made since haven’t had a chance to make an impact.

By using the Fetch as Google function within Google Search Console, I can see that Googlebot is seeing the page post-instantsearch processing, at least in terms of the screenshot it provides you.

That’s no guarantee that they’re indexing/valuing that content but it is a positive sign. I think what one could do is put together a test, whereby content pre-instantsearch rendering contains a particular unusual word or phrase, and post rendering has a different phrase. Then after that page is indexed you can run a Google search that you know will return that page first, but then also add each of the phrases you used to the search query to see if Google has indexed/valued that phrase.

One of the things I would really like to see added is a linkage between the server-side APIs (e.g. PHP) and instantsearch. I want to be able to run the starting query in PHP, include the results in the source HTML, but then pass all the starting values to instantsearch. Instantsearch is given a starting state that is the same as if it had called the Algolia service upon load, except it hasn’t had to do that because we’ve given it the data through the source HTML/Javascript. This means fewer queries to Algolia overall as well as faster initial page load times.
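
The server-side half of that idea could look something like the sketch below: the results of the first query are serialised into the page so client code can read them without a second request. This only shows the embedding step. The window.__INITIAL_SEARCH_STATE__ name and both function names are hypothetical, and nothing in this thread says instantsearch can consume such a state yet.

```javascript
// Serialise a value as JSON that is safe to place inside an inline <script>.
// Escaping <, > and & stops result text such as "</script>" from ending the tag.
function serializeForInlineScript(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

// Produce the script tag to print into the server-rendered page.
function initialStateScript(state) {
  return (
    '<script>window.__INITIAL_SEARCH_STATE__ = ' +
    serializeForInlineScript(state) +
    ';</script>'
  );
}
```

The escaped output is still valid JSON and valid JavaScript, so the client can read the object directly from the global.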

Cheers
Baz


#15

Thanks @baz for sharing that! It’s great that the SEO strategy is starting to pay off. :)

The test you describe is really interesting. Actually, it has already been done:

The conclusion to those is: Yes, Google indexes dynamic content, but not as predictably as it does with static content.

You also mentioned the Google Search Console and the “Fetch as Google” feature. While it shows what Googlebot will see, it doesn’t mean it will index it. However, if you can’t see your content rendered, then it’s a strong signal that it won’t be indexed. I suggest everybody go and check their websites with it! You can find it there.

Also thanks for the feedback on the server-side / instantsearch linkage. That’s what we are trying to achieve with React InstantSearch, and we’ll try to push it to other flavors of InstantSearch when we are satisfied with the API.


#16

For anyone that was following this thread…

I’ve summarised the development of our website, All The Dresses, an aggregator for dress hire in Australia, on Medium and it has been published on Algolia Stories, here:

Hope it provides some value to the Algolia community. Cheers!


#17

@michael.sokol

The idea is to provide results from the first payload for content you want to be indexed, such as category pages. We are currently working on an easy way to achieve that using React-InstantSearch, and will be releasing an example soon. The idea is to avoid any duplication in terms of code or of requests to Algolia.

How did you go with this? And will this functionality make its way back into instantsearch.js?


#18

Has this been implemented for React InstantSearch yet?