2016 Algolia community 🎁 gift: Yarn package search

At the end of every year, Algolia offers a gift to the community. This year it was Yarn :package: package search. It allows you to search for JavaScript packages with an :zap: instant-search like experience.

Yarn package search is available right now at https://yarnpkg.com/en/packages.

This post describes the gift, talks about npm and yarn, details the inner workings and proposes future enhancements.

Happy :nerd_face: reading. If you have any questions, ask right here by replying.

:computer: Demo

This is an instant-search experience enhanced with metadata like the number of downloads, license type, version, owner and last activity.

:cat2: Yarn

Yarn is a JavaScript dependency manager that is ultra fast, mega secure and super reliable. It offers an alternative to npm.

At Algolia we rely on it to manage our dependencies, and we are moving every project that uses npm to Yarn.

:package: npm

We are huge fans of npm. It’s an essential tool that forever changed and shaped the JavaScript community. It played a major role in transforming JavaScript from a toy into an enterprise-ready language.

Publishing packages has never been so easy. As of January 2017, 450K+ packages are available in the npm registry.

Developers looking for packages often find themselves frustrated by the sheer number of options available. To help them (and us) find packages and decide which one to use, we wanted to provide an alternative to the npm search.

We wanted this search to be fast, relevant and enriched with meaningful metadata in the search results.

:gift: Our gift

In December 2016 we gathered ideas from developers and designers and decided to do a community gift around JavaScript package search.

We got something running in just one week, built with Algolia’s instantsearch.js for the front-end and algolia/npm-search to replicate the npm registry into Algolia.

On December 22nd, we submitted a pull request to yarn/website. After one week of review with the yarn team, the pull request was merged and the package search was made public at https://yarnpkg.com/search.

:scroll: Package format

For every npm package, we create a record in the Algolia index with the following format (babel-core shown as an example):

{
  "name": "babel-core",
  "downloadsLast30Days": 4679830,
  "downloadsRatio": 0.08367903104553133,
  "humanDownloadsLast30Days": "4.7m",
  "popular": true,
  "version": "6.21.0",
  "description": "Babel compiler core.",
  "githubRepo": {
    "user": "babel",
    "project": "babel",
    "path": "/tree/master/packages/babel-core"
  },
  "owner": {
    "name": "babel",
    "avatar": "https://github.com/babel.png",
    "link": "https://github.com/babel"
  },
  "deprecated": false,
  "homepage": "https://babeljs.io/",
  "license": "MIT",
  "keywords": ["6to5", "babel", "classes", "const", "es6", "harmony", "let", "modules", "transpile", "transpiler", "var"],
  "created": 1424009748555,
  "modified": 1483473493821,
  "lastPublisher": {
    "name": "hzoo",
    "email": "hi@henryzoo.com",
    "avatar": "https://gravatar.com/avatar/851fb4fa7ca479bce1ae0cdf80d6e042",
    "link": "https://www.npmjs.com/~hzoo"
  },
  "owners": [
    {
      "name": "amasad",
      "email": "amjad.masad@gmail.com",
      "avatar": "https://gravatar.com/avatar/03637ef1a5121222c8db0ed48c34e124",
      "link": "https://www.npmjs.com/~amasad"
    },
    [...]
  ],
  "lastCrawl": "2017-01-03T19:58:19.674Z",
  "popularName": "babel-core",
  "objectID": "babel-core"
}

:gear: Ranking

Searchable Attributes

We restrict the search to a subset of the attributes:

  • popularName
  • name
  • description
  • keywords
  • owner.name
  • owners.name

Prefix Search

Algolia provides prefix search by default (a word is matched as soon as its first letters are typed). We disable it for the keywords, owner.name and owners.name attributes.

Typo-tolerance

Algolia is typo-tolerant by default. We disable typo-tolerance for the keywords attribute.
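
The three sections above (searchable attributes, prefix search and typo-tolerance) could be expressed as Algolia index settings roughly like this. This is only a sketch: the setting names are Algolia's, but the exact configuration used by algolia/npm-search is an assumption based on the description in this post, and the `IndexSettings` type exists purely for illustration.

```typescript
// Illustrative type: only the three settings discussed above.
interface IndexSettings {
  searchableAttributes: string[];
  disablePrefixOnAttributes: string[];
  disableTypoToleranceOnAttributes: string[];
}

// Assumed values, reconstructed from the lists in this post.
const indexSettings: IndexSettings = {
  // Only these attributes are searched.
  searchableAttributes: [
    "popularName",
    "name",
    "description",
    "keywords",
    "owner.name",
    "owners.name",
  ],
  // These attributes only match on complete words, not on prefixes.
  disablePrefixOnAttributes: ["keywords", "owner.name", "owners.name"],
  // Keywords must be typed without typos to match.
  disableTypoToleranceOnAttributes: ["keywords"],
};

console.log(JSON.stringify(indexSettings, null, 2));
```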

Exact Boosting

Using the optionalFacetFilters feature of Algolia, we boost exact matches on the name of a package so that they always appear at the top of the results.
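
As a rough sketch of how this boosting could look on the query side: the search parameters carry the user's query twice, once as the normal query and once as an optional facet filter on the package name. The helper `buildSearchParams` and the `SearchParams` type are hypothetical, and filtering on the `name` attribute is an assumption (it would also need to be declared as a facet in the index settings).

```typescript
// Hypothetical shape of the parameters sent with each search.
interface SearchParams {
  query: string;
  optionalFacetFilters: string[];
}

// Hypothetical helper: builds the parameters for a given user query.
function buildSearchParams(userQuery: string): SearchParams {
  const trimmed = userQuery.trim();
  return {
    query: trimmed,
    // Records whose `name` equals the query are boosted; records that do
    // not satisfy the optional filter are still returned.
    optionalFacetFilters: trimmed === "" ? [] : ["name:" + trimmed],
  };
}

console.log(buildSearchParams("babel"));
```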

Number of downloads

For each package, we use the number of downloads in the last 30 days to set the customRanking, which is used to sort the results.

To see this in action, try searching for babel. This will match both babel-core and babel-messages. From a textual relevance point of view, those two packages match in exactly the same way (on the word “babel”). In such a case, Algolia relies on the customRanking setting and therefore puts the package with the highest number of downloads in the past 30 days first.
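
The tie-breaking idea can be simulated locally with a small comparator. This is a simplified sketch, not Algolia's engine: `textualScore` is a hypothetical stand-in for Algolia's textual relevance criteria, and the download count for babel-messages is a made-up value used only for illustration.

```typescript
interface RankedHit {
  name: string;
  textualScore: number; // hypothetical stand-in for textual relevance
  downloadsLast30Days: number;
}

// Sort by textual relevance first; on a tie, fall back to downloads
// (descending), which mimics a customRanking on downloadsLast30Days.
function rankHits(hits: RankedHit[]): RankedHit[] {
  return hits.slice().sort(
    (a, b) =>
      b.textualScore - a.textualScore ||
      b.downloadsLast30Days - a.downloadsLast30Days
  );
}

const rankedHits = rankHits([
  // Made-up download count, for illustration only.
  { name: "babel-messages", textualScore: 1, downloadsLast30Days: 1000000 },
  // Download count taken from the example record above.
  { name: "babel-core", textualScore: 1, downloadsLast30Days: 4679830 },
]);

console.log(rankedHits.map((hit) => hit.name)); // babel-core first
```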

Popular packages

Packages that are downloaded very frequently are considered popular. We currently consider a package popular when it accounts for more than 0.005% of the total number of downloads on the whole registry. A popular flag is then used to boost popular records over non-popular ones.
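
A minimal sketch of how such a flag could be computed. It assumes the ratio is expressed as a percentage of all registry downloads over the same period; the function names and the registry total used in the example are hypothetical.

```typescript
// Threshold from this post: 0.005% of all downloads on the registry.
const POPULAR_THRESHOLD_PERCENT = 0.005;

// Share of the registry's downloads for one package, as a percentage.
function downloadsRatioPercent(
  packageDownloads: number,
  totalDownloads: number
): number {
  if (totalDownloads <= 0) {
    return 0;
  }
  return (packageDownloads / totalDownloads) * 100;
}

function isPopular(packageDownloads: number, totalDownloads: number): boolean {
  return (
    downloadsRatioPercent(packageDownloads, totalDownloads) >
    POPULAR_THRESHOLD_PERCENT
  );
}

// Hypothetical registry total, for illustration only.
const assumedTotalDownloads = 5600000000;
console.log(isPopular(4679830, assumedTotalDownloads)); // babel-core's downloads
console.log(isPopular(100, assumedTotalDownloads));
```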

If you want to learn more about how Algolia’s ranking algorithm works, you can read this blog post.

:+1: Thanks

This gift would not have been possible without you, the community. We would also like to thank:

:thinking: What’s next?

As part of making the search experience with yarn even better, we plan to:

  • Provide a package result page with even more metadata to help you dig into results and details.
  • Provide a yarn search command line tool.

:calendar: Previous community gifts

In 2014 we offered Awesome Autocomplete for GitHub, a browser extension that adds instant search capabilities to GitHub’s search box.

In 2015 we released DocSearch, the easiest way to add search to your documentation. It’s actually used on Yarn’s documentation and 250+ other websites.


This gift wasn’t the end of our search experience with Yarn. Since February I have been working at Algolia as an intern, and my first main task is to get to know our libraries first-hand by using them in a big project.

In the end, these changes had such a significant impact on how the search performed that a lot more people started searching: almost a 10x improvement!

If any of you have opinions about a detail page, I’d love to hear your thoughts!


The detail page has also been built. It draws on the following sources:

  • npm data replicated in Algolia
      • all package.json things
      • README in markdown
      • CHANGELOG filename
          • when indexing, a bunch of files are requested from GitHub
          • the one that resolves is remembered
  • GitHub API
      • activity graph
      • stargazers
  • unpkg’s JSON response
      • file listings
      • done by Daniel Lo Nigro

It looks like this:

Feel free to suggest things, or make pull requests on yarnpkg/website
