Deep Dive into Peerlist's Skill Recommendation Algorithm

How Peerlist recommends relevant skills to add to your profile using collaborative filtering techniques.

Nakshatra Saxena

Sep 12, 2023 7 min read

What are we trying to solve?

The skills added to your Peerlist profile are some of the most important data points you can use to portray your work, get job recommendations, and get noticed by recruiters. Skill matching also plays a huge role in recommending your profile for jobs posted on Peerlist. This is why curating your skills section is the first thing you should do to make your profile attractive to prospective employers and other visitors.

As important as it is, adding skills while creating a profile on Peerlist is a tedious task (or at least it was before). There were two problems that needed to be addressed -

  1. Difficulty in recalling skills - most people (including myself) suddenly forget what they are good at when asked about it specifically. There are so many technologies, languages, and frameworks that I'm good at but can't seem to recall when asked to fill them out in a form.
  2. Search friction - the user had to search through an endless list of skills to find the one they wanted and select it. Text searching through this many records and adding one skill at a time is difficult and slow.

Take a look at how users used to add skills on Peerlist -

[Figure: How new users added skills to their profiles while onboarding.]

Analyzing user behaviour also supported the two points above.

  • When we talked to a few users, they mentioned they had to switch tabs to go to LinkedIn or some other platform to fill out their skillsets.
  • The time taken to fill out the onboarding form was also pretty high, with most of it spent on searching for and adding skills (the other inputs are just basic details and are quite straightforward). Also, Peerlist doesn't allow users to create a profile without adding at least 3 skills. (How else do you know the profile is credible?)

So we started with 2 goals in mind -

  1. To reduce the time taken by a user to get onboarded to Peerlist and to make the experience as smooth as possible.
  2. To recommend relevant skillsets to the user to add to their profile and eliminate the search-to-add friction.

The Solution

To solve the problems listed above, we had to find a way to recommend skills that a user is very likely to add to their profile. As the skills are supposed to be added while onboarding, we had no relevant data points from the new user. However, we already had data from existing Peerlist users about which skills they've added to their profiles.

Let's take an example - if Michelle registers on Peerlist and adds JavaScript as one of her skills, we know from the skill data of existing users that there's a high probability of her adding ReactJS or NodeJS as well (as these are related technologies).

The solution assumes that if a lot of users have added JavaScript and ReactJS together, then we can safely recommend one of them to a user who has added the other.

Seems like a straightforward problem to solve. But how do you achieve this programmatically?

Co-Occurrence Distribution

Co-occurrence distribution, in simple terms, is a way to represent the frequency of the presence of two entities together.

Let's take a step back and start from the basics - if you write software (which I'm assuming you do since you're so far down the blog) you must've created frequency maps before. The good old "how many times did a particular string appear in a sentence" kind of problem. To solve this, we just create a map storing the occurrence count of every word. For example, the frequency map of this sentence -

'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'

should look like this -

{
    'how': 1,
    'much': 1,
    'wood': 2,
    'would': 1,
    'a': 2,
    'woodchuck': 2,
    'chuck': 2,
    'if': 1,
    'could': 1,
}
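As a minimal sketch, a map like this could be built as follows. The helper name `buildFrequencyMap` is hypothetical, and the sketch assumes that words are lowercased and punctuation is dropped before counting (which is what the map above implies, since 'wood?' is counted as 'wood').

```javascript
// Hypothetical helper: counts how often each word appears in a sentence.
function buildFrequencyMap(sentence) {
	// A prototype-less object, so words like 'constructor' don't collide
	// with inherited properties.
	const frequencyMap = Object.create(null);

	// Assumption: lowercase everything and keep only runs of letters.
	const words = sentence.toLowerCase().match(/[a-z]+/g) || [];

	for (const word of words) {
		frequencyMap[word] = (frequencyMap[word] || 0) + 1;
	}
	return frequencyMap;
}

const frequencyMap = buildFrequencyMap(
	'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'
);
```

Each word is visited once, so the map is built in a single pass over the sentence.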

Now what if the problem was "how many times did a particular string x appear along with another string y in a sentence?". This is where you need a co-occurrence distribution. To solve this, we just create a two-dimensional map.

For this example - let's consider our actual problem at hand. To recommend skills we start with every user's skillsets represented in an array -

const skills = [
    { Michelle: ['JavaScript', 'ReactJS', 'NextJS', 'TypeScript'] },
    { Aman: ['Python', 'Anaconda', 'Flask', 'Django'] },
    { Peter: ['Figma', 'Sketch'] },
    { Felix: ['TypeScript', 'NextJS', 'Redux', 'NodeJS', 'JavaScript'] },
]

To solve this - we will create a co-occurrence matrix. Imagine an n x n matrix where both the rows and columns denote the skills. It'll look something like this initially -

[Figure: Co-occurrence distribution - an empty n x n matrix of skillsets.]

We iterate through the skillsets and increment a counter at position [x][y] if skills x and y appear together in a user's skill list. At a high level, the code would look something like this (I wrote this in JavaScript) -

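// Nested map: coOccurrenceMatrix[skillA][skillB] = number of users listing both
const coOccurrenceMatrix = {};

for (const user of skills) {
	// Each entry looks like { Michelle: [...] }, so take its single value
	const [userSkills] = Object.values(user);

	// Visit every unordered pair of skills selected by this user
	for (let i = 0; i < userSkills.length; i += 1) {
		for (let j = i + 1; j < userSkills.length; j += 1) {
			const skillA = userSkills[i];
			const skillB = userSkills[j];

			// Make sure both rows exist before writing to them
			coOccurrenceMatrix[skillA] = coOccurrenceMatrix[skillA] || {};
			coOccurrenceMatrix[skillB] = coOccurrenceMatrix[skillB] || {};

			// Increment the count in both directions to keep the matrix symmetric
			coOccurrenceMatrix[skillA][skillB] =
				(coOccurrenceMatrix[skillA][skillB] || 0) + 1;
			coOccurrenceMatrix[skillB][skillA] =
				(coOccurrenceMatrix[skillB][skillA] || 0) + 1;
		}
	}
}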

After this step the co-occurrence matrix should look like this -

[Figure: Co-occurrence distribution - n x n matrix of skillsets, filled with frequency counts.]

Now, giving related recommendations based on this co-occurrence matrix is very easy. Let's see how we would recommend skills if a user has added NextJS as a skill -

[Figure: Recommending skills related to NextJS using the filled co-occurrence matrix.]

We'll just have to sort the NextJS row by occurrence count, highest first, and recommend skills in that order. For this example, we would recommend JavaScript, TypeScript, ReactJS, NodeJS and Redux as relevant related skills for NextJS, which checks out with the data: JavaScript and TypeScript each appear alongside NextJS for two users, the rest for one.
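The row lookup and sort could be sketched as below. The function name `recommendSkills` and its `limit` parameter are hypothetical, not Peerlist's actual API, and the example matrix contains only the NextJS row implied by the sample data.

```javascript
// Hypothetical helper: returns the skills that co-occur with `skill`,
// ordered from most to least frequent.
function recommendSkills(coOccurrenceMatrix, skill, limit = 5) {
	// Unknown skills have no row, so there is nothing to recommend.
	const hasRow = Object.prototype.hasOwnProperty.call(coOccurrenceMatrix, skill);
	const row = hasRow ? coOccurrenceMatrix[skill] : {};

	return Object.entries(row)
		.sort(([, countA], [, countB]) => countB - countA) // highest count first
		.slice(0, limit)
		.map(([relatedSkill]) => relatedSkill);
}

// The NextJS row from the sample data: Michelle and Felix both list NextJS.
const exampleMatrix = {
	NextJS: { JavaScript: 2, ReactJS: 1, TypeScript: 2, Redux: 1, NodeJS: 1 },
};

const recommendations = recommendSkills(exampleMatrix, 'NextJS');
```

Skills with equal counts keep whatever order they had in the row, so the order among ties (here ReactJS, Redux and NodeJS) depends on how the matrix was built.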

We cache this matrix for recommending skills to users, regenerating it every couple of weeks.
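How the cache is implemented isn't covered here, so the following is only one possible shape for it: an in-memory wrapper that rebuilds the matrix once the cached copy is older than a given age. The names `createMatrixCache` and `TWO_WEEKS_MS`, and the exact two-week interval, are assumptions for illustration.

```javascript
// Assumed refresh interval: "every couple of weeks" taken as 14 days.
const TWO_WEEKS_MS = 14 * 24 * 60 * 60 * 1000;

// Hypothetical cache wrapper: calls `buildMatrix` only when there is no
// cached matrix yet or the cached one is older than `maxAgeMs`.
function createMatrixCache(buildMatrix, maxAgeMs = TWO_WEEKS_MS) {
	let cachedMatrix = null;
	let builtAt = 0;

	// `now` is a parameter so the staleness check is easy to test.
	return function getMatrix(now = Date.now()) {
		if (cachedMatrix === null || now - builtAt >= maxAgeMs) {
			cachedMatrix = buildMatrix();
			builtAt = now;
		}
		return cachedMatrix;
	};
}

// Usage with a stand-in builder; a real one would scan every user's skills.
const getMatrix = createMatrixCache(() => ({
	NextJS: { JavaScript: 2, TypeScript: 2 },
}));
const matrix = getMatrix(); // built on the first call, then reused
```

A scheduled job that rebuilds the matrix and writes it to a shared store would work just as well; the point is only that requests read a prebuilt matrix instead of recomputing it.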

The final implementation on Peerlist looks something like this -

[Figure: Final implementation of the skill recommendation feature on Peerlist.]

It's much easier now to add skills relevant to your tech stack using the suggested skills tab. Note that, out of thousands of available skills, the recommendation system only shows the ones related to the JavaScript tech stack. As a fallback, if users cannot see a skill they want to add, they can always search for it.

Some performance metrics -

  1. Matrix generation (which happens every couple of weeks) currently takes about 1.12s, brought down from 45s using a few optimization techniques.
  2. The recommendation API has a latency of about 300ms (caching the matrix helped).

Note - It is generally advised to apply cosine similarity on top of the co-occurrence matrix to normalize it, so that very common skills don't dominate every recommendation. The raw co-occurrence matrix worked well on its own for our use case, so we skipped it.
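For reference, here is a sketch of one common way to do that normalization. It is not part of the Peerlist implementation described in this post. It treats each skill as a binary vector over users, in which case the cosine similarity of skills A and B is their co-occurrence count divided by sqrt(usersWith(A) * usersWith(B)). The function name `cosineNormalize` is hypothetical.

```javascript
// Sketch: cosine-normalized co-occurrence for binary user-skill data.
// similarity[A][B] = coOccurrence(A, B) / sqrt(usersWith(A) * usersWith(B))
function cosineNormalize(userSkillLists) {
	const skillCounts = Object.create(null); // skill -> users listing it
	const coOccurrence = Object.create(null); // skill -> skill -> shared users

	for (const skillList of userSkillLists) {
		// Ignore duplicates within one user's list.
		const uniqueSkills = [...new Set(skillList)];

		for (const skill of uniqueSkills) {
			skillCounts[skill] = (skillCounts[skill] || 0) + 1;
			coOccurrence[skill] = coOccurrence[skill] || Object.create(null);
		}
		for (const skillA of uniqueSkills) {
			for (const skillB of uniqueSkills) {
				if (skillA === skillB) continue;
				coOccurrence[skillA][skillB] =
					(coOccurrence[skillA][skillB] || 0) + 1;
			}
		}
	}

	const similarity = Object.create(null);
	for (const skillA of Object.keys(coOccurrence)) {
		similarity[skillA] = Object.create(null);
		for (const [skillB, count] of Object.entries(coOccurrence[skillA])) {
			similarity[skillA][skillB] =
				count / Math.sqrt(skillCounts[skillA] * skillCounts[skillB]);
		}
	}
	return similarity;
}

const similarity = cosineNormalize([
	['JavaScript', 'ReactJS', 'NextJS', 'TypeScript'],
	['Python', 'Anaconda', 'Flask', 'Django'],
	['Figma', 'Sketch'],
	['TypeScript', 'NextJS', 'Redux', 'NodeJS', 'JavaScript'],
]);
```

With the sample data, NextJS and JavaScript are listed by exactly the same two users, so their similarity is 1, while NextJS and ReactJS share one user out of two and one respectively, giving 1 / sqrt(2).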

Try out the recommendation feature in your user settings. And if you aren't on Peerlist yet, you're in for a ride - sign up and try it out while onboarding.