
How to personalize search results with Elasticsearch

Michal Komoch

Only a few years ago, a simple search engine, meaning a set of search criteria run against a relational database, was enough. Yet no solution is universal or irreplaceable. As data volumes have grown, new search tools have appeared. One of their undeniable advantages is the ability to influence how results are scored and sorted. I’d like to provide a few examples of how to personalize search results with Elasticsearch.

Searching for data is, and will remain, an indispensable part of almost every web application. In the past few years, the approach to searching data has changed. As web applications have developed and large data sets have moved to online services, the amount of data has increased rapidly. Existing search tools and solutions have become insufficient. A good old search engine based on MySQL is no longer fast and flexible enough.

 

What can those new search tools do?

A search engine used for searching through job applications would be a great example here. Let’s say that we have a web application with multiple companies. Each company stores user job applications. Almost everyone knows how hard it is to get a good employee that fits our requirements.

Our search engine needs to have multiple search criteria. This sounds quite easy, but what if every company wants to search for applicants in its own, personalized manner? There are many ways to do this. One of them is to add “weights” to certain search parameters. Going further, we can attach these weights to a certain company and give their company admins permissions to manage them.

Elasticsearch is one of the search tools that help you create a search engine with personalized weights. With it, we can build an advanced search engine fairly easily and, what’s more, influence the results. Thanks to Elasticsearch, we get more than an advanced search engine: the application also becomes easier to scale and to extend when new functionality is added in the future.

The examples here are based on the current version, Elasticsearch 6.8.11, and this article is an update to a previous guide.

I’d like to base the examples on two types of variables attached to user applications: employment type and work type.

Let’s say we’ve got two employment types:

1 – Permanent job
2 – Temporary job

and two work types:

1 – Full time
2 – Part time

I filled Elasticsearch with some sample data:

RECORD ONE (candidate1):

– Employment type = 1
– Work type = 1

RECORD TWO (candidate2):

– Employment type = 2
– Work type = 1

RECORD THREE (candidate3):

– Employment type = 1
– Work type = 2

 

Code introduction

A basic search query and its result without weights look like this:

					

GET gs_application/application/_search
{
  "_source": [
    "fullname"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "profile.work_types",
            "query": {
              "bool": {
                "must": [
                  {
                    "terms": {
                      "profile.work_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "profile.employment_types",
            "query": {
              "bool": {
                "must": [
                  {
                    "terms": {
                      "profile.employment_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

					

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 2.0,
    "hits" : [
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "78",
        "_score" : 2.0,
        "_source" : {
          "fullname" : "surname1 candidate1"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "79",
        "_score" : 1.0,
        "_source" : {
          "fullname" : "surname2 candidate2"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "80",
        "_score" : 1.0,
        "_source" : {
          "fullname" : "surname3 candidate3"
        }
      }
    ]
  }
}

Let’s remember the baseline scores (position → score):

1 → 2 (candidate1 score)
2 → 1 (candidate2 score)
3 → 1 (candidate3 score)

 

Example 1: Personalize search weights

Project weights configuration:

					

similarity_tresholds:
    work_types.id:                     1.0
    employment_types.id:               2.0
    example.id:                        10.0

I would like to show an example based on Symfony 3.4 and FOSElasticaBundle 5.1.1, but feel free to try a different stack (almost every major programming language has a good Elasticsearch library that supports mapping, queries and more).

In this case, weights are defined globally in one of the parameter files in Symfony. But nothing stands in the way of moving this configuration to a database and attaching it to specific companies.

And how does it look in PHP?

We pass the declared weights to the constructor. Then, in the class responsible for manipulating weights, a method checks whether a weight is defined for a given search criterion.

If it is, the method sets that weight as the boost on the corresponding part of the Elasticsearch query.

					

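// Excerpt from the class that builds the search query (class declaration omitted).
// BoolQuery is assumed to be Elastica\Query\BoolQuery.

/** Weights keyed by filter name, e.g. ['employment_types.id' => 2.0]. */
private array $filtersToBoost;

public function __construct(array $filtersToBoost)
{
    $this->filtersToBoost = $filtersToBoost;
}

/** Sets the configured weight as the query boost, if one is defined for this filter. */
private function applyBoostIfExists(string $filterName, BoolQuery $boolQuery): BoolQuery
{
    if (array_key_exists($filterName, $this->filtersToBoost)) {
        $boolQuery->setBoost($this->filtersToBoost[$filterName]);
    }

    return $boolQuery;
}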
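
For readers on a different stack, the same idea can be sketched in Python. This is only an illustration under a few assumptions: the names GLOBAL_WEIGHTS, COMPANY_WEIGHTS, weights_for, nested_terms_clause and build_search_body are hypothetical, the company id 42 and its override are made up, and the per-company dictionary stands in for whatever database table would hold the weights. It builds the query body as a plain dictionary, so it does not depend on any particular Elasticsearch client.

```python
from typing import Dict, List, Optional

# Global defaults, mirroring the parameter file shown above.
GLOBAL_WEIGHTS: Dict[str, float] = {
    "work_types.id": 1.0,
    "employment_types.id": 2.0,
}

# Hypothetical per-company overrides; in practice these might come from a database.
COMPANY_WEIGHTS: Dict[int, Dict[str, float]] = {
    42: {"employment_types.id": 5.0},
}


def weights_for(company_id: Optional[int]) -> Dict[str, float]:
    """Merge company-specific weights over the global defaults."""
    merged = dict(GLOBAL_WEIGHTS)
    merged.update(COMPANY_WEIGHTS.get(company_id, {}))
    return merged


def nested_terms_clause(filter_name: str, ids: List[int], weights: Dict[str, float]) -> dict:
    """Build one nested clause and attach a boost only if a weight is defined."""
    path = "profile." + filter_name.split(".")[0]
    bool_query: dict = {"must": [{"terms": {"profile." + filter_name: ids}}]}
    if filter_name in weights:
        bool_query["boost"] = weights[filter_name]
    return {"nested": {"path": path, "query": {"bool": bool_query}}}


def build_search_body(criteria: Dict[str, List[int]], company_id: Optional[int] = None) -> dict:
    """Assemble the full search body from the search criteria."""
    weights = weights_for(company_id)
    should = [nested_terms_clause(name, ids, weights) for name, ids in criteria.items()]
    return {"_source": ["fullname"], "query": {"bool": {"should": should}}}


body = build_search_body({"work_types.id": [1], "employment_types.id": [1]})
```

Calling build_search_body with company_id=42 would use the overridden employment type weight for that company, while a criterion with no configured weight is sent without a boost at all.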

 

As a result, we get a simple Elasticsearch query where:

– the work type parameter has a weight of 1

– the employment type parameter has a weight of 2.

This means that a match on employment type counts twice as much as a match on work type.

In practice, it looks like this:

					

GET gs_application/application/_search
{
  "_source": [
    "fullname"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "profile.work_types",
            "query": {
              "bool": {
                "boost": 1,
                "must": [
                  {
                    "terms": {
                      "profile.work_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "profile.employment_types",
            "query": {
              "bool": {
                "boost": 2,
                "must": [
                  {
                    "terms": {
                      "profile.employment_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

					

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 3.0,
    "hits" : [
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "78",
        "_score" : 3.0,
        "_source" : {
          "fullname" : "surname1 candidate1"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "80",
        "_score" : 2.0,
        "_source" : {
          "fullname" : "surname3 candidate3"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "79",
        "_score" : 1.0,
        "_source" : {
          "fullname" : "surname2 candidate2"
        }
      }
    ]
  }
}

As a result, we get:

1 → score:3 (candidate1) user with employment type = 1 – score boosted
2 → score:2 (candidate3) user with employment type = 1 – score boosted
3 → score:1 (candidate2)

As we can see, candidates with employment type = 1 are scored higher. This example shows how we can manage search weights in a simple way.
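
The arithmetic behind these scores can be reproduced with a small Python model. It is a deliberate simplification, not how Lucene computes relevance in general: I assume that every matching clause contributes a constant 1.0 multiplied by its boost, and that the scores of matching should clauses are summed, which is consistent with the responses shown in this article. The names CANDIDATES, WANTED, score and rank are hypothetical.

```python
from typing import Dict, List, Tuple

# Sample records from the article: the ids stored for each candidate.
CANDIDATES: Dict[str, Dict[str, int]] = {
    "candidate1": {"employment_types.id": 1, "work_types.id": 1},
    "candidate2": {"employment_types.id": 2, "work_types.id": 1},
    "candidate3": {"employment_types.id": 1, "work_types.id": 2},
}

# Both queries look for id 1 in each criterion.
WANTED: Dict[str, int] = {"work_types.id": 1, "employment_types.id": 1}


def score(candidate: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum the boost of every criterion the candidate matches (default boost 1.0)."""
    return sum(
        weights.get(name, 1.0)
        for name, wanted_id in WANTED.items()
        if candidate[name] == wanted_id
    )


def rank(weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs, highest score first."""
    scored = {name: score(c, weights) for name, c in CANDIDATES.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


baseline = rank({})
boosted = rank({"work_types.id": 1.0, "employment_types.id": 2.0})
```

In this model, baseline gives candidate1 a score of 2 and the other two a score of 1, and boosted moves candidate3 ahead of candidate2 with scores of 3, 2 and 1, matching the two responses above.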

 

Example 2: Personalize results with score functions

Search results in Elasticsearch are sorted by their “score” value. If the personalization of weights isn’t good enough or doesn’t fit our needs, we can wrap a query in function_score and multiply the score of matching records by a weight, with boost_mode controlling how the two values are combined.

Let’s say we’d like the records with employment type = 1 to have the score of that clause multiplied by 4. Note that the work type clause in this example uses a boost of 2.

The Elasticsearch query would look like this:

					

GET gs_application/application/_search
{
  "_source": [
    "fullname"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "profile.work_types",
            "query": {
              "bool": {
                "boost": 2,
                "must": [
                  {
                    "terms": {
                      "profile.work_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "profile.employment_types",
            "query": {
              "function_score": {
                "query": {
                  "terms": {
                    "profile.employment_types.id": [
                      1
                    ]
                  }
                },
                "functions": [
                  {
                    "filter": {
                      "terms": {
                        "profile.employment_types.id": [
                          1
                        ]
                      }
                    },
                    "weight": 4
                  }
                ],
                "boost_mode": "multiply"
              }
            }
          }
        }
      ]
    }
  }
}

					

{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 6.0,
    "hits" : [
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "78",
        "_score" : 6.0,
        "_source" : {
          "fullname" : "surname1 candidate1"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "80",
        "_score" : 4.0,
        "_source" : {
          "fullname" : "surname3 candidate3"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "79",
        "_score" : 2.0,
        "_source" : {
          "fullname" : "surname2 candidate2"
        }
      }
    ]
  }
}

We get:
1 → score: 6 (candidate1) – employment type = 1 – score boosted even more
2 → score: 4 (candidate3) – employment type = 1 – score boosted even more
3 → score: 2 (candidate2)
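
To make the role of boost_mode more concrete, here is a short Python sketch of how the query score and the function result are combined. The mode names are the ones function_score accepts; the combine function itself is a hypothetical helper, and the model again assumes that a matching terms query scores a constant 1.0.

```python
def combine(query_score: float, function_score: float, boost_mode: str = "multiply") -> float:
    """Combine the query score with the function result according to boost_mode."""
    modes = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
    }
    if boost_mode not in modes:
        raise ValueError(f"unknown boost_mode: {boost_mode}")
    return modes[boost_mode](query_score, function_score)


# Example 2 in this model: the employment type clause scores 1.0 and is
# multiplied by weight 4; the work type clause scores 1.0 with boost 2.
employment_clause = combine(1.0, 4.0, "multiply")
work_clause = 2 * 1.0

scores = {
    "candidate1": work_clause + employment_clause,  # matches both clauses
    "candidate3": employment_clause,                # matches employment type only
    "candidate2": work_clause,                      # matches work type only
}
```

With "multiply", the clause for employment type contributes 4, which together with the work type boost of 2 gives the scores of 6, 4 and 2 seen in the response. Switching to "sum" would make that clause contribute 5 instead.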

 

Example 3: Score script

An extension of the above functionality is an inline script written in Painless, the Elasticsearch scripting language. With that solution, we can personalize results based on the values stored in each Elasticsearch record.

If we want candidates who are looking for a permanent job (employment type = 1) at the top of the list, we can use a score script to boost those records 4 times. This is a different way to obtain results similar to those in the previous example, but here we have much more flexibility in manipulating scores for specific documents (records).

An example of an Elasticsearch query:

					

GET gs_application/application/_search
{
  "_source": [
    "fullname"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "profile.work_types",
            "query": {
              "bool": {
                "boost": 2,
                "must": [
                  {
                    "terms": {
                      "profile.work_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "profile.employment_types",
            "query": {
              "function_score": {
                "query": {
                  "terms": {
                    "profile.employment_types.id": [
                      1
                    ]
                  }
                },
                "functions": [
                  {
                    "script_score": {
                      "script": {
                        "params": {
                          "multiplier":  4
                        },
                        "source": "def et = doc['profile.employment_types.id'].value; if (et == 1) {return _score * params.multiplier} else {return _score}"
                      }
                    }
                  }
                ],
                "boost_mode": "multiply"
              }
            }
          }
        }
      ]
    }
  }
}

					

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 6.0,
    "hits" : [
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "78",
        "_score" : 6.0,
        "_source" : {
          "fullname" : "surname1 candidate1"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "80",
        "_score" : 4.0,
        "_source" : {
          "fullname" : "surname3 candidate3"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "79",
        "_score" : 2.0,
        "_source" : {
          "fullname" : "surname2 candidate2"
        }
      }
    ]
  }
}

As a result, we get:
1 → score: 6 (candidate1) – employment type = 1 – score boosted
2 → score: 4 (candidate3) – employment type = 1 – score boosted
3 → score: 2 (candidate2)
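
One detail of this query is worth spelling out. As I understand the function_score semantics, _score inside the script is the score of the wrapped query, and with boost_mode set to multiply the script result is then multiplied by that same query score again. The Python mirror below illustrates this; painless_script and final_clause_score are hypothetical names, and the behaviour is worth verifying on your own cluster.

```python
def painless_script(score: float, employment_type_id: int, multiplier: float) -> float:
    """Python mirror of the Painless source used in the query above."""
    if employment_type_id == 1:
        return score * multiplier
    return score


def final_clause_score(query_score: float, employment_type_id: int, multiplier: float = 4.0) -> float:
    """Apply boost_mode "multiply": query score times the script result."""
    return query_score * painless_script(query_score, employment_type_id, multiplier)


# The terms query in this article scores 1.0, so the clause ends up at 4.0.
clause_for_permanent_job = final_clause_score(1.0, 1)

# With a query score other than 1.0, the score would be applied twice.
clause_with_higher_query_score = final_clause_score(2.0, 1)
```

In the article's data the query score is 1.0, so this makes no difference and the clause scores 4. If the wrapped query produced other scores, returning only the multiplier from the script, or using "replace" as the boost_mode, would avoid applying the query score twice.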

 

Conclusion

Personalizing search with Elasticsearch is pretty simple. I wanted to show how developers can easily create advanced search engines based on customer needs. The examples above are a good starting point for more complicated conditions.