Michal Komoch, Author at Espeo Software

Only a few years ago, a simple search engine, that is, search criteria applied to a relational database, was enough. Yet there are no universal, irreplaceable solutions. As the amount of data has grown, new search tools have appeared. An undeniable advantage of these tools is their ability to influence search results and sorting. I’d like to provide a few examples of how to personalize search results with Elasticsearch.

Searching for data is, and will remain, an indispensable part of almost every web application. In the past few years, the approach to searching data has changed. As web applications developed and large data sets moved to online services, the amount of data increased rapidly. Existing search tools and solutions have become insufficient. A good old search engine based on MySQL is no longer fast and flexible enough.

How to personalize search results in Elasticsearch

Table of contents:

What can those new search tools do?
Code introduction
Example 1: Personalize search weights
Example 2: Personalize results with score functions
Example 3: Score script
Conclusion

What can those new search tools do?

A search engine for job applications is a good example. Let’s say that we have a web application used by multiple companies. Each company stores job applications from users. Almost everyone knows how hard it is to find a good employee who fits the requirements.
Our search engine needs to support multiple search criteria. This sounds quite easy, but what if every company wants to search for applicants in its own, personalized way? There are many ways to do this. One of them is to add “weights” to certain search parameters. Going further, we can attach these weights to individual companies and give company admins permission to manage them.
Elasticsearch is one of the search tools that help you build a search engine with personalized weights. It makes building an advanced search engine relatively easy, and it lets us influence the results. With Elasticsearch, we get more than an advanced search engine: the application also becomes easier to scale and to extend with new functionality in the future.
The examples here are based on Elasticsearch 6.8.11. This article is an update to a previous guide.
The examples use two types of variables attached to job applications.
Let’s say we’ve got two employment types:
1 – Permanent job
2 – Temporary job
and two work types:
1 – Full time
2 – Part time
I indexed some sample data in Elasticsearch:
RECORD ONE (candidate1):
– Employment type = 1
– Work type = 1
RECORD TWO (candidate2):
– Employment type = 2
– Work type = 1
RECORD THREE (candidate3):
– Employment type = 1
– Work type = 2
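The nested queries in the following examples only work if `profile.work_types` and `profile.employment_types` are mapped as `nested` fields. That mapping isn't shown here. The sketch below is therefore an assumption inferred from the queries: a minimal mapping plus the three sample records, written as PHP arrays. The `applicationMapping()` and `sampleApplications()` helpers are hypothetical.

```php
<?php

/**
 * Hypothetical mapping for the gs_application index (mapping type
 * "application", as used by Elasticsearch 6.x). Only the fields that
 * appear in this article's queries and responses are included.
 */
function applicationMapping(): array
{
    // Each {id: ...} entry of a nested field is indexed as a separate hidden document.
    $nestedIds = [
        'type' => 'nested',
        'properties' => ['id' => ['type' => 'integer']],
    ];

    return [
        'mappings' => [
            'application' => [
                'properties' => [
                    'fullname' => ['type' => 'text'],
                    'profile' => [
                        'properties' => [
                            'work_types' => $nestedIds,
                            'employment_types' => $nestedIds,
                        ],
                    ],
                ],
            ],
        ],
    ];
}

/** The three sample records, keyed by the _id values shown in the responses. */
function sampleApplications(): array
{
    $record = static function (string $fullname, int $employmentType, int $workType): array {
        return [
            'fullname' => $fullname,
            'profile' => [
                'employment_types' => [['id' => $employmentType]],
                'work_types' => [['id' => $workType]],
            ],
        ];
    };

    return [
        78 => $record('surname1 candidate1', 1, 1),
        79 => $record('surname2 candidate2', 2, 1),
        80 => $record('surname3 candidate3', 1, 2),
    ];
}

// Body for creating the index (PUT gs_application) and one document to index.
$indexBody = json_encode(applicationMapping(), JSON_PRETTY_PRINT);
$firstDocument = json_encode(sampleApplications()[78]);
```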

Code introduction

A basic search query without weights, and its result, look like this:
[dm_code_snippet background=”yes” background-mobile=”yes” bg-color=”#eeeeee” theme=”dark” language=”php” wrapped=”no” copy-text=”Copy Code” copy-confirmed=”Copied”]

GET gs_application/application/_search
{
  "_source": [
    "fullname"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "profile.work_types",
            "query": {
              "bool": {
                "must": [
                  {
                    "terms": {
                      "profile.work_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "profile.employment_types",
            "query": {
              "bool": {
                "must": [
                  {
                    "terms": {
                      "profile.employment_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

[/dm_code_snippet]
[dm_code_snippet background=”yes” background-mobile=”yes” bg-color=”#eeeeee” theme=”dark” language=”php” wrapped=”no” copy-text=”Copy Code” copy-confirmed=”Copied”]

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 2.0,
    "hits" : [
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "78",
        "_score" : 2.0,
        "_source" : {
          "fullname" : "surname1 candidate1"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "79",
        "_score" : 1.0,
        "_source" : {
          "fullname" : "surname2 candidate2"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "80",
        "_score" : 1.0,
        "_source" : {
          "fullname" : "surname3 candidate3"
        }
      }
    ]
  }
}

[/dm_code_snippet]
Let’s note the baseline scores:
1 → score: 2 (candidate1)
2 → score: 1 (candidate2)
3 → score: 1 (candidate3)
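Why 2, 1 and 1? The scores in the responses suggest that each matching `terms` clause contributes a constant 1 multiplied by its boost. Each candidate also has exactly one nested entry per criterion, so the nested query's default `avg` score mode changes nothing. Under those assumptions, every matching `should` clause simply adds its boost to the total. The small PHP model below reproduces the baseline scores and the boosted scores from Example 1. The `expectedScore()` helper is hypothetical and is not how Elasticsearch computes scores internally.

```php
<?php

/**
 * Simplified model of the bool/should scores in this article: each
 * matching clause adds its boost (constant score 1 x boost).
 * Assumes one nested entry per criterion, as in the sample data.
 *
 * @param array<string,int>   $candidate e.g. ['employment_type' => 1, 'work_type' => 1]
 * @param array<string,int>   $wanted    value each criterion must match
 * @param array<string,float> $boosts    boost per criterion (defaults to 1.0)
 */
function expectedScore(array $candidate, array $wanted, array $boosts = []): float
{
    $score = 0.0;
    foreach ($wanted as $criterion => $value) {
        if (($candidate[$criterion] ?? null) === $value) {
            $score += $boosts[$criterion] ?? 1.0;
        }
    }
    return $score;
}

$candidates = [
    'candidate1' => ['employment_type' => 1, 'work_type' => 1],
    'candidate2' => ['employment_type' => 2, 'work_type' => 1],
    'candidate3' => ['employment_type' => 1, 'work_type' => 2],
];
$wanted = ['employment_type' => 1, 'work_type' => 1];
$exampleOneBoosts = ['employment_type' => 2.0, 'work_type' => 1.0];

foreach ($candidates as $name => $candidate) {
    printf(
        "%s: baseline %.1f, example 1 %.1f\n",
        $name,
        expectedScore($candidate, $wanted),
        expectedScore($candidate, $wanted, $exampleOneBoosts)
    );
}
```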

Example 1: Personalize search weights

Project weights configuration:
[dm_code_snippet background=”yes” background-mobile=”yes” bg-color=”#eeeeee” theme=”dark” language=”php” wrapped=”no” copy-text=”Copy Code” copy-confirmed=”Copied”]

similarity_tresholds:
    work_types.id:                     1.0
    employment_types.id:               2.0
    example.id:                        10.0

[/dm_code_snippet]
The PHP example is based on Symfony 3.4 and FOSElasticaBundle 5.1.1, but feel free to try a different stack (almost every major programming language has a good Elasticsearch library that supports mappings, queries and more).
In this case, the weights are defined globally in one of the Symfony parameter files, but nothing stops us from moving this configuration to a database and attaching the weights to individual companies.
And how does it look in PHP?
We pass the declared weights in through the constructor. Then, in the class responsible for handling weights, a method checks whether a weight is defined for a given search criterion and, if so, attaches it.
Finally, the corresponding part of the Elasticsearch query is built with the appropriate weight.
[dm_code_snippet background=”yes” background-mobile=”yes” bg-color=”#eeeeee” theme=”dark” language=”php” wrapped=”no” copy-text=”Copy Code” copy-confirmed=”Copied”]

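// Requires: use Elastica\Query\BoolQuery;
// Weights from the similarity_tresholds parameter, keyed by filter name (e.g. "work_types.id").
private array $filtersToBoost;

public function __construct(array $filtersToBoost)
{
    $this->filtersToBoost = $filtersToBoost;
}

// Sets the configured weight as the boost of the given bool query, if one is defined for this filter.
private function applyBoostIfExists(string $filterName, BoolQuery $boolQuery): BoolQuery
{
    if (array_key_exists($filterName, $this->filtersToBoost)) {
        $boolQuery->setBoost((float) $this->filtersToBoost[$filterName]);
    }

    return $boolQuery;
}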

[/dm_code_snippet]
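Without Elastica, the same idea works with plain arrays that are later `json_encode`d into the request body. This sketch mirrors the nested clauses used in this article's queries. The boost is added only when a weight is configured, just like in `applyBoostIfExists()`. The `boostedNestedTermsClause()` helper is hypothetical.

```php
<?php

/**
 * Build one "should" clause: a nested bool/must/terms query on
 * "profile.<criterion>.id", with an optional boost taken from $weights.
 *
 * @param string              $criterion e.g. "work_types"
 * @param int[]               $ids       values to match
 * @param array<string,float> $weights   keyed like the config, e.g. "work_types.id"
 */
function boostedNestedTermsClause(string $criterion, array $ids, array $weights): array
{
    $bool = [
        'must' => [
            ['terms' => ["profile.{$criterion}.id" => $ids]],
        ],
    ];

    // Only set a boost when a weight exists for this criterion.
    $weightKey = "{$criterion}.id";
    if (array_key_exists($weightKey, $weights)) {
        $bool['boost'] = (float) $weights[$weightKey];
    }

    return [
        'nested' => [
            'path' => "profile.{$criterion}",
            'query' => ['bool' => $bool],
        ],
    ];
}

$weights = ['work_types.id' => 1.0, 'employment_types.id' => 2.0];
$body = [
    '_source' => ['fullname'],
    'query' => [
        'bool' => [
            'should' => [
                boostedNestedTermsClause('work_types', [1], $weights),
                boostedNestedTermsClause('employment_types', [1], $weights),
            ],
        ],
    ],
];

echo json_encode($body, JSON_PRETTY_PRINT | JSON_PRESERVE_ZERO_FRACTION), PHP_EOL;
```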

As a result, we get a simple Elasticsearch query where:
– the work type parameter has a weight of 1
– the employment type parameter has a weight of 2.
This means that a match on employment type counts twice as much as a match on work type.
In practice, it looks just like this:
[dm_code_snippet background=”yes” background-mobile=”yes” bg-color=”#eeeeee” theme=”dark” language=”php” wrapped=”no” copy-text=”Copy Code” copy-confirmed=”Copied”]

GET gs_application/application/_search
{
  "_source": [
    "fullname"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "profile.work_types",
            "query": {
              "bool": {
                "boost": 1,
                "must": [
                  {
                    "terms": {
                      "profile.work_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "profile.employment_types",
            "query": {
              "bool": {
                "boost": 2,
                "must": [
                  {
                    "terms": {
                      "profile.employment_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 3.0,
    "hits" : [
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "78",
        "_score" : 3.0,
        "_source" : {
          "fullname" : "surname1 candidate1"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "80",
        "_score" : 2.0,
        "_source" : {
          "fullname" : "surname3 candidate3"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "79",
        "_score" : 1.0,
        "_source" : {
          "fullname" : "surname2 candidate2"
        }
      }
    ]
  }
}

[/dm_code_snippet]
As a result, we get:
1 → score: 3 (candidate1) – employment type = 1 – score boosted
2 → score: 2 (candidate3) – employment type = 1 – score boosted
3 → score: 1 (candidate2)
As we can see, candidates with employment type = 1 are scored higher. This example shows how we can manage search weights in a simple way.
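So far, the weights come from a global parameter file. Suppose they are moved to the database and attached to companies, as suggested earlier. One simple approach is to treat the global parameters as defaults and let each company override only the criteria it cares about. This is a minimal sketch. The `resolveCompanyWeights()` helper and the shape of the per-company data are hypothetical. In a real application, the overrides would be loaded from the database.

```php
<?php

/**
 * Merge the global weights (e.g. the similarity_tresholds parameter)
 * with one company's overrides. Criteria missing from the overrides keep
 * their global value. Unknown keys are dropped, so a stray stored setting
 * cannot introduce a criterion the query builder doesn't know about.
 *
 * @param array<string,float> $globalWeights
 * @param array<string,float|int> $companyOverrides
 * @return array<string,float>
 */
function resolveCompanyWeights(array $globalWeights, array $companyOverrides): array
{
    $known = array_intersect_key($companyOverrides, $globalWeights);
    return array_replace($globalWeights, array_map('floatval', $known));
}

$global = ['work_types.id' => 1.0, 'employment_types.id' => 2.0, 'example.id' => 10.0];
$weights = resolveCompanyWeights($global, ['work_types.id' => 3, 'unknown.id' => 5]);
// $weights: ['work_types.id' => 3.0, 'employment_types.id' => 2.0, 'example.id' => 10.0]
```

The resolved array could then be passed to the weights class constructor instead of the global parameter.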

Example 2: Personalize results with score functions

By default, Elasticsearch sorts search results by their “score” value. Personalized weights may not be good enough or may not fit our needs. In that case, we can wrap a criterion in a function_score query and multiply a record’s score by the value of the weight parameter, with boost_mode set to multiply.
Let’s say we’d like records with employment type = 1 to have their scores multiplied by 4.
The Elasticsearch query would look like this:
[dm_code_snippet background=”yes” background-mobile=”yes” bg-color=”#eeeeee” theme=”dark” language=”php” wrapped=”no” copy-text=”Copy Code” copy-confirmed=”Copied”]

GET gs_application/application/_search
{
  "_source": [
    "fullname"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "profile.work_types",
            "query": {
              "bool": {
                "boost": 2,
                "must": [
                  {
                    "terms": {
                      "profile.work_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "profile.employment_types",
            "query": {
              "function_score": {
                "query": {
                  "terms": {
                    "profile.employment_types.id": [
                      1
                    ]
                  }
                },
                "functions": [
                  {
                    "filter": {
                      "terms": {
                        "profile.employment_types.id": [
                          1
                        ]
                      }
                    },
                    "weight": 4
                  }
                ],
                "boost_mode": "multiply"
              }
            }
          }
        }
      ]
    }
  }
}

{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 6.0,
    "hits" : [
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "78",
        "_score" : 6.0,
        "_source" : {
          "fullname" : "surname1 candidate1"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "80",
        "_score" : 4.0,
        "_source" : {
          "fullname" : "surname3 candidate3"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "79",
        "_score" : 2.0,
        "_source" : {
          "fullname" : "surname2 candidate2"
        }
      }
    ]
  }
}

[/dm_code_snippet]
We get (note that in this query the work type clause has a boost of 2):
1 → score: 6 (candidate1) – employment type = 1 – score boosted even more
2 → score: 4 (candidate3) – employment type = 1 – score boosted even more
3 → score: 2 (candidate2)
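These numbers follow from how `function_score` combines scores. The wrapped `terms` query scores 1, and the single function returns its `weight` (4) for matching documents. `boost_mode: multiply` multiplies the two. The outer `bool` then adds the work type clause, which now carries a boost of 2. The PHP check below reproduces that arithmetic under the same assumptions as before (constant-score `terms` clauses, one nested entry per criterion). The helper names are hypothetical.

```php
<?php

/** function_score with boost_mode "multiply": query score x function value. */
function functionScoreMultiply(float $queryScore, float $functionValue): float
{
    return $queryScore * $functionValue;
}

/** Score of one candidate in Example 2 under the simplified model. */
function exampleTwoScore(int $employmentType, int $workType): float
{
    $score = 0.0;
    if ($workType === 1) {
        $score += 1.0 * 2.0;                       // nested work_types clause, boost 2
    }
    if ($employmentType === 1) {
        $score += functionScoreMultiply(1.0, 4.0); // terms score 1 x weight 4
    }
    return $score;
}

printf(
    "candidate1 %.1f, candidate3 %.1f, candidate2 %.1f\n",
    exampleTwoScore(1, 1),
    exampleTwoScore(1, 2),
    exampleTwoScore(2, 1)
);
```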

Example 3: Score script

An extension of the functionality above is an inline script written in Painless, the Elasticsearch scripting language. With that solution, we can personalize results based on the values of the records stored in Elasticsearch.
Suppose we want candidates looking for a permanent job (employment type = 1) at the top of the list. We can then use a score script to boost those records 4 times. This is a different way to obtain results similar to those in the previous example, but it gives us much more flexibility in manipulating the scores of specific documents (records).
An example of an Elasticsearch query:
[dm_code_snippet background=”yes” background-mobile=”yes” bg-color=”#eeeeee” theme=”dark” language=”php” wrapped=”no” copy-text=”Copy Code” copy-confirmed=”Copied”]

GET gs_application/application/_search
{
  "_source": [
    "fullname"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "profile.work_types",
            "query": {
              "bool": {
                "boost": 2,
                "must": [
                  {
                    "terms": {
                      "profile.work_types.id": [
                        1
                      ]
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "profile.employment_types",
            "query": {
              "function_score": {
                "query": {
                  "terms": {
                    "profile.employment_types.id": [
                      1
                    ]
                  }
                },
                "functions": [
                  {
                    "script_score": {
                      "script": {
                        "params": {
                          "multiplier": 4
                        },
                        "source": "def et = doc['profile.employment_types.id'].value; if (et == 1) { return params.multiplier; } return 1.0;"
                      }
                    }
                  }
                ],
                "boost_mode": "multiply"
              }
            }
          }
        }
      ]
    }
  }
}

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 6.0,
    "hits" : [
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "78",
        "_score" : 6.0,
        "_source" : {
          "fullname" : "surname1 candidate1"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "80",
        "_score" : 4.0,
        "_source" : {
          "fullname" : "surname3 candidate3"
        }
      },
      {
        "_index" : "gs_application",
        "_type" : "application",
        "_id" : "79",
        "_score" : 2.0,
        "_source" : {
          "fullname" : "surname2 candidate2"
        }
      }
    ]
  }
}

[/dm_code_snippet]
As a result, we get:
1 → score: 6 (candidate1) – employment type = 1 – score boosted
2 → score: 4 (candidate3) – employment type = 1 – score boosted
3 → score: 2 (candidate2)
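Because the multiplier travels in `params`, it can come from per-company settings instead of being hard-coded. Elasticsearch can also reuse the compiled script, since only the parameter changes between requests. Below is a sketch of a PHP helper that builds the employment type clause from Example 3; the helper name is hypothetical. In this version, the script returns just the multiplier (or 1.0) and leaves the multiplication to `boost_mode: multiply`, so the query score is applied only once.

```php
<?php

/**
 * Hypothetical helper building the nested function_score clause from
 * Example 3. The script source is constant; only "params" varies.
 */
function employmentScriptScoreClause(float $multiplier): array
{
    // Painless: multiplier for employment type 1 (hard-coded, as in the
    // original query), 1.0 otherwise; boost_mode then multiplies it
    // with the score of the wrapped terms query.
    $source = "def et = doc['profile.employment_types.id'].value; "
        . 'if (et == 1) { return params.multiplier; } return 1.0;';

    return [
        'nested' => [
            'path' => 'profile.employment_types',
            'query' => [
                'function_score' => [
                    'query' => ['terms' => ['profile.employment_types.id' => [1]]],
                    'functions' => [
                        ['script_score' => ['script' => [
                            'params' => ['multiplier' => $multiplier],
                            'source' => $source,
                        ]]],
                    ],
                    'boost_mode' => 'multiply',
                ],
            ],
        ],
    ];
}

// e.g. a company that wants permanent-job candidates boosted 4 times
$clause = employmentScriptScoreClause(4.0);
echo json_encode($clause, JSON_PRETTY_PRINT | JSON_PRESERVE_ZERO_FRACTION), PHP_EOL;
```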

Conclusion

Personalizing search with Elasticsearch is pretty simple. I wanted to show how developers can easily build advanced search engines tailored to customer needs. The examples above are a great starting point for more complicated conditions.

A few months ago, my team and I faced a challenge: to provide an advanced search engine with many simple (and more complicated) criteria and the ability to use a full-text search mechanism. In addition, we knew that our client demanded high efficiency and scalability. I’d like to focus on full-text search and other advantages of the Elasticsearch solution.

Data, data, data: its volume seems to grow month after month. We didn’t really have that ‘problem’ a few years ago. Back then, web applications didn’t have to process that much data at once, and developers didn’t have to pay as much attention to their application’s efficiency. Then the vast majority of businesses moved their solutions to the web and were suddenly faced with processing large amounts of data. We, the developers, needed to take a different approach.
Most of the problems we face during development can be solved in one of three ways. We can use one of the main relational databases, such as MySQL, PostgreSQL or Oracle. We can implement caching. Or we can use a non-relational database like MongoDB.

What if this isn’t enough, and our customer needs something more powerful: a highly efficient, scalable data search engine?
This is where Elasticsearch comes in. It’s a distributed, near real-time search engine based on Apache Lucene.
It gives us:

easy full text search
advanced search queries
many helpful built-in solutions
manageable data result scoring
shorter development time
scalability

Communication with Elasticsearch is based on a REST API. Languages like Java and PHP have libraries for working with Elasticsearch, which makes life easier for developers.
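Since the API is plain HTTP with JSON bodies, even PHP's standard library can send a query. This is a minimal sketch using the HTTP stream wrapper. The host `http://localhost:9200` and the `searchRequest()`/`sendSearch()` helpers are assumptions. A real project would normally use a client library instead. The `/{index}/{type}/_search` path follows the 6.x URLs used in this article.

```php
<?php

/** Build the URL and JSON body for a _search request (Elasticsearch 6.x style path). */
function searchRequest(string $host, string $index, string $type, array $body): array
{
    return [
        'url' => rtrim($host, '/') . '/' . rawurlencode($index) . '/' . rawurlencode($type) . '/_search',
        'body' => json_encode($body),
    ];
}

/** POST the request and decode the JSON response (requires allow_url_fopen). */
function sendSearch(array $request): array
{
    $context = stream_context_create([
        'http' => [
            'method' => 'POST',
            // Elasticsearch 6.x requires an explicit JSON content type.
            'header' => "Content-Type: application/json\r\n",
            'content' => $request['body'],
            // Return the error body instead of false on 4xx/5xx responses.
            'ignore_errors' => true,
        ],
    ]);
    $response = file_get_contents($request['url'], false, $context);
    if ($response === false) {
        throw new RuntimeException('Elasticsearch request failed: ' . $request['url']);
    }
    $decoded = json_decode($response, true);
    return is_array($decoded) ? $decoded : [];
}

$request = searchRequest('http://localhost:9200', 'gs_application', 'application', [
    '_source' => ['fullname'],
    'query' => ['match_all' => new stdClass()],
]);
// $result = sendSearch($request); // needs a running Elasticsearch node
```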

CHALLENGE: Advanced data search engine with full text search field and data scoring functionality

DATA:

MySQL version: 5.6.28
Table: 169123 records
Average length of searchable field: 2550 characters
Maximum length of searchable field: 686141 characters

Let’s say a user would like to find all rows containing any of the words “job”, “php” or “test”.

MySQL – first attempt (typical simple query on long text field):

SELECT COUNT(ap.id) FROM `database`.application_multimedia ap WHERE content LIKE '%test%' OR content LIKE '%job%' OR content LIKE '%php%';

Duration: 9.140 seconds
As we can see, even this very simple query takes far too long for a search engine. Our client wouldn’t accept that solution.

MySQL – second attempt (query based on full text indexed field):

Step 1:
We need to add a full-text index on the searched fields:

ALTER TABLE `database`.`application_multimedia` ADD FULLTEXT INDEX `FTS` (`name` ASC, `content` ASC);

Step 2:
Execute query:

SELECT COUNT(ap.id) FROM `database`.application_multimedia ap WHERE MATCH (name, content) AGAINST('test job php' IN NATURAL LANGUAGE MODE);

Duration: 0.45 seconds
In this case, the query’s execution time is low. If our search engine were based only on full-text search, MySQL would be enough. However, if we’d like to apply more advanced search conditions (and sort data based on those conditions), we should consider Elasticsearch.

Elasticsearch – third attempt:

The same query in Elasticsearch:
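The timed Elasticsearch query itself isn't reproduced here. As an illustration only, the request below is similar in spirit to the MySQL searches: it matches any of the three words in `name` or `content`. The `fullTextSearchBody()` helper is hypothetical, and the field names are borrowed from the MySQL table.

```php
<?php

/**
 * Hypothetical full-text search body: documents matching any of the
 * words in "name" or "content". "size" => 0 returns only the hit count,
 * similar to the SELECT COUNT(...) queries above.
 */
function fullTextSearchBody(array $words): array
{
    return [
        'size' => 0,
        'query' => [
            'multi_match' => [
                'query' => implode(' ', $words),
                'fields' => ['name', 'content'],
                'operator' => 'or', // any word may match (the default, made explicit)
            ],
        ],
    ];
}

echo json_encode(fullTextSearchBody(['test', 'job', 'php'])), PHP_EOL;
```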

Duration: 0.471 seconds
As we can see, the execution times are similar. So why is Elasticsearch better? As I mentioned before, if the customer only wants full-text search, MySQL or PostgreSQL is enough. But if they want to develop the engine further and make it more sophisticated, we should definitely consider Elasticsearch.
Highlighting is a simple example of an Elasticsearch feature that makes developers’ lives easier and full-text search more useful. This option returns the context in which a searched word appears.
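Highlighting is requested by adding a `highlight` section to the search body. Elasticsearch then returns, next to each hit, short fragments of the matching field with the searched words wrapped in the configured tags. This is only a sketch. The helper name, the `content` field and the fragment settings are assumptions.

```php
<?php

/** Add a highlight section for the "content" field to a search body. */
function withContentHighlight(array $body, int $fragmentSize = 150, int $fragments = 3): array
{
    $body['highlight'] = [
        'pre_tags' => ['<em>'],
        'post_tags' => ['</em>'],
        'fields' => [
            'content' => [
                'fragment_size' => $fragmentSize,    // approximate characters per fragment
                'number_of_fragments' => $fragments, // fragments returned per hit
            ],
        ],
    ];
    return $body;
}

$body = withContentHighlight([
    'query' => ['match' => ['content' => 'test job php']],
]);
// Each hit in the response then carries $hit['highlight']['content'], an array of strings.
echo json_encode($body, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```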

What’s the value of this feature? You practically get it for free. The developer’s effort is minimal, which is a big advantage from a customer’s point of view. Achieving similar functionality on top of a typical relational database would require much more work.
An additional advantage of using Elasticsearch is its scalability. Along with data growth, we can add more Elasticsearch nodes to the system to ensure its stability and efficiency.
To sum up, Elasticsearch is well suited to advanced search in complex data structures. This is especially true if we want precisely ordered results in our application. In the next article, I’d like to outline more features and abilities of Elasticsearch, such as influencing scoring and search results. Any questions? Ask away!