A few months ago, my team and I faced a challenge: to build an advanced search engine supporting many simple (and some more complicated) criteria along with full-text search. In addition, we knew that our client demanded high efficiency and scalability. In this post, I’d like to focus on full-text search and other advantages of Elasticsearch.
Data, data, data: the amount seems to grow month after month. A few years ago we didn’t really have that ‘problem’. Web applications didn’t have to process that much data at once, and developers could afford to pay less attention to their application’s efficiency. Then the vast majority of businesses moved their solutions to the web and were suddenly faced with processing large amounts of data. We, the developers, needed to take a different approach.
Most of the problems we face during development can be solved with one of the mainstream relational databases such as MySQL, PostgreSQL or Oracle, by adding a caching layer, or by using a non-relational database like MongoDB.
What if this isn’t enough, and our customer needs something more powerful: a system with a highly efficient, scalable search engine?
This is where Elasticsearch comes in. It’s a distributed, near real-time search engine built on Apache Lucene.
It gives us:
- easy full text search
- advanced search queries
- many helpful built-in solutions
- manageable data result scoring
- shorter development time
- scalability
Communication with Elasticsearch goes through a REST API. Client libraries exist for languages such as Java and PHP, which makes life easier for developers.
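To make the REST-based communication a bit more concrete, here is a minimal sketch in Python that only builds (and does not send) a search request using the standard library. The node address `http://localhost:9200`, the index name `application_multimedia` and the helper `build_search_request` are assumptions made for illustration; they are not taken from our project setup.

```python
import json
import urllib.request


def build_search_request(base_url, index, query):
    """Build (but do not send) an HTTP request for the index's _search endpoint."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_search_request(
        "http://localhost:9200",        # assumed address of a local node
        "application_multimedia",       # assumed index name
        {"match": {"content": "php"}},  # simple full-text match on one field
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # Sending it to a running node would be: urllib.request.urlopen(request)
```

In practice a dedicated client library would usually replace this hand-built request, but the sketch shows that underneath it is plain JSON over HTTP.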
CHALLENGE: Advanced data search engine with full text search field and data scoring functionality
DATA:
- MySQL version: 5.6.28
- Table: 169123 records
- Average length of searchable field: 2550 characters
- Maximum length of searchable field: 686141 characters
Let’s say a user would like to find all rows containing words like: “job”, “php” and “test”.
MySQL – first attempt (typical simple query on long text field):
SELECT COUNT(ap.id) FROM `database`.application_multimedia ap WHERE content LIKE '%test%' OR content LIKE '%job%' OR content LIKE '%php%';
Duration: 9.140 seconds
As we can see, the execution time of a very simple query is far too high for a search engine. The leading wildcard in LIKE '%...%' prevents MySQL from using an ordinary index, so every row has to be scanned. Our client wouldn’t accept that solution.
MySQL – second attempt (query based on full text indexed field):
Step 1:
We need to add a full-text index covering the searched fields:
ALTER TABLE `database`.`application_multimedia` ADD FULLTEXT INDEX `FTS` (`name` ASC, `content` ASC);
Step 2:
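Execute the query. In natural language mode a row matches if it contains any of the listed words, which mirrors the OR conditions of the first attempt (operators such as + only have meaning in BOOLEAN MODE):
SELECT COUNT(ap.id) FROM `database`.application_multimedia ap WHERE MATCH (name, content) AGAINST ('test job php' IN NATURAL LANGUAGE MODE);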
Duration: 0.45 seconds
In this case, the query’s execution time is low. If our search engine were based only on full-text search, MySQL would be enough. However, if we’d like to apply more advanced search conditions (and sort the data based on those conditions), then we should consider Elasticsearch.
Elasticsearch – third attempt:
The same query in Elasticsearch:
Duration: 0.471 seconds
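As a rough illustration of what a comparable Elasticsearch request could look like, the sketch below builds a request body for the `_count` endpoint using a `multi_match` query over the `name` and `content` fields. It is an assumption about the shape of such a query, not the exact request that was measured; the helper `build_count_query` and the index name in the comment are hypothetical.

```python
import json


def build_count_query(words, fields):
    """Return a _count request body matching documents that contain any of the words."""
    return {
        "query": {
            "multi_match": {
                "query": " ".join(words),
                "fields": list(fields),
                "operator": "or",  # the default; spelled out to mirror the SQL OR conditions
            }
        }
    }


if __name__ == "__main__":
    body = build_count_query(["test", "job", "php"], ["name", "content"])
    # Would be sent as the JSON body of: POST /application_multimedia/_count
    # (the index name is an assumption)
    print(json.dumps(body, indent=2))
```

Switching `"operator"` to `"and"` would require all three words to appear, which is the behaviour the `+` prefix expresses in MySQL's boolean mode.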
As we can see, the execution times are similar. So why is Elasticsearch better? As I wrote before, if the customer wants full-text search only, MySQL or PostgreSQL is enough. But if they want to extend that engine and make it more sophisticated, we should definitely consider Elasticsearch.
A simple example of an Elasticsearch feature that makes developers’ lives easier and full-text search more useful is highlighting. This option returns the matched word together with its surrounding context.
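To give an idea of what "more complicated" can mean, here is a hedged sketch of a `bool` query that combines full-text matching with an exact filter and gives matches in `name` twice the weight of matches in `content`. The `status` and `created_at` fields and the helper `build_advanced_query` are hypothetical and used only for illustration; only `name` and `content` come from the table described above.

```python
import json


def build_advanced_query(text, status):
    """Combine scored full-text matching with a non-scoring exact filter."""
    return {
        "query": {
            "bool": {
                # "must" clauses contribute to the relevance score
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            # "^2" boosts matches in name over matches in content
                            "fields": ["name^2", "content"],
                        }
                    }
                ],
                # "filter" clauses narrow the results without affecting the score
                "filter": [
                    {"term": {"status": status}}  # hypothetical field
                ],
            }
        },
        # best matches first; hypothetical date field as a tie-breaker
        "sort": ["_score", {"created_at": "desc"}],
    }


if __name__ == "__main__":
    print(json.dumps(build_advanced_query("php job", "published"), indent=2))
```

Expressing the same mix of weighted text matching, filtering and ordering in plain SQL is possible, but it tends to take considerably more hand-written logic.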
Example:
Result:
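A minimal sketch of how highlighting could be requested and read back, assuming a search on the `content` field: the request adds a `highlight` section next to the query, and each hit in the response then carries a `highlight` entry with text fragments in which the matched words are wrapped in `<em>` tags by default. The helper names `build_highlight_query` and `extract_highlights` are hypothetical.

```python
import json


def build_highlight_query(field, text):
    """Return a search body that matches `text` in `field` and asks for highlights."""
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {}}},  # empty object = default highlighter settings
    }


def extract_highlights(response, field):
    """Collect the highlighted fragments for `field` from a search response dict."""
    fragments = []
    for hit in response.get("hits", {}).get("hits", []):
        # hits without a highlight entry for this field are simply skipped
        fragments.extend(hit.get("highlight", {}).get(field, []))
    return fragments


if __name__ == "__main__":
    print(json.dumps(build_highlight_query("content", "php"), indent=2))
```

The fragments returned by `extract_highlights` can be rendered directly in a results list, so the user sees why each document matched.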
What’s the value of the feature shown above? You’re practically getting something for free. The developer’s effort is minimal, which is a big advantage from the customer’s point of view. Achieving similar functionality on top of a typical relational database would require much more work.
An additional advantage of using Elasticsearch is its scalability. Along with data growth, we can add more Elasticsearch nodes to the system to ensure its stability and efficiency.
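Scaling out relies on an index being split into shards that Elasticsearch can spread across nodes. As a hedged sketch, the body below could be sent when creating an index; the concrete numbers and the helper `build_index_settings` are assumptions for illustration, not a recommendation for any particular data set.

```python
import json


def build_index_settings(shards, replicas):
    """Return an index-creation body with the given shard and replica counts."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,      # primary shards, distributed across nodes
            "number_of_replicas": replicas,  # copies of each primary shard
        }
    }


if __name__ == "__main__":
    # Would be sent as the JSON body of: PUT /application_multimedia
    # (the index name is an assumption)
    print(json.dumps(build_index_settings(3, 1), indent=2))
```

The number of primary shards is fixed when the index is created, while the number of replicas can be changed later, so it is worth thinking about expected growth up front.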
To sum up, Elasticsearch is well suited to advanced search over complex data structures. This is especially true if we want precisely ordered results in our application. In the next article, I would like to outline more features and abilities of Elasticsearch, such as influencing scoring and search results. Any questions? Ask away!