Elasticsearch is a powerful search tool – whether you’re going through candidate profiles or TV show suggestions. Thanks to its speed and wide range of capabilities, it often outperforms standard database searches. A lesser-known but very useful feature of ES is aggregations. With aggregations, we can quickly solve problems that used to take hours, and we get a chance to go deeper into business analysis. In this article, I’d like to explain what aggregations are, describe their types, and finally show 3 tricky use cases.
What is Elasticsearch?
Before we start, a few words about Elasticsearch itself. Elasticsearch is a search engine that provides full-text search, an HTTP web interface for managing data and running queries, and schema-free JSON documents for both query bodies and data. What’s really important is that Elasticsearch is really, really fast. This matters especially when you analyze huge datasets: while standard database engines are still processing complex queries, Elasticsearch may already have the results you need.
What are aggregations?
We already know what Elasticsearch is and that it’s great for searching documents. But what are aggregations, and why should we use them? The dictionary explanation of the term is not very helpful in our context: “a group, body, or mass composed of many distinct parts or individuals”. Let’s try to explain what this means from a business perspective. When we talk about searching, we ask: “Which of these documents fit the query best?” and when we talk about aggregation we can ask: “What can these documents tell me about my business?”. So, aggregations combine analytics and summaries.
There are four types of aggregations. Let’s look at the two most popular ones. The easier ones are metrics: simple count, sum, average, minimum, maximum and so on. Straightforward – but in the right context, extremely powerful. The second type is buckets. A bucket is a collection of documents that meet certain criteria. You can divide a group of people into, for example, men and women, and then split each of those groups into age groups. Aggregations can be nested, so you can then use a metric to calculate the average salary for each of these buckets (aka groups).
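To make the two ideas concrete, here is a small plain-Python sketch of what bucketing followed by a metric computes. It does not call Elasticsearch at all; the sample people, the field names and the age boundaries are made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample documents; field names and values are assumptions.
people = [
    {"sex": "female", "age": 28, "salary": 5200},
    {"sex": "female", "age": 41, "salary": 6100},
    {"sex": "male", "age": 25, "salary": 4800},
    {"sex": "male", "age": 29, "salary": 5200},
    {"sex": "male", "age": 47, "salary": 6400},
]


def age_group(age):
    """Map an age to an illustrative age bucket."""
    if age < 30:
        return "under 30"
    if age < 45:
        return "30-44"
    return "45+"


def average_salary_by_bucket(docs):
    """Bucket by sex, then by age group, then apply the average metric."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[(doc["sex"], age_group(doc["age"]))].append(doc["salary"])
    return {key: mean(salaries) for key, salaries in buckets.items()}


result = average_salary_by_bucket(people)
for (sex, group), average in sorted(result.items()):
    print(sex, group, average)
```

Each key of `result` plays the role of one nested bucket, and the value is the metric calculated for the documents that fell into it.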
Now you should know what aggregations are on a basic level. To give you an even better understanding, I’ll provide 3 use cases. You’ll see how wide the usage can be.
I’ve already briefly mentioned the first use case. Let’s say you run a multinational company and want a summary of earnings broken down by job location, then by sex, and finally by age group. To get it, group employees into buckets (first by job location, then by sex, and lastly by age group) and, for each bucket, calculate the average salary (plus any other metrics you want).
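A request for this breakdown could be sketched roughly as follows. The field names (`job_location`, `sex`, `age`, `salary`) and the age boundaries are assumptions, and the sketch presumes that `job_location` and `sex` are indexed as keyword fields. The code only builds and prints the JSON body that would be sent to the index’s `_search` endpoint; it does not contact a cluster.

```python
import json


def build_salary_request():
    """Build a search body with three nested bucket levels and one metric."""
    return {
        "size": 0,  # we only want aggregation results, not the matching documents
        "aggs": {
            "by_location": {
                "terms": {"field": "job_location"},  # one bucket per location
                "aggs": {
                    "by_sex": {
                        "terms": {"field": "sex"},
                        "aggs": {
                            "by_age_group": {
                                "range": {
                                    "field": "age",
                                    # illustrative boundaries; "to" is exclusive
                                    "ranges": [
                                        {"to": 30},
                                        {"from": 30, "to": 50},
                                        {"from": 50},
                                    ],
                                },
                                "aggs": {
                                    # the metric computed inside every bucket
                                    "average_salary": {"avg": {"field": "salary"}}
                                },
                            }
                        },
                    }
                },
            }
        },
    }


request_body = build_salary_request()
print(json.dumps(request_body, indent=2))
```

Further metrics, such as `min` or `max` on the same field, could be added next to `average_salary` in the innermost `aggs` object.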
The next use case is significant terms. This is a type of bucket aggregation based on uncommonly common terms. Let’s imagine you are the owner of a chocolate bar company and want to know where your products are selling best. It’s not enough to simply count buyers in different cities, because you will obviously sell more in a big city than in a small one. That’s what you’d end up with if you did it wrong (courtesy of xkcd):
Instead, you can use significant terms. In this case, you get the cities where sales are unusually high relative to what you would expect from their size.
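As a rough sketch, such a request could look like the one below. It assumes an index with one document per chocolate bar purchase of any brand, with hypothetical `brand` and `city` keyword fields. The query selects the purchases of your brand, and the `significant_terms` aggregation then looks for cities that are more frequent in that subset than in the index as a whole. As before, the code only builds the request body.

```python
import json


def build_significant_cities_request(brand):
    """Build a body that finds cities over-represented among one brand's sales."""
    return {
        "size": 0,  # aggregation results only
        # Foreground set: purchases of the given brand (field name is an assumption).
        "query": {"term": {"brand": brand}},
        "aggs": {
            # Cities that are unusually common in the foreground set
            # compared with the rest of the index.
            "unusual_cities": {"significant_terms": {"field": "city"}}
        },
    }


request_body = build_significant_cities_request("my-chocolate-bar")
print(json.dumps(request_body, indent=2))
```

A plain `terms` aggregation on `city` would instead rank cities by raw purchase counts, which is exactly the big-city bias described above.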
The last example is about real-time error tracking. Let’s assume that every time an error occurs in your system, you create a new document with some information about it. After each failure, you can then check whether the number of errors of a given type is higher than in the previous time span. If it has indeed gone up, you could automatically alert your administrator.
A sample request could look like this:
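The sketch below is one way to phrase it, under several assumptions: each error document has a hypothetical `error_type` keyword field and a `timestamp` date field, and "time span" means a fixed window of a few minutes. The request buckets errors by type and, inside each type, counts the current and the previous window with a `filters` aggregation. The `sample_response` is hand-written to mimic the shape of a response and is not real output.

```python
import json


def build_error_trend_request(minutes=5):
    """Build a body that counts errors per type in two consecutive time windows."""
    current_start = f"now-{minutes}m"
    previous_start = f"now-{2 * minutes}m"
    return {
        "size": 0,
        # Only look at documents from the two windows being compared.
        "query": {"range": {"timestamp": {"gte": previous_start}}},
        "aggs": {
            "by_error_type": {
                "terms": {"field": "error_type"},
                "aggs": {
                    "windows": {
                        "filters": {
                            "filters": {
                                "current": {
                                    "range": {"timestamp": {"gte": current_start}}
                                },
                                "previous": {
                                    "range": {
                                        "timestamp": {
                                            "gte": previous_start,
                                            "lt": current_start,
                                        }
                                    }
                                },
                            }
                        }
                    }
                },
            }
        },
    }


def rising_error_types(response):
    """Return error types with more errors in the current window than before."""
    rising = []
    for bucket in response["aggregations"]["by_error_type"]["buckets"]:
        windows = bucket["windows"]["buckets"]
        if windows["current"]["doc_count"] > windows["previous"]["doc_count"]:
            rising.append(bucket["key"])
    return rising


# Hand-written example of the response shape, for illustration only.
sample_response = {
    "aggregations": {
        "by_error_type": {
            "buckets": [
                {"key": "timeout", "doc_count": 9, "windows": {"buckets": {
                    "current": {"doc_count": 7}, "previous": {"doc_count": 2}}}},
                {"key": "auth_failed", "doc_count": 4, "windows": {"buckets": {
                    "current": {"doc_count": 1}, "previous": {"doc_count": 3}}}},
            ]
        }
    }
}

print(json.dumps(build_error_trend_request(), indent=2))
print(rising_error_types(sample_response))
```

Any error type returned by `rising_error_types` would be a candidate for alerting the administrator.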
To sum up
I’ve described what Elasticsearch aggregations are, shown their types and given 3 use cases – simple and hopefully clear! I hope this article leaves you with a better understanding of Elasticsearch’s great features and of how it differs from full-text search in a standard SQL database. Its strengths are best put to use on, for example, CV or recruitment platforms, where fast and efficient searching is a necessity.
Here is more information on Elasticsearch and advanced search problems.
Drop us an email at [email protected] and we’ll explain what we’ve done with Elasticsearch so far.