Categories
Software Technology

Elasticsearch: How to Solve Advanced Search Problems

A few months ago, my team and I faced a challenge: to provide an advanced search engine with many simple (and more complicated) criteria, plus a full-text search mechanism. In addition, we knew that our client demanded high efficiency and scalability. I’d like to focus on full-text search and the other advantages of the Elasticsearch solution.

Data, data, data – the amount seems to grow, month after month. We didn’t really have that ‘problem’ a few years ago: web applications didn’t have to process that much data at once, so developers didn’t have to pay much attention to their applications’ efficiency. Then a vast majority of businesses moved their solutions to the web and were suddenly faced with processing large amounts of data. We – the developers – needed to take a different approach.
Most of the problems we face during development can be solved with one of the mainstream databases – MySQL, PostgreSQL or Oracle – by implementing cache systems, or by using non-relational databases like MongoDB.

What if this isn’t enough? And our customer needs something more powerful, a system with a highly efficient data search engine – and scalability?
This is where Elasticsearch comes in. It’s a distributed, real-time search engine built on Apache Lucene.
It gives us:

  • easy full text search
  • advanced search queries
  • many helpful built-in solutions
  • manageable data result scoring
  • shorter development time
  • scalability

Communication with Elasticsearch is based on a REST API, and languages like Java or PHP provide client libraries for it. This makes life easier for developers.

CHALLENGE: Advanced data search engine with full text search field and data scoring functionality

DATA:

  • MySQL version: 5.6.28
  • Table: 169,123 records
  • Average length of searchable field: 2,550 characters
  • Maximum length of searchable field: 686,141 characters

Let’s say a user would like to find all rows containing words like: “job”, “php” and “test”.

MySQL – first attempt (typical simple query on long text field):

SELECT count(ap.id) FROM database.application_multimedia ap WHERE content LIKE '%test%' OR content LIKE '%job%' OR content LIKE '%php%';

Duration time: 9.140 seconds
As we can see, the execution time of even a very simple query is far too high for a search engine. Our client wouldn’t accept that solution.

MySQL – second attempt (query based on full text indexed field):

Step 1:
We need to add a full-text index to the searchable fields:

ALTER TABLE `database`.`application_multimedia` ADD FULLTEXT INDEX `FTS` (`name` ASC, `content` ASC);

Step 2:
Execute query:

SELECT COUNT(ap.id) FROM database.application_multimedia ap WHERE MATCH (name, content) AGAINST('+test +job +php' IN NATURAL LANGUAGE MODE)

Duration time: 0.45 seconds
In this case, the query’s execution time is low. If our search engine were based only on full-text search, MySQL would be enough. However, if we’d like to apply more advanced search conditions (and sort data based on those conditions), we should consider Elasticsearch.

Elasticsearch – third attempt:

The same query in Elasticsearch:
Duration time: 0.471 seconds
As we can see, the queries’ execution times are similar. So why is Elasticsearch better? As I wrote before, if the customer wants full-text search only, MySQL or PostgreSQL is enough. But if they want to grow that engine and make it more sophisticated, we should definitely consider Elasticsearch.
A simple example of an Elasticsearch feature that makes developers’ lives easier and full-text search more useful is highlighting. This option returns the context of a searched word.
What’s the value of this feature? You’re practically getting something for free – a developer’s effort is minimal, which is a big advantage from a customer’s point of view. Achieving similar functionality on top of a common relational database would demand a much greater workload.
An additional advantage of using Elasticsearch is its scalability. Along with data growth, we can add more Elasticsearch nodes to the system to ensure its stability and efficiency.
To sum up, Elasticsearch is a solution perfectly matched to advanced search in complex data structures, especially if we want to get precisely ordered results in our application. In the next article, I’d like to outline more features of Elasticsearch, such as influencing scoring and search results. Any questions? Ask away!


Geecon 2016: Java 9, Spring and the Nyan Cat

Our goal for GeeCON 2016 was to broaden our knowledge about topics we encounter on a daily basis at work. We chose talks concerning Java 9 (and 8), microservices, reactive programming and Docker. Here are a few words on some of the most interesting and inspiring ones.

Tomek

I’ve heard that Sven Peters is a great speaker so without hesitation I chose his Rise of the Machines – Automate Your Development. What a talk it was! Passionate delivery and beautiful slides (do check them out) combined with inspiring content gave me lots of ideas to improve our daily development processes at Espeo. The general idea is to automate all mundane and repeatable processes that do not require human interaction and creativity. The key is to look further than the usual CI/CD tools. I thought that at Espeo we were pretty well automated already but now I know we still have room for improvement. Which is great and I can’t wait to implement a bot or two.

We’ve had Java 8 around for some time now and the direction is ever-changing towards functional programming. But do we really know how to do it and what price we may pay? That’s what Daniel Sawano and Daniel Deogun wanted to explain in their talk “Beyond lambdas – the Aftermath”. It was a code-only presentation and that’s what really matters to us, programmers. They gave us many examples of bad code and showed ways of how to refactor it to remove code smells and side effects. They outlined where the functional style might introduce hidden complexity to our code (coders love those one-liners), generate unnecessary function calls etc. Have you ever analyzed the bytecode of a lambda? Well, at the presentation we did. We learned how to avoid stateful lambdas and how it affects the runtime performance. All in all, a very rich-in-content and detailed talk.

Iga

This year’s Geecon was a little less interesting than the previous one, but I still found some interesting talks. The best one was about Java 9 and its modularity. The runners-up were Josh Long’s charismatic talks about Spring.

In just 50 minutes, Josh built a brand new application with RESTful services, a pretty GUI and well-designed architecture – all using Spring Boot, a convention-over-configuration framework. In another amazing 50 minutes he talked about Spring Cloud – tools that let developers quickly build some of the most common patterns in distributed systems: configuration management, service discovery, circuit breakers, intelligent routing, micro-proxies, a control bus, one-time tokens, global locks, leader election, distributed sessions and cluster state. All of this was presented in the context of microservices – a concept highly recommended and praised, but one which can lead to architectural complexity. He also talked about how organizations like Ticketmaster, Alibaba and Netflix cope with this complexity using Spring Boot and Spring Cloud.

What was especially impressive was his fluency in writing good code in an extremely short time. Everyone was amazed at how quickly and easily he created a working application using those tools. He is also a very talented speaker – the audience laughed many times at his hilarious jokes and tricks, like changing the Spring logo to the Nyan Cat during the build process. If all speakers talked about software in such a funny and interesting way – great tools, great tips and great jokes – Geecon would be the best conference in the entire world.

Michał

“Java 9 Modularity in Action”. The Java world has been trying to tackle modularity issues for a long time through initiatives such as OSGi. Yet they were never widely adopted because of the effort needed to actually understand and use those tools. Project Jigsaw, the highlight of Java 9, promises to deal with the problem of modularity at its roots. It proposes a revolution: getting rid of the classpath and introducing a new concept of highly encapsulated modules (so now we will have a… modulepath). These modules will (or rather should) expose only interfaces to the outer world, and no implementations. Apart from enforcing good modular design, which should be a goal in itself, this also solves some annoying problems like clashes between different versions of the same class on the classpath. Of course, to provide backward compatibility, tools will be provided to make the transition to the world of Jigsaw less painful, and modules for legacy code will be generated automatically. I was surprised to see a live coding session during this talk, and to see the modularity concept working, even though it’s still some time before Java 9 is released.

Nobody expects the Spanish Inquisition, and nobody from the Geecon team expected such an interest in the “Java and Docker, a Good Idea” talk by Christopher Batey. The room was packed to the brim with Docker-hungry programmers. Despite the name of the talk, the question wasn’t whether to use Docker with the JVM or not; it focused on the not-so-obvious traps to avoid when running the JVM inside Docker containers, especially under high load. For instance, it covered operating near memory limits and page swapping.
There were many more interesting talks on e.g. event sourcing, reactive programming in general and in detail (RxJava), as well as a bird’s-eye view of microservices and their gory implementation details. Plus some interesting concepts like self-healing systems and many performance-related talks. Want to know how storage works? How do traditional HDDs work and how do they differ from SSDs? What’s coordinated omission in performance testing and why does it matter? Not to mention Big Data topics (we wrote about those too!). All that and much more at GeeCON. Now let’s put theory into practice and see you there next year!


Road To Angular 2 – Reactive Programming (RxJS)

Welcome to the first stop on the Road To Angular 2! The new version of Angular isn’t as simple as the previous one. It introduces a lot of new, hot and trendy stuff, so you have to prepare yourself before reaching the final destination!

In this series we’ll introduce you to:

  • Reactive Programming
  • ES6 features
  • TypeScript
  • Main concepts of Angular 2 apps
  • Integrating Angular 2 with Symfony backend

Today I’m going to tell you about reactive programming using RxJS. Are you ready? Let’s start!

RxJS – why? It isn’t Angular 2, right?

Angular 1 is famous for its simplicity. It includes a lot of necessary features – we don’t have to use jQuery for XHR communication, for example – so it doesn’t force us to include external libraries. The Angular team has changed its approach: Microsoft’s TypeScript and RxJS are included in Angular 2. As you can see, we have to get to know these technologies before learning our favourite framework.
Reactive programming is a subject important not only for JavaScript Developers, but also for every developer who wants to include asynchronous events into their apps. Rx isn’t only for JavaScript. There are versions for .Net, Java, Python and many others. So, I’d like to encourage you to read this article even if you don’t know JavaScript.
Let’s dive in…

Callback vs Promise vs Observable

JavaScript is a language which offers a few approaches that support handling asynchronous events, so let me just remind you what the difference is between those methods.
If you feel comfortable with callbacks and promises you can simply skip this part.
A callback is technically a function which we pass as a parameter into another function. What’s more important is that it isn’t executed immediately, but when the parent function decides to invoke it – usually once its own work is done. This gives us the freedom to manage asynchronous events as we want.
https://jsfiddle.net/q288pfb1/7/
As you can see in the example, we get data from the server using Ajax and pass a callback as a parameter. In the callback, we retrieve the response and pass the objects to another function, which inserts some items into the list. Easy, right? Is it pretty? Nope!
Have you ever heard of callback hell? It looks like this:
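A minimal sketch of the pattern (the helper functions and data are invented; they invoke their callbacks synchronously just to keep the sketch self-contained, where real code would use Ajax or timers):

```javascript
// Invented helpers - each takes a callback instead of returning a value.
function getUser(id, callback) { callback({ id: id, name: 'Anna' }); }
function getOrders(user, callback) { callback([{ id: 1, total: 42 }]); }
function getInvoice(order, callback) { callback({ orderId: order.id, amount: order.total }); }

var result;
// The "pyramid of doom": every dependent step nests one level deeper.
getUser(7, function (user) {
  getOrders(user, function (orders) {
    getInvoice(orders[0], function (invoice) {
      result = invoice.amount; // buried three levels deep
    });
  });
});
```

Each extra dependent call pushes the code one level further to the right – which is exactly why it stops being readable.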

It just isn’t readable.
I promise there are better places to live than Callback Hell – one of them, for sure, is Promise Heaven!
Of course we can deal with callback hell, but that isn’t the main topic of this article. You can read more here: http://callbackhell.com/
Most readability problems can be solved with promises. ES6 introduces Promise as a native JavaScript object. Before that, we had to rely on external libraries, like jQuery, which offered similar objects.
A brief reminder. Promise has three states:

  • pending – while initializing the object – “Hello, I really want to do something, I promise, but it can take some time, just wait”
  • fulfilled – after the function has completed – “Hi! I finished my work and everything is alright, mate!”
  • rejected – meaning – “Something went wrong, sorry”

The same code, but written using Promise, looks like this:
https://jsfiddle.net/y9czkyoc/4/
To put it simply, a promise is an object which can be rejected or resolved. After it’s fulfilled, we can process the returned data using the then() method. Under the hood, a promise is a set of callbacks. If you’re interested in implementing one, I encourage you to read this article: https://www.promisejs.org/implementing/
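The three states can be watched in a short, self-contained sketch (the data and messages are invented):

```javascript
var trace = [];

// pending -> fulfilled
var fulfilled = new Promise(function (resolve, reject) {
  trace.push('pending');            // the executor runs immediately
  resolve(['job', 'php', 'test']);  // the state becomes "fulfilled"
});

fulfilled.then(function (words) {
  // then() callbacks run asynchronously, after the current code finishes
  trace.push('fulfilled with ' + words.length + ' words');
});

// pending -> rejected
var rejected = Promise.reject(new Error('Something went wrong, sorry'));
rejected.catch(function (err) {
  trace.push('rejected: ' + err.message);
});

trace.push('synchronous code finished');
// at this point trace is ['pending', 'synchronous code finished'] -
// the then()/catch() callbacks haven't fired yet
```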
As we’re talking about Angular, promises are ubiquitous there. When we want to get data using Ajax, we use the $http service, which returns… a promise, represented by the $q object! Our Angular services often look like this:
http://plnkr.co/edit/rUmN1rwPA3795Hytw6R3?p=preview
When I sat down to Angular 2 without reading the documentation, I was surprised that my then() function didn’t work with the object returned by the http service. What’s more, http.get() alone didn’t even make a request. Then I realized that the returned object’s type is Observable.
Quoting the documentation: “The return value may surprise us. Many of us would expect a promise <that’s the point>. We’d expect to chain a call to then() […] Instead we’re calling a map() method. Clearly, this is not a promise.”
…and so it began… Angular 2 uses a special and trendy reactive way to solve asynchronous methods, which looks like this:
return this.http.get(this.technologiesUrl)
               .map(res => res.json().data)
               .catch(this.handleError);
Before I explain these lines of code, I want to describe what reactive programming means.
Remember React.js !== RxJS

Why does Angular 2 use RxJS? Are promises out of fashion?

A few years ago, web applications weren’t as interactive as they are now – often the only asynchronous action was tied to a form’s submit button. Now the situation is different: we often want many asynchronous actions running at the same time. Reactive programming is the answer to this problem, but it requires us to change our thinking: in reactive programming, everything is an asynchronous data stream. What does this mean?
It’s likely all of us have used an asynchronous data stream without even realizing it. Are you familiar with code like this?
object.addEventListener("click", function(){
    // Do something after clicking
});
Yes? You use it often, and an asynchronous data stream is just like such actions, but with additional features. We can say that a stream in reactive programming is like an array containing values returned by a function over a period of time. Because it behaves like an array, it lets us use the advantages of functional programming. Streams are often illustrated like this:

On top we have a timeline (representing the asynchronous data stream) on which values are generated by events. Below it there is a block representing the functions which, for example, map/filter this stream and return a new stream with the desirable data.
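The “array over time” intuition can be sketched with plain array methods (the click events are invented for illustration):

```javascript
// A recorded "stream" of click events, each with a timestamp in milliseconds.
var clicks = [
  { t: 100, target: 'save' },
  { t: 130, target: 'save' },
  { t: 900, target: 'cancel' },
  { t: 950, target: 'save' }
];

// filter/map transform the stream into a new stream, just like with arrays:
var saveTimes = clicks
  .filter(function (e) { return e.target === 'save'; })
  .map(function (e) { return e.t; });
// saveTimes is [100, 130, 950]
```

RxJS applies the same operators to values that arrive over time, instead of values already sitting in memory.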

There are a few definitions you have to know before analysing the code:
Observer pattern – technically, there are two kinds of objects: an Observable, which sends signals/notifications, and Observers, which listen and can react to those signals.
In RxJS, subscribe is the method an observer uses to listen. The subscribe function takes three callbacks as parameters. The first, onNext, is called whenever the Observable emits a value. The second, onError, is called when something goes wrong (for example, a 500 status from the server). The last, onCompleted, is called when the stream finishes its work.
A lot of people claim that Observables are “lazy”, but what does that actually mean? As I said at the beginning, I was surprised when I wrote http.get() in Angular 2 and it didn’t make a request to the server. This is the difference between Observables and Promises.
I will illustrate it using a simple analogy:
Observables are like guys who don’t “talk” when nobody wants to listen. They’re ready to talk as long as somebody is interested in listening. What’s more, they’re smart – they stop talking when listeners signal that they don’t want to listen anymore.
Promises think differently, they have only one thing to say and they have to hurry because their lives are short, so they start talking right after they’re born… ‘till they die. We cannot stop them.
Some code illustrating this analogy:
https://jsfiddle.net/wwLkvzbj/4/
Analyse this code and try to understand it. As you can see, the promise started working the moment it was declared, but the observable only started working when the first subscriber/listener appeared. Furthermore, the observable’s values change over time – different listeners receive different values. A promise, once resolved, cannot be changed or cancelled; an observable can.
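A hand-rolled sketch of the same idea (this tiny Observable is not the real RxJS class – it only mimics the laziness and the three subscribe callbacks described above):

```javascript
var log = [];

// A promise "talks" the moment it's created - its body runs immediately:
var promise = new Promise(function (resolve) {
  log.push('promise body ran');
  resolve(1);
});

// A minimal observable: the body is only stored...
function Observable(body) { this.body = body; }
Observable.prototype.subscribe = function (onNext, onError, onCompleted) {
  this.body(onNext, onError, onCompleted); // ...and executed per subscriber
};

var observable = new Observable(function (onNext, onError, onCompleted) {
  log.push('observable body ran');
  onNext(1);
  onNext(2);
  onCompleted();
});

// At this point log is ['promise body ran'] - the observable stayed silent.
var values = [];
observable.subscribe(
  function (v) { values.push(v); },      // onNext
  function (e) { log.push('error'); },   // onError
  function () { log.push('completed'); } // onCompleted
);
// Now log is ['promise body ran', 'observable body ran', 'completed']
```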
If you aren’t convinced by Observables, you can still use promises in Angular 2. Basically, you convert the observable into a promise – but does that make sense? Maybe in some situations. This code shows how you can do it:
return this.http.get(this.technologiesUrl)
                .toPromise() // This line converts the Observable into a promise
                .then(res => res.json().data, this.handleError)
                .then(data => { console.log(data); return data; });

Should we use RxJS in Angular only for server communication?

There are more cases where we can take advantage of reactive programming. It’s a good idea when we have a lot of asynchronous actions over which we want full control. Imagine you’re writing a web IDE. You want to add autocompletion and syntax hinting. It isn’t a good idea to make a request to the server every time a keydown event fires – the requests will kill your server and slow down your application. People would be punished for typing fast, and your startup would be ruined. What’s more, you plan to give clients useful shortcuts – combinations of mouse and keyboard events (sometimes connected with a request to the server). Doing it the traditional way, you’ll have to:

  • add some variables to control state
  • manually take care of timeouts and clear them
  • spend time managing the shortcut code properly

Reactive programming can help you solve these problems. Try to think how you would do it with and without reactive programming. It’s a good exercise.
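To make the exercise concrete, here’s a synchronous sketch of the debouncing idea over a recorded list of keydown timestamps (the data and threshold are invented; real RxJS would express this with a debounce-style operator on a live event stream):

```javascript
// Recorded keydown timestamps in milliseconds - invented sample data.
var keydowns = [100, 150, 180, 800, 1500, 1530];
var QUIET_MS = 300; // only query the server after 300 ms of typing silence

// Keep an event only if nothing follows it within the quiet window
// (the last event always triggers a request).
function debounced(events, quiet) {
  return events.filter(function (t, i) {
    return i === events.length - 1 || events[i + 1] - t >= quiet;
  });
}

var requests = debounced(keydowns, QUIET_MS);
// requests is [180, 800, 1530] - three server calls instead of six
```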

Conclusion

Reactive programming is difficult and it requires us to change the way we think about solving problems. The topic is so wide that I only managed to scratch the surface of RxJS in basic Angular 2 usage. I think the effort to understand reactive programming will be rewarded in the future.
Some recommended reading:

See you soon at the next stop, ES6 and TypeScript!


Java integration tests with Spring

Recently, we had to write integration tests, because a database connection was required in almost all aspects of our business logic. Writing only unit tests without a DB connection didn’t make much sense – we could only check whether the API returned a proper error message in cases like missing records or unsuccessful authentication. Therefore, it was necessary to test against the database.

It’s important to test the DB connection without changing the structure of the production database – but how can this be done? The answer is quite simple – use two databases – the production one and the test one.

Our project is written in Java using the Spring 4 platform.
First, I created a test database from a dump of the original one.
Then I prepared scripts to create, recreate and drop this database as needed, and added them to the Makefile.
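The scripts themselves weren’t included in the post; a Makefile along these lines would do the job (the database name, user and dump file are assumptions matching the properties below):

```makefile
# Hypothetical targets for managing the PostgreSQL test database.
create-test-db:
	createdb -U test_username test_db
	psql -U test_username -d test_db -f test_db_dump.sql

drop-test-db:
	dropdb -U test_username --if-exists test_db

recreate-test-db: drop-test-db create-test-db
```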

Now, how can you force the application to use the regular database in all cases except tests, where it should use the test database? You just have to swap the database.properties file.
My original database.properties looked like this:

datasource.driver-class-name=org.postgresql.Driver
datasource.url=jdbc:postgresql://127.0.0.1:5432/my_database
datasource.username=my_username
datasource.password=my_pass

I added a database-test.properties file:

datasource.driver-class-name=org.postgresql.Driver
datasource.url=jdbc:postgresql://127.0.0.1:5432/test_db
datasource.username=test_username
datasource.password=test_pass

We can use different properties (not only for the database, but for everything else as well) thanks to a Spring annotation from the spring-test dependency.

The test class should have three annotations:
@RunWith(SpringJUnit4ClassRunner.class) – indicates that the class should use Spring’s JUnit facilities.
@ContextConfiguration(classes = Application.class) – indicates the ApplicationContext.
(The third, @TestPropertySource, is described below.)
My Application.class looks like this:

 @Configuration
 @PropertySources({
    @PropertySource("classpath:database.properties"),
    @PropertySource("classpath:application.properties")
 })
 @ComponentScan
 @EnableAutoConfiguration
 public class Application extends SpringBootServletInitializer {
    @Override
    protected SpringApplicationBuilder configure(SpringApplicationBuilder application) {
       return application.sources(Application.class);
    }
}

To override the original properties, you should just use the @TestPropertySource annotation. It’s a class-level annotation configuring the locations of property files and inlined properties to be added to the set of PropertySources in the Environment. Test property sources have a higher priority than those added declaratively or programmatically with the @PropertySource annotation, which makes it easy for individual test classes to use different property sources than the rest of the application.
So, the entire test looked more or less like this:

 @RunWith(SpringJUnit4ClassRunner.class)
 @TestPropertySource("classpath:database-test.properties")
 @ContextConfiguration(classes = Application.class)
 public class SomethingIntegrationTest {
    @Autowired
    private SomeRepo repo;
    @Test
    public void should_return_proper_something_list_size() {
       List allSomethings = repo.getAllSomethings();
       assertThat("Somethings list should return 7 records", allSomethings.size(), is(7));
    }
 }

I added this dependency to pom.xml file:

<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-test</artifactId>
</dependency>

We’re skilled in Java programming, but we’re also experts in creating software in many different languages… if you’re interested, take a look at our services.
author: Iga Stępniak