Big Data technologies at Confitura conference

Confitura fascinates me every year. So many great talks, and this year was definitely the strongest one so far, especially in terms of Big Data technologies. I always thought it was the paid conferences that attracted the top speakers. I was wrong. Confitura has shown that a free entrance can guarantee the same level of quality as a paid one. In this case, I would say the level was even higher. Why? Just take a look at these presentations.

Sławomir Sobótka: C4 – a light approach to architecture documentation

I was pretty excited to attend to this talk, and I wasn’t disappointed. It seemed like he was describing me personally. He pointed out how many mistakes every developer makes. Thanks to this presentation, I realized how important it is to include a short description of why a given technology stack has been chosen into every project.
For my current project, I’ve chosen Cassandra as a database, and Apache Spark as a tool to filter huge amounts of data. In CONTRIBUTION.md, I put a long description of why Cassandra has been chosen. Since it doesn’t have JOIN and GROUP BY it might be really hard to understand how it works properly. Thanks to CONTRIBUTION.md, each new developer now knows how to use it. Additionally, it’s good to show some use cases, your thoughts why you’ve chosen this particular technology stack and not another. It could prevent many mistakes or inaccuracies in the future.
The presentation is available here [PL] https://youtu.be/T12Fdqf6ReQ?t=7h33m10s

Maciej Próchniak: Streams, flows and storms – How not to drown with your data

Such a nice guy to listen to. After his presentation, I realized how much better Kafka is compared to other message brokers. I can’t say much about Apache Flink and Kafka, because I’ve only used Spark Streaming before, but check out the video I attached below. The presentation is very professional, and I really wanted to mention it.
Presentation available here [PL] https://youtu.be/RFstLZc_2y8?t=1h34m34s [EN] https://www.youtube.com/watch?v=-L_Rc6ElqJ0

Andrzej Ludwikowski: Cassandra – Lessons learned

This presentation confirms my opinion that we chose the right database for one of our projects. It’s based on time series, and it works in connection with Kafka and Apache Spark. As I wrote before, this database has no JOIN and GROUP BY and Andrzej confirmed my belief that the way we store data in this database and maintain data consistency is the right way indeed. Data duplication in Cassandra is ok, you don’t need to care about redundancy. Nowadays, SSD disks are relatively cheaper than RAM, so whether you have 100GB of data or 300GB – it costs you nothing. When you have 1GB of data, and no redundancy, you must do joins – it costs money. You add more RAM. RAM is much more expensive than SSD disks.
This is the clue. The video is not available yet, but hope it will be uploaded soon. Such a good presentation.

Grzegorz Piwowarek: Javaslang – Functional Java the right way

Each Java programmer should definitely start using http://www.javaslang.io. As a Scala programmer, I was really surprised that Java finally has the same options as Scala has, available though this library. For example, Optionals with the Guards:
Match(optional).of(
   Case(Optional($(v -> v != null)), "defined"),
   Case(Optional($(v -> v == null)), "empty")
);
By using it with Try you can, for example, handle each error using if statement in a nice, functional way.
A result = Try.of(this::bunchOfWork)
   .recover(x -> Match(x).of(
       Case(instanceOf(Exception_1.class), ...),
       Case(instanceOf(Exception_2.class), ...),
       Case(instanceOf(Exception_n.class), ...)
   ))
   .getOrElse(other);
The video is not yet available either.
Thanks to Grzegorz, I found a really cool library. This is the thing about every IT conference. Sometimes you need just a couple of minutes to find a new way to solve an issue. Other times, you find a new library, or you meet a great new speaker. And as Confitura has shown – you don’t need to pay for this knowledge. I’ll be on the lookout for Big Data technologies news (and you can read a bit more about Big Data from me here). I apologise for not mentioning anyone else, but it’s impossible to be on each session if there are 5 concurrent ones!