
Scalar: Highlights From One of Europe’s Top Scala Conferences

On Saturday I attended one of the best Scala conferences in this part of the world: Scalar. It was packed with great talks, and I learned a lot. Before the conference I actually thought I was good at Scala programming; now I realize there is a boatload of things I still need to learn. Here’s what motivated me the most.

I was really impressed by three talks:

  • SWAVE – A FRESH REACTIVE STREAMS IMPLEMENTATION by Mathias Doenitz
  • THE EFF MONAD, ONE MONAD TO RULE THEM ALL by Eric Torreborre
  • COOL TOOLZ IN THE SCALAZ AND CATS TOOLBOXES by Jan Pustelnik

I’d say Shapeless, Scalaz and Cats are pretty hard to grasp, so I’m super glad I had the chance to hear a thing or two about them at Scalar. They’ve felt a little more approachable since.

However, the top spot on my podium goes to IOT, TIMESERIES AND PREDICTION WITH ANDROID, CASSANDRA AND SPARK by Amira Lakhal. IMO it was the best talk at Scalar, and one of the best I’ve ever attended: simple, professional, funny. Since I’m into Spark and Cassandra, I’d like to write a few things about this presentation. In 30 minutes, Amira showed us how to handle events sent by a mobile phone in real time in Spark and run a cool analysis on them. In this case, she showed how to tell whether you’re sitting, standing, walking or jogging by tracking your phone’s position.

How did she do it?

Here you can find an Android app that sends your x, y, z position in real time to the Cassandra database.
In the background, a library calculates the differences between those x, y, z points and produces a prediction of the current state (sitting, standing, walking or jogging). CassandraReceiver queries Cassandra every 2 seconds using the methods in CassandraQueriesUtils and passes the results to the ML model. The code is well written and easy to follow, even for a beginner in Spark and machine learning.
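To make the "differences between points" idea concrete, here is a minimal sketch in plain Scala with no Spark involved. The names Sample and MotionFeatures are made up for illustration, and the feature itself (mean absolute change per axis between consecutive samples) is an assumption, not necessarily what the library from the talk computes.

```scala
// Hypothetical types for illustration only; not taken from the talk's code.
final case class Sample(x: Double, y: Double, z: Double)

object MotionFeatures {
  /** Mean absolute change per axis between consecutive samples.
    * Returns zeros when there are fewer than two samples.
    */
  def meanAbsDelta(samples: Seq[Sample]): (Double, Double, Double) =
    if (samples.size < 2) (0.0, 0.0, 0.0)
    else {
      // Pair every sample with its successor and take per-axis differences.
      val deltas = samples.zip(samples.tail).map { case (a, b) =>
        (math.abs(b.x - a.x), math.abs(b.y - a.y), math.abs(b.z - a.z))
      }
      val n = deltas.size.toDouble
      (deltas.map(_._1).sum / n, deltas.map(_._2).sum / n, deltas.map(_._3).sum / n)
    }
}
```

Intuitively, a phone lying on a desk would give values close to zero, while jogging would give large ones; a classifier can then learn the thresholds in between.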

But I would change one thing (and only one thing). This method:


def getUsers(cassandraRowsRDD: CassandraRDD[CassandraRow]): Array[String] = {
  cassandraRowsRDD.select("user_id")
    .distinct.map(row => row.toMap)
    .map(row => row.get("user_id").asInstanceOf[String])
    .collect()
}

I would rewrite it like this:


def getUsers(cassandraRowsRDD: CassandraRDD[CassandraRow]): Array[String] = {
  cassandraRowsRDD.select("user_id")
    // as(...) belongs to the Cassandra RDD, so it has to be called before distinct.
    .as((userId: String) => userId)
    .distinct()
    .collect()
}

Here, we’re avoiding the additional mapping and the cast to String with asInstanceOf, which could become a bottleneck later.
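The same idea can be tried without a cluster. In the sketch below, plain Scala collections stand in for the RDD and each Map mimics what row.toMap would give us; UserIds and distinctUserIds are hypothetical names. Since Map#get returns an Option, a pattern match picks out the String values without any unchecked cast.

```scala
object UserIds {
  /** Collects the distinct user ids that are present and really are Strings.
    * Plain collections are used here as a stand-in for an RDD of rows.
    */
  def distinctUserIds(rows: Seq[Map[String, Any]]): Seq[String] =
    rows
      .flatMap(_.get("user_id"))          // Map#get returns Option; missing keys are dropped
      .collect { case id: String => id }  // keep only values whose runtime type is String
      .distinct                           // first occurrence wins, order is preserved
}
```

Rows without a user_id, or with a non-String value in that column, are silently skipped here; whether that is the right behavior depends on how strict you want to be about bad data.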

I would also use Kafka to get a direct stream from the Android app, and avoid querying Cassandra every time. Polling is not a problem with a small number of users, but it could become a major issue with thousands of users sending events every second. Well, Cassandra should handle thousands of queries per second, but I would use Cassandra only to write data, and predict the state (sitting, standing, walking or jogging) before saving.
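Here is a rough sketch of that "predict first, write once" flow in plain Scala. Nothing in it talks to Kafka, Spark or Cassandra: the batch is an ordinary Seq, and classify and write are passed in as functions, so every name (Reading, Labeled, PredictThenWrite) is hypothetical.

```scala
// Hypothetical event and result types for illustration only.
final case class Reading(userId: String, x: Double, y: Double, z: Double)
final case class Labeled(userId: String, activity: String)

object PredictThenWrite {
  /** Classifies each user's readings from one incoming batch and emits exactly
    * one write per user, so the store is never read back for prediction.
    */
  def process(
      batch: Seq[Reading],
      classify: Seq[Reading] => String, // stand-in for the trained model
      write: Labeled => Unit            // stand-in for the database sink
  ): Unit =
    batch.groupBy(_.userId).foreach { case (userId, readings) =>
      write(Labeled(userId, classify(readings)))
    }
}
```

In a real pipeline the batch would come from the stream and write would save to Cassandra; the point is only that the database sits at the end of the flow instead of in the middle of it.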

I really like the way she built the prediction model. I’m a big fan of Azure Machine Learning, but now I see that using Spark ML can be just as easy.


def computePrediction(model: RandomForestModel, rdd: RDD[CassandraRow]): Unit = {
  println("****************************** start")
  // Turn the raw rows into features and ask the model for an activity label.
  val predict: String = FeaturesService.predict(model, FeaturesService.computeFeatures(rdd))
  // Wrap the label in a single timestamped row for the test user.
  val predictions: List[CassandraRow] = List(
    CassandraRow.fromMap(predictionResultToMap(new PredictionResult(TEST_USER, new Date().getTime, predict)))
  )
  // Store the prediction in the "result" table.
  val result: RDD[CassandraRow] = Spark.ssc.sparkContext.parallelize(predictions)
  result.saveToCassandra(KEYSPACE, "result")
  println("**************************** Predicted activity = " + predict)
}

This simple method computes our prediction using Random Forest, and it can accurately tell us what we’re currently doing.
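For anyone new to Random Forest, the core of classification is a majority vote: every tree predicts a label, and the forest returns the most common one. Below is a toy sketch of just that voting step in plain Scala. ForestVote and the hand-written "trees" are hypothetical, and this is not how Spark's RandomForestModel is implemented internally.

```scala
object ForestVote {
  /** A "tree" is modelled as any function from a feature vector to a label. */
  type Tree = Vector[Double] => String

  /** Returns the label predicted by the largest number of trees.
    * Ties are broken arbitrarily in this sketch.
    */
  def predict(trees: Seq[Tree], features: Vector[Double]): String = {
    require(trees.nonEmpty, "need at least one tree")
    trees
      .map(tree => tree(features))             // one vote per tree
      .groupBy(identity)                       // label -> all votes for it
      .maxBy { case (_, votes) => votes.size } // most popular label
      ._1
  }
}
```

A real forest also has to train its trees on random subsets of the data and features; only the final voting step is shown here.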
The video from Scalar isn’t available yet, but I will attach it when it becomes available. I did, however, find another talk by Amira on the same topic, though the code in that one is written in Java.

Just to recap: for me, Amira’s talk was the best at Scalar. I learned a great deal, and I’ll definitely use this solution in a future project, since my case is pretty much the same.
