The other day I learned about Odysseus, a framework for processing data in real time. Truth be told, I know little about the topic, but I consider it interesting enough to share with you. Most of you are familiar with relational databases, SQL, O-R mappers, transactions, the ACID principle and things like that. Working with streaming data is similar, and at the same time it requires a major shift of mind. The more you look into it, the bigger the differences become.
Consider the little weather radar gadget I embedded in the article[1]. Most people are interested in the current weather and the near future, so it doesn’t suffice to show a static radar image. The map has to be updated regularly. To do this you could store the weather data in a traditional SQL database and run a query to display the current weather. Because the data keeps changing, you'd have to repeat that query at regular intervals.
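To make the polling approach concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The table layout (ts, station, rain_mm) and the helper names are my own assumptions for illustration; they have nothing to do with how the weather gadget is really implemented.

```python
import sqlite3


def create_db():
    """Create an in-memory database with a hypothetical weather_station table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weather_station (ts INTEGER, station TEXT, rain_mm REAL)"
    )
    return conn


def add_reading(conn, ts, station, rain_mm):
    """Store one reading; ts is a timestamp in seconds."""
    conn.execute(
        "INSERT INTO weather_station VALUES (?, ?, ?)", (ts, station, rain_mm)
    )


def current_weather(conn, now, window_seconds=60):
    """Return all readings with now - window_seconds < ts <= now."""
    cur = conn.execute(
        "SELECT ts, station, rain_mm FROM weather_station "
        "WHERE ts > ? AND ts <= ? ORDER BY ts",
        (now - window_seconds, now),
    )
    return cur.fetchall()


def poll(conn, times, window_seconds=60):
    """Run the same query once per polling time.

    A real client would sleep between the runs; here the polling times
    are passed in so the sketch stays deterministic.
    """
    return [current_weather(conn, now, window_seconds) for now in times]
```

Note that the application has to carry the notion of "now" into every query and has to decide how often to ask again. That's exactly the bookkeeping a stream processing system takes off your hands.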
While this is possible, there are dedicated systems to make that easier. Odysseus, for instance, is an open source solution for processing streaming data that comes complete with an editor and a framework both for selecting the current data and for visualizing it. The nice thing is that you don’t have to care about updating the visualization. It updates automatically when the data changes.
CQL: Adding the temporal dimension to SQL
CQL is only one of several query languages for data streams, but it may be interesting because it’s an extension of SQL. Basically, CQL queries aren’t that different from SQL queries:
SELECT * FROM weather_station ws [RANGE 1 MINUTE]
The difference is that we add a window. In our example the window covers the last minute.
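If you wonder what such a window does under the hood, the following Python sketch may help. It's only my simplified mental model of a time-based window, not the way Odysseus implements it. The class name RangeWindow is made up, and I assume that the elements arrive ordered by timestamp.

```python
from collections import deque


class RangeWindow:
    """Keeps only the elements that arrived within the last range_seconds.

    Assumption: insert() is called with non-decreasing timestamps.
    """

    def __init__(self, range_seconds):
        self.range_seconds = range_seconds
        self.items = deque()  # (timestamp, value) pairs, oldest first

    def insert(self, ts, value):
        """Add an element and return the current content of the window."""
        self.items.append((ts, value))
        # Evict everything that has fallen out of the window.
        while self.items and self.items[0][0] <= ts - self.range_seconds:
            self.items.popleft()
        return list(self.items)
```

With RangeWindow(60), a reading inserted at second 0 is still part of the result at second 30, but it's gone once a reading with timestamp 60 arrives.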
It’s also possible to convert a static relational database to a stream using RSTREAM. For instance, to update the stream every five minutes you add the SLIDE clause to the window:
SELECT RSTREAM(ws.*) FROM weather_station ws [RANGE 1 MINUTE SLIDE 5 MINUTES]
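Here's how I picture the effect of RANGE and SLIDE, again as a small Python sketch. Take it with a grain of salt: it's my simplified reading of the query above, not the exact semantics of CQL or Odysseus. The function name rstream and the end_time parameter are assumptions of mine.

```python
def rstream(events, range_seconds, slide_seconds, end_time):
    """Yield (t, window) at every multiple of slide_seconds up to end_time.

    events is a list of (timestamp, value) pairs. The window at time t
    holds the events with t - range_seconds < timestamp <= t.
    """
    t = slide_seconds
    while t <= end_time:
        window = [(ts, v) for ts, v in events if t - range_seconds < ts <= t]
        yield t, window
        t += slide_seconds
```

Called with a range of 60 seconds and a slide of 300 seconds, the sketch reports the last minute of data once every five minutes. In this simplified model, a reading that arrives more than a minute before the next report never shows up in the output.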
Wrapping it up
Processing data streams in real time poses interesting challenges. It’s important to deliver the data within a predictable time. This kind of reliability is even more important than raw speed. An algorithm that delivers your car’s engine sensor data a minute late, just because the garbage collector needed to “stop the world”, does no good.
Stream processing reaches far beyond traditional office applications. It plays a crucial role in many everyday systems: controlling the power output of a wind farm, traffic control, online auctions, processing the sensor data of your car, just to name a few.
The weather gadget has been embedded with kind permission of Niederschlags.de.
[1] For the sake of correctness I should add that I don’t believe that the weather radar has been implemented with CQL. But it’s a nice example of a visualization of streaming data, and from a technical point of view, it probably could be implemented using CQL.