This post explores what the term Window means.
The Wikipedia article on Data Stream Management Systems gives a nice definition of a window:
Instead of using synopses to compress the characteristics of the whole data streams, window techniques only look on a portion of the data. This approach is motivated by the idea that only the most recent data are relevant. Therefore, a window continuously cuts out a part of the data stream, e.g. the last ten data stream elements, and only considers these elements during the processing. There are different kinds of such windows like sliding windows that are similar to FIFO lists or tumbling windows that cut out disjoint parts. Furthermore, the windows can also be differentiated into element-based windows, e.g., to consider the last ten elements, or time-based windows, e.g., to consider the last ten seconds of data.
In Esper we have...
- The data is the events arriving into Esper.
- A window that considers the last 10 elements is #length(10), a.k.a. the length window.
- A window that considers the last 10 seconds of data is #time(10), a.k.a. the time window.
The smallest unit of change to a window is an individual event. A new event enters the window. An old event leaves the window. And thus events come and go. When such a change happens, Esper determines whether the change is meaningful. It does that by incrementally updating aggregations and match-recognize patterns each time an event comes or goes. When, for example, a query compares an aggregation against a threshold value, Esper indicates the meaningful change to the application.
---> Esper evaluates windows continuously and incrementally on the level of individual events.
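To make the idea of per-event, incremental evaluation concrete, here is a minimal sketch in plain Python. It is not Esper code and not how Esper is implemented; the names LengthWindowSum and crossings are made up for illustration. It keeps a length window, adjusts a running sum as each event enters or leaves, and checks a threshold after every single event.

```python
from collections import deque


class LengthWindowSum:
    """Hypothetical helper: keeps the last `size` values and their running sum."""

    def __init__(self, size):
        self.size = size
        self.events = deque()  # oldest value on the left
        self.total = 0

    def insert(self, value):
        # The new event enters the window and is added to the aggregate.
        self.events.append(value)
        self.total += value
        # If the window is over capacity, the oldest event leaves it
        # and is subtracted from the aggregate.
        if len(self.events) > self.size:
            self.total -= self.events.popleft()
        return self.total


def crossings(values, size, threshold):
    """Return the 0-based positions at which the windowed sum exceeds threshold."""
    window = LengthWindowSum(size)
    return [i for i, v in enumerate(values) if window.insert(v) > threshold]


print(crossings([1, 5, 7, 2, 1, 9], size=3, threshold=10))  # [2, 3, 5]
```

The sum is never recomputed from scratch: each arriving or leaving event adjusts it by one addition or subtraction, and the threshold is checked after every event.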
I have looked up Apache Flink, which has a write-up on windows in its documentation under Application Development->Streaming->Operators->Windows. In Flink, windows are at the heart of processing infinite streams. Windows split the stream into "buckets" of finite size, over which we can apply computations.
Flink forms a window and only when the window is completely formed does it apply a computation and then form a new window, unless I'm mistaken. This sounds a lot like batch processing to me. But batch != window.
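The difference between a batch and a window, as I mean it here, can be sketched in plain Python. This is neither Flink nor Esper code, and the function names tumbling_sums and sliding_sums are made up for illustration. The first function produces a result only when a batch of the given size is complete; the second produces a result on every event.

```python
from collections import deque


def tumbling_sums(values, size):
    """Batch style: emit one sum each time `size` values have been collected."""
    results, batch = [], []
    for v in values:
        batch.append(v)
        if len(batch) == size:
            results.append(sum(batch))
            batch = []  # start a new, disjoint batch
    return results


def sliding_sums(values, size):
    """Window style: emit the sum of the last `size` values on every event."""
    results, window, total = [], deque(), 0
    for v in values:
        window.append(v)
        total += v
        if len(window) > size:
            total -= window.popleft()  # the oldest event leaves the window
        results.append(total)
    return results


data = [1, 2, 3, 4, 5, 6]
print(tumbling_sums(data, 3))  # [6, 15]
print(sliding_sums(data, 3))   # [1, 3, 6, 9, 12, 15]
```

With six input values the batch version reports twice, while the per-event version reports six times.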
In Esper a window can be an arbitrary subset of events. Here are two examples.
- #length(10)#time(10) considers the last 10 elements that are not older than 10 seconds
- create window MyWindow#keepall with on-merge, to insert and remove events according to any criteria
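The first example, the combination of a length and a time criterion, can be sketched as follows in plain Python. This is only an illustration of the subset idea, not Esper's implementation. The class name LengthTimeWindow is made up, and the sketch assumes that time advances only when an event arrives and that an event exactly max_age seconds old is still retained.

```python
from collections import deque


class LengthTimeWindow:
    """Hypothetical helper: keeps events that are among the last `max_len`
    AND not older than `max_age` seconds."""

    def __init__(self, max_len, max_age):
        self.max_len = max_len
        self.max_age = max_age
        self.events = deque()  # (timestamp, payload) pairs, oldest first

    def insert(self, timestamp, payload):
        self.events.append((timestamp, payload))
        # Length criterion: drop the oldest events beyond max_len.
        while len(self.events) > self.max_len:
            self.events.popleft()
        # Time criterion: drop events older than max_age
        # (assumption: the newest timestamp serves as the current time).
        while self.events and timestamp - self.events[0][0] > self.max_age:
            self.events.popleft()
        return [p for _, p in self.events]


w = LengthTimeWindow(max_len=3, max_age=10)
for ts, name in [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (12, "e")]:
    w.insert(ts, name)
print(w.insert(14, "f"))  # ['e', 'f']
```

At the last insert, "c" leaves because of the length criterion and "d" leaves because it is 11 seconds old, so the window is the subset of events that satisfies both criteria.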
Other systems seem to form a window only from data that arrives next to each other. It seems impossible in those systems to form a window across arbitrary data. As I understand it, they require inserting into a table of some kind instead.
In Esper the term window means a subset of events; in some other systems it means batch delineation.