General
- What is Complex Event Processing (CEP)?
- Where does Complex Event Processing fit in the 4D model (Detect-Derive-Decide-Do)
- Can you compare Esper to stream processing platforms please?
- How does Esper compare to other CEP products?
- How does Esper scale?
- Is Esper memory efficient?
- What part of Esper is in-memory computing?
- What latency can be achieved and is it "real-time"?
- What are the advantages of Esper's Event Processing Language (EPL)?
- What business areas/problems is Esper best suited for?
- Please clarify some common misconceptions?
- What might be some misuses for it?
- What is the intended audience and what is their interface?
- Who uses Esper?
- What is the concept or philosophy behind the design?
- How has this been tested? What guarantees do I have that the next release works just as well?
- How do you issue bug fixes or patches? How are problems tracked?
- What operating systems has it been tested on?
- It claims to be fast...how does it do that? Has this claim been tested?
- Do you have any benchmarks available for Esper?
- Can Esper handle a large number of statements?
How does it work?
- How does Esper work? How does Esper allow you to search and match patterns on temporal events?
- What algorithms does Esper use? Is it based on research?
- What is the difference between Esper and an in-memory database?
- How does the runtime discern which data to retain? Is it based solely on the statements registered with the runtime at the time the event comes in?
- What or how many events does Esper keep in memory? Does Esper keep matching events of a statement in memory?
- What happens on runtime start? I assume that if I have a time based statement, since there's no history within the runtime, there's no way to get any events to fire until the time and events have been consumed?
- Can statements be added to the system only on runtime start, or can they be added dynamically? Will those statements work with any internally stored historical data when they're started?
- When working with composite streams, i.e. when using the 'insert into' mechanism, does the entity being inserted into have to be a registered object in the system or are those created simply by registering the statement?
- Could you explain the concept of data windows for a database programmer?
- What is the difference between "select * from MyEvent" and "select * from MyEvent#length(1)" and "select * from MyEvent#keepall()"?
- What happens if I send events to which no statement is created yet? Can a time-to-live be specified to retry matching an event?
- How can the timestamp for an event be explicitly controlled? What to do with event occurrence time, time synchronization or event timestamps?
- Does the runtime make a copy of events?
Integration
- What additional components does Esper require to run?
- Can I run it with multiple threads? What, if anything, is multithread-safe?
- What is the footprint of Esper in a typical installation, i.e. what is the RAM, disk and CPU usage?
- If one overloads Esper with events through a queue, will Esper queue events internally until it processes them or will events stay in the queue until Esper process them?
- Is there a way to send a bunch of events into Esper and get back a notification when it is done with processing all events?
- What is the policy you recommend on UpdateListener (or Observer) if UpdateListener does long processing? For example, why don't you create a thread or attach a queue for output in some examples?
- Have you tested Esper in an OSGI container?
- What ClassLoader does it use? How do I get the class loading right in OSGI, Apache Axis or other containers?
- Can Esper run on small devices?
- How do I test EPL? Is there integration with a testing framework?
General
What is Complex Event Processing (CEP)?
-
Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. The goal of complex event processing is to identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible (source: Wikipedia).
CEP aims at detecting situations. CEP is not a general-purpose application code container or a distributed processing platform. CEP helps in detecting situations by providing a declarative language (event processing language, EPL) or other abstractions to make situation detection easier and faster.
Typically, CEP is stateful analysis, since detecting situations requires remembering certain things. For example, simple counting requires remembering the current count, and pattern matching requires remembering partially-matched patterns. In these examples, the current count and the partially-matched patterns are the state to keep.
Typically, situations must be detected as soon as they occur. Detection latency is the time between an event arriving and a situation being detected. CEP aims for detection latencies that are clearly below 1 second. Detection latencies between 1 nanosecond and 1 millisecond are often desired and are achievable with CEP.
Typically, CEP deals with large numbers of small data points (events) arriving. Many of those events may cause state to change, so state can be frequently updated and fast-changing.
Typically, in CEP analysis the state is not the events themselves but the information derived from the events. For example, when simply counting, the derived information is the count, and the events contributing to the count are not remembered. When pattern matching, the fact that an event arrived is state; however, the events contributing to the pattern match may not need to be remembered.
Typically, CEP is interested in the passing of time, which can be event time or another notion of time. State can change and expire as time passes.
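To make "derived state" concrete, here is a minimal plain-Java sketch; it is not Esper code, and the class and method names are hypothetical. It counts events per key and tracks a partially matched "A followed by B" pattern (restarting after each match), keeping only a counter and a flag rather than the events themselves.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of CEP-style derived state: only the count per key
// and a "partially matched" flag are remembered, never the events themselves.
final class DerivedStateSketch {
    private final Map<String, Long> countPerKey = new HashMap<>();
    private boolean seenA = false;      // partial match of the pattern "A followed by B"
    private int completedMatches = 0;

    void onEvent(String key, String type) {
        // Counting: the derived information is the count, not the event.
        countPerKey.merge(key, 1L, Long::sum);

        // Pattern "A followed by B": remember only that an A arrived.
        if (type.equals("A")) {
            seenA = true;
        } else if (type.equals("B") && seenA) {
            completedMatches++;
            seenA = false;              // the partial match is consumed; start over
        }
    }

    long count(String key) {
        return countPerKey.getOrDefault(key, 0L);
    }

    int completedMatches() {
        return completedMatches;
    }
}
```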
Where does Complex Event Processing fit in the 4D model (Detect-Derive-Decide-Do)
-
The purpose of CEP is analyzing events and finding situations of interest. CEP detects and derives information, so you can become aware of a situation immediately and react in the best possible way.
An example situation to be detected is: A suspicious account is derived whenever there are at least three large cash deposits in the last 15 days.
- The "Detect" is about the raw event, for example a cash deposit event.
- The "Derive" is about the situation, i.e. "did something happen?", for example there is a suspicious account.
- The "Decide" is about the decision of what to do, for example the decision to determine a risk score or determine another course of action.
- The "Do" is the action, for example an action that opens an investigation.
The "Detect" and "Derive" are the responsibility of CEP. CEP is time-driven and event-driven and continuous in nature. It deals with streams (historical or currently arriving) of predefined but open-ended events of different types, along with associated event data, and may have more than one input source.
The "Decide" is sometimes handled by decision management tools or rules engines, as their strength lies in decision tables and fact-based analysis. Decision management tools are generally request-driven: they seek a conclusion to a current business decision by running a fact analysis in a single execution.
The "Do" is sometimes handled by business process or workflow tools.
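The suspicious-account example can be sketched in plain Java to show where "Detect" ends and "Derive" begins. This is a hypothetical illustration, not how Esper implements it: the 10,000 threshold for a "large" deposit, the class name, and the assumption that deposits arrive in timestamp order are all choices made for the sketch. Counting over a time window does require remembering the timestamps of the large deposits so that they can expire.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "Detect" and "Derive". LARGE_AMOUNT and the
// in-order arrival assumption are illustrative choices, not Esper behavior.
final class SuspiciousAccountDetector {
    static final double LARGE_AMOUNT = 10_000.0;
    static final long WINDOW_MILLIS = Duration.ofDays(15).toMillis();

    // Per account: timestamps of large deposits still inside the 15-day window.
    private final Map<String, Deque<Long>> largeDeposits = new HashMap<>();

    /** Detect: consume one raw cash-deposit event. Derive: return true if the account is now suspicious. */
    boolean onCashDeposit(String account, double amount, long timestampMillis) {
        if (amount < LARGE_AMOUNT) {
            return false;  // not a large deposit, nothing to remember
        }
        Deque<Long> window = largeDeposits.computeIfAbsent(account, k -> new ArrayDeque<>());
        window.addLast(timestampMillis);
        // Expire deposits that are 15 days old or older relative to the current event.
        while (!window.isEmpty() && window.peekFirst() <= timestampMillis - WINDOW_MILLIS) {
            window.removeFirst();
        }
        return window.size() >= 3;
    }
}
```

The returned boolean is the derived situation; a "Decide" step (such as computing a risk score) and a "Do" step (such as opening an investigation) would be handled by whatever consumes it.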
Can you compare Esper to stream processing platforms please?
-
The following table attempts to capture the differences:
Complex event processing compared with stream processing:
- Also known as
  - Complex event processing: CEP, event stream processing, event series analysis
  - Stream processing: real-time computation, stream computing
- Example providers
  - Complex event processing: Esper (in no particular order, just to mention a few; there are significant others)
- Type
  - Complex event processing: business intelligence and decision-making
  - Stream processing: container technology. Containers are mutually exclusive to each other; i.e. running Storm within Spark Streaming or J2EE does not make sense.
- Framework/platform versus language/compiler/runtime
  - Complex event processing: Esper is a language, a compiler and a runtime (similar to Scala) that operates on top of the JVM. Esper, EsperHA and some parts of Enterprise Edition are components that can run as part of any JVM-based software stack. Enterprise Edition provides both a masterless horizontal scale-out platform and a more server-centric architecture.
  - Stream processing: framework, platform or system
- Pattern matching and detection, filtering, transformation, aggregation, event hierarchies, detecting relationships (such as causality, membership or timing) between events, managing event lifecycle
  - Complex event processing: central to Esper and CEP
  - Stream processing: not central to stream processing
- Transporting events between processes and hosts
  - Complex event processing: not central to Esper and CEP in general; Enterprise Edition, however, addresses this requirement
  - Stream processing: central to stream processing
- Distribution and fault tolerance
  - Complex event processing: a central concern of Enterprise Edition
  - Stream processing: a central concern
- Embeddable runtime
  - Complex event processing: Esper, EsperHA and Enterprise Edition components are embeddable into any JVM process regardless of JVM language. They are supported by EsperTech when used with Storm, Samza, Spark, Flink and Akka.
  - Stream processing: not generally embeddable (with exceptions); may require a specific JVM launch and OS
- Schemas
  - Complex event processing: larger variety of schema types; less likely to be unstructured data
  - Stream processing: likely more relational and flat, but not necessarily; sometimes unstructured data
- Sharing events among many use cases
  - Complex event processing: sharing events across many statements or many patterns is a central problem
  - Stream processing: topologies typically have few hand-programmed operators; tends to place a higher emphasis on high data volumes with relatively fewer statements
- Continuous queries (statements)
  - Complex event processing: express stream analysis in the event processing language (EPL), compile using the compiler and deploy into a runtime, all at runtime; no need to restart the server or container
  - Stream processing: code your own operators, subclass framework classes, package, configure servers, deploy, stop the process or job, redeploy; no means to add continuous queries on the fly
- Target analysis
  - Stream processing: Extract-Transform-Load (ETL), Distributed Remote Procedure Call (DRPC), integrating systems, simple aggregation
Attributions: Mani K. Chandy, Opher Etzion, Rainer von Ammon, "10201 Executive Summary and Manifesto Event Processing", Schloss Dagstuhl, 2011.
How does Esper compare to other CEP products?
-
Esper is an open-source language, compiler and runtime available under the GNU General Public License, version 2 (GPL v2). The open-source nature of Esper helps in tailoring the event processing language and other community-driven features.
Esper is the only compiler for streaming analytics that produces 100% byte code.
Esper and NEsper are embeddable components written in Java and C#, respectively, and are therefore suitable for integration into any Java-based or .NET-based process, including J2EE application servers or standalone Java applications. Esper and NEsper are not servers themselves but are designed to hook into any sort of server, ranging from market-standard JEE servers (WebLogic, WebSphere, JBoss, etc.) and service buses to lightweight solutions (OSGi-based, grid, etc.), as well as Microsoft .NET technologies. NEsper is suitable for use on desktop end-user stations.
Another advantage of this model is that the components can run standalone in your development environment, making development and testing much easier, while in the target production environment the deployment can be tailored to what you really need or possibly already have in place. End-to-end performance and latency also improve, because your application may not need to transport events to a dedicated remote server process but can process events at the event source, saving marshalling, unmarshalling and network overhead.
Esper's pull API and fire-and-forget query API are noteworthy. One of our customers recently remarked: "Indeed one of the important feature of a Real-Time analytics is to be able to connect to CEP on-demand, basically if the analytic is off-line, the server is continuing to calculate." Most CEP products support only a subscribe model and do not allow querying a snapshot of state.
Events in Esper allow for a rich domain object representation since Esper supports all aspects of object-oriented design as well as dynamic typing. Many other CEP products force a flat Map-like tuple-set definition of events which we think is not rich enough. Esper can thus handle schema evolution well.
Esper features a Statement Object Model API, which is a set of classes to directly construct, manipulate or interrogate EPL statements.
Another user stated: "Esper can detect all kinds of event patterns, from simple and/or/not to complicated state machines." Many other CEP products only offer SQL joins and aggregations. In addition, Esper offers a pattern language based on university research, and a second pattern-matching facility (match-recognize) based on NFA regular expressions from the proposed SQL standard.
Esper offers a rich set of parameterizable data windows (expiry policies). Most other software provides a very small set of very simple rolling, sliding or hopping windows. Esper data windows can be combined in intersection and union set-logic relationships.
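The intersection and union of two expiry policies can be illustrated with a hypothetical plain-Java sketch (not Esper's implementation): a length window keeping the last N events combined with a time window keeping events younger than T. With intersection an event stays only while both policies retain it; with union it stays while either does. For clarity the sketch recomputes membership from all arrival timestamps instead of evicting them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of combining two expiry policies over the same events.
final class CombinedWindowSketch {
    private final int maxLength;
    private final long maxAgeMillis;
    // Arrival timestamps in arrival order; for clarity this sketch never evicts them.
    private final List<Long> arrivalTimes = new ArrayList<>();

    CombinedWindowSketch(int maxLength, long maxAgeMillis) {
        this.maxLength = maxLength;
        this.maxAgeMillis = maxAgeMillis;
    }

    void add(long timestampMillis) {
        arrivalTimes.add(timestampMillis);
    }

    // Length policy: retained while among the last maxLength arrivals.
    private boolean inLengthWindow(int index) {
        return index >= arrivalTimes.size() - maxLength;
    }

    // Time policy: retained while younger than maxAgeMillis.
    private boolean inTimeWindow(int index, long nowMillis) {
        return nowMillis - arrivalTimes.get(index) < maxAgeMillis;
    }

    /** Intersection: an event stays only while both policies retain it. */
    int intersectionSize(long nowMillis) {
        int n = 0;
        for (int i = 0; i < arrivalTimes.size(); i++) {
            if (inLengthWindow(i) && inTimeWindow(i, nowMillis)) n++;
        }
        return n;
    }

    /** Union: an event stays while either policy retains it. */
    int unionSize(long nowMillis) {
        int n = 0;
        for (int i = 0; i < arrivalTimes.size(); i++) {
            if (inLengthWindow(i) || inTimeWindow(i, nowMillis)) n++;
        }
        return n;
    }
}
```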
One customer remarked: "Many products expose just a GUI to input simple event definitions. EPL provides a way to express complex events." Indeed, Esper's expression language, which is highly extensible and makes use of lambda (closure-type) constructs, can handle complex analysis.
How does Esper scale?
-
Linear horizontal scalability, elastic scaling, load distribution, balancing and re-balancing, fault tolerance, dynamic discovery of nodes through seed nodes, replication and multi-datacenter support are addressed by Esper Enterprise Edition which builds on Esper and EsperHA at its core.
Scaling has three components.
The first component of scaling is the throughput that can be achieved running single-threaded. For Esper we think this number is very high, likely between 10k and 200k events per second.
The second component of scaling is scale-up by adding CPUs and/or memory. For Esper we have tested 32 CPUs and found that, with proper statement design, the runtime can achieve excellent parallel processing performance. Please contact us if you need help. It is not necessary to create 32 statements or 32 runtime instances to utilize 32 CPUs: it is sufficient to properly design as few as 2 EPL statements. We recommend reviewing Context and Context Partitions in the documentation in detail.
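The reason a well-designed statement can use many CPUs is that keyed context partitions have independent state. The following hypothetical plain-Java sketch (not Esper's threading model; names are illustrative) routes each event by key to a fixed single-threaded partition, so per-partition state is never shared between threads and needs no locks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of keyed partitions: each partition's state is only
// touched by that partition's single (daemon) thread.
final class PartitionedCounter implements AutoCloseable {
    private final List<ExecutorService> workers = new ArrayList<>();
    private final List<Map<String, Long>> states = new ArrayList<>();

    PartitionedCounter(int partitions) {
        for (int i = 0; i < partitions; i++) {
            workers.add(Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);  // do not keep the JVM alive
                return t;
            }));
            states.add(new HashMap<>());
        }
    }

    private int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), workers.size());
    }

    /** Route the event to its partition; the partition's thread updates its own state. */
    void send(String key) {
        int p = partitionOf(key);
        Map<String, Long> state = states.get(p);
        workers.get(p).execute(() -> state.merge(key, 1L, Long::sum));
    }

    /** Read a key's count on the owning thread; Future.get() makes the result visible here. */
    long count(String key) {
        int p = partitionOf(key);
        Map<String, Long> state = states.get(p);
        Future<Long> result = workers.get(p).submit(() -> state.getOrDefault(key, 0L));
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        workers.forEach(ExecutorService::shutdown);
    }
}
```

Because all events for a key go to the same thread, per-key ordering is preserved while different keys proceed in parallel.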
The third component of scaling is scale-out by adding JVMs and/or systems. Scaling across JVMs is not a design goal of the core Esper CEP runtime itself; however, it is a primary concern of EsperHA and Enterprise Edition.
Is Esper memory efficient?
-
We have analyzed the memory stack for large statement numbers (100k+ statements) and large context partition numbers (100k+ partitions) with common types of statements and optimized memory use.
The Esper compiler generates byte code for various constructs. One such construct is the aggregation row. The compiler co-locates aggregation state into fields such that each aggregation value is one or more fields and there is no need to allocate an object for each aggregation value in each row.
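As an illustration of co-locating aggregation state in fields, here is a hand-written Java analogue of what an aggregation row could look like for a statement such as select symbol, avg(price), max(price) from StockTick group by symbol. This is not the code the Esper compiler actually generates; the class and field names are hypothetical. The point is that each group needs one row object whose primitive fields hold all of its aggregation values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical analogue of a generated aggregation row: the state for avg(price)
// and max(price) lives in primitive fields of a single object per group.
final class PriceAggregationRow {
    double sumPrice;                               // state for avg(price)
    long countPrice;                               // state for avg(price)
    double maxPrice = Double.NEGATIVE_INFINITY;    // state for max(price)

    void enter(double price) {
        sumPrice += price;
        countPrice++;
        if (price > maxPrice) {
            maxPrice = price;
        }
    }

    double avgPrice() {
        return countPrice == 0 ? Double.NaN : sumPrice / countPrice;
    }
}

final class GroupedAggregation {
    private final Map<String, PriceAggregationRow> rows = new HashMap<>();

    void onTick(String symbol, double price) {
        rows.computeIfAbsent(symbol, s -> new PriceAggregationRow()).enter(price);
    }

    PriceAggregationRow row(String symbol) {
        return rows.get(symbol);
    }
}
```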
The runtime design is such that most common operations do not allocate unnecessary objects from heap.
It is not necessary to retain any events in memory for performing aggregations or pattern matching.
EsperHA in conjunction with Esper manages heap memory and provides fault tolerance. EsperHA can largely eliminate the chance of out-of-memory errors for certain types of use cases.
It can, however, be expected that Esper requires more memory than hand-coded, application-specific code would need.
What part of Esper is in-memory computing?
-
All of Esper computing is in-memory computing.
What latency can be achieved and is it "real-time"?
-
With complex event processing and Esper, queries are standing queries (statements), and latency to the answer is usually below 10 microseconds with more than 99% predictability. Note that mileage varies depending on the use case; please contact us for tuning information or help.
Most existing big data technology requires saving data somewhere and then performing queries on the saved data. Most Hadoop or Map-Reduce infrastructure answers queries with latency of minutes to hours. Some in-memory databases and optimizations reduce query answer time to minutes or seconds.
Esper provides real-time Big Data analytics for immediate insight, turning high velocity log and other machine data into streaming operational intelligence. Why is Esper so fast? Esper is a 'NoDatabase' technology since no data is stored. Instead data arrives as real-time streams and is processed in-memory using continuous SQL-conforming queries. This allows for massively parallel streaming data processing, ensuring the best use of today's multi-core, multi-blade servers, while still allowing applications to be deployed in a fraction of the time and at a fraction of the cost of alternative Big Data analytics solutions.
What are the advantages of Esper's Event Processing Language (EPL)?
-
The Esper event processing language (EPL) converges event stream processing (filtering, joins, aggregation) and complex event processing (causality) into one single language. The core language is SQL-conforming, which ensures rapid learning, but it is also highly oriented toward modern technologies: for example, it is object-oriented (more than table-oriented), enabling simple extension. The language, of course, includes event windows and causality patterns as first-class citizens. We natively support several event formats, from Java/.NET objects and maps to XML documents.
What business areas/problems is Esper best suited for?
-
Esper is best suited for real-time event-driven applications. Typical application areas are business process management and automation, finance, network and application monitoring, and sensor network applications. Esper takes much of the complexity out of developing applications that detect patterns among events, filter events, aggregate time or length windows of events, join event streams, trigger based on the absence of events, etc.
A primary difference from systems relying on classical SQL databases is that we do not query a repository for events matching some conditions, but instead trigger customized actions as events flow in and match event conditions, drastically reducing latency.
Please clarify some common misconceptions?
-
Misconception: "Complex" in complex event processing means "complexity".
"Complex" stands for forming composite events by detecting relationships between events. It does not mean processing is necessarily complex or complicated.
Misconception: SQL handles time-based data poorly.
SQL does a great job at expressing joins, subqueries and aggregation relationships, whether the data is time-based or not. SQL is a solid base for extensions that are specific to time-based data, such as interval algebra, time-based patterns, etc.
Misconception: Esper retains events in memory
Esper may retain events in memory if you instruct it to do so, but by default will not. When specifying a data window, patterns, special functions and certain other constructs then relevant events can be retained. Many uses such as aggregation, filtering and transformation do not retain events in memory.
Misconception: Esper uses a lot of threading and queues internally
By default there are no threads except for a timer and no queues, unless you change the configuration.
Misconception: Esper copies properties of an event between streams.
It does not.
Misconception: Multiple aggregation or other functions means multiple states
The query planner rewrites your functions to refer to the same underlying data structure.
Misconception: To aggregate data points across many dimensions you need to retain the data points separately for each dimension
Through named windows and/or group-by with rollup the runtime holds data points once for any number of dimensions to be aggregated across.
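Group-by with rollup can be illustrated with a hypothetical plain-Java sketch (not Esper internals) of sum(volume) grouped by rollup(symbol, exchange): every event updates the totals for the (symbol, exchange), (symbol) and grand-total levels once, and the event itself is not stored per dimension. Because this sketch has no data window, it keeps no events at all. The trade fields and the "*" placeholder for "all values" are assumptions of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of rollup-style aggregation, in the spirit of
//   select symbol, exchange, sum(volume) from Trade group by rollup(symbol, exchange)
// Each event updates every grouping level once and is then discarded.
final class RollupSum {
    static final String ALL = "*";  // placeholder meaning "all values of this dimension"
    private final Map<String, Long> totals = new HashMap<>();

    void onTrade(String symbol, String exchange, long volume) {
        totals.merge(key(symbol, exchange), volume, Long::sum);  // (symbol, exchange)
        totals.merge(key(symbol, ALL), volume, Long::sum);       // (symbol)
        totals.merge(key(ALL, ALL), volume, Long::sum);          // () grand total
    }

    long total(String symbol, String exchange) {
        return totals.getOrDefault(key(symbol, exchange), 0L);
    }

    private static String key(String symbol, String exchange) {
        return symbol + "|" + exchange;
    }
}
```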
What might be some misuses for it?
-
Esper is not designed for storing and retrieving fairly static data; that is better left to conventional databases. In-memory databases may be better suited to CEP applications than traditional relational databases, as they generally have good query performance. Yet they are not optimized to provide the immediate, real-time query results required for CEP and event stream analysis.
What is the intended audience and what is their interface?
-
Developers use the Esper compiler and runtime APIs within their applications. The APIs are mature and stable between release versions (with the exception of the Esper-8 changes).
Esper Enterprise Edition provides a graphical user interface and REST services. It also provides an out-of-the-box server with many development and enterprise-ability features. It is designed for power users, business analysts and developers.
Who uses Esper?
-
We think Esper is the most widely deployed CEP technology available. Its licensing and technical requirements have made it suitable for integration into many environments and software products.
EsperTech provides redistribution licenses to software companies that incorporate Esper/NEsper under commercial (non-GPL) terms. Some of the companies that agreed to be listed as licensees appear on the EsperTech web site.
What is the concept or philosophy behind the design?
-
Esper was developed using test-driven development and excellent automated test coverage. Esper design evolved by re-factoring with courage towards higher design quality. Favorite patterns are dependency injection/inversion of control by context injection, Immutable, Specification, GOF patterns (except Singleton :).
We did not assume that the runtime's current time is the same for all events, or that each event must carry a long-type or nanosecond value that somehow relates to that runtime current time. For events coming from multiple unsynchronized systems, the concept of a runtime current time can be convenient, but not always. For example, in connection with support for interval algebra, event types can declare a start timestamp property name and an end timestamp property name.
We did not assume that every event arrives in order. See below for a more complete discussion.
We did not assume that analysis is only over a few events - the runtime optimizes in various ways for evaluating large numbers of events.
We did not assume there are only a few or few hundred statements - the runtime tries to keep overhead per statement small.
We did not assume the application is single-threaded and that the runtime must control any threads.
We did not limit event types to be a tuple of name-value pairs, and did not limit available types to a few numeric or string types. The runtime supports any Java type including application types, inheritance and polymorphism.
How has this been tested? What guarantees do I have that the next release works just as well?
-
Esper uses the JUnit testing framework to automate regression testing of the system. While JUnit is well integrated with common development tools and build processes, it is a bit inflexible when it comes to managing large numbers of test scenarios. Esper has a very large number of tests.
We care a lot about having good tests. The aim is to achieve excellent test coverage and correct tests. Test performance is also important so that a full test run does not take long. Tests for Esper must be well-organized so that scenarios are easy to find and the organization must be extensible. Tests must also be reusable and re-runnable with different settings.
Given the above goals, we have structured tests into a hierarchical organization:
- Suite: at the top is the suite, namely: client, context, epl, event, expr, multithread, infra, pattern, resultset, rowrecog, view.
- Sub-suite: within suites we have sub-suites, since some suites have many test categories; for example, for the "event" package we have: avro, bean, infra, map, objectarray, plugin, render, revision, variant, xml.
- Executions: within a suite or sub-suite are executions, which test an area of functional or non-functional compiler and runtime behavior. We do not use JUnit for executions; instead we use a custom runner interface, "RegressionExecution".
- Assertions: within executions live assertions: sometimes one assertion, sometimes multiple, and sometimes nested assertions. By assertions we mean pieces of code that atomically exercise behavior.
You can find the suites at the following link: https://github.com/espertechinc/esper/tree/master/regression-lib/src/main/java/com/espertech/esper/regressionlib/suite
The "regression-lib" folder builds into a jar file. The reason that we package the execution classes into a jar file is that we execute the same tests with EsperHA and by using a jar file we can reuse the same executions.
In order to actually execute tests, you need to run the JUnit test cases. These can be found in the "regression-run" folder. The test cases that run the executions are at this link: https://github.com/espertechinc/esper/tree/master/regression-run/src/test/java/com/espertech/esper/regressionrun/suite
As of version 8.0.0, tests take about 20 minutes to run and achieve the following test coverage, as reported by IntelliJ IDEA:
Element                         Class %   Method %   Line %
com.espertech.esper.common      93%       75%        80%
com.espertech.esper.compiler    97%       58%        78%
com.espertech.esper.runtime     67%       61%        69%

We certify that every minor release (for example 2.x to 2.y) does not break the public interface of the compiler, runtime and common APIs and does not (substantially) break the implied interface (the expected result of EPL statements and typing).
The Esper team follows the practice of test-driven development (TDD) rigorously, ensuring that each feature added has automated test coverage. We develop and evolve the tests for each feature along with the feature that is currently being developed.
How do you issue bug fixes or patches? How are problems tracked?
-
We track all known issues in GitHub. Many improvement and feature requests are also tracked in GitHub issues.
Patches are attached to the specific GitHub issue for which we are issuing a patch. Patches are always cumulative, and we always list the GitHub issues addressed by a patch.
What operating systems has it been tested on?
-
For each release, we test the compiler and runtime on Windows and on Linux. Please see the change history notes, which indicate the JDK version(s) used for building the release and certified to pass all tests.
It claims to be fast...how does it do that? Has this claim been tested?
-
We have published benchmark kit results and have the benchmark kit itself available for download.
We have spent a good amount of time making sure the Esper compiler compiles EPL into byte code that is efficient.
Esper builds and maintains all data structures in memory. It does not use or require any internal or external database or disk drive to run. The internal data structures are optimized for minimal locking and high write speed thereby allowing the runtime to process events at the speed of arrival for most applications.
Esper is fully multi-thread safe and typically able to leverage all available CPUs. The runtime provides advanced threading options for inbound threading, outbound threading, timer execution threading and route (internal event) threading.
Esper does not provide any transports or protocols to bring events into the runtime or to handle outgoing events. Most applications find event input and event output to be the bottleneck.
Esper internally builds all the indexes and uses many optimization techniques that are hidden from your application. These techniques are verified by the performance-asserting regression tests that are part of the source code and that are executed as part of our build process.
There is a chapter in the documentation dedicated to performance tips.
Currently no standard performance test exists for CEP engines.
Do you have any benchmarks available for Esper?
-
We have published benchmark kit results and have the benchmark kit itself available for download.
There is no industry-wide benchmark that would allow easy comparison.
We also have an RFID demo example that is designed for performance testing (for example, we can run about 100,000 events per second against 2000 statements on a single dual-core CPU of commodity hardware, but one could argue this does not mean much without considering event and statement complexity, underlying resiliency, etc.).
Compared to other software in the CEP space, (N)Esper can run on a very large number of platforms: basically any platform that has a Java or .NET runtime, either 32-bit or 64-bit, with no lock-in to any operating system (slightly more true for Esper than NEsper, of course). It is also possible to run Esper on modern compute appliances, such as those from Azul(R), in the field of high-performance computing (hundreds of cores, real-time capabilities, etc.). This is strictly not possible for other CEP engines.
Can Esper handle a large number of statements?
-
The answer will depend on the types of statements required. The best approach may thus be to get familiar with the benchmark kit and use or customize it to test the specific type of statements needed.
The observation will likely be that a larger number of statements has very little impact on Esper itself, although the memory footprint due to internal bookkeeping will slightly increase.
The time to create new statements is usually very small, likely less than 1 millisecond or perhaps 2 to 3 milliseconds per statement. The runtime has been tested with a very large number of statements (more than 100k). The runtime is very efficient in matching incoming events to the statement(s) that need to see an event. Also note the performance tips section in the documentation, which provides additional hints.
How does it work?
How does Esper work? How does Esper allow you to search and match patterns on temporal events?
-
Esper is an event stream processing (ESP) and event correlation (CEP) engine written in Java. Instead of working like a database, where you put data in to later poll it using SQL queries, Esper works as a real-time engine that triggers actions when event conditions occur among event streams. A tailored Event Processing Language (EPL) allows registering statements in the runtime, using Java objects (POJO, JavaBean) to represent events. A listener class, which is basically also a POJO, is then called by the runtime when the EPL condition is matched as events come in. The EPL allows expressing complex matching conditions that include temporal windows and joins of different event streams, as well as filtering and sorting them.
A simple example could be to compute the average stock price of the IBM tick over a sliding window of 30 seconds. Given a StockTick event bean with a price and symbol property and the EPL "select avg(price) from StockTick(symbol='IBM')#time(30 sec)", a POJO would get notified as ticks come in. In the real world millions of ticks can come in, so there is no way to store them all to later query them using a classical database architecture. Statements can be much more complex, and can also be combined with "followed by" conditions.
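To show what the runtime maintains for that statement, here is a hand-coded plain-Java equivalent. It is a hypothetical sketch, not the Esper API: the class, the DoubleConsumer standing in for a listener, and the explicit advanceTime call (in place of the runtime's timer) are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.DoubleConsumer;

// Hypothetical hand-coded equivalent of
//   select avg(price) from StockTick(symbol='IBM')#time(30 sec)
// This is not the Esper API; the names are illustrative.
final class SlidingTimeWindowAverage {
    private static final class Entry {
        final long timeMillis;
        final double price;

        Entry(long timeMillis, double price) {
            this.timeMillis = timeMillis;
            this.price = price;
        }
    }

    private final String symbolFilter;
    private final long windowMillis;
    private final DoubleConsumer listener;  // stands in for an update listener
    private final Deque<Entry> window = new ArrayDeque<>();
    private double sum;

    SlidingTimeWindowAverage(String symbolFilter, long windowMillis, DoubleConsumer listener) {
        this.symbolFilter = symbolFilter;
        this.windowMillis = windowMillis;
        this.listener = listener;
    }

    /** A tick arrives; ticks for other symbols are filtered out before reaching the window. */
    void onTick(String symbol, double price, long timeMillis) {
        if (!symbolFilter.equals(symbol)) {
            return;
        }
        expire(timeMillis);
        window.addLast(new Entry(timeMillis, price));
        sum += price;
        listener.accept(sum / window.size());
    }

    /** Time passing alone can change the average, because old ticks leave the window. */
    void advanceTime(long nowMillis) {
        if (expire(nowMillis)) {
            // NaN signals "no ticks in the window" (EPL would report null instead)
            listener.accept(window.isEmpty() ? Double.NaN : sum / window.size());
        }
    }

    // Removes ticks that are windowMillis old or older; returns true if anything left the window.
    private boolean expire(long nowMillis) {
        boolean changed = false;
        while (!window.isEmpty() && window.peekFirst().timeMillis <= nowMillis - windowMillis) {
            sum -= window.removeFirst().price;
            changed = true;
        }
        if (window.isEmpty()) {
            sum = 0;  // avoid carrying floating-point residue into an empty window
        }
        return changed;
    }
}
```

With Esper, none of this bookkeeping is written by hand: the EPL statement above expresses the same filter, window and aggregation, and a listener receives the updated average.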
The internals of Esper are made up of fairly complex algorithms (see next).
What algorithms does Esper use? Is it based on research?
-
The EPL pattern runtime is a dynamic state machine in which states can have sub-states. The idea of EPL patterns comes from the "Rapide" pattern language of Stanford University research. The EPL pattern runtime does not employ NFA; it is based on dynamic state trees in which branches (active pattern sub-expressions) are created and destroyed.
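A heavily simplified plain-Java sketch of the dynamic-branch idea for the pattern every A -> B (every A followed by B) follows; it is hypothetical and flattens the state tree into a list. Each A spawns a new active branch that waits for B, and a B completes and then destroys all waiting branches, so the sequence A1 A2 B1 yields two matches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified sketch of evaluating "every A -> B" with branches
// that are created and destroyed dynamically rather than with a fixed automaton.
final class EveryAFollowedByB {
    // Each active branch remembers the id of the A event that spawned it.
    private final List<String> activeBranches = new ArrayList<>();

    /** Returns the matches completed by this event, as "a->b" strings. */
    List<String> onEvent(String type, String id) {
        List<String> matches = new ArrayList<>();
        if (type.equals("A")) {
            activeBranches.add(id);          // "every": spawn a new sub-expression
        } else if (type.equals("B")) {
            for (String a : activeBranches) {
                matches.add(a + "->" + id);  // each waiting branch completes with this B
            }
            activeBranches.clear();          // completed branches are destroyed
        }
        return matches;
    }

    int activeBranchCount() {
        return activeBranches.size();
    }
}
```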
The concept of "delta networks", a network of objects in which only changes to data are communicated across object boundaries, and only when required, is at the foundation of the runtime design.
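A minimal hypothetical sketch of the delta idea in plain Java (illustrative names, not Esper classes): a length window passes downstream only the events that entered and left it, similar in spirit to the insert stream and remove stream a listener receives, and a running sum updates incrementally from those deltas instead of rescanning the window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: components exchange only deltas (inserted and removed values).
interface DeltaListener {
    void update(double[] inserted, double[] removed);
}

final class LengthWindow {
    private final int size;
    private final Deque<Double> contents = new ArrayDeque<>();
    private final DeltaListener downstream;

    LengthWindow(int size, DeltaListener downstream) {
        this.size = size;
        this.downstream = downstream;
    }

    void onEvent(double value) {
        contents.addLast(value);
        double[] removed = new double[0];
        if (contents.size() > size) {
            removed = new double[] {contents.removeFirst()};  // oldest event leaves
        }
        // Only the change is passed on, never the whole window.
        downstream.update(new double[] {value}, removed);
    }
}

final class RunningSum implements DeltaListener {
    private double sum;

    @Override
    public void update(double[] inserted, double[] removed) {
        for (double v : inserted) sum += v;
        for (double v : removed) sum -= v;
    }

    double sum() {
        return sum;
    }
}
```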
Esper uses indexes, a data structure that improves the speed of data retrieval operations. For sorted access it may prefer a binary tree index while a hash-based index is great for key lookups.
For efficient matching of incoming events to statements the runtime uses inverted indexes.
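A minimal sketch of the inverted-index idea, assuming every statement filters on a single equality such as StockTick(symbol='IBM'): statements are indexed by the constant they compare against, so one hash lookup per event finds all interested statements instead of evaluating every filter. The class EqualsFilterIndex is hypothetical; Esper's filter service also handles ranges, multiple properties and more general expressions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of an inverted index from filter constant to the statements that use it. */
class EqualsFilterIndex {
    // property value -> ids of statements whose filter is "property = value"
    private final Map<Object, Set<String>> index = new HashMap<>();

    void addStatement(String statementId, Object filterValue) {
        index.computeIfAbsent(filterValue, v -> new HashSet<>()).add(statementId);
    }

    void removeStatement(String statementId, Object filterValue) {
        Set<String> ids = index.get(filterValue);
        if (ids != null && ids.remove(statementId) && ids.isEmpty()) {
            index.remove(filterValue); // no statement needs this value any more
        }
    }

    /** Returns the statements that must see an event with the given property value. */
    Set<String> match(Object eventValue) {
        return index.getOrDefault(eventValue, Set.of());
    }
}
```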
Multi-version concurrency control is a concept used for variables and also for filters to allow concurrency and reduce locking.
The match-recognize pattern matching functionality is built using nondeterministic finite automata (NFA).
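As an illustration of NFA-based row pattern matching, the sketch below simulates an automaton for a pattern equivalent to "PATTERN (A B+ C)" over events that have already been classified as A, B or C. All partial matches advance in parallel as a set of states, and a new attempt starts at every event. It is a hypothetical simplification (RowPatternNfa is not an Esper class) that leaves out DEFINE conditions, AFTER MATCH SKIP handling and measures.

```java
import java.util.HashSet;
import java.util.Set;

/** Minimal NFA simulation for the row pattern "A B+ C". */
class RowPatternNfa {
    private static final int START = 0, AFTER_A = 1, IN_B = 2;
    private Set<Integer> states = new HashSet<>();

    /** Feeds one event classified as 'A', 'B' or 'C'; returns true if a match completes. */
    boolean onEvent(char type) {
        Set<Integer> next = new HashSet<>();
        boolean matched = false;
        states.add(START); // a new match attempt may begin at this event
        for (int state : states) {
            if (state == START && type == 'A') {
                next.add(AFTER_A);
            } else if ((state == AFTER_A || state == IN_B) && type == 'B') {
                next.add(IN_B); // B+ loops on itself
            } else if (state == IN_B && type == 'C') {
                matched = true; // accepting transition
            }
            // states without a matching transition simply die
        }
        states = next;
        return matched;
    }
}
```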
Statement query planning based on the analysis of expressions used in the where-clause is another technique used by the compiler and runtime. The execution strategy may choose nested-loops versus merge joins.
The Esper grammar is built using ANTLR and based on Extended Backus-Naur Form (EBNF).
Allen's interval algebra is the foundation for many of the date-time methods.
Enumeration methods employ lambda expressions, also known as closures.
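For reference, the thirteen basic relations of Allen's interval algebra can be classified with a few comparisons. The sketch below assumes intervals given as start and end timestamps with start strictly before end; the enum AllenRelation is hypothetical, and Esper's own date-time methods use their own names and optional parameters (see docs).

```java
/** The thirteen basic relations of Allen's interval algebra. */
enum AllenRelation {
    BEFORE, MEETS, OVERLAPS, STARTS, DURING, FINISHES, EQUALS,
    AFTER, MET_BY, OVERLAPPED_BY, STARTED_BY, CONTAINS, FINISHED_BY;

    /** Classifies interval [s1, e1] relative to [s2, e2]; requires start < end for both. */
    static AllenRelation of(long s1, long e1, long s2, long e2) {
        if (s1 >= e1 || s2 >= e2) {
            throw new IllegalArgumentException("start must be before end");
        }
        if (e1 < s2) return BEFORE;
        if (e1 == s2) return MEETS;
        if (s1 > e2) return AFTER;
        if (s1 == e2) return MET_BY;
        // From here on the intervals share more than a single point
        if (s1 == s2) return e1 == e2 ? EQUALS : (e1 < e2 ? STARTS : STARTED_BY);
        if (e1 == e2) return s1 > s2 ? FINISHES : FINISHED_BY;
        if (s1 < s2) return e1 < e2 ? OVERLAPS : CONTAINS;
        return e1 < e2 ? DURING : OVERLAPPED_BY; // s1 > s2
    }
}
```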
What is the difference between Esper and an in-memory database?
-
The Esper compiler and runtime work a bit like a database turned upside-down. Instead of storing the data and running queries against stored data, the Esper runtime allows applications to store queries and run the data through them. The Esper runtime responds in real time when conditions occur that match queries. The execution model is thus continuous, rather than running only when a query is submitted.
How does the runtime discern which data to retain? Is it based solely on the statements registered with the runtime at the time the event comes in?
-
Yes, the data that the runtime retains is based solely on the statements registered with the runtime. The runtime retains the minimum events and/or derived data needed to satisfy any started statements. Thus, if one has a runtime running and consuming events, but has no statements registered with the runtime, the runtime does not retain any data.
What or how many events does Esper keep in memory? Does Esper keep matching events of a statement in memory?
-
Memory use depends solely on the EPL statements. For example, if all your EPL statements are simply "select * from Event", then Esper keeps zero events in memory.
Esper keeps the events in memory for a certain amount of time only if one of the following conditions holds:
- Your EPL statement declares a data window, for example "select * from Event#time(1 second)". In this case Esper keeps exactly the last 1 second of events, according to runtime/event time, in memory.
- Your EPL statement declares a named window: Esper keeps the events according to the named window declaration.
- Your EPL statement declares an EPL pattern, for example "select * from pattern [a=Event -> b=Event]". For EPL patterns Esper keeps only tagged events in memory until the pattern either fires or its sub-expression ends. In the example, "a" and "b" are the tags. If you remove the tags from the EPL pattern, for example "select * from pattern [Event -> Event]", then the runtime does not keep any events in memory.
- Your EPL statement declares a match-recognize pattern and the measures-clause selects the variable. If the measures-clause does not select the variable the runtime does not keep the matching event in memory.
- If your EPL statement uses output rate limiting, depending on the output-clause and the hints, either the input events or the output events can be buffered until the output actually takes place (see docs for more information).
- If you change the default configuration and configure advanced threading such as inbound, outbound or route threading, there is a thread pool and a queue, and work waits in that queue until a pool thread processes it.
What happens on runtime start? I assume that if I have a time based statement, since there's no history within the runtime, there's no way to get any events to fire until the time and events have been consumed?
-
The Named Window feature allows starting statements from a prior event history. Named windows are similar to traditional tables and help to initialize new statements with data.
Currently the Esper runtime itself does not provide state persistence, fail-over or recovery, or an event replay mechanism. The Esper runtime does not write to disk, perform any IO, cluster, or otherwise persist runtime state. Thus if your application process or your hardware system goes down, the runtime state is lost.
If you need fail-over and/or recovery capability, then the EsperHA (Esper High Availability) product by EsperTech can be a good solution. Please contact us for further information.
Alternatively, your application could replay events into the runtime, but that is currently a process the application or middleware must perform, and Esper has no facilities for it.
Can statements be added to the system only on runtime start, or can they be added dynamically? Will those statements work with any internally stored historical data when they're started?
-
Statements can be compiled and deployed as well as undeployed while the runtime is running in a multithread-safe fashion. The facility to explicitly attach or initialize a new statement from a prior data window is a feature of named windows.
When working with composite streams, i.e. when using the 'insert into' mechanism, does the entity being inserted into have to be a registered object in the system or are those created simply by registering the statement?
-
No, there is no registration required. The compile and deploy of the statement that contains the insert-into clause creates the new stream and makes it available to use in further statements.
Could you explain the concept of data windows for a database programmer?
-
One could perhaps think of a table with a timestamp column containing the time when each row was inserted. We could create a view that sorts by timestamp descending and selects all rows between the current timestamp and, say, 1 minute prior to now. Every time we fire a query against this view, the view returns the rows added in the last minute. The rows returned are the contents of a 1-minute time window. Every time the query is fired we get a new window: older rows seem to leave the window while new rows seem to enter it.
What is the difference between "select * from MyEvent" and "select * from MyEvent#length(1)" and "select * from MyEvent#keepall()"?
-
The statement "select * from MyEvent" retains no event data, posts an insert stream and does not post a remove stream. If adding an aggregation function such as "select sum(qty) from MyEvent", then the statement returns the total quantity since the statement was started. The iterator method, when used on the statement, returns no events. The "previous" function is not available.
The statement "select * from MyEvent#length(1)" retains the last event and posts an insert and remove stream. If adding an aggregation function such as "select sum(qty) from MyEvent#length(1)", then the statement returns the quantity of the last event. The iterator method, when used on the statement, returns the last event. The "previous" function is available and can query the immediately previous event only.
The statement "select * from MyEvent#keepall()" retains all events since the statement was started. When using a named window, one can query such retained data via on-select or on-delete. If adding an aggregation function such as "select sum(qty) from MyEvent#keepall()", then the statement returns the total quantity since the statement was started. The iterator method, when used on the statement, returns all events since statement start. The "previous" function is available and can query any depth of previous events.
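The difference between #length(1) and #keepall() for "select sum(qty) ..." can be sketched in plain Java: a bounded window moves its oldest event to the remove stream and subtracts it from the sum, while an unbounded window never removes anything. This is an illustrative model only (WindowSum is a hypothetical class, and Integer.MAX_VALUE stands in for keepall); a statement without a data window would keep just the running sum and no events.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.OptionalInt;

/** Models sum(qty) over a length window; Integer.MAX_VALUE models keepall(). */
class WindowSum {
    private final int maxSize;
    private final Deque<Integer> events = new ArrayDeque<>(); // retained events, oldest first
    private long sum;
    private OptionalInt lastRemoved = OptionalInt.empty();

    WindowSum(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Inserts an event (insert stream) and returns the current sum(qty). */
    long onEvent(int qty) {
        lastRemoved = OptionalInt.empty();
        events.addLast(qty);
        sum += qty;
        if (events.size() > maxSize) {
            int removed = events.removeFirst(); // oldest event goes to the remove stream
            sum -= removed;
            lastRemoved = OptionalInt.of(removed);
        }
        return sum;
    }

    /** Event posted to the remove stream by the last insert, if any. */
    OptionalInt lastRemoved() {
        return lastRemoved;
    }

    /** Number of retained events, i.e. what iterating the statement would return. */
    int retained() {
        return events.size();
    }
}
```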
What happens if I send events to which no statement is created yet? Can a time-to-live be specified to retry matching an event?
-
Events that are of no interest to any started EPL statement are simply dropped. There is no internal queue they are retained in. To specify a time-to-live for unmatched events consider using a data window such as a time-window. When events leave the time window they are finally no longer considered, but while they are in the time window you can match, filter or aggregate, for example.
Your application can use the UnmatchedEventListener interface to catch an event that is dropped because no EPL statement needs to see it. Your application would need to retain and retry such unmatched events to achieve a time-to-live for unmatched events.
How can the timestamp for an event be explicitly controlled? What to do with event occurrence time, time synchronization or event timestamps?
-
Consider attaching timestamps to events and using the EPL support for Allen's interval algebra in the where-clause of joins and subqueries (see docs). You may declare a start timestamp property name and an end timestamp property name for event types; these work conveniently with interval algebra methods.
Consider controlling the concept of time in your application code: in Esper the concept of time is under the control of the application, via CurrentTimeEvent at the level of the runtime and via an isolated service provider at the level of statements. This allows any number of time dimensions, such as event origination time or transmission time, to be controlled separately.
Consider using the time-order view (ext:time_order) to reorder events arriving out of order. The time-order view can operate on a timestamp property that is part of your event. It works by buffering events for a short amount of time to allow late-arriving events to sort into place. The time-ordered output stream of the time-order view can then be used by the EPL pattern followed-by operator and by match-recognize regular expression patterns, which both provide a concise and convenient syntax but require that the pattern-specific input stream (not the arrival stream of events into the runtime) is ordered.
If you have events that are created by different computers, you may want all those computers to have synchronized time, e.g. using NTP: either the event itself already has a timestamp in it, or you add a timestamp when you send the event using any transport. Note that if you manually add a timestamp, that timestamp won't be exactly the real time at which the event was created; it will just be the time when the event was received by the computer that forwards it. Also investigate how precise you can get using NTP and how precise you need to be.
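The reordering performed by the time-order view can be sketched as a priority queue keyed on each event's own timestamp: events are held for a fixed interval and released in timestamp order once runtime time passes timestamp plus interval. This is a hypothetical illustration (TimeOrderBuffer is not an Esper class) of the buffering idea, not the exact semantics of the view.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Holds out-of-order events for a fixed interval and releases them sorted by timestamp. */
class TimeOrderBuffer<E> {
    private static final class Entry<T> {
        final long timestamp;
        final T event;

        Entry(long timestamp, T event) {
            this.timestamp = timestamp;
            this.event = event;
        }
    }

    private final long intervalMillis;
    private final PriorityQueue<Entry<E>> buffer =
            new PriorityQueue<>(Comparator.comparingLong((Entry<E> entry) -> entry.timestamp));

    TimeOrderBuffer(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Buffers an event carrying its own occurrence timestamp. */
    void add(long timestamp, E event) {
        buffer.add(new Entry<>(timestamp, event));
    }

    /** Advances runtime time and returns the events now safe to release, oldest first. */
    List<E> advanceTime(long nowMillis) {
        List<E> released = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp + intervalMillis <= nowMillis) {
            released.add(buffer.poll().event);
        }
        return released;
    }
}
```

The interval is the trade-off: a longer interval tolerates later arrivals but delays every event by that amount.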
Does the runtime make a copy of events?
-
The runtime never copies events, unless events are modified using on-update or update-istream.
Assume the following example statement:
select * from pattern[A -> every B]#time(1 min) as pair
For the above pattern, the runtime internally generates one intermediate event for each combination of A and B (one could name this event 'pair'). Each intermediate event is a Map that holds a reference to the original A event and a reference to the current B event. The time window holds intermediate events. Depending on your select clause the intermediate events get transformed into output events or output directly.
Integration
What additional components does Esper require to run?
-
Please see the feature list for this information. The "lib" folder in the source distribution also contains a readme file that describes the dependencies.
Esper builds and maintains all data structures in memory. It does not use or require any relational database or disk drive to run.
Can I run it with multiple threads? What, if anything, is multithread-safe?
-
All administrative and runtime operations are multithread-safe as of release 1.5 for all types of statements. Applications can perform multithreaded sends of events into the runtime as well as create, start and stop statements during operation, while retaining full control over threading and efficiently sharing resources between statements.
Additionally, Esper supports multiple independent Esper runtimes per Java VM. Thus applications can segregate work to multiple runtime instances allocating one or more threads to each runtime instance.
Iterating (pull-model) of result data by using a statement's safeIterator method concurrently to the statement's processing of events is also thread-safe.
Not thread-safe are the following: iterating via the iterator method (use safeIterator instead), the configuration API, and the SODA API when sharing a statement object model instance between threads.
What is the footprint of Esper in a typical installation, i.e. what is the RAM, disk and CPU usage?
-
The kernel itself is very lightweight and fits in a few MB of heap (RAM). Disk usage is also limited (logs, a jar file of about 2 MB including third-party jars, and a few KB for configuration files).
The CPU consumption is a function of the events entering and exiting the system, and also of the actual listeners you register with the statements.
The heap consumption (RAM) is proportional to the number of streams and statements you deal with and the window sizes (correlating / computing average over 100 events or 100000 events, or for 10 seconds or 10 days).
For more information, please check the performance section in the reference documentation.
If one overloads Esper with events through a queue, will Esper queue events internally until it processes them or will events stay in the queue until Esper process them?
-
Esper does not have an internal event queue. The threading is completely driven by the application that embeds Esper. Please see the documentation under API and threading for more information.
The EsperIO package has adapters for various queues.
Is there a way to send a bunch of events into Esper and get back a notification when it is done processing all events?
-
As discussed before, there is no internal event queue, and the application threads process all events. Esper does not have a built-in mechanism to bunch up messages; however, the Java concurrent library provides very good infrastructure for this. For example, a bunch of messages can be collected into a list, and a thread pool (see java.util.concurrent.Executors) can run a Callable that takes the list and sends the events to Esper.
Our benchmark kit provides an example built upon concurrent queues and thread pools.
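The batching approach described above can be sketched with java.util.concurrent: each batch is wrapped in a Callable, and the returned Future signals that all events in the batch were processed, since the thread calling the send method does all the work. The sender parameter is a placeholder assumption for a call such as runtime.getEventService().sendEventBean(event, "MyEvent"); BatchSender is a hypothetical class.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Sends batches of events on a thread pool; each Future completes when its batch is done. */
class BatchSender implements AutoCloseable {
    private final ExecutorService pool;
    private final Consumer<Object> sender; // stand-in for the call that sends one event into the runtime

    BatchSender(int threads, Consumer<Object> sender) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.sender = sender;
    }

    /** Submits a batch; the returned Future yields the number of events sent. */
    Future<Integer> submit(List<?> batch) {
        Callable<Integer> task = () -> {
            for (Object event : batch) {
                sender.accept(event); // the pool thread processes the event fully before continuing
            }
            return batch.size();
        };
        return pool.submit(task);
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

With more than one pool thread, batches run concurrently, so events from different batches may interleave; use a single thread if the order across batches matters.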
What is the policy you recommend on UpdateListener (or Observer) if UpdateListener does long processing? For example, why don't you create a thread or attach a queue for output in some examples?
-
A blocking UpdateListener or subscriber blocks event processing for that thread, unless configuring the outbound threading option that the runtime provides. Decoupling output processing via a further threadpool may have an advantage if output listener processing can be very slow, but incurs the cost of further threads and context switching.
One factor to consider is the number of output events. A second factor is the action that your application may perform for each output event, such as whether the event needs to be communicated to another system or simply displayed.
The examples generally leave the transport and threading out of the picture, since that is specific to any application and particular integration environment as well as event stream density. We also want to keep examples simple.
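If listener or subscriber processing is slow and you choose to decouple it yourself rather than configuring outbound threading, a minimal sketch is to hand each output off to a single-threaded executor, which keeps outputs in order and frees the event-processing thread. The generic Consumer-based wrapper AsyncOutput is a hypothetical stand-in; your UpdateListener or subscriber would call accept() with whatever data it extracts from the output events.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hands slow output processing to a dedicated thread; one thread preserves output order. */
class AsyncOutput<T> implements Consumer<T>, AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Consumer<T> slowHandler;

    AsyncOutput(Consumer<T> slowHandler) {
        this.slowHandler = slowHandler;
    }

    /** Called from the listener; returns immediately after queueing the work. */
    @Override
    public void accept(T output) {
        executor.execute(() -> slowHandler.accept(output));
    }

    /** Stops accepting work and waits for queued outputs to be processed. */
    @Override
    public void close() {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

The executor's queue is unbounded, so if outputs are produced faster than they are handled for a long period, memory use grows; that is part of the cost, alongside the extra thread and context switching.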
Have you tested Esper in an OSGI container?
-
Yes we have tested in Equinox and Felix containers. The Esper and EsperIO jar files ship with OSGI-compliant manifests that can be used as they are or modified if necessary. Esper dependencies are available for download from an OSGi bundle repository thus each dependent jar file can itself be a bundle and resolved by the OSGi container.
The next FAQ entry answers the ClassLoader question. If using Java classes as events in an OSGi container then such event classes may need to be on the system classpath or the thread's ClassLoader may need to be set explicitly by your application.
When using nested-Map or XML DOM as events then a regular import of the exported Esper packages should suffice.
What ClassLoader does it use? How do I get the class loading right in OSGI, ApacheAxis or other containers?
-
We have users that use Esper in an OSGi container and therefore consistently use the thread's context classloader, as the following example shows:
Class.forName(name, true, Thread.currentThread().getContextClassLoader()); // Java Reflection
We recommend that you use the same classloader with all entry points to Esper: creating and configuring the Esper runtime instance, creating a statement and supplying a subscriber or listener to it, and sending data to the Esper runtime via sendEvent() and route().
One user is using Esper inside an Axis2 web service. The class defining the service compiles and loads just fine, but the user got a runtime error from Esper: "Error configuring runtime: Event type .. was not found". For reasons the user didn't completely understand, Axis2 changes the classloader when starting the service. The user didn't know why, but expected that it would be critical for other Axis2 processing. So he copied a solution used by a colleague working on this project with a similar class loader issue:
    // get current context classloader
    ClassLoader contextClassloader = Thread.currentThread().getContextClassLoader();
    try {
        // then alter the context classloader to the one used to load this class itself
        Thread.currentThread().setContextClassLoader(this.getClass().getClassLoader());
        // create my Esper statement here
    } finally {
        // finally restore the classloader to its original value, even if statement creation fails
        Thread.currentThread().setContextClassLoader(contextClassloader);
    }
When sending events into an Esper runtime in a bundle, in the default configuration the thread that calls the send event method performs all the work, which may also require additional class loading. Consider configuring Esper with an inbound and outbound threadpool to have runtime threads perform this work. Please see the documentation on advanced threading options.
Can Esper run on small devices?
-
Esper is a 100% Java component and works anywhere the minimum required Java version is fully supported. The list of dependencies is small: the compiler requires just ANTLR, Janino and SLF4J, while the runtime requires only SLF4J. The compiler and runtime have no disk, device or storage dependency, and their memory and CPU requirements depend only on the statements you need.
How do I test EPL? Is there integration with a testing framework?
-
The best source of examples for testing is the regression test suite execution classes that can be found at this link: https://github.com/espertechinc/esper/tree/master/regression-lib/src/main/java/com/espertech/esper/regressionlib/suite.
When controlling time in a test case, use the "EPEventService#advanceTime" method to set or advance time. You would want to set time before creating a statement and advance time between events as needed (make sure you turn the internal timer off via configuration).
To help make the test cases in your test suite independent of each other, and independent of the order in which they are executed, use the initialize method to reset runtime state to the last provided configuration.
Consider implementing multi-threaded tests and/or a good simulator for production data to simulate production-like conditions early on.
The code snippet below provides a possible layout for test cases using JUnit:
    import com.espertech.esper.common.client.EPCompiled;
    import com.espertech.esper.common.client.configuration.Configuration;
    import com.espertech.esper.compiler.client.CompilerArguments;
    import com.espertech.esper.compiler.client.EPCompilerProvider;
    import com.espertech.esper.runtime.client.EPDeployment;
    import com.espertech.esper.runtime.client.EPRuntime;
    import com.espertech.esper.runtime.client.EPRuntimeProvider;
    import com.espertech.esper.runtime.client.EPStatement;
    import junit.framework.TestCase;

    public class TestSampleJUnit extends TestCase {
        private EPRuntime runtime;

        // Called by the testing framework before each test method
        public void setUp() {
            // configure with timer disabled, leaving the concept of time in control of this test
            Configuration config = new Configuration();
            config.getRuntime().getThreading().setInternalTimerEnabled(false);
            // register your own event class (MyEvent stands for your application's event class)
            config.getCommon().addEventType(MyEvent.class);
            // use the initialize method to reset the runtime to pristine state before each test, if desired
            runtime = EPRuntimeProvider.getDefaultRuntime(config);
            runtime.initialize();
        }

        public void testSampleOne() {
            // set time to a start time, let's say time zero (0), but it could be any time or System.currentTimeMillis()
            runtime.getEventService().advanceTime(0);
            EPStatement stmt = compileDeploy(runtime, "... my EPL statement here...").getStatements()[0];
            // add a listener or subscriber to the statement, or iterate the statement
            // send a new event object for each event
            runtime.getEventService().sendEventBean(new MyEvent(), "MyEvent");
            // advance time as needed, here we pretend 1 second (1000 milliseconds) passed
            runtime.getEventService().advanceTime(1000);
            // send more events and assert as needed
        }

        public void testSampleTwo() {
            // ...
        }

        private static EPDeployment compileDeploy(EPRuntime runtime, String epl) {
            try {
                // Obtain a copy of the runtime configuration
                Configuration configuration = runtime.getConfigurationDeepCopy();
                // Build compiler arguments
                CompilerArguments args = new CompilerArguments(configuration);
                // Make the existing EPL objects available to the compiler
                args.getPath().add(runtime.getRuntimePath());
                // Compile
                EPCompiled compiled = EPCompilerProvider.getCompiler().compile(epl, args);
                // Return the deployment
                return runtime.getDeploymentService().deploy(compiled);
            } catch (Exception ex) {
                throw new RuntimeException(ex);
            }
        }
    }