What is wrong with gen_event?

By Wojciech Gawroński | August 31, 2018

gen_event confused me from the beginning, so I investigated the topic more deeply in an earlier post. Then I left the topic alone, and it came back to me recently when I was wondering how the situation had changed since then. Here is the updated version of that initial investigation, which started with the following statement:

I never used a gen_event, I think it is a bad pattern.

At first, it may look like a controversial statement, but I have heard a lot of similar complaints from other people. I first heard that exact statement during Garrett Smith's presentation about pattern language - someone asked about that behavior at the end. Moreover, I heard a similar thing in José Valim's presentation about the future of Elixir a while ago, when he introduced the GenStage and GenRouter ideas for the first time.

However, before we dive into reasons and explanations, let’s recall what the purpose of this behavior is.

What is gen_event?

OTP introduces two different terms regarding that behavior - an event manager and event handler modules.

The event manager is a named object that can receive events. An event can be, for example, an error, an alarm, or some information that we log. Inside a manager, we can have zero, one, or more event handlers installed. The responsibility of a handler is to process an event.

When the event manager is notified about an event, all handlers process it. The easiest way to picture this is to think of the manager as a sink for incoming messages, and of handlers as different implementations that write those messages to disk, a database, or the terminal.

Another example is my implementation of Francesco Cesarini's assignment called the Wolves, Rabbits and Carrots simulation. The primary purpose of that assignment is to introduce you to concurrency, but internally it is a simulation. Inside, multiple events happen at the same time, and the rest of the entities receive notifications.

In that case, simulation_event_stream is an event manager.

We can quickly add and remove event handlers. The event manager essentially maintains a list of {Module, State} pairs, where each Module is an event handler, and State is the internal state of that event handler.
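To make the {Module, State} idea more concrete, here is a minimal sketch of starting a manager and then adding and removing handlers. It is not the code from the simulation: the module name event_stream_demo, the handler ids first and second, and the states state_a and state_b are all made up for illustration, and the manager is left unregistered to keep the example short.

```erlang
-module(event_stream_demo).
-behaviour(gen_event).

-export([demo/0]).
-export([init/1, handle_event/2, handle_call/2, terminate/2]).

%% Hypothetical no-op handler callbacks; this module doubles as the handler.
init(Args) -> {ok, Args}.
handle_event(_Event, State) -> {ok, State}.
handle_call(_Request, State) -> {ok, ok, State}.
terminate(_Reason, _State) -> ok.

demo() ->
    %% Start an unregistered, unlinked event manager with no handlers.
    {ok, Manager} = gen_event:start(),
    [] = gen_event:which_handlers(Manager),

    %% Install the same module twice under different ids (assumed names).
    %% Each installation keeps its own state inside the manager.
    ok = gen_event:add_handler(Manager, {?MODULE, first}, state_a),
    ok = gen_event:add_handler(Manager, {?MODULE, second}, state_b),
    Installed = lists:sort(gen_event:which_handlers(Manager)),

    %% delete_handler/3 returns whatever terminate/2 returns (ok here).
    ok = gen_event:delete_handler(Manager, {?MODULE, first}, removed_by_demo),
    Remaining = gen_event:which_handlers(Manager),

    ok = gen_event:stop(Manager),
    {Installed, Remaining}.
```

Calling event_stream_demo:demo() returns the handler lists before and after the removal, which shows that handlers are just entries inside the manager and not processes of their own.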

One of the handler implementations - simulation_cli_handler - writes messages to the console. It is an actual gen_event callback module, and all handlers are implementations of that same abstraction.
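A console-writing handler could look roughly like the sketch below. This is not the original simulation_cli_handler: the module name console_handler, the prefix argument, and the seen counter are assumptions I made to show the shape of a gen_event callback module and where its state lives.

```erlang
-module(console_handler).
-behaviour(gen_event).

-export([demo/0]).
-export([init/1, handle_event/2, handle_call/2, terminate/2]).

%% The argument given to add_handler/3 ends up here; the returned map
%% is the handler's private state, kept by the manager.
init(Prefix) ->
    {ok, #{prefix => Prefix, seen => 0}}.

%% Runs inside the manager process for every dispatched event.
handle_event(Event, State = #{prefix := Prefix, seen := Seen}) ->
    io:format("~s ~p~n", [Prefix, Event]),
    {ok, State#{seen := Seen + 1}}.

%% Lets a caller ask this particular handler how many events it has seen.
handle_call(seen, State = #{seen := Seen}) ->
    {ok, Seen, State}.

terminate(_Reason, _State) ->
    ok.

%% Hypothetical usage: install the handler, dispatch two events, query it.
demo() ->
    {ok, Manager} = gen_event:start(),
    ok = gen_event:add_handler(Manager, ?MODULE, "[sim]"),
    ok = gen_event:sync_notify(Manager, {rabbit, born}),
    ok = gen_event:sync_notify(Manager, {carrot, eaten}),
    Seen = gen_event:call(Manager, ?MODULE, seen),
    ok = gen_event:stop(Manager),
    Seen.
```

Note that there is no process started anywhere in the handler itself - the callbacks only ever run when the manager invokes them.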

The essential part, as far as the complaints mentioned above are concerned, is this: when we start an event manager, we spawn it as a process, while each event handler is implemented as a callback module. It means that the processing logic of every handler executes inside the manager process.

Why is it problematic?

Let me reiterate - after spawning a gen_event manager and installing handlers on it, the handlers exist in the same process as the manager.

That causes the two most significant issues - handlers are not concurrent, and they are not isolated from each other, at least not at the process level. Unfortunately, there is more. The statement quoted earlier, "I never used gen_event, I think it is a bad pattern", came with arguments that can be summed up as follows:

  • The behavior is not used anywhere besides error_logger and the alarm mechanism in OTP.
  • It causes problems with supervision (because combining the manager and handlers in one process is not a natural approach in Erlang).
  • It is tricky to use in a fault-tolerant way (as above - all handlers are bound together in a single process).
  • It is tricky to manage state in the manager. It may be tempting to use, e.g., the process dictionary, but you should push the state down to the handlers (which is not apparent at first sight).

So let’s analyze the causes of each complaint separately.

Not widely used in ERTS and OTP

That argument is partially correct - as a behavior, it is used for the error_logger and alarm_handler facilities. Is that a reason to drop the behavior entirely? No, but I think it is a hint that the responsibilities and use cases of that behavior are limited, and much narrower than those we try to assign to it.

It is the same process for all handlers

It was not explicit on the list, but it manifests itself when it comes to failure handling and supervision. It also has another really significant drawback, which is obvious when you think about it - all handlers are invoked synchronously and sequentially in one process.

To dispatch an event to the manager, you can use one of two gen_event functions - notify and sync_notify. With the first, you can dispatch an event as quickly as possible, but no backpressure is applied, so you can end up in a situation where events arrive at high speed while processing is slower. That causes the manager's message queue to grow, and eventually it can even cause a crash. It also does not check whether the manager is present, so you can end up throwing messages into the void. On the other hand, synchronous dispatch waits until all handlers have processed the event, which can be slow and eventually become a system bottleneck.
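The difference between the two functions is easy to observe with a deliberately slow handler. The sketch below is only an illustration: the module name notify_demo, the 100 ms delay, and the tick event are my assumptions. It addresses the manager by pid; as far as I know, notify on a locally registered name that no longer exists fails with badarg instead of silently succeeding, so the "void" case applies to pids (and global or via names).

```erlang
-module(notify_demo).
-behaviour(gen_event).

-export([demo/0]).
-export([init/1, handle_event/2, handle_call/2, terminate/2]).

%% Hypothetical slow handler: every event blocks the manager for DelayMs.
init(DelayMs) -> {ok, DelayMs}.
handle_event(_Event, DelayMs) ->
    timer:sleep(DelayMs),
    {ok, DelayMs}.
handle_call(_Request, DelayMs) -> {ok, ok, DelayMs}.
terminate(_Reason, _DelayMs) -> ok.

demo() ->
    {ok, Manager} = gen_event:start(),
    ok = gen_event:add_handler(Manager, ?MODULE, 100),

    %% notify/2 only sends a message, so it returns almost immediately.
    {AsyncUs, ok} = timer:tc(fun() -> gen_event:notify(Manager, tick) end),

    %% sync_notify/2 returns only after all handlers processed the event.
    %% Here it also waits behind the asynchronous event queued above.
    {SyncUs, ok} = timer:tc(fun() -> gen_event:sync_notify(Manager, tick) end),

    ok = gen_event:stop(Manager),

    %% The manager is gone, yet notify/2 on its pid still returns ok.
    Lost = gen_event:notify(Manager, tick),

    #{async_us => AsyncUs, sync_us => SyncUs, notify_to_dead_pid => Lost}.
```

The returned timings are in microseconds. The exact values depend on the machine, but the synchronous call should take at least as long as the handler's delay, while the asynchronous one tells you nothing about processing at all.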

This problem is also very nicely described in Nick DeMonner's talk from this year's ElixirConf US - check it out if you are interested. Elixir's GenEvent implementation also has a third function, ack_notify, which acknowledges incoming messages. It is softer than sync_notify, but still asynchronous when it comes to processing.

It is hard to supervise

When you approach Erlang as a newcomer, fascinated by the mantra that everything should be a process, the worst possible thing that can happen is to bring along assumptions about event handling from other platforms or languages. Why? Well, my "oh crap" moment about how things work came when I started observer and looked for the handler processes. At that moment I realized: oh crap, I have a single process here.

This behavior hides the complexity underneath, and its assumptions are perfectly reasonable for that model of dispatching (if we separated handlers from the manager, reliable dispatch would be much harder to achieve, e.g., when it comes to fault tolerance). Still, it is simply counterintuitive when it comes to the Erlang philosophy, especially for newcomers.

One thing worth mentioning is that you can work around that. One example is making a synchronous call to a different process (in particular, even to another gen_server) from inside the gen_event handler.

Those solutions are feasible, but they complicate the implementation and do not provide an easy way of handling failures. What happens if the other process is overloaded with messages and the synchronous call times out? Replacing it with an asynchronous call does not help either.

Failure handling

The obvious question, once you realize that handlers and the manager coexist in the same process, is: what happens if there is a fault in an installed event handler module?

It may sound strange at first, but the manager silently removes the faulty event handler. It does produce an error report printed on the terminal, but nothing more. Well-known monitoring techniques, such as links or monitors, cannot be used with the event handler module, because it is not a process. Moreover, faulty event handler code does not crash the manager.
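The silent removal is easy to reproduce. In the sketch below, the module name faulty_handler_demo and the boom event are hypothetical; the handler simply raises an error for that one event, and we compare the list of installed handlers before and after.

```erlang
-module(faulty_handler_demo).
-behaviour(gen_event).

-export([demo/0]).
-export([init/1, handle_event/2, handle_call/2, terminate/2]).

init(Args) -> {ok, Args}.

%% Hypothetical fault: this handler crashes on the boom event.
handle_event(boom, _State) -> error(simulated_failure);
handle_event(_Event, State) -> {ok, State}.

handle_call(_Request, State) -> {ok, ok, State}.
terminate(_Reason, _State) -> ok.

demo() ->
    {ok, Manager} = gen_event:start(),
    ok = gen_event:add_handler(Manager, ?MODULE, []),
    Before = gen_event:which_handlers(Manager),

    %% The caller still gets ok, even though the handler crashed.
    ok = gen_event:sync_notify(Manager, boom),

    %% The handler is gone, while the manager process is still alive.
    After = gen_event:which_handlers(Manager),
    Alive = is_process_alive(Manager),

    ok = gen_event:stop(Manager),
    {Before, After, Alive}.
```

Apart from the error report in the log, nothing tells the dispatching process that its handler no longer exists.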

We can use a different facility exposed by gen_event, called add_sup_handler. It supervises the connection between the handler and the process that installed it. What does it mean? If the event handler is deleted due to a fault, the manager sends a message {gen_event_EXIT, Handler, Reason} to the process that called add_sup_handler. It means that we need to provide an additional process, often called a guardian, for the possibly faulty handler. Then, we dispatch the event through that guardian process, and when it receives the failure message (via handle_info) we can act according to the requirements.
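Here is a minimal sketch of the guardian idea, under a few assumptions. To keep everything in one module, the guardian is a plain process with a receive block instead of a gen_server (in a gen_server the same message would arrive in handle_info/2), and events are dispatched directly to the manager rather than through the guardian. The names guardian_demo, guardian/2, and the boom event are made up for illustration.

```erlang
-module(guardian_demo).
-behaviour(gen_event).

-export([demo/0]).
-export([init/1, handle_event/2, handle_call/2, terminate/2]).

%% Hypothetical handler that crashes on the boom event.
init(Args) -> {ok, Args}.
handle_event(boom, _State) -> error(simulated_failure);
handle_event(_Event, State) -> {ok, State}.
handle_call(_Request, State) -> {ok, ok, State}.
terminate(_Reason, _State) -> ok.

%% Guardian process: installs the supervised handler, then waits for the
%% removal notification and forwards it to Parent.
guardian(Manager, Parent) ->
    ok = gen_event:add_sup_handler(Manager, ?MODULE, []),
    Parent ! {self(), installed},
    receive
        {gen_event_EXIT, Handler, Reason} ->
            %% A real guardian could reinstall the handler or stop here.
            Parent ! {self(), handler_removed, Handler, Reason}
    end.

demo() ->
    {ok, Manager} = gen_event:start(),
    Parent = self(),
    Guardian = spawn(fun() -> guardian(Manager, Parent) end),
    receive {Guardian, installed} -> ok end,

    %% Trigger the fault; the manager removes the handler and tells
    %% the guardian about it.
    ok = gen_event:sync_notify(Manager, boom),
    Result =
        receive
            {Guardian, handler_removed, Handler, Reason} -> {Handler, Reason}
        after 1000 ->
            timeout
        end,

    ok = gen_event:stop(Manager),
    Result.
```

For a crashing callback, the reason delivered to the guardian is an {'EXIT', ...} tuple wrapping the original error, so the guardian can decide whether reinstalling the handler makes sense.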

Keep in mind that underneath it uses links, not monitors - the event handler chapter from Learn You Some Erlang For Great Good! has an excellent explanation of why that may be dangerous and what issues it causes. Long story short, after using add_sup_handler you need to be cautious when it comes to the event manager shutdown.

Interestingly, Elixir's version of that behavior solved that problem in the past by exposing add_mon_handler/3, which used a monitor under the hood. Still, both solutions have another problem - the {gen_event_EXIT, Handler, Reason} message is not delivered when the manager process crashes. You need to prepare for this additional edge case - either monitor the manager, or link to it and trap exit signals in every process that installs a handler.

Moreover, Elixir does not expose it anymore and the whole GenEvent behavior is deprecated, so you should either use the Erlang one directly or pick one of the alternatives.

State management

One more thing that I cannot stress enough is state management: you should always pass the state down to your handlers. The code above describes that, but it also matters for fault tolerance - each handler can be removed due to a failed operation, and when we restore it, we can pass in a new state. If we preserve the state of that handler in the manager (and build a facility for exposing it), it may cause strange and hard-to-debug side effects related to the state of the newly created handler.

Alternatives?

Is there something that we can use instead? Without using third parties (like uwiger/gproc) I am afraid that there is nothing like that in the core.

If you are interested, Elixir has the GenStage behavior, which addresses that case. José has explained the idea many times, and the official documentation suggests that path as well. The official blog post with the announcement contains a very good explanation of how you can use GenStage instead of GenEvent.

Another solution mentioned there is replacing GenEvent with a Supervisor and GenServers - this article presents the concept clearly. However, do not use it blindly - at the end of the post, there is a list of drawbacks. One of them is that it still does not provide any mechanism for backpressure, so your mileage may vary. 😉

Summary

If you think about it more carefully, it is not a particularly useful behavior, because it has minimal capabilities and responsibilities. Maybe that is the reason why it is used so rarely inside OTP itself. It also means that we should not bend it to our use cases. If the specific application is very similar to the one used inside OTP (I mean the error_logger) and we do not need concurrency in the processing logic, we can safely use it. Otherwise, we bring trouble on ourselves. 😉

It is a typical example of caveat emptor - let the buyer beware.
