indexzero
6/21/2011 - 12:20 AM

An analysis of the data structures used in EventEmitter2 (http://github.com/hij1nx/eventemitter2)

Previously, events were stored in an n-ary tree of arbitrary depth (i.e. a deeply nested JavaScript object literal), the nodes of which are the individual segments of a split event. e.g.

var emitter = new EventEmitter2({ delimiter: ':' });
emitter.on('foo:bar:*', function () { });
emitter.on('foo:foo:foo', function () { });
emitter.on('bazz:buzz:foo', function () { });
emitter.on('bazz:fizz:tar', function () { });

Would result in

foo
`-bar
|  `-*
`-foo
|  `-foo
|
bazz
`-buzz
| `-foo
`-fizz
  `-tar
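Concretely, that tree is just a nested object literal. A sketch of the shape (the `_listeners` leaf property is an assumed name for illustration, not necessarily what EventEmitter2 uses internally):

```javascript
// Placeholder listener
function noop() {}

// Sketch of the n-ary tree: each delimiter-separated segment of the
// event name becomes a nesting level; listener arrays live at the leaves.
var listenerTree = {
  foo: {
    bar: { '*':  { _listeners: [noop] } },
    foo: { foo:  { _listeners: [noop] } }
  },
  bazz: {
    buzz: { foo: { _listeners: [noop] } },
    fizz: { tar: { _listeners: [noop] } }
  }
};
```

Emitting 'foo:bar:baz' then only has to descend the `foo` branch; the entire `bazz` subtree is pruned from the search.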

Currently events are stored in a hash table (i.e. a JavaScript object literal of depth 1), the keys of which are fully qualified event names. So the same code above would result in:

foo:bar:*
|
foo:foo:foo
|
bazz:buzz:foo
|
bazz:fizz:tar
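In object-literal terms this is simply a depth-1 map from fully qualified event names to listener arrays (again a sketch of the shape, not the exact internals):

```javascript
// Placeholder listener
function noop() {}

// Sketch of the flat hash-table layout: the fully qualified event
// name (wildcards included) is the key; listener arrays are the values.
var listeners = {
  'foo:bar:*':     [noop],
  'foo:foo:foo':   [noop],
  'bazz:buzz:foo': [noop],
  'bazz:fizz:tar': [noop]
};
```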

In both cases we need to consider the Worst Case, Best Case, and Average Case and compare these to how we might expect the emitter to actually be used by a developer:

N-ary tree of arbitrary depth

Worst Case The worst case is clearly the degeneration of the tree into a linear data structure, something like:

foo
`-bar
  `-bazz
    `-buzz
      `-etc-etc-etc

This would get rid of any benefits of the pruning search that was previously performed and degenerate to O(n) performance for a given call to .emit().

Best Case The best case is a set of event statements which result in a balanced tree (balanced like an AVL tree, although not self-balancing). This would result in access, insertion, and deletion times matching those of an AVL tree, or O(log n).

Average Case This is not a formal average case analysis. I think it is somewhat reasonable to assume that there will be a considerable amount of grouping in namespaces (i.e. developers will tend to have 5-10 namespaces per level). This is good for us and pushes the average case towards the best case because this will result in a tree that is at least somewhat balanced.

Hash table

Worst Case / Best Case The worst case here is the same as the best case because the current implementation iterates over the keys in the hash table as a linear data structure. So in all cases (best and worst) it is an O(n) operation, since complete enumeration of the hash table occurs.
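A sketch of why complete enumeration makes every emit O(n) in the number of registered keys (hypothetical code, not the actual EventEmitter2 implementation; the wildcard-matching rule is assumed for illustration):

```javascript
// True if a registered key (which may contain '*' segments)
// matches the emitted event name.
function matches(key, event, delimiter) {
  var keyParts = key.split(delimiter);
  var evtParts = event.split(delimiter);
  if (keyParts.length !== evtParts.length) return false;
  return keyParts.every(function (part, i) {
    return part === '*' || part === evtParts[i];
  });
}

// Every emit must check every key: O(n) over the hash table,
// with no way to prune unrelated namespaces.
function emit(listeners, event, delimiter) {
  var fired = 0;
  Object.keys(listeners).forEach(function (key) {
    if (matches(key, event, delimiter)) {
      listeners[key].forEach(function (fn) { fn(); });
      fired++;
    }
  });
  return fired;
}
```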

Average Case This is not a formal average case analysis. I think that it is reasonable for a developer to want to have a lot of event listeners. Given the benefits of namespacing I could see something like:

  var emitter = new EventEmitter2({ delimiter: ':' });
  
  //
  // Warning: pseudo-code to do something with Twitter
  // trending topics, like hookup to the firehose or whatever
  //
  var trendingTopics = Twitter.get(TRENDING_TOPICS);
  Twitter.on('message', function (msg) {
    var topic = infer.topic.from(msg);
    emitter.emit('message:' + topic, msg);
  });

  //
  // Do something different on each trending topic
  //
  trendingTopics.forEach(function (topic) {
    emitter.on(['message', topic].join(':'), function () {
      //
      // Do stuff on a message from this topic
      //
    });
  });

Assuming there are several hundred trending topics, I think you would start to see performance decline significantly as the O(n) implementation (while more succinct) fails to ignore large portions of the event tree.

Contradictory Empirical Evidence

I ran the benchmarks against HEAD and 17c006a (immediately before the rewrite) and I only saw a ~50ms increase in the time to execute which suggests that even given the degenerative nature of the O(n) access time this may be an acceptable approach. However, I would like to see these benchmarks run with hundreds of event listeners, not just one event listener.
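Such a benchmark might look something like the following. This is a hypothetical sketch that times the flat-scan strategy in isolation rather than EventEmitter2 itself, with a simple wildcard-matching routine assumed for illustration:

```javascript
// Segment-wise match of a registered key (possibly containing '*')
// against an emitted event name.
function matches(key, event, delimiter) {
  var kp = key.split(delimiter), ep = event.split(delimiter);
  if (kp.length !== ep.length) return false;
  for (var i = 0; i < kp.length; i++) {
    if (kp[i] !== '*' && kp[i] !== ep[i]) return false;
  }
  return true;
}

// Register 500 namespaced listeners in a flat hash.
var listeners = {};
for (var i = 0; i < 500; i++) {
  listeners['message:topic' + i] = [function () {}];
}

// Time 1000 emits; each one enumerates all 500 keys.
var start = Date.now(), fired = 0;
for (var j = 0; j < 1000; j++) {
  Object.keys(listeners).forEach(function (key) {
    if (matches(key, 'message:topic250', ':')) {
      listeners[key].forEach(function (fn) { fn(); });
      fired++;
    }
  });
}
console.log('1000 emits over 500 listeners: ' + (Date.now() - start) + 'ms');
```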