indexzero
6/21/2011 - 12:20 AM

An analysis of the data structures used in EventEmitter2 (http://github.com/hij1nx/eventemitter2)

Previously, events were stored in an n-ary tree of arbitrary depth (i.e. a deeply nested JavaScript object literal), the nodes of which are the individual segments of a split event. e.g.

var emitter = new EventEmitter2({ delimiter: ':' });
emitter.on('foo:bar:*', function () { });
emitter.on('foo:foo:foo', function () { });
emitter.on('bazz:buzz:foo', function () { });
emitter.on('bazz:fizz:tar', function () { });

Would result in

foo
`-bar
|  `-*
`-foo
|  `-foo
|
bazz
`-buzz
| `-foo
`-fizz
  `-tar
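Concretely, that tree is just a nested object literal. A sketch of the shape (the `_listeners` leaf property is an assumed name for illustration, not necessarily what EventEmitter2 uses internally):

```javascript
// Placeholder listener
function noop() {}

// Sketch of the n-ary tree: each delimiter-separated segment of the
// event name becomes a nesting level; listener arrays live at the leaves.
var listenerTree = {
  foo: {
    bar: { '*':  { _listeners: [noop] } },
    foo: { foo:  { _listeners: [noop] } }
  },
  bazz: {
    buzz: { foo: { _listeners: [noop] } },
    fizz: { tar: { _listeners: [noop] } }
  }
};
```

Emitting 'foo:bar:baz' then only has to descend the `foo` branch; the entire `bazz` subtree is pruned from the search.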

Currently events are stored in a hash table (i.e. a JavaScript object literal of depth 1), the keys of which are fully qualified event names. So the same code above would result in:

foo:bar:*
|
foo:foo:foo
|
bazz:buzz:foo
|
bazz:fizz:tar
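In object-literal terms this is simply a depth-1 map from fully qualified event names to listener arrays (again a sketch of the shape, not the exact internals):

```javascript
// Placeholder listener
function noop() {}

// Sketch of the flat hash-table layout: the fully qualified event
// name (wildcards included) is the key; listener arrays are the values.
var listeners = {
  'foo:bar:*':     [noop],
  'foo:foo:foo':   [noop],
  'bazz:buzz:foo': [noop],
  'bazz:fizz:tar': [noop]
};
```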

In both cases we need to consider the Worst Case, Best Case, and Average Case and compare these to how we might expect the emitter to actually be used by a developer:

N-ary tree of arbitrary depth

Worst Case The worst case is clearly the degeneration of the tree into a linear data structure, something like:

foo
`-bar
  `-bazz
    `-buzz
      `-etc-etc-etc

This would get rid of any benefits of the pruning search that was previously performed and degenerate to O(n) performance for a given call to .emit().

Best Case The best case is a set of event statements which result in a balanced tree (balanced like an AVL tree, although not self-balancing). This would result in access, insertion, and deletion times matching those of an AVL tree, or O(log n).

Average Case This is not a formal average case analysis. I think it is somewhat reasonable to assume that there will be a considerable amount of grouping in namespaces (i.e. developers will tend to have 5-10 namespaces per level). This is good for us and pushes the average case towards the best case because this will result in a tree that is at least somewhat balanced.

Hash table

Worst Case / Best Case The worst case here is the same as the best case because the current implementation iterates over the keys in the hash table as a linear data structure. So in all cases (best and worst) it is an O(n) operation, since complete enumeration of the hash table occurs.
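A sketch of why complete enumeration makes every emit O(n) in the number of registered keys (hypothetical code, not the actual EventEmitter2 implementation; the wildcard-matching rule is assumed for illustration):

```javascript
// True if a registered key (which may contain '*' segments)
// matches the emitted event name.
function matches(key, event, delimiter) {
  var keyParts = key.split(delimiter);
  var evtParts = event.split(delimiter);
  if (keyParts.length !== evtParts.length) return false;
  return keyParts.every(function (part, i) {
    return part === '*' || part === evtParts[i];
  });
}

// Every emit must check every key: O(n) over the hash table,
// with no way to prune unrelated namespaces.
function emit(listeners, event, delimiter) {
  var fired = 0;
  Object.keys(listeners).forEach(function (key) {
    if (matches(key, event, delimiter)) {
      listeners[key].forEach(function (fn) { fn(); });
      fired++;
    }
  });
  return fired;
}
```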

Average Case This is not a formal average case analysis. I think that it is reasonable for a developer to want to have a lot of event listeners. Given the benefits of namespacing I could see something like:

  var emitter = new EventEmitter2({ delimiter: ':' });
  
  //
  // Warning: pseudo-code to do something with Twitter
  // trending topics, like hookup to the firehose or whatever
  //
  var trendingTopics = Twitter.get(TRENDING_TOPICS);
  Twitter.on('message', function (msg) {
    var topic = infer.topic.from(msg);
    emitter.emit('message:' + topic, msg);
  });

  //
  // Do something different on each trending topic
  //
  trendingTopics.forEach(function (topic) {
    emitter.on(['message', topic].join(':'), function () {
      //
      // Do stuff on a message from this topic
      //
    });
  });

Assuming there are several hundred trending topics, I think you would start to see performance decline significantly as the O(n) implementation (while more succinct) fails to ignore large portions of the event tree.

Contradictory Empirical Evidence

I ran the benchmarks against HEAD and 17c006a (immediately before the rewrite) and I only saw a ~50ms increase in the time to execute which suggests that even given the degenerative nature of the O(n) access time this may be an acceptable approach. However, I would like to see these benchmarks run with hundreds of event listeners, not just one event listener.
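Such a benchmark might look something like the following. This is a hypothetical sketch that times the flat-scan strategy in isolation rather than EventEmitter2 itself, with a simple wildcard-matching routine assumed for illustration:

```javascript
// Segment-wise match of a registered key (possibly containing '*')
// against an emitted event name.
function matches(key, event, delimiter) {
  var kp = key.split(delimiter), ep = event.split(delimiter);
  if (kp.length !== ep.length) return false;
  for (var i = 0; i < kp.length; i++) {
    if (kp[i] !== '*' && kp[i] !== ep[i]) return false;
  }
  return true;
}

// Register 500 namespaced listeners in a flat hash.
var listeners = {};
for (var i = 0; i < 500; i++) {
  listeners['message:topic' + i] = [function () {}];
}

// Time 1000 emits; each one enumerates all 500 keys.
var start = Date.now(), fired = 0;
for (var j = 0; j < 1000; j++) {
  Object.keys(listeners).forEach(function (key) {
    if (matches(key, 'message:topic250', ':')) {
      listeners[key].forEach(function (fn) { fn(); });
      fired++;
    }
  });
}
console.log('1000 emits over 500 listeners: ' + (Date.now() - start) + 'ms');
```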