What have you found for these years?

2011-03-17

asynchronized heroku-scaler in ruby 1.9 (2)

It's been a while since the first launch of heroku-scaler.
Since then, our biggest problem has been that an exception
raised in an EventMachine callback brings down the
entire Thin server. This is not acceptable.

I thought `EM.error_handler` would handle this, but it seems
to work only in some circumstances, not all. Sometimes
exceptions were handled properly, and sometimes the server
just crashed.

Finally I decided to dig into this and solve it once
and for all. (Sorry, I am too lazy to dig into things unless
someone really needs it.)

According to the EventMachine code, the exception is
rescued, then a stop of the machine is scheduled, and
finally the rescued exception is re-raised.

It seems there used to be handlers to deal with
this, but they are commented out with the note:
"No one was using it, and it degraded performance significantly"
I am not sure why *no one was using it*, but that's it...

So what I can do here is overwrite `EM.stop` (yes, overwrite, not
override: it is not designed to be overridable, especially
since EventMachine is used inside Thin here, not in our
application) and check whether an exception was
stored in `@wrapped_exception`. If so, report it to the
monitor, then erase it to pretend nothing happened.

But if `@wrapped_exception` is nil, then just call the original
`EM.stop` to schedule a regular stop.
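
The idea above can be sketched roughly as follows. To keep the sketch
runnable without EventMachine installed, it patches a small stand-in
module, `FakeReactor`, which is hypothetical and only imitates the
rescue / store / stop / re-raise flow described in this post. The names
`REPORTED` and `stop_without_swallow` are also made up for illustration.
With the real library one would presumably reopen `EventMachine` the same
way, assuming the installed version still keeps the exception in
`@wrapped_exception`.

```ruby
# Hypothetical stand-in for the relevant part of EventMachine.
module FakeReactor
  class << self
    attr_reader :stopped

    def reset
      @stopped = false
      @wrapped_exception = nil
    end

    # Imitates the flow described above: rescue the exception,
    # remember it, then schedule a stop.
    def run_callback
      yield
    rescue StandardError => e
      @wrapped_exception = e
      stop
    end

    def stop
      @stopped = true
    end

    # Once the loop has ended, the remembered exception is raised again.
    def finish
      raise @wrapped_exception if @wrapped_exception
    end
  end
end

# Stand-in for "the monitor": here just an array collecting exceptions.
REPORTED = []

# The overwrite: keep the original stop under another name, then replace it.
module FakeReactor
  class << self
    alias_method :stop_without_swallow, :stop

    def stop
      if @wrapped_exception
        REPORTED << @wrapped_exception  # report it to the monitor
        @wrapped_exception = nil        # pretend nothing happened
      else
        stop_without_swallow            # a regular, intentional stop
      end
    end
  end
end
```

After `FakeReactor.reset`, a callback that raises inside
`FakeReactor.run_callback { ... }` ends up in `REPORTED` and leaves the
reactor running, while a plain `FakeReactor.stop` still stops it.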

At first I tried to do an un-stop, but that code is written in
C(++), and the scheduling state is stored in a global boolean variable!
I don't want to write a C extension for this... :s

Anyway, I just want to say that there should be an easy way to
handle this, instead of digging into the code and monkey patching it.

*

I don't know, but I feel that although EventMachine is fairly famous
and popular, not many people use it directly. I wonder if that's
because it's too hard, or because it's a powerful tool that only
power users use, for whom this is trivial to solve, so
no one talks about it on the Internet.

Weird...

p.s. 0
Ramaze did not work correctly when I tried to
use `env['async.callback'].call`. manveru told me that
Ramaze is thread-safe but not em-safe (not designed
for EM). I tried changing Thread to Fiber to see
if that would solve the problem, but somehow it only
worked halfway... I am too lazy to dig into it to see why :(
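
For reference, this is roughly what that call looks like in a bare Rack
app, outside of Ramaze. It is only a sketch: `Scheduler` is a hypothetical
stand-in for something like `EM.next_tick`, so that the example runs
without EventMachine, and `ASYNC_APP` is a made-up name. What it relies on
from Thin is the convention that the app throws `:async` and later hands
the real response to `env['async.callback']`.

```ruby
# Hypothetical stand-in for EM.next_tick: queue blocks, run them later.
module Scheduler
  QUEUE = []

  def self.later(&block)
    QUEUE << block
  end

  def self.run_all
    QUEUE.shift.call until QUEUE.empty?
  end
end

# A bare Rack app that answers asynchronously.
ASYNC_APP = lambda do |env|
  Scheduler.later do
    # Some time later, pass the real response triplet to the server.
    env['async.callback'].call(
      [200, { 'Content-Type' => 'text/plain' }, ['done']]
    )
  end
  # Tell the server that the response will arrive through async.callback.
  throw :async
end
```

A server (or a test) would wrap the call in `catch(:async) { ASYNC_APP.call(env) }`
and receive the response once the scheduled block has run.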

p.s. 1
Rainbows with EventMachine as its concurrency model
behaves very oddly. For one, it consumes too much
CPU; for another, there are delays between
HTTP requests. I am... I guess I don't have enough
knowledge to dig into this. I should probably report
this to Eric Wong.

All texts are licensed under CC Attribution 3.0