2010-09-17

asynchronous heroku-scaler in ruby 1.9

Thu, Sep 9, 2010 at 3:33 AM
Umm... where to start?
Since heroku-scaler runs on Ramaze on top of EventMachine (Thin),
it can run asynchronously. Actually, we're doing that already in the
background job (a periodic timer) that checks Heroku dynos and pings
the app itself to prevent Heroku from shutting down "idle" dynos.
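
For reference, the shape of that background job is roughly this. It's only a sketch: the ping URL, the 10-second interval, and the setup are invented, not the actual heroku-scaler code.

```ruby
require 'eventmachine'
require 'em-http'

# Sketch of the periodic background job described above. PING_URL and the
# 10-second interval are invented for illustration.
PING_URL = 'http://our-scaler.heroku.com/ping'

EM.run do
  EM.add_periodic_timer(10) do
    # ping ourselves so Heroku doesn't idle the dyno (already evented)
    EM::HttpRequest.new(PING_URL).get

    # ...then check the dynos of each monitored app; this is where the
    # blocking Heroku::Client comes in, see below
  end
end
```

The ping itself is already evented; the trouble is everything else the timer does.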

But...

Actually, they're not really asynchronous, since we're mixing our code with
Heroku::Client, which *is* synchronous. So our code blocks there.
This isn't a big deal when running on Heroku, because the requests are
very fast and we don't monitor many applications, so we might not notice
much impact. Things are very different running locally (in Taiwan), or
if we monitor many applications.

When we query the number of dynos with Heroku::Client, it uses
rest-client (which rest-graph uses, too), which in turn uses Ruby's built-in
Net::HTTP to make the HTTP requests. Actually, it makes *3* HTTP requests
per run.

If we're monitoring 5 applications and the check runs every 10 seconds,
that's at least 15 (3 times 5) HTTP requests just to query the numbers of
dynos. This is a *huge* pain if each request takes 100ms. I'm not sure how
many requests setting the dynos takes; I didn't check. But the querying
alone is already unacceptable when running locally. Yes, we won't run it
locally in production, but it's still a potential issue, IMHO, say if we
have 10+ applications to monitor. It's just like creating spikes to spike
ourselves... Moreover, there would be a big gap between the first check
and the last check, since everything executes sequentially!
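
Just to make those numbers concrete (all figures are the ones above, nothing measured):

```ruby
# Back-of-the-envelope cost of one sequential sweep: 5 apps, 3 requests
# per query, ~100ms per request when running locally.
apps       = 5
requests   = 3
latency    = 0.100                  # seconds per request

per_app    = requests * latency     # => 0.3s blocked per app
full_sweep = apps * per_app         # => 1.5s blocked per sweep
gap        = (apps - 1) * per_app   # => 1.2s until the last check even
                                    #    starts, since it's all serial
puts "per app: #{per_app}s, sweep: #{full_sweep}s, first-to-last gap: #{gap}s"
```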

So we'd better make Heroku::Client evented (asynchronous) as well.

But now we face another issue. Asynchronous programs are a lot harder
to build than synchronous ones. I felt some pain writing rest-api-cacher
back in the day. The problem is that in synchronous programs, everything
happens one-by-one, and one-by-one; while in asynchronous programs, every
asynchronous part is nested, and nested, and nested...

That's still acceptable if we're building programs from scratch. It's not
so easy, but once you're used to it, it feels natural. The problem is...
converting existing synchronous programs to asynchronous ones is a *huge*
pain. In other words, it's not worth the effort. At least I don't want to
do it.

em-synchrony[0] might come to the rescue.

In short, em-synchrony wraps every asynchronous operation in a fiber,
which pauses execution while waiting for a callback and resumes it when
the callback gets called. It's OK not to understand the details; we just
need to remember that this can be *transparent* to the client code. That
is, we write in a synchronous style, while under the hood it executes
asynchronously.
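
This isn't em-synchrony's actual code, but the core trick looks roughly like this (`async_call` is a made-up stand-in for any evented operation that takes a callback):

```ruby
require 'fiber'

# Pause the current fiber until an evented operation's callback fires,
# then resume it with the result. Must be called from a non-root fiber
# running inside the reactor.
def wait_for(async_call)
  fiber = Fiber.current
  async_call.call(lambda { |result| fiber.resume(result) })  # resume later
  Fiber.yield                                                # pause here;
end                                                          # returns the result
```

em-synchrony does essentially this inside every client it patches, which is why the calling code reads straight top-to-bottom while the reactor keeps spinning underneath.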

This way, it's sooooo easy to convert an existing synchronous program to
an asynchronous one. In this case, that program is Heroku::Client.

I've already written this monkey patch[1] and it seems to work. (hurray)

We'll need a Fiber.new{ }.resume to wrap the code wherever we use
Heroku::Client. Also, if we're making these synchronous-style asynchronous
HTTP requests inside a Ramaze action, we'll need rack-fiber_pool[2] as well.
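
Concretely, the wrapping looks something like this. It's a sketch: `heroku` stands for the patched client from [1] (assuming its `info` call), and `scale` is a made-up helper.

```ruby
# Wrap the Heroku::Client call site in a fiber, so the patched client can
# yield instead of blocking. `heroku` is the patched client from [1];
# `scale` is a made-up helper.
Fiber.new do
  info = heroku.info('some-app')   # pauses this fiber, not the reactor
  scale('some-app', info)
end.resume

# For evented requests made inside Ramaze actions, rack-fiber_pool gives
# every request its own fiber; in config.ru:
#
#   use Rack::FiberPool
```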

I've removed that requirement with this commit: [...]
Since we can read the apps from memory instead of fetching them from the
database, the background job (the periodic timer) now does that work for
the Ramaze actions.

Now the background jobs don't block anymore, all the checks begin at
(nearly) the same time (the difference is very noticeable running
locally), and the Ramaze actions no longer touch HTTP requests, so
nothing blocks now, neither the background checks nor the incoming
requests.
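
The shape of that change, roughly (variable names are invented; this is the idea, not the actual commit):

```ruby
# The periodic timer keeps an in-memory snapshot up to date, one fiber per
# app so the checks start together; Ramaze actions only ever read the
# snapshot, so they never touch HTTP or the database themselves.
$apps = {}   # app name => latest known info (invented structure)

EM.add_periodic_timer(10) do
  $apps.keys.each do |name|
    Fiber.new { $apps[name] = heroku.info(name) }.resume
  end
end

# In a Ramaze action, something like:
#   def index
#     $apps.inspect   # read-only, never blocks, no fiber needed
#   end
```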

The bad news is that fibers are only available in Ruby 1.9, so I've only
pushed this to a separate branch instead of master...

I'm testing this on Ruby 1.9.2, but Heroku only supports up to 1.9.1.
I'm not sure if there's any significant difference; at least the
1.8.6 => 1.8.7 jump was a lot bigger than 1.9.1 => 1.9.2...

cheers,

[0] I've pasted this slide many times already; still, read it for a
detailed explanation.
http://en.oreilly.com/rails2010/public/schedule/detail/14096
http://github.com/igrigorik/em-synchrony

[1] http://gist.github.com/570648
[2] http://github.com/mperham/rack-fiber_pool




*** Tue, 14 Sep at 5:52pm via email ***

I forgot to mention Rails.

Ramaze has a much better design, but... Rails has many more tools!
For example, with Ramaze I need to take care of cleaning up the data
between each test, while Rails does this for us. Rails also has better
error messages; Ramaze does not, and sometimes it gives mysterious error
messages when you misuse the API.

In short, unless we feel a strong need for concurrency, sticking with
Rails could reduce the cost of developing our own tools. I'm still not
sure about Rails 3. (We could try it on the next new project.)




*** 2010-09-17 13:50 ***

The hardest problem I've met so far is that EventMachine crashes easily
when it hits my bugs. We want the server to live as long as it can while
it's actually serving requests or running the periodic jobs. I thought
`EM.error_handler` would do the trick: catch all exceptions and send the
reports to getexceptional[3]. But in reality EM crashed, and there was a
double free error in Ruby 1.9.2!! I'm not sure what's wrong; is it an EM
bug, a Ruby bug, or do both of them have bugs?
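
For the record, this is all I had expected to need; EM.error_handler is real EM API, and the reporter call is only a stand-in for the getexceptional hook.

```ruby
require 'eventmachine'

EM.run do
  # Every exception raised inside the reactor is supposed to end up here
  # instead of killing it. (report_to_exceptional is a stand-in, not a
  # real method.)
  EM.error_handler do |e|
    # report_to_exceptional(e)
    warn "#{e.class}: #{e.message}"
  end

  EM.add_timer(0)   { raise 'boom' }   # should be caught by the handler,
                                       # not crash the process
  EM.add_timer(0.1) { EM.stop }
end
```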

I can't google anything meaningful about EM.error_handler.
There is so little discussion about it. It seems not many people use
EventMachine directly, so only EM's own developers are discussing it.
This reminds me of a post[4] titled:
Why I Don't Like EventMachine, And Why You Should Use Rev (and Revactor) Instead
I haven't read it carefully. I should, but I've been very, very busy
recently and can't find a good time to calm down and read it peacefully.

I'm starting to wonder whether EventMachine is a good implementation.
I can't judge from only glancing at the C++ code.

On the other hand, em-synchrony is good, but sometimes there are errors
saying "FiberError: can't yield from root fiber". I don't know exactly
what the reason is, since I've already wrapped a lot of fibers around the
outside. All I can do is wrap more and keep trial-and-erroring. This is
tiring as well.
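
For what it's worth, the usual way to hit that error is calling an em-synchrony-patched method while still on the root fiber, i.e. with no Fiber.new (or similar) wrapped around it. The URL here is just an example:

```ruby
require 'em-synchrony'
require 'em-synchrony/em-http'

EM.run do
  # The patched #get calls Fiber.yield; on the root fiber there is nothing
  # to yield back to, so Ruby raises "can't yield from root fiber":
  #
  #   EM::HttpRequest.new('http://example.com/').get
  #
  # Wrapped in a fiber, the same call is fine:
  Fiber.new do
    EM::HttpRequest.new('http://example.com/').get
    EM.stop
  end.resume
end
```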

Evented code is harder to write (well) than I imagined.

In conclusion, there are two major issues that need to be solved more elegantly.

1. Catch-all error_handler? Working or not?
2. When should we wrap code in a fiber when using em-synchrony?

And minor issues:

1. em-http-request's API is hard to use. We need a rest-client-like layer on top of it.
2. em-synchrony should not override #get and #post, since that changes the
semantics of the original methods. Instead, it should provide #sget and
#spost, rather than aliasing the old ones to #aget and #apost (see the
sketch below).
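
To spell out that second point, here is the behaviour as I understand the patch (the URL is just an example):

```ruby
require 'em-synchrony'
require 'em-synchrony/em-http'

EM.run do
  Fiber.new do
    conn = EM::HttpRequest.new('http://example.com/')

    page = conn.get                    # patched: pauses the fiber and
    puts page.response_header.status   # returns the finished response

    # conn.aget                        # the original async #get lives on
                                       # under this name
    EM.stop
  end.resume
end
```

Something like #sget for the fiber-aware version would leave #get meaning what em-http-request says it means.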

[3] http://getexceptional.com/
[4] http://www.unlimitednovelty.com/2010/08/multithreaded-rails-is-generally-better.html



