What have you found for these years?

2012-02-19

Fiber-aware asynchronous rest-core

Alright, this is a long story. I'll try to keep it short to save myself
some time, but well, I am a bit excited to announce that this finally
works, after all that hard work...

To begin with, I implemented asynchronous HTTP requests in
rest-graph long ago, using em-http-request. Multi-requests are
supported too, but let's forget about that for the moment.

Later I started to work on rest-core, which is the successor of
rest-graph. rest-more, which collects pre-built clients, was then
spun off from rest-core.

I did implement most of the features from rest-graph in rest-core
and rest-more. To this day, a few features are still missing from
rest-core and rest-more, namely the test utilities and the
multi-requests I just mentioned above. Other than that, they are
all implemented.

Asynchrony support is *just done*, living in the async branch of
rest-core and rest-more. It's not yet released as a gem, but when I
think it's ready, I will release it as 0.9.

Setting up for Rails


For now, we're testing it by putting this in our Gemfile:
gem 'rest-more', :git => 'git://github.com/cardinalblue/rest-more.git',
                 :submodules => true,
                 :branch     => 'async'
And we're using Zbatery (Rainbows (Unicorn)) as the web server,
wrapping around EventMachine and Fibers, with this configuration:
worker_processes 4 # assuming four CPU cores
preload_app      true

Rainbows! do
  use :EventMachine, :em_client_class => lambda{
    RainbowsEventMachineFiberClient
  }
  worker_connections        100

  client_max_body_size      20*1024*1024 # 20 megabytes
  client_header_buffer_size  8*1024      #  8 kilobytes
end

require 'rest-core'
::RC::Builder.default_app = ::RC::EmHttpRequestFiber

class RainbowsEventMachineFiberClient < Rainbows::EventMachine::Client
  def app_call input
    Fiber.new{ super }.resume
  end
end
Run the server with this in Procfile for Heroku's Cedar stack:
ruby -r bundler/setup -S zbatery \
-c config/rainbows.rb -p $PORT -E $RACK_ENV

There are some quirks in this setup which I don't understand. For example,
why, if we use Thin, would `rake assets:precompile` try to connect to
the database and then fail on Heroku, while with Zbatery it doesn't,
and the assets are compiled successfully? And why, if we use Thin, is
Redis.current = ... enough for redis-objects, while
Zbatery needs $redis = Redis.current = ..., otherwise
redis-objects won't be connected correctly? I spent a lot of time
trying to fix this, but I still don't understand what's going on.

I saw someone saying that we need an after_fork hook to
re-establish the connections for ActiveRecord and Redis in Rainbows,
but well, Zbatery doesn't fork at all.

Also, running Thin we can lazily compile the assets on the fly,
while under Zbatery the asset pipeline causes a SystemStackError
(stack level too deep)! I don't get it, but that's why I spent a lot of
time trying to fix assets precompilation for Heroku... Well, fixing that
is very tedious work, and I don't really want to talk about it here.
Just remember that it is very hard to fix because of the way Rails
loads things.

Anyway, now it works... Let's put our focus on what's more important.

Fiber-aware asynchronous rest-core?

The problem of synchronous I/O


Let's start with the traditional blocking, synchronous style. OK, I don't
think I need to explain this. Everyone knows that it is very bad for
concurrency. One Rails process can only serve one request at a time?
This must be fixed. Suppose we're developing a website which has an
API, and we want to call that API from our own website. Well, we can't.

We can't!? you yell. OK, well, I yelled.

No, we can't. Because Rails can only handle one request at a time,
no incoming request can be served until the first one is finished.
That is, we can't call our own API from within a request: the request
would wait forever for a server that is busy serving it. It's called
a deadlock.



So we'll need to launch multiple processes to handle this situation.
Unicorn can solve this beautifully. We give it a Unicorn configuration
saying that we want worker_processes 2, and then the Unicorn master
handles the load balancing for us, serving the first request
in one worker and the API request in another worker.

Well, then what about calling an external API? We all know Facebook's
API is painful to work with. Sometimes it needs over 10 seconds
to serve our API request, for merely a simple query.

It just happens, occasionally. We don't really want to rely on our luck,
praying that Facebook is up and fast all the time. An API call may block
the request, but it shouldn't block the entire server. If people know how
to make our server call Facebook, they could mount a DoS attack easily.
And actually, our websites rely heavily on Facebook...

Running several processes reduces the problem, but doesn't solve it,
nor can Thin or Unicorn really help here. Using background workers to
call the API might be another way to reduce the problem, but it's not so
natural: the work flow and the process are entirely different, and it's
not really fixing the problem either, just moving it. Hey, now our workers
block... our workers are suffering from DoS attacks too. Running
more workers? Buying more machines? Renting more servers?

The problem of asynchronous I/O


OK so, asynchronous I/O is the way to solve it, just like Node.js. Is it?
Well, sort of. The idea is very simple. Instead of blocking the rest
of the program, we tell the HTTP client (or any other client): whenever
the response is back from the upstream, call me! This is the so-called
Hollywood principle: "Don't call us, we'll call you."



This way, while the server is waiting for the response, it can go on
to serve another request. Once the response is back, the server
calls the callback which the original requester gave it earlier.
And then we finish the original request in that callback.
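
To make the idea concrete, here is a minimal sketch of the callback style
with a toy reactor. Nothing here touches the network, and ToyReactor and
async_get are made-up names for illustration only, not part of rest-core
or any real library.

```ruby
# A toy reactor: just a queue of pending "requests" and their callbacks.
# ToyReactor and async_get are hypothetical names for illustration.
class ToyReactor
  def initialize
    @pending = []
  end

  # Register a request and the callback to run once its response arrives.
  def async_get(path, &callback)
    @pending << [path, callback]
    nil # returns immediately, the caller is not blocked
  end

  # The event loop: hand a fake response to each callback, in order.
  def run
    until @pending.empty?
      path, callback = @pending.shift
      callback.call("response for #{path}")
    end
  end
end

def hollywood_demo
  log     = []
  reactor = ToyReactor.new
  reactor.async_get('/a') { |r| log << r }
  reactor.async_get('/b') { |r| log << r }
  log << 'both requests queued' # runs before any callback fires
  reactor.run
  log
end

p hollywood_demo
```

The point to notice is the order of the log: the line after the two
async_get calls runs first, and the responses only show up once the
reactor gets to run.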

Sounds good. But really, this gets very complex very soon.
Suppose we need to make 10 requests in a row, and the most
important thing is, we can't make those 10 requests at the same
time, because each one depends on the previous one's
response. So we must make the requests sequentially, no matter what.

So this is the problem. It's hard to program like that. Previously,
we programmed like this:
r0 = get
r1 = get r0
r2 = get r1
# ...
Now we program like this:
get{ |r0|
  get(r0){ |r1|
    get(r1){ |r2|
      # ...
    }
  }
}
Looks like a monad without syntactic sugar (do notation), hmm?
Anyway, forget about monads, we can't use monads in Ruby...

Not a problem? Well, I don't know how to show this more clearly,
because I decided to retreat as soon as I could smell it...

All I can say is that it gets nastier and nastier very soon.
We'll need to create a lot of closures in order to keep the values,
or we'll need to pass a lot of values along the way, way, way to the
deep, deep end.



So, to the rescue.

Fibers, or, Coroutines


I am not going to explain all the details here. That would be worth
another post. But here's the concept: fibers are manually scheduled
threads. Like threads, each fiber has its own call stack, and switching
between them requires a context switch.

Let's look at the code first. Remember what we were doing for
asynchronous requests. We're going to add fibers there:
f = Fiber.current
get    { |r0| f.resume(r0) }
r0 = Fiber.yield
get(r0){ |r1| f.resume(r1) }
r1 = Fiber.yield
get(r1){ |r2| f.resume(r2) }
r2 = Fiber.yield
# ...
OK, I guess that doesn't really look like "adding" fibers, but that's what a
fiber-aware asynchronous program looks like under the hood.
Yes, under the hood. This can easily be wrapped as:
r0 = get
r1 = get r0
r2 = get r1
# ...
Yes, exactly the same as a synchronous program. We just need to define
get as a wrapper around the asynchronous get:
def get arg=nil
  f = Fiber.current                                    # remember who is asking
  asynchrony_get(arg){ |response| f.resume(response) } # wake us up later
  Fiber.yield                                          # suspend until resumed
end
That's all it takes to do the trick, making asynchronous code look
exactly the same as synchronous code, with *little effort*.
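
Here is a runnable sketch of that wrapper, so you can see it work without
EventMachine or cool.io. The toy reactor is an assumption for illustration:
asynchrony_get only queues the request, and run_reactor plays the event
loop, answering each request with a made-up response (the argument plus
one).

```ruby
require 'fiber' # needed for Fiber.current on older Rubies, harmless later

PENDING = [] # the toy reactor's queue of [argument, callback] pairs

# Callback-style request: queue it and return at once (hypothetical).
def asynchrony_get(arg, &callback)
  PENDING << [arg, callback]
  nil
end

# The fiber-aware wrapper, same shape as in the post.
def get(arg = nil)
  f = Fiber.current
  asynchrony_get(arg) { |response| f.resume(response) }
  Fiber.yield # suspend this fiber until the callback resumes it
end

# Toy event loop: answer each queued request with a made-up response.
def run_reactor
  until PENDING.empty?
    arg, callback = PENDING.shift
    callback.call(arg.to_i + 1)
  end
end

def sequential_demo
  result = nil
  Fiber.new {
    r0 = get     # looks synchronous, but suspends at Fiber.yield
    r1 = get r0  # each request depends on the previous response
    r2 = get r1
    result = [r0, r1, r2]
  }.resume
  run_reactor    # the "event loop" drives the fiber to completion
  result
end

p sequential_demo
```

Note that get must be called inside a fiber other than the root fiber,
which is why the demo wraps the three calls in Fiber.new.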

So how does that work? First we remember the current fiber, which we
can think of as the current context. Then we make the asynchronous
call, and the program goes straight through it, since the callback block
is only called after the response is back. Actually the program is
not really making the request yet. It's more like putting the request in a
queue which the reactor (or event loop) will look into later.

Here comes the first dragon: Fiber.yield. This tells the root fiber,
or the main fiber, which was suspended until now, to continue its work.
That is, the reactor, or the event loop, keeps working to serve other requests.

Fiber.yield does block here, but it only blocks the context of
that fiber. The root fiber or other fibers can then take control of
the program (the CPU) and do the rest of their work. Note that only *one* fiber
can run at a time. That is, we're controlling the scheduling between all those fibers.
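
A small sketch of what this buys us, assuming a toy reactor again
(fake_io, wait_for_io and handle_request are made-up names): two "requests"
are handled in two fibers, and the second one starts while the first one is
still waiting for its I/O.

```ruby
require 'fiber' # needed for Fiber.current on older Rubies, harmless later

PENDING_IO = [] # toy reactor queue of callbacks waiting for "responses"
EVENTS     = [] # records who ran, in order

# Hypothetical asynchronous I/O: just queue the callback.
def fake_io(&callback)
  PENDING_IO << callback
end

# Suspend the calling fiber until the toy reactor "completes" the I/O.
def wait_for_io
  f = Fiber.current
  fake_io { f.resume }
  Fiber.yield
end

# Handle one request in its own fiber.
def handle_request(name)
  Fiber.new {
    EVENTS << "#{name}: start"
    wait_for_io
    EVENTS << "#{name}: done"
  }.resume
end

def interleaving_demo
  EVENTS.clear
  handle_request('A') # runs until A waits, then control returns here
  handle_request('B') # B starts while A is still waiting
  PENDING_IO.shift.call until PENDING_IO.empty? # the "reactor" loop
  EVENTS.dup
end

p interleaving_demo
```

Only one fiber is ever running, but neither request blocks the other
while it waits.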

Now, suppose the reactor has finished the request. Then the reactor calls the
callback we gave above: { |response| f.resume(response) }

Here comes the second dragon: resume! The reactor, or the root fiber, is telling that
fiber to resume its work, handing over the response. Then the program jumps back to
the Fiber.yield at the end of that get method definition, which returns the response.

Wait, you ask, what about the root fiber?

It's suspended, just like Fiber.yield would suspend the current fiber. Telling any
fiber other than the root fiber to resume also means suspending the root fiber.
They are sort of dual. One suspends, another resumes. One resumes, another suspends.
That's why they are called coroutines -- cooperative routines.

The root fiber is resumed whenever the resumed fiber calls Fiber.yield (or simply finishes).
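
The hand-over can be traced with nothing but plain Ruby fibers. This is
only a sketch to show who runs when, and which values travel through resume
and Fiber.yield; control_transfer_trace is a name made up for the demo.

```ruby
require 'fiber' # for Fiber#alive? on older Rubies, harmless later

def control_transfer_trace
  trace = []
  fiber = Fiber.new do |first|
    trace << "fiber got #{first}"
    second = Fiber.yield(:paused) # suspend, hand :paused to the resumer
    trace << "fiber got #{second}"
    :finished                     # the block's value ends the last resume
  end

  trace << "root got #{fiber.resume(1)}" # root suspends, fiber runs
  trace << "root got #{fiber.resume(2)}" # jumps back into Fiber.yield
  trace << "alive? #{fiber.alive?}"
  trace
end

puts control_transfer_trace
```

The argument of resume becomes the return value of Fiber.yield, and the
argument of Fiber.yield becomes the return value of resume. That is the
duality in both directions.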

You might ask, why is it Fiber.yield, and not f.yield as in f.resume?
Good question. I guess it's because you can only yield from the current fiber,
not from any other fiber. It's meaningless to tell a suspended fiber to suspend. It's
already suspended. So Fiber.yield is effectively what Fiber.current.yield would be,
and it saves us some typing and the errors from
already_suspended_fiber.yield. So why not?

Fiber-aware asynchronous rest-core?


Finally we can talk about rest-core. As you can see, fiber-aware asynchronous I/O
is built on top of asynchronous I/O. It doesn't mean callbacks are wrong, they are
just too hard to work with, so we need to wrap around them.

That means I still need to provide asynchronous I/O, that is, the callback style, in
rest-core to begin with. And actually the process of providing this facility was
a bit painful...

If you know the internals of rest-core, it's a kind of Russian doll. The innermost
doll is the real HTTP client, which is wrapped in a middleware, which
is wrapped in a middleware, which is wrapped in a middleware, ...
and so on and so forth.

So how do we pass the callback to the HTTP client? Well...



We need to pass the callback all the way down to the innermost HTTP client...
Some middlewares also need to hook their own callbacks into this callback,
because they need to post-process the response. In the end, the real callback
passed to the HTTP client is a huge nested callback.

The good news is... I've handled all that for you... as long as you're not writing
your own middleware which needs to post-process the response.
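
To give a feeling for the nesting, here is a minimal sketch of the Russian
doll. This is NOT rest-core's real middleware interface. Tag, FakeHttpClient
and the env keys are all made up for illustration. Each middleware wraps the
callback it receives before passing it further in.

```ruby
# The innermost doll: a fake HTTP client that "responds" right away.
# (Hypothetical; a real client would call back from the reactor later.)
class FakeHttpClient
  def call(env, &callback)
    callback.call(env.merge(:seen => [], :body => 'hello'))
  end
end

# A middleware which records its name on the way in and on the way out.
class Tag
  def initialize(app, name)
    @app, @name = app, name
  end

  def call(env, &callback)
    # The request phase runs outside-in ...
    env = env.merge(:path => env[:path] + [@name])
    @app.call(env) do |response|
      # ... and the response phase runs inside-out, through the
      # nested callbacks, before reaching the caller's callback.
      callback.call(response.merge(:seen => response[:seen] + [@name]))
    end
  end
end

def russian_doll_demo
  app    = Tag.new(Tag.new(FakeHttpClient.new, 'inner'), 'outer')
  result = nil
  app.call(:path => []) { |response| result = response }
  result
end

p russian_doll_demo
```

The request passes outer then inner, while the response is post-processed
by inner first and outer last, which is exactly why the callback handed to
the HTTP client ends up being a callback inside a callback inside a callback.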

So, actually this was done long ago, long ago. I just didn't have enough motivation
to finish it, and I should have written more tests for it. But to keep it agile, I decided
to try it first before properly testing it! A bit ironic, hmm? I thought agile development
should be test-driven? But I don't really want to write detailed tests if I don't even
know whether this is going to run... maybe I should call it prototype-driven development...

Anyway. Before I jumped into all the details, I also tried another experiment in my
sandbox on GitHub, to find out which concurrency model in Rainbows suits me best.
Is it FiberSpawn? Or CoolioFiberSpawn (this one only exists in the code)?
Or EventMachine? The experiment code is here: config.ru

As you can see, CoolioFiberSpawn works the best, and EventMachine is tricky:
I need to use async-rack and rack-fiber_pool together! Otherwise, it does not
work correctly. I didn't really know why at the time I was doing this experiment,
but now I know. It's just the wrong approach. Fibers should wrap the
client, not the app. Also, throw :async is a totally wrong approach.

Anyway, CoolioFiberSpawn looked perfect. So I started to work on that instead of
EventMachine. But then I realized that cool.io is not a finished product... SSL support
was missing, which means I couldn't make HTTPS requests in cool.io. That's not acceptable...

Thanks to Tony's help, I successfully wrote very simple SSL support for
cool.io, using an approach similar to the one in rev, after realizing that cool.io
is very similar to rev. After this was done, I put it in my HTTP client for cool.io,
cool.io-http. It's easier to use than the client bundled with cool.io.

At this point, all components I needed were prepared.

Then there was a long pause, a very long pause, maybe months,
because I lacked the motivation to keep going. Tony is not very
interested in cool.io anymore. Yes, I could wait for celluloid-io,
which should be the successor of cool.io, but does Rainbows
support it? Can I fight alone, or should I just wait?

Finish the missing piece


Frankly, cool.io is fun to work with. Although it lacks a lot of features and polish,
it's really much better than working with eventmachine. Working with eventmachine
is sometimes painful. The API is not as intuitive as that of most good Ruby libraries.
It's a bit twisted. No, I am not talking about Twisted, I have never used it. I am
just saying eventmachine's API is hard to use, and cool.io's is much better.

We don't have to use EM.run to wrap the entire program. We just add watchers
to the loop, and at the end tell the loop to start running. It's much easier to work with.

So of course I tried cool.io first, with my cool.io-http which supports SSL.

Well, it works perfectly, in trivial tests. Since all the hard work was already done
before, I only spent a few hours, if not minutes, finishing all the "apps" in rest-core
yesterday, including RestCore::Coolio, RestCore::CoolioFiber, RestCore::EmHttpRequest,
and RestCore::EmHttpRequestFiber.

Again, there's an example showing how to use them in async.rb, with these contents:
require 'rest-more'

# RC::Builder.default_app = # for global default_app setting

puts "RC::Coolio"
RC::Facebook.builder.default_app = RC::Coolio
RC::Facebook.new.get('4'){ |r| p r }
Coolio::Loop.default.run
puts



puts "RC::CoolioFiber"
RC::Facebook.builder.default_app = RC::CoolioFiber
Fiber.new{ p RC::Facebook.new.get('4'); puts "DONE" }.resume
Coolio::Loop.default.run
puts



puts "RC::EmHttpRequest"
RC::Facebook.builder.default_app = RC::EmHttpRequest
EM.run{ RC::Facebook.new.get('4'){ |r| p r; EM.stop } }
puts



puts "RC::EmHttpRequestFiber"
RC::Facebook.builder.default_app = RC::EmHttpRequestFiber
EM.run{ Fiber.new{ p RC::Facebook.new.get('4'); puts "DONE"; EM.stop }.resume}
And today I was excited and tried to use this async rest-core on our websites.
The result is... some parts are frustrating, but some parts are a total success.
cool.io-http is frustrating, while em-http-request is a total success, if we don't
count the pain of using it in the first place.

cool.io-http did work, but not all the time. There were a lot of small issues. Sometimes
the response was nil, and I have no idea why. Sometimes there was an HTTP parse error... wtf?
I don't remember all of them. It's just very weird, and it seems I need to handle a lot of
corner cases, because in all the trivial tests it works fine. Well, I don't think
I am going to put much more effort into cool.io-http, considering it's a dead end.

I was a bit frustrated after realizing this, until I figured out how to use eventmachine
with fibers for this. As you can see, there's no EventMachineFiberSpawn in Rainbows
to match CoolioFiberSpawn. That's a very important reason why I tried cool.io
first. It's natively supported in Rainbows!

Yeah, I know there's NeverBlock, which is actually a kind of EventMachineFiberSpawn.
But hey, it's not maintained! And I don't even know which one is the official gem.
According to rubygems.org, this one is the one in the gem, but Rainbows points to
this one from escape. One has not been updated for 2 years, and the other for 3 years.
How could I trust them? Even rest-core has more downloads, more watchers...

But surprisingly, it works perfectly.

No errors, and very fast compared to cool.io. Everything looks good...
Except that I still cannot accept using an unmaintained project,
and actually it shouldn't be too hard to apply fibers myself...

So I looked into Rainbows again, trying to see how I could implement
that just like NeverBlock does. And then I came up with RainbowsEventMachineFiberClient,
shown at the very beginning in rainbows.rb.

Rainbows and Unicorn are so well written. Eric is amazing.
I should really be more patient and read through all the code.

Everything works great, except for the quirks I mentioned at the very beginning.
I guess that's not a very big deal, since they are already worked around. I just don't
know the reason behind them. I would guess it's Rails magic... considering
there is a ton of wtf magic in Rails, and I am so used to it!

Lastly


I really need to go to sleep. Anyway, last words. Thank you all for the great libs
and for your patience in reading my long whines. I hope this works out,
and then I'll release a new gem soon.

And if someone wants to write tests for me :P

Cheers,

2 retries:

Unknown said...

Awesome Lin! Thanks for working on this. Async features for Facebook are very important given how inconsistent their API has been.




All texts are licensed under CC Attribution 3.0