3 Dec 2009 21:00
Re: ejabberd_router blocking
Badlop <badlop <at> gmail.com>
2009-12-03 20:00:35 GMT
2009/12/1 Andy Skelton <skeltoac <at> gmail.com>:
I've investigated ejabberd code with Alexey Shchepin, see the summary:
ejabberd_router.erl is the main stanza router, and there are two
methods to send it a stanza:
A) call the function route/3
B) send an erlang message to the erlang process 'ejabberd_router'.
> When the memory problem occurs there is
> only one process that is eating RAM: ejabberd_router. It builds up a
> huge message queue which requires gigabytes of RAM.
> the
> filter_packet hook blocks ejabberd_router. If anything on that hook
> ever gets slow the entire router queue will wait. It must be one of my
> packet filters blocking the router and causing the pile-up.
As you noticed, method B funnels every stanza through a single process
for the whole ejabberd node, so anything slow there blocks everything.
That is a bad idea. Your initial solution was to parallelize method B.
Method A blocks only the calling process, which may be a c2s associated
with a client session, an s2s associated with a remote server...
That is parallelized by design.
Consequently, using method A is preferable over B.
> Our throughput is almost all pubsub events.
Method A is commonly used, but there are still some instances of method B:
$ git grep "ejabberd_router:route(" | wc -l
235
$ git grep "ejabberd_router \! {route" | wc -l
19
There are 18 in mod_pubsub, and one in mod_vcard.
It's very easy to change those to method A: just replace
ejabberd_router ! {route, From, To, Stanza}
with:
ejabberd_router:route(From, To, Stanza)
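
For a quick local test, the replacement above could be scripted roughly
like this. It is only a sketch: rewrite_route is a hypothetical helper
name, and it assumes each send sits on a single line with no nested
tuples inside the {route, ...} message. Any occurrence that does not fit
that shape is left untouched and has to be edited by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of ejabberd): rewrite "method B" sends
# into "method A" calls. Reads the named files, or stdin if none are
# given, and writes the result to stdout.
# Assumption: the whole send is on one line and the tuple contains no
# nested braces; lines that do not match are passed through unchanged.
rewrite_route() {
    sed -E 's/ejabberd_router[[:space:]]*![[:space:]]*\{route,[[:space:]]*([^{}]*)\}/ejabberd_router:route(\1)/g' "$@"
}
```

For example, "rewrite_route mod_vcard.erl > mod_vcard.erl.new" and then
diff the two files before replacing the original. Note that in Erlang the
"!" expression evaluates to the message that was sent, so any place that
uses the value of the send needs a manual look.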
I'll check with Christophe Romain whether it's completely safe
to make those changes in mod_pubsub.
Tracked in: https://support.process-one.net/browse/EJAB-1114
For testing, you can make those changes yourself, or wait for a patch.
---
Badlop
ProcessOne