1 Apr 2004 19:32
Re: Messages sent through heartbeat are not received.
Steve Dobbelstein <steved <at> us.ibm.com>
2004-04-01 17:32:04 GMT
Alan Robertson wrote:
> Horms wrote:
> > On Wed, Mar 31, 2004 at 06:33:38PM -0600, Steve Dobbelstein wrote:
>
> >>I work on the Enterprise Volume Management System (EVMS) which has a
> >>plug-in to support clustering features under Linux-HA. One of our test
> >>clusters is running fine with heartbeat 1.1.3. I figured we should move
> >>up to the latest heartbeat, so I installed 1.2.0 on another test cluster.
> >>
> >>The EVMS HA plug-ins fail to start up on 1.2.0. I installed the latest
> >>CVS code and am getting the same results.
> >>
> >>Debugging the problem further, I am finding that messages sent by the
> >>plug-in on one node through heartbeat are not being received on the
> >>other node. The plug-in uses
> >>heartbeat_handle->llc_ops->sendnodemsg(heartbeat_handle, msg, node);
> >>(heartbeat_handle is what was returned from ll_cluster_new("heartbeat");)
> >>That succeeds, but I don't see a callback for delivery of the message
> >>on the other node.
> >>
> >>I realize this description is very sketchy. I'm not sure what kind of
> >>information one needs to debug this problem. I will be happy to provide
> >>more information (configuration files, logs, test runs, etc.).
> >
> >
> > The plugin implementation changed significantly between 1.1.X and 1.2.X.
> > You will need to update your plugin accordingly.
>
> What it amounts to is that certain misuses of the interface were
> "harmless" in 1.1.x, and became fatal in 1.2.x.
>
> In particular, this would work in 1.1.x:
>
> while (select() > 0) {
>     read a message
> }
>
> but this won't in 1.2 because there is buffering in the messaging scheme,
> where there was none in the 1.1 version.
>
> The proper way to use the interface (before and now) is:
>
> while (select() > 0) {
>     while (is_message_pending()) {
>         read a message
>     }
> }
>
> [Of course, this is an outline of the real code, but this should give you
> the right idea].
>
> This has been discussed extensively on the linux-ha-dev (development)
> mailing lists. If you are absolutely unable to tolerate input buffering,
> then there is a way to make the first input form work, by telling the IPC
> layer to not buffer input. But, this should be avoided if at all possible
> - particularly if you're sending large quantities of data.
>
> If you are using the mainloop code, then there is a new call you'll want
> to use to return the ipc channel. There is a new function called ipcchan()
> which returns the IPC channel. You can still get the file descriptor like
> before, but if you're using mainloop input sources you'll want to switch
> from G_main_add_fd() to G_main_add_IPC_Channel(), and feed it the return
> from the ipcchan() function.
Thanks! That did the trick.

Now I have a question. Since EVMS can be run on a system with either
heartbeat 1.1.x or 1.2.x installed, what is the proper way to determine
which version of the hb_api.h is installed so that the code will know at
compile time whether ipcchan() is available? It would be nice to use
LLC_PROTOCOL_VERSION. However, looking at my machines I see that the 1.1.3
version of hb_api.h and the CVS version both have LLC_PROTOCOL_VERSION set
to 1, even though struct llc_ops is different between the two versions. :(
Any ideas on how I can tell at compile time whether ipcchan() is available
in hb_api.h? Thanks.
Steve D.
_______________________________________________
Linux-HA mailing list
Linux-HA <at> lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha