3 Nov 2006 03:45
Re: Testing configurations
Narayan Desai <desai <at> mcs.anl.gov>
2006-11-03 02:45:25 GMT
>>>>> "Andrew" == Andrew Hume <andrew <at> research.att.com> writes:
Andrew> narayan, paul, brandon i am puzzled by this discussion. the
Andrew> task narayan brings up, that of staging a new service, seems
Andrew> completely independent of the issues around version
Andrew> control. and conflating the two seems confusing.
Andrew> i think what narayan wants, and the task requires, is
Andrew> a way of referring to various cluster properties, namely a)
Andrew> there exists a functioning new ntp server b) all nodes use
Andrew> the new ntp server c) the node with the old ntp server has
Andrew> that old server deleted
I think that I see the source of some of this confusion. I have much
more modest goals. I am just trying to build reasonable configuration
state feedback into the system. All I really want is to be able to
probe whether a set of clients has moved into the proper
configuration state.
While the high-level goals would be nicer, I tend to focus on
deployment mechanics because they are easier to work with IMO.
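A rough sketch of the kind of probe described above, under stated
assumptions: each client is taken to report the repository revision it
last configured against and whether that run left it in a clean
state. The ClientReport type and its fields are hypothetical names
used only for illustration, not any real tool's interface.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ClientReport:
    hostname: str
    revision: int  # assumed: revision the client last configured against
    clean: bool    # assumed: True if the client reported no incorrect entries


def clients_not_converged(reports: Iterable[ClientReport],
                          target_revision: int) -> List[str]:
    """Return the hostnames that have not yet reached the target state."""
    return sorted(
        r.hostname
        for r in reports
        if r.revision < target_revision or not r.clean
    )


def all_converged(reports: Iterable[ClientReport],
                  target_revision: int) -> bool:
    """True when every reporting client is at or past the target and clean."""
    return not clients_not_converged(reports, target_revision)
```

Note that an empty set of reports counts as converged here; a real
probe would probably also want to compare against the list of clients
that were expected to report.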
Andrew> then what narayan is saying is we implement a) by
Andrew> moving to version 301. when that is stable, we move to 302
Andrew> and wait until all nodes use the new server. then we can
Andrew> move to 303.
Andrew> all true and all, but hopeless because the thing you
Andrew> want is a), b) and c), and not 301-3. the steps are then
Andrew> A) pick a node, and give it the new ntp server; verify
Andrew> property a) is true. this might be rev 301. B) point all
Andrew> nodes to the new server; verify property b) is true. this
Andrew> might be rev 304. C) delete the old ntp server; verify
Andrew> property c). this might be rev 312.
Andrew> now, everything is clearer. other things can happen in
Andrew> parallel and no one cares because we make the changes when
Andrew> it is safe to do so-- when the previous step has
Andrew> succeeded. now even brandon can make this automatic. (i
Andrew> realise that part of brandon's comment also refers to the
Andrew> fact that human approval steps often serve to act as a check
Andrew> that things external to the systems' state haven't arisen,
Andrew> but i regard this as a purely process thing; if people want
Andrew> to make it automatic, i want to support that.)
What you are suggesting is a different set of inputs to this same
process. You are right that my steps count on some implicit mapping of
goals to configuration specification, but I think this input is
critical to make things work right. Ideally, the associated triggers
are:
- (a) monitoring shows that the new service has started to work
- (b) configuration monitoring shows that the change to the service
consumption configuration is completely deployed
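
The trigger-gated staging above could be sketched roughly as follows.
This is only an illustration: the `advance` function, the stage
labels, and the `observed` dictionary standing in for real monitoring
data are all assumptions, not part of any existing system.

```python
from typing import Callable, Dict, List, Tuple

# Each stage pairs a label with a trigger: a zero-argument callable that
# returns True once the stage's condition has been observed.
Stage = Tuple[str, Callable[[], bool]]


def advance(stages: List[Stage], completed: int) -> int:
    """Return how many stages are complete after checking triggers in order.

    Checking stops at the first stage whose trigger is not yet satisfied,
    so a later stage is never passed before an earlier one is verified.
    """
    while completed < len(stages) and stages[completed][1]():
        completed += 1
    return completed


# Hypothetical monitoring results; real values would come from service
# monitoring and configuration monitoring respectively.
observed: Dict[str, bool] = {
    "new_server_answers": True,
    "all_clients_repointed": False,
}

stages: List[Stage] = [
    ("(a) new service works", lambda: observed["new_server_answers"]),
    ("(b) consumption change deployed",
     lambda: observed["all_clients_repointed"]),
]

completed = advance(stages, 0)
```

With the values shown, only stage (a) is passed; calling `advance`
again after the second observation becomes true moves past (b).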
I think that using higher-level constructs is the right thing to do in
some, but not all cases.
Andrew> i know i harp on this incessantly, but node/group
Andrew> properties are the ONLY things that matter in the long
Andrew> run. config changes, or revisions, are just means to an end.
Andrew> so issuing the change to point nodes to the new server is
Andrew> merely interesting unless you combine it with a check that
Andrew> they are doing so.
Andrew> as for paul's comments, i have already spoken to most
Andrew> of them; you apply changes when you are ready to deploy
Andrew> them. for the person designing the changes, it means that
Andrew> you don't simply edit the configs, you write programs to
Andrew> make the (hopefully) simple changes. if this is impractical
Andrew> or too hard, then make them in real time. to do otherwise is
Andrew> to impose the complexity of your change management flow onto
Andrew> the version control system, which can barely cope with it
Andrew> (to say nothing of users).
Andrew> although he did not say so explicitly, i took paul's
Andrew> comments to also cover the issue of feature interaction,
Andrew> which is much harder. an example might be that an urgent fix
Andrew> might require that the new ntp server has to be another node
Andrew> and that this occurs during step B). the best thing would
Andrew> likely be to point everyone at the old server, implement the
Andrew> new urgent thing, then start step B) over again. there
Andrew> might be some transient issues, but that's the price you pay
Andrew> for urgent things.
I think that the feature interaction problem is actually a little
harder than this, at least with our implementation. Since we have
fine-grained enough statistics to isolate outcomes, we can simplify a
lot of this. I suspect that a more distributed SCM system than
Subversion would perform more naturally in this case.
Andrew> do i have this right? or is there some other issue
Andrew> underneath that justifies enmeshing version control with
Andrew> staging and verifying actions?
Well, the actual point that we make in the paper is that you need an
independent time variable in the configuration repository in order to
be able to directly represent change. Using an SCM repository revision
was a convenient mechanism that also happened to be discrete. It was
mainly convenient. Another reason we went this way is that revision
control is already a familiar concept to users; a separate
implementation of this functionality would be pretty tough. It is
also intertwined with auditing and understanding past states, so
having all of the SCM tools is nice.
So I guess there are some factors that make it convenient, but it
doesn't need to be based on revision control. You could make an
implementation that uses a completely independent time variable, but it
would probably be more unwieldy...
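
To make the "independent time variable" idea concrete, here is a
minimal sketch that does not use revision control at all. It assumes
a flat key/value configuration and a simple integer counter as the
time value; the ConfigHistory class and its method names are invented
for illustration. It also shows why this gets unwieldy: history,
diffing, and auditing all have to be rebuilt by hand.

```python
from typing import Dict, List, Optional, Tuple


class ConfigHistory:
    """Configuration snapshots indexed by a discrete, increasing time value."""

    def __init__(self) -> None:
        # Time 0 is the empty configuration.
        self._snapshots: List[Dict[str, str]] = [{}]

    def commit(self, changes: Dict[str, str]) -> int:
        """Record a new snapshot with the changes applied; return its time."""
        snapshot = dict(self._snapshots[-1])
        snapshot.update(changes)
        self._snapshots.append(snapshot)
        return len(self._snapshots) - 1

    def at(self, t: int) -> Dict[str, str]:
        """Return a copy of the configuration as it was at time t."""
        return dict(self._snapshots[t])

    def changed_between(
        self, t0: int, t1: int
    ) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        """Keys whose values differ between two times, as (old, new) pairs."""
        old, new = self._snapshots[t0], self._snapshots[t1]
        return {
            key: (old.get(key), new.get(key))
            for key in sorted(set(old) | set(new))
            if old.get(key) != new.get(key)
        }
```

For example, committing an "ntp_server" value twice and then calling
changed_between on the two returned times yields the old and new
server as a pair, which is the direct representation of change that
a repository revision otherwise provides.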
-nld