Paul Anderson | 29 Sep 09:30

Useful stats from a CM system (John Rouillard)


On 28 Sep 2008, at 20:00, John Rouillard <rouilj@...> wrote:

>> I'd be interested in a very simple
>> classification of the reasons - eg. software upgrade, hardware
>> upgrade, configuration bug fix, etc. etc.
>
> Interesting idea. Do you have an exhaustive list of categories
> that you would like to see?

No - I had these "starters", but I thought we'd need to try it out for
real and add new ones when people came to make a change that didn't fit
any of the existing categories. I'd *really* like to see some figures on
this, but you need the buy-in from the people actually making the
changes to make this worthwhile ....

>  The
> amount of work the user has to do is reduced by 1/(multiplier) since
> the user touches one file rather than say the 10 files that really
> have to be updated.

Yes. Again, it's not just the amount of work though - it's the improved
chance of getting it consistent and correct ....
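
To put rough numbers on that, here is a minimal sketch. It assumes each
manual edit is correct independently with the same probability, which is
a simplifying assumption and not something anyone has measured here. The
function names and the 0.99 figure are made up for illustration.

```python
def effort_fraction(multiplier: int) -> float:
    """Fraction of the manual effort left when one edit fans out to
    `multiplier` files (the 1/(multiplier) figure quoted above)."""
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return 1 / multiplier


def all_consistent_probability(per_edit_success: float, edits: int) -> float:
    """Chance that every one of `edits` manual edits is correct,
    ASSUMING the edits succeed or fail independently."""
    if not 0.0 <= per_edit_success <= 1.0:
        raise ValueError("per_edit_success must be between 0 and 1")
    if edits < 1:
        raise ValueError("edits must be at least 1")
    return per_edit_success ** edits


# Hypothetical numbers: 10 files by hand versus 1 source file.
by_hand = all_consistent_probability(0.99, 10)
via_cm = all_consistent_probability(0.99, 1)
print(f"effort fraction: {effort_fraction(10):.2f}")
print(f"all correct by hand: {by_hand:.3f}, via one source: {via_cm:.3f}")
```

Even with a 99% per-edit success rate, ten hand edits all come out right
only about 90% of the time under this model, so the consistency gain can
matter more than the time saved.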

>> (*) "How do you tell whether your config management system is making
>> things better?". It would be very interesting to do some kind of
>> formal study on this
>
> Formal huh, do I need to take my tux out of the closet for this?  If
> so that means I'll have to learn to tie that bow tie again >8-(.

No, no - we *are* sysadmins here ..... :-)

I think you'd need to try and do something which gathered comparable
results from at least two different organisations, using different
tools.

>> - but I think it would be long-term project.
>
> I agree. I tried to get something started where I work by tagging
> tickets with FirstTimeFailure and UnintendedWork to indicate:
>
>   FirstTimeFailure - rework was required after the ticket was
>         supposedly done.
>
>   UnintendedWork - probably should have been named UnplannedWork to
>          more correctly identify with ITIL and the methodology in "The
>          Visible Ops Handbook". Basically this means that the ticket
>          was opened because of an unrecognized dependency of a planned
>          change, or due to an external unplanned change.
>
> However the tags were inconsistently applied partly due to incomplete
> definition of the purpose of the tags, and partly due to apathy. So
> nothing came of that attempt to track effectiveness of the CM system
> and our procedures.

Yes - this is very similar to what I wanted to do with the categories
above ...
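
If the tags were applied consistently, turning them into figures would
be simple. A rough sketch, assuming tickets can be exported as records
with a "closed" flag and a list of tags (that record layout and the
sample data are hypothetical; only the two tag names come from the
quoted text):

```python
from collections import Counter

TRACKED_TAGS = ("FirstTimeFailure", "UnintendedWork")


def tag_rates(tickets):
    """Return, for each tracked tag, the fraction of closed tickets
    that carry it. Open tickets are ignored."""
    closed = [t for t in tickets if t["closed"]]
    if not closed:
        return {tag: 0.0 for tag in TRACKED_TAGS}
    # set() so that a tag repeated on one ticket is only counted once.
    counts = Counter(tag for t in closed for tag in set(t["tags"]))
    return {tag: counts[tag] / len(closed) for tag in TRACKED_TAGS}


# Hypothetical export from a ticket system.
sample = [
    {"id": 1, "closed": True, "tags": ["FirstTimeFailure"]},
    {"id": 2, "closed": True, "tags": ["UnintendedWork"]},
    {"id": 3, "closed": True, "tags": ["UnintendedWork", "other"]},
    {"id": 4, "closed": True, "tags": []},
    {"id": 5, "closed": False, "tags": ["FirstTimeFailure"]},
]
print(tag_rates(sample))
```

The first-time success rate is then one minus the FirstTimeFailure
rate. None of this helps, of course, if the tags are not applied
consistently in the first place.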

>> present, I think this is just empirical - everyone who has real
>> experience of a good tool in a large environment
>
> Well how large do you need? I see an advantage from 1 system up...

Yes - but the learning curve can be steep, and the learning is usually
local. The infrastructure also has a cost to set up. This means that
small shops rarely have the resources or the time to understand the
problem. In fact, as the installation grows, it is liable to go through
several painful step changes - automatic, prescriptive configuration,
change control, etc. Alva had a good talk on this somewhere ....

> Well Narayan's question begs the question "is the volume of changes a
> good thing?" Maybe we should be reducing the change volume rather than
> multiplying it because the CM system makes it so easy to do. </me
> takes off devil's advocate's hat>.

Yes. It's very good to ask that question. It's nice to be agile, but it
can get out of control. We now have a very tight release mechanism.
"Development" machines still change almost continually, but once a week
a set of changes is frozen and run for a few days on a set of test
machines before being released into "production".

> So your dimensions include:
>    efficiency....
Does it save time/people?
>
>      correctness  (metrics: first time success/pass rate)
How often do things break because of bad configuration changes?
>      security (not sure what metric to use here as this is a subset
>                of correctness. If you have automated security
>                monitoring, TIGER, nmap scans etc, a rate of alerts
>                from those/file changed or file distributed might be
>                useful)
Yes - this is a consequence of correctness. But you can't really
address it by testing because the problem comes in translating "what
you mean" into "all the details that I put in the files". Working out
whether there is some backdoor route where person X can get access to
function Y on machine Z can be very difficult to do from the files. If
your system specifies this at a "higher-level", you can be more
confident.
>     reliability (first time success/pass rate, reject rate of changes
>                from automated tests)
I meant reliability of the systems themselves - again a consequence of
"correctness"
>      uniformity (percentage of hosts where a file is under CM control)
I suppose this is another aspect of "correctness" - it means that a user
can sit down at a machine and get the same version of things as they
get at another machine which they should expect to be "the same".
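
As a rough sketch of the uniformity figure suggested above (percentage
of hosts where a file is under CM control), assuming you can list the
hosts that carry the file and the hosts where the CM system manages it.
The function name and host names are hypothetical:

```python
def uniformity(hosts_with_file, hosts_managed_by_cm):
    """Percentage of the hosts carrying a file on which that file is
    under CM control."""
    hosts_with_file = set(hosts_with_file)
    if not hosts_with_file:
        raise ValueError("no hosts carry this file")
    # Only count managed hosts that actually carry the file.
    managed = hosts_with_file & set(hosts_managed_by_cm)
    return 100.0 * len(managed) / len(hosts_with_file)


# Hypothetical inventory: four hosts have the file, three are managed.
carrying = {"alpha", "beta", "gamma", "delta"}
managed = {"alpha", "beta", "gamma"}
print(f"{uniformity(carrying, managed):.0f}% under CM control")
```

This only says the file is managed, not that the managed copies are
identical, so it is a lower bar than the "same version on machines that
should be the same" idea.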

> Are there others? (Gee, why does this sound like the D in DMAIC??)
I think there is a set of very important considerations - but they are
related to the above. For example, "usability" - if the configuration
system is not easy and clear, then people are likely to make mistakes
using it, and the "correctness" drops. Similarly, the ability to manage
diversity .... there is quite a lot about this in the SAGE booklet
again (no, I don't make any money out of it :-)

>> PS. Will you be at LISA? Would be nice to have a face-to-face
>> discussion ....
>
> Yup, I am going to the Practical CM workshop, so I should be in
> sometime on the Saturday prior to the conference and as they say:
> "I'll be here all week folks" 8-).

Ah good! See you there. Be good to have a chat - seems we have quite a
lot of agreement ....

   Paul

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
