23 Sep 01:30
Useful stats from a CM system (part 3): reports
Hi all:
This is my third and last musing on CM stats using the DACS CM tool.
In my first musing I talked about using a version control system to
determine how efficient your CM system is: how many files need to be
changed per ticket, how many times a file has to be changed for a
ticket to get it right, etc. In the second musing I discussed how to
determine whether the CM system is being used or subverted by looking at
the number of files changed in the wild by the CM system. (This can be
very different from the number of files changed in the version control
system if you generate files.)
(As a funny aside, I just calculated those metrics for where I
work. We typically managed 5300 changes/month from November 2007 through
July 2008. In August we had fewer than 2500 changes, with no fewer
tickets completed/updated. Sigh, anyway.)
In this musing I look at the daily report that the CM systems
generate. I equate configuration anomalies with bugs in software. The
number of anomalies is also a measure of how successful the systems are
at getting to a standard foundation on which you can build an efficient
environment.
The metrics I am looking at gathering here are:
1 Number of files not in compliance and not flagged as testing/work
in progress. Our oncall person is responsible for dealing with
these reports. Each anomaly is usually one of three types:
    1 True anomaly - resolve by pushing the file from DACS.
    2 Work in progress - ignore the anomaly; it will be fixed when
      the associated ticket is done.
    3 Useful anomaly - investigate the cause and change DACS to
      incorporate the change, then push the file from DACS.
I think each type needs to be measured separately, with a goal of
minimizing types 1 and 3. Type 3 anomalies should have a ticket created
so they become type 2 anomalies. I see these metrics as being
useful for determining the compliance of the systems and the robustness
of the procedures. I claim similar things can be learned from the
size of the daily report, although some changes produce more
output than others (e.g. firewall verifications that fail produce
a diff(1) between what is currently running and what would be
pushed by DACS).
2 Number of auto-pushed files. This metric is a weird one that is
unique to push systems, and may be unique to DACS as well. We
have some files (/etc/ssh/ssh_known_hosts, for example) that
change when a host name changes: all aliases for a host are
listed in that file, so adding a new name for the host adds an
entry to the file. Very often we forget to push this file to all
the hosts, yet pushing it is almost always the "right thing to do"
(rttd). Is a count of files/targets that can be automatically
pushed, and which would otherwise fall into the type 1 anomalies
above, a useful metric? Or is it a bad thing that indicates poor
processes, or insufficient automation/notification telling the user
that there is a pending request that they should act on? In a pull
system I assume the file would just be retrieved and installed,
unless you were using a version controlled staging mechanism like
bcfg2.
3 We have some changes that are business as usual (BAU). An example
of this is the passing of the oncall mantle from one person to
another. This is a weekly process and we don't open a ticket for
it. However, every check-in to DACS requires a ticket ID, so we
use ticket ID 0000. Is the number of changes assigned to ID 0000
meaningful for non-BAU changes (e.g. a simple documentation
change)? Does it indicate that the ticketing system is too
burdensome to use? Would it be useful to email all ticket
0000 logs to all the admins to notify them of the change, rather
than opening a ticket, which is our usual notification mechanism?
Does anybody else have a coupling between changes in the CM
system and the tickets that drove those changes? How do you
handle changes that are highly successful, have a limited failure
mode, and are done repeatedly?
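As a rough illustration of the first and third metrics above, here is a
minimal Python sketch. Everything about the input is an assumption: it
supposes the oncall person has already labelled each entry in the daily
report as "true", "wip" or "useful", and that the ticket ID of each
check-in is available as a string. DACS is not known to emit data in
this shape, and the names tally_anomalies and bau_share are made up for
the example.

```python
from collections import Counter

# Hypothetical labels for the three anomaly types described above:
# "true" = true anomaly, "wip" = work in progress, "useful" = useful anomaly.
ANOMALY_TYPES = ("true", "wip", "useful")


def tally_anomalies(report_entries):
    """Count daily-report entries per anomaly type.

    report_entries is an iterable of (path, anomaly_type) pairs; this
    layout is an assumption, not a DACS report format.
    """
    counts = Counter({t: 0 for t in ANOMALY_TYPES})
    for path, anomaly_type in report_entries:
        if anomaly_type not in ANOMALY_TYPES:
            raise ValueError(f"unknown anomaly type {anomaly_type!r} for {path}")
        counts[anomaly_type] += 1
    return dict(counts)


def bau_share(checkin_ticket_ids, bau_id="0000"):
    """Return the fraction of check-ins filed under the catch-all ticket ID."""
    ids = list(checkin_ticket_ids)
    if not ids:
        return 0.0
    return sum(1 for t in ids if t == bau_id) / len(ids)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample_report = [
        ("/etc/ssh/ssh_known_hosts", "true"),
        ("/etc/hosts", "wip"),
        ("/etc/motd", "useful"),
        ("/etc/resolv.conf", "true"),
    ]
    print(tally_anomalies(sample_report))
    print(bau_share(["0000", "1234", "1234", "0000"]))
```

Tracking the three counts separately, day over day, is what would show
whether types 1 and 3 are actually shrinking; a rising share of ticket
0000 check-ins would be one hint that the ticketing step is being
bypassed.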
Also, I asked a couple of people on the list, in personal email, whether
the talk of metrics was useless, interesting, or theoretical and too far
away from anything useful. Most of the responses came down on the
theoretical side. So my question is: how do you tell whether your CM
system is actually making things better rather than wasting time and
turning every change into a burdensome hurdle?
Another couple of questions:
Is there a set of metrics that is applicable to all tools, or does
each tool have to have its own metrics? If each tool needs its own, does
that imply that improving CM tool acceptance is going to be harder, since
there is no way to calculate a bang for the buck?
Do the metrics depend on how far along the deployment curve
you are?
e.g. at the initial stages (assuming you can get signoff on
deploying a tool):
"ease of integration with other tools"
"adaptability of tool to different work models"
...
then later:
"person-hours of work avoided by deploying consistent
working files"
Basically, where is the business case for deploying CM tools? Or are we
all just deluding ourselves by making busywork with no real
improvement?
Quips, comments, evasions, questions, answers, or just a general "shut
up and go away John" welcome.
--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111