John Rouillard | 23 Sep 01:30

Useful stats from a CM system (part 3) reports

Hi all:

This is my third and last musing on CM stats using the DACS CM tool.

In my first musing I talked about using a version control system to
determine how efficient your CM system is: how many files need to be
changed per ticket, how many times a file has to be changed for a
ticket to get it right, etc. In the second musing I discussed how to
determine if the CM system is being used or subverted by looking at
the number of files changed in the wild by the CM system. (This can be
very different from the number of files changed in the version control
system if you generate files.)

(As a funny aside I just calculated those metrics for where I
work. We typically managed 5300 changes/month from November-2007
through July-2008. In August we had fewer than 2500 changes with no
fewer tickets completed/updated. Sigh, anyway.)

In this musing I look at the daily report that the CM systems
generate. I equate configuration anomalies with bugs in software. The
number of anomalies is also a measure of how successful the systems
are at getting to a standard foundation on which you can build an
efficient environment.

The metrics I am looking at gathering here are:

   1 Number of files not in compliance and not flagged as testing/work
     in progress. Our oncall person is responsible for dealing with
     these reports. Each anomaly is usually one of three types:

        1 True anomaly - resolve by pushing file from DACS

        2 Work in progress - ignore the anomaly, it will be fixed when
          the associated ticket is done.

        3 Useful anomaly - investigate cause and change DACS to
          incorporate the change, then push file from DACS.

     I think each type needs to be measured separately with a goal of
     minimizing 1 and 3. Type 3 anomalies should have a ticket created
     so they become a type 2 anomaly. I see these metrics as being
     useful to determine compliance of the systems and the robustness
     of the procedures. I claim similar things can be gained from the
     size of the daily report although some changes produce more
     output than others (e.g. firewall verifications that fail produce
     a diff(1) between what is currently running and what would be
     pushed by DACS.)

   2 Number of auto-pushed files. This metric is a weird one that is
     unique to push systems, and may be unique to DACS as well. We
     have some files (/etc/ssh/ssh_known_hosts, for example) that
     change when a host name changes: all aliases for a host are
     listed in that file, so adding a new name for the host adds an
     entry to the file. Very often we forget to push this file to all
     the hosts. However it is almost always the "right thing to do"
     (rttd). Is a count of files/targets that can be automatically
     pushed, and which would otherwise fall into the type 1 anomalies
     above, a useful metric? Or is it a bad thing that indicates poor
     processes, or insufficient automation/notification to the user
     that there is a pending request that they should act on? In a
     pull system I assume the file would just be retrieved and
     installed unless you were using a version controlled staging
     mechanism like bcfg2.

   3 We have some changes that are business as usual (BAU). An example
     of this is the passing of the oncall mantle from one person to
     another. This is a weekly process and we don't open a ticket for
     it. However every check in to DACS requires a ticket id. So we
     use ticket ID 0000. Is the number of changes assigned to ID 0000
     meaningful for non-BAU changes (e.g. simple documentation
     changes)?  Does it indicate that the ticketing system is too
     burdensome to use?  Would it be a useful idea to email all ticket
     0000 logs to all the admins to notify them of the change rather
     than opening a ticket, which is our usual notification mechanism?

     Does anybody else have a coupling between changes in the CM
     system and the tickets that drove those changes? How do you
     handle changes that are highly successful, have a limited failure
     mode, and are done repeatedly?
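
For what it's worth, the two countable metrics above (anomalies per
type, and the share of check-ins filed under ticket 0000) could be
tallied with something like the following. This is only a sketch under
assumptions: the inputs (path/type pairs as classified by the oncall
person, and one ticket ID per check-in) are a made-up format, not
anything DACS emits, and the function names, the /etc/example paths and
ticket 1234 are hypothetical.

```python
from collections import Counter

# The three anomaly types from metric 1 above.
TRUE_ANOMALY, WORK_IN_PROGRESS, USEFUL_ANOMALY = 1, 2, 3
BAU_TICKET = "0000"  # catch-all ticket ID from metric 3 above


def anomalies_by_type(anomalies):
    """Count anomalies per type; types with no entries are reported as 0."""
    counts = Counter(kind for _path, kind in anomalies)
    return {kind: counts[kind]
            for kind in (TRUE_ANOMALY, WORK_IN_PROGRESS, USEFUL_ANOMALY)}


def bau_share(checkin_tickets):
    """Fraction of check-ins filed under ticket 0000 (0.0 if there are none)."""
    if not checkin_tickets:
        return 0.0
    return sum(t == BAU_TICKET for t in checkin_tickets) / len(checkin_tickets)


# Made-up sample data, for illustration only (hypothetical input format).
report = [
    ("/etc/ssh/ssh_known_hosts", TRUE_ANOMALY),
    ("/etc/example/one.conf", WORK_IN_PROGRESS),
    ("/etc/example/two.conf", USEFUL_ANOMALY),
    ("/etc/example/three.conf", TRUE_ANOMALY),
]
tickets = ["0000", "1234", "0000", "1234"]

by_type = anomalies_by_type(report)
share = bau_share(tickets)
print(by_type)
print(share)
```

Tracked day over day, the type 1 and type 3 counts are the ones to
drive down, and a rising ticket 0000 share would be the signal worth
asking questions about.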

Also I asked a couple of people in personal email who are on the list
if

   the talk of metrics was useless, interesting or theoretical and too far
   away from anything useful

Most responses came down on the theoretical side. So my question is: how
do you tell if your CM system is actually making things better
rather than wasting time and making all changes into a burdensome
hurdle?

Also another couple of questions are:

  is there a set of metrics that is applicable to all tools, or does
  each tool have to have its own metrics? If the latter, does that
  imply that improving CM tool acceptance is going to be harder since
  there is no way to calculate a bang for the buck?

  do the metrics depend on how far along the deployment curve
  you are?

    e.g. at the initial stages (assuming you can get signoff on
       deploying a tool):

         "ease of integration with other tools"
         "adaptability of tool to different work models"

      ...

    then later:

         "person-hours of work avoided by deploying consistent
          working files"

Basically, where is the business case for deploying CM tools, or are we
all just deluding ourselves by making busywork with no real
improvement?

Quips, comments, evasions, questions, answers, or just a general "shut
up and go away John" welcome.

--

-- 
				-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111
