John Rouillard | 4 Aug 02:27

Useful stats from a CM system (part 1)

Hello all:

I am currently looking at trying to get some information out of the CM
system for the purpose of:

   * verifying that the system is getting used

   * improving the speed with which changes can be made

   * determining/increasing the first pass success rate (i.e. the
     first time you change the file is the only time you have to
     change the file)

   * looking for problems where training can be provided to reduce
     error rates and eliminate having to redo work.

The system I am analyzing is DACS. It consists of four basic elements:

   * an inventory system that maps services and host characteristics
     (e.g. IP address, Ethernet address, has particular hardware etc.)
     onto a host.

   * a version control system based on Subversion, where (the inputs
     to) everything that gets pushed is version controlled, allowing
     rollback and re-establishment of a prior state.

   * a build system based on GNU make that can take version controlled
     input files and transform them into files to be pushed to
     machines.

   * a push-oriented distribution system based on rdist(1) that maps
     files onto the data from the inventory system. So a host that has
     the APACHE service running on it is set up with the standard
     httpd.conf files, has /etc/init.d/httpd linked into /etc/rc?.d,
     etc.
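
To make the last element a bit more concrete, here is a minimal sketch
of mapping files onto inventory data. It is not how DACS is implemented:
the host names, the NTP service, the file paths and the function name
files_for_host are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical inventory: host -> set of services it runs.
# Host and service names are illustrative, not taken from DACS.
INVENTORY = {
    "web1.example.com": {"APACHE", "NTP"},
    "db1.example.com": {"NTP"},
}

# Hypothetical mapping from a service to the files pushed for it.
SERVICE_FILES = {
    "APACHE": ["/etc/httpd/conf/httpd.conf", "/etc/init.d/httpd"],
    "NTP": ["/etc/ntp.conf"],
}


def files_for_host(host):
    """Return the sorted list of files a host should receive."""
    files = set()
    for service in INVENTORY.get(host, ()):
        files.update(SERVICE_FILES.get(service, ()))
    return sorted(files)


if __name__ == "__main__":
    for host in sorted(INVENTORY):
        print(host, files_for_host(host))
```

The point is only that the set of files for a host is derived from the
services recorded in the inventory, rather than listed per host by hand.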

These are a few of the metrics I am considering, and I was wondering
if anybody uses similar metrics, or has other metrics that they find
useful in gauging how well the CM system works.
Since there are a few of them, I will stretch this email out into
multiple installments to keep each one relatively short.

I am also interested in the collected wisdom on how you evaluate and
monitor the CM systems at your place. What are your check steps in a
PDCA or DMAIC cycle to see how well things are working and to get a
warning when things aren't working so well? I am somewhat blessed
where I am, as I get a lot of complaints about DACS, so I know of a
number of areas that need improvement. I suppose I could just go with
reducing the number of complaints, but I am sure that all of you have
encountered people for whom no CM system is a good CM system,
resulting in never-ending complaints. Still, that is better than when
the complaints go underground and undermine the CM environment
until it is too late to salvage anything.

Whew, if I ever get around to reconstructing my blog, I guess I will
have something to jabber about 8-). I will start this discussion with
some metrics from the version control system built into DACS.

To check a change into our version control system, you have to supply
a ticket number. So from the log of the version control system, I can
determine which changes were associated with which tickets in our
(RT-based) ticketing system. Given this info, I can find:

   1 The number of changes to an individual file for the same ticket
     number. A high number of changes can be an indication of:

      1 staged/phased deployment where partial changes are made to the
        file and tested, and then later more changes are made to
        implement a final state. This is fine and expected. Arguably
        each phase could be split out into its own ticket, but I am OK
        with this.

      2 reworking the file because of an error in editing the file
        (e.g. getting the syntax wrong) or not understanding how to
        modify the file to implement the goals of the ticket.
        Obviously the lower this number, the more successful the system
        is and the less effort is wasted. The cause could be any
        of:

         * lack of automation issue (forcing a manual change to a file
           that would be better generated from a data file)
         * lack of knowledge/familiarity with file or subsystem
         * incorrect specifications from the ticket submitter
         * other issues (always looking for examples)

   2 The average number of files that must be changed for a ticket to
     be resolved. Lower numbers are better, as there is less editing
     to do and less chance of fragmentation of data resulting in a
     misconfiguration. High numbers indicate:

      1 An opportunity to generate files with linked information
        reducing the chance of errors and reducing duplicated
        information as well as requiring less administrator time to
        perform the operation.

      2 A need to simplify the configuration to reduce the time spent
        changing multiple files.

   3 Total number of changes to a file, and the average number of
     lines changed.

      1,2 As with the two points under #2, high numbers are an
          opportunity to simplify configurations and to generate
          changes rather than making them manually.

      3 It also identifies hot files that may cause delays, and
        wasted time, when multiple people are trying to work on them.
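
As a rough illustration of how the three metrics above could be pulled
out of the version control log, here is a minimal sketch in Python. It
assumes the log is obtained with "svn log --xml -v" and that the ticket
number appears in the commit message as something like "ticket 1234" or
"Ticket #1234"; that message convention, the sample data and the
function names are all assumptions for the example, not a description
of what DACS actually does. It counts changes only; the average number
of lines changed (part of #3) would need the diffs and is not covered.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Assumption: commit messages name the ticket as "ticket 1234",
# "ticket: 1234" or "Ticket #1234". Adjust to the local convention.
TICKET_RE = re.compile(r"ticket[:\s#]*(\d+)", re.IGNORECASE)

# Hypothetical sample in the shape produced by "svn log --xml -v".
SAMPLE_LOG = """<log>
<logentry revision="1"><paths><path action="M">/httpd/httpd.conf</path></paths>
<msg>ticket 101: enable mod_status</msg></logentry>
<logentry revision="2"><paths><path action="M">/httpd/httpd.conf</path></paths>
<msg>ticket 101: fix syntax error</msg></logentry>
<logentry revision="3"><paths><path action="M">/httpd/httpd.conf</path>
<path action="M">/dns/named.conf</path></paths>
<msg>Ticket #102 add new vhost</msg></logentry>
<logentry revision="4"><paths><path action="M">/dns/named.conf</path></paths>
<msg>fix typo</msg></logentry>
</log>"""


def parse_log(xml_text):
    """Yield (ticket, [paths]) for each log entry that names a ticket."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("logentry"):
        match = TICKET_RE.search(entry.findtext("msg") or "")
        if match is None:
            continue  # no ticket found in the message; skip the entry
        yield match.group(1), [p.text for p in entry.iter("path")]


def compute_metrics(commits):
    """Return (changes per (ticket, file), avg files per ticket,
    total changes per file) for the given commits."""
    per_ticket_file = Counter()          # metric 1
    files_per_ticket = defaultdict(set)  # input to metric 2
    per_file = Counter()                 # metric 3, change counts only
    for ticket, paths in commits:
        for path in paths:
            per_ticket_file[(ticket, path)] += 1
            files_per_ticket[ticket].add(path)
            per_file[path] += 1
    if files_per_ticket:
        total = sum(len(files) for files in files_per_ticket.values())
        avg_files = total / len(files_per_ticket)
    else:
        avg_files = 0.0
    return per_ticket_file, avg_files, per_file


if __name__ == "__main__":
    per_ticket_file, avg_files, per_file = compute_metrics(parse_log(SAMPLE_LOG))
    print("changes per (ticket, file):", dict(per_ticket_file))
    print("average files per ticket:", avg_files)
    print("hottest files:", per_file.most_common(3))
```

Sorting the first counter by value shows the tickets with the most
rework on a single file, and most_common() on the last one gives a
first cut at the list of hot files.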

The distribution system and the nightly compliance reports can also
produce more useful metrics, and there are some other metrics I would
like to see but don't yet know a good way of producing. However,
those will be the subject of another couple of emails.

-- 
				-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111
