4 Aug 02:27
Useful stats from a CM system (part 1)
Hello all:
I am currently looking at trying to get some information out of the CM
system for the purpose of:
* verifying that the system is getting used
* improving the speed with which changes can be made
* determining/increasing the first pass success rate (i.e. the
first time you change the file is the only time you have to
change the file)
* looking for problems where training can be provided to reduce
error rates and eliminate having to redo work.
The system I am analyzing is DACS. It consists of four basic elements:
* inventory system that maps services and host characteristics
(e.g. ip address, ethernet address, has particular hardware etc.)
onto a host.
* a version control system based on subversion where (the inputs to)
everything that gets pushed is version controlled allowing
rollback and re-establishment of a prior state.
* a build system based on gnu make that can take version controlled
input files and transform them into files to be pushed to
machines.
* a push oriented distribution system based on rdist(1) that maps
files onto the data from the inventory system. So a host that has
the APACHE service running on it is set up with the standard
httpd.conf files, has /etc/init.d/httpd linked into /etc/rc?.d,
etc.
These are a few of the metrics I am considering looking at, and I was
wondering if anybody had any similar metrics they used, or other
metrics that they find useful in gauging how well the CM system works.
Since there are a few of them, I will stretch this email out into
multiple installments to keep each one relatively short.
I am also interested in the collected wisdom of how you evaluate and
monitor the CM systems at your place. What are your check steps in a
PDCA or DMAIC cycle to see how well things are working and to get a
warning when things aren't working so well? I am somewhat blessed
where I am, as I get a lot of complaints about DACS, so I know of a
number of areas that need improvement. I suppose I could just go with
reducing the number of complaints, but I am sure that all of you have
experienced the people for whom no CM system is a good CM system,
resulting in never-ending complaints. But that is better than when
the complaints just go underground, undermining the CM environment
until it is too late to salvage anything.
Whew, if I ever get around to reconstructing my blog, I guess I will
have something to jabber about. I will start this discussion with
some metrics from the version control system built into DACS.
In order to check into our version control system, you have to supply
a ticket number. So from the log of the version control system, I can
determine what changes were associated with what tickets in our (rt
based) ticketing system. Given this info, I can find:
1 the number of changes to an individual file for the same ticket
number. High number of changes can be an indication of:
1 staged/phased deployment where partial changes are made to the
file, tested and then later more changes to implement a final
state are done. This is fine and expected. Arguably each phase
could be split out into its own ticket, but I am ok with this.
2 reworking the file because of an error in editing the file
(e.g. getting the syntax wrong) or because of not understanding
how to modify the file to implement the goals of the ticket.
Obviously the lower this number, the more successful the system
is and the less wasted effort is occurring. The rework could be
caused by any of:
* lack of automation issue (forcing a manual change to a file
that would be better generated from a data file)
* lack of knowledge/familiarity with file or subsystem
* incorrect specifications from the ticket submitter
* other issues (always looking for examples)
2 The average number of files that must be changed for a ticket to be
solved. Lower numbers are better, as there is less editing to do,
and less chance of fragmentation of data resulting in a
misconfiguration. High numbers indicate:
1 An opportunity to generate files with linked information
reducing the chance of errors and reducing duplicated
information as well as requiring less administrator time to
perform the operation.
2 A need to simplify the configuration to reduce the time spent
changing multiple files.
3 Total number of changes to a file, and the average number of
lines changed.
1,2 As with #2, this provides opportunities to simplify
configurations and to generate changes rather than making them
manually.
3 It also identifies hot files where multiple people trying to
work on the same file at once get in each other's way, wasting
time.
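As a rough illustration of how the three counts above could be pulled
from the version control log, here is a minimal Python sketch. It reads
the XML produced by "svn log --xml -v". Several things in it are my
assumptions, not a description of DACS: that the ticket number shows up
in the commit message in a form like "ticket 1234", the function names
cm_metrics and parse_log, and the sample paths. The average number of
lines changed is left out, since the log alone does not carry line
counts and a diff per revision would be needed for that.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Assumption: the commit message carries the ticket as "ticket 1234",
# "ticket: 1234" or "ticket #1234". Adjust to what the check-in hook enforces.
TICKET_RE = re.compile(r"ticket[:#\s]*(\d+)", re.IGNORECASE)


def parse_log(xml_text):
    """Yield (ticket, [paths]) for each commit in `svn log --xml -v` output."""
    for entry in ET.fromstring(xml_text).iter("logentry"):
        match = TICKET_RE.search(entry.findtext("msg") or "")
        if match is None:
            continue  # commit without a recognizable ticket number
        yield match.group(1), [p.text for p in entry.iter("path")]


def cm_metrics(xml_text):
    """Return (changes per ticket+file, avg files per ticket, changes per file)."""
    per_ticket_file = Counter()          # metric 1: rework on one file per ticket
    files_per_ticket = defaultdict(set)  # metric 2: distinct files per ticket
    per_file = Counter()                 # metric 3: total changes per file
    for ticket, paths in parse_log(xml_text):
        for path in paths:
            per_ticket_file[(ticket, path)] += 1
            files_per_ticket[ticket].add(path)
            per_file[path] += 1
    if files_per_ticket:
        total = sum(len(files) for files in files_per_ticket.values())
        avg_files = total / len(files_per_ticket)
    else:
        avg_files = 0.0
    return per_ticket_file, avg_files, per_file


# Hypothetical sample in the shape of `svn log --xml -v` output.
SAMPLE_LOG = """<log>
<logentry revision="1"><msg>ticket 101: add vhost</msg>
<paths><path action="M">/Config/httpd.conf</path></paths></logentry>
<logentry revision="2"><msg>ticket 101: fix syntax</msg>
<paths><path action="M">/Config/httpd.conf</path>
<path action="M">/Config/hosts</path></paths></logentry>
<logentry revision="3"><msg>ticket 102: new host</msg>
<paths><path action="M">/Config/hosts</path></paths></logentry>
</log>"""

if __name__ == "__main__":
    per_ticket_file, avg_files, per_file = cm_metrics(SAMPLE_LOG)
    for (ticket, path), count in per_ticket_file.most_common():
        print(f"ticket {ticket}: {path} changed {count} time(s)")
    print(f"average files per ticket: {avg_files}")
    for path, count in per_file.most_common():
        print(f"{path}: {count} change(s) in total")
```

Sorting the first counter by count puts the (ticket, file) pairs with
the most rework at the top, which is where I would start reading
tickets to tell phased deployments apart from genuine redo work.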
The distribution system and the nightly compliance reports can also
produce more useful metrics, and there are some other metrics I would
like to see but I don't yet know a good way of producing them. However
those will be the subject of another couple of emails.
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111