13 Dec 07:02
Re: Laundry list for NGC (long post) -- How does Koha measure up?
Joshua Ferraro <jmf <at> liblime.com>
2006-12-13 06:02:00 GMT
On Tue, Dec 12, 2006 at 04:13:47PM -0800, Karen Coyle wrote:
> This is one of those areas where systems designers have been locking
> horns with the cataloging rules for quite a while. It even has its own
> name: the multiple versions problem, or "mulver."
No kidding ... I can't tell you how many hours I've spent mulling this
issue over -- it's a serious problem! ... clever name btw
> I would love to see
> the next generation of rules fix this... The current rules require that
> each Manifestation (in FRBR-speak) have its own bibliographic record. In
> the case of copies (a microfilm copy of a journal), the records for the
> original and the copy are virtually identical because they must both
> describe the original item. In the case of items that were issued in
> multiple formats, each format gets its own cataloging. Today, the
> difference between "different formats" and "copies" is blurred: is a
> case of a document in Word that is also saved as PDF a copy, or a
> different format? What if you can't tell which is the "original"?
> Anyway, what many libraries would like to see (and some are doing
> already in a kludge) is using the MARC Holdings record or their library
> system's item record to record the data, much like it appears Koha does.
> But those libraries cannot share that data in that format, because it
> violates the MARC standard and the cataloging rules. It also doesn't
> provide them with the fields they need to provide all of the
> format-specific data (i.e. the print book is 300 pages long and the
> audio book is 6 CDs and lasts 8 hours, and is read by Mr. T.) With the
> big digitization projects going on, this means that every time a book is
> digitized, a new record will be added to the library catalog. Sheeeesh!
> And not good library service.
So there's actually a rather elegant way to handle this issue without
changing any of the current standards (though by all means, let's push
for change). The method I've concluded is best in the meantime is to
abstract groups of MARC records into a 'MetaRecord' that contains all
of the data of each of the records in that group. The group retains all
the characteristics of each of the individual records, but has the added
feature of establishing a relationship between the records in the group.
The Koha community has been doing some experiments with this idea for a
few months with our shiny new Zebra integration, and it's definitely the
direction we're taking for the next generation search engine.
In tech-speak, the MetaRecord will be an XML schema that is a
conglomeration of the MARCXML definition. Zebra makes the process of
defining a new set of indexes a pretty trivial exercise, so once the XML
is properly defined, it's pretty simple to set up a parallel index
and run the regular MARC database side by side with the MetaRecord one.
At that point, the MetaRecord database is really just another Z39.50
target with some funky XML format types that break down nicely into MARC
records, which our tools can already parse and display -- so building the
UI is a fairly trivial exercise as well.
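
To make the container idea concrete, here is a minimal sketch in Python. It
is not Koha's actual schema: the `metarecord` element, its `urn:example`
namespace, and the helper names are assumptions made up for illustration.
Only the MARCXML namespace is the real one. The point is that the wrapped
records come back out untouched.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # the real MARCXML namespace
META_NS = "urn:example:metarecord"          # hypothetical, for illustration only


def marc_record(title, extent):
    """Build a tiny MARCXML record with a 245$a title and a 300$a extent."""
    rec = ET.Element(f"{{{MARC_NS}}}record")
    for tag, value in (("245", title), ("300", extent)):
        df = ET.SubElement(
            rec, f"{{{MARC_NS}}}datafield", {"tag": tag, "ind1": " ", "ind2": " "}
        )
        sf = ET.SubElement(df, f"{{{MARC_NS}}}subfield", {"code": "a"})
        sf.text = value
    return rec


def build_metarecord(group_id, records):
    """Wrap a group of MARCXML records in one (hypothetical) container element."""
    meta = ET.Element(f"{{{META_NS}}}metarecord", {"id": group_id})
    for rec in records:
        meta.append(rec)  # each record is kept whole, not merged
    return meta


def unpack(meta):
    """Break a MetaRecord back down into its individual MARCXML records."""
    return meta.findall(f"{{{MARC_NS}}}record")


def extent_of(rec):
    """Return the 300$a text of a record built by marc_record()."""
    path = f"{{{MARC_NS}}}datafield[@tag='300']/{{{MARC_NS}}}subfield"
    return rec.find(path).text


meta = build_metarecord("group-1", [
    marc_record("Example title", "300 p."),
    marc_record("Example title", "6 sound discs"),
])
xml_text = ET.tostring(meta, encoding="unicode")
```

Because the container only nests whole records, anything that already
parses MARCXML can work on the output of `unpack()` unchanged.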
The only remaining piece is how to establish the relationships in the
first place. There's been a fair amount of work on this already in FRBR
projects; xisbn is a possible service-based approach to solving the
problem in real time for certain content types. There are also some fairly
common-sense approaches that will work in most cases, such as simple
field comparisons with existing records in the database, etc.
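
As a rough sketch of the "simple field comparison" approach, the Python
below groups records whose normalized title and author match. The dict
layout, the choice of title plus author as the match key, and the function
names are all assumptions for illustration. A real matcher would need to
look at more fields.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so trivial differences vanish."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_key(record):
    """Hypothetical match key: normalized title plus normalized author."""
    return (normalize(record["title"]), normalize(record["author"]))


def group_records(records):
    """Map each match key to the ids of the records that share it."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec["id"])
    return dict(groups)


records = [
    {"id": 1, "title": "Moby Dick.", "author": "Melville, Herman"},
    {"id": 2, "title": "Moby  Dick", "author": "Melville, Herman."},
    {"id": 3, "title": "Typee", "author": "Melville, Herman"},
]
groups = group_records(records)
```

Here records 1 and 2 land in the same group despite the punctuation and
spacing differences, while record 3 stays on its own.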
In this way, you don't need to break MARC at all, just fit it inside a
more general container for the initial search and retrieve operation.
Once the user finds what they're looking for, just present them with
various ways to further refine their interest, narrow down to material
type, language, or whatever ...
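
The refinement step could look something like the following Python sketch.
It relies on two MARC 21 conventions: Leader position 06 holds the type of
record, and 008 positions 35-37 hold the language code. Everything else
(the dict layout, `refine`, `fake_008`, the sample ids) is hypothetical.

```python
def material_type(leader):
    """MARC 21 Leader/06: type of record ('a' language material, 'i' spoken word)."""
    return leader[6]


def language(field_008):
    """MARC 21 008/35-37: three-letter language code."""
    return field_008[35:38]


def refine(records, material=None, lang=None):
    """Return the ids of the records in a group that match the chosen filters."""
    hits = []
    for rec in records:
        if material is not None and material_type(rec["leader"]) != material:
            continue
        if lang is not None and language(rec["008"]) != lang:
            continue
        hits.append(rec["id"])
    return hits


def fake_008(lang):
    """Build a 40-character 008 that is blank except for the language code."""
    return " " * 35 + lang + "  "


group = [
    {"id": "print", "leader": "00000nam a2200000 a 4500", "008": fake_008("eng")},
    {"id": "audio", "leader": "00000nim a2200000 a 4500", "008": fake_008("eng")},
    {"id": "print-fr", "leader": "00000nam a2200000 a 4500", "008": fake_008("fre")},
]
```

With this sample group, `refine(group, material="i", lang="eng")` picks out
only the audio version, and `refine(group, lang="eng")` keeps both English
records.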
> My fear is that the next set of rules will not address this issue, but
> systems designers will be expected to magically make the data look more
> like what the user wants.
I share your fear, but as a systems designer, I'm also looking for
practical ways we can utilize the existing data to its fullest -- a lot
of librarian-hours go into creating the rich semantic data that
comprises a typical MARC record, and it's a real shame to lose the
ability to utilize it ...
Cheers,
--
Joshua Ferraro                       SUPPORT FOR OPEN-SOURCE SOFTWARE
President, Technology       migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
jmf <at> liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS