From: Alan G Isaac <aisaac <at> american.edu>
Subject: citation enhancement
Newsgroups: gmane.text.docutils.user
Date: Monday 17th July 2006 20:15:04 UTC
I do not claim the attached document is complete,
far from it, but I think it has tentative answers
to most of David's questions.

Alan Isaac

Proposal for Citation Handling
==============================

.. note:: This document includes Proposals_, an example Workflow_, and a
   Terminology_ appendix.


Documents presented in different contexts
(websites, working papers, journals with different formatting requirements)
are required to format the citation references in different
ways.  It is desirable to maintain one version of the
document where the citations and citation labels (as
substituted) can readily be reformatted as needed.

.. note::
   One important thing for preprocessing is that the document can be
   searched for citation references.  The current citation reference syntax
   allows this to be done easily.  For example, Bibstuff_ can already do this.

The object: all of the content in a References
(or Bibliography) section of an reST document is to be
generated by a preprocessor.  This includes both the
(proposed) citation-label-replacement text and the (already
supported) citation text. Citation-label replacement is to
be handled by reST, not by a preprocessor.  (See the
Comments below.)


Citation Handling Goals for reST include:

1. not be LaTeX specific.
2. retain reST's current and future handling of citations in
   this and other, possibly unforeseeable, ways.
3. allow easy interaction with a bibliographic database
4. all citation references should "look like"
   reST style citation references
5. allow continuous working on the original document
   without continually regenerating the citations
   (unless of course new citation references have been added).
   For elaboration, see the Workflow_ description below.
6. allow hyperlinks from citation labels (**after** substitution)
   to citations, and otherwise retain reST's current
   and future handling of citations
7. allow easy change of formatting the citations
8. allow easy change of formatting both of the
   citation's display label and of the
   citation reference's display label,
   possibly independently.

.. note::
   Many of the goals above are independent of the proposals under
   consideration.  The last goal is directly linked.

.. note::
   Goal 3 is accepted as desirable but in the context of this document may
   to some extent conflict with another goal: that substitutions look like
   substitutions.  The perspective that citations that induce formatted
   substitutions are still *fundamentally* citations may be fundamental to
   sympathy for the current proposal.  David Goodger is *not* currently
   sympathetic to this perspective.

.. note::
   Unless there is a fixed syntax for citation references, authors who
   come to citation reference preprocessing late in the game will face
   problems.  For example, when used as a preprocessor for reST documents,
   Bibstuff_ looks for citation references, based on the extant syntax.
   If the syntax for substitution references is used, a preprocessor must
   be able to somehow distinguish those substitution references that are
   citations from other substitution references.  On the face of it, this
   will be problematic as a general proposal unless reST imposes some
   syntactical restriction that effectively distinguishes the two.
   Relying on the existing citation reference syntax currently appears the
   most graceful and natural approach.  David Goodger has proposed another
   pretty nice syntax (see below), which however would be enforced by
   authors and not by reST.


Proposals
---------

Proposal A:
Implement the `Currently Feasible Approach`_.

.. note::
   David Goodger appears to like this proposal,
   and appeared to imply it is currently feasible,
   but I have not yet gotten it to work.

Proposal B:
Implement a `citation` directive that supports substitution.

.. note::
   David Goodger may be willing to consider this proposal,
   but does not currently prefer it.

Proposal C:
reST will directly support substitution for citation reference labels,
so that the Docutils parser can replace the citation label
with citation-label-substitution text that is specified in the citation.

.. note::
   Proposal C is my favorite.
   David Goodger has indicated that Proposal C is likely to be *rejected*.

Currently Feasible Approach
---------------------------

Let us cite |[Doe2006b]_|,
using citation reference syntax *within* a substitution reference.
(This is just a convention; it is not enforced by reST.)

Now we provide the substitution definition,
which looks like this::

        .. |[Doe2006b]_| replace:: `Doe (2006b)`_

which will link to the citation via an indirect hyperlink target like this::

        .. _Doe (2006b): citationlabel_

And finally we provide the citation,
which looks like this::

        .. [citationlabel]
           Doe, John, 2006, Some Useful Article
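
A preprocessor could emit these three pieces mechanically.  The following
is a minimal sketch in Python, assuming a hypothetical helper named
``feasible_markup`` (it is not part of Bibstuff or Docutils) and assuming
that the citation key doubles as the citation label, as in the
``lutz_2003`` example later in this document.  The trailing underscore on
the target line follows this document's own
``.. _`Doe (2006b)`: citationlabel_`` definition.

```python
def feasible_markup(key, display_label, citation_text):
    """Return the reST lines that define one citation in this convention.

    Hypothetical helper for illustration only.
    """
    return "\n".join([
        # Substitution definition: |[key]_| is replaced by a phrase reference.
        ".. |[%s]_| replace:: `%s`_" % (key, display_label),
        # Indirect target: the display label points at the citation label.
        ".. _`%s`: %s_" % (display_label, key),
        # The citation itself, with its text indented under the label.
        ".. [%s]" % key,
        "   " + citation_text,
    ])


print(feasible_markup("Doe2006b", "Doe (2006b)",
                      "Doe, John, 2006, Some Useful Article"))
```

Because the display label is passed in as a plain argument, the same
citation key can be given a different display label for each target
publication without touching the document text.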

.. note::
   This approach becomes even more attractive
   when combined with David Goodger's expressed
   intent to allow phrase reference names as
   citation reference labels.

.. note::
   There have been many requests for parameterized citation references.
   The present document does not address this,
   but it should be noted that the `Currently Feasible Approach`_
   could readily be extended to allow such parameterization as
   follows: we would need to agree on a rigid syntax for the
   substitution references used for citation, and the preprocessor
   (e.g., Bibstuff) would have to be modified to recognize this syntax.


Workflow
--------

Here is an example workflow.
The illustrative preprocessor is Bibstuff_, [1]_
but that is not essential to the example.

1. Work on the document draft, example.txt,
   which contains:

   - citation references whose citation labels are citation keys in a
     bibliographic database, example.bib
   - an 'include' directive to include example.cites,
     which will be generated at the next step

2. Process example.txt with a preprocessor to produce
   example.cites based on example.bib.
3. Process example.txt (and thus the included example.cites)
   with a Docutils writer component to produce output in the desired
   format (HTML, LaTeX, text, etc.).
4. If the document needs further work, return to step 1.

.. note::
   Naturally, if the author desires, when the document example.txt is
   static, the citations in example.cites can be included physically rather
   than with an 'include' directive.
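
Step 2 only needs to be repeated when the draft cites something that
example.cites does not yet define.  The check can be sketched as below,
under the assumption that every generated citation starts a line with
``.. [label]``; the function name ``keys_needing_regeneration`` is
hypothetical and not part of Bibstuff.

```python
import re

# A citation definition starts a line with ".. [label]".
CITATION_DEF = re.compile(r"^\.\. \[([^\]]+)\]", re.MULTILINE)


def keys_needing_regeneration(referenced_keys, cites_text):
    """Return the referenced keys that have no citation in cites_text."""
    defined = set(CITATION_DEF.findall(cites_text))
    return sorted(set(referenced_keys) - defined)


cites = """\
.. [goossens_1994]
   Goosens, Michel, Mittelbach, Frank and Samarin, Alexander. 1994.
   The LaTeX Companion.
"""

# lutz_2003 is cited in the draft but not yet defined in the generated file.
print(keys_needing_regeneration(["goossens_1994", "lutz_2003"], cites))
# prints ['lutz_2003']
```

An empty result means the author can go straight to step 3.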


For illustration,
let us cite two books: [goossens_1994]_ and [lutz_2003]_.
Here the citation labels are the keys in a bibliographic database.
Such citation references are easily recognized as such by
preprocessors: for example, Bibstuff_ already can compile a list of
citation references from an reST document.

[goossens_1994]_ is a handy guide to LaTeX.
[lutz_2003]_ is an introduction to the programming language Python.
Since the citation labels for these two books are keys in a citation
database, a preprocessor can be used to generate both the citation
and the citation-label-substitution text.

Here is what the database looks like::

      @book{goossens_1994,
        author = {Goosens, Michel and Frank Mittelbach
                  and Alexander Samarin},
        year = 1994,
        title = {The LaTeX Companion},
        edition = {2nd},
        publisher = {Addison-Wesley},
        address = {Boston, MA},
        isbn = {0-201-54199-8}
      }

      @book{lutz_2003,
        author = {Lutz, Mark and David Ascher},
        year = 2003,
        title = {Learning Python},
        publisher = {Oreilly},
        address = {Sebastopol, CA},
        isbn = {1-56592-464-9}
      }

Here is what Bibstuff's addrefs.py currently produces when
run on this document::

        Goosens, Michel, Mittelbach, Frank and Samarin,
        Alexander. 1994. The LaTeX Companion.

        Lutz, Mark, and Ascher, David. 2003. Learning Python.

This is what would be used as the citation definition.
Note that the particular formatting of the citation definition
(name handling, order of parts, etc) is a style decision and
should be customizable by the author.  The addrefs.py module
in Bibstuff_ can be fairly easily hacked, so this would be the
initial approach to customization.  Looking forward, easy
specification of author-preferred styles is desirable.
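
One way to make the style customizable is to treat each style as a plain
function from a database entry to a string.  The sketch below assumes that
an entry has already been parsed into a dictionary with ``author``,
``year`` and ``title`` fields; the function names are hypothetical, and
the output is not claimed to match what addrefs.py produces.

```python
def last_first(name):
    """Normalize 'First Last' or 'Last, First' to 'Last, First'."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
    else:
        first, _, last = name.strip().rpartition(" ")
    return "%s, %s" % (last, first) if first else last


def author_year_style(entry):
    """Format an entry as 'Authors. Year. Title.'"""
    # Collapse line breaks inside the author field before splitting on "and".
    raw = " ".join(entry["author"].split())
    names = [last_first(name) for name in raw.split(" and ")]
    if len(names) > 1:
        authors = ", ".join(names[:-1]) + " and " + names[-1]
    else:
        authors = names[0]
    return "%s. %s. %s." % (authors, entry["year"], entry["title"])


def title_first_style(entry):
    """An alternative style: emphasized title followed by the year."""
    return "*%s* (%s)" % (entry["title"], entry["year"])


entry = {"author": "Lutz, Mark and David Ascher",
         "year": 2003,
         "title": "Learning Python"}
for style in (author_year_style, title_first_style):
    print(style(entry))
```

An author who needs a journal-specific format would then write one more
function of the same shape rather than modify the preprocessor itself.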

Note that there is no obvious way to control where these
citations should be physically placed.  The current practice
of addrefs.py is to simply append them to the file.
A better (and easily implemented, even for me) approach is
to write them to a new file for inclusion with reST's
`include` directive.  (See the Comments below.)

.. note::
   David Goodger finds this "very reasonable".

At one point David Goodger raised the possibility of a
citation directive.  This would allow citations with
label replacement using a syntax like::

    .. citation:: goossens_1994
       :replace: Goossens (1994)

       Goosens, Michel and Frank Mittelbach and Alexander Samarin,
       *The LaTeX Companion*,
       Addison-Wesley, 1994

    .. citation:: lutz_2003
       :replace: Lutz (2003)

       Lutz, Mark and David Ascher, 2003,
       *Learning Python*,
       Oreilly, 2003

But David argues that just as easy and not requiring so much new
functionality would be the following::

        .. |[lutz_2003]| replace:: [`Lutz (2003)`]_
        .. _Lutz (2003): lutz_2003
        .. [lutz_2003]
           Lutz, Mark and David Ascher, 2003,
           *Learning Python*,
           Oreilly, 2003

Since a preprocessor is required either way, the added complexity of
this approach shouldn't be a problem.

.. note:: David comments as follows:

        In your text, all you have to do is insert "|[lutz_2003]|" instead
        of "[lutz_2003]_".  The first line above is a substitution definition,
        replacing the inline text with the final display citation label.  I
        included square brackets in the substitution reference ("|[...]|")
        to simulate a regular citation reference, but arbitrary text could be
        used (e.g. "|lutz_2003|" or "|(lutz_2003)|"), whatever works well
        with Bibstuff and the author's conventions.  The second line links
        "Lutz (2003)" to "lutz_2003", the citation key.  The last 4 lines are
        the citation itself.

But from the above, it seems that you want both citation reference and
citation definition to have the same display label.  In that case, the
above can be simplified to::

    .. |[lutz_2003]| replace:: [`Lutz (2003)`]_
    .. [Lutz (2003)]
       Lutz, Mark and David Ascher, 2003,
       *Learning Python*,
       Oreilly, 2003

Preprocessor Details
--------------------

Given the ready existence of a preprocessor (Bibstuff_),
this proposal assumes the database is in .bib format.
(While Bibstuff_ remains crude, it is easily modifiable,
often even by relative programming novices.)

This does not make the proposal LaTeX specific:
this is just a common format for bibliographic databases.
Many databases export to this useful format.

This does not make the proposal Bibstuff specific:
this particular processor is just used for illustration.

Bibstuff recognizes citation
references (roughly) by looking for the
currently allowed syntax for citation
references: a reference name in brackets
followed by an underscore.
This is probably not hard to modify
to any regular expression,
but of course for preprocessing to work
there must be *some* regular expression
that can identify citation references.
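
As a rough illustration of such a regular expression, the sketch below
matches a simple reference name in brackets followed by an underscore.
The pattern and the function name ``citation_labels`` are assumptions made
for this example, not Bibstuff's actual code.  It does not skip literal
blocks or comments, and it does not handle phrase reference names.

```python
import re

# Assumed pattern: a simple reference name in brackets followed by "_".
# Purely numeric labels such as [1]_ are footnote references, so skip them.
CITATION_REF = re.compile(
    r"\[(?!\d+\])([A-Za-z0-9]+(?:[-_.:+][A-Za-z0-9]+)*)\]_")


def citation_labels(text):
    """Return the distinct citation labels in order of first appearance."""
    seen = []
    for label in CITATION_REF.findall(text):
        if label not in seen:
            seen.append(label)
    return seen


sample = ("Let us cite two books: [goossens_1994]_ and [lutz_2003]_. "
          "The preprocessor is Bibstuff_, [1]_ and [lutz_2003]_ again.")
print(citation_labels(sample))
# prints ['goossens_1994', 'lutz_2003']
```

The ordinary reference ``Bibstuff_`` and the footnote reference ``[1]_``
are both ignored, which is what makes the existing citation syntax easy
for a preprocessor to rely on.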

Bibstuff assumes each citation reference name is 
a key in a bibliographic database.
The database is searched and the bibliographic
information is collected and reformatted.

Bibstuff currently appends the bibliography
to the end of the input document, but it is
trivial to make it write the bibliography to
a separate file that can be included (with
the `include` directive) by the reST document.
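
Writing the bibliography to a separate file can be sketched as follows.
This is a hypothetical stand-in rather than Bibstuff's code: the function
name ``write_cites`` is invented here, and the ``formatted`` dictionary is
assumed to map each citation key to text that has already been looked up
in the database and formatted.

```python
import os
import tempfile


def write_cites(keys, formatted, path):
    """Write one reST citation per key to path; return the keys not found."""
    missing = []
    with open(path, "w", encoding="utf-8") as out:
        for key in keys:
            if key not in formatted:
                # Report unknown keys instead of writing an empty citation.
                missing.append(key)
                continue
            out.write(".. [%s]\n   %s\n\n" % (key, formatted[key]))
    return missing


formatted = {
    "lutz_2003": "Lutz, Mark, and Ascher, David. 2003. Learning Python.",
}
path = os.path.join(tempfile.mkdtemp(), "example.cites")
missing = write_cites(["lutz_2003", "goossens_1994"], formatted, path)

with open(path, encoding="utf-8") as handle:
    print(handle.read())
print("not found in database:", missing)
```

The draft then pulls the generated file in with a line such as
``.. include:: example.cites``, so the draft itself is never rewritten by
the preprocessor.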

.. note::
   It has been proposed that phrase reference names be allowed as citation
   labels.  This would substantially increase the complexity of the allowed
   citation labels, and therefore it would require enhancing the
   preprocessor's ability to correctly extract the citation label.  The
   following has been
   offered as an intentionally complex example::

    [`this::could-be, "a" [citation?\`]_ reference!`]_

The existing preprocessor (Bibstuff_) will not currently handle arbitrarily
complex phrase reference names and would have to be extended to do so.
Note that most of the motivation for allowing phrase reference names may
disappear if the proposal under consideration is implemented.


.. [Doe2006]
   This is the citation for the above citation reference.

.. [citationlabel]
   Doe, John, 2006, Some Useful Article

.. [goossens_1994]
   Goosens, Michel, Mittelbach, Frank and Samarin, Alexander. 1994.
   The LaTeX Companion.

.. _Lutz (2003): lutz_2003
.. [lutz_2003] Lutz, Mark and David Ascher, 2003, *Learning Python*,
   Oreilly, 2003


.. [1]
   Some functionality will have to be added to Bibstuff,
   but the parsing capacity is already in place.


.. |[Doe2006b]_| replace:: `Doe (2006b)`_

.. |[lutz_2003]| replace:: [`Lutz (2003)`]_


.. _`Doe (2006b)`: citationlabel_

.. _Bibstuff: http://www.pricklysoft.org/software/bibstuff.html




Terminology
-----------

A "citation reference" such as [Doe2006]_ occurs in text.
It is a reference to a citation, which looks like the following::

        .. [Doe2006]
           This is the citation for the above citation reference.

A "citation" has two parts:
a citation label,
which can be referenced by a citation reference,
and a "citation definition",
which contains the details of the work being cited.

A citation reference and a citation each have a "citation label",
which is a reference name that links them.
In the example above, the citation label is "Doe2006".

A "citation key" is a bibliographic database lookup key.

Proposed is "citation label substitution",
which will replace the citation label in the final formatted output,
resulting in a "display citation label".
