16 Aug 16:36
Guidelines for documenting parameter types
From: Neil Crighton <neilcrighton <at> gmail.com>
Subject: Guidelines for documenting parameter types
Newsgroups: gmane.comp.python.scientific.devel
Date: 2008-08-16 14:38:54 GMT
A few of us participating in the doc marathon (http://sd-2116.dedibox.fr/pydocweb/wiki/Front%20Page/) have some questions about documenting parameter types, and I thought it would be good to get others' opinions. If we can agree on some guidelines, perhaps they could be incorporated into the docstring standard (http://projects.scipy.org/scipy/numpy/wiki/CodingStyleGuidelines#docstring-standard)?

I don't mind what we end up deciding on, but I think it's a good idea to address these situations in the guidelines so new people know what to do, and can feel comfortable about cleaning up someone else's docstring to match the guidelines (if necessary). Maybe some of these are pedantic, but I think they'll help to give the docs a more unified feel and make sure it's always clear what parameter types are meant.

(1) When we mention types in the parameters, we are mostly using the following abbreviations:

    integer : int
    float   : float
    boolean : bool
    complex : complex
    list    : list
    tuple   : tuple

i.e. the same as the Python function names for each type. It would be nice to say in the guidelines that these should be followed where possible.

(2) Often it's useful to state the type of an input or returned array. If we want to say, for example, that the array returned by np.all is of type bool, what should we say? Possibilities used so far (shown here for int) are

    int array
    array of int
    array of ints

I prefer 'array of ints', because it is also suitable for tuples and lists ('tuple of ints', or 'list of dtypes'). 'int tuple' is just bad :) .

(3) Many functions accept either sequences or scalars as input, and then return arrays if the input was a sequence, or an array scalar if the input was a scalar.
For example:

    >>> a = np.sin(np.pi/2)
    >>> type(a)
    <type 'numpy.float64'>
    >>> a = np.sin([np.pi/2,-np.pi/2])
    >>> type(a)
    <type 'numpy.ndarray'>

There was some discussion about the best way to handle this:

    http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.arcsin/#discussion-sec
    http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.arctan/#discussion-sec
    http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.greater_equal/#discussion-sec

Stefan proposed that for these functions we just refer to the input parameter type as array_like, and the return type as ndarray, since these are both described as including scalars in the glossary, http://sd-2116.dedibox.fr/pydocweb/doc/numpy.doc.reference.glossary/. I think this is a good rule. (Note that there is at least one proofed docstring that breaks this rule: http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.greater/)

(4) Sometimes we need to specify more than one kind of type. For example, the shape parameter of zeros can be either an int or a sequence of ints (but is not array_like, since it doesn't accept nested sequences). How should we write this? Some possibilities are:

    int or sequence of ints
    {int, sequence of ints}

I much prefer 'int or sequence of ints', as to me it's clearer and looks nicer. Also, the curly brackets are used when a parameter can assume one of a set of fixed values (e.g. the kind keyword of argsort, which can be one of {'quicksort','mergesort','heapsort'}), so I think it is confusing to also use them in this case.

(5) For keyword arguments, the default value is often None. In this case we've been omitting None from the parameter types. However, sometimes None is a valid input type but is not the default (e.g. the axis keyword for argsort). In this case I think it's a good idea to include None explicitly as a parameter type.

I've posted to both the scipy-dev and numpy lists - I wasn't sure which was best for this.

Neil
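
To illustrate, here is a rough sketch of how a docstring might look if the suggestions in (1), (3), (4) and (5) were adopted. The function any_above, its parameters and its behaviour are all hypothetical, made up only for this example; it is not part of numpy, and the wording is just one possible reading of the proposals above.

```python
import numpy as np


def any_above(a, threshold, axis=0, kind='strict'):
    """
    Test whether any element along an axis exceeds a threshold.

    Hypothetical function, written only to illustrate the proposed
    conventions for documenting parameter types.

    Parameters
    ----------
    a : array_like
        Input values. Scalars are accepted, as for any array_like.
    threshold : float
        Value to compare each element against.
    axis : int or None, optional
        Axis along which to test. If None, the flattened input is
        tested. Default is 0, so None is listed explicitly as a type.
    kind : {'strict', 'inclusive'}, optional
        Compare with ``>`` ('strict') or ``>=`` ('inclusive').
        Default is 'strict'.

    Returns
    -------
    out : ndarray
        Array of bools; an array scalar if no axes remain.

    """
    # Promote scalars to 1-d so that the default axis=0 is always valid.
    a = np.atleast_1d(np.asarray(a))
    if kind == 'strict':
        mask = a > threshold
    elif kind == 'inclusive':
        mask = a >= threshold
    else:
        raise ValueError("kind must be 'strict' or 'inclusive'")
    return np.any(mask, axis=axis)
```

Here the curly brackets are kept for the fixed set of values that kind can take, while 'int or None' is spelled out in words, following the same reasoning as 'int or sequence of ints' in (4).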