Brian Schweitzer | 1 May 2009 02:26

Re: RFC: PartNumberStyle rewrite (was "Foo, Parts 1-3" vs, "Foo, Parts 1 - 3")

>> For common practice in English, the argument for "Parts 1?3" was that it was
>> the most correct for English, to not have the spaces.  That was noted at
>> Wikipedia, however, within a section which also noted that the most correct,
>> for English, is that an en-dash be used.
>>

> That's nice and all, but MBz is a collection of data, not a written
> work.  Therefore, typography rules do not need to apply.  You may prefer
> them to, but that is a preference not a requirement.

Well, the *only* reason given, other than "I like it better", for using "1-3" instead of "1 - 3" was that it is more typographically correct.  I fail to see the real logic in arguing typographic correctness for spacing, but not for the character in between the spaces.
 
>> I'd also note that just about every modern word processor automatically
>> makes this substitution, for [0-9] numeric ranges, transparently converting
>> 1-9 into 1?9;

> So?  We don't use word processors to edit MBz data, nor do people tag
> their files using Word.

I wasn't suggesting that we do use word processors to edit MB.  However, I was suggesting that the use of correct typography, the en-dash included, is not unusual in the modern day, with modern software.  As an alternate example, I would point to Wikipedia; I think it could be suggested, without offending anyone here, that far more people edit there than edit MusicBrainz.  Yet, consulting their manual of style, it not only suggests, but *directs* the use of an en-dash, when appropriate.  http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style#Dashes
 
>> The new Guess Case is intelligent enough to do this substitution on
>> the fly, also transparently.
>>

> Nice.  That would certainly be helpful, but that is only half the
> problem.  I am sitting here reading and responding to this e-mail in
> Thunderbird, on a reasonably modern Windows machine (ok, I'm still
> running XP :P) and notice how your n-dashes displayed in the couple of
> paragraphs above ... as question marks (i.e. unknown characters that I
> can only guess are en-dashes).  I'm betting some mp3 players will have
> problems too.  I'm willing to accept that some reverse cyrillic
> character or some Kanji text doesn't display right on my screen or in my
> mp3 player, but I am not willing to accept that what should (to me) be a
> commonly used character - a dash (in the generic sense) - doesn't even
> display correctly.

I can't speak to why your install of Thunderbird isn't showing the en-dash.  I *can*, however, confirm from the Thunderbird docs that the en-dash, em-dash, and all other Unicode characters are supported.  They have also been in the default Windows font since at least Windows 98, and have been supported by Mac and Linux, by default, for at least half a decade.  The only thing I can think of that might be causing you this problem is that your mailserver itself is mangling UTF-8 into something else, such as basic ASCII.  However, this sounds more like an argument for a new mailserver, not an argument for or against en-dashes.  :P
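
To illustrate the kind of mangling suspected here, the following is a minimal sketch in Python.  It does not reproduce any particular mailserver; the ASCII-only step is an assumption standing in for whatever hop might be dropping the character.  It only shows that an en-dash survives as three bytes in UTF-8, while a conversion to plain ASCII with replacement turns it into a question mark.

```python
EN_DASH = "\u2013"  # U+2013 EN DASH
title = f"Parts 1{EN_DASH}3"

# In UTF-8 the en-dash is carried as the three bytes E2 80 93.
utf8_bytes = title.encode("utf-8")

# Assumption: some hop along the way can only emit ASCII and substitutes
# "?" for anything it cannot represent (Python's "replace" error handler).
mangled = title.encode("ascii", errors="replace").decode("ascii")

print(utf8_bytes)
print(mangled)
```

If something like this is happening, the question marks are a transport problem and not a font or rendering problem on the receiving machine.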
 
>> I think we could all agree, however, that, even if it's the easiest to type,
>> the hyphen-minus (the key on most keyboards) is the least correct range
>> indication character.

> You would be incorrect that we could all agree to that. (I would say
> that any character out of the dash-family would be much worse).

The hyphen-minus, by definition, has no typographical meaning.  It is not even a valid punctuation mark in *any* language or script.  It simply exists because, at one time in history, given no room to fit multiple dash, hyphen, and minus keys, and given that the output was pretty rough (and thus the distinction between them could not be detected anyhow), a compromise was made, *specifically for typewriters*.  The key then carried over to computer keyboards because they initially used, what else, typewriter keyboards.  So how can it be argued that it is still the best character, now that the correct typographical characters do exist, and have existed for long enough that every single computer (ok, save perhaps some of those still running Windows 95) on the planet supports them, without even changing fonts?  "Easiest", perhaps, but definitely not "best".

> We don't
> need a typographically correct character to indicate a range in a
> database.  We could choose to use one if we like, but we don't need to.
> We just need to agree on one that represents what we want it to.  Hell,
> we could agree to adopt the phrase " to " if we wanted to, or how about
> two dots ("..") like some programming languages use?  I personally think
> we should pick the character closest to what people expect to see (i.e.
> a dash of some sort) and that is easy for anyone to enter (i.e. it
> exists on western keyboards without the need of any macros or special
> gymnastics to type.)  To me, that means the thing that is next to the 0
> and above the o and the p on my keyboard.  I don't know (and don't care)
> whether that is a hyphen, a dash, a minus, or a thingamawhatchacallit.

We could also agree that red is blue.  :P

Seriously, I understand the argument about the hyphen-minus being the easiest to type.  We don't "need" to support anything at all, right?  However, we're talking about something here which is unarguably the more correct character to use.  The suggested guideline also, quite specifically, does not say that using a hyphen-minus is incorrect or disallowed, only that using an en-dash is preferred.  Also, the number of cases where there is something funky going on, and guess case is not used, is pretty small relative to the totality of the database.  So most of these would be autocorrected to en-dashes, even without the user doing anything at all.  For those that still end up entered using hyphen-minuses, there's nothing in the suggested guideline which would make those edits in any way whatsoever incorrect; definitely nothing in this guideline would suggest that an edit using a hyphen-minus should be voted against, just on that basis.
 
> So, to sum up my feelings:
> - MBz collects data, not printed text, and therefore does not need to
> follow typographical rules that are intended to make printed text "look
> better".

But "data" becomes text.  MusicBrainz data ends up used in many different contexts, not just as a source for taggers or a raw data dump (such as a release's listing on the MB site.)  Why should we not, when we so easily can support it, suggest that correct typography indeed then be used?  We don't need to include all the accented characters either - "Johann Johannsson" is just as comprehensible as "Jóhann Jóhannsson".  However, it's not as correct, so we use the accented o's.  (And for the record, Jóhann Jóhannsson is a lot more difficult for me to type, using a US keyboard with Linux English/US layout, without reference to character lookups, than is Johann Johannsson.)
 
> - Dash types are font dependent, and the differences between them will
> be lost on many people.

The apparent difference may be lost, but the inherent meaning is not.  Just because a hyphen-minus and an en-dash may look identical, in a given font, in a given context, that does not make them identical characters, nor is the computer suddenly rendered unable to recognize that they are different characters with differing typographical meaning.
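
As a small illustration of that point, here is a sketch in Python using only the standard `unicodedata` module.  Whatever a font does with the glyphs, the two characters have different code points and different Unicode names, so software can always tell them apart.

```python
import unicodedata

hyphen_minus = "-"     # the key on most keyboards
en_dash = "\u2013"     # the range dash

# Print each character's code point and its official Unicode name.
for ch in (hyphen_minus, en_dash):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# They never compare equal, even when a font draws them alike.
print(hyphen_minus == en_dash)
```

This prints `U+002D HYPHEN-MINUS`, then `U+2013 EN DASH`, then `False`.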
 
> - If we add characters that don't display properly in applications
> people use, when a very close facsimile character is available, we risk
> alienating MBz contributors.

What applications?  Windows 95?  Windows 3.1?  Very old mp3 players that don't support even the very most basic Unicode characters?  Any software that today cannot display an en-dash is at least ten years old - for any software or hardware unable to render an en-dash correctly, there are bigger problems present, when attempting to use MB data, than whether or not we allow the use of correct typography.
 
> - The more MBz grows, the more inclusive we need to be, so we should be
> encouraging people to contribute by making it easy for them to do so.

Hence we allow the use of the hyphen-minus, and only "prefer" the en-dash.  Hence we provide a tool (the new Guess Case) which is capable of detecting proper situations to use an en-dash, at least with regard to Part Number Style.  However, I don't think that this is really a good reason to not use correct typography.  If Wikipedia can require the use of correct typography, we can at least suggest it, without our then making it "too hard" for the new editor to figure things out.  (Personally, given the number of people who enter things in ALL CAPS, I think some new users won't care how we word the guideline, or what we do or don't suggest re: typography...  but that's just me :P).
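
To make the idea of a transparent substitution concrete, here is a rough sketch in Python.  It is not the actual Guess Case code; the pattern and the function name `prefer_en_dash` are hypothetical, and it only handles the simple "Part(s) N-M" case discussed in this thread.  The editor types the hyphen-minus, with or without spaces, and the range comes out with an en-dash.

```python
import re

EN_DASH = "\u2013"

# Hypothetical pattern: "Part" or "Parts", a number, a hyphen-minus with
# optional spaces around it, and a second number.
PART_RANGE = re.compile(r"\b(Parts?\s+\d+)\s*-\s*(\d+)\b")

def prefer_en_dash(title: str) -> str:
    """Replace the hyphen-minus in a part-number range with an en-dash."""
    return PART_RANGE.sub(
        lambda m: f"{m.group(1)}{EN_DASH}{m.group(2)}", title
    )

print(prefer_en_dash("Foo, Parts 1-3"))
print(prefer_en_dash("Foo, Parts 1 - 3"))
print(prefer_en_dash("Foo-Bar"))  # no part range, left alone
```

Both spellings of the range normalize to the same title, and hyphen-minuses outside a part-number range are left untouched.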
 
> - Even if it is not "mandatory" but just "preferred", then we will be
> encouraging some typographically-minded editors to spend untold hours
> running around the database and cleaning up people's dashes - and with
> all the work that needs to be done, is that really the best use of these
> people's time?  Already, I have spent 15 minutes or more writing this

Well, as I mentioned a while back, the number of cases where even "Parts 1-3" (per the guideline as it currently is written) occurs is quite small, vs those where all sorts of other mess are present and not in compliance with *any* official or proposed PartNumberStyle.  So should some editor(s) decide to try to clean some of that mess up, more power to them.  :)  But seriously, if someone is typographically minded, and cares to spend time changing hyphen-minuses into en-dashes, with regard to Part Number Style, why should we argue against that?  Everyone who edits contributes his or her own time to MusicBrainz, and no matter how they edit, (hopefully), the data benefits.  Is it really for you or me to decide that, say, adding 100 ARs to a release really would be a better use of some other editor's time, vs their going through to convert the hyphen-minuses?  No one is telling anyone to do it, if he or she thinks it a waste of time; anyone doing it would be doing it *because he or she wanted to*.

Brian
_______________________________________________
Musicbrainz-style mailing list
Musicbrainz-style@...
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-style
