1 Dec 2005 12:21
Martin Duerst <duerst <at> it.aoyama.ac.jp>
2005-12-01 11:21:07 GMT
At 06:14 05/12/01, Eli Zaretskii wrote:
>> Date: Wed, 30 Nov 2005 16:04:18 +0900
>> From: Martin Duerst <duerst <at> it.aoyama.ac.jp>
>> Cc: emacs-bidi <at> gnu.org
>>
>> >2. When the created file is read (again) for editing, these strings
>> >   should be seen (on screen) EXACTLY as it was when it was entered.
>>
>> If using the same editor, definitely. But we also found that there
>> can be some personal preferences, so if you changed your preferences,
>> the display would change.
>
>That's actually a bad idea, IMHO: the text should be saved so that any
>other bidi-enabled editor will display it the same.

For plain running text, this is definitely true. I'm not sure it
also applies to structured stuff such as XML.

>That is why
>fiddling with Unicode character properties is something I feel we
>should not do: those properties are the only common denominator of all
>bidi editors.

I definitely understand your feeling. But I just want to mention
that the Unicode bidi algorithm explicitly allows things like this.
If you look e.g. at http://www.unicode.org/reports/tr9/#HL3 and
http://www.unicode.org/reports/tr9/#HL5, you won't see changing
of properties mentioned explicitly, but you'll be able to figure out
that "Provide artificial context" does very much the equivalent.

>I still don't understand why you are opposed to using RLM, LRM, and
>other special characters reserved by Unicode: after all, if the result
>is displayed by a bidi-compatible program, they will always behave
>according to UAX#9. (In XML and similar files, we could use HTML/XML
>directives instead of the literal RLM etc. when saving the file, but
>the principle remains the same: whenever the default character
>properties yield wrong display, use the explicit directional marks to
>fix that.)

Well, it just doesn't work that easily. Let's look at some examples.
ESAC REPPU (upper case) stands for right-to-left.
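To make the notation concrete, here is a minimal Python sketch of how
the display order in such examples can be derived. It is only an
approximation under stated assumptions: a left-to-right paragraph,
upper case treated as right-to-left, lower case as left-to-right,
everything else as neutral, and only the neutral-resolution and
reordering steps of UAX#9. The names display_order, bidi_class and
MIRROR are made up for this illustration; this is not the code of any
real editor or simulation.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, written [LRM] in the examples
# Small mirroring table for this sketch only (the real one is much larger).
MIRROR = {"<": ">", ">": "<", "(": ")", ")": "(", "[": "]", "]": "["}


def bidi_class(ch):
    """Toy classification: upper case = R, lower case and LRM = L, rest neutral."""
    if ch == LRM or ch.islower():
        return "L"
    if ch.isupper():
        return "R"
    return "N"


def display_order(logical):
    """Return the visual order of `logical` in a left-to-right paragraph."""
    classes = [bidi_class(c) for c in logical]
    n = len(classes)

    # Resolve neutrals: a neutral run between two R characters becomes R,
    # anything else takes the paragraph direction (L). Paragraph edges count as L.
    i = 0
    while i < n:
        if classes[i] != "N":
            i += 1
            continue
        j = i
        while j < n and classes[j] == "N":
            j += 1
        before = classes[i - 1] if i > 0 else "L"
        after = classes[j] if j < n else "L"
        resolved = "R" if before == "R" and after == "R" else "L"
        for k in range(i, j):
            classes[k] = resolved
        i = j

    # Reverse every R run, mirroring paired punctuation inside it.
    out = []
    i = 0
    while i < n:
        j = i
        while j < n and classes[j] == classes[i]:
            j += 1
        run = logical[i:j]
        if classes[i] == "R":
            run = "".join(MIRROR.get(c, c) for c in reversed(run))
        out.append(run)
        i = j

    # LRM is invisible, so drop it from the displayed string.
    return "".join(out).replace(LRM, "")


print(display_order("abc DEF, GHI jkl"))  # abc IHG ,FED jkl
```

Feeding the logical strings from the XML examples in this message to
display_order (with LRM in place of [LRM]) gives the display forms
quoted there.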
Let's take a very simple example, first logical:

    <x y='ABC'>DEF ghi</x>

Not applying anything but the bidi algorithm, this gets displayed as

    <x y='FED<'CBA ghi</x>

which I hope you can agree is a useless mess; what we would like
to see is something like

    <x y='CBA'>FED ghi</x>

Now to get that, it would be okay to add a single [LRM], e.g. like
so (logical):

    <x y='ABC'[LRM]>DEF ghi</x>

But with the [LRM] in this position, you will get an XML parsing
error. In this specific case, there are two other possible positions
for the [LRM] (logical):

    <x y='ABC[LRM]'>DEF ghi</x>

and

    <x y='ABC'>[LRM]DEF ghi</x>

But now, we have made the [LRM] part of the attribute value or the
element content. In both cases, we have changed the content of our
*data* just to solve some display problems of the *markup*. It should
be clear that this is completely out of the question (we may want to
have some marks in the content if the content warrants that, but
that's a different issue).

Now about using HTML directives (the HTML dir attribute, and the
<bdo> element): Again, these are for indicating directionality of
the marked-up content. Trying to use them directly to fix some
markup details as in the examples above doesn't seem appropriate.

Actually, in our experimental simulation, we parse this information
and take it into account when reordering the HTML/XML source (that
part is not yet available in the online simulation, as far as I'm
aware). We do this for two reasons:
1) to try to display the content close to how it would look in a
   browser, and
2) to try to get some information about the base
   directionality/embedding that we should apply to the markup.
I have to admit that this area is still rather experimental.

In any case, we are still left with two problems: Generic XML does
not have these directives, and we don't have an editor yet that would
react in the right way to such directives.

Regards, Martin.
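
P.S. The parsing claims about the three [LRM] positions can be checked
with a few lines of Python. This is just a sketch using the standard
xml.etree.ElementTree module, not part of our simulation; try_parse is
a helper name made up for this illustration.

```python
import xml.etree.ElementTree as ET

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, written [LRM] in the examples


def try_parse(source):
    """Return the parsed root element, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(source)
    except ET.ParseError:
        return None


# [LRM] between the attribute value and '>': not well-formed XML.
between = try_parse("<x y='ABC'" + LRM + ">DEF ghi</x>")
# [LRM] inside the attribute value: parses, but the value has changed.
in_attr = try_parse("<x y='ABC" + LRM + "'>DEF ghi</x>")
# [LRM] at the start of the element content: parses, but the text has changed.
in_text = try_parse("<x y='ABC'>" + LRM + "DEF ghi</x>")

print(between is None)              # True: parsing failed
print(in_attr.get("y") == "ABC")    # False: the mark is now part of the data
print(in_text.text == "DEF ghi")    # False: the mark is now part of the data
```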