At 06:14 05/12/01, Eli Zaretskii wrote:
>> Date: Wed, 30 Nov 2005 16:04:18 +0900
>> From: Martin Duerst
>> Cc: firstname.lastname@example.org
>> >2. When the created file is read (again) for editing, these strings
>> > should be seen (on screen) EXACTLY as it was when it was entered.
>> If using the same editor, definitely. But we also found that there
>> can be some personal preferences, so if you changed your preferences,
>> the display would change.
>That's actually a bad idea, IMHO: the text should be saved so that any
>other bidi-enabled editor will display it the same.
For plain running text, this is definitely true. I'm not sure
it also applies to structured text such as XML.
>That is why
>fiddling with Unicode character properties is something I feel we
>should not do: those properties are the only common denominator of all
I definitely understand your feeling. But I just want to mention
that the Unicode bidi algorithm explicitly allows things like this.
If you look e.g. at:
you won't see changing of properties mentioned explicitly,
but you'll be able to figure out that "Provide artificial
context" amounts to very much the same thing.
>I still don't understand why you are opposed to using RLM, LRM, and
>other special characters reserved by Unicode: after all, if the result
>is displayed by a bidi-compatible program, they will always behave
>according to UAX#9. (In XML and similar files, we could use HTML/XML
>directives instead of the literal RLM etc. when saving the file, but
>the principle remains the same: whenever the default character
>properties yield wrong display, use the explicit directional marks to
Well, it just doesn't work that easily. Let's look at some examples.
ESAC REPPU (upper case) stands for right-to-left.
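As a rough sketch of why markup gets scrambled, the following Python snippet
lists the Unicode bidi class of each character in a small tag. It assumes a
Hebrew letter standing in for the upper-case text, and the tag itself is
hypothetical; `bidi_classes` is a helper name made up for this illustration.
The markup punctuation is neutral (ON), so under UAX#9 its direction is
resolved from the strong characters around it.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidi class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Hypothetical tag: ASCII markup around one Hebrew letter (alef, U+05D0),
# which plays the role of the upper-case (right-to-left) text.
tag = '<a b="\u05d0">'

for ch, cls in zip(tag, bidi_classes(tag)):
    print(f"U+{ord(ch):04X} {cls}")

# The two directional marks are strong characters of opposite direction.
print(unicodedata.bidirectional("\u200e"))  # LRM
print(unicodedata.bidirectional("\u200f"))  # RLM
```

The characters `<`, `=`, `"` and `>` all come out as ON, which is why they
can end up on the "wrong" side once right-to-left text sits next to them.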
Let's take a very simple example, first logical:
Not applying anything but the bidi algorithm, this gets displayed as
which I hope you can agree is a useless mess;
what we would like to see is something like
Now to get that, it would be okay to add a single [LRM],
e.g. like so (logical):
But with the [LRM] in this position, you will get an
XML parsing error. In this specific case, there are
two other possible positions for the [LRM] (logical):
DEF ghi and
But now, we have made the [LRM] part of the attribute
value or the element content. In both cases, we have
changed the content of our *data* just to solve some
display problems of the *markup*. It should be clear
that this is completely out of the question (we may want
to have some marks in the content if the content warrants
that, but that's a different issue).
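The three positions can be tried out with a short Python sketch. It assumes
a hypothetical element `item` with an attribute `name`, Hebrew letters
standing in for the upper-case text, and Python's standard
xml.etree.ElementTree parser; `try_parse` is a helper name made up for this
illustration, and other parsers may report the error differently.

```python
import xml.etree.ElementTree as ET

LRM = "\u200e"  # LEFT-TO-RIGHT MARK

def try_parse(doc):
    """Return the root element, or None if doc is not well-formed XML."""
    try:
        return ET.fromstring(doc)
    except ET.ParseError:
        return None

# Hypothetical documents; the Hebrew letters play the role of RTL text.
value = "\u05d0\u05d1\u05d2"
content = "\u05d3\u05d4\u05d5"

in_tag = f'<item name="{value}"{LRM}>{content}</item>'      # LRM inside the tag itself
in_attr = f'<item name="{value}{LRM}">{content}</item>'     # LRM inside the attribute value
in_content = f'<item name="{value}">{LRM}{content}</item>'  # LRM inside the element content

print(try_parse(in_tag))                         # None: not well-formed
print(repr(try_parse(in_attr).get("name")))      # the value now ends with the LRM
print(repr(try_parse(in_content).text))          # the content now starts with the LRM
```

The first document is rejected, and the other two parse only because the
[LRM] has become part of the data that an application reading the file
will see.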
Now about using HTML directives (the HTML dir attribute,
and the element): Again, these are for indicating
directionality of the marked-up content. Trying to use
them directly to fix some markup details as in the examples
above doesn't seem appropriate. Actually, in our experimental
simulation, we parse this information, and consider it
for reordering the HTML/XML source (that part is not yet
available in the online simulation, as far as I'm aware).
We do this for two reasons: 1) to try to display
the content close to how it would look in a browser,
and 2) to try to get some information about the base
directionality/embedding that we should apply to the
markup. I have to admit that this area is still rather
In any case, we are still left with two problems:
Generic XML does not have these directives, and
we don't have an editor yet that would react in
the right way to such directives.