Mark Miller | 4 Jan 07:20 2005

Simplifying abstract syntaxes (was: On kernel-E, operators, and properties)

Mark Miller wrote:
> By way of context, the E abstract syntax allows any arbitrary string as a
> message selector. When the message selector is an E identifier, the E
> concrete syntax allows the selector to be written without the quotes.
> However, an E language processor that starts with ASTs would not need to
> know or care about, for example, the Unicode tables saying what is an
> identifier character.

Dean Tribble wrote:
> I'm learning C# and noticed that it has a similar but more
> general mechanism: @"blah blah" is the identifier <blah blah> and can be
> used anywhere an identifier can be used. This provides the ability to
> access libraries from other languages, etc.

The particular choice above doesn't work for us, because of our conflicting
use of '@'. But yes, I'd like to remove such distinctions from the E abstract
syntax everywhere, and from the Term tree syntax as well. Proposal: Instead of
saying that an E variable-name may only be an identifier, define it as

     <variable-name> ::= <ident>
     |                   '::' <ident>
     |                   '::' <literal-string>
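
A minimal sketch of how a concrete-syntax reader might map these three
productions onto a single abstract variable name. It is written in Python
rather than E, and it makes two loud assumptions that are not part of the
proposal: <ident> is approximated by ASCII identifiers (the real definition
depends on Unicode tables), and <literal-string> is a double-quoted string
with no escape sequences. The function name parse_variable_name is
hypothetical.

```python
import re

# Assumption: ASCII-only stand-in for E's Unicode-table-driven <ident>.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Assumption: double-quoted literal with no backslash escapes.
STRING = re.compile(r'"([^"\\]*)"')


def parse_variable_name(text):
    """Return the abstract name denoted by one concrete <variable-name>."""
    # Production 1: <ident>
    if IDENT.fullmatch(text):
        return text
    if text.startswith("::"):
        rest = text[2:]
        # Production 2: '::' <ident>
        if IDENT.fullmatch(rest):
            return rest
        # Production 3: '::' <literal-string>
        m = STRING.fullmatch(rest)
        if m:
            return m.group(1)
    raise ValueError(f"not a variable-name: {text!r}")
```

Under these assumptions, foo, ::foo, and ::"foo" all denote the same abstract
name, while ::"+" denotes a name that has no unquoted spelling.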

As of 0.8.33p, these productions can be enabled by
'pragma.enable("noun-string")'. The reason for using '::' is that the
experimental property access syntax already uses '::<property-name>', where
<property-name> can be an identifier or a literal string.
As has previously been discussed, property access expands as shown below:

     ? pragma.enable("dot-props")
     ? interp::expand := true
     # value: true

     ? interp::expand := false
     # expansion: interp.__getPropertySlot("expand").setValue(\
     #                def ares__3 := false)
     #            ares__3

     # value: false

Ignoring the return value, and given that interp doesn't override the Miranda
__getPropertySlot/1 method, this expansion is equivalent to

     interp.setExpand(false)

So, the intuition is to think of '::' as a naming-path separator. An initial
'::' begins a naming path, and an initial name on a naming path is always a
variable name in the current scope. Even if we never accept the dot-props
suggestion into the language definition, the '::' syntax for introducing
non-identifier variable-names isn't much worse than anything else I could 
think of.

A silly example of doing arithmetic in a more Scheme-like style:

     ? pragma.enable("noun-string")
     ? def ::"+"(a, b) :any { return a + b }
     # value: <+>

     ? ::"+"(3,4)
     # value: 7

The current Term-tree abstract grammar, or "infoset" as W3C folks like to say, 
is documented at <>. A 
"Tag" (<>) is 
conceptually a list of segments, where the current definition of segment 
depends on Unicode tables in a fashion similar to the way that the definition 
of Java and current E identifiers do. I propose the same kind of fix as above:

* The abstract syntax for segment should be an arbitrary string of Unicode 
characters. A Tag would then be a list of arbitrary strings. The abstract 
Term-tree syntax would then be independent of complex Unicode distinctions.

* The concrete syntax would depend on Unicode character tables so that only a 
segment that was an identifier, for some suitable notion of identifier, could 
be written without quotes.

* For consistency with E, we go to '::' rather than ':' as the segment separator.

* To distinguish a Tag from a literal String, an initial segment, if it's 
quoted, must be preceded by a '::'. In terms of the current grammar, perhaps

     <Tag> ::= (<ident> | <special>) ('::' <segment>)*
     |         '::' <segment> ('::' <segment>)*

     <segment> ::= <ident> | <special> | <String> ;
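
The proposed Tag grammar can be sketched as a small recognizer that turns
concrete text into the abstract list of segments. This is Python rather than
E, and it rests on assumptions that are not part of the proposal: <ident> is
approximated by ASCII identifiers, <String> is a double-quoted string without
escapes, no whitespace is allowed between tokens, and <special> is left out
because this message doesn't define it. The name parse_tag is hypothetical.

```python
import re

# Assumptions: ASCII identifiers, escape-free strings, no <special> tokens.
TOKEN = re.compile(
    r'(?P<sep>::)'
    r'|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)'
    r'|"(?P<string>[^"\\]*)"'
)


def parse_tag(text):
    """Parse a concrete Tag into its abstract form: a list of strings."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    if not tokens:
        raise ValueError("empty tag")

    segments = []
    kind, value = tokens[0]
    if kind == "ident":
        # First production: an unquoted initial segment.
        segments.append(value)
        i = 1
    elif kind == "sep":
        # Second production: a leading '::' permits any initial segment.
        i = 0
    else:
        raise ValueError("a quoted initial segment must be preceded by '::'")

    # Remaining input: zero or more ('::' <segment>) pairs.
    while i < len(tokens):
        if tokens[i][0] != "sep":
            raise ValueError("expected '::' between segments")
        if i + 1 >= len(tokens) or tokens[i + 1][0] == "sep":
            raise ValueError("expected a segment after '::'")
        segments.append(tokens[i + 1][1])
        i += 2
    return segments
```

With this sketch, parse_tag('xsl::template') and
parse_tag('::"xsl"::"template"') yield the same list, and a bare "foo" is
rejected as a Tag so that it remains available as a literal String.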

This would give us the opportunity to revive an appealing old proposal:

                Quasi-JSON back from the dead

Mark Miller wrote:
 > Note: I hadn't realized till writing this response that the term-tree
 > grammar is already so close to accepting JSON as a subset. [...]
 > if I changed from "=" to ":" [...] then, ignoring [other]
 > annoying Unicode issues, JSON would indeed be a syntactic subset of the
 > term-tree language. [...] If we did this, then
 > we could probably dispense with creating a separate JSON quasi-parser.

> [...] If no one objects, the next release of E
> * will accept either '=' or ':' in the term-tree grammar and quasi-grammar as
>   synonyms,
> * '=' will be deprecated,
> * the default pretty printer will be changed to print using ':' rather
>   than '='.
> Leaving aside the annoying Unicode issues, E's term-trees will then be a
> proper superset of JSON, and no separate parser will be needed.
> The previous examples will then work, with ':' rather than '='. This means
> you'll be able to process JSON data using quasi-literal JSON expressions and
> patterns immediately.
> In some later release, I hope to retire the soon-to-be-deprecated '='.

Kevin Reid wrote:
> I object.
> Currently, term`x:y` is a term with the tag <x:y>. If this syntax 
> change were introduced, then term`x: y` would be surprisingly different 
> from term`x:y` (currently, the former is a syntax error).
> Disallowing ':' in term tags would break the TermL embedding of XML as 
> described in <>, which 
> I have implemented in some of my projects.
> Specifically, something other than ':' would need to be used to 
> represent separation between the XML namespace prefix or URI and the 
> local name. This would make the embedding farther from XML, and also 
> require choosing a character which can be used in TermL tags but is not 
> a character which might appear in an XML Name. (An escaping syntax 
> could be used instead, but it would be far more complex than the 
> current embedding.)

So this proposal does constitute a form of escape syntax. In addition, URI 
strings as segments would always need to be quoted.
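
To make the cost concrete, here is a rough sketch of a printer for Tags under
the proposed syntax, in Python rather than E. It assumes ASCII identifiers in
place of the real Unicode-based definition, and the backslash escaping of
quotes inside a segment is purely an assumption for illustration. The names
format_segment and format_tag are hypothetical.

```python
import re

# Assumption: ASCII-only stand-in for the real identifier definition.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def format_segment(segment):
    """Write a segment bare if it is an identifier, quoted otherwise."""
    if IDENT.fullmatch(segment):
        return segment
    # Assumption: backslash escaping for embedded backslashes and quotes.
    escaped = segment.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_tag(segments):
    """Render a list of segment strings in the proposed '::' Tag syntax."""
    if not segments:
        raise ValueError("a Tag needs at least one segment")
    parts = [format_segment(s) for s in segments]
    text = "::".join(parts)
    # A quoted initial segment needs a leading '::' so that the Tag
    # cannot be mistaken for a literal String.
    if parts[0].startswith('"'):
        text = "::" + text
    return text
```

Under these assumptions, a prefixed XML name such as ["xsl", "template"]
prints as xsl::template, while a URI-qualified one such as
["http://www.w3.org/1999/XSL/Transform", "template"] prints as
::"http://www.w3.org/1999/XSL/Transform"::template.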

So how objectionable is it?


Text by me above is hereby placed in the public domain