Mark Miller | 4 Jan 07:20 2005

Simplifying abstract syntaxes (was: On kernel-E, operators, and properties)

Mark Miller wrote:
> By way of context, the E abstract syntax allows any arbitrary string as a
> message selector. When the message selector is an E identifier, the E
> concrete syntax allows the selector to be written without the quotes.
> However, an E language processor that starts with ASTs would not need to
> know or care about, for example, the Unicode tables saying what is an
> identifier character.

At http://www.eros-os.org/pipermail/e-lang/2004-April/009821.html
Dean Tribble wrote:
> I'm learning C# and noticed that it has a similar but more
> general mechanism: @"blah blah" is the identifier <blah blah> and can be
> used anywhere an identifier can be used. This provides the ability to
> access libraries from other languages, etc.

The particular choice above doesn't work for us, because of our conflicting
use of '@'. But yes, I'd like to remove such distinctions from the E abstract
syntax everywhere, and from the Term-tree syntax as well. Proposal: Instead of
saying that an E variable-name may only be an identifier, define it as

     <variable-name> ::= <ident>
     |                   '::' <ident>
     |                   '::' <literal-string>
     ;
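To make the proposal concrete, here is a minimal Python model of how a pretty printer and a parser might map between an abstract variable-name (any string) and the three concrete forms above. It is a sketch under stated assumptions, not E's implementation: Python's str.isidentifier() stands in for E's Unicode identifier tables, E's reserved words are ignored, and the helper names (print_variable_name, parse_variable_name, quote, unquote) and the two-escape quoting rule are hypothetical.

```python
def is_ident(name: str) -> bool:
    # Stand-in for E's identifier rules: Python's check uses Unicode
    # XID_Start/XID_Continue, which is similar but not identical to E's.
    # E reserved words (e.g. "def") are not handled here.
    return name.isidentifier()


def quote(s: str) -> str:
    # Hypothetical literal-string syntax: escape only backslash and '"'.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def unquote(lit: str) -> str:
    # Inverse of quote(); rejects anything that is not one complete literal.
    if len(lit) < 2 or lit[0] != '"' or lit[-1] != '"':
        raise ValueError(f"not a string literal: {lit!r}")
    out, i, end = [], 1, len(lit) - 1
    while i < end:
        c = lit[i]
        if c == "\\":
            if i + 1 >= end:
                raise ValueError("escape consumes the closing quote")
            out.append(lit[i + 1])
            i += 2
        elif c == '"':
            raise ValueError("unescaped quote inside literal")
        else:
            out.append(c)
            i += 1
    return "".join(out)


def print_variable_name(name: str) -> str:
    # Identifiers print bare; any other string uses '::' <literal-string>.
    return name if is_ident(name) else "::" + quote(name)


def parse_variable_name(text: str) -> str:
    # Accepts all three productions: <ident>, '::' <ident>, and
    # '::' <literal-string>. The result is the abstract name (a string).
    if text.startswith("::"):
        rest = text[2:]
        if rest.startswith('"'):
            return unquote(rest)
        if is_ident(rest):
            return rest
    elif is_ident(text):
        return text
    raise ValueError(f"not a variable-name: {text!r}")
```

In this model '::expand' and 'expand' parse to the same abstract name, so the printer only ever needs the bare and the quoted forms; for instance, the name "+" prints as ::"+".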

As of 0.8.33p, these productions can be enabled by
pragma.enable("noun-string"). The reason for using '::' is that the
experimental property access syntax already uses '::<property-name>', where
<property-name> can be an identifier or a literal string. As has previously
been discussed, this expands as shown below:

     ? pragma.enable("dot-props")
     ? interp::expand := true
     # value: true

     ? interp::expand := false
     # expansion: interp.__getPropertySlot("expand").setValue(\
     #                def ares__3 := false)
     #            ares__3

     # value: false

Ignoring the return value, and given that interp doesn't override the Miranda
__getPropertySlot/1 method, this expansion is equivalent to

     interp.setExpand(false)

So, the intuition is to think of '::' as a naming-path separator. An initial
'::' begins a naming path, and an initial name on a naming path is always a
variable name in the current scope. Even if we never accept the dot-props
suggestion into the language definition, the '::' syntax for introducing
non-identifier variable-names isn't much worse than anything else I could 
think of.
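The naming-path reading can be modeled in a few lines of Python. This is a rough analogue, not E semantics: it assumes, as in the expansion above, that the default (Miranda) __getPropertySlot behavior maps a property such as "expand" onto getExpand()/setExpand(). The names PropertySlot, resolve_path, assign_path and Interp are hypothetical, and object-specific overrides of __getPropertySlot are not modeled.

```python
class PropertySlot:
    # Hypothetical model of the slot that the default __getPropertySlot
    # returns: property "expand" maps onto getExpand()/setExpand().
    def __init__(self, target, name: str):
        self._target = target
        self._suffix = name[:1].upper() + name[1:]

    def get_value(self):
        return getattr(self._target, "get" + self._suffix)()

    def set_value(self, value):
        getattr(self._target, "set" + self._suffix)(value)


def resolve_path(scope: dict, path: list):
    # The initial name is a variable in the current scope; each further
    # segment is a property access on the previous value.
    head, *props = path
    value = scope[head]
    for prop in props:
        value = PropertySlot(value, prop).get_value()
    return value


def assign_path(scope: dict, path: list, new_value):
    # Like the E expansion: set the final property's slot, then yield
    # the assigned value (the setter's own return value is ignored).
    if len(path) == 1:
        scope[path[0]] = new_value
    else:
        target = resolve_path(scope, path[:-1])
        PropertySlot(target, path[-1]).set_value(new_value)
    return new_value


class Interp:
    # Toy stand-in for E's interp object; camelCase mirrors E's naming.
    def __init__(self):
        self._expand = False

    def getExpand(self):
        return self._expand

    def setExpand(self, flag):
        self._expand = flag


scope = {"interp": Interp()}
assign_path(scope, ["interp", "expand"], True)    # interp::expand := true
print(resolve_path(scope, ["interp", "expand"]))  # prints True
```

Keeping property access behind a slot object, rather than calling the setter directly, is what lets an E object override __getPropertySlot and still be reached through the same path syntax.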

A silly example of doing arithmetic in a more Scheme-like style:

     ? pragma.enable("noun-string")
     ? def ::"+"(a, b) :any { return a + b }
     # value: <+>

     ? ::"+"(3,4)
     # value: 7

The current Term-tree abstract grammar, or "infoset" as W3C folks like to say,
is documented at <http://www.erights.org/elang/quasi/terms/term-spec.html>. A
"Tag" (<http://www.erights.org/elang/quasi/terms/term-spec.html#Tag>) is
conceptually a list of segments, where the current definition of segment
depends on Unicode tables in much the way that the definitions of Java and
current E identifiers do. I propose the same kind of fix as above:

* The abstract syntax for segment should be an arbitrary string of Unicode 
characters. A Tag would then be a list of arbitrary strings. The abstract 
Term-tree syntax would then be independent of complex Unicode distinctions.

* The concrete syntax would depend on Unicode character tables so that only a 
segment that was an identifier, for some suitable notion of identifier, could 
be written without quotes.

* For consistency with E, we go to '::' rather than ':' as the segment separator.

* To distinguish a Tag from a literal String, an initial segment, if it's 
quoted, must be preceded by a '::'. In terms of the current grammar, perhaps

     <Tag> ::= (<ident> | <special>) ('::' <segment>)*
     |         '::' <segment> ('::' <segment>)*
     ;

     <segment> ::= <ident> | <special> | <String> ;
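As a sanity check on the grammar, here is a small Python sketch of a printer and parser for Tags under this proposal. It assumes str.isidentifier() as the "suitable notion of identifier", omits the <special> alternative, and uses a hypothetical quoting rule in which only backslash and double quote are escaped; print_tag and parse_tag are illustrative names, not part of any E API.

```python
def is_ident(s: str) -> bool:
    # Assumed "suitable notion of identifier": Python's Unicode XID rules.
    return s.isidentifier()


def quote(s: str) -> str:
    # Hypothetical literal syntax: escape only backslash and '"'.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def print_tag(segments: list[str]) -> str:
    # A quoted initial segment needs a leading '::' so the whole Tag
    # cannot be mistaken for a literal String.
    if not segments:
        raise ValueError("a Tag has at least one segment")
    first, *rest = segments
    out = first if is_ident(first) else "::" + quote(first)
    for seg in rest:
        out += "::" + (seg if is_ident(seg) else quote(seg))
    return out


def _read_segment(text: str, i: int) -> tuple[str, int]:
    # Read one segment starting at index i; return (segment, next index).
    if text.startswith('"', i):
        out, j = [], i + 1
        while j < len(text):
            c = text[j]
            if c == "\\" and j + 1 < len(text):
                out.append(text[j + 1])
                j += 2
            elif c == '"':
                return "".join(out), j + 1
            else:
                out.append(c)
                j += 1
        raise ValueError("unterminated string segment")
    # Bare segment: identifiers never contain ':', so stop at the next one.
    j = i
    while j < len(text) and text[j] != ":":
        j += 1
    word = text[i:j]
    if not is_ident(word):
        raise ValueError(f"bad bare segment: {word!r}")
    return word, j


def parse_tag(text: str) -> list[str]:
    if text.startswith("::"):
        seg, i = _read_segment(text, 2)  # initial segment may be quoted
    elif text.startswith('"'):
        raise ValueError("a quoted initial segment needs a leading '::'")
    else:
        seg, i = _read_segment(text, 0)  # bare identifier only
    segments = [seg]
    while i < len(text):
        if not text.startswith("::", i):
            raise ValueError(f"expected '::' at index {i}")
        seg, i = _read_segment(text, i + 2)
        segments.append(seg)
    return segments
```

For example, print_tag(["http://www.w3.org/1999/xhtml", "p"]) yields ::"http://www.w3.org/1999/xhtml"::p, since a URI is not an identifier and so must be quoted behind a leading '::'. A single ':' as in x:y is rejected by this parser.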

This would give us the opportunity to revive an appealing old proposal:

                Quasi-JSON back from the dead

At http://www.eros-os.org/pipermail/e-lang/2004-September/010074.html
Mark Miller wrote:
> Note: I hadn't realized till writing this response that the term-tree
> grammar is already so close to accepting JSON as a subset. [...]
> if I changed from "=" to ":" [...] then, ignoring [other]
> annoying Unicode issues, JSON would indeed be a syntactic subset of the
> term-tree language. [...] If we did this, then
> we could probably dispense with creating a separate JSON quasi-parser.

> [...] If no one objects, the next release of E
> 
> * will accept either '=' or ':' in the term-tree grammar and quasi-grammar as
>   synonyms,
> * '=' will be deprecated,
> * the default pretty printer will be changed to print using ':' rather
>   than '='.
> 
> Leaving aside the annoying Unicode issues, E's term-trees will then be a
> proper superset of JSON, and no separate parser will be needed.
> 
> The previous examples will then work, with ':' rather than '='. This means
> you'll be able to process JSON data using quasi-literal JSON expressions and
> patterns immediately.
> 
> In some later release, I hope to retire the soon-to-be-deprecated '='.

At http://www.eros-os.org/pipermail/e-lang/2004-September/010077.html
Kevin Reid wrote:
> I object.
> 
> Currently, term`x:y` is a term with the tag <x:y>. If this syntax 
> change were introduced, then term`x: y` would be surprisingly different 
> from term`x:y` (currently, the former is a syntax error).
> 
> Disallowing ':' in term tags would break the TermL embedding of XML as 
> described in <http://www.erights.org/data/terml/embeddings.html>, which 
> I have implemented in some of my projects.
> 
> Specifically, something other than ':' would need to be used to 
> represent separation between the XML namespace prefix or URI and the 
> local name. This would make the embedding farther from XML, and also 
> require choosing a character which can be used in TermL tags but is not 
> a character which might appear in an XML Name. (An escaping syntax 
> could be used instead, but it would be far more complex than the 
> current embedding.)

So this proposal does constitute a form of escape syntax. In addition, URI
strings used as segments would always need to be quoted.

So how objectionable is it?

-- 
Text by me above is hereby placed in the public domain

     Cheers,
     --MarkM
