4 Jan 2005 07:20
Simplifying abstract syntaxes (was: On kernel-E, operators, and properties)
Mark Miller wrote:
> By way of context, the E abstract syntax allows any arbitrary string as a
> message selector. When the message selector is an E identifier, the E
> concrete syntax allows the selector to be written without the quotes.
> However, an E language processor that starts with ASTs would not need to
> know or care about, for example, the Unicode tables saying what is an
> identifier character.

At http://www.eros-os.org/pipermail/e-lang/2004-April/009821.html
Dean Tribble wrote:
> I'm learning C# and noticed that it has a similar but more
> general mechanism: @"blah blah" is the identifier <blah blah> and can be
> used anywhere an identifier can be used. This provides the ability to
> access libraries from other languages, etc.

The particular choice above doesn't work for us, because of our
conflicting use of '@'. But yes, I'd like to remove such distinctions
from the E abstract syntax everywhere, and from the Term-tree syntax as
well.

Proposal: Instead of saying that an E variable-name may only be an
identifier, define it as

    <variable-name> ::= <ident> | '::' <ident> | '::' <literal-string> ;

As of 0.8.33p, these productions can be enabled by
pragma.enable("noun-string").

The reason for using '::' is that the experimental property-access syntax
already uses '::<property-name>', where <property-name> can be an
identifier or a literal string. As has previously been discussed, this
expands as shown below:

    ? pragma.enable("dot-props")

    ? interp::expand := true
    # value: true

    ? interp::expand := false
    # expansion: interp.__getPropertySlot("expand").setValue(\
    #                def ares__3 := false)
    #            ares__3

    # value: false

Ignoring the return value, and given that interp doesn't override the
Miranda __getPropertySlot/1 method, this expansion is equivalent to

    interp.setExpand(false)

So, the intuition is to think of '::' as a naming-path separator.
An initial '::' begins a naming path, and an initial name on a naming
path is always a variable name in the current scope.

Even if we never accept the dot-props suggestion into the language
definition, the '::' syntax for introducing non-identifier variable-names
isn't much worse than anything else I could think of. A silly example of
doing arithmetic in a more Scheme-like style:

    ? pragma.enable("noun-string")

    ? def ::"+"(a, b) :any { return a + b }
    # value: <+>

    ? ::"+"(3,4)
    # value: 7

The current Term-tree abstract grammar, or "infoset" as W3C folks like to
say, is documented at
<http://www.erights.org/elang/quasi/terms/term-spec.html>. A "Tag"
(<http://www.erights.org/elang/quasi/terms/term-spec.html#Tag>) is
conceptually a list of segments, where the current definition of segment
depends on Unicode tables, much as the definitions of Java identifiers and
of current E identifiers do. I propose the same kind of fix as above:

* The abstract syntax for a segment should be an arbitrary string of
  Unicode characters. A Tag would then be a list of arbitrary strings, and
  the abstract Term-tree syntax would be independent of complex Unicode
  distinctions.

* The concrete syntax would depend on Unicode character tables, so that
  only a segment that was an identifier, for some suitable notion of
  identifier, could be written without quotes.

* For consistency with E, we would switch from ':' to '::' as the segment
  separator.

* To distinguish a Tag from a literal String, an initial segment, if it's
  quoted, must be preceded by a '::'.
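To make these Tag rules concrete, here is a minimal sketch, written in Python rather than E, of how a printer and a reader might map between an abstract Tag (a non-empty list of arbitrary strings) and the proposed concrete syntax. It rests on several assumptions that are not part of the proposal: a deliberately simplified ASCII-only notion of identifier stands in for the Unicode tables, JSON string escaping stands in for E's literal-string syntax, <special> is not modeled, and the names render_tag and parse_tag are hypothetical.

```python
import json
import re

# Hypothetical, simplified identifier rule: an ASCII letter or underscore
# followed by ASCII letters, digits, or underscores. The real TermL/E rules
# consult Unicode tables; this sketch deliberately does not.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# JSON string escaping stands in for E's literal-string syntax (an assumption).
_DECODER = json.JSONDecoder()


def render_tag(segments: list[str]) -> str:
    """Print an abstract Tag (a non-empty list of arbitrary strings)."""
    if not segments:
        raise ValueError("a Tag needs at least one segment")
    parts = []
    for i, seg in enumerate(segments):
        if IDENT.fullmatch(seg):
            parts.append(seg)  # identifiers print without quotes
        elif i == 0:
            # A quoted initial segment is preceded by '::' so the Tag
            # cannot be confused with a literal String.
            parts.append("::" + json.dumps(seg))
        else:
            parts.append(json.dumps(seg))
    return "::".join(parts)


def parse_tag(text: str) -> list[str]:
    """Read the concrete syntax back into a list of segment strings."""
    leading = text.startswith("::")
    pos = 2 if leading else 0
    segments: list[str] = []
    while True:
        if text.startswith('"', pos):
            if not segments and not leading:
                raise ValueError("a quoted initial segment needs a leading '::'")
            # raw_decode returns the decoded string and the offset after it.
            seg, pos = _DECODER.raw_decode(text, pos)
        else:
            m = IDENT.match(text, pos)
            if m is None:
                raise ValueError(f"expected a segment at offset {pos}")
            seg, pos = m.group(), m.end()
        segments.append(seg)
        if pos == len(text):
            return segments
        if not text.startswith("::", pos):
            raise ValueError(f"expected '::' at offset {pos}")
        pos += 2
```

Under these rules render_tag(["+"]) yields ::"+", the same spelling the noun-string productions use for a variable named +, while a segment such as a namespace URI is not an identifier and so always comes out quoted.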
In terms of the current grammar, perhaps

    <Tag>     ::= (<ident> | <special>) ('::' <segment>)*
                | '::' <segment> ('::' <segment>)*
                ;
    <segment> ::= <ident> | <special> | <String> ;

This would give us the opportunity to revive an appealing old proposal:


Quasi-JSON back from the dead

At http://www.eros-os.org/pipermail/e-lang/2004-September/010074.html
Mark Miller wrote:
> Note: I hadn't realized till writing this response that the term-tree
> grammar is already so close to accepting JSON as a subset. [...]
> if I changed from "=" to ":" [...] then, ignoring [other]
> annoying Unicode issues, JSON would indeed be a syntactic subset of the
> term-tree language. [...] If we did this, then
> we could probably dispense with creating a separate JSON quasi-parser.
> [...] If no one objects, the next release of E
>
> * will accept either '=' or ':' in the term-tree grammar and quasi-grammar
>   as synonyms,
> * '=' will be deprecated,
> * the default pretty printer will be changed to print using ':' rather
>   than '='.
>
> Leaving aside the annoying Unicode issues, E's term-trees will then be a
> proper superset of JSON, and no separate parser will be needed.
>
> The previous examples will then work, with ':' rather than '='. This means
> you'll be able to process JSON data using quasi-literal JSON expressions and
> patterns immediately.
>
> In some later release, I hope to retire the soon-to-be-deprecated '='.

At http://www.eros-os.org/pipermail/e-lang/2004-September/010077.html
Kevin Reid wrote:
> I object.
>
> Currently, term`x:y` is a term with the tag <x:y>. If this syntax
> change were introduced, then term`x: y` would be surprisingly different
> from term`x:y` (currently, the former is a syntax error).
>
> Disallowing ':' in term tags would break the TermL embedding of XML as
> described in <http://www.erights.org/data/terml/embeddings.html>, which
> I have implemented in some of my projects.
>
> Specifically, something other than ':' would need to be used to
> represent separation between the XML namespace prefix or URI and the
> local name. This would make the embedding farther from XML, and also
> require choosing a character which can be used in TermL tags but is not
> a character which might appear in an XML Name. (An escaping syntax
> could be used instead, but it would be far more complex than the
> current embedding.)

So this proposal does constitute a form of escaping syntax. In addition,
URI strings used as segments would always need to be quoted. So how
objectionable is it?

--
Text by me above is hereby placed in the public domain

        Cheers,
        --MarkM