Burak Emir | 12 Aug 2004 12:01
Picon
Picon
Favicon

LL(1) parsing and LL(k) parsing with combinators from example

Moez A. Abdel-Gawad wrote:

>Btw, on another note, it seems that, contrary to
>what is implied in ch.13 of ScalaByExample (pp.
>119-120, in the April 2004 version), the parsers
>provided there do NOT support LL(K) grammars, but
>only LL(1) grammars... unless 'intype' is a *value
>  
>
Yes and no. The type of intype is not crucial, rather it is the
way the parser receives its input. If you have a string, or an
array, then intype can be Int (the index into the string or the
array), upon backtracking the old index still makes sense,
pointing to the old place where we branched.

>type*! If 'intype' is a reference type, using 'in'
>again, for the ||| combinator which needs backtra-
>cking, does NOT guarantee that it is the OLD 'in'.
>If 'in' is of reference type then actually it re-
>fers to the same new object, just as before att-
>empting the ||| alternatives. I hope this is cl-
>  
>
Sure, this is what you will do if you have input that does not
allow backtracking, as e.g. Iterators. Then the intype
is the Iterator itself, and the apply methods just call its next-
method to get the next input item.

>ear to someone, so that I can discuss with him/her
>further about how to make the parsers more general
>(accept non-LL(1) grammars). I was considering for
>example using clone() and/or mark()/reset() (of
>Java Readers), all of which are solutions that
>have problems, and are not "clean" (expose to the
>parsers more than necessary about the type of 'in').
>
>  
>
For most languages I wanted to parse, I found LL(1) adequate -
many LL(k) grammars can by simple left-factoring turned into
an LL(1) grammar. The only nuisance is that we don't yet have
a parser generator that would write the code itself.

But I agree it is an engineering challenge to extend the framework of 
combinator
parsing to k tokens lookahead. I thing the way to go is to make intype
something like an iterator with a buffer of size k. Then, when you
need to backtrack, you know that you never have to go back more than
k steps, which should still be in the buffer.

For having a useful intype, you would need thus need the source Iterator,
an array of length k, an integer which says where we are (0..k-1) means
in the buffer, k means one has to get the next one from the iterator).
(You will not find literature about this, because most people interested in
combinator parsing work in purely functional languages - they simple don't
like iterators are arrays. Another example where Scala gives the best of 
both
worlds :-).

The big problem is of course, that if your grammar is LL(j) with j>k, your
parser will fail, and you have to manually verify this. There is no way you
can attach static information to the parsers you build from combinators.

This is a known restriction of parsers based on the monad interface. Other
approaches to combinator parsing exist, e.g. John Hughes' arrows,
or Swierstra and Duponcheel's combinators. With both, you would be allowed
to annotate your grammars with the amount of lookahead they need, and
you can then adjust the combinators (like |||) so they check that it 
does not
go over k.

On the other hand, those combinators can also avoid backtracking altogether
(precisely because of the information attached to every parser), so you 
would
be able to write an LL(k) grammar more directly.

Cheers,
Burak


Gmane