12 Aug 2004 12:01
LL(1) parsing and LL(k) parsing with combinators from example
Burak Emir <Burak.Emir <at> epfl.ch>
2004-08-12 10:01:49 GMT
2004-08-12 10:01:49 GMT
Moez A. Abdel-Gawad wrote: >Btw, on another note, it seems that, contrary to >what is implied in ch.13 of ScalaByExample (pp. >119-120, in the April 2004 version), the parsers >provided there do NOT support LL(K) grammars, but >only LL(1) grammars... unless 'intype' is a *value > > Yes and no. The type of intype is not crucial, rather it is the way the parser receives its input. If you have a string, or an array, then intype can be Int (the index into the string or the array), upon backtracking the old index still makes sense, pointing to the old place where we branched. >type*! If 'intype' is a reference type, using 'in' >again, for the ||| combinator which needs backtra- >cking, does NOT guarantee that it is the OLD 'in'. >If 'in' is of reference type then actually it re- >fers to the same new object, just as before att- >empting the ||| alternatives. I hope this is cl- > > Sure, this is what you will do if you have input that does not allow backtracking, as e.g. Iterators. Then the intype is the Iterator itself, and the apply methods just call its next- method to get the next input item. >ear to someone, so that I can discuss with him/her >further about how to make the parsers more general >(accept non-LL(1) grammars). I was considering for >example using clone() and/or mark()/reset() (of >Java Readers), all of which are solutions that >have problems, and are not "clean" (expose to the >parsers more than necessary about the type of 'in'). > > > For most languages I wanted to parse, I found LL(1) adequate - many LL(k) grammars can by simple left-factoring turned into an LL(1) grammar. The only nuisance is that we don't yet have a parser generator that would write the code itself. But I agree it is an engineering challenge to extend the framework of combinator parsing to k tokens lookahead. I thing the way to go is to make intype something like an iterator with a buffer of size k. Then, when you need to backtrack, you know that you never have to go back more than k steps, which should still be in the buffer. For having a useful intype, you would need thus need the source Iterator, an array of length k, an integer which says where we are (0..k-1) means in the buffer, k means one has to get the next one from the iterator). (You will not find literature about this, because most people interested in combinator parsing work in purely functional languages - they simple don't like iterators are arrays. Another example where Scala gives the best of both worlds. The big problem is of course, that if your grammar is LL(j) with j>k, your parser will fail, and you have to manually verify this. There is no way you can attach static information to the parsers you build from combinators. This is a known restriction of parsers based on the monad interface. Other approaches to combinator parsing exist, e.g. John Hughes' arrows, or Swierstra and Duponcheel's combinators. With both, you would be allowed to annotate your grammars with the amount of lookahead they need, and you can then adjust the combinators (like |||) so they check that it does not go over k. On the other hand, those combinators can also avoid backtracking altogether (precisely because of the information attached to every parser), so you would be able to write an LL(k) grammar more directly. Cheers, Burak
.
The big problem is of course, that if your grammar is LL(j) with j>k, your
parser will fail, and you have to manually verify this. There is no way you
can attach static information to the parsers you build from combinators.
This is a known restriction of parsers based on the monad interface. Other
approaches to combinator parsing exist, e.g. John Hughes' arrows,
or Swierstra and Duponcheel's combinators. With both, you would be allowed
to annotate your grammars with the amount of lookahead they need, and
you can then adjust the combinators (like |||) so they check that it
does not
go over k.
On the other hand, those combinators can also avoid backtracking altogether
(precisely because of the information attached to every parser), so you
would
be able to write an LL(k) grammar more directly.
Cheers,
Burak
RSS Feed