jastrachan | 7 Oct 09:35 2004

Re: [groovy-dev] making the bytecode generation more understandable

On 7 Oct 2004, at 08:21, John Rose wrote:
> On Oct 6, 2004, at 10:08, John Wilson wrote:
>> I'm quite keen on generating Java and compiling that. It doesn't look 
>> too hard to walk an AST and generate Java. It's not a mechanism that 
>> you would use in production but I think it might be very helpful in 
>> investigating optimisations as you can try out different strategies 
>> by just editing the generated code (a decompiler is another option, 
>> of course).
> I agree with this.  Source generation generally makes development 
> easier, since there are more ways to look at and use source code.  The 
> new backend should be designed with the intention of generating both 
> source and binary.  This means that the source code has to be kept 
> fairly low-level, without too much effort invested in making it 
> beautiful to human readers.

Agreed. However, it's pretty trivial to walk a Java AST and generate 
source code, so I'd expect that to be done already for Janino/Serp. 
We could probably reuse that, or worst case we just write a 
simple Java-AST walker/visitor.
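
To give a feel for how small such a walker could be, here is a rough sketch. The node classes (Literal, Name, Binary, Call) and the SourceWriter visitor are made up for illustration; they are not the Janino or Serp APIs, which I have not checked. The output is fully parenthesised on purpose, in line with keeping the generated source low-level rather than pretty.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature "Java AST"; all names here are illustrative only.
interface Node {
    <R> R accept(Visitor<R> v);
}

interface Visitor<R> {
    R visitLiteral(Literal n);
    R visitName(Name n);
    R visitBinary(Binary n);
    R visitCall(Call n);
}

final class Literal implements Node {
    final int value;
    Literal(int value) { this.value = value; }
    public <R> R accept(Visitor<R> v) { return v.visitLiteral(this); }
}

final class Name implements Node {
    final String id;
    Name(String id) { this.id = id; }
    public <R> R accept(Visitor<R> v) { return v.visitName(this); }
}

final class Binary implements Node {
    final Node left;
    final String op;
    final Node right;
    Binary(Node left, String op, Node right) {
        this.left = left;
        this.op = op;
        this.right = right;
    }
    public <R> R accept(Visitor<R> v) { return v.visitBinary(this); }
}

final class Call implements Node {
    final Node target;
    final String method;
    final List<Node> args;
    Call(Node target, String method, List<Node> args) {
        this.target = target;
        this.method = method;
        this.args = args;
    }
    public <R> R accept(Visitor<R> v) { return v.visitCall(this); }
}

// Walks the tree and returns Java source text for each node.
final class SourceWriter implements Visitor<String> {
    public String visitLiteral(Literal n) { return Integer.toString(n.value); }

    public String visitName(Name n) { return n.id; }

    public String visitBinary(Binary n) {
        // Always parenthesise, so operator precedence never has to be tracked.
        return "(" + n.left.accept(this) + " " + n.op + " " + n.right.accept(this) + ")";
    }

    public String visitCall(Call n) {
        StringBuilder sb = new StringBuilder();
        sb.append(n.target.accept(this)).append('.').append(n.method).append('(');
        for (int i = 0; i < n.args.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(n.args.get(i).accept(this));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Node ast = new Call(new Name("out"), "println",
                Arrays.<Node>asList(new Binary(new Name("x"), "+", new Literal(1))));
        System.out.println(ast.accept(new SourceWriter())); // prints out.println((x + 1))
    }
}
```

A bytecode backend would be a second Visitor over the same nodes, so both outputs would come from one tree.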

Most importantly, we should have just one mapping (Groovy AST -> Java 
AST) and not have to maintain two mappings which could easily get out of 
whack (Groovy AST -> bytecode and Groovy AST -> Java source).

>> Going through a Java compiler (either by generating Java or by 
>> generating an AST) imposes Java limitations on us.

Other than names, there are few limitations really. Pretty much 
every language feature of Groovy maps to some fairly straightforward 
Java code under the hood.

>> Java is far more restrictive in the allowed spelling of 
>> identifiers than the JVM is. If I read him right, John Rose is 
>> considering extending the Groovy definition of identifier to a 
>> superset of Java's. This would be a problem.
> The Groovy language can support Unicode names without significant harm 
> to interoperability with Java, by providing a mapping to JVM names 
> ("bytecode names") that respects Java practices.  I mean name 
> mangling, something akin to what's done for nested classes, but with 
> hex numbers for code points.  The Borneo language provides a sketch of 
> this sort of technique in the case of operator names.
> I'm thinking of something pretty low-impact, which does not conflict 
> with other kinds of Java identifiers in wide use.  I also want it to 
> be relatively readable:  A mangling should be short and easy to 
> recognize, and should not encode "normal" characters.  I just put a 
> detailed proposal into the wiki:  
> http://docs.codehaus.org/display/GroovyJSR/extended+names .
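
For readers who want a concrete picture of hex-based mangling, here is a minimal sketch. It is NOT the scheme from the wiki proposal, whose details are not reproduced in this thread; the `$hex$` escape format, the choice to escape '$' itself, and the NameMangler class are all assumptions made up for illustration. Characters that are already legal in a Java identifier pass through untouched, so "normal" names are not encoded.

```java
// Hypothetical mangling scheme: every code point that is not legal at its
// position in a Java identifier becomes '$' + lowercase hex + '$'.
// '$' itself is escaped too, so that demangling stays unambiguous.
final class NameMangler {
    private NameMangler() {}

    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            boolean legal = (i == 0)
                    ? Character.isJavaIdentifierStart(cp)
                    : Character.isJavaIdentifierPart(cp);
            if (legal && cp != '$') {
                out.appendCodePoint(cp);
            } else {
                out.append('$').append(Integer.toHexString(cp)).append('$');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Inverse of mangle(); assumes its input was produced by mangle().
    static String demangle(String mangled) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < mangled.length()) {
            char c = mangled.charAt(i);
            if (c == '$') {
                int end = mangled.indexOf('$', i + 1);
                out.appendCodePoint(Integer.parseInt(mangled.substring(i + 1, end), 16));
                i = end + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangle("a-b"));   // prints a$2d$b
        System.out.println(mangle("plus+")); // prints plus$2b$
        System.out.println(mangle("foo"));   // prints foo
    }
}
```

Escaping '$' keeps the sketch reversible, but it also means names that already contain '$' (nested classes, for example) would be rewritten. A real proposal would need to be more careful there.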


Incidentally, one of the main drivers for making the bytecode generation 
more understandable is being able to really tune things. E.g. if you 
use static typing (or the compiler can easily deduce the type of an 
expression), we really should be able to generate bytecode which is as 
efficient as Java's. As well as being a completely dynamic scripting 
language, I'd also like Groovy to be usable as a drop-in replacement for 
Java for high-performance coding.
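
To illustrate the gap in Java terms, here is a rough sketch. The reflective path only stands in for a dynamically dispatched call; it is not what the Groovy runtime actually does, and the DispatchSketch class and its method names are invented for the example.

```java
import java.lang.reflect.Method;

final class DispatchSketch {
    private DispatchSketch() {}

    // Stand-in for a dynamically typed call: the method is looked up by name
    // at run time and invoked reflectively on untyped operands.
    static Object dynamicCall(Object receiver, String name, Object arg) {
        try {
            Method m = receiver.getClass().getMethod(name, arg.getClass());
            return m.invoke(receiver, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no such method: " + name, e);
        }
    }

    // With the types known at compile time, the same call is a plain
    // virtual method invocation with no lookup.
    static String staticCall(String receiver, String arg) {
        return receiver.concat(arg);
    }

    public static void main(String[] args) {
        System.out.println(dynamicCall("foo", "concat", "bar")); // prints foobar
        System.out.println(staticCall("foo", "bar"));            // prints foobar
    }
}
```

Both calls give the same answer; the point is that when the compiler knows the types, it can emit the second form directly.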