Loup Vaillant | 8 Feb 18:24 2012

Re: Raspberry Pi

Alan Kay wrote:
> Hi Loup
>
> Actually, your last guess was how we thought most of the optimizations
> would be done (as separate code "guarded" by the meanings).  […]
> In practice, the optimizations we did do are done in the translation
> chain and in the run-time, […]

Okay, thanks.

I can't recall the exact reference, but I once read about a middle
ground: mechanical optimization passes that are brittle in the face of
meaning changes.  That is, if you change the concise version of your
program, you may have to re-think your optimization passes, but you
don't necessarily have to rewrite your optimized version directly.

Example
{
   The guy was at university, and was assigned to write an optimized
   multiplication for big numbers.  Each student would be graded
   according to the speed of their program.  No restriction on the
   programming language.

   Everyone started coding in C, but _he_ preferred to start with
   Scheme.  He coded a straightforward version of the algorithm, then
   set out to manually (but mostly mechanically) apply a set of
   correctness-preserving transformations, most notably a CPS
   transformation, and a direct translation to C with gotos.  His final
   program, written in pure C, was the second fastest in his class (and
   very close to the first, which used assembly heavily).  Looking back
   at what he could have done better, he saw that his program spent most
   of its time in malloc().  He didn't know how to do it at the time,
   but had he managed his memory directly, his program would have been
   first.

   Oh, and of course, he had much less trouble dealing with mistakes
   than his classmates.

   So, his conclusion was that speed comes from beautiful programs, not
   "prematurely optimized" ones.
}
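
To make the kind of transformation he used a bit more concrete, here is
a minimal Haskell sketch of my own (not his actual Scheme or C code):
a direct-style schoolbook multiplication on digit lists, followed by the
same function after a mechanical CPS transformation.  The point is that
each step is local and systematic, so it can be redone whenever the
concise version changes.

-- Numbers as little-endian lists of decimal digits.
type Digits = [Int]

-- Add two digit lists, propagating the carry.
addD :: Digits -> Digits -> Digits
addD = go 0
  where
    go c []     []     = if c == 0 then [] else [c]
    go c (x:xs) []     = let s = x + c     in s `mod` 10 : go (s `div` 10) xs []
    go c []     (y:ys) = let s = y + c     in s `mod` 10 : go (s `div` 10) [] ys
    go c (x:xs) (y:ys) = let s = x + y + c in s `mod` 10 : go (s `div` 10) xs ys

-- Multiply a digit list by a single digit.
mulDigit :: Int -> Digits -> Digits
mulDigit d = go 0
  where
    go c []     = if c == 0 then [] else [c]
    go c (x:xs) = let p = d * x + c in p `mod` 10 : go (p `div` 10) xs

-- Schoolbook multiplication, straightforward direct style.
mulD :: Digits -> Digits -> Digits
mulD []     _  = []
mulD (x:xs) ys = addD (mulDigit x ys) (0 : mulD xs ys)

-- The same function after a mechanical CPS transformation: the recursive
-- call now takes an explicit continuation, so every call is a tail call,
-- which later translates almost one-for-one into C with gotos.
mulK :: Digits -> Digits -> (Digits -> r) -> r
mulK []     _  k = k []
mulK (x:xs) ys k = mulK xs ys (\rest -> k (addD (mulDigit x ys) (0 : rest)))

-- e.g. mulD [3,2,1] [7,0,1]    == [1,6,1,3,1]   (123 * 107 == 13161)
--      mulK [3,2,1] [7,0,1] id == [1,6,1,3,1]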

About Frank, we may imagine using this method in a more automated way,
for instance by annotating the source and intermediate code with
specialized optimizations that would only work in certain cases.  It
could be something roughly like this:

Nile Source            <- Annotations to optimize the Nile program
    |
    |                      Compiler pass that checks the validity of the
    |                      optimizations, then applies them.
    v
Maru Intermediate code <- Annotations to optimize the Maru program
    |
    |                      Compiler pass like the above
    v
C Backend code         <- Annotations (though at this point…)
    |
    |                   <- GCC
    v
Assembly               <- (no, I won't touch that :-)

(Note that instead of annotating the programs, you could manually
  control the compilers.)
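
To sketch the check-then-apply shape of such a pass, here is a toy
Haskell version, entirely made up for illustration (it is not Nile or
Maru; the expression type and the annotation type are hypothetical): an
annotation claims that one form of an expression may be replaced by
another, and the pass refuses to apply it when the pattern is no longer
there.

data Expr
  = Var String
  | Lit Int
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- An annotation pairs the pattern the author expects to find with the
-- faster form it should be replaced by.
data Annotation = Rewrite { from :: Expr, to :: Expr }

-- Validity check: here, simply that the expected pattern still occurs.
-- A real pass would also have to prove the two forms equivalent (or
-- trust a proof shipped with the annotation).
stillApplies :: Annotation -> Expr -> Bool
stillApplies ann prog =
    prog == from ann || any (stillApplies ann) (children prog)
  where
    children (Add a b) = [a, b]
    children (Mul a b) = [a, b]
    children _         = []

-- Apply the rewrite everywhere it matches; if the source changed and the
-- pattern is gone, report the stale annotation instead of silently doing
-- nothing (or worse, doing the wrong thing).
applyAnnotation :: Annotation -> Expr -> Either String Expr
applyAnnotation ann prog
  | stillApplies ann prog = Right (rewrite prog)
  | otherwise             = Left "stale annotation: pattern no longer present"
  where
    rewrite e | e == from ann = to ann
    rewrite (Add a b)         = Add (rewrite a) (rewrite b)
    rewrite (Mul a b)         = Mul (rewrite a) (rewrite b)
    rewrite e                 = e

-- e.g. replacing  x * 2  by  x + x :
--   applyAnnotation (Rewrite (Mul (Var "x") (Lit 2))
--                            (Add (Var "x") (Var "x")))
--                   (Add (Lit 1) (Mul (Var "x") (Lit 2)))
--   ==> Right (Add (Lit 1) (Add (Var "x") (Var "x")))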

Of course, the moment you change the Nile source, your annotations at
every level may stop working.  But (i) you would only have to redo the
annotations, and (ii) maybe not even all of them, since there is some
chance that your intermediate representation didn't change too much
when you changed your source code.

I can think of one big caveat, however: if the generated code is too big
or too cryptic, this approach may not be feasible any more.  And I
haven't even mentioned profiling your program first.

> But Dan Amelang did such a great job at the meanings that they ran
> fast enough [to] tempt us to use them directly [so] Cairo never entered
> the picture.

If I had to speculate from an outsider's point of view, I'd say such
good surprises probably apply to almost any domain-specific language.
The more specialized a language is, the more domain knowledge you can
embed in the compiler, and the more effective the optimizations can be.
I know it sounds like magic, but I recall a similar example with
Haskell, applied to bioinformatics (again, I can't find the paper).

Example
{
   The goal was to implement a super-fast algorithm.  The catch was that
   the resulting program had to accept a rather large number of
   parameters.  Written directly in C, the fact that those parameters
   could change meant the main loop had to make several checks, slowing
   the whole thing down.

   So they built an EDSL based on monads that basically generated a C
   program once the parameters were read and known, then ran it.  Not
   quite just-in-time compilation, but close.  The result was of course
   noticeably faster than the original C program.
}
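
I can only sketch the general shape of that trick from memory, so here
is a small Haskell toy of my own (not the paper's code): a tiny
code-generating EDSL in a Writer monad that emits a C inner loop
specialized to the parameters once they are known, so the parameter
checks disappear from the emitted code.  The Params fields are invented
for the example.

import Control.Monad.Writer (Writer, execWriter, tell)

type CGen = Writer [String]

stmt :: String -> CGen ()
stmt s = tell [s ++ ";"]

-- Hypothetical parameters, read at run time *before* code generation.
data Params = Params { useLogScale :: Bool, threshold :: Double }

-- Generate the body of the scoring loop.  The branches on the parameters
-- happen here, at generation time, so they vanish from the emitted C.
scoreLoop :: Params -> CGen ()
scoreLoop p = do
  tell ["for (int i = 0; i < n; i++) {"]
  if useLogScale p
    then stmt "  acc += log(data[i])"
    else stmt "  acc += data[i]"
  stmt ("  if (acc > " ++ show (threshold p) ++ ") break")
  tell ["}"]

-- Wrap the loop in a complete C function, ready to hand to a C compiler.
emitC :: Params -> String
emitC p = unlines $
     [ "double score(const double *data, int n) {"
     , "  double acc = 0.0;" ]
  ++ map ("  " ++) (execWriter (scoreLoop p))
  ++ [ "  return acc;"
     , "}" ]

-- e.g. putStr (emitC (Params { useLogScale = True, threshold = 42.0 }))
-- prints a score() function with the log-scale branch already resolved;
-- compiling and running that file is the "not quite JIT" step.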

Therefore, I'm quite optimistic.  My best guess right now is that smart
compilers will be more than enough to make Frank "fast enough", "as fast
as C"[1], or possibly even faster, for two reasons:

1. As far as I know, most languages in Frank are quite specialized.
    That prior knowledge can be exploited by compilers.
2. The code volume is sufficiently small that aggressive whole program
    optimizations are actually feasible.

Such compilers may cost 10 to 100 thousand lines or more, but at least
those lines would be strictly contained.  Then, potential end users
wouldn't give up too much hackability in the name of performance.

[1]: http://c2.com/cgi/wiki?AsFastAsCee

Loup.
