Martin Duerst | 2 Oct 06:33

[ruby-core:19103] Re: Encoding.default_internal

At 07:59 08/10/02, Michael Selig wrote:
>On Thu, 02 Oct 2008 00:15:01 +1000, James Gray <james <at> grayproductions.net>  
>wrote:

>> To be honest, I doubt I would have made the effort if I had known this  
>> change was coming.  It was challenging and I'm a wimp.  ;)

>Someone had to be the trailblazer, James, even if it was only to find out  
>that it wasn't the best path :-)

Yes indeed. I think your experience helped Matz quite a bit
for his decision.

>But I agree with you: if a library can be confident that its inputs are at  
>least ASCII-compatible, quite a bit of your efforts could be saved.
>If, on top of that, it can be reasonably sure that all its inputs are  
>encoding compatible, then it's even better.

I think this is not about confidence. In the software world,
there is no confidence about input. It's much more about what
expectation a library sets and documents. I think there are
quite a few possibilities:

a) The library accepts and produces only UTF-8. Best used with -U.

b) The library accepts, in one run, a single arbitrary encoding,
   and returns the same encoding, if that encoding is ASCII-compatible.

c) Same as before, but extended for non-ASCII-compatible.
   (what James has done with the CSV library, as far as I understand it)

d) The library accepts multiple encodings and handles all the
   conversions internally.

There are of course other cases, such as a library only accepting
some specific encoding different from UTF-8, for some special processing.

From an overall Ruby standpoint, b) should be the 'default', but
in all cases, things should be clearly documented.
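
As a rough illustration of b), here is a minimal Ruby sketch of how a
library might check and document that expectation. The names
check_single_encoding and join_fields are made up for this example and
do not come from any real library. The check is deliberately strict:
it rejects mixed encodings even when all strings happen to be
ASCII-only, which a real library might choose to allow.

```ruby
# Hypothetical helper for policy b): all inputs must share one
# encoding, and that encoding must be ASCII-compatible.
def check_single_encoding(*strings)
  encodings = strings.map(&:encoding).uniq
  unless encodings.size == 1
    raise ArgumentError,
          "expected exactly one encoding, got: #{encodings.join(', ')}"
  end
  enc = encodings.first
  unless enc.ascii_compatible?
    raise ArgumentError, "#{enc} is not ASCII-compatible"
  end
  enc
end

# Hypothetical library function: joins fields with a comma and
# returns the result in the same encoding as the input.
def join_fields(fields)
  enc = check_single_encoding(*fields)
  # Transcode the ASCII separator so every piece has the same encoding.
  fields.join(",".encode(enc))
end
```

Called with ISO-8859-1 strings, join_fields should return an ISO-8859-1
string; called with UTF-16LE strings, it raises ArgumentError, since
UTF-16LE is not ASCII-compatible. Supporting that would be case c).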

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst <at> it.aoyama.ac.jp     

