Thomas Manson | 7 Oct 19:23

Re: CVS migration help

Hi Brian,
 
On my new system:
 
LANG=en_US.UTF-8
 
thomas <at> home:~/temp/cvsrepo/crf-irp/Ressources/documentation$ ll
total 32
drwxr-xr-x 2 thomas thomas  4096 2008-10-06 18:07 .
drwxr-xr-x 9 thomas thomas  4096 2008-10-06 18:07 ..
-r--r--r-- 1 thomas thomas 23274 2008-01-20 00:56 Sp?cifications.doc,v
thomas <at> home:~/temp/cvsrepo/crf-irp/Ressources/documentation$ ls -N | hexdump -C
00000000  53 70 e9 63 69 66 69 63  61 74 69 6f 6e 73 2e 64  |Sp.cifications.d|
00000010  6f 63 2c 76 0a                                    |oc,v.|
00000015

On my old system, where the files came from:
 
 
[root <at> home documentation]# ll
total 24
-r--r--r--  1 paquerette dev 23274 jan 20  2008 Spécifications.doc,v
[root <at> home documentation]# ls -N | hexdump -C
00000000  53 70 e9 63 69 66 69 63  61 74 69 6f 6e 73 2e 64  |Spécifications.d|
00000010  6f 63 2c 76 0a                                    |oc,v.|
00000015
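
A note on reading the two dumps: both show the single byte e9 for the accented letter. That is "é" in ISO-8859-1 (Latin-1); in UTF-8 the same letter would be the two bytes c3 a9. Here is a minimal sketch of that comparison, assuming Python 3 (the variable names are only illustrative):

```python
# Bytes of the filename as shown in the hexdump above (trailing newline dropped).
raw = bytes.fromhex(
    "53 70 e9 63 69 66 69 63 61 74 69 6f 6e 73 2e 64 6f 63 2c 76"
)

# Decoding as Latin-1 always succeeds and gives the accented name.
name = raw.decode("latin-1")

# Decoding as UTF-8 fails: 0xe9 announces a multi-byte sequence,
# but the bytes that follow are plain ASCII, not continuation bytes.
try:
    raw.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# The same name once it has been re-encoded as UTF-8.
utf8_bytes = name.encode("utf-8")
```

So the name on disk looks like Latin-1 on both machines; only the locale used to display it differs.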
 
>Have you checked that the names in the .dat files actually are encoded in UTF-8?
 
How would I do that?
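
For what it's worth, one way this might be checked is a small script along these lines. It is only a sketch, assuming Python 3 and assuming the .dat files are git fast-import streams in which file changes appear as `M <mode> <dataref> <path>` and `D <path>` lines. The function name `non_utf8_paths` is made up for illustration. It does not parse `data` blocks, so a commit message or blob line that happens to start with "M " or "D " would be reported too, and it does not handle quoted paths.

```python
def non_utf8_paths(stream_lines):
    """Yield (line_number, raw_path) for file-change lines whose path is not valid UTF-8.

    stream_lines is any iterable of bytes lines, e.g. a file opened in "rb" mode.
    """
    for lineno, line in enumerate(stream_lines, start=1):
        line = line.rstrip(b"\n")
        if line.startswith(b"M "):
            # filemodify: M <mode> <dataref> <path>
            parts = line.split(b" ", 3)
            if len(parts) != 4:
                continue
            path = parts[3]
        elif line.startswith(b"D "):
            # filedelete: D <path>
            path = line[2:]
        else:
            continue
        try:
            path.decode("utf-8")
        except UnicodeDecodeError:
            # Keep the raw bytes so the offending byte values stay visible.
            yield lineno, path
```

It could be run over the dump with something like `for n, p in non_utf8_paths(open("../cvs2svn-tmp/git-dump.dat", "rb")): print(n, p)`; a path printed as `b'...Sp\xe9cifications.doc'` would mean the name went into the dump as Latin-1.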
 
 
On my old system, Python is too old:
 
[root <at> home documentation]# yum info python
==============================================================
WARNING: Additional commands may be required after running yum
==============================================================
Loading "smeserver" plugin
Loading "installonlyn" plugin
Loading "fastestmirror" plugin
Setting up repositories
Loading mirror speeds from cached hostfile
Reading repository metadata in from local files
Installed Packages
Name   : python
Arch   : i386
Version: 2.3.4
Release: 14.4.el4_6.1
Size   : 20 M
Repo   : installed
Summary: An interpreted, interactive, object-oriented programming language.

I'll try to build Python from source and then get bzr...
 
 
On Tue, Oct 7, 2008 at 18:29, Brian de Alwis <bsd <at> cs.ubc.ca> wrote:
Hi Thomas.

On 7-Oct-2008, at 9:03 AM, Thomas Manson wrote:
 Unfortunately, it crashes the same way that bzr cvsps-import does:
 
 
thomas <at> home:~/temp/bzr$ cat ../cvs2svn-tmp/git-blob.dat  ../cvs2svn-tmp/git-dump.dat |  bzr fast-import -
bzr: ERROR: exceptions.UnicodeDecodeError: 'utf8' codec can't decode bytes in position 43-45: invalid data

So that indicates bzr thinks that the filenames in the dumpfile are in UTF-8.  Have you checked that the names in the .dat files actually are encoded in UTF-8?  Maybe cvs2svn's dumps aren't re-encoding the filenames?  If not, try fiddling with your LANG/LC_* env vars to match whatever encoding is in use in the file-system.  You might get some traction by setting LANG=fr_FR.ISO8859-1, or whatever works on your system.

It might be worth doing an `ls -N | hexdump -C' or something similar to ensure that the filenames are encoded in latin1.

[I tried using cvs2svn to create a dumpfile from a toy project with accents, but gave up after 10 minutes.  I personally used tailor to convert my projects, but none of the files involved accents.]

Brian.

-- 
"Amusement to an observing mind is study." - Benjamin Disraeli


