Gmane
From: sushil ronghe <ronghester-Re5JQEeQqe8AvxtiuMwx3w <at> public.gmane.org>
Subject: empty paragraph
Newsgroups: gmane.comp.nlp.moses.user
Date: Wednesday 3rd September 2008 14:49:47 UTC (over 8 years ago)
Hi,

While doing sentence alignment for English and Spanish (en, es),
I got several (error?) messages like this:

ep-99-10-06.txt (speaker 78) different number of paragraphs 9 != 13
ep-99-10-06.txt (speaker 87) different number of paragraphs 8 != 9
ep-99-10-06.txt (speaker 113) different number of paragraphs 8 != 9
ep-99-10-06.txt (speaker 170) different number of paragraphs 8 != 7
ep-99-10-06.txt (speaker 171) different number of paragraphs 14 != 16
ep-99-10-06.txt (speaker 181) different number of paragraphs 4 != 3
ep-99-10-06.txt (speaker 219) different number of paragraphs 8 != 7
Warning: No known abbreviations for this language

Then I compared the text in file 99-10-06 for both languages.

English
>
> 
> Ladies and gentlemen, as you can well imagine, this is neither the time nor
> the place to start a debate. In fact, the vote is under way.
> 

> (Parliament adopted the decision)

> Report (A5-0017/1999) by Mr H.-P. Martin, on behalf of the Committee on
> Industry, External Trade, Research and Energy, on the proposal for a Council
> Decision providing further macro-financial assistance to Bulgaria
> (COM(1999)403 - C5-0098/1999 - 1999/0165(CNS))

> (Parliament adopted the legislative resolution)

> Report (A5-0018/1999) by Mr H.-P. Martin, on behalf of the Committee on
> Industry, External Trade, Research and Energy, on the proposal for a Council
> Decision providing supplementary macro-financial assistance to the former
> Yugoslav Republic of Macedonia (COM(1999)404 - C5-0099/1999 -
> 1999/0166(CNS))

> (Parliament adopted the legislative resolution)

> Report (A5-0019/1999) by Mr H.-P. Martin, on behalf of the Committee on
> Industry, External Trade, Research and Energy, on the proposal for a Council
> Decision providing supplementary macro-financial assistance to Romania
> (COM(1999)405 - C5-0097/1999 - 1999/0167(CNS))

> (Parliament adopted the legislative resolution)

> Joint motion for resolution on the International AIDS Conference in Zambia

Spanish

> Señorías, como pueden suponer, no es ni el lugar ni el momento de iniciar
> un debate. Estamos procediendo a la votación.

> (El Parlamento aprueba la decisión)

> >

> Informe (A5-0017/1999) del Sr. H.-P. Martin, en nombre de la Comisión de
> Industria, Comercio Exterior, Investigación y Energía, sobre la propuesta de
> decisión del Consejo por la que se concede una ayuda macrofinanciera
> suplementaria a Bulgaria (COM(1999)403 - C5-0098/1999 - 1999/0165(CNS))

> (El Parlamento aprueba la resolución legislativa)

> >

> Informe (A5-0018/1999) del Sr. H.-P. Martin, en nombre de la Comisión de
> Industria, Comercio Exterior, Investigación y Energía, sobre la propuesta de
> decisión del Consejo por la que se concede una ayuda macrofinanciera
> suplementaria a la Antigua República Yugoslava de Macedonia (COM(1999)404 -
> C5-0099/1999 - 1999/0166(CNS))

> (El Parlamento aprueba la resolución legislativa)

> >

> Informe (A5-0019/1999) del Sr. H.-P. Martin, en nombre de la Comisión de
> Industria, Comercio Exterior, Investigación y Energía, sobre la propuesta de
> decisión del Consejo por la que se concede una ayuda macrofinanciera
> suplementaria a Rumania (COM(1999)405 - C5-0097/1999 - 1999/0167(CNS))

> (El Parlamento aprueba la resolución legislativa)

> >

> Propuesta de resolución común sobre la Conferencia Internacional sobre el
> sida en Lusaka

We can see the cause of the error: the Spanish content has extra paragraph
tokens, but they are empty.

After the alignment I looked at these files and found that, although the
errors were logged, the content is still present in the aligned files. See
the same portion in the aligned files.

English:

> Ladies and gentlemen , as you can well imagine , this is neither the time
> nor the place to start a debate .
> In fact , the vote is under way .

> ( Parliament adopted the decision )

> Report ( A5-0017 / 1999 ) by Mr H.-P. Martin , on behalf of the Committee
> on Industry , External Trade , Research and Energy , on the proposal for a
> Council Decision providing further macro-financial assistance to Bulgaria (
> COM ( 1999 ) 403 - C5-0098 / 1999 - 1999 / 0165 ( CNS ) )

> ( Parliament adopted the legislative resolution )

> Report ( A5-0018 / 1999 ) by Mr H.-P. Martin , on behalf of the Committee
> on Industry , External Trade , Research and Energy , on the proposal for a
> Council Decision providing supplementary macro-financial assistance to the
> former Yugoslav Republic of Macedonia ( COM ( 1999 ) 404 - C5-0099 / 1999 -
> 1999 / 0166 ( CNS ) )

> ( Parliament adopted the legislative resolution )

> Report ( A5-0019 / 1999 ) by Mr H.-P. Martin , on behalf of the Committee
> on Industry , External Trade , Research and Energy , on the proposal for a
> Council Decision providing supplementary macro-financial assistance to
> Romania ( COM ( 1999 ) 405 - C5-0097 / 1999 - 1999 / 0167 ( CNS ) )

> ( Parliament adopted the legislative resolution )

> Joint motion for resolution on the International AIDS Conference in Zambia

Spanish:

> Señorías , como pueden suponer , no es ni el lugar ni el momento de iniciar
> un debate .
> Estamos procediendo a la votación .

> ( El Parlamento aprueba la decisión )

> >

> Informe ( A5-0017 / 1999 ) del Sr . H.-P. Martin , en nombre de la Comisión
> de Industria , Comercio Exterior , Investigación y Energía , sobre la
> propuesta de decisión del Consejo por la que se concede una ayuda
> macrofinanciera suplementaria a Bulgaria ( COM ( 1999 ) 403 - C5-0098 / 1999
> - 1999 / 0165 ( CNS ) )

> ( El Parlamento aprueba la resolución legislativa )

> >

> Informe ( A5-0018 / 1999 ) del Sr . H.-P. Martin , en nombre de la Comisión
> de Industria , Comercio Exterior , Investigación y Energía , sobre la
> propuesta de decisión del Consejo por la que se concede una ayuda
> macrofinanciera suplementaria a la Antigua República Yugoslava de Macedonia
> ( COM ( 1999 ) 404 - C5-0099 / 1999 - 1999 / 0166 ( CNS ) )

> ( El Parlamento aprueba la resolución legislativa )

Questions:

-> Does this mean that the aligned files I have generated are not suitable
   for training the model?
-> Can we modify the pre-processing script to remove the empty paragraphs?

Thanks

--
********************************
sushil ronghe
*********************************
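P.S. For the second question, here is a rough sketch of the kind of filter I
mean. It is not the existing pre-processing script, only an illustration. It
assumes that each paragraph starts with a line containing just <P> and that
speaker turns start with a <SPEAKER ...> line; that markup is not visible in
the excerpts above, so treat both tags as assumptions. The function names are
made up for this example.

```python
import re

# Assumed paragraph marker: a line holding only "<P>".
P_TAG = re.compile(r"^<P>\s*$", re.IGNORECASE)


def drop_empty_paragraphs(lines):
    """Drop every <P> marker that has no text before the next tag.

    Blank lines are discarded; all other lines are kept in order.
    """
    out = []
    pending = None  # a <P> marker that has not been followed by text yet
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if P_TAG.match(text):
            # An earlier pending marker was empty, so it is simply replaced.
            pending = text
        elif text.startswith("<"):
            # Another tag (for example <SPEAKER ...>): the pending paragraph
            # had no content, so it is dropped.
            pending = None
            out.append(text)
        else:
            if pending is not None:
                out.append(pending)
                pending = None
            out.append(text)
    return out


def count_paragraphs(lines):
    """Count <P> markers, as a per-file sanity check."""
    return sum(1 for line in lines if P_TAG.match(line.strip()))


if __name__ == "__main__":
    spanish = [
        "<SPEAKER ID=78>",
        "<P>",
        "(El Parlamento aprueba la decisión)",
        "<P>",
        "<P>",
        "Informe (A5-0017/1999) del Sr. H.-P. Martin",
    ]
    cleaned = drop_empty_paragraphs(spanish)
    print(count_paragraphs(spanish), "->", count_paragraphs(cleaned))
```

Running this on both language files before alignment should make it easier to
see whether the paragraph counts per speaker then agree.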

 