Anjaly | 1 Oct 2007 11:02

Re: u32regex_search crashes

I am sorry, the last message had a mistake. I wanted to say that I want
to do a search that would treat all the data as UTF-32 rather than
UTF-8 (as I incorrectly wrote). I don't know whether I am making myself
clear (I am not very good at expressing my opinion).

What I really want to do is a Unicode search on the available data.

						Anjaly G S

On Mon, 2007-10-01 at 09:42 +0100, John Maddock wrote:
> Anjaly wrote:
> > In the regex documentation it says that the size of the data type of
> > the variable passed to make_u32regex determines the character
> > encoding (UTF-8, UTF-16 or UTF-32).
> 
> *For construction of the regex object*.
> 
> The search algorithms operate independently on any of UTF8/16/32.
> 
> > I passed wchar_t (whose size I think
> > is 4) so that the buffer encoding would be treated as UTF-8 by
> > u32regex_search regardless. Actually I am trying to do a UTF-8
> > search.
> 
> Except the data file you sent *was not valid UTF8* !
> 
> It looks like it's probably UTF16LE; in that case it's up to you to decode 
> the byte order mark and read the text into something that Boost.Regex can 
> handle (for example platform-native UTF16). ICU should have some file IO 
> routines for doing that kind of thing, for example for loading a file into a 
> UnicodeString type.
> 
> HTH, John. 
> 
> _______________________________________________
> Boost-users mailing list
> Boost-users <at> lists.boost.org
> http://lists.boost.org/mailman/listinfo.cgi/boost-users

