28 Jan 11:50
Problem getting Xapian working with Burmese
<emmanuel <at> engelhart.org>
2010-01-28 10:50:12 GMT
On Fri, Aug 21, 2009 at 02:44:44PM +0200, emmanuel at engelhart.org wrote:
>> I want to update my request.
>> Is my question bad formulated? too trivial? ... or maybe pretty
>> complicated/unclear?
>
>I think nobody answered as it was hard to follow your example because
>the Burmese characters seem to have been mangled (at least the message I
>received wasn't valid utf-8).
>
>But looking at the code, I see an issue:
>
>> my $db = Search::Xapian::Database->new( './xapdb' );
>> my $enq = $db->enquire( $ARGV[0] );
>
>What this does is to create an Enquire object and set Query($ARGV[0]) as
>the query. That works OK if $ARGV[0] is a single word which gets
>indexed as a single term, but you really want to parse the query string
>to get a Query object:
>
>  my $db = Search::Xapian::Database->new( './xapdb' );
>  my $queryparser = Search::Xapian::QueryParser->new();
>  my $query = $queryparser->parse_query( $ARGV[0] );
>  my $enq = $db->enquire( $query );
>
>I'd guess that is probably your problem, but I can't tell for sure as I
>can't test your examples...
>
>For further information on debugging this sort of problem, see:
>
>http://trac.xapian.org/wiki/FAQ/NoMatches
>

Hi Olly,

Thank you for your answer (and sorry for not having replied sooner).

Your answer helped me, and I think I now understand why "it does not work".

For testing purposes I index one document containing a single string,
using index_text_without_positions() (C++ API). The string is:

"ဝီကီပိသုံးစွဲသူများက"

See this log (utf8 encoded):
http://tmp.kiwix.org/tmp/kiwix-index.log

But if I run "delve -r 1 /path/to/db" on the index, I get the following
output (utf8 encoded):

Term List for record #1: test က စ ပ မ ဝ သ

See the log:
http://tmp.kiwix.org/tmp/delve.log

So it seems clear to me why "it does not work": my word is split into
single letters, and a lot of letters are removed.

Am I right? Can we avoid that and index "ဝီကီပိသုံးစွဲသူများက" as a
single word?
Regards
Emmanuel
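
The term list that delve reports in the message above can be reproduced without Xapian, which may help confirm the diagnosis. The sketch below is only a model of the observed behaviour, not Xapian's actual tokenizer: it assumes that a term is a run of characters whose Unicode general category is a letter or a number, so that every Burmese vowel sign or tone mark (categories Mn and Mc) ends the current term and is dropped. The helper name `split_terms` is hypothetical.

```python
import unicodedata

# The test string from the message above.
TEXT = "ဝီကီပိသုံးစွဲသူများက"


def split_terms(text, word_categories):
    """Split text into runs of characters whose Unicode general category
    starts with one of the given letters (e.g. "L" for letters).

    Hypothetical helper: a simplified model, not Xapian's implementation.
    """
    terms, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in word_categories:
            current.append(ch)
        elif current:
            # A non-word character closes the term being built.
            terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


# Letters and numbers only: every combining mark ends the current term.
letters_only = sorted(set(split_terms(TEXT, "LN")))
print(letters_only)

# Letters, numbers and marks (M*): the word survives as a single term.
with_marks = split_terms(TEXT, "LNM")
print(with_marks)
```

Under that assumption the first list contains the same six single letters that delve shows (apart from "test"), and the second contains the whole word as one term. If the real cause is indeed the handling of combining marks, one possible workaround on the Xapian side is to skip the term generator for such text and add the word directly with `Document::add_term()`, at the cost of doing the word segmentation yourself.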