11 Oct 2007 12:22
Re: Tweak stemmer: (say the German stemmer)
Martin Porter <martin.porter <at> grapeshot.co.uk>
2007-10-11 10:22:54 GMT
Santosh,

(I hope I understand your email correctly. Anyway, here goes with an answer ...)

The stemmers do not replace the language-specific characters with 'English' equivalents as a standard operation, but will occasionally remove accents as part of the stemming process. For example, in German, removal of the umlaut in the last syllable helps conflate singular and plural forms. The German eszett (double s) is similarly usefully replaced by 'ss'.

If you feel that the normalisation is incorrect, the solution is to modify the Snowball scripts and recompile.

Incidentally, we have also had the opposite suggestion to your own: that in Spanish, for example, it would be better for the stemmer to remove ALL accents, since their use is not very consistent in the written language. Italian acute/grave usage is very variable, and the Snowball Italian stemmer maps both to the same form. But the general approach in the Snowball stemmers is to leave accents alone unless there is a clear reason for removing or altering them.

Martin

On Sat, 2007-10-06 at 14:28 +0530, Santosh Pai wrote:
> Hi,
>
> Using the snowball stemmer is providing me with great results.
> I had a question though…
>
> For some languages (say German, French), the stemmer replaces the
> special language specific characters with corresponding English
> equivalents.
>
> For example, in German:
>
> ä → a
> ö → o
> ü → u
> Ä → A
> Ö → O
> Ü → U
> ß → ss
>
> Is there a way to preserve the special characters in the stems?
>
> Thanks
>
> Santosh Pai
>
> 6th Floor, Windsor Manor,
> Baner Road,
> Pune, INDIA
> www.quagnito.com
>
> E-Mail : santosh <at> quagnito.com
> Tel.   : +91 20 27292039
> Mobile : +91 98501 60015
>
> _______________________________________________
> Snowball-discuss mailing list
> Snowball-discuss <at> lists.tartarus.org
> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
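
The umlaut and eszett folding described in the reply above can be illustrated with a small Python sketch. This is only a toy: it is not the Snowball German stemmer and does not reproduce its rules, and the names `UMLAUT_MAP` and `fold_german` are made up for the example. It just shows why dropping the umlaut can bring a singular and its plural together, and how ß turns into 'ss'.

```python
# Toy illustration only: NOT the Snowball German stemmer or its rule set.
# UMLAUT_MAP and fold_german are hypothetical names chosen for this example.

# Map each umlauted vowel to its plain vowel, and eszett to "ss".
UMLAUT_MAP = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def fold_german(word: str) -> str:
    """Lowercase a word, drop umlauts, and expand eszett to 'ss'."""
    return word.lower().translate(UMLAUT_MAP)


if __name__ == "__main__":
    # Singular/plural pairs that differ only by an umlaut fold to one form.
    for singular, plural in [("Vater", "Väter"), ("Mutter", "Mütter")]:
        same = fold_german(singular) == fold_german(plural)
        print(singular, plural, fold_german(plural), same)

    # Eszett is expanded rather than dropped.
    print(fold_german("Straße"))
```

To keep the special characters in the output, a folding step like this would simply be left out. In the real stemmers, that means editing the Snowball script and recompiling, as the reply says.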