Tommaso Teofili (JIRA | 4 Jul 12:06 2011

[jira] [Updated] (UIMA-2110) Turn the HMMTagger class into a more generic class for tagging tasks


     [ https://issues.apache.org/jira/browse/UIMA-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommaso Teofili updated UIMA-2110:
----------------------------------

    Attachment: UIMA2110updated.patch

I updated the patch and the tests run correctly. Next, I am going to test the patch in a running system.

> Turn the HMMTagger class into a more generic class for tagging tasks  
> ----------------------------------------------------------------------
>
>                 Key: UIMA-2110
>                 URL: https://issues.apache.org/jira/browse/UIMA-2110
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Sandbox-Tagger
>    Affects Versions: 2.3
>         Environment: OS
> Linux version 2.6.32-30-generic (buildd@vernadsky) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #59-Ubuntu SMP Tue Mar 1 21:30:21 UTC 2011
> JVM
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>            Reporter: Nicolas Hernandez
>            Priority: Minor
>         Attachments: AMoreGenericHMMTaggerDesc.patch, AMoreGenericHMMTaggerSrcClass.patch, UIMA2110updated.patch
>
>   Original Estimate: 1.5h
>  Remaining Estimate: 1.5h
>
> Despite its name, the code of the org.apache.uima.examples.tagger.HMMTagger
> class is not fully independent of the POS tagging task.
> In addition, it assumes that the feature path to update with the tagging
> result is org.apache.uima.TokenAnnotation:posTag.
> We propose letting users specify, via a parameter, the feature path to set.
> This parameter is optional. If it is left unset, the tagger works as before,
> using org.apache.uima.TokenAnnotation:posTag as the default value.
>  
> We also propose adding three optional parameters: InputView, SentenceType and ModelFile.
> Since the HMM Learner already lets the user specify the view used to
> train a model, we give the tagger the same option.
> By default, it works on the _InitialView. This is quite useful in practice!
> The org.apache.uima.TokenAnnotation type is not the only annotation type that is assumed
> to be present in the CAS. The HMMTagger processes tokens sentence by sentence, and it uses
> org.apache.uima.SentenceAnnotation to select the tokens. The SentenceType parameter
> lets users specify their own sentence annotation type. The default value is
> org.apache.uima.SentenceAnnotation.
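
The defaulting behaviour described above could be sketched roughly as follows. This is only an illustration in plain Java with no UIMA dependency, not the code of the attached patches. The class TaggerParameters, its methods, and the parameter name "FeaturePath" are hypothetical (the issue does not name the feature path parameter); only the default values are taken from the description.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: resolves optional tagger parameters with the defaults named in the issue. */
final class TaggerParameters {
    // Default values as stated in the issue description.
    static final String DEFAULT_FEATURE_PATH = "org.apache.uima.TokenAnnotation:posTag";
    static final String DEFAULT_INPUT_VIEW = "_InitialView";
    static final String DEFAULT_SENTENCE_TYPE = "org.apache.uima.SentenceAnnotation";

    /** Returns the configured value, or the default when the parameter is missing or blank. */
    static String valueOrDefault(Map<String, String> config, String name, String defaultValue) {
        String value = config.get(name);
        return (value == null || value.trim().isEmpty()) ? defaultValue : value.trim();
    }

    /** Splits "fully.qualified.Type:feature" into {typeName, featureName}. */
    static String[] splitFeaturePath(String featurePath) {
        int colon = featurePath.lastIndexOf(':');
        if (colon <= 0 || colon == featurePath.length() - 1) {
            throw new IllegalArgumentException("Expected <type>:<feature>, got: " + featurePath);
        }
        return new String[] { featurePath.substring(0, colon), featurePath.substring(colon + 1) };
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("InputView", "tokenized"); // hypothetical view name

        // "FeaturePath" is an assumed parameter name; it is unset here, so the default applies.
        String[] parts = splitFeaturePath(valueOrDefault(config, "FeaturePath", DEFAULT_FEATURE_PATH));
        System.out.println("view     = " + valueOrDefault(config, "InputView", DEFAULT_INPUT_VIEW));
        System.out.println("sentence = " + valueOrDefault(config, "SentenceType", DEFAULT_SENTENCE_TYPE));
        System.out.println("type     = " + parts[0]);
        System.out.println("feature  = " + parts[1]);
    }
}
```

In a real annotator the map lookup would presumably be replaced by reading the configuration parameters from the UIMA context, but the fallback logic would stay the same.
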
> The ModelFile parameter is an alternative to the resource declaration for specifying a model.
> If left empty, it is ignored. Otherwise it takes precedence over the resource declaration.
> When it is specified, multiple deployment of the tagger cannot be allowed, but in practice
> it may be easier for the user to configure a parameter through Eclipse.
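
The precedence rule for ModelFile could look something like the sketch below. It is a plain-Java illustration under assumptions, not the patch itself: the class ModelSource, the method choose, and the example locations are all hypothetical, and how the patch actually loads the model is not shown in this message.

```java
/** Hypothetical helper: decides which model location the tagger should load. */
final class ModelSource {

    /**
     * A non-empty ModelFile parameter takes precedence over the declared resource.
     * An empty or missing ModelFile is ignored and the resource location is used.
     */
    static String choose(String modelFileParam, String resourceLocation) {
        if (modelFileParam != null && !modelFileParam.trim().isEmpty()) {
            return modelFileParam.trim();
        }
        if (resourceLocation == null) {
            throw new IllegalStateException("No model: neither ModelFile nor a resource is set");
        }
        return resourceLocation;
    }

    public static void main(String[] args) {
        // Both locations below are made-up examples.
        System.out.println(choose("", "resources/declared-model.dat"));
        System.out.println(choose("/tmp/my-model.dat", "resources/declared-model.dat"));
    }
}
```
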
> Two distinct patches will be provided, one for the class and the other for the descriptor.
> A future improvement of the class might make it possible to create new annotations, not only
> to update existing ones.
> A future improvement of the descriptor may separate what belongs to the tagger in general from
> what is specific to the POS tagger.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira