17 May 20:58
Re: UIMA internals memory footprint
Kirk,

In this test are you running a CPE or just an AnalysisEngine? If it is a CPE, do you know what your CAS pool size is?

When a CAS is created, it allocates a large heap, which is then filled as you create annotations. By default I believe this is 500,000 cells (2 MB) per CAS, but this can be overridden (see UIMAFramework.getDefaultPerformanceTuningProperties()). So this can definitely be one source of memory overhead. As you saw, it does not grow with larger documents; it will only grow if you create enough annotations to fill up the allocated space.

-Adam

On 5/17/07, Kirk True <kirk@...> wrote:
> Hi all,
>
> I have begun seeing heavy memory use when processing largish
> documents through a UIMA pipeline. I wanted to make sure what I'm
> seeing with regard to UIMA's internal memory use is on par with
> expectations.
>
> It looks like either for a 1,500,000 byte or a 15,000,000 byte document
> with the same annotations (100,000 10-character annotations), we incur
> a ~13 MB "overhead" for internal UIMA data structures. Is this in line
> with expectations?
>
> Details:
>
> In the interest of narrowing down the issue, I made a very simple test
> annotator which mimics what my annotators do. The annotator creates a
> document of N bytes which is set in a view in the CAS, then it
> transforms the bytes to an HTML string that is then set in a view in
> the CAS. Next, for each view, the annotator creates 50,000 annotations.
> Each annotation has two 5-character attributes. I profiled my
> application using two profilers (JProbe and YourKit) and took heap
> snapshots before and after processing was performed and saw similar
> results.
>
> I know there's a lot going on under the hood, so I'm trying to get an
> idea of what kind of size factor I can expect for a given document
> size. Right now, according to my calculations and verified by the
> profiler, the expected memory usage for just my data (i.e. the two
> views of the document and the strings making up the annotations) is:
>
> For a 1,500,000 byte document:
>
> Original document            1,500,000
> HTML document                2,800,000
> TestCaseAnnotation           1,600,000
> Annotation strings           4,800,000
> Annotation char[]s           2,400,000
> Integer                      1,600,000  (UIMA internal (Annotation))
> int[]                        9,300,000  (UIMA internal)
> java.util.HashMap$Entry      2,400,000  (UIMA internal)
> ---------------------------------------
>                             26,400,000
>
> For a 15,000,000 byte document:
>
> Original document           15,000,000
> HTML document               28,000,000
> TestCaseAnnotation           1,600,000
> Annotation strings           4,800,000
> Annotation char[]s           2,400,000
> Integer                      1,600,000  (UIMA internal (Annotation))
> int[]                        9,300,000  (UIMA internal)
> java.util.HashMap$Entry      2,400,000  (UIMA internal)
> ---------------------------------------
>                             65,100,000
>
> I can post the code for the test cases if it helps.
>
> Thanks,
> Kirk
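
As a rough check on the figures in this thread, the arithmetic can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical and not part of UIMA, the 4 bytes per cell is an assumption inferred from "500,000 cells (2 MB)", and multiplying by the CAS pool size assumes each pooled CAS gets its own initial heap.

```java
// Hypothetical helper, not part of UIMA: plain arithmetic on the numbers quoted above.
class CasHeapEstimate {
    // Assumption: one heap cell is one Java int (4 bytes), which matches 500,000 cells = 2 MB.
    static final int BYTES_PER_CELL = 4;

    // Initial heap size in bytes for a single CAS with the given number of cells.
    static long heapBytes(long cells) {
        return cells * BYTES_PER_CELL;
    }

    // Initial heap size in bytes across a CAS pool, assuming one heap per pooled CAS.
    static long poolHeapBytes(long cells, int casPoolSize) {
        return heapBytes(cells) * casPoolSize;
    }

    // Sum of the rows marked "UIMA internal" in the profiler table:
    // Integer + int[] + java.util.HashMap$Entry.
    static long uimaInternalBytes() {
        return 1_600_000L + 9_300_000L + 2_400_000L;
    }

    public static void main(String[] args) {
        System.out.println("Initial heap per CAS (bytes): " + heapBytes(500_000));
        System.out.println("Initial heap for a pool of 4 (bytes): " + poolHeapBytes(500_000, 4));
        System.out.println("UIMA-internal rows (bytes): " + uimaInternalBytes());
    }
}
```

Under these assumptions, a single default-sized CAS heap accounts for about 2 MB of the roughly 13 MB reported as UIMA-internal, so the initial heap alone would not explain the whole figure unless several CASes are pooled or the heap has grown.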