Disk cached datasets [message #855233]
Tue, 24 April 2012 16:21
john mcteague

I'm experimenting with the DataEngine.MEMORY_BUFFER_SIZE setting in order to manage large datasets more memory-efficiently. I am using the Java API to integrate BIRT.
I notice that when I set MEMORY_BUFFER_SIZE, both a goalFile and a data.data file of roughly equal size are generated in my temporary directory. Given the size of my datasets, these are each in excess of 200MB, so keeping them under control would be good.
The directory structure that is generated is as follows:
Directory of C:\Users\joe\AppData\Local\Temp\DataEngine_619898374_4
24/04/2012 17:12 <DIR> .
24/04/2012 17:12 <DIR> ..
24/04/2012 17:12 <DIR> BirtDataTemp13352837620643
24/04/2012 17:12 <DIR> BirtDataTemp13352839510834
24/04/2012 17:12 <DIR> DataSetCacheObject_1558846527_3
0 File(s) 0 bytes
Directory of C:\Users\joe\AppData\Local\Temp\DataEngine_619898374_4\BirtDataTemp13352837620643
24/04/2012 17:12 <DIR> .
24/04/2012 17:12 <DIR> ..
0 File(s) 0 bytes
Directory of C:\Users\joe\AppData\Local\Temp\DataEngine_619898374_4\BirtDataTemp13352839510834
24/04/2012 17:12 <DIR> .
24/04/2012 17:12 <DIR> ..
24/04/2012 17:12 <DIR> session_13352839511324
0 File(s) 0 bytes
Directory of C:\Users\joe\AppData\Local\Temp\DataEngine_619898374_4\BirtDataTemp13352839510834\session_13352839511324
24/04/2012 17:12 <DIR> .
24/04/2012 17:12 <DIR> ..
24/04/2012 17:12 214,996,498 goalFile
1 File(s) 214,996,498 bytes
Directory of C:\Users\joe\AppData\Local\Temp\DataEngine_619898374_4\DataSetCacheObject_1558846527_3
24/04/2012 17:12 <DIR> .
24/04/2012 17:12 <DIR> ..
24/04/2012 17:12 216,631,128 data.data
24/04/2012 17:12 6,809 meta.data
24/04/2012 17:12 21 time.data
3 File(s) 216,637,958 bytes
In addition, I am storing the rptdocument on disk to enable exporting to different formats and paging of HTML data (it's over 450MB by itself). So altogether, each time I run the report I consume up to 1GB of temp space (some of which doesn't get deleted until the server reboots; see bugs.eclipse.org/bugs/show_bug.cgi?id=369172).
So my first question is: what is the relationship between goalFile and data.data? What triggers the creation of these files, and why are there two?
I have seen data.data and its related files created in other scenarios, but goalFile has only appeared since I started using the memory buffer option.
Thanks,
John

Re: Disk cached datasets [message #856344 is a reply to message #855233]
Wed, 25 April 2012 15:24

John,
Caching only occurs when multiple passes over the data are required or when data sets are used multiple times.
If you set the value below to some size and your data set exceeds it, the data will start to be written to disk.
/**
 * Indicate the size of data cached for each result set. We only accept
 * non-negative integer as input, the unit of which would be MB.
 * If this setting is 0, all temporary rows will be cached in memory
 * during query processing.
 */
public static String MEMORY_BUFFER_SIZE =
"org.eclipse.birt.data.query.ResultBufferSize";
This happens for grouping and aggregations, and the files are created in the temp directory under DataEngine_code/BirtDataTempcode, where each code is a unique identifier. The value is specified in MB.
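A minimal sketch of supplying this setting through the Java API, assuming you build the application context map yourself: it stores the MEMORY_BUFFER_SIZE key (the string literal quoted above) with a value in MB. The class and helper names are hypothetical, and passing the value as an Integer is an assumption rather than something confirmed in this thread; in an integration the map would typically be handed to the engine task through its setAppContext(Map) method.

```java
import java.util.HashMap;
import java.util.Map;

public class BufferSizeConfig {
    // Same key as DataEngine.MEMORY_BUFFER_SIZE, copied as a literal so this
    // sketch compiles without the BIRT jars on the classpath.
    static final String MEMORY_BUFFER_SIZE = "org.eclipse.birt.data.query.ResultBufferSize";

    /**
     * Hypothetical helper: puts the per-result-set buffer size (in MB) into an
     * app context map. Negative values are rejected because the setting only
     * accepts non-negative integers; 0 means "cache all temporary rows in memory".
     * Assumption: the engine accepts the value as an Integer.
     */
    static Map<String, Object> withResultBufferSize(Map<String, Object> appContext, int megabytes) {
        if (megabytes < 0) {
            throw new IllegalArgumentException("buffer size must be >= 0 MB, got " + megabytes);
        }
        appContext.put(MEMORY_BUFFER_SIZE, Integer.valueOf(megabytes));
        return appContext;
    }

    public static void main(String[] args) {
        Map<String, Object> appContext = withResultBufferSize(new HashMap<>(), 10);
        // In a BIRT integration this map would then be passed to the task,
        // e.g. task.setAppContext(appContext);
        System.out.println(appContext);
    }
}
```

Validating up front means a bad value fails in your own code instead of surfacing later inside query processing.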
By default, if you reuse a data set, a separate cache of the processed results is created in the directory:
DataEngine_code/DataSetCacheObject_code/ (multiple files)
This cache is written to disk. If you set
reportContext.getAppContext().put("org.eclipse.birt.data.cache.memory", Integer.valueOf(-1));
all of the rows will be held in memory instead of in the files above, and nothing will be written to
DataEngine_code/DataSetCacheObject_code
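To see how much temp space the BirtDataTemp and DataSetCacheObject directories actually occupy after a run, a small read-only sketch can total each DataEngine_* directory. It assumes the engine writes under java.io.tmpdir, as the C:\Users\joe\AppData\Local\Temp paths in the original post suggest; TempUsage and its methods are hypothetical names, and the code deletes nothing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class TempUsage {
    /** Total size in bytes of all regular files below {@code dir}. */
    static long sizeOf(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    /** Maps each DataEngine_* directory directly under {@code tempRoot} to its total size. */
    static Map<String, Long> dataEngineUsage(Path tempRoot) throws IOException {
        Map<String, Long> usage = new TreeMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(tempRoot, "DataEngine_*")) {
            for (Path dir : dirs) {
                if (Files.isDirectory(dir)) {
                    usage.put(dir.getFileName().toString(), sizeOf(dir));
                }
            }
        }
        return usage;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: BIRT's DataEngine_* directories live directly under java.io.tmpdir.
        Path tempRoot = Paths.get(System.getProperty("java.io.tmpdir"));
        dataEngineUsage(tempRoot).forEach((name, bytes) ->
                System.out.printf("%s: %,d bytes%n", name, bytes));
    }
}
```

Running it between report executions gives a rough per-run figure. Files created or removed by a running engine while the walk is in progress can make it throw, so treat it as a diagnostic rather than a monitor.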
public static String MEMORY_DATA_SET_CACHE =
"org.eclipse.birt.data.cache.memory";
A value of 0 writes everything to disk; a positive value keeps only that number of rows in memory.
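The three cases for MEMORY_DATA_SET_CACHE can be written down as a small sketch that validates a value before putting it into the app context. It encodes only the behaviour stated in this reply (-1: all rows in memory, 0: disk only, positive N: that many rows in memory); the enum and method names are hypothetical, and rejecting other negative values is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class DataSetCacheConfig {
    // Same key as MEMORY_DATA_SET_CACHE quoted above.
    static final String MEMORY_DATA_SET_CACHE = "org.eclipse.birt.data.cache.memory";

    /** Hypothetical summary of the cache modes described in the reply. */
    enum Mode { ALL_IN_MEMORY, DISK_ONLY, LIMITED_ROWS_IN_MEMORY }

    static Mode modeFor(int value) {
        if (value == -1) return Mode.ALL_IN_MEMORY;         // per the reply: no DataSetCacheObject_code files
        if (value == 0) return Mode.DISK_ONLY;               // per the reply: rows cached on disk (e.g. data.data)
        if (value > 0) return Mode.LIMITED_ROWS_IN_MEMORY;   // per the reply: at most `value` rows in memory
        // Assumption: other negative values are not meaningful, so reject them here.
        throw new IllegalArgumentException("unsupported cache value: " + value);
    }

    static void setDataSetCache(Map<String, Object> appContext, int value) {
        modeFor(value); // validate before storing
        appContext.put(MEMORY_DATA_SET_CACHE, Integer.valueOf(value));
    }

    public static void main(String[] args) {
        Map<String, Object> appContext = new HashMap<>();
        setDataSetCache(appContext, -1);
        System.out.println(modeFor(-1) + " " + appContext);
    }
}
```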
MEMORY_USAGE_AGGRESSIVE
This setting only initializes the list size limits that determine whether large files with many aggregation and grouping components get written to disk. Take a look at the BasicCachedList class, which is used for aggregations. If you use crosstabs or other aggregation-heavy items, they get cached.
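The idea behind these list maxes, a configured limit deciding when rows stop being kept in memory, can be illustrated with a size-capped list that spills overflow rows to a temp file. This is a hypothetical sketch of the general technique using plain Java serialization, not BIRT's BasicCachedList, whose internals are not shown in this thread.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only: NOT BIRT's BasicCachedList. */
public class SpillingList<T extends Serializable> implements Closeable {
    private final int maxInMemory;                 // the "list max" for in-memory rows
    private final List<T> memory = new ArrayList<>();
    private Path spillFile;                        // created lazily on first overflow
    private ObjectOutputStream out;
    private int spilled;                           // number of rows written to disk

    public SpillingList(int maxInMemory) {
        if (maxInMemory < 0) throw new IllegalArgumentException("maxInMemory < 0");
        this.maxInMemory = maxInMemory;
    }

    public void add(T row) throws IOException {
        if (memory.size() < maxInMemory) {
            memory.add(row);
            return;
        }
        if (out == null) {
            spillFile = Files.createTempFile("spill", ".bin");
            out = new ObjectOutputStream(new BufferedOutputStream(Files.newOutputStream(spillFile)));
        }
        out.writeObject(row);
        spilled++;
    }

    public int inMemory() { return memory.size(); }

    public int onDisk() { return spilled; }

    /** Returns all rows in insertion order: in-memory rows first, then spilled ones. */
    public List<T> readAll() throws IOException, ClassNotFoundException {
        List<T> all = new ArrayList<>(memory);
        if (out != null) {
            out.flush(); // push every spilled row through to the file before reading
            try (ObjectInputStream in = new ObjectInputStream(
                    new BufferedInputStream(Files.newInputStream(spillFile)))) {
                for (int i = 0; i < spilled; i++) {
                    @SuppressWarnings("unchecked")
                    T row = (T) in.readObject();
                    all.add(row);
                }
            }
        }
        return all;
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
            Files.deleteIfExists(spillFile); // spill file removed when the list is closed
            out = null;
        }
    }
}
```

For example, a SpillingList<Integer> created with a limit of 3 keeps the first three rows in memory and serializes every later row to its temp file; closing it deletes that file.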
Jason