[cdt-dev] Fwd: Opening and reading less to build the CModel
Cross Posted to cdt-core
While investigating project creation time, I was looking for ways to
"defer" some of the I/O that occurs when you open/close/import a
project. Using the ELF parser as an example, this is what happens
when we look at a particular file:
if (fileHasSourceEnding()) {
    --> Optimized import of source creates the object and early-outs here.
} else {
    --> Extract the binary parser
    --> Open the file, read 128 bytes, close the file
    --> Pass the 128 bytes and the filename to the binary parser to
        determine whether this file is a binary object or not
    if (isABinaryObjectSaysTheParser) {
        --> Pass the filename to the binary parser to create an object
            to put into the CModel for this entry
        --> For an ELF object this means:
            --> Extract the Elf.Attributes to determine if this is an
                EXE, LIB, etc. (opens the file, reads the ELF header,
                closes the file)
            --> Create the appropriate container
    }
}
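The 128-byte sniff in the middle branch can be sketched roughly as
follows. The class and helper names here are mine, not the actual CDT
API, but the 0x7F 'E' 'L' 'F' magic is the real ELF signature:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch of the open/read-128-bytes/close step.
class BinaryHeaderSniffer {
    static final int HINT_SIZE = 128;

    // Open the file, read up to 128 bytes, close the file.
    static byte[] readHint(File f) throws IOException {
        byte[] buf = new byte[HINT_SIZE];
        try (FileInputStream in = new FileInputStream(f)) {
            int n = in.read(buf);
            return (n <= 0) ? new byte[0] : Arrays.copyOf(buf, n);
        }
    }

    // An ELF object starts with the 4-byte magic 0x7F 'E' 'L' 'F'.
    static boolean looksLikeElf(byte[] hint) {
        return hint.length >= 4 && hint[0] == 0x7F
                && hint[1] == 'E' && hint[2] == 'L' && hint[3] == 'F';
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sniff", ".o");
        tmp.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[] {0x7F, 'E', 'L', 'F', 2, 1, 1, 0});
        }
        System.out.println(looksLikeElf(readHint(tmp))); // prints "true"
    }
}
```

Note that the file is opened and closed for this sniff alone; a second
open follows later when the attributes are extracted, which is exactly
the double read discussed below.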
Obviously, in many cases you have to look at the contents to get an
idea of what the file really is, its architectural attributes, etc.
The question is: how can we avoid doing this at project creation,
and how can we avoid re-reading the same data over and over again?
In order to optimize this particular case, and to see what effect
it might have, I did a couple of things:
- Only do the binary sniffing for extensions we know are likely
  to contain binary content: {.o, .a, .so, .lib, .exe, .com, .dll ...}
  This was a minor gain, likely not significant in my particular
  example compared to the extra overhead.
- Cache the array passed to the binary parser if the match was
  successful, in anticipation of being asked to create an object.
  Then, in ELFParser.getBinary(), check the cached object and, if it
  matches, use the cached data array to extract the information
  needed to create an IBinaryFile().
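The two changes above can be sketched together along these lines. All
the class and method names here are hypothetical stand-ins, not the
actual CDT binary parser API:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Rough sketch of (1) filtering by extension before sniffing and
// (2) caching the sniffed bytes for the follow-up getBinary() call.
class SniffCache {
    // Only files with these extensions get the 128-byte sniff at all.
    static final Set<String> BINARY_EXTS = new HashSet<String>(Arrays.asList(
            "o", "a", "so", "lib", "exe", "com", "dll"));

    private String cachedPath;
    private byte[] cachedHint;

    static boolean mightBeBinary(String path) {
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot == path.length() - 1) {
            return true; // extensionless executables: cannot rule them out
        }
        return BINARY_EXTS.contains(path.substring(dot + 1).toLowerCase());
    }

    // First pass: decide from the 128-byte hint; cache the bytes on a match.
    boolean isBinary(byte[] hint, String path) {
        boolean match = hint.length >= 4 && hint[0] == 0x7F
                && hint[1] == 'E' && hint[2] == 'L' && hint[3] == 'F';
        if (match) {
            cachedPath = path;
            cachedHint = hint.clone();
        }
        return match;
    }

    // Second pass: reuse the cached bytes instead of reopening the file;
    // a null return means a cache miss and a fallback re-read.
    byte[] getBinary(String path) {
        return path.equals(cachedPath) ? cachedHint : null;
    }

    public static void main(String[] args) {
        System.out.println(mightBeBinary("main.c")); // prints "false": skip the sniff
        SniffCache cache = new SniffCache();
        byte[] elf = {0x7F, 'E', 'L', 'F', 2, 1, 1, 0};
        System.out.println(cache.isBinary(elf, "libfoo.so"));     // prints "true"
        System.out.println(cache.getBinary("libfoo.so") != null); // prints "true": no second read
    }
}
```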
The results of these two changes:
Old Project Open Time: 6 minutes 10 sec
New Project Open Time: 3 minutes 30 sec
Of course, since we now go to disk only once instead of twice, and
that is the major cost of opening a new project, the "halving" factor
is about what I expected.
Thoughts and comments? For 2.0 I think there are a couple of things
we should consider, beyond backgrounding this activity, which may
not be possible:
- Creating a virtual IBinaryFile() container that could defer much of
this work/IO until it is actually needed.
- Augmenting the binary parser API so that it can take this data
  cache directly, rather than having each parser manage the data
  itself.
- Potentially putting in another check before we read the initial
  128 bytes, so that we can avoid the data read altogether, as we
  already do with the source files.
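The first idea, a container that defers its I/O, might look something
like this. LazyBinaryFile and Loader are hypothetical names; the real
IBinaryFile API differs, and the Loader here merely stands in for the
ELF header read:

```java
// Sketch of a "virtual" binary container: the expensive header read is
// deferred until someone actually asks for the attributes.
class LazyBinaryFile {
    interface Loader { String loadType(); } // stands in for the ELF header read

    private final Loader loader;
    private String type; // null until the first access

    LazyBinaryFile(Loader loader) { this.loader = loader; }

    // The first call pays the I/O cost; later calls reuse the cached answer.
    String getType() {
        if (type == null) {
            type = loader.loadType();
        }
        return type;
    }

    public static void main(String[] args) {
        final int[] reads = {0};
        LazyBinaryFile file = new LazyBinaryFile(new Loader() {
            public String loadType() { reads[0]++; return "EXE"; }
        });
        // No I/O yet: project open only creates the container.
        System.out.println(reads[0]); // prints "0"
        file.getType();
        file.getType();
        System.out.println(reads[0]); // prints "1": header read exactly once
    }
}
```

With something like this, project open could create one cheap
container per candidate file and let the header reads happen lazily,
spread over actual use instead of all up front.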
Thoughts and comments?
Thanks,
Thomas ... preparing a 1.2 patch =;-)