Eclipse Community Forums: Eclipse Titan » Compilation of big files


Compilation of big files [message #1758825]

Mon, 03 April 2017 14:30

Naum Spaseski

Messages: 81
Registered: February 2016
Location: Sophia Antipolis

Member

Hello everyone,

I noticed that Titan runs into memory problems when compiling large TTCN-3 modules (strictly speaking it is not Titan itself but GCC that has the problem; I have not tried clang to compare).

Take oneM2M as an example: every file compiles relatively fast except oneM2M_Types.ttcn, which contains all types defined in the oneM2M XSD file. This particular file contains 6000+ lines of TTCN-3 code. The C++ file generated from it has 1,330,000 lines of code (58.7 MB).

To compile it I use a VM with 8 GB RAM, where it takes approx. 4 min 30 s. On the same VM with 6 GB (or 4 GB) RAM it needs 40 minutes, because it falls back on swap memory, which is not very fast.

Let's take this case to the extreme: what kind of PC would I need to compile the ITS tests (there is one TTCN-3 file with security test cases that contains 29800 lines of TTCN-3 code)? :D

Is there any way for the TTCN-3 compiler to divide these TTCN-3 files into multiple files in order to speed up the compilation process? As far as I can see, the C++ optimisation is already in place and it works very well.

Best regards,
Naum


Re: Compilation of big files [message #1758834 is a reply to message #1758825]

Mon, 03 April 2017 15:19

Jeno Attila Balasko

Messages: 80
Registered: September 2013
Location: Budapest, Hungary

Member

Hi Naum,

we have good news: there is a feature called "Code Splitting", which was designed to handle exactly this problem. There are two types of code splitting:
1. code splitting by type, and
2. code splitting into a given number of parts (per file).

For details see the referenceguide.doc/pdf, especially chapter 6.1.

Let me copy a detail from the description of the makefilegen:
makefilegen ...
-U none|type|'number'
Generates a Makefile skeleton to be used with the chosen code splitting option. For details see the compiler options in 6.1.1.

and 6.1.1:
compiler ...
-U none|type|'number'
Selects the code splitting mode for the generated code. The option "none" means that the old code generation method will be used. When using the option "type", TITAN will create separate source files for the implementation code of the following types (for each module): sequence, sequence of, set, set of, union. In this case a common header file and a source file holding everything else will also be created. The option can also be a positive number. In that case each file will be split into 'number' smaller files. The compiler tries to create files which have equal size and empty files may be created. The 'number' parameter must be chosen carefully to achieve compilation time decrease. The 'number' parameter should not be larger than the number of the CPU cores. This splitting mode only provides decreased compilation time, if the compilation is parallelized. For example, this can be achieved using the make command's -j flag which needs a number argument that controls how many cores the compilation may use. This number should be equal to the 'number' parameter.
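
As a rough illustration of the advice quoted above (keep 'number' no larger than the number of CPU cores and pass the same value to make -j), the following Python sketch picks a value and prints the two matching commands. The helper name splitting_commands, the cap of 8 and the example file name are assumptions made for illustration only; the executable may also be installed under a different name than the "makefilegen" used in the guide excerpt.

```python
import os

def splitting_commands(ttcn_files, max_parts=8):
    """Build the Makefile-generation and make commands with the same number.

    Hypothetical helper: it only assembles the command lines, it does not run them.
    """
    cores = os.cpu_count() or 1               # cpu_count() may return None
    number = max(1, min(cores, max_parts))    # never more parts than cores
    makefilegen_cmd = ["makefilegen", "-U", str(number), *ttcn_files]
    make_cmd = ["make", "-j", str(number)]    # same value for -U and -j
    return makefilegen_cmd, make_cmd

if __name__ == "__main__":
    gen, build = splitting_commands(["oneM2M_Types.ttcn"])
    print(" ".join(gen))
    print(" ".join(build))
```

The cap is there only because splitting into many parts on a machine with many cores is not automatically better; the guide says the number "must be chosen carefully".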

BR

JENŐ BALASKÓ
Software Engineer
Titan Team
Test Competence Center
Operations and Test Solutions
Research & Development
Ericsson Hungary


Re: Compilation of big files [message #1758835 is a reply to message #1758825]

Mon, 03 April 2017 15:24

Elemer Lelik

Messages: 1120
Registered: January 2015

Senior Member

Hi Naum,

yes there is;

please look into the following:

-using the -U option when generating the Makefile/compiling
(see referenceguide 6.1.1, 6.1.2)

-U none|type|'number'
Selects the code splitting mode for the generated code. The option "none"
means that the old code generation method will be used. When using the
option "type", TITAN will create separate source files for the implementation
code of the following types (for each module): sequence, sequence of, set,
set of, union. In this case a common header file and a source file holding
everything else will also be created. The option can also be a positive
number. In that case each file will be split into 'number' smaller files. The
compiler tries to create files which have equal size and empty files may be
created. The 'number' parameter must be chosen carefully to achieve
compilation time decrease. The 'number' parameter should not be larger
than the number of the CPU cores. This splitting mode only provides
decreased compilation time, if the compilation is parallelized. For example,
this can be achieved using the make command's -j flag which needs a
number argument that controls how many cores the compilation may use.
This number should be equal to the 'number' parameter.

So -U type will generate one C++ file per module for each of the type kinds listed above. This is the older option; I'd recommend the newer -U 'number', which splits each file into 'number' pieces. For example, -U 8 will generate 8 subfiles for each TTCN-3 module, so that parallel processing can be exploited;

-using clang 3.8 could also shave off 10-20%, depending on content

-using the gold linker reduces the linking time significantly (see earlier post
https://www.eclipse.org/forums/index.php/t/1077882/ for both)

Please let us know your experience with the above

BR

Elemer


Re: Compilation of big files [message #1758875 is a reply to message #1758835]

Tue, 04 April 2017 06:10

Kristof Szabados

Messages: 60
Registered: July 2015

Member

Hi Naum,

Just one more note.
Even though Titan does support splitting the generated code into several C++ files, this will not really solve the underlying issue with oneM2M_Types.ttcn (it will just hide/delay it).

The last ~40 years or so of programming have shown that placing all of the code into one big file (without care for context/concern/usage) is not really good program design.
Such files are usually very complex and hard to navigate and maintain, and they very often contain code that is totally unnecessary or erroneous.

While upgrading hardware (faster CPU, more memory, etc.) is costly, "upgrading" people (to have more IQ, more information storage capacity, higher tolerance for working with unstructured code, etc.) might not be possible at all.
Our research so far has shown that tests, as they grow, start to show the exact same quality problems that production code shows.
It might be a good idea to handle this issue before it grows totally out of control: re-think that module and separate the different concerns/contexts/usages into different files, so that working with them later becomes much easier.


Re: Compilation of big files [message #1758919 is a reply to message #1758875]

Tue, 04 April 2017 15:18

Naum Spaseski

Messages: 81
Registered: February 2016
Location: Sophia Antipolis

Member

Hello everyone,

Thank you all for the suggestions, it works better now :)

I have a suggestion, as the basis for it is already there: would it be possible to specify a size or a number of lines as the splitting criterion, or at least a threshold above which splitting starts? If one has to split into 2 or 4 pieces, even the tiny modules will be split into several pieces, and this will be counter-productive.

@Kristof, you are right, I will talk to the people who manage the oneM2M TTCN-3 code and hope to solve the problem. For ITS, maybe in the future :)

Best regards,
Naum


Re: Compilation of big files [message #1759506 is a reply to message #1758919]

Wed, 12 April 2017 12:26

Elemer Lelik

Messages: 1120
Registered: January 2015

Senior Member

Hi Naum,

here's some more detail about the code splitting functionality;

I'll try to exemplify it through a concrete case; say we have set number=4, that is, we want to split all generated C++ files into four parts.
First, Titan finds the TTCN-3 module with the largest generated C++ code.
In our example the largest generated C++ module contains 10000 characters (let's call this MAX).
Next, Titan calculates the splitting threshold by dividing MAX by 'number', in this case 10000 / 4 = 2500.
Titan will only split the generated C++ files which are larger than 2500 characters.

On the other hand, makefilegen should generate a fixed number of files (in our case 4), and here's why:
Consider a situation where one creates a new TTCN-3 file which is almost empty.
makefilegen will initially generate the appropriate Makefile, splitting the C++ into "n" parts, most of them empty.
As the file grows, these C++ files will also start to fill up, as the generated code is distributed over them.
If only one C++ file were generated, it would grow indefinitely, so the advantage of file splitting would be lost.
If the number of C++ files were not fixed, the Makefile would need to be regenerated every now and then, which is to be avoided.
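
To make the point about the fixed file count concrete, here is a minimal Python sketch that lists the source files a Makefile would have to reference. It assumes the <module>_part_<i>.cc naming used in the example further down in this post; the function name and the module name My_New_Module are hypothetical. The list depends only on the module names and on 'number', not on how large each module currently is.

```python
def generated_sources(modules, number):
    """List the .cc files expected per module when splitting into `number` parts.

    Illustrative only: the result does not depend on module size, which is
    why the Makefile does not need regenerating as a module grows.
    """
    return [
        f"{module}_part_{i}.cc"
        for module in modules
        for i in range(1, number + 1)
    ]

if __name__ == "__main__":
    for name in generated_sources(["My_New_Module"], 4):
        print(name)
```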

OK, let's continue with our example.

Say we have three TTCN3 modules:
• My_Types.ttcn with a generated C++ code of 10000 characters (MAX)
• My_Functions.ttcn with a generated C++ code of 6000 characters
• My_Constants.ttcn with a generated C++ code of 1000 characters

When we execute the command

compiler -U 4 My_Types.ttcn My_Functions.ttcn My_Constants.ttcn

the following C++ source files will be generated:

• My_Types_part_1.cc (contains approximately 2500 characters)
• My_Types_part_2.cc (contains approximately 2500 characters)
• My_Types_part_3.cc (contains approximately 2500 characters)
• My_Types_part_4.cc (contains approximately 2500 characters)

• My_Functions_part_1.cc (contains approximately 2500 characters)
• My_Functions_part_2.cc (contains approximately 2500 characters)
• My_Functions_part_3.cc (contains approximately 1000 characters)
• My_Functions_part_4.cc (contains approximately 0 effective characters)

• My_Constants_part_1.cc (contains approximately 1000 characters)
• My_Constants_part_2.cc (contains approximately 0 effective characters)
• My_Constants_part_3.cc (contains approximately 0 effective characters)
• My_Constants_part_4.cc (contains approximately 0 effective characters)

If the TTCN-3 file sizes change, the generated code size will change, the threshold will be recalculated and the generated code redistributed accordingly.
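
The arithmetic of this example can be reproduced with a short Python sketch. This is a simplified model of the behaviour described above, not Titan's actual implementation: it treats the generated code as a plain character count and fills each part up to the threshold, whereas the real sizes are only approximate. The function name split_sizes is an assumption made for illustration.

```python
def split_sizes(sizes, number):
    """Model the split: threshold = MAX / number, parts filled in order.

    `sizes` maps module name -> generated C++ size in characters.
    Returns (threshold, {module: [size of part 1, ..., size of part number]}).
    """
    threshold = max(sizes.values()) / number
    result = {}
    for module, size in sizes.items():
        parts, remaining = [], size
        for _ in range(number):
            chunk = min(remaining, threshold)  # a part never exceeds the threshold
            parts.append(chunk)
            remaining -= chunk                 # trailing parts end up empty
        result[module] = parts
    return threshold, result

if __name__ == "__main__":
    sizes = {"My_Types": 10000, "My_Functions": 6000, "My_Constants": 1000}
    threshold, parts = split_sizes(sizes, 4)
    print("threshold:", threshold)
    for module, chunks in parts.items():
        for i, chunk in enumerate(chunks, start=1):
            print(f"{module}_part_{i}.cc ~{chunk:.0f} characters")
```

Changing any entry in sizes and re-running shows the last sentence above in action: the threshold follows the largest module and the other modules are redistributed accordingly.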

I hope this brings some clarity to how Titan's code splitting works.

BR

Elemer

