Re: [glassfish-dev] Tracking usage data for EE4J working group CI cloud systems
On 9/29/20 4:59 PM, Ed Bratt wrote:
Hi,
(I thought I had forwarded this to lists, but perhaps I didn't
do it correctly. If you see this more than once, sorry.)
Here's an update on the bug I filed "EE4J
Working Group needs way to monitor Resource Pack
Utilization". In the last comment, the
Web Admins have supplied a couple of spreadsheets showing data
sampling of vCPU counts and memory usage. I've plotted the CPU
usage samples and here is an image for Jakarta EE TCK project:

They provide limits and memory usage, but I don't think those
are actually independent. The bug also lists similar graphs for
Eclipse GlassFish and EclipseLink. Please note that the vCPU
shapes can vary so memory totals seem to be much more generous
for the TCK project.
Here is a similar graph, for Eclipse GlassFish:

We can discuss what these data are implying, but I was asked to
validate the data in these reports so I'm soliciting your
feedback.
My goals were to assess the allocations and to understand if we
were over or under allocated. Then, to use these data to decide
if we needed to fund more, or fewer "Resource Packs" for CY
2021.
Separately, we had asked for a different mechanism for
allocating resource packs across the working group but I don't
have any news on that aspect of this issue.
So, this e-mail is to kick-off some discussion about this and
see if these data will suffice for our monitoring and planning
purposes.
What do you think about the data-capture?
The presented data seem likely to be correct; I see no reason
to doubt them at this point.
Here are the average + max Memory/#CpuCores:
avg memory.limit   Max Memory   average cpu limits   Max CPU
================   ==========   ==================   =========
61.58 Gi           378.00 Gi    12.1 vCPU            74.7 vCPU
There are some CPU/memory limits in the Jenkinsfile (https://github.com/eclipse-ee4j/jakartaee-tck/blob/master/Jenkinsfile#L147).
Each memory limit specifies the container/VM memory size
(since we didn't specify the initial memory request setting), so
the calculation is something like:
memory usage = 10Gi per VM * number of test groups
CPU cores = 2 * number of test groups
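As a rough illustration of that calculation, here is a minimal Python sketch. The per-group values (10 Gi, 2 cores) come from the text above; the function name and the group count of 30 are made up for the example and are not taken from the Jenkinsfile.

```python
# Per-container limits as described in this message (one container/VM per test group).
MEMORY_GI_PER_GROUP = 10
CORES_PER_GROUP = 2


def estimate_usage(test_groups):
    """Return (memory in Gi, CPU cores) for a given number of concurrent test groups."""
    return test_groups * MEMORY_GI_PER_GROUP, test_groups * CORES_PER_GROUP


# 30 is a hypothetical number of test groups, chosen only for illustration.
memory_gi, cores = estimate_usage(30)
print(f"{memory_gi} Gi, {cores} cores")
```

This only models the configured limits, not what the containers actually consume while running.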
The data-capture does give us a high level view
of what the container level memory/CPU core usage has been.
Quoting from a previous TCK ml conversation (from David Blevins
with subject: "Resource Pack
Allocations & Maximizing Use"):
"
Over all of EE4J we have 105 resource packs paid for that give us
a total of 210 cpu cores and 840 GB RAM. These resource packs are
dedicated, not elastic. The actual allocation of 105 resource
packs is by project. The biggest allocation is 50 resource packs
to ee4j.jakartaee-tck (this project), the second biggest is 15
resource packs to ee4j.glassfish.
The most critical takeaway from the above is we have 50 resource
packs dedicated to this project giving us a total of 100 cores and
400GB ram at our disposal 24x7. These 50 are bought and paid for
-- we do not save money if we don't use them.
"
So, the Platform TCK is budgeted to use 100 cores and 400 GB of RAM;
however, we haven't used more than 75 CPU cores and 378 GB of memory
(as per the max memory/CPU numbers pasted above).
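To make the budget-versus-usage comparison concrete, here is a small Python sketch of the arithmetic. It uses only the figures quoted in this thread; treating the reported Gi values as directly comparable to the budgeted GB is an assumption, and the variable and function names are made up for the example.

```python
import math

# Figures quoted in this thread (not measured by this script).
TOTAL_PACKS, TOTAL_CORES, TOTAL_RAM_GB = 105, 210, 840
TCK_PACKS = 50
AVG_CPU, PEAK_CPU = 12.1, 74.7      # vCPU
AVG_MEM, PEAK_MEM = 61.58, 378.0    # reported in Gi; treated as GB here (assumption)

# Size of one resource pack, derived from the EE4J-wide totals.
cores_per_pack = TOTAL_CORES / TOTAL_PACKS
ram_per_pack = TOTAL_RAM_GB / TOTAL_PACKS

# Budget for the TCK project's 50 packs.
budget_cores = TCK_PACKS * cores_per_pack
budget_ram = TCK_PACKS * ram_per_pack


def pct(used, budget):
    """Usage as a percentage of the budget."""
    return 100 * used / budget


# Smallest whole number of packs that would cover the observed peaks.
packs_for_peak_cpu = math.ceil(PEAK_CPU / cores_per_pack)
packs_for_peak_mem = math.ceil(PEAK_MEM / ram_per_pack)

print(f"CPU: avg {pct(AVG_CPU, budget_cores):.1f}%, peak {pct(PEAK_CPU, budget_cores):.1f}% of budget")
print(f"RAM: avg {pct(AVG_MEM, budget_ram):.1f}%, peak {pct(PEAK_MEM, budget_ram):.1f}% of budget")
print(f"Packs to cover peak CPU: {packs_for_peak_cpu}, peak memory: {packs_for_peak_mem}")
```

On these numbers, the memory limits rather than the CPU limits would be what keeps the allocation close to 50 packs, which is why tuning the per-container memory limit seems worth trying first.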
I think the fundamental question is: can we manage this
resource, hence the cost, based on these data?
IMO, there is memory/CPU tuning that we could do, if there
is time to experiment before answers are needed about current
usage versus what usage could be.
You are also welcome to review any of the commentary and ask
questions directly via the issue.
I asked on https://bugs.eclipse.org/bugs/show_bug.cgi?id=565098
about measuring usage for a weekend or over a few days. If we can
request that, perhaps we could start a few weekend test runs that
use custom settings from a `git topic branch` to see if we could
reduce our CPU usage from two cores per container to one core,
as well as reduce our memory limit from 10G per container/VM to
a lower number and tweak some related settings that have to be
adjusted as well.
I'd be happy to expand the distribution on this, but thought
I'd start with those who'd indicated interest and who had been
following the issue(s).
We haven't discussed EE 10 + TCK testing on this mailing list yet,
but we should soon, as that could have an impact on next year's CI
usage as well.
Thanks,
Scott
Cheers,
-- Ed
_______________________________________________
glassfish-dev mailing list
glassfish-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/glassfish-dev