Re: [glassfish-dev] Tracking usage data for EE4J working group CI cloud systems

From: Scott Marlow <smarlow@xxxxxxxxxx>
Date: Tue, 29 Sep 2020 18:00:58 -0400

On 9/29/20 4:59 PM, Ed Bratt wrote:

Hi,

(I thought I had forwarded this to lists, but perhaps I didn't do it correctly. If you see this more than once, sorry.)

Here's an update on the bug I filed, "EE4J Working Group needs way to monitor Resource Pack Utilization". In the last comment, the Web Admins have supplied a couple of spreadsheets showing data sampling of vCPU counts and memory usage. I've plotted the CPU usage samples and here is an image for the Jakarta EE TCK project:

They provide limits and memory usage, but I don't think those are actually independent. The bug also lists similar graphs for Eclipse GlassFish and EclipseLink. Please note that the vCPU shapes can vary so memory totals seem to be much more generous for the TCK project.

Here is a similar graph, for Eclipse GlassFish:

We can discuss what these data are implying, but I was asked to validate the data in these reports so I'm soliciting your feedback.

My goals were to assess the allocations and to understand whether we were over- or under-allocated, and then to use these data to decide whether we need to fund more or fewer "Resource Packs" for CY 2021.

Separately, we had asked for a different mechanism for allocating resource packs across the working group but I don't have any news on that aspect of this issue.

So, this e-mail is to kick off some discussion about this and see if these data will suffice for our monitoring and planning purposes.

What do you think about the data-capture?

The presented data seems likely to be correct; I see no reason to doubt it at this point.

Here are the average + max Memory/#CpuCores:

Avg memory limit    Max memory    Avg CPU limit    Max CPU
================    ==========    =============    =========
61.58 Gi            378.00 Gi     12.1 vCPU        74.7 vCPU

There are some CPU/memory limits in the Jenkinsfile (https://github.com/eclipse-ee4j/jakartaee-tck/blob/master/Jenkinsfile#L147). Each memory limit specifies the container/VM memory size (since we didn't specify the initial memory request setting), so the calculation is something like:

memory usage = 10Gi per VM * number of test groups

CPU core = 2 * number of test groups
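
As a rough illustration of the two formulas above, here is a minimal Python sketch. The function name `estimate_limits` and the example group count of 20 are made up for illustration; only the 10Gi and 2-core per-group figures come from this message.

```python
# Per-test-group limits as described in this message (one container/VM per group).
MEMORY_GI_PER_GROUP = 10
CPU_CORES_PER_GROUP = 2


def estimate_limits(test_groups):
    """Return (memory_gi, cpu_cores) requested when `test_groups` run at once."""
    if test_groups < 0:
        raise ValueError("test_groups must be non-negative")
    return MEMORY_GI_PER_GROUP * test_groups, CPU_CORES_PER_GROUP * test_groups


# 20 concurrent groups is an arbitrary example value, not a measured number.
memory_gi, cpu_cores = estimate_limits(20)
print(f"{memory_gi} Gi memory, {cpu_cores} CPU cores")
```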

The data-capture does give us a high level view of what the container level memory/CPU core usage has been. Quoting from a previous TCK ml conversation (from David Blevins with subject: "Resource Pack Allocations & Maximizing Use"):

Over all of EE4J we have 105 resource packs paid for that give us a total of 210 cpu cores and 840 GB RAM. These resource packs are dedicated, not elastic. The actual allocation of 105 resource packs is by project. The biggest allocation is 50 resource packs to ee4j.jakartaee-tck (this project), the second biggest is 15 resource packs to ee4j.glassfish.

The most critical takeaway from the above is we have 50 resource packs dedicated to this project giving us a total of 100 cores and 400GB ram at our disposal 24x7. These 50 are bought and paid for -- we do not save money if we don't use them.
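
The numbers in the quote imply a per-pack size of 2 cores and 8 GB (210 cores and 840 GB across 105 packs). A small sketch of that arithmetic follows; the function name `pack_capacity` is hypothetical, and the per-pack size is derived from the quoted totals rather than taken from any official definition.

```python
# Totals quoted above for all of EE4J.
TOTAL_PACKS = 105
TOTAL_CORES = 210
TOTAL_RAM_GB = 840

# Derived per-pack size (an inference from the totals, not an official figure).
CORES_PER_PACK = TOTAL_CORES // TOTAL_PACKS
RAM_GB_PER_PACK = TOTAL_RAM_GB // TOTAL_PACKS


def pack_capacity(packs):
    """Return (cores, ram_gb) provided by `packs` dedicated resource packs."""
    return packs * CORES_PER_PACK, packs * RAM_GB_PER_PACK


for project, packs in [("ee4j.jakartaee-tck", 50), ("ee4j.glassfish", 15)]:
    cores, ram_gb = pack_capacity(packs)
    print(f"{project}: {packs} packs -> {cores} cores, {ram_gb} GB RAM")
```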

So, the Platform TCK is budgeted to use 100 cores and 400 GB of RAM; however, we haven't used more than 75 CPU cores and 378 GB of memory (per the max memory/CPU numbers pasted above).
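
To put the peak figures next to the budget, here is a hedged sketch. It assumes 2 cores and 8 GB per resource pack (inferred from the quoted totals) and, like the comparison above, treats the reported Gi values as directly comparable to the budgeted GB. The helper names are hypothetical.

```python
import math

# Assumed per-pack size, inferred from 105 packs = 210 cores / 840 GB.
CORES_PER_PACK = 2
GB_PER_PACK = 8

BUDGET_PACKS = 50      # packs allocated to the TCK project
PEAK_CPU = 74.7        # max vCPU from the table above
PEAK_MEMORY = 378.0    # max memory from the table above (Gi treated as GB here)


def utilization(peak, budget):
    """Peak usage as a percentage of the budget."""
    return 100.0 * peak / budget


def packs_needed(peak_cpu, peak_memory):
    """Smallest whole number of packs covering both the CPU and memory peaks."""
    by_cpu = math.ceil(peak_cpu / CORES_PER_PACK)
    by_memory = math.ceil(peak_memory / GB_PER_PACK)
    return max(by_cpu, by_memory)


cpu_pct = utilization(PEAK_CPU, BUDGET_PACKS * CORES_PER_PACK)
mem_pct = utilization(PEAK_MEMORY, BUDGET_PACKS * GB_PER_PACK)
needed = packs_needed(PEAK_CPU, PEAK_MEMORY)
print(f"peak CPU: {cpu_pct:.1f}% of budget, peak memory: {mem_pct:.1f}% of budget")
print(f"packs needed to cover the observed peaks: {needed} of {BUDGET_PACKS}")
```

Under these assumptions the memory limit, not the CPU limit, is what sits closest to the budget at peak.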

I think the fundamental question is: can we manage this resource, hence the cost, based on these data?

IMO, there is memory/CPU tuning that we could do, if there is time to experiment before answers are needed about current usage versus what usage could be.

You are also welcome to review any of the commentary and ask questions directly via the issue.

I asked on https://bugs.eclipse.org/bugs/show_bug.cgi?id=565098 about measuring usage for a weekend or over a few days. If we can request that, perhaps we could start a few weekend test runs that use custom settings from a git topic branch, to see whether we could reduce our CPU usage from two cores per container to one, reduce our memory limit from 10G per container/VM to a lower number, and tweak the related settings that would have to be adjusted as well.
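
A what-if sketch of such an experiment, for discussion only. The container count of 37 and the 6G trial memory limit are hypothetical placeholders, not measured or proposed values; only the current 2-core/10G settings come from this message.

```python
def container_totals(containers, cores_each, memory_g_each):
    """Return (total_cores, total_memory_g) for identical containers."""
    return containers * cores_each, containers * memory_g_each


CONTAINERS = 37  # hypothetical number of concurrent containers

current = container_totals(CONTAINERS, cores_each=2, memory_g_each=10)
trial = container_totals(CONTAINERS, cores_each=1, memory_g_each=6)  # 6G is a placeholder

saved_cores = current[0] - trial[0]
saved_memory_g = current[1] - trial[1]
print(f"current: {current[0]} cores, {current[1]}G memory")
print(f"trial:   {trial[0]} cores, {trial[1]}G memory")
print(f"saved:   {saved_cores} cores, {saved_memory_g}G memory")
```

Whether the tests still pass and finish in acceptable time with the lower limits is exactly what the weekend runs would need to show.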

I'd be happy to expand the distribution on this, but thought I'd start with those who'd indicated interest and who had been following the issue(s).

We haven't discussed EE 10 + TCK testing on this mailing list yet, but we should soon, as that could have an impact on next year's CI usage as well.

Thanks,

Scott

Cheers,

-- Ed

