Performance problems [message #1070606]
Thu, 18 July 2013 10:02
Fabrizio Giudici
Registered: May 2013
For a couple of weeks my Hudson has been experiencing performance problems that I can't understand. At first I thought one of my disks was about to fail (it's part of a RAID 1 setup, and in my (limited) understanding of this stuff a disk that is in trouble could slow down the overall performance). After some investigation I'm less and less sure of this explanation, so, while I can't find out why the problem arose all of a sudden, without changes in my configuration, I'm now trying to focus first on Hudson performance.
In short, a bunch of nightly jobs that start at 3 AM and used to be mostly complete by 8-9 AM now run until 1-2 PM: from the previous 5-6 hours we're now at 10-11, so the time has doubled.
Some numbers first:
* I have about 30 projects (all Maven)
* For each project, a "Compile from Scratch" job is started, trying to compile from a clean repo
* For each project, a "Metrics" job is started, which compiles with QA activated
* For each project, an archival project is started, which just collects the results of Metrics and archives them.
As for the server, here are the specs copied from my provider:
* Ubuntu 12.04 x64
* Intel Core i7-920 Quad Core
* 8 GB DDR3 RAM
* 2x 750 GB SATA II HDD in a RAID arrangement
Hudson is configured to have 6 parallel workers.
I've been monitoring the machine with top and iotop. The "Compile from Scratch" jobs are very slow, but this is expected as they are mostly network-bound: they have to download all the Maven artifacts from the internet from scratch (not from a local proxy, as the other jobs do). While they are running, the load is pretty low (about 3) and the machine is responsive. When Metrics jobs are active, the load is around 20-30 (with peaks of 40-50) and the machine is mostly unresponsive. CPU usage is very low (way below 10%) and iotop shows that the machine is disk-bound. While this is expected, given the nature of the jobs, I only see peaks of a few MB/sec of disk I/O. SATA II disks are quite obsolete, but they should still deliver at least an order of magnitude more than that. All 8 GB of RAM are in use, with only a little swap used (about 250k), and iotop reports absolutely no swapping.
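To put a number on "low CPU usage but disk bound", one option is to sample the aggregate cpu line of /proc/stat twice and compute the share of time spent in iowait over the interval. This is only a minimal sketch, assuming a Linux /proc filesystem; the function names are made up for illustration and are not part of any tool mentioned above.

```python
import time


def parse_cpu_line(stat_text):
    """Return the first eight counters of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Order: user nice system idle iowait irq softirq steal (clock ticks).
            return [int(value) for value in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent waiting for I/O between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def measure_iowait(interval_seconds=5, path="/proc/stat"):
    """Take two samples of /proc/stat, interval_seconds apart, and compare them."""
    with open(path) as stat_file:
        before = parse_cpu_line(stat_file.read())
    time.sleep(interval_seconds)
    with open(path) as stat_file:
        after = parse_cpu_line(stat_file.read())
    return iowait_percent(before, after)


# Usage on the server while the Metrics jobs are running:
#     print(f"iowait: {measure_iowait(5):.1f}%")
```

A high iowait percentage together with a high load average and low user/system time would be consistent with processes queueing on the disks rather than on the CPU.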
Looking at the job logs, I see something like this:
11:06:45 Started by upstream job "Zephyr_Metrics" build number 34
11:06:47 Restoring workspace from build #108 of project Zephyr_Compile_and_Test_and_Deploy_Local_Snapshot
11:42:24 [JDK_1.7.0_21] $ /home/tomcat/Workspaces/Hudson/tools/Maven_3.0.4/bin/mvn ...
That is, it took more than 30 minutes just to restore the workspace from a previous job. During this phase, I could see Java threads doing I/O at 150-200 KB/sec. Since they are only restoring a workspace, I'd expect them to be just reading from disk, so why is it so slow?
Remember, my server may be obsolete, but up to a couple of weeks ago Hudson was 2x faster. What could have happened?
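The gap can be computed directly from the timestamps in the console log. The following is a minimal Python sketch, assuming every line starts with an HH:MM:SS timestamp and that the build does not cross midnight; step_gaps is a made-up helper name.

```python
from datetime import datetime

# The three console log lines quoted above, unchanged.
LOG_LINES = [
    '11:06:45 Started by upstream job "Zephyr_Metrics" build number 34',
    '11:06:47 Restoring workspace from build #108 of project '
    'Zephyr_Compile_and_Test_and_Deploy_Local_Snapshot',
    '11:42:24 [JDK_1.7.0_21] $ /home/tomcat/Workspaces/Hudson/tools/Maven_3.0.4/bin/mvn ...',
]


def step_gaps(lines):
    """Return (seconds, message) pairs: time elapsed after each line until the next."""
    stamps = [datetime.strptime(line.split(" ", 1)[0], "%H:%M:%S") for line in lines]
    gaps = []
    for i in range(len(lines) - 1):
        seconds = int((stamps[i + 1] - stamps[i]).total_seconds())
        gaps.append((seconds, lines[i].split(" ", 1)[1]))
    return gaps


gaps = step_gaps(LOG_LINES)
for seconds, message in gaps:
    print(f"{seconds // 60:3d} min {seconds % 60:02d} s  {message[:50]}")
```

For the lines above this gives 2 seconds for the first step and 35 min 37 s (2137 seconds) for "Restoring workspace". Run over a whole console log, the same loop shows which steps account for the extra hours.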
Thanks for any hint.
[Updated on: Thu, 18 July 2013 10:23]