Re: [hudson-dev] Suggestions for improving Hudson performance

2012/3/6 Winston Prakash <winston.prakash@xxxxxxxxx>:
> On Feb 22, 2012 10:40 PM, "Winston Prakash" <winston.prakash@xxxxxxxxx>
> wrote:
>>
>> Before I get to the actual suggestions, here are some numbers from our
>> Hudson installation to give you a sense of scale. I used
>> http://wiki.hudson-ci.org/display/HUDSON/Monitoring. Perhaps Winston
>> or someone can provide similar numbers from the internal Oracle
>> instances or the Eclipse instances?
>>
>> Number of jobs: around 600, but probably only around 400 active at the
>> moment
>> Number of builds kept: 30,000
>> Number of slaves: 60 (many of these are host-specific nodes which are
>> inactive for parts of the day)
>> Number of HTTP requests per minute (daytime): average around 1400-1600
>> with peaks of 2500
>> Number of HTTP requests served in a day: 730,000 measured, but we had
>> to reboot Hudson because it was hanging, so I would estimate
>> 800,000-900,000 would be more realistic
>
>
> I've analyzed several heap dumps from the Eclipse Foundation Hudson instance.
> It is clear that retaining all the builds in memory has a tremendous effect on
> memory consumption. As you mentioned, JUnit results are weakly referenced
> within the builds. However, there are direct references to them via the Jelly
> tag which displays the JUnit results, and that causes the CaseResults to be
> retained longer than intended. This is what I found via the GC root analysis.

Perhaps it depends on what is actually being displayed on the
pages, because if a page only shows the count of passed/failed the weak
reference should be GC-able, but with the state of the Hudson code I
certainly cannot rule out that there is a strong reference in other
places. It just wasn't visible in my setup. As written in the "Hudson
Test results being loaded from disk" mail, the reverse can also be
true.
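
To illustrate the pattern we are both describing (a minimal,
self-contained sketch, not actual Hudson code): a weakly referenced
object only becomes collectable when no strong reference to it exists
anywhere, so a single strong reference from e.g. a rendering tag pins
the whole result graph in memory.

    import java.lang.ref.WeakReference;

    public class WeakRetentionDemo {
        public static void main(String[] args) {
            Object caseResults = new Object();            // stands in for loaded CaseResults
            WeakReference<Object> weak = new WeakReference<>(caseResults);

            Object tagReference = caseResults;            // strong reference, e.g. held by a Jelly tag
            caseResults = null;
            System.gc();
            System.out.println(weak.get() != null);       // true: the strong reference pins it

            tagReference = null;                          // drop the strong reference
            System.gc();                                  // only a hint to the JVM...
            System.out.println(weak.get());               // ...but typically null now
        }
    }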

> As we discussed in the Governance meeting, we need to revisit the concept of
> "lighter metadata in memory".

Agreed

>> 2. Don't generate semi-static content:
>> It looks like semi-static content such as RSS feeds is generated
>> using Jelly templates on each request, which seems quite expensive. (I
>> suspect RSS requests make up the bulk of my 730,000 requests.) I suggest
>> we move to a model where such
>> resources are written to disk when they change, which is far less often
>> than they are requested.
>>
>> We then add a fast path in Stapler where the first thing it does is
>> look in a map of URLs, and if the URL is present in the map the file
>> on disk is served directly instead of going through the Stapler "views".
>>
>> Plugins could even take part in this by writing files to disk and
>> registering the URL-to-file mapping via a service. I suspect this
>> could speed things up quite a bit, since plugins could write all sorts
>> of resources that only need to be calculated once a build is done,
>> like change logs, test reports, dependency graphs etc. (We still need
>> to figure out something for the dynamic sidebar.)
>>
>> (If someone can give me a pointer to good starting points in the
>> code, I would like to give this a go.)
>
>
> Your suggestion seems like a good option. I'll look into the code and let
> you know.

Great :-)
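
To make it concrete, here is roughly the shape I have in mind (just a
sketch; the class and method names are hypothetical, not existing
Stapler or Hudson APIs): a service where plugins register a
URL-to-file mapping, consulted before any view dispatch.

    import java.io.File;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical service, not an existing Stapler/Hudson API.
    public class StaticFastPath {
        private final Map<String, File> urlToFile = new ConcurrentHashMap<>();

        // Plugins call this after (re)writing a resource to disk,
        // e.g. when a build completes.
        public void register(String url, File file) {
            urlToFile.put(url, file);
        }

        // Called at the very start of request handling, before any
        // Jelly/view dispatch. Returns the file to stream directly,
        // or null to fall through to the normal Stapler pipeline.
        public File lookup(String requestUri) {
            return urlToFile.get(requestUri);
        }
    }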
>
> IMO, RSS feeds are served on demand. I'm wondering what is causing the
> traffic of 730,000 requests.

We have a bad proxy, I think, because we are serving a lot of static
content (which we are looking into). The distribution looks like this:
Total                      703883
/static/9562870d           309991   44.04%
<job>/api/xml               98553   14.00%
buildHistory/ajax           70881   10.07%
ajaxBuildQueue              56494    8.03%
ajaxExecutor                55863    7.94%
Internal plugins            52889    7.51%
maven-repository-server     14232    2.02%



>> 3. Don't reserve a thread per executor.
>>
>> As soon as a slave goes online, an executor thread is created for
>> each executor. There really is no need for this; just create the
>> threads as needed.
>
> An Executor is a thread. Threads are lightweight, so they should not be a
> resource contender. However, I did notice GC roots ending at these Executors.
> Modifying the code to create the threads as needed significantly changes the
> execution model, but it is worth looking into.

After looking more at the threads, I think the bigger issue is the
number of throw-away threads that get created each time there is a
new ad hoc task like polling an SCM. They could probably benefit from
using a thread pool.
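
Something along these lines (a minimal sketch; the task body is made
up, and the pool size and shutdown policy would need real thought):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class AdHocTasks {
        // One shared, bounded pool instead of a new thread per SCM poll,
        // ping, etc. Idle threads are reused instead of thrown away.
        private static final ExecutorService POOL =
                Executors.newFixedThreadPool(16);

        public static void main(String[] args) throws InterruptedException {
            for (int i = 0; i < 100; i++) {
                final int jobId = i;
                // Stand-in for an ad hoc task such as an SCM poll.
                POOL.submit(() -> System.out.println("polling job " + jobId));
            }
            POOL.shutdown();
            POOL.awaitTermination(1, TimeUnit.MINUTES);
        }
    }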



> In the case of JNLP, the handshake is done via a server socket and the stream
> channel pipe is created using the input and output streams of the connected
> socket. However, in the case of SSH slaves, the channels are bound to the
> standard output and input of the SSH session. Do you mean we should try to use
> a server socket in both cases and use SSH only to start/restart the slave?
>
> What do you mean by "master calls slave"? Isn't it the slave that initiates
> the communication in the case of a slave restart?

That is the part I really don't know, as I haven't dug into that part
of the code. I was just wondering if we could eliminate some of the
dedicated threads by having the slave open a socket
to a server socket on the master, like you suggest for the SSH slaves.
If we used NIO on the master we could perhaps use a pool instead of
dedicated threads.
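
For the NIO part, I am imagining the usual selector pattern: one
thread multiplexes all slave channels and hands completed reads off to
a small pool (a rough sketch, not Hudson-specific; the port and the
message handling are placeholders):

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.*;
    import java.util.Iterator;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class MasterNioLoop {
        public static void main(String[] args) throws Exception {
            ExecutorService workers = Executors.newFixedThreadPool(8);
            Selector selector = Selector.open();

            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9999));   // arbitrary port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // One thread services all slave channels instead of one
            // dedicated thread per slave.
            while (true) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel slave = server.accept();
                        slave.configureBlocking(false);
                        slave.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel ch = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(8192);
                        if (ch.read(buf) < 0) {
                            key.cancel();               // slave went away
                            ch.close();
                        } else {
                            buf.flip();
                            // Decode/dispatch off the selector thread.
                            workers.submit(() -> handle(buf));
                        }
                    }
                }
            }
        }

        private static void handle(ByteBuffer message) {
            // Stand-in for channel message handling.
        }
    }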


>> 5. Pre-compile / pre-assemble Jelly?
>> I don't know if it is possible, but right now the Jelly files are
>> re-parsed on each request (as far as I can tell). Is there a technique
>> where we can precompile such files, or perhaps just pre-resolve all the
>> includes and build a Jelly file per page?
>
> I think this is not possible and should not happen; it defeats the purpose
> of server-based rendering. However, we need to investigate the option of
> caching in Stapler, that is, serving the cached rendered page until the
> cache timeout expires. Not sure if it is there already.

I think you misunderstood my suggestion. I haven't dug deeply into
Stapler, but if I look at the Jelly documentation I think the flow is:

1. Read the jelly file from disk
2. Prepare the JellyContext (essentially, provide the data objects)
3. Ask Jelly to run the jelly file, which involves:
3a. Parsing the jelly file (using JellyContext.runScript or similar)
3b. Compiling the jelly file
3c. Running the jelly file

My suggestion is that we split up step #3 and use JellyContext's
compileScript, which returns a Script representing steps 3a and 3b,
and put that result in a cache. If the jelly file is requested
repeatedly we get the script out of the cache and run it. That way we
would eliminate any re-parsing and re-compilation of the jelly file.

Since we still run the script each time, we are still rendering the
page on each request.
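
Something like this (a sketch against the commons-jelly API; cache
invalidation when a file changes on disk is ignored here):

    import java.io.StringWriter;
    import java.net.URL;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import org.apache.commons.jelly.JellyContext;
    import org.apache.commons.jelly.Script;
    import org.apache.commons.jelly.XMLOutput;

    public class CompiledJellyCache {
        // Steps 3a+3b (parse + compile) happen at most once per URL.
        // Assumes compiled Scripts are safe to share across requests.
        private final ConcurrentMap<URL, Script> cache = new ConcurrentHashMap<>();

        public String render(URL jellyFile, JellyContext context) throws Exception {
            Script script = cache.computeIfAbsent(jellyFile, url -> {
                try {
                    return new JellyContext().compileScript(url);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            StringWriter out = new StringWriter();
            XMLOutput output = XMLOutput.createXMLOutput(out);
            script.run(context, output);   // step 3c only, on every request
            output.flush();
            return out.toString();
        }
    }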

Best regards
Henrik

