Re: [che-dev] Single Host on OpenShift
Thanks Lukas, good summary. Is this summary of findings and decision captured in a GH issue so that it's visible for those outside the mailing list, and can be revisited if needed?
Hi all,
So we have finally concluded all of our tests. In my last update I told you
that we had ruled out Envoy because of its surprisingly slower performance
under high load compared to our other candidate reverse proxies, and because
problems were somewhat harder to debug (due to the distributed nature of its
configuration). Since then we have also ruled out HAProxy, mainly because of
ease-of-use concerns.
That left us to choose between Traefik and Nginx. Before deciding, we felt it
was necessary to confirm our original findings with another round of
performance tests, because we had seen some oddly high response times with
Nginx that made us suspect some environmental influence. Having more data
would also give us more confidence in our findings.
We found this:
1) As the static load increases, Nginx's performance advantage over Traefik
shrinks, and under very high load Nginx starts to show rare but severe erratic
behavior: a couple of requests lasting over 16 minutes, and short bursts with a
high ratio of error responses (HTTP 500 or even corrupt responses).
2) When dynamically reconfiguring the reverse proxies (to simulate adding new 
workspaces), Traefik seems to have a slight edge over Nginx:
  a) Nginx again shows some odd outliers, which make its p99 response time
about a third slower than Traefik's (while the p95 is roughly the same).
  b) Traefik is faster at establishing a new route, but the difference shrinks
as the static load on the servers increases.
3) Nginx seems to be slightly faster at handling websocket traffic.
4) Traefik cannot correctly rewrite paths in Set-Cookie headers, while Nginx
can. After discussing this, we concluded that it is not a blocker for us,
because applications generally need to be aware of whether they are deployed
behind a reverse proxy and need to handle this in a way that Traefik supports
(X-Forwarded-* headers and the like).
Given the overall comparable results of both solutions, we decided on Traefik
because of its more predictable and stable performance and its ease of use.
At the same time, given the similarities in how the two solutions are
configured, we are confident that if we need to change our minds later, once
we have properly integrated the solution into Che as a whole, it would not be
difficult to swap one for the other.
Lukas
On Thursday, July 2, 2020 12:55:43 PM CEST Lukas Krejci wrote:
> To follow up on this,
> 
> we have finally finished our performance tests with Envoy, and while it
> offers a very nice option for dynamic reconfiguration, we have found it to
> perform significantly slower under highly dynamic load (e.g. when we
> simulated adding new workspaces) than the others (Traefik, Nginx, HAProxy).
> 
> We have not yet made a team decision, but IMHO, for the reasons above, we
> are going to look at the other alternatives.
> 
> On Saturday, June 6, 2020 9:47:09 PM CEST Lukas Krejci wrote:
> > We have not! I will definitely look into it.
> > 
> > On Saturday, June 6, 2020 9:22:10 AM CEST Gorkem Ercan wrote:
> > > Have you considered Envoy[1] as an alternative?
> > > Knative Kourier[2] has a similar use case and uses Envoy underneath.
> > > 
> > > [1] https://www.envoyproxy.io/
> > > [2] https://github.com/knative/net-kourier
> > > 
> > > On Tue, Jun 2, 2020 at 8:22 AM Lukas Krejci <lkrejci@xxxxxxxxxx> wrote:
> > > > Hi all,
> > > > 
> > > > I am following up on the topic of enabling single-host on OpenShift.
> > > > 
> > > > We have concluded the performance tests and I would like to present to
> > > > you the results that we have found.
> > > > 
> > > > tl;dr There is no clear winning solution.
> > > > 
> > > > In our testing we concentrated on 3 areas: the performance of HTTP
> > > > traffic routing, the performance of WebSocket communication, and the
> > > > correct handling of cookies under path rewriting.
> > > > 
> > > > We were trying to choose between 3 candidates for the HTTP gateway that
> > > > we identified in the prior POCs:
> > > > 
> > > > * HAProxy
> > > > * Nginx
> > > > * Traefik
> > > > 
> > > > Unfortunately, none of them came out of our testing with flying colors.
> > > > 
> > > > Generally, and somewhat unsurprisingly, HTTP and WebSockets have very
> > > > similar performance profiles in each of the solutions, so I won't
> > > > discuss them separately.
> > > > 
> > > > == HAProxy
> > > > Pros:
> > > > * fast and hardware-efficient even under high load
> > > > Cons:
> > > > * Some issues with live reconfiguration
> > > > ** The slowest to establish a new route within the gateway
> > > > ** Rare routing errors
> > > > 
> > > > == Nginx
> > > > Pros:
> > > > * fast and hardware-efficient under moderate load
> > > > * Stable under live reconfiguration
> > > > Cons:
> > > > * "Flappy" performance under high load - high variance in response times
> > > > * Rare routing errors under high load
> > > > 
> > > > == Traefik
> > > > Pros:
> > > > * Performant
> > > > * Best support for live reconfiguration
> > > > * Support for OAuth and other "modern" features that we could take
> > > > advantage of in the future
> > > > Cons:
> > > > * BLOCKER - incorrect handling of cookies defined on a specific path.
> > > > Such cookie paths are not rewritten along with requests. This is
> > > > essentially a security issue because it would enable auth cookie
> > > > overwriting/stealing.
> > > > * Higher hardware requirements (especially CPU under higher load)
> > > > 
> > > > Our current favorite is Traefik despite the blocking issue of incorrect
> > > > cookie handling. We think it might be worth trying to fix that to get
> > > > the solution that seems to be the most stable of the 3. If fixing
> > > > Traefik proves too difficult, our second choice would probably be
> > > > Nginx, but that would require further testing.
> > > > 
> > > > We will present our findings with all the fancy graphs and discussion on
> > > > the next community call.
> > > > 
> > > > We have now concluded our performance testing, though, and are moving
> > > > forward with the actual implementation (and will soon pick the gateway
> > > > solution).
> > > > 
> > > > We'd appreciate your feedback and advice on any of the pros or cons
> > > > detailed above.
> > > > 
> > > > You can check our progress on this epic at
> > > > https://github.com/eclipse/che/issues/12914.
> > > > 
> > > > Thanks,
> > > > 
> > > > Lukas
> > > > 
> > > > _______________________________________________
> > > > che-dev mailing list
> > > > che-dev@xxxxxxxxxxx
> > > > To unsubscribe from this list, visit
> > > > https://www.eclipse.org/mailman/listinfo/che-dev