Re: [che-dev] Single Host on OpenShift
Hi all,
So we have finally concluded all of our tests. Last time, I informed you that
we had ruled out Envoy because of its somewhat surprisingly slower performance
under high load compared to our other candidate reverse proxies, and because
problems were somewhat harder to debug (due to the distributed nature of its
configuration). Since then, we have also ruled out HAProxy, mainly because of
ease-of-use concerns.
That left us with having to make a choice between Traefik and Nginx. For that
we felt it was necessary to confirm our original findings with another run of
performance tests, because we saw some oddly high response times with Nginx
leading us to think there might have been some environmental influence. Also,
having more data would give us more confidence in our findings.
We found this:
1) As the static load increases, Nginx's performance advantage over Traefik
shrinks, and under a very high load Nginx starts to show rare but severe
erratic behavior (a couple of requests lasting over 16 minutes, and a high
ratio of error responses in short bursts (500s or even corrupt responses)).
2) When dynamically reconfiguring the reverse proxies (to simulate adding new
workspaces), Traefik seems to have a slight edge over Nginx:
a) Nginx again shows some odd outliers, making its p99 response time a third
slower than Traefik's (while p95 is roughly the same).
b) Traefik is faster at establishing a new route, but the difference shrinks
as the static load on the servers increases.
3) Nginx seems to be slightly faster at handling websocket traffic.
4) Traefik cannot correctly handle path rewrites in Set-Cookie headers, while
Nginx can. After discussing this, we concluded that this is not a blocker for
us, because applications generally need to be aware of whether they are
deployed behind a reverse proxy and need to handle this in a way that Traefik
supports (X-Forwarded-For headers and the like).
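To illustrate point 4, here is a minimal sketch of what rewriting the Path
attribute of a Set-Cookie header involves when a gateway exposes a backend
under a path prefix. This is not code from our tests; the function name
rewrite_cookie_path and the /workspace1 prefix are hypothetical and chosen only
for illustration. Nginx offers this kind of rewrite through its
proxy_cookie_path directive; with Traefik, the application has to emit the
correct path itself.

```python
def rewrite_cookie_path(set_cookie: str, public_prefix: str) -> str:
    """Prefix the Path attribute of one Set-Cookie header value.

    public_prefix is the (hypothetical) path under which the gateway
    exposes the backend, e.g. "/workspace1".
    """
    parts = [p.strip() for p in set_cookie.split(";")]
    # The first part is always the cookie's own name=value pair, so it is
    # copied untouched even if the cookie happens to be named "path".
    out = [parts[0]]
    for attr in parts[1:]:
        name, sep, value = attr.partition("=")
        if sep and name.strip().lower() == "path":
            backend_path = value.strip() or "/"
            if not backend_path.startswith("/"):
                backend_path = "/" + backend_path
            # Map the backend-relative path into the public URL space.
            attr = "Path=" + public_prefix.rstrip("/") + backend_path
        out.append(attr)
    return "; ".join(out)


print(rewrite_cookie_path("session=abc123; Path=/; HttpOnly", "/workspace1"))
# prints: session=abc123; Path=/workspace1/; HttpOnly
```

Without such a rewrite, a cookie set with Path=/ by one backend would be sent
to every backend behind the same host, which is the overwriting/stealing
concern raised earlier in this thread.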
Given the overall comparable results of the two solutions, we decided on
Traefik because of its more predictable and stable performance and its ease of
use.
At the same time, given the similarities in how the two solutions are
configured, we feel confident that if we need to change our minds later, once
we have properly integrated the solution into Che as a whole, it will not be
difficult to swap them.
Lukas
On Thursday, July 2, 2020 12:55:43 PM CEST Lukas Krejci wrote:
> To follow up on this,
>
> we have finally finished our performance tests with Envoy and while it
> offers a very nice option for dynamic reconfiguration, we have found it
> performing significantly slower under highly dynamic load (e.g. when we
> simulated adding new workspaces) than the others (traefik, nginx, haproxy).
>
> We have not yet made a team decision but IMHO for the reasons above, we're
> going to be looking at the other alternatives.
>
> On Saturday, June 6, 2020 9:47:09 PM CEST Lukas Krejci wrote:
> > We have not! I will definitely look into it.
> >
> > On Saturday, June 6, 2020 9:22:10 AM CEST Gorkem Ercan wrote:
> > > Have you considered Envoy[1] as an alternative?
> > > Knative Kourier has a similar usage which uses envoy underneath.
> > >
> > > [1] https://www.envoyproxy.io/
> > > [2] https://github.com/knative/net-kourier
> > >
> > > On Tue, Jun 2, 2020 at 8:22 AM Lukas Krejci <lkrejci@xxxxxxxxxx> wrote:
> > > > Hi all,
> > > >
> > > > I am following up on the topic of enabling single-host on OpenShift.
> > > >
> > > > > We have concluded the performance tests and I would like to present
> > > > > to you the results that we have found.
> > > >
> > > > tl;dr There is no clear winning solution.
> > > >
> > > > > In our testing we concentrated on 3 areas. The performance of routing
> > > > > of the HTTP traffic, performance of Websocket communication and
> > > > > correct handling of cookies under path rewriting.
> > > >
> > > > > We were trying to choose between 3 candidates for the HTTP gateway
> > > > > that we identified in the prior POCs:
> > > >
> > > > * HAProxy
> > > > * Nginx
> > > > * Traefik
> > > >
> > > > > Unfortunately, none of them came out of our testing with flying
> > > > > colors.
> > > > >
> > > > > Generally, HTTP and websockets have, somewhat unsurprisingly, a very
> > > > > similar performance profile in each of the solutions, so I won't be
> > > > > discussing them separately.
> > > >
> > > > == HAProxy
> > > > Pros:
> > > > * fast and hardware-efficient even under high load
> > > > Cons:
> > > > * Some issues with live reconfiguration
> > > > ** The slowest to establish a new route within the gateway
> > > > ** Rare routing errors
> > > >
> > > > == Nginx
> > > > Pros:
> > > > * fast and hardware-efficient under moderate load
> > > > * Stable under live reconfiguration
> > > > Cons:
> > > > * "Flappy" performance under high load - high variance in response
> > > > times
> > > > * Rare routing errors under high load
> > > >
> > > > == Traefik
> > > > Pros:
> > > > * Performant
> > > > * Best support for live reconfiguration
> > > > > * Support for OAuth and other "modern" features that we could take
> > > > > advantage of in the future
> > > > > Cons:
> > > > > * BLOCKER - incorrect handling of cookies defined on a specific path.
> > > > > Such cookie paths are not rewritten along with requests. This is
> > > > > essentially a security issue because it would enable auth cookie
> > > > > overwriting/stealing.
> > > > * Higher hardware requirements (especially CPU under higher load)
> > > >
> > > > > Our current favorite is Traefik despite the blocking issue of
> > > > > incorrect cookie handling. We think it might be worth trying to fix
> > > > > that and get a solution that seems to be the most stable of the 3. If
> > > > > fixing Traefik proves too difficult, our second choice would probably
> > > > > be nginx but that would require further testing.
> > > >
> > > > > We will present our findings with all the fancy graphs and discussion
> > > > > on the next community call.
> > > > >
> > > > > We have now concluded our performance testing though and are moving
> > > > > forward with the actual implementation (and will soon pick the
> > > > > gateway solution).
> > > > >
> > > > > We'd appreciate your feedback and advice on any of the above detailed
> > > > > pros or cons.
> > > >
> > > > You can check our progress on this epic at
> > > > https://github.com/eclipse/che/issues/12914.
> > > >
> > > > Thanks,
> > > >
> > > > Lukas
> > > >
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > che-dev mailing list
> > > > che-dev@xxxxxxxxxxx
> > > > To unsubscribe from this list, visit
> > > > https://www.eclipse.org/mailman/listinfo/che-dev