
Re: [che-dev] How to collect and persist all workspace logs?

Is `chectl` ready for this? I can't see any option to start an existing
workspace, only to create and run a workspace from a devfile. And I'm not
sure whether it makes sense to debug a just-created workspace. Or am I
missing something?

On Wed, Feb 19, 2020 at 2:39 PM Mario Loriedo <mario.loriedo@xxxxxxxxx> wrote:
>
> Lukas is right. Devfiles live in a git repo, and therefore a debug true/false flag does not make a lot of sense there.
> For the client part we could start with "chectl workspace:start --debug" first and then, as a second step, have it on the dashboard side (after a UX review and approval).
>
> On Wed, Feb 19, 2020 at 2:17 PM David Festal <dfestal@xxxxxxxxxx> wrote:
>>
>> I tend to agree with Lukas here.
>> Debug mode is an instruction that defines *how the workspace should be started*, and not *what the workspace contains*.
>> If anything, adding this information as an annotation in the devfile would make more sense IMO than adding a dedicated field in the Devfile schema.
>>
>> But I also agree that probably the best solution is to add a parameter to the rest API.
>>
>> > Because an attribute is something that the user can already change in a devfile, we don't need to code anything.
>>
>> I'm not sure we should add a field to the Devfile schema just for the purpose of avoiding or simplifying implementation work.
>> Anyway, there is already an issue dedicated to the Dashboard changes related to the Workspace Logs, afaik.
>> So, on the Dashboard side, I assume that adding a `debug` parameter to the REST API call that creates a workspace would probably not be the hardest part of the changes.
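>>
>> For illustration only (this is a hypothetical sketch, not the actual Che
>> service code, and all the names are made up), the kind of parameter I
>> have in mind, written as a JAX-RS resource:
>>
>> import javax.ws.rs.DefaultValue;
>> import javax.ws.rs.POST;
>> import javax.ws.rs.Path;
>> import javax.ws.rs.PathParam;
>> import javax.ws.rs.QueryParam;
>>
>> @Path("/workspace")
>> public class WorkspaceStartSketch {
>>   @POST
>>   @Path("/{id}/runtime")
>>   public void startById(@PathParam("id") String workspaceId,
>>                         // hypothetical flag: when true, watch and stream
>>                         // the container logs during the startup
>>                         @QueryParam("debug") @DefaultValue("false") boolean debug) {
>>     // ... delegate to the workspace manager, passing the debug option ...
>>   }
>> }
>>
>> The Dashboard would then only need to append `?debug=true` to the start
>> request, instead of editing the devfile.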
>>
>> David.
>>
>> Le mer. 19 févr. 2020 à 13:15, Sergii Kabashniuk <skabashn@xxxxxxxxxx> a écrit :
>>>
>>>
>>>
>>> On Wed, Feb 19, 2020 at 8:54 AM Lukas Krejci <lkrejci@xxxxxxxxxx> wrote:
>>>>
>>>> Is the debug mode really a quality of the workspace that needs to be present in its definition? For me the debug mode is just a toggle that modifies how you start a workspace, but the workspace definition, i.e. the devfile, is still the same.
>>>>
>>>> In some way, it is similar to the target namespace, which is extraneous to the devfile and can be specified on workspace creation. Similarly, the debug mode could be specified as an extra parameter of the REST endpoint on workspace start.
>>>
>>>
>>> How strong is this belief?
>>> Because an attribute is something that the user can already change in a devfile, we don't need to code anything.
>>> Otherwise, the query parameter is something that we need to implement in the dashboard UI, which can take SOME time.
>>>
>>>>
>>>> Lukas
>>>>
>>>> On Tuesday, February 18, 2020, Michal Vala <mvala@xxxxxxxxxx> wrote:
>>>> > Hello everyone,
>>>> > in today's demo (or here [1]) you could see what we have already
>>>> > implemented for collecting workspace startup logs. The feature is not
>>>> > merged yet and will most probably go into 7.10.0.
>>>> >
>>>> > The current approach is to have an attribute `debug` in the devfile. If it
>>>> > is set to `true`, we will watch and print the container logs. This
>>>> > will give us the option to set the debug mode per workspace or even
>>>> > per workspace run.
>>>> > So the current devfile proposal looks like this:
>>>> >
>>>> > apiVersion: 1.0.0
>>>> > metadata:
>>>> >   name: my-workspace
>>>> > attributes:
>>>> >   debug: 'true'
>>>> >
>>>> > In the code review there is a comment from Angel [2] asking whether `debug`
>>>> > is a good name for this feature. He proposes `debugWorkspaceStart`.
>>>> >
>>>> > I'd like to hear others' opinions about this.
>>>> >
>>>> > Thanks!
>>>> >
>>>> > [1] - https://youtu.be/J6nHAFmc1Rg
>>>> > [2] - https://github.com/eclipse/che/pull/15988#pullrequestreview-360339321
>>>> >
>>>> > On Thu, Feb 13, 2020 at 4:25 AM Mario Loriedo <mario.loriedo@xxxxxxxxx> wrote:
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Tue, Feb 11, 2020 at 2:01 AM Michal Vala <mvala@xxxxxxxxxx> wrote:
>>>> >>>
>>>> >>> Hello everyone,
>>>> >>>
>>>> >>> I've created this issue https://github.com/eclipse/che/issues/15983 to reflect the results of this discussion.
>>>> >>> So the current plan is:
>>>> >>>  - watch the logs of all containers of the workspace, plugin-broker and jwt-proxy pods
>>>> >>>  - send the logs to the dashboard startup screen
>>>> >>>  - once the workspace successfully starts or fails, stop watching
>>>> >>>  - I'd like to make it configurable, so it can be completely disabled with some configuration flag, because it might be quite heavy on resources. Basically we would need a thread + connection per container.
>>>> >>>    - To make it safer, we could have a dedicated thread pool for this, so we would guarantee an upper limit on the number of threads/connections at any given time. However, with this there is a higher risk that our get-logs request comes too late and the pod with the logs is already gone at that point. A rough sketch of the idea follows.
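>>>> >>>
>>>> >>> Here is a minimal sketch of that bounded watcher, assuming the fabric8
>>>> >>> kubernetes-client (which che-server already uses, afaik); the pool
>>>> >>> size, namespace, pod and container names are made up:
>>>> >>>
>>>> >>> import io.fabric8.kubernetes.client.DefaultKubernetesClient;
>>>> >>> import io.fabric8.kubernetes.client.KubernetesClient;
>>>> >>> import io.fabric8.kubernetes.client.dsl.LogWatch;
>>>> >>> import java.util.concurrent.ExecutorService;
>>>> >>> import java.util.concurrent.Executors;
>>>> >>>
>>>> >>> public class StartupLogWatcher {
>>>> >>>   // hard upper limit: one thread + one connection per watched container
>>>> >>>   private static final ExecutorService POOL = Executors.newFixedThreadPool(8);
>>>> >>>
>>>> >>>   static void follow(KubernetesClient client, String ns, String pod, String container) {
>>>> >>>     POOL.submit(() -> {
>>>> >>>       // watchLog keeps the connection open and copies the container's
>>>> >>>       // stdout to the given stream until the watch is closed
>>>> >>>       try (LogWatch ignored = client.pods().inNamespace(ns)
>>>> >>>           .withName(pod).inContainer(container).watchLog(System.out)) {
>>>> >>>         Thread.sleep(60_000); // placeholder: stop once startup finished or failed
>>>> >>>       } catch (InterruptedException e) {
>>>> >>>         Thread.currentThread().interrupt();
>>>> >>>       }
>>>> >>>     });
>>>> >>>   }
>>>> >>>
>>>> >>>   public static void main(String[] args) {
>>>> >>>     follow(new DefaultKubernetesClient(), "workspace-ns", "workspace-pod", "theia-ide");
>>>> >>>   }
>>>> >>> }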
>>>> >>>
>>>> >>> I have one UX question: where should we show the logs to the user? We're thinking of having a second tab on the workspace startup screen, or including the logs directly in the current event log. What do you think?
>>>> >>
>>>> >>
>>>> >> It's difficult to say without seeing it in action but I would say that a second tab may be a good idea.
>>>> >>
>>>> >>>
>>>> >>> For this it's important to say that in the case of a successful startup, it's quite likely that no workspace logs will make it to the screen in time. We can grab the logs only once the containers are started, and that is the very last phase of the startup, which is usually very fast. Of course, it's a different story for the plugin-broker and jwt-proxy logs.
>>>> >>>
>>>> >>> For events, we're already watching and logging a lot of them, so we need to review what's missing and fill in the gaps. For now we'd like to focus on logs.
>>>> >>>
>>>> >>> On Wed, Feb 5, 2020 at 10:53 PM Mario Loriedo <mario.loriedo@xxxxxxxxx> wrote:
>>>> >>>>
>>>> >>>>
>>>> >>>> On Wed, Feb 5, 2020 at 5:17 PM Michal Vala <mvala@xxxxxxxxxx> wrote:
>>>> >>>>>
>>>> >>>>> Good, so let's focus on the workspace startup and on providing the stdout logs of the containers of the workspace pod.
>>>> >>>>>
>>>> >>>>> We're proposing to evaluate the following solution:
>>>> >>>>>   - watch the workspace pod's logs (all containers) during startup from che-server
>>>> >>>>>   - as soon as the workspace is fully running or the startup fails, compress and save the logs somewhere on che-server (database?)
>>>> >>>>>     - we will store the logs under a workspace_id key, and only the logs of the last run will be stored. New logs always overwrite whatever is stored under the same workspace_id key.
>>>> >>>>>   - we will create a new simple API to download the logs, something like /logs/<workspace_id>. This could then be called from the dashboard or anywhere else.
>>>> >>>>>
>>>> >>>>> If this solution looks suitable to you, we will start prototyping it to evaluate its real practicability and discover its limits.
>>>> >>>>>
>>>> >>>>> What do you think?
>>>> >>>>
>>>> >>>>
>>>> >>>> - watch not only the workspace pod logs but also kubernetes events + jwt-proxy events + plugin broker logs
>>>> >>>> - persisting the logs and exposing an API to retrieve them compressed is, for me, gold plating (not required): we just need clients like the dashboard and chectl to easily subscribe to and consume the log streams after a workspace start request.
>>>> >>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Wed, Feb 5, 2020 at 3:23 PM Mario Loriedo <mario.loriedo@xxxxxxxxx> wrote:
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> On Wed, Feb 5, 2020 at 1:56 PM Michal Vala <mvala@xxxxxxxxxx> wrote:
>>>> >>>>>>>
>>>> >>>>>>> Hi Mario and everyone,
>>>> >>>>>>>
>>>> >>>>>>> when we were analysing and designing this, we assumed these requirements:
>>>> >>>>>>>   - the consumer is the user of the workspace
>>>> >>>>>>>   - collect all workspace containers' output logs and the file logs inside
>>>> >>>>>>> the containers
>>>> >>>>>>>   - archive at least 5 runs of each workspace
>>>> >>>>>>>   - make the logs easily accessible to the user (dashboard)
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> Now it looks like we're back to defining what we're trying to achieve.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Yes, that's my interpretation at least:
>>>> >>>>>> - we are asked to help users troubleshoot their workspace errors (startup and runtime)
>>>> >>>>>> - a short-term solution was proposed: allow users to download the logs of their last 5 workspace runs. That's not ideal, of course. Users should not have to look for evidence of errors in hundreds of lines of logs...
>>>> >>>>>> - a long-term solution would be to provide immediate error messages that exhaustively answer these questions:
>>>> >>>>>>     What happened?
>>>> >>>>>>     Why did it happen?
>>>> >>>>>>     Is it a known issue?
>>>> >>>>>>     Is there a workaround?
>>>> >>>>>>     Where can I ask for help?
>>>> >>>>>>
>>>> >>>>>> The problem is that, at this stage, we have figured out that the short-term solution is not as simple as expected.
>>>> >>>>>> So looking back at the original needs and thinking about alternative solutions (easier to implement but providing equivalent value) looks like a good idea now.
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>> So, a few questions:
>>>> >>>>>>>   - is the consumer only the user?
>>>> >>>>>>>   - do we want to provide the logs from the full workspace lifetime or only from startup?
>>>> >>>>>>>   - do we want to provide the logs when the workspace fails to start?
>>>> >>>>>>>   - do we want to provide the logs when the workspace crashes at some point
>>>> >>>>>>> in the running phase?
>>>> >>>>>>>   - do we want to keep a log history of previous runs (last 5)?
>>>> >>>>>>>   - do we want to keep a log history of only previous crashes (last 5)?
>>>> >>>>>>>   - the same applies to file logs inside the containers and the containers' stdout
>>>> >>>>>>>
>>>> >>>>>>> As a starting point that covers the workspace start failure scenario,
>>>> >>>>>>> and that gives the pieces of information that are not obvious for a
>>>> >>>>>>> regular user to get, we can suggest this:
>>>> >>>>>>>
>>>> >>>>>>> - is the consumer only the user? - yes
>>>> >>>>>>> - do we want to provide the logs from the full workspace lifetime or
>>>> >>>>>>> only from startup? - startup
>>>> >>>>>>> - do we want to provide the logs when the workspace fails to start? -
>>>> >>>>>>> it does not depend on whether the start was successful or not.
>>>> >>>>>>> - do we want to provide the logs when the workspace crashes at some point in
>>>> >>>>>>> the running phase? - no
>>>> >>>>>>> - do we want to keep a log history of previous runs (last 5)? - no,
>>>> >>>>>>> only the latest.
>>>> >>>>>>> - do we want to keep a log history of only previous crashes (last 5)? - no
>>>> >>>>>>> - the same applies to file logs inside the containers and the containers'
>>>> >>>>>>> stdout - track only the containers' stdout.
>>>> >>>>>>>
>>>> >>>>>>> That should be enough to diagnose the majority of the failures that we
>>>> >>>>>>> can expect today.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> I agree with your answers, except for the running-phase errors: OOM errors at runtime, for example, are very common and a normal user doesn't have any clue about what's going on.
>>>> >>>>>> Anyway, it makes sense to split the problem in 2: startup and runtime errors. Let's solve the first one first.
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> more comments/questions inlined.
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> On Wed, Feb 5, 2020 at 8:42 AM Mario Loriedo <mario.loriedo@xxxxxxxxx> wrote:
>>>> >>>>>>> >
>>>> >>>>>>> > Hi Michal,
>>>> >>>>>>> >
>>>> >>>>>>> > Thanks for this analysis and sorry for the late reply.
>>>> >>>>>>> >
>>>> >>>>>>> > I don't think that building a log collecting system from scratch is the right approach (it's painful). What about going in the kube-native direction you mentioned in your first mail? Grafana Loki or fluentd are projects that may solve our problem.
>>>> >>>>>>>
>>>> >>>>>>> We were targeting the user, so a cluster-wide logging solution is mostly
>>>> >>>>>>> out of the game. Also, changing the cluster configuration and introducing
>>>> >>>>>>> a new, possibly huge, component is imho a no-go. If a cluster admin wants
>>>> >>>>>>> to do something like this, we shouldn't block them, but that's all.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> I agree: this is out of scope (it doesn't help users troubleshoot their workspace errors)
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> >
>>>> >>>>>>> > Another aspect is that querying the workspace logs is more an admin user story than a user one. Tools like Elasticsearch and Grafana provide a good UX for that; I would NOT build a Che UI component and a new wsmaster API for that. As with monitoring, the log collection should be optional, and an admin could choose to activate it if they want.
>>>> >>>>>>>
>>>> >>>>>>> I was not thinking about building some log analysis tool in the Che UI.
>>>> >>>>>>> The idea was more like a simple download button that would provide
>>>> >>>>>>> you a zip with all the logs of your workspace runs (last 5). That's
>>>> >>>>>>> all.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> ack
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> >
>>>> >>>>>>> > But anyway, although the admin scenario is important, I believe the original problem we were trying to solve was more a user problem. We want to make it easy for a user to troubleshoot a workspace that:
>>>> >>>>>>> >
>>>> >>>>>>> > - fails to start
>>>> >>>>>>> > - is not behaving correctly (e.g. a LS doesn't work as expected)
>>>> >>>>>>> >
>>>> >>>>>>> > We have already made some good progress on troubleshooting (better messages) but there are still some cases where it's hard to figure out what's going on. For those cases, providing the logs to the user would help. But I am not sure that persisting the logs is necessary:
>>>> >>>>>>>
>>>> >>>>>>> how could we provide the logs of a crashed workspace without some level
>>>> >>>>>>> of persistence?
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> By attaching to the kube events and containers' stdout streams as soon as they are created. The result should be something like docker-compose logs, but in the dashboard (maybe a logs tab).
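>>>> >>>>>>
>>>> >>>>>> For the events side, a rough sketch of what I mean, assuming a recent
>>>> >>>>>> fabric8 client (the namespace and the handler are placeholders):
>>>> >>>>>>
>>>> >>>>>> import io.fabric8.kubernetes.api.model.Event;
>>>> >>>>>> import io.fabric8.kubernetes.client.DefaultKubernetesClient;
>>>> >>>>>> import io.fabric8.kubernetes.client.KubernetesClient;
>>>> >>>>>> import io.fabric8.kubernetes.client.Watcher;
>>>> >>>>>> import io.fabric8.kubernetes.client.WatcherException;
>>>> >>>>>>
>>>> >>>>>> public class WorkspaceEventsTail {
>>>> >>>>>>   public static void main(String[] args) {
>>>> >>>>>>     KubernetesClient client = new DefaultKubernetesClient();
>>>> >>>>>>     // forward every event in the workspace namespace to the client
>>>> >>>>>>     // (dashboard / chectl) as soon as it is emitted
>>>> >>>>>>     client.v1().events().inNamespace("workspace-ns").watch(new Watcher<Event>() {
>>>> >>>>>>       @Override
>>>> >>>>>>       public void eventReceived(Action action, Event event) {
>>>> >>>>>>         System.out.println(event.getReason() + ": " + event.getMessage());
>>>> >>>>>>       }
>>>> >>>>>>       @Override
>>>> >>>>>>       public void onClose(WatcherException cause) {
>>>> >>>>>>         // reconnect, or stop once the workspace is running
>>>> >>>>>>       }
>>>> >>>>>>     });
>>>> >>>>>>   }
>>>> >>>>>> }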
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>> >
>>>> >>>>>>> > - when an error happens at workspace start we should provide: wsmaster logs, kubernetes events, container statuses, and logs from the workspace pod and from the plugin broker.
>>>> >>>>>>>
>>>> >>>>>>> Should we really give wsmaster logs to the user?
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> That's a fair point. The wsmaster log may be confusing and usually doesn't have interesting details about the workspace startup. I am ok with not including it.
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>> > - at any time while a workspace is running, a user should be able to see/tail or download all the logs (theia, LS and other plugins) via a specific command within theia
>>>> >>>>>>>
>>>> >>>>>>> all file logs are already accessible via a component's terminal. I think
>>>> >>>>>>> containers' stdout logs are currently out of reach from theia, and I'm
>>>> >>>>>>> not sure we need che-server for that.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Correct. che-server in this case should not be involved. And that's runtime troubleshooting, something we have decided to consider later.
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> >
>>>> >>>>>>> >
>>>> >>>>>>> > On Tue, Feb 4, 2020 at 11:54 AM Michal Vala <mvala@xxxxxxxxxx> wrote:
>>>> >>>>>>> >>
>>>> >>>>>>> >> fix: global collector is without the rice, of course... facepalm,
>>>> >>>>>>> >> clipboard went crazy or what...
>>>> >>>>>>> >>
>>>> >>>>>>> >> On Tue, Feb 4, 2020 at 11:10 AM Michal Vala <mvala@xxxxxxxxxx> wrote:
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > Hello team,
>>>> >>>>>>> >> > we've run into trouble with the implementation of writing the logs of
>>>> >>>>>>> >> > a container's stdout to files. It is quite unfortunate, as we've
>>>> >>>>>>> >> > spent some time analyzing the feature, but that's life.
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > The original idea was to modify the command of the image so it would
>>>> >>>>>>> >> > redirect the output into a file, something like `<command> | tee
>>>> >>>>>>> >> > c1.log`. However, that is very hard, or even impossible, to achieve. My
>>>> >>>>>>> >> > idea was to pass the `args` to the container command. This does not
>>>> >>>>>>> >> > work, and I think it's caused by the arguments being passed in quotes
>>>> >>>>>>> >> > under the hood, so it becomes something like `<command> '| tee c1.log'`,
>>>> >>>>>>> >> > which does not do what we want. To actually update the command, we
>>>> >>>>>>> >> > would need to have the full image pulled and then somehow inspect it to
>>>> >>>>>>> >> > get the original command and update it. This would mean a very deep
>>>> >>>>>>> >> > intervention in the current workspace startup logic, with an uncertain
>>>> >>>>>>> >> > result and high risk. A sketch of the problem follows.
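>>>> >>>>>>> >> >
>>>> >>>>>>> >> > A minimal sketch with fabric8's model builders (image name and
>>>> >>>>>>> >> > paths are made up): kubernetes execs the command directly, without
>>>> >>>>>>> >> > a shell, so a pipe passed via `args` arrives as literal strings,
>>>> >>>>>>> >> > and redirection only works once we wrap the original command in a
>>>> >>>>>>> >> > shell:
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > import io.fabric8.kubernetes.api.model.Container;
>>>> >>>>>>> >> > import io.fabric8.kubernetes.api.model.ContainerBuilder;
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > public class LogTeeSketch {
>>>> >>>>>>> >> >   // broken: the container sees '|', 'tee', '/logs/c1.log' as
>>>> >>>>>>> >> >   // plain arguments, no shell ever interprets the pipe
>>>> >>>>>>> >> >   static Container broken = new ContainerBuilder()
>>>> >>>>>>> >> >       .withName("tool")
>>>> >>>>>>> >> >       .withImage("quay.io/example/tool:latest")
>>>> >>>>>>> >> >       .withArgs("|", "tee", "/logs/c1.log")
>>>> >>>>>>> >> >       .build();
>>>> >>>>>>> >> >
>>>> >>>>>>> >> >   // works, but only if we already know the image's original command
>>>> >>>>>>> >> >   static Container wrapped = new ContainerBuilder()
>>>> >>>>>>> >> >       .withName("tool")
>>>> >>>>>>> >> >       .withImage("quay.io/example/tool:latest")
>>>> >>>>>>> >> >       .withCommand("/bin/sh", "-c")
>>>> >>>>>>> >> >       .withArgs("<original command> 2>&1 | tee /logs/c1.log")
>>>> >>>>>>> >> >       .build();
>>>> >>>>>>> >> > }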
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > So where to go next? We have few ideas:
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > # Namespace log collector component (I have a working prototype of this)
>>>> >>>>>>> >> >   - will run in an extra pod in the namespace of the workspace
>>>> >>>>>>> >> >   - will watch for workspace pods and, when one is running, will
>>>> >>>>>>> >> > start following the logs of all its containers and write them to
>>>> >>>>>>> >> > files
>>>> >>>>>>> >> >   - one instance per namespace
>>>> >>>>>>> >> >   - lifecycle managed by che-server (can scale down when no workspace
>>>> >>>>>>> >> > is running and scale up before the first workspace start)
>>>> >>>>>>> >> > pros:
>>>> >>>>>>> >> >   - should be quite gentle with hw resources (TODO: measure),
>>>> >>>>>>> >> > especially with many workspaces in the same namespace
>>>> >>>>>>> >> >   - outlives the workspace, so we should be able to get all the logs
>>>> >>>>>>> >> >   - logs could be provided to the backend by the same component
>>>> >>>>>>> >> >   - should be possible to manage file logs from inside the containers
>>>> >>>>>>> >> > with this component
>>>> >>>>>>> >> > cons:
>>>> >>>>>>> >> >   - needs an extra PVC for logs, XOR uses the workspace's PVC with the
>>>> >>>>>>> >> > limitation that all workspaces will need to run on one node, and the
>>>> >>>>>>> >> > logic will have to be more complex to reflect the different Che PVC
>>>> >>>>>>> >> > strategies
>>>> >>>>>>> >> >   - for "namespace per workspace" or "only one workspace per user"
>>>> >>>>>>> >> > scenarios, same hw requirements as a sidecar collector
>>>> >>>>>>> >> >
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > # Workspace log collector sidecar
>>>> >>>>>>> >> >   - will run as a sidecar in the workspace pod
>>>> >>>>>>> >> >   - will follow all the container logs of the workspace and write them to a PVC
>>>> >>>>>>> >> > pros:
>>>> >>>>>>> >> >   - no issues with PVC access from multiple pods
>>>> >>>>>>> >> >   - same lifecycle as the workspace, so it's easier to deploy with the
>>>> >>>>>>> >> > current server logic ("just" add another sidecar)
>>>> >>>>>>> >> >   - easiest way to get file logs from inside the containers, as we're in the same pod
>>>> >>>>>>> >> > cons:
>>>> >>>>>>> >> >   - same lifecycle as the workspace, so we're not sure we get all the
>>>> >>>>>>> >> > logs before the collector is killed
>>>> >>>>>>> >> >   - extra hw resources consumed per workspace
>>>> >>>>>>> >> >   - we will need another component to send the logs to the backend, as
>>>> >>>>>>> >> > we can't rely on the workspace pod managing it in time on a workspace
>>>> >>>>>>> >> > crash
>>>> >>>>>>> >> >
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > # Global che-server log collector
>>>> >>>>>>> >> >   - che-server will watch and follow the logs of all workspaces and
>>>> >>>>>>> >> > write them to a PVC/database/?
>>>> >>>>>>> >> > pros:
>>>> >>>>>>> >> >   - no extra hw resources per workspace/namespace
>>>> >>>>>>> >> >   - logs are collected directly to the place where they can be
>>>> >>>>>>> >> > requested, so not much extra coding is needed to make them accessible
>>>> >>>>>>> >> > on the server API
>>>> >>>>>>> >> > cons:
>>>> >>>>>>> >> >   - higher network traffic workspace ⇔ che-server
>>>> >>>>>>> >> >   - keeps connections to all workspaces open all the time
>>>> >>>>>>> >> >   - higher hw requirements on che-server
>>>> >>>>>>> >> >   - hard to impossible to get file logs from inside the containers;
>>>> >>>>>>> >> > we would probably need another component that runs on-exit inside the
>>>> >>>>>>> >> > workspace's namespace
>>>> >>>>>>> >> >
>>>> >>>>>>> >> >
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > An important question here is how hard a requirement it is to get the
>>>> >>>>>>> >> > file logs from inside the containers (e.g. language servers). This
>>>> >>>>>>> >> > can be a deciding factor for which way to go.
>>>> >>>>>>> >> >
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > Thanks!
>>>> >>>>>>> >> > Michal
>>>> >>>>>>> >> >
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > On Thu, Jan 23, 2020 at 5:04 AM Michal Vala <mvala@xxxxxxxxxx> wrote:
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > > Hello team,
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > > we're currently working on improving the diagnosis capabilities[1] of workspaces;
>>>> >>>>>>> >> > > to be more concrete, on how to get all logs from the workspace[2]. We're in the
>>>> >>>>>>> >> > > phase of investigating options and prototyping, and we've come up with several
>>>> >>>>>>> >> > > variants of how to achieve the goal. We would like to know your opinions and new ideas.
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > > Requirements:
>>>> >>>>>>> >> > >   - collect all logs of all containers from the workspace
>>>> >>>>>>> >> > >   - stdout/err as well as file logs inside the container
>>>> >>>>>>> >> > >   - keep history of last 5 runs of the workspace
>>>> >>>>>>> >> > >   - collect logs of crashed workspace
>>>> >>>>>>> >> > >   - make logs easily accessible to the user (rest API + dashboard view)
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > > I've split the effort into two sections:
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >   ### How to collect:
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >     # log everything to files on a mounted PV
>>>> >>>>>>> >> > >       - just mount a PV and log everything there
>>>> >>>>>>> >> > >       - pros
>>>> >>>>>>> >> > >         - not much extra overhead, only write stdout/err to the file
>>>> >>>>>>> >> > > and mount the PV
>>>> >>>>>>> >> > >         - doesn't need extra hw resources (memory/cpu)
>>>> >>>>>>> >> > >       - cons
>>>> >>>>>>> >> > >         - we might need to override the `command` of all containers. They would
>>>> >>>>>>> >> > >           have to run with extra parameters to write stdout/err to the file.
>>>> >>>>>>> >> > >           Something like `<command> 2>&1 | tee ws.log`
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >     # workspace collector sidecar (kubernetes/client-go app?)
>>>> >>>>>>> >> > >       - pros
>>>> >>>>>>> >> > >         - per workspace
>>>> >>>>>>> >> > >         - dynamic and powerful
>>>> >>>>>>> >> > >       - cons
>>>> >>>>>>> >> > >         - very custom solution and might be hard to manage/maintain
>>>> >>>>>>> >> > >         - unknown performance and hw resources requirements
>>>> >>>>>>> >> > >         - hard when the ws crashes
>>>> >>>>>>> >> > >         - needs more memory per workspace, even if the user does not use it and
>>>> >>>>>>> >> > >           everything works as expected
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >     # watch and collect from master
>>>> >>>>>>> >> > >       - pros
>>>> >>>>>>> >> > >         - easy to grab logs and events
>>>> >>>>>>> >> > >         - easy to access archived logs
>>>> >>>>>>> >> > >       - cons
>>>> >>>>>>> >> > >         - only containers' stdout/err
>>>> >>>>>>> >> > >         - keeps a connection to each ws
>>>> >>>>>>> >> > >         - more network traffic
>>>> >>>>>>> >> > >         - increases the memory footprint of the master
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >     # kubernetes native
>>>> >>>>>>> >> > >       - change the logging backend of kubernetes [3]
>>>> >>>>>>> >> > >       - pros
>>>> >>>>>>> >> > >         - standard k8s way, "googleable"
>>>> >>>>>>> >> > >       - cons
>>>> >>>>>>> >> > >         - depends on kubernetes deployment
>>>> >>>>>>> >> > >         - needs extra cluster component/configuration
>>>> >>>>>>> >> > >         - only stdout/err of containers
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >     # push logs directly from containers to logging backend
>>>> >>>>>>> >> > >       - cons
>>>> >>>>>>> >> > >         - customize all components to log to the backend
>>>> >>>>>>> >> > >         - performance and hw resources overhead
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >     # collect on workspace exit
>>>> >>>>>>> >> > >       - mount a PV and log there. When the workspace exits, start a collector pod
>>>> >>>>>>> >> > >           that grabs the logs and "archives" them.
>>>> >>>>>>> >> > >       - pros
>>>> >>>>>>> >> > >         - not much extra overhead
>>>> >>>>>>> >> > >       - cons
>>>> >>>>>>> >> > >         - no access to the logs of a running workspace
>>>> >>>>>>> >> > >         - custom collector pod
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >   ### Where to store and how to access:
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >     # Workspace PV
>>>> >>>>>>> >> > >       - pros
>>>> >>>>>>> >> > >         - easy to set quota per user
>>>> >>>>>>> >> > >       - cons
>>>> >>>>>>> >> > >         - harder to access (need to start some pod in the workspace's namespace)
>>>> >>>>>>> >> > >         - lost when the namespace is deleted
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >     # Che PV
>>>> >>>>>>> >> > >       - pros
>>>> >>>>>>> >> > >         - easier to access
>>>> >>>>>>> >> > >       - cons
>>>> >>>>>>> >> > >         - harder to set quota per user
>>>> >>>>>>> >> > >         - harder to scale and manage
>>>> >>>>>>> >> > >         - possible performance bottleneck
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >     # PostgreSQL
>>>> >>>>>>> >> > >       - pros
>>>> >>>>>>> >> > >         - the easiest to access
>>>> >>>>>>> >> > >       - cons
>>>> >>>>>>> >> > >         - harder to set quota per user
>>>> >>>>>>> >> > >         - harder to scale and manage
>>>> >>>>>>> >> > >         - possible performance bottleneck
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > > There is one remaining and very important question we have not investigated
>>>> >>>>>>> >> > > much. We need to somehow configure all plugins/editors and other components to
>>>> >>>>>>> >> > > tell us where they keep the log files that should be collected. Otherwise, we
>>>> >>>>>>> >> > > would not be able to find the logs in the containers. We would need to
>>>> >>>>>>> >> > > handle that in the
>>>> >>>>>>> >> > > plugin's `meta.yaml` as well as in the devfile.
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > > What's next?
>>>> >>>>>>> >> > >   We would like to investigate and prototype the following solution:
>>>> >>>>>>> >> > >     - collect all ws logs into files and store them in a PV in the workspace
>>>> >>>>>>> >> > >     - watch ws events from the master and, on exit, start a collector pod that
>>>> >>>>>>> >> > >       will collect all the logs and pass them to the backend. The logs backend
>>>> >>>>>>> >> > >       is something to be defined. It might be only a PV dedicated to archiving
>>>> >>>>>>> >> > >       logs, or some new service, or the Che master.
>>>> >>>>>>> >> > >     - prototype a new Che master API to access the logs. If we store them in
>>>> >>>>>>> >> > >       the workspace's PV, start the collector pod on demand to access the logs.
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > > We would very much welcome any opinions or ideas.
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > >
>>>> >>>>>>> >> > > [1] - https://github.com/eclipse/che/issues/15047
>>>> >>>>>>> >> > > [2] - https://github.com/eclipse/che/issues/15134
>>>> >>>>>>> >> > > [3] - https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent
>>>> >>>>>>> >> >
>>>> >>>>>>> >> >
>>>> >>>>>>> >> >
>>>> >>>>>>> >> > --
>>>> >>>>>>> >> > Michal Vala
>>>> >>>>>>> >> > Software Engineer, Eclipse Che
>>>> >>>>>>> >> > Red Hat Czech
>>>> >>>>>>> >>
>>>> >>>>>>> >>
>>>> >>>>>>> >>
>>>> >>>>>>> >> --
>>>> >>>>>>> >> Michal Vala
>>>> >>>>>>> >> Software Engineer, Eclipse Che
>>>> >>>>>>> >> Red Hat Czech
>>>> >>>>>>> >>
>>>> >>>>>>> >
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> --
>>>> >>>>>>> Michal Vala
>>>> >>>>>>> Software Engineer, Eclipse Che
>>>> >>>>>>> Red Hat Czech
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> --
>>>> >>>>> Michal Vala
>>>> >>>>> Software Engineer, Eclipse Che
>>>> >>>>> Red Hat Czech
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> Michal Vala
>>>> >>> Software Engineer, Eclipse Che
>>>> >>> Red Hat Czech
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Michal Vala
>>>> > Software Engineer, Eclipse Che
>>>> > Red Hat Czech
>>>> >
>>>
>>>
>>>
>>> --
>>>
>>> Sergii Kabashniuk
>>>
>>> Principal Software Engineer, DevTools
>>>
>>> Red Hat
>>>
>>> skabashniuk@xxxxxxxxxx
>>>
>>
>>
>>
>> --
>>
>> David Festal
>>
>> Principal Software Engineer, DevTools
>>
>> Red Hat France
>>
>> dfestal@xxxxxxxxxx
>>
>>
>



-- 
Michal Vala
Software Engineer, Eclipse Che
Red Hat Czech


