[config-dev] Starting discussion on empty values

From: David Lloyd <david.lloyd@xxxxxxxxxx>
Date: Fri, 10 Dec 2021 10:21:45 -0600
Delivered-to: config-dev@xxxxxxxxxxx
List-archive: <https://www.eclipse.org/mailman/private/config-dev/>
List-help: <mailto:config-dev-request@eclipse.org?subject=help>
List-subscribe: <https://www.eclipse.org/mailman/listinfo/config-dev>, <mailto:config-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://www.eclipse.org/mailman/options/config-dev>, <mailto:config-dev-request@eclipse.org?subject=unsubscribe>

In the interest of moving things along, I would like to start a discussion on empty values and why SmallRye Config does what it does.

Our empty value handling is based on a few fundamental premises. Note that nothing here asserts that these premises are valid under the new specification (they might or might not be).

We wanted users to be able to "undefine" a value, in some uniform way. By "undefine" I mean, a lower-priority configuration source has defined the value and the user does not want that value; rather they want the value to be unspecified. The means of this undefinition need to be intuitive and need to work for any configuration source.

The way that MP Config iterates configuration sources means that, effectively, a `null` value means "consult the next configuration source". Therefore, if a configuration source is to indicate that something is explicitly absent, it needs to return *something*.
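To make the fall-through concrete, here is a minimal sketch of that iteration. It does not use the real MP Config SPI: each `Map` is a hypothetical stand-in for one configuration source, and `lookup` is an illustrative name, not an actual API.

```java
import java.util.List;
import java.util.Map;

public class SourceLookupSketch {
    /**
     * Hypothetical lookup: each Map stands in for one configuration source,
     * ordered from highest to lowest priority. A missing key (null) means
     * "consult the next source"; any non-null value, even "", ends the search.
     */
    public static String lookup(List<Map<String, String>> sourcesByPriority, String name) {
        for (Map<String, String> source : sourcesByPriority) {
            String value = source.get(name);
            if (value != null) {
                return value;
            }
        }
        return null; // no source defines the property
    }

    public static void main(String[] args) {
        Map<String, String> high = Map.of("greeting", "");
        Map<String, String> low = Map.of("greeting", "hello", "port", "8080");
        List<Map<String, String>> sources = List.of(high, low);

        System.out.println("[" + lookup(sources, "greeting") + "]"); // prints []
        System.out.println("[" + lookup(sources, "port") + "]");     // prints [8080]
        System.out.println("[" + lookup(sources, "missing") + "]");  // prints [null]
    }
}
```

In this sketch the empty string in the higher-priority source is the *something* that masks `hello` from the lower one, while `port` still falls through as usual.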

Because MP Config always passes a non-`null` value for conversion, that *something* needs to have a representation that is understood by all `Converters`. This means that a simple, user-friendly keyword like "undefined" is out - it is likely to be a valid value for some converters. An un-simple, user-hostile keyword like "MICROPROFILE_UNDEFINED_VALUE" is slightly less likely to be a problem in terms of SPI, but adding burden to the user in order to avoid a perceived problem in the implementation is definitely not something we ever want to do in SmallRye or Quarkus, and I'm sure we don't want to do this to a specification either.

So, leave that on the back burner for a moment; the next premise is that we want array/list conversion to produce consistent results in a couple of different cases.

Mainly, we wanted property expansion to have reasonable semantics in the presence of a list for the case where properties are or are not defined.

For example, if the user had a properties file with a list that has a value of `${foo:},${bar:}`, the user's intent is likely to be "I want a list with foo (if it's defined) and bar (if it's defined)". So the results the user wants are "fooVal,barVal", "fooVal", "barVal", or the empty list. The user generally does not want a list of an empty value followed by a non-empty value.

This is in addition to the frankly weird results that you could get with MP Config, before our revisions, when you gave an empty string as the value of a list property. You could for example get an empty list, or a list with two empty strings - but it is impossible to get a list with one empty string! These rules are inconsistent and not thoroughly reasoned.

In order to reconcile these behaviors in a uniform and consistent manner, the solution I settled on was to have three concepts:

* A value that is present (not missing, not empty)

* A value that is missing

* A value that is empty (not missing but having no value)

These three concepts translate slightly differently on the SPI level versus the API level. Within a configuration source:

* Present values are non-empty strings

* Missing values are `null`

* Empty values are empty strings

When these values bubble up through converters, the behavior works like this:

* Present values are handled normally

* Missing values fall through to the empty value and are passed to converters as an empty string

* Empty values are passed as an empty string

This means that outside of configuration sources, the user does not have a different concept for empty versus missing, which is a critical point. As it happens, the user does not need to distinguish these values. (I know, you all might want to argue about this, but it's true: I analyzed many, many use cases and never found a single real and valid use case for distinguishing between an empty and a `null` value, as they could always be solved in a better way.)

The concept of "required" and "optional" changes slightly in this regard. A "required" value must be non-empty, whereas an "optional" value may be empty.

This means that if I want to have a `String` property, and allow that property to be an empty `String`, then I must define it as `Optional<String>` and use `.orElse("")` when I dereference it. This is good or bad depending on your opinion! But importantly, it is *correct* and *consistent*. Opinions are only a minor consideration behind correctness and consistency.
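As a rough illustration of the required/optional distinction, the sketch below collapses missing and empty into one "no value" state. The method names `optionalValue` and `requiredValue` are hypothetical and are not the SmallRye Config API; the `raw` argument stands for whatever the configuration sources produced.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

public class RequiredOptionalSketch {
    /** Missing (null) and empty ("") collapse into the same "no value" state. */
    public static Optional<String> optionalValue(String raw) {
        return (raw == null || raw.isEmpty()) ? Optional.empty() : Optional.of(raw);
    }

    /** A required property must be non-empty, so "no value" is an error. */
    public static String requiredValue(String name, String raw) {
        return optionalValue(raw).orElseThrow(
            () -> new NoSuchElementException("Property " + name + " has no value"));
    }

    public static void main(String[] args) {
        // An optional String that is allowed to be empty is read with orElse("").
        String suffix = optionalValue("").orElse("");
        System.out.println("[" + suffix + "]"); // prints []

        System.out.println(requiredValue("app.name", "demo")); // prints demo

        try {
            requiredValue("app.name", "");
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage()); // prints Property app.name has no value
        }
    }
}
```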

This works very cleanly with our system of default values in Quarkus. Rather than treating default values as a per-injection-site concept, we treat them as a per-property concept. Each property may only have one default value, and that value is the value that is given when no other configuration source defines (or deletes) that value. This is trivially accomplished in Quarkus by using a low-priority configuration source which cannot be enumerated and matches property name patterns - a powerful and simple solution.
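A sketch of what such a defaults source could look like follows. This is an assumption about the general shape, not the Quarkus implementation: the class, the `withDefault` method, and the property names are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class DefaultsSourceSketch {
    // Hypothetical per-property defaults, keyed by a pattern over property names.
    private final Map<Pattern, String> defaults = new LinkedHashMap<>();

    public DefaultsSourceSketch withDefault(String nameRegex, String value) {
        defaults.put(Pattern.compile(nameRegex), value);
        return this;
    }

    /** Returns the default for the first matching pattern, or null if none matches. */
    public String getValue(String name) {
        for (Map.Entry<Pattern, String> entry : defaults.entrySet()) {
            if (entry.getKey().matcher(name).matches()) {
                return entry.getValue();
            }
        }
        return null;
    }

    // Deliberately no method that lists property names: the source cannot be enumerated.

    public static void main(String[] args) {
        DefaultsSourceSketch source = new DefaultsSourceSketch()
            .withDefault("datasource\\.[^.]+\\.port", "5432");

        System.out.println(source.getValue("datasource.orders.port")); // prints 5432
        System.out.println(source.getValue("datasource.orders.host")); // prints null
    }
}
```

Placed at the lowest priority, a source like this is consulted only when no other source returns a non-`null` value, which gives exactly one default per property name.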

Additionally, this solves the aforementioned list expansion problem in a concise way. If a list property contains empty values, then those values are deleted unless the nested converter has a specific representation for empty values (for example `List<Optional<String>>`). This avoids a lot of headaches when you want to (for example) optionally add a second port on to your configuration depending on the presence of an env var.
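The `${foo:},${bar:}` case from earlier can be sketched as follows. The expression syntax here is a deliberately simplified assumption (only `${name:default}`, no nesting or escaping), and `expand` and `toList` are hypothetical helpers rather than SmallRye Config methods. The sketch also covers only the plain case, where empty items are deleted; it does not model nested converters such as `List<Optional<String>>` that keep them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListExpansionSketch {
    // Matches ${name:default}; a simplified, hypothetical expression syntax.
    private static final Pattern EXPR = Pattern.compile("\\$\\{([^:}]+):([^}]*)\\}");

    /** Replaces each ${name:default} with the property's value, or the default if undefined. */
    public static String expand(String raw, Map<String, String> properties) {
        Matcher matcher = EXPR.matcher(raw);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = properties.getOrDefault(matcher.group(1), matcher.group(2));
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    /** Splits on commas and drops empty segments, so undefined items vanish from the list. */
    public static List<String> toList(String expanded) {
        List<String> items = new ArrayList<>();
        for (String segment : expanded.split(",", -1)) {
            if (!segment.isEmpty()) {
                items.add(segment);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        String raw = "${foo:},${bar:}";
        System.out.println(toList(expand(raw, Map.of("foo", "fooVal", "bar", "barVal")))); // prints [fooVal, barVal]
        System.out.println(toList(expand(raw, Map.of("foo", "fooVal"))));                  // prints [fooVal]
        System.out.println(toList(expand(raw, Map.of("bar", "barVal"))));                  // prints [barVal]
        System.out.println(toList(expand(raw, Map.<String, String>of())));                 // prints []
    }
}
```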

This approach is not without drawbacks, though. For example, it doesn't perfectly solve the case where you want some list values to be empty and some to be removed. But it did make the MP Config spec consistent *enough* and correct *enough* that we could use it with Quarkus. We have yet to discover a use case that does not have a clean and simple implementation with this approach, and that's across several hundred Quarkus extensions (including ones from the community).

While I do not suggest this exact behavior for Jakarta Config (I keep saying we should define "what" not "how" first), I do propose that we MUST support the following use cases:

* Allow the user to delete a property that is specified in a lower configuration source

* Provide a means so that properties can expand in such a way that they do not produce empty list or map items

And I think we do need to define the difference between "default at the site of the property, in the event that the property has no value" and "default universally for a given property name" and decide which of these concepts we want to carry forward.

In addition the specification MUST NOT prevent the following behaviors:

* Allow an implementation to enforce a single default value globally within a configuration for a given property

* Allow an implementation to provide a non-enumerable, pattern-matching configuration source which provides these default values

* Allow an implementation to detect and reject configuration keys (within a particular namespace) which are not known to the implementation when the configuration is loaded

* Allow an implementation to detect and reject configurations containing invalid values
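For instance, rejecting unknown keys within a namespace could be as simple as the sketch below. This is only an illustration of the kind of check an implementation might want to perform; `unknownKeys`, the `app` namespace, and the property names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class UnknownKeySketch {
    /** Returns the keys under "namespace." that were not declared, in sorted order. */
    public static List<String> unknownKeys(Map<String, String> config,
                                           String namespace,
                                           Set<String> declared) {
        List<String> unknown = new ArrayList<>();
        for (String key : new TreeSet<>(config.keySet())) {
            if (key.startsWith(namespace + ".") && !declared.contains(key)) {
                unknown.add(key);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of(
            "app.port", "8080",
            "app.prot", "9090",   // typo that should be caught at load time
            "other.thing", "x");  // outside the namespace, so ignored

        System.out.println(unknownKeys(config, "app", Set.of("app.port"))); // prints [app.prot]
    }
}
```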

The common theme with these behaviors is that even if we cannot manage to do so in this specification, we MUST allow implementations to provide a *declarative* model for configuration, so that users can declare exactly what is expected and then take advantage of automated validation when loading (or reloading) a configuration. This matters because any configuration error we catch up front is one that could otherwise have caused incorrect program behavior or, at worst, a security vulnerability.

- DML • he/him
