Using the binary query operator '>' on a string column containing numeric string values yields strange results. We’ve attached sample data and converter configs for reproducing this problem.

Ingest test data
Now we have two schemas containing exactly the same data. Both datasets contain an attribute S holding numeric string values. seq_with_index has the string attribute S indexed, while seq_no_index has no attribute indexes.

Reproduce

String comparison uses lexicographical order when filtering by attribute S using the binary comparison operator '<':
When filtering with the '>' operator, it seems to convert both sides to numeric values first and then compare them in numeric order:
When attribute S is indexed, the '>' operator conforms to lexicographical order:
We’ve tested various binary comparison operators and found that only '>' behaves differently on a non-indexed string attribute.
| Operator | Non indexed | Indexed |
|---|---|---|
| = | lexicographical | lexicographical |
| < | lexicographical | lexicographical |
| <= | lexicographical | lexicographical |
| > | numerical when compared value can be parsed as number | lexicographical |
| >= | lexicographical | lexicographical |

Expected behavior

Operator '>' should always compare string values in lexicographical order.
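
To make the difference between the two orderings concrete, here is a minimal Python sketch. It is not the query engine's code. The function names and the sample values are hypothetical. The second function only emulates the reported behavior, under the assumption that the numeric path is taken when both sides parse as numbers.

```python
def lexicographic_gt(value: str, literal: str) -> bool:
    """Expected behavior: compare as strings, character by character."""
    return value > literal


def numeric_if_parsable_gt(value: str, literal: str) -> bool:
    """Emulation of the reported behavior (an assumption, not the real
    implementation): compare numerically when both sides parse as numbers,
    otherwise fall back to string comparison."""
    try:
        return float(value) > float(literal)
    except ValueError:
        return value > literal


# Hypothetical sample values for attribute S (not taken from the attached data).
values = ["1", "2", "10", "100", "9"]

# Filter equivalent to: S > '2'
lexicographic = [v for v in values if lexicographic_gt(v, "2")]
numeric = [v for v in values if numeric_if_parsable_gt(v, "2")]

print(lexicographic)  # ['9']
print(numeric)        # ['10', '100', '9']
```

With lexicographical order, "10" and "100" sort before "2" because their first character '1' is smaller than '2', so only "9" matches. With numeric conversion all three values match, which is the kind of discrepancy described for the non-indexed '>' case.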