Hi folks,
Attached you will find some sample XMLs that specify both the query and the result side.
> I assume that the $maxcount most relevant results would still be listed as “records” as in an “ungrouped” search additionally, at least optionally?
Hm, I don’t quite understand your comment here. Do you want one of the grouped results to be returned redundantly in the normal results, i.e. a main group that is selected by its hit count?
If not, please explain further, especially what you mean by “the $maxcount most relevant results”.
Anyhow, I have provided the option “_asMainResult” to define the main group.
> attribute values vs. keys & dynamic groups
OK, I see your point. I had thought of doing something similar to your approach myself in order to be more flexible, but deemed it more bloated than I wanted, and hence came up with the proposal I sent.
But since I like the idea of
- being open to dynamic grouping
- being able to ship additional parameters more easily
I will go with your proposal.
Not relevant anymore now, but I’m wondering whether we should let one serialization format dictate the design…
> should not define two structures for very similar things, but rather try to create one structure that support all “grouping/faceting/clustering” use cases
As I said above and mentioned in my initial mail, faceting and grouping/clustering are two fundamentally different things. Faceting is only concerned with counting, and from the facet results themselves one can’t directly infer which facet “contains” which result items (the reverse is possible if the result items contain the faceted values, but then you just do the work again). With “group by”, on the other hand, the results are nested in groups, so we don’t have just one result list but one for each group. Because of this I think we really should have different structures here. And not only that: with Solr you can do faceting *and* group by at the same time, so for this reason alone we need the two different return structures.
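To make the distinction concrete, here is a minimal sketch in Python. It is not tied to Solr or to any search engine API; the item data and the function names `facet` and `group_by` are made up purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical result items; ids, attribute names and values are made up.
items = [
    {"id": "tv1", "type": "LED", "size": 32},
    {"id": "tv2", "type": "LED", "size": 40},
    {"id": "tv3", "type": "Plasma", "size": 40},
]

def facet(items, attribute):
    """Faceting: only count how often each attribute value occurs."""
    return dict(Counter(item[attribute] for item in items))

def group_by(items, attribute):
    """Grouping: nest the result items (here: their ids) under each value."""
    groups = defaultdict(list)
    for item in items:
        groups[item[attribute]].append(item["id"])
    return dict(groups)

facets = facet(items, "type")     # counts only, no link back to the items
groups = group_by(items, "type")  # one result list per group
```

The facet output carries no reference to the individual items, while the grouped output is one result list per group, which is why the two return shapes differ.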
As you can see in the examples, I have extended the faceting to support ranges and also filtering by selected facet values. One could take this even further. The question is: do we want to specify it (filtering) in that much detail as a general convention, or shall we leave this to the implementations of the integrated search technologies?
Thomas Menzel @ brox IT-Solutions GmbH
First, a Happy New Year to everybody (:
Basically, it’s fine with me to extend the groups result structure. Some questions or remarks:
- I assume that the $maxcount most relevant results would still be listed as “records” as in an “ungrouped” search additionally, at least optionally?
- Usually we try not to use attribute values as Map keys because this can lead to problems in some JSON parsers: some assume that there is only a relatively limited number of keys in JSON objects, because keys are more like member names than “hash map keys”, so they store (or even intern()) all used keys, which may lead to memory problems if arbitrary keys are used. So I would prefer to have the attribute values of the groups stored as Values, too (I’ll do a consolidated example below).
- For readability it would be nice if the “groups” structure contained the attribute names, too. This would also allow us to represent “dynamic” groupings later (as a hypothetical extension of your example: LEDs are sub-grouped by size, while Plasmas are sub-grouped by manufacturer, because all results have the same size … or something like this), or to allow multiple sub-groupings for one group value, etc. (the structure would then evolve into some kind of “decision tree” to help the user find the best result). It would be easy to add a “type” attribute to the top-level “groups” map to describe which kind of grouping is contained, if it’s necessary to know this on the search client side.
- I think the current grouping with non-hierarchical groups is still useful in other scenarios, so it would be nice if the “groups” structure could support both use cases. I didn’t want to introduce a separate structure for this; the idea was that we should not define two structures for very similar things, but rather try to create one structure that supports all “grouping/faceting/clustering” use cases, because that is usually easier for clients.
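As a rough illustration of the “keys vs. values” remark, here is a small Python sketch. The keyed layout is only an assumption about how such a structure might look, not the layout from the earlier proposal, and the helper name `to_value_entries` and the counts are made up.

```python
import json

# Assumed "keyed" shape: attribute values ("LED", "Plasma") are used as map keys,
# so the set of keys is arbitrary and grows with the data.
groups_keyed = {"type": {"LED": 42, "Plasma": 17}}

def to_value_entries(keyed):
    """Convert {attribute: {value: count}} into
    {attribute: [{"value": ..., "count": ...}, ...]} so that only a small,
    fixed set of keys ("value", "count") ever appears."""
    return {
        attribute: [
            {"value": value, "count": count} for value, count in counts.items()
        ]
        for attribute, counts in keyed.items()
    }

groups_as_values = to_value_entries(groups_keyed)
print(json.dumps(groups_as_values, indent=2))
```

In the second shape a parser that stores or interns keys only ever sees the attribute name plus “value” and “count”, regardless of how many distinct attribute values exist.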
So, my proposal would be to extend the current structure by adding sub-grouping and the possibility to add results to the groups. Which parts of the structure are actually used would depend on the available features of the integrated search engine (of course, a search engine integration could add specific parameters to the groupby parameter to specify what is returned and what is not). It would also be easier for search engines to add specific properties to the structure without having to break it again.
For example, your example could look like this (XML makes it quite big, it would be much more readable in JSON ;-):
<Map key="groups">
  <Seq key="type"> <!-- key: group attribute name -->
    <Map>
      <Val key="value">LED</Val>
      <Val key="count" type="long">42</Val>
      <Map key="groups">
        <Seq key="size"> <!-- key: group attribute name -->
          <Map>
            <Val key="value">32</Val>
            <Val key="count" type="long">13</Val>
            <Seq key="results">
              …
            </Seq>
          </Map>
          <Map>
            <Val key="value">40</Val>
            <Val key="count" type="long">29</Val>
            <Seq key="results">
              …
            </Seq>
          </Map>
        </Seq>
      </Map>
    </Map>
    <Map>
      <Val key="value">Plasma</Val>
      <Val key="count" type="long">17</Val>
      <Map key="groups">
        <Seq key="size"> <!-- key: group attribute name -->
          <Map>
            <Val key="value">32</Val>
            <Val key="count" type="long">5</Val>
            <Seq key="results">
              …
            </Seq>
          </Map>
          <Map>
            <Val key="value">40</Val>
            <Val key="count" type="long">12</Val>
            <Seq key="results">
              …
            </Seq>
          </Map>
        </Seq>
      </Map>
    </Map>
  </Seq>
</Map>
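Since JSON came up as the more readable form, here is a sketch of how the same structure might be built in Python and serialized to JSON. The helper names `group` and `counts_consistent` are made up, the “results” lists are left empty because the records are elided in the example, and the consistency check assumes that sub-groups partition their parent group (which need not hold, e.g. for multi-valued attributes).

```python
import json

def group(value, count, subgroups=None, results=None):
    """Build one group entry with optional sub-groupings and results."""
    entry = {"value": value, "count": count}
    if subgroups is not None:
        entry["groups"] = subgroups
    if results is not None:
        entry["results"] = results
    return entry

# Same values and counts as in the XML example; results are elided there,
# so empty lists are used as placeholders.
groups = {
    "type": [
        group("LED", 42, subgroups={
            "size": [group("32", 13, results=[]), group("40", 29, results=[])],
        }),
        group("Plasma", 17, subgroups={
            "size": [group("32", 5, results=[]), group("40", 12, results=[])],
        }),
    ]
}

def counts_consistent(groups):
    """Check recursively that each group's count equals the sum of the counts
    in each of its sub-groupings (assumes sub-groups partition the parent)."""
    for entries in groups.values():
        for entry in entries:
            sub = entry.get("groups")
            if sub is None:
                continue
            for sub_entries in sub.values():
                if sum(e["count"] for e in sub_entries) != entry["count"]:
                    return False
            if not counts_consistent(sub):
                return False
    return True

as_json = json.dumps({"groups": groups}, indent=2)
print(as_json)
```

Because the attribute name (“type”, “size”) is the key of each sequence, a different sub-grouping per group value would only mean using a different key in that group’s “groups” map.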
Regards,
Juergen