Re: [smila-dev] search record: group by vs. faceting

First, a Happy New Year to everybody (:

 

Basically, it's fine with me to extend the groups result structure. Some questions and remarks:

 

-          I assume that the $maxcount most relevant results would additionally still be listed as “records”, as in an “ungrouped” search, at least optionally?

-          Usually we try not to use attribute values as Map keys, because this can cause problems in some JSON parsers. Some parsers assume that JSON objects have only a relatively limited number of distinct keys, because they treat keys more like member names than “hash map keys”. They therefore store (or even intern()) every key they encounter, which can lead to memory problems if arbitrary keys are used. So I would prefer to have the attribute values of the groups stored as Values, too (I’ll do a consolidated example below).

-          For readability it would be nice if the “groups” structure contained the attribute names, too. This would also make it possible to represent “dynamic” groupings later. As a hypothetical extension of your example: LEDs are sub-grouped by size, while Plasmas are sub-grouped by manufacturer because all of their results have the same size, or something like this. It could also allow multiple sub-groupings for one group value, etc. (the structure would then evolve into a kind of “decision tree” that helps the user find the best result). If the search client needs to know which kind of grouping is contained, a “type” attribute could easily be added to the top-level “groups” map to describe it.

-          I think the current grouping with non-hierarchical groups is still useful in other scenarios, so it would be nice if the “groups” structure could support both use cases. I didn’t want to introduce a separate structure for this; the idea was that we should not define two structures for very similar things, but rather try to create one structure that supports all “grouping/faceting/clustering” use cases, because that is usually easier for clients.
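To illustrate the point about map keys, here is a minimal Python sketch. It is not SMILA code: the dictionary shapes and the helper names `to_value_entries` and `distinct_keys` are only assumptions made for this illustration. It contrasts a structure that uses attribute values as keys with one that stores them as values, and shows that the second keeps the set of distinct keys fixed no matter how many attribute values occur.

```python
# Hypothetical shape A: attribute values ("LED", "Plasma") are used as map keys,
# so the set of keys grows with the indexed data.
groups_keyed = {"type": {"LED": 42, "Plasma": 17}}


def to_value_entries(keyed):
    """Convert {attribute: {value: count}} to {attribute: [{"value": ..., "count": ...}]}."""
    return {
        attribute: [{"value": value, "count": count} for value, count in counts.items()]
        for attribute, counts in keyed.items()
    }


def distinct_keys(node):
    """Collect every map key that occurs anywhere in a nested dict/list structure."""
    keys = set()
    if isinstance(node, dict):
        for key, child in node.items():
            keys.add(key)
            keys |= distinct_keys(child)
    elif isinstance(node, list):
        for child in node:
            keys |= distinct_keys(child)
    return keys


# Hypothetical shape B: attribute values are stored as values. Only the
# attribute name ("type") and the fixed member names "value" and "count"
# appear as keys.
groups_as_values = to_value_entries(groups_keyed)

print(sorted(distinct_keys(groups_keyed)))
print(sorted(distinct_keys(groups_as_values)))
```

With shape B, a parser that stores or interns keys only ever sees the attribute names plus a handful of fixed member names.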

 

So, my proposal would be to extend the current structure by adding sub-grouping and the possibility to add results to the groups. Which parts of the structure are actually used would depend on the available features of the integrated search engine (of course, a search engine integration could add specific parameters to the groupby parameter to specify what is returned or not). It would also be easier for a search engine to add specific properties to the structure without having to break it again.

Your example could then look like this (XML makes it quite big; it would be much more readable in JSON ;-):

 

<Map key="groups">
    <Seq key="type">  <!-- key: group attribute name -->
        <Map>
            <Val key="value">LED</Val>
            <Val key="count" type="long">42</Val>
            <Map key="groups">
                <Seq key="size">  <!-- key: group attribute name -->
                    <Map>
                        <Val key="value">32</Val>
                        <Val key="count" type="long">13</Val>
                        <Seq key="results">
                            …
                        </Seq>
                    </Map>
                    <Map>
                        <Val key="value">40</Val>
                        <Val key="count" type="long">29</Val>
                        <Seq key="results">
                            …
                        </Seq>
                    </Map>
                </Seq>
            </Map>
        </Map>
        <Map>
            <Val key="value">Plasma</Val>
            <Val key="count" type="long">17</Val>
            <Map key="groups">
                <Seq key="size">  <!-- key: group attribute name -->
                    <Map>
                        <Val key="value">32</Val>
                        <Val key="count" type="long">5</Val>
                        <Seq key="results">
                            …
                        </Seq>
                    </Map>
                    <Map>
                        <Val key="value">40</Val>
                        <Val key="count" type="long">12</Val>
                        <Seq key="results">
                            …
                        </Seq>
                    </Map>
                </Seq>
            </Map>
        </Map>
    </Seq>
</Map>
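For comparison, the same structure can be written as nested maps and lists, which is roughly what the JSON form would look like. The following Python sketch is only an illustration under some assumptions: the mapping of Map/Seq/Val to dict/list/scalar is mine, the empty "results" lists are placeholders for the elided result records, and the `walk` helper is hypothetical and not part of any SMILA API. It shows how a client could traverse arbitrarily nested groups with a single recursive function.

```python
import json

# Assumed dict/list rendering of the XML example above.
# The empty "results" lists stand in for the elided result records.
groups = {
    "type": [
        {"value": "LED", "count": 42, "groups": {"size": [
            {"value": "32", "count": 13, "results": []},
            {"value": "40", "count": 29, "results": []},
        ]}},
        {"value": "Plasma", "count": 17, "groups": {"size": [
            {"value": "32", "count": 5, "results": []},
            {"value": "40", "count": 12, "results": []},
        ]}},
    ]
}


def walk(groups, path=()):
    """Yield (path, count) for every group; path is a tuple of (attribute, value) pairs."""
    for attribute, entries in groups.items():
        for entry in entries:
            here = path + ((attribute, entry["value"]),)
            yield here, entry["count"]
            # Flat (non-hierarchical) groups simply have no nested "groups" map.
            yield from walk(entry.get("groups", {}), here)


# Print the JSON form, then one line per group and sub-group.
print(json.dumps(groups, indent=2))
for path, count in walk(groups):
    print(" / ".join(f"{attribute}={value}" for attribute, value in path), count)
```

Because nested "groups" maps are optional in this sketch, the same traversal handles both the current flat grouping and the hierarchical one.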

 

Regards,

Juergen

