Re: [collections-dev] Question about UnifiedMap

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]

Re: [collections-dev] Question about UnifiedMap

From: Mohammad Rezaei <mohdev@xxxxxxxxx>
Date: Fri, 8 Sep 2023 11:47:02 -0400
Delivered-to: collections-dev@xxxxxxxxxxx
Dkim-filter: OpenDKIM Filter v2.11.0 mail.modaiba.com 5AFB751382
List-archive: <https://www.eclipse.org/mailman/private/collections-dev/>
List-help: <mailto:collections-dev-request@eclipse.org?subject=help>
List-subscribe: <https://www.eclipse.org/mailman/listinfo/collections-dev>, <mailto:collections-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://www.eclipse.org/mailman/options/collections-dev>, <mailto:collections-dev-request@eclipse.org?subject=unsubscribe>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

In JCF Hashmap, every entry, regardless of where it is, is wrapped in a Node object. In UnifiedMap, no entry, regardless of where they are is wrapped in a Node object.

For a typical map, we should expect 80% entries are not colliding. Those incur no wrapper object overhead in UnifiedMap (so you're evaluation of #2 as "minor" should be changed to "major"). And for the colliding ones, there is one extra object (the array), instead of one per.

Thanks

Moh

On 9/8/23 11:36, Siddharth Jain via collections-dev wrote:

ok so there are basically 2 differences i am seeing:
1. instead of using a linked list UnifiedMap uses an array to store (and disambiguate) values which get hashed to the same slot in the map. this is the main difference. the array itself will be stored separate from the main table. so again we have a reference in the main table just like JCF HashMap. the only difference is that in case of JCF HashMap the reference is to a Linked List whereas in case of UnifiedMap the reference is to an array. this is the only key difference i am seeing.

2. UnifiedMap is storing the value in the main array itself. this is minor as the value will just be a reference (pointer) to the actual object.
On Thu, Sep 7, 2023 at 4:15 PM Craig P. Motlin <cmotlin@xxxxxxxxx> wrote:
HashMap's data is stored in the field:
transient Node<K,V>[] table;
Each node is the beginning of a linked list that holds all collisions that hash to the same slot in table. Each node holds the key and value, a pointer to the next collision in the linked list, and the cached hash code.
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
These Node objects are the primary waste of memory in HashMap.
UnifiedMap's data is stored in the field:
protected transient Object[] table;
The table size is "doubled." Keys live at the indices 0, 2, 4, etc. and values live at the indices 1, 3, 5, etc. When there are collisions, the key is replaced with a sentinel and the value is replaced with an array, which is also "doubled." There are no Node wrappers, and nothing like the hash field to store cached hash codes.
I hope this helps!
On Thu, Sep 7, 2023 at 12:04 PM Siddharth Jain via collections-dev <collections-dev@xxxxxxxxxxx> wrote:

Hello,

I am trying to understand how does the UnifiedMap (UM) differ from the built-in HashMap that comes with Java. I read these two pages:

https://medium.com/oracledevs/unifiedmap-how-it-works-48af0b80cb37

https://eclipse.dev/collections/javadoc/9.0.0/org/eclipse/collections/impl/map/mutable/UnifiedMap.html

but am still confused and have some questions:

1. Quoting: "The array contains the key value pairs in consecutive slots, just like the main array, but it's a linear list with no hashing." ---> does this mean UM does not use getHashCode? the other link says: "Since UnifiedMap does not cache the hashcode, for each look up, hashcode needs to be computed. So, the performance of UnifiedMap is directly dependent on the hashcode implementation of the key." Btw, I don't think the Java HashMap caches the hashcode either. It has to be computed for every lookup. otherwise we are running into a circular problem. what data structure will be used to cache the hascode?

2. if there is no hashing then how does UM search for a key? are the keys stored in sorted order and does it perform a binary search?

i have more questions but want to start with these to get some clarity. can i get link to the source code of UM?

Thanks,

M.

_______________________________________________
collections-dev mailing list
collections-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/collections-dev
_______________________________________________
collections-dev mailing list
collections-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/collections-dev

References:
- [collections-dev] Question about UnifiedMap
  - From: Siddharth Jain
- Re: [collections-dev] Question about UnifiedMap
  - From: Craig P. Motlin
- Re: [collections-dev] Question about UnifiedMap
  - From: Siddharth Jain

Prev by Date: Re: [collections-dev] Question about UnifiedMap
Next by Date: [collections-dev] GHC 2023 Open Source PullRequest concerns
Previous by thread: Re: [collections-dev] Question about UnifiedMap
Next by thread: [collections-dev] GHC 2023 Open Source PullRequest concerns
Index(es):
- Date
- Thread

Breadcrumbs