How Java HashMaps Work – Internal Mechanics Explained
August 9, 2024 2024-08-11 8:41How Java HashMaps Work – Internal Mechanics Explained
A HashMap
is one of the most commonly used data structures in Java, and it’s known for its efficiency. Data in a HashMap
is stored in the form of key-value pairs.
In this article, I will introduce you to HashMap
s in Java. We will explore the common operations of HashMap
and then delve into how it operates internally. You will gain an understanding of the hash function and how index calculation takes place. Finally, we will look at the time complexities of the operations and touch upon the behavior in a concurrent environment.
What is a HashMap
in Java?
A HashMap
implements the Map
interface, which is part of the Java collection framework. It’s based on the concept of Hashing.
Hashing is a technique that transforms an input of arbitrary size into a fixed-size output using a hash function. The generated output is called the hash code and is represented by an integer value in Java. Hash codes are used for efficient lookup and storage operations in a HashMap
.
Common Operations
Like we discussed above, data in a HashMap
is stored in the form of key-value pairs. The key is a unique identifier, and each key is associated with a value.
Below are some common operations supported by a HashMap
. Let’s understand what these methods do with some simple code examples:
Insertion
-
This method inserts a new key-value pair to the
HashMap
. -
The insertion order of the key-value pairs is not maintained.
-
During insertion, if a key is already present, the existing value will be replaced with the new value that is passed.
-
You can insert only one null key into the
HashMap
, but you can have multiple null values.
The method signature for this operation is given below, followed by an example:
public V put(K key, V value)
Map map = new HashMap<>();
map.put("apple", 1);
map.put("banana", 2);
In the code above, we have a HashMap example where we add a key of type String and value of type Integer.
Retrieval:
The method signature for this operation is given below, followed by an example:
public V get(Object key)
Integer value = map.get("apple");
In the code above, we’re retrieving the value associated with the key apple
.
Other common operations include:
-
remove
: Removes the key-value pair for the specified key. It returnsnull
if the key is not found. -
containsKey
: Checks if a specific key is present in theHashMap
. -
containsValue
: Checks if the specified value is present in theHashMap
.
Internal Structure of a HashMap
Internally, a HashMap
uses an array of buckets or bins. Each bucket is a linked list of type Node
, which is used to represent the key-value pair of the HashMap
.
static class Node<K, V> {
final int hash;
final K key;
V value;
Node next;
Node(int hash, K key, V value, Node next) {
this.hash = hash;
this.key = key;
this.value = value;
this.next = next;
}
}
Above, you can see the structure of the Node
class which is used to store the key-value pairs of the HashMap
.
The Node
class has the following fields:
-
hash
: Refers to thehashCode
of the key. -
key
: Refers to the key of the key-value pair. -
value
: Refers to the value associated with the key. -
next
: Acts as a reference to the next node.
The HashMap
is fundamentally based on a hash table implementation, and its performance depends on two key parameters: initial capacity and load factor. The original javadocs of the Hash table class define these two parameters as follows:
-
The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.
-
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.
Let’s now try and understand how the basic operations, put
and get
, work in a HashMap
.
Hash Function
During the insertion (put
) of a key-value pair, the HashMap
first calculates the hash code of the key. The hash function then computes an integer for the key. Classes can use the hashCode
method of the Object
class or override this method and provide their own implementation. (Read about the hash code contract here). The hash code is then XORed (eXclusive OR) with its upper 16 bits (h >>> 16) to achieve a more uniform distribution.
XOR is a bitwise operation that compares two bits, resulting in 1 if the bits are different and 0 if they are the same. In this context, performing a bitwise XOR operation between the hash code and its upper 16 bits (obtained using the unsigned right shift >>>
operator) helps to mix the bits, leading to a more evenly distributed hash code.
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Above, you can see the static hash method of the HashMap
class.
The idea is to have a unique hash code for each key, but the hash function could produce the same hash code for different keys. This leads to a situation known as a collision. We will see how to handle collisions in the next section.
Index Calculation
Once the hash code for a key is generated, the HashMap
calculates an index within the array of buckets to determine where the key-value pair will be stored. This is done using a bitwise AND operation, which is an efficient way to calculate the modulo when the array length is a power of two.
int index = (n - 1) & hash;
Here, we’re calculating the index where n is the length of the bucket array.
Once the index is calculated, the key is then stored at that index in the bucket array. However, if multiple keys end up having the same index, it causes a collision. In such a scenario, the HashMap
handles it in one of two ways:
-
Chaining/Linking: Each bucket in the array is a linked list of nodes. If a key already exists at a particular index and another key gets hashed to the same index, it gets appended to the list.
-
Treeify: If the number of nodes exceeds a certain threshold, the linked list is converted into a tree (This was introduced in Java 8).
static final int TREEIFY_THRESHOLD = 8;
This is the threshold that determines treeification.
Therefore, it is essential to have a good hash function that uniformly distributes the keys across the buckets and minimizes the chances of collisions.
The retrieval (get
) and deletion (remove
) operations work similarly to the insertion (put
) operation. Here’s how:
-
Retrieval (
get
): Computes the hash code using the hash function -> calculates the index using the hash code -> traverses the linked list or tree to find the node with the matching key. -
Deletion (
remove
): Computes the hash code using the hash function -> calculates the index using the hash code -> removes the node from the linked list or tree.
Time Complexity
The basic operations of a HashMap
, such as put
, get
, and remove
, generally offer constant time performance of O(1), assuming that the keys are uniformly distributed. In cases where there is poor key distribution and many collisions occur, these operations might degrade to a linear time complexity of O(n).
Under treeification, where long chains of collisions are converted into balanced trees, lookup operations can improve to a more efficient logarithmic time complexity of O(log n).
Synchronization
The HashMap
implementation is not synchronized. If multiple threads access a HashMap instance concurrently and iterate over the map, and if any one of the threads performs a structural modification (such as adding or removing a key-value mapping) on the map, it leads to a ConcurrentModificationException
.
To prevent this, you can create a thread-safe instance using the Collections.synchronizedMap
method.
Conclusion
In summary, understanding the internal workings of a HashMap
is crucial for developers to make informed decisions. Knowing how a key is mapped, how collisions happen, and how they can be avoided helps you use the HashMap
efficiently and effectively.