Hash Tables ā Dictionary & HashMap Guide
In this tutorial, you'll learn about Hash Tables. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
A Hash Table (also called hash map or dictionary) is a data structure that maps keys to values using a hash function. It provides O(1) average time for insertions, deletions, and lookups ā making it one of the most powerful data structures.
What You'll Learn
You will learn how hash functions work, collision resolution strategies (chaining and open addressing), load factor and resizing, and how to implement a Hash Table from scratch.
Why It Matters
Hash tables power databases, caches, symbol tables in compilers, and every dictionary/map in modern programming languages. They are the backbone of fast lookups in applications handling millions of records.
Real-World Use
Durga Antivirus Pro uses a Hash Table for its malware signature database. When scanning a file, the hash of each file chunk is computed and looked up in the signature Hash Table. A match in O(1) time identifies a known threat instantly.
How Hash Tables Work
graph TD
K1[Key: 'name'] --> H1[Hash: 4] --> B1[Bucket 4: 'Alice']
K2[Key: 'age'] --> H2[Hash: 1] --> B2[Bucket 1: 30]
K3[Key: 'city'] --> H3[Hash: 4] --> C1[Collision! Chaining]
C1 --> N1[Node: 'Paris']
style B1 fill:#4a90d9,stroke:#333,color:#fff
style B2 fill:#4a90d9,stroke:#333,color:#fff
style C1 fill:#e67e22,stroke:#333,color:#fff
The hash function converts a key into an array index. Ideally, keys are distributed uniformly across buckets. When two keys hash to the same index, a collision occurs.
Hash Table Implementation with Chaining
Chaining stores multiple entries in the same bucket using a Linked List:
class HashTable:
def __init__(self, capacity=10):
self.capacity = capacity
self.table = [[] for _ in range(capacity)]
self.size = 0
def _hash(self, key):
return hash(key) % self.capacity
def put(self, key, value):
index = self._hash(key)
bucket = self.table[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
bucket.append((key, value))
self.size += 1
if self.size > self.capacity * 0.75:
self._resize()
def get(self, key):
index = self._hash(key)
for k, v in self.table[index]:
if k == key:
return v
raise KeyError(key)
def remove(self, key):
index = self._hash(key)
bucket = self.table[index]
for i, (k, v) in enumerate(bucket):
if k == key:
del bucket[i]
self.size -= 1
return
raise KeyError(key)
def _resize(self):
old_table = self.table
self.capacity *= 2
self.table = [[] for _ in range(self.capacity)]
self.size = 0
for bucket in old_table:
for key, value in bucket:
self.put(key, value)
def __len__(self):
return self.size
def __contains__(self, key):
try:
self.get(key)
return True
except KeyError:
return False
ht = HashTable()
ht.put("name", "Alice")
ht.put("age", 30)
ht.put("city", "Paris")
print("name:", ht.get("name"))
print("age:", ht.get("age"))
print("Has 'city':", "city" in ht)
ht.remove("age")
print("Has 'age':", "age" in ht)
print("Size:", len(ht))
Expected output:
name: Alice
age: 30
Has 'city': True
Has 'age': False
Size: 2
class HashTable {
constructor(capacity = 10) {
this.capacity = capacity;
this.table = new Array(capacity).fill(null).map(() => []);
this.size = 0;
}
_hash(key) {
let hash = 0;
for (let i = 0; i < String(key).length; i++) {
hash = (hash * 31 + String(key).charCodeAt(i)) >>> 0;
}
return hash % this.capacity;
}
put(key, value) {
const index = this._hash(key);
const bucket = this.table[index];
for (let i = 0; i < bucket.length; i++) {
if (bucket[i][0] === key) {
bucket[i][1] = value;
return;
}
}
bucket.push([key, value]);
this.size++;
if (this.size > this.capacity * 0.75) this._resize();
}
get(key) {
const index = this._hash(key);
for (const [k, v] of this.table[index]) {
if (k === key) return v;
}
return undefined;
}
remove(key) {
const index = this._hash(key);
const bucket = this.table[index];
for (let i = 0; i < bucket.length; i++) {
if (bucket[i][0] === key) {
bucket.splice(i, 1);
this.size--;
return;
}
}
}
_resize() {
const oldTable = this.table;
this.capacity *= 2;
this.table = new Array(this.capacity).fill(null).map(() => []);
this.size = 0;
for (const bucket of oldTable) {
for (const [key, value] of bucket) {
this.put(key, value);
}
}
}
}
let ht = new HashTable();
ht.put("name", "Alice");
ht.put("age", 30);
ht.put("city", "Paris");
console.log("name:", ht.get("name"));
console.log("Has 'city':", ht.get("city") !== undefined);
ht.remove("age");
console.log("Has 'age':", ht.get("age") !== undefined);
Expected output:
name: Alice
Has 'city': true
Has 'age': false
Collision Resolution Strategies
| Strategy | Description | Pros | Cons |
|---|---|---|---|
| Chaining | Linked List per bucket | Simple, handles high load | Extra memory for pointers |
| Linear Probing | Find next empty slot | Cache-friendly | Primary clustering |
| Quadratic Probing | Jump by i² slots | Reduces clustering | May miss empty slots |
| Double Hashing | Second hash for step | Minimal clustering | More computation |
Load Factor and Resizing
The load factor is size / capacity. When it exceeds a threshold (typically 0.75), the table resizes (doubles capacity) to maintain O(1) performance. Resizing is O(n) but happens infrequently.
Built-in Hash Tables
# Python dict
d = {"name": "Alice", "age": 30}
d["city"] = "Paris"
print(d.get("name"))
print(d.get("missing", "default"))
// JavaScript Map
let map = new Map();
map.set("name", "Alice");
map.set("age", 30);
console.log(map.get("name"));
console.log(map.has("missing"));
Expected output:
Alice
default
Alice
false
Common Pitfalls
1. Using Mutable Keys
Lists, dicts, or objects that can change are not valid hash keys. Modifying them after insertion makes them unretrievable.
2. Poor Hash Function Design
A bad hash function causes collisions, degrading performance to O(n). Use built-in hash implementations when possible.
3. Ignoring Load Factor
If the load factor gets too high, performance collapses. Always resize before the load factor exceeds 0.7ā0.75.
4. Thread Safety
Python dicts and JavaScript Maps are not thread-safe. Concurrent modifications can corrupt internal state.
5. Hash Table Ordering
Standard hash tables do not guarantee insertion order. Python 3.7+ dicts preserve insertion order; JavaScript Maps do too. Older versions do not.
Practice Questions
1. Find the first non-repeating character in a string.
def first_non_repeating(s):
counts = {}
for ch in s:
counts[ch] = counts.get(ch, 0) + 1
for ch in s:
if counts[ch] == 1:
return ch
return None
print(first_non_repeating("swiss"))
Expected output: w
2. Two Sum ā find indices of two numbers that add to a target.
def two_sum(nums, target):
seen = {}
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
return [seen[complement], i]
seen[num] = i
return []
print(two_sum([2, 7, 11, 15], 9))
Expected output: [0, 1]
3. Group anagrams together from a list of strings.
4. Challenge: Implement an LRU cache that supports get and put in O(1) time, using a Hash Table and doubly Linked List.
5. Real-world task: Durga Antivirus Pro's threat detection system tracks file hashes. Write a function that reads a list of file paths, computes SHA-256 hashes, stores them in a Hash Table, and detects duplicates.
What's Next
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro