Direct Access Table
-
Hash Table is improvement over DAT
Direct Access Table DAT? Take huge array and use phone numbers as index in the array. if phone number is not present entry is empty, else the array entry stores pointer to records corresponding to phone number.
Searching in DA: O(1) but HUGE Extra Space required.
What is Hash Table
-
Data structure that associates keys with values. <key, value>
Supports constant time lookups. Average Case: O(1), Worst case: O(n)
Hash Function? Hash function maps a big number or string to a small integer that can be used as index in hash table.
key -> |Hash Function| -> index of array/table
Advantages of Hash Table: Search, Delete, Insert: O(1)=Average. O(n)=During resize
Disadvantages:
1. Elements are not sorted
2. Rehashing: Once all HT entries are filled it needs resized/rehashing which is a time-consuming operation. Let HT size = 100, we want to insert 101st element. Not only the size of hash table is enlarged to 150, all element in hash table have to be rehashed. This insertion operation takes O(n).
Internal Implementation of Hash Table
-
Let use consider simple example to insert following key,value pairs:
{"abc",1}, {"bac",2}, {"cab",3}
Keys("abc", "bac", "cab") are passed to hashfunction and indexes are returned in bucket which acts as hash table
Collision: if same index is returned, chaining is done, ie element is inserted into next position in vector
Hash table
|
|
How hashFunction() works
How insertIntoHashTable() works
|
Hash Collision
-
This means hash function provides same index for 2 different keys. See
example provided above in
Internal Implementation of Hash Table
2 methods to solve Hash Collission problem:
1. Open Addressing
-
All elements are stored in HT itself. Once same hash is derived, insert
element in hash table itself no seperate chains. Types of Open
addressing
1a. Linear/Sequential probing: Once same hash is derived, inserts the new item in the next open spot in the table ie next to already existent element with same hash.
Deletion: is tricky here removing one element might break a chain of insertions, making some elements inaccessible. We need to reinsert all the items into new holes.
1b. Quadratic Probing: look for i2th slot in i’th iteration. New hash function = (xmod7 + i2)%hash_table_size
1c. Double Hashing: use double hash function to re-calculate the hash if collision occurs. In case of collision: hash1(x) = (hash1(x) + i*hash2(x))%hash_table_size
2. Seperate Chaining
-
Each cell of HT point to a DOUBLY LINKED LIST of records that have same
hash function value. This requires additional memory outside the
table.
Insert(x): O(1). Insert at head of LL
Search(x): O(n). Need to search complete list
Delete(x): O(1). Assumed we are at element in chain, we delete x. copy prev to next pointer. In singly LL we need O(n) time to search prev node.
Advantages: Simple implementation, Space will never exhaust, Less sensitive to hash function
Disadvantages: Once LL/chain grows long, performance will degrade search Time=O(n) Space wastage, some parts of hash table may never be used
Rehashing
-
Let us consider unordered_map<int,string> storing unique keys. At
start of program sizeof hash table=3
Key | Value
----------
01 | amit
02 | never
03 | give
Now, (4, up) need to be stored, but hash table has no space so size of
hash table is increased to 6. (old Hash function = %3) we can only goto
index number=2. But we want to reach 5. Hence Hash function is changed (new Hash function = %6). So hash is again calculated for existing values.
Implementations
2-left hashing
-
2 equal sized hash tables having 2 seperate hash functions are used.
Seperate chanining is used in both. Table1(hash1), Table2(hash2).
Cuckoo Hashing is a type of 2-left hashing
Inserting key into table: Always keep Table1 loaded, key is inserted into Table2 only when too many collisions happen on Table1.
Advantages? 1. On parallel systems. 2 cores can query 2 hash tables, ie Thread-1 can query Table1 and Thread2 Table2.
d-left Hashing
-
Split 1 hash table into d blocks. Similar to 2-left hashing, left most
hashtable should be loaded.
d buckets, n entries. Each bucket has n/d entries
Code
-
1. Student Hash table
Hash table stores Student data (Key:Value)
(1,20) (2,70) (42,80) (4,25) (12,44) (13,78) (14,32) (17,11) (37,97)
2. Leetcode Solved. Design a HashMap without using any built-in hash
table libraries.