PSA Hash Collisions in competitive programming

Focusing on the basics is important. Here is one example -

A lot of people were hacked on problem C in yesterday’s codeforces div3 round. The reason was very simple. Hash collisions.

When beginners start competitive programming, they learn that hashmap (namely unordered_map/unordered_set in C++ and dictionaries in Python) have O(1) time complexity, while height-balanced binary search-based (namely map/set) has O(logN) time complexity.

So why use map with O(logN) time complexity, when we can use unordered_map with O(1) time complexity and optimise our code? Pretty simple right?

What beginners don’t focus on is why this is O(1) time complexity. Why does a normal map exist? And what are the tradeoffs of using one over the other?

Just like insurance policies have “TnC apply” written in fine print. Unordered_map also has such fine prints.

On average, they are O(1), but in the worst case, they can be O(n) if hash collisions happen.

In fact, it is possible to construct such test cases where time taken will be O(n). LGM neal has shared how to construct such testcases, and following the same procedure people have also discovered how to create anti hash table tests for Python dictionary.

On personal note, among 5000+ problems I have solved so far, only once I have used unordered_map, I knew very well why I’m using it, and using map did gave TLE on that problem.

The high constant hidden behind O(1) and considering how easy it is to hack unordered_map, it’s not worth using unordered_map over the map trying to decrease time complexity.

one should always use map/set over unordered_map/unordered_set unless they know why they need unordered_map/unordered_set.