author avatar BooleanStack

49. Group Anagrams

Master the art of grouping anagrams using hash maps and canonical representations to solve LeetCode 49 efficiently.

Group Anagrams: Finding Order in Chaos

The Group Anagrams problem is one of those classic interview questions that looks scary at first glance but is actually pretty satisfying once you see the trick. The goal? Take a messy list of words and organize them so that words made of the same letters sit together.

It’s a fantastic problem for learning how to design keys for hash maps and how to normalize data into what we call a “canonical form.”

If you’ve ever played Scrabble or solved word jumbles, you’ve essentially run this algorithm in your head: you look at a pile of letters and realize that “eat”, “tea”, and “ate” are just different arrangements of the same building blocks.

Real-World Analogy

Real-World Analogy: Think of a LEGO set. A LEGO car and a LEGO boat might look completely different, but if you take them apart brick by brick, they might consist of the exact same pile of pieces.

In this problem, that “pile of bricks” is our canonical form. We disassemble every word into its base components (letters), and if two words produce the exact same pile, they belong in the same box.

The Sorting Approach

The most intuitive way to handle this? Sort the letters. If you take any anagram and sort its characters alphabetically, they become identical.

By using this sorted string as a key in a Hash Map (or Dictionary), we can group all the matching original strings together. It’s simple, clean, and easy to explain.

Interactive Visualization: Hash Map Grouping

Explore how words are processed, sorted into keys, and stored in the hash map.

Group Anagrams - HashMap Method

Click 'Start Grouping' to begin

Input Words:

eat
tea
tan
ate
nat
bat

Anagram Groups (HashMap):

Algorithm Explanation

  1. Create a hash map to store groups of anagrams
  2. For each word, sort its characters to create a key (e.g., "eat" → "aet")
  3. Use the sorted string as the hash map key
  4. Add the original word to the array at that key
  5. All anagrams will have the same sorted key and end up in the same group

Time Complexity: O(N × K log K) where N is the number of words and K is the maximum length of a word

Core Concepts

To crack this efficiently, we need to lean on two main ideas:

  1. Canonical Representation : Think of this as a unique ID tag. For anagrams, we need a version of the word that looks exactly the same for every member of the group, but different for anyone else.

    Key Concept: A Canonical Representation transforms complex data into a standard format (like a sorted string) so we can easily compare and group things.

  2. Hash Map (Dictionary): This is where we store our groups. It gives us lightning-fast lookups (O(1)O(1) on average), which is exactly what we need for performance.

The Frequency Count Approach (Optimization)

Sorting is the most readable solution, but it isn’t always the fastest. It costs O(KlogK)O(K \log K) per string (where KK is the string length). If you’re dealing with massive strings, that math starts to hurt.

A faster (though slightly more complex) way uses Character Counts. Since anagrams have the exact same count of every character, we can use the frequency array (e.g., a:1, b:0, c:0, ..., e:1, ...) as the key.

Interactive Visualization: Character Counting

See how we can generate keys without sorting by counting character frequencies.

Group Anagrams - Character Count Method

Click 'Start Grouping' to begin

Input Words:

eat
tea
tan
ate
nat
bat

Anagram Groups:

Character Count Method

  1. Create a frequency array of size 26 (for a-z)
  2. For each word, count the occurrence of each character
  3. Convert the frequency array to a string key (e.g., "1#0#0#1#1#0...#1")
  4. Use this key in the hash map to group anagrams
  5. Words with identical character frequencies are anagrams

Time Complexity: O(N × K) where N is the number of words and K is the maximum length of a word
Advantage: Better than sorting approach (O(N × K log K))

Implementation

Here is the implementation using the Sorting Approach. I’m sticking with this one because it’s the most universal and, honestly, the easiest one to write out on a whiteboard without bugging out. Note that in Python, the dictionary simplifies the logic significantly.

class Solution:
  def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
      # Dictionary to map the sorted string (key) to list of anagrams
      anagram_map = defaultdict(list)
      
      for s in strs:
          # Create the key by sorting the string
          # 'eat' -> ['a', 'e', 't'] -> 'aet'
          key = "".join(sorted(s))
          
          # Append the original string to the correct group
          anagram_map[key].append(s)
          
      # Return just the grouped lists (values of the map)
      return list(anagram_map.values())

Complexity Analysis

You need to know the trade-offs here, especially if your interviewer asks, “Can we make this faster?”

Group Anagrams - Interactive Comparison

Word Collection

eat
tea
tan
ate
nat
bat

Anagram Checker

What are Anagrams?

Anagrams are words or phrases formed by rearranging the letters of another word or phrase, using all the original letters exactly once.

✓ Examples of Anagrams:

  • • "listen" ↔ "silent"
  • • "eat" ↔ "tea" ↔ "ate"
  • • "dormitory" ↔ "dirty room"

✗ Not Anagrams:

  • • "hello" vs "world" (different letters)
  • • "abc" vs "abcd" (different lengths)
  • • "aab" vs "abb" (different frequencies)

Time Complexity

Time Complexity: O(NKlogK)O(N \cdot K \log K)

Space Complexity

Space Complexity: O(NK)O(N \cdot K)

Common Pitfalls & Edge Cases

I’ve seen a lot of candidates trip up on these specific edge cases. Watch out for them:

⚠️ Pitfall (Python): Lists are mutable and cannot be used as dictionary keys. If you use the Character Count approach in Python, you’ll get an error if you try to use a list like [1, 0, 2...] as a key. You must convert it to a tuple (1, 0, 2...) first.

  1. Empty Inputs: Make sure you handle strs = [] or strs = [""]. The logic usually handles [""] naturally (the key is just an empty string), but an empty list input [] might need an early return in some languages.
  2. Array Hashing (Java): In Java, arrays int[] don’t hash by their content. If you try to use the count array as a key, it’ll use the memory address instead. You have to wrap it in Arrays.toString() or a List to get it to behave.
  3. Case Sensitivity: LeetCode 49 specifies lowercase English letters, but the real world is messy (“Tea” vs “eat”). In an interview, always ask if you need to normalize case (e.g., s.toLowerCase()) before generating the key.

Practical Applications

Believe it or not, this isn’t just a toy problem. This pattern—grouping by a “normalized” key—shows up all over the place:

  1. Data Deduplication: Identifying duplicate records that might have fields in different orders (like JSON objects where key order doesn’t matter).
  2. Fuzzy Search: Grouping search terms that are just misspellings or variations of each other.
  3. Log Aggregation: Grouping error logs that are identical except for the specific timestamp or ID.

Conclusion

The “Group Anagrams” problem is really just a lesson in data transformation. By converting messy, varied data into a strict canonical form, we can use Hash Maps to organize chaos instantly. Whether you choose sorting (for readability) or frequency counting (for raw speed), the core principle is the same: find the hidden structure that similar items share.

💡 Pro Tip: In a system design interview, if you’re asked to group billions of documents by similarity, this “Canonical Key + Hash Map” pattern is often the first step in a MapReduce job!

Similar Posts