BooleanStack

217. Contains Duplicate

Master the fundamental array problem of detecting duplicates using Hash Sets, Sorting, and Brute Force approaches.

Contains Duplicate: The Gateway to Hash-Based Algorithms

LeetCode 217, Contains Duplicate, is often one of the first hurdles you clear when moving from “I know syntax” to “I think algorithmically.” It looks deceptively simple—just determine if any number appears twice in a list. But the way you solve it actually teaches you the most important trade-off in computer science: time vs. space.

This problem is a favorite in interviews at places like Apple and Amazon not because it’s hard, but because it tests whether you can pick the right tool for the job.

Real-World Analogy

The Guest List Analogy: Imagine you are working the door at an exclusive event.

  1. Brute Force: You let everyone inside, then walk around the room asking literally every person to introduce themselves to every other person to see if names match. (This is a disaster).
  2. Sorting: You yell, “Everyone line up alphabetically!” Then you walk down the line and just check if the person standing next to you has the same name. (Faster, but organizing people is hard work).
  3. Hash Set: You have a clipboard (a Set). As each person walks up, you check if their name is already on your list. If it is, you stop them. If not, you write it down. (Fastest and most efficient).

Approach 1: Brute Force (Naive)

The most intuitive way to solve this is to just compare everything. Pick the first number, check it against the rest. Pick the second, check it against the rest. Rinse and repeat.

Visualization: Brute Force Comparison

Watch how the number of comparisons explodes even with a tiny array.

Contains Duplicate - Brute Force Approach

Example array: [3, 1, 4, 1, 5, 9] — the duplicate 1s sit at indices 1 and 3, but the nested loops only find them after checking every earlier pair.

Algorithm Explanation

The brute force approach compares every element with every other element:

  1. Use nested loops to compare each pair of elements
  2. Outer loop (i) goes from 0 to n-1
  3. Inner loop (j) goes from i+1 to n-1
  4. Compare array[i] with array[j]
  5. If they match, we found a duplicate - return true
  6. If no matches found after all comparisons, return false

Time Complexity: O(n²) - Very inefficient!

Space Complexity: O(1) - No extra space needed

Note: This is the least efficient approach but requires no extra space.

⚠️ Performance Warning: This approach runs in O(n²) time. That’s fine for 10 items, but if you feed this solution 100,000 elements, your code will time out before it finishes.
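The steps above translate directly into nested loops. Here is a minimal Java sketch (the class name is illustrative):

```java
class BruteForceSolution {
    // Compare every pair (i, j) with i < j: O(n^2) time, O(1) space.
    public boolean containsDuplicate(int[] nums) {
        for (int i = 0; i < nums.length; i++) {
            for (int j = i + 1; j < nums.length; j++) {
                if (nums[i] == nums[j]) {
                    return true; // found a matching pair
                }
            }
        }
        return false; // no pair matched
    }
}
```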

Approach 2: Sorting

Here’s a different angle: if we organize the data first, the problem changes. Once the array is sorted, any duplicate values must be neighbors.

This transforms the problem from a global search to a local check. You don’t need to scan the whole list anymore; just check if nums[i] is the same as its neighbor (nums[i+1]).

Visualization: Sorting Method

Notice how sorting brings the duplicate 2s together, allowing us to spot them in a single pass.

Contains Duplicate - Sorting Approach

Example array: [4, 2, 7, 1, 9, 2, 5] → sorted: [1, 2, 2, 4, 5, 7, 9] — the duplicate 2s become adjacent.

Algorithm Explanation

The sorting approach first sorts the array, then checks adjacent elements:

  1. Sort the array in ascending order
  2. Iterate through the sorted array
  3. Compare each element with its next neighbor
  4. If any two adjacent elements are equal, we found a duplicate
  5. If we finish checking all pairs without finding duplicates, return false

Time Complexity: O(n log n) - due to sorting

Space Complexity: O(1) or O(n) - depending on sort implementation

While this is way better than brute force, sorting isn’t free. It modifies your input array (which might annoy your caller) and costs O(n log n) time.
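One way to sidestep the mutation issue is to sort a copy of the input. A minimal sketch (the class name is illustrative; cloning adds O(n) space on top of the sort):

```java
import java.util.Arrays;

class SortingSolution {
    // Sort a copy, then scan adjacent neighbors: O(n log n) time.
    public boolean containsDuplicate(int[] nums) {
        int[] sorted = nums.clone(); // avoid mutating the caller's array
        Arrays.sort(sorted);
        for (int i = 0; i < sorted.length - 1; i++) {
            if (sorted[i] == sorted[i + 1]) {
                return true; // duplicates are adjacent after sorting
            }
        }
        return false;
    }
}
```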

Approach 3: Hash Set (Optimal)

This is the standard “correct” answer in an interview. We use a Hash Set.

Since Sets are designed to enforce uniqueness, they do the heavy lifting for us. We iterate through the array and try to add each number to the Set. The moment we hit a number that’s already in there? Boom, we found our duplicate.

Visualization: Hash Set Detection

See how the algorithm stops immediately upon finding the second 3? We don’t even need to look at the rest of the list.

Contains Duplicate - Hash Set Approach

Example array: [1, 2, 3, 4, 5, 6, 7, 3, 9] — the set already contains 3 when we reach index 7, so we return true on the spot.

Algorithm Explanation

The hash set approach uses a Set data structure to track numbers we've already seen:

  1. Initialize an empty hash set
  2. Iterate through each element in the array
  3. For each element, check if it exists in the set
  4. If yes, we found a duplicate - return true
  5. If no, add the element to the set and continue
  6. If we finish the loop without finding duplicates, return false

Time Complexity: O(n)

Space Complexity: O(n)

Key Concept: The Hash Set gives us O(1) average time for lookups. This brings our total runtime down to linear O(n).

Implementation

Here is the production-ready code for the Hash Set approach. Notice the early return—we bail out the second a duplicate is found rather than finishing the loop.

import java.util.HashSet;
import java.util.Set;

class Solution {
    public boolean containsDuplicate(int[] nums) {
        // Initialize a HashSet to store unique elements
        Set<Integer> seen = new HashSet<>();
        
        for (int num : nums) {
            // Set.add() returns false if the element is already present
            if (!seen.add(num)) {
                return true;
            }
        }
        
        // If loop finishes, no duplicates were found
        return false;
    }
}
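A quick sanity check of that logic, inlined into a self-contained demo class (the class name and test values are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

class Demo {
    // Same Hash Set logic as the Solution above, inlined for a quick check
    static boolean containsDuplicate(int[] nums) {
        Set<Integer> seen = new HashSet<>();
        for (int num : nums) {
            if (!seen.add(num)) {
                return true; // add() returned false: num was already present
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsDuplicate(new int[]{1, 2, 3, 4, 5, 6, 7, 3, 9})); // true
        System.out.println(containsDuplicate(new int[]{1, 2, 3}));                   // false
    }
}
```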

Complexity Analysis

Understanding the trade-offs here is crucial for system design (and passing the interview).

| Approach | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| Brute Force | O(n²) | O(1) | Way too slow for big datasets. |
| Sorting | O(n log n) | O(1) or O(n) | Changes the input order; space depends on the sort logic. |
| Hash Set | O(n) | O(n) | Fastest, but eats up more RAM. |

Why O(n) Space? In the worst case (where every number is unique), our Hash Set has to memorize every single integer from the input. If the input has 1 million integers, the Set grows to size 1 million.

Practical Applications

This isn’t just LeetCode trivia; duplicate detection is everywhere in real engineering:

  1. Data Validation: Stopping a user from signing up with an email address that’s already in the database.
  2. Caching: Checking if a specific request ID has already been handled so you don’t accidentally charge a credit card twice.
  3. Fraud Detection: Flagging it when the same IP address tries to make ten transactions in one second.

Common Pitfalls & Edge Cases

💡 Pro Tip: In Python, len(nums) != len(set(nums)) is super clean and Pythonic. However, in an interview, writing out the loop shows you understand the mechanism of early termination. It’s actually more efficient because you stop immediately if the second item is a duplicate, rather than building the whole set first.

Conclusion

Contains Duplicate is basically the “Hello World” of Hash-based algorithms. We trade a little bit of memory (O(n) space) for a massive speed boost (O(n) time).

Unless you are working on embedded systems where RAM is incredibly tight, the Hash Set is almost always the way to go. It’s cleaner, faster, and easier to read.
