BooleanStack

217. Contains Duplicate

Master the fundamental array problem of detecting duplicates using Hash Sets, Sorting, and Brute Force approaches.

Contains Duplicate: The Gateway to Hash-Based Algorithms

LeetCode 217, Contains Duplicate, is often one of the first hurdles you clear when moving from “I know syntax” to “I think algorithmically.” It looks deceptively simple—just determine if any number appears twice in a list. But the way you solve it actually teaches you the most important trade-off in computer science: time vs. space.

This problem is a favorite in interviews at places like Apple and Amazon not because it’s hard, but because it tests whether you can pick the right tool for the job.

Real-World Analogy

The Guest List Analogy: Imagine you are working the door at an exclusive event.

  1. Brute Force: You let everyone inside, then walk around the room asking literally every person to introduce themselves to every other person to see if names match. (This is a disaster).
  2. Sorting: You yell, “Everyone line up alphabetically!” Then you walk down the line and just check if the person standing next to you has the same name. (Faster, but organizing people is hard work).
  3. Hash Set: You have a clipboard (a Set). As each person walks up, you check if their name is already on your list. If it is, you stop them. If not, you write it down. (Fastest and most efficient).

Approach 1: Brute Force (Naive)

The most intuitive way to solve this is to just compare everything. Pick the first number, check it against the rest. Pick the second, check it against the rest. Rinse and repeat.

Visualization: Brute Force Comparison

Watch how the number of comparisons explodes even with a tiny array.

Contains Duplicate - Brute Force Approach

Example array: [3, 1, 4, 1, 5, 9] — the duplicate 1s sit at indices 1 and 3, but the nested loops only find them after checking every earlier pair.

Algorithm Explanation

The brute force approach compares every element with every other element:

  1. Use nested loops to compare each pair of elements
  2. Outer loop (i) goes from 0 to n-1
  3. Inner loop (j) goes from i+1 to n-1
  4. Compare array[i] with array[j]
  5. If they match, we found a duplicate - return true
  6. If no matches found after all comparisons, return false

Time Complexity: O(n²) - Very inefficient!

Space Complexity: O(1) - No extra space needed

Note: This is the least efficient approach but requires no extra space.

⚠️ Performance Warning: This approach runs in O(n²) time. That’s fine for 10 items, but if you feed this solution 100,000 elements, your code will time out before it finishes.
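The steps above translate directly into nested loops. Here is a minimal Java sketch (the class name is illustrative):

```java
class BruteForceSolution {
    // Compare every pair (i, j) with i < j: O(n^2) time, O(1) space.
    public boolean containsDuplicate(int[] nums) {
        for (int i = 0; i < nums.length; i++) {
            for (int j = i + 1; j < nums.length; j++) {
                if (nums[i] == nums[j]) {
                    return true; // found a matching pair
                }
            }
        }
        return false; // no pair matched
    }
}
```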

Approach 2: Sorting

Here’s a different angle: if we organize the data first, the problem changes. Once the array is sorted, any duplicate values must be neighbors.

This transforms the problem from a global search to a local check. You don’t need to scan the whole list anymore; just check if nums[i] is the same as its neighbor (nums[i+1]).

Visualization: Sorting Method

Notice how sorting brings the duplicate 2s together, allowing us to spot them in a single pass.

Contains Duplicate - Sorting Approach

Example array: [4, 2, 7, 1, 9, 2, 5] → sorted: [1, 2, 2, 4, 5, 7, 9] — the duplicate 2s become adjacent.

Algorithm Explanation

The sorting approach first sorts the array, then checks adjacent elements:

  1. Sort the array in ascending order
  2. Iterate through the sorted array
  3. Compare each element with its next neighbor
  4. If any two adjacent elements are equal, we found a duplicate
  5. If we finish checking all pairs without finding duplicates, return false

Time Complexity: O(n log n) - due to sorting

Space Complexity: O(1) or O(n) - depending on sort implementation

While this is way better than brute force, sorting isn’t free. It modifies your input array (which might annoy your caller) and costs O(n log n) time.
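One way to sidestep the mutation issue is to sort a copy of the input. A minimal sketch (the class name is illustrative; cloning adds O(n) space on top of the sort):

```java
import java.util.Arrays;

class SortingSolution {
    // Sort a copy, then scan adjacent neighbors: O(n log n) time.
    public boolean containsDuplicate(int[] nums) {
        int[] sorted = nums.clone(); // avoid mutating the caller's array
        Arrays.sort(sorted);
        for (int i = 0; i < sorted.length - 1; i++) {
            if (sorted[i] == sorted[i + 1]) {
                return true; // duplicates are adjacent after sorting
            }
        }
        return false;
    }
}
```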

Approach 3: Hash Set (Optimal)

This is the standard “correct” answer in an interview. We use a Hash Set.

Since Sets are designed to enforce uniqueness, they do the heavy lifting for us. We iterate through the array and try to add each number to the Set. The moment we hit a number that’s already in there? Boom, we found our duplicate.

Visualization: Hash Set Detection

See how the algorithm stops immediately upon finding the second 3? We don’t even need to look at the rest of the list.

Contains Duplicate - Hash Set Approach

Example array: [1, 2, 3, 4, 5, 6, 7, 3, 9] — the set already contains 3 when we reach index 7, so we return true on the spot.

Algorithm Explanation

The hash set approach uses a Set data structure to track numbers we've already seen:

  1. Initialize an empty hash set
  2. Iterate through each element in the array
  3. For each element, check if it exists in the set
  4. If yes, we found a duplicate - return true
  5. If no, add the element to the set and continue
  6. If we finish the loop without finding duplicates, return false

Time Complexity: O(n)

Space Complexity: O(n)

Key Concept: The Hash Set gives us O(1) average time for lookups. This brings our total runtime down to linear O(n).

Implementation

Here is the production-ready code for the Hash Set approach. Notice the early return—we bail out the second a duplicate is found rather than finishing the loop.

import java.util.HashSet;
import java.util.Set;

class Solution {
    public boolean containsDuplicate(int[] nums) {
        // Initialize a HashSet to store unique elements
        Set<Integer> seen = new HashSet<>();
        
        for (int num : nums) {
            // Set.add() returns false if the element is already present
            if (!seen.add(num)) {
                return true;
            }
        }
        
        // If loop finishes, no duplicates were found
        return false;
    }
}
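A quick sanity check of that logic, inlined into a self-contained demo class (the class name and test values are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

class Demo {
    // Same Hash Set logic as the Solution above, inlined for a quick check
    static boolean containsDuplicate(int[] nums) {
        Set<Integer> seen = new HashSet<>();
        for (int num : nums) {
            if (!seen.add(num)) {
                return true; // add() returned false: num was already present
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsDuplicate(new int[]{1, 2, 3, 4, 5, 6, 7, 3, 9})); // true
        System.out.println(containsDuplicate(new int[]{1, 2, 3}));                   // false
    }
}
```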

Complexity Analysis

Understanding the trade-offs here is crucial for system design (and passing the interview).

| Approach | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| Brute Force | O(n²) | O(1) | Way too slow for big datasets. |
| Sorting | O(n log n) | O(1) or O(n) | Changes the input order; space depends on the sort logic. |
| Hash Set | O(n) | O(n) | Fastest, but eats up more RAM. |

Why O(n) Space? In the worst case (where every number is unique), our Hash Set has to memorize every single integer from the input. If the input has 1 million integers, the Set grows to size 1 million.

Practical Applications

This isn’t just LeetCode trivia; duplicate detection is everywhere in real engineering:

  1. Data Validation: Stopping a user from signing up with an email address that’s already in the database.
  2. Caching: Checking if a specific request ID has already been handled so you don’t accidentally charge a credit card twice.
  3. Fraud Detection: Flagging it when the same IP address tries to make ten transactions in one second.

Common Pitfalls & Edge Cases

💡 Pro Tip: In Python, len(nums) != len(set(nums)) is super clean and Pythonic. However, in an interview, writing out the loop shows you understand the mechanism of early termination. It’s actually more efficient because you stop immediately if the second item is a duplicate, rather than building the whole set first.

Conclusion

Contains Duplicate is basically the “Hello World” of Hash-based algorithms. We trade a little bit of memory (O(n) space) for a massive speed boost (O(n) time).

Unless you are working on embedded systems where RAM is incredibly tight, the Hash Set is almost always the way to go. It’s cleaner, faster, and easier to read.
