Master the fundamental array problem of detecting duplicates using Hash Sets, Sorting, and Brute Force approaches.
LeetCode 217, Contains Duplicate, is often one of the first hurdles you clear when moving from "I know syntax" to "I think algorithmically." It looks deceptively simple: just determine if any number appears twice in a list. But the way you solve it actually teaches you the most important trade-off in computer science: time vs. space.
This problem is a favorite in interviews at places like Apple and Amazon not because it's hard, but because it tests whether you can pick the right tool for the job.
The Guest List Analogy: Imagine you are working the door at an exclusive event, checking each new arrival's name against the list of everyone already inside. The three approaches below are just three different ways of performing that check.
The most intuitive way to solve this is to just compare everything. Pick the first number, check it against the rest. Pick the second, check it against the rest. Rinse and repeat.
Even with a tiny array, the number of comparisons explodes: 10 elements already require up to 45 pairwise checks.
The brute force approach compares every element with every other element:
Time Complexity: O(n²) - Very inefficient!
Space Complexity: O(1) - No extra space needed
Note: This is the least efficient approach but requires no extra space.
⚠️ Performance Warning: This approach runs in O(n²) time. That's fine for 10 items, but if you feed this solution 100,000 elements, your code will time out before it finishes.
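As a concrete reference, here is one way the brute-force check can be written in Java (the class name `BruteForceSolution` is just for illustration):

```java
class BruteForceSolution {
    // Compare every pair (i, j) with i < j: O(n^2) time, O(1) space
    public boolean containsDuplicate(int[] nums) {
        for (int i = 0; i < nums.length; i++) {
            for (int j = i + 1; j < nums.length; j++) {
                if (nums[i] == nums[j]) {
                    return true; // found a duplicate pair
                }
            }
        }
        return false; // no pair matched
    }
}
```

Note how the inner loop starts at `i + 1`, so each pair is only compared once.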
Here's a different angle: if we organize the data first, the problem changes. Once the array is sorted, any duplicate values must be neighbors.
This transforms the problem from a global search into a local check. You don't need to scan the whole list anymore; just check whether nums[i] equals its neighbor, nums[i+1].
For example, sorting [2, 7, 1, 2] produces [1, 2, 2, 7]: the duplicate 2s land next to each other, so a single pass spots them.
The sorting approach first sorts the array, then checks adjacent elements:
Time Complexity: O(n log n) - due to sorting
Space Complexity: O(1) or O(n) - depending on sort implementation
While this is way better than brute force, sorting isn't free. It modifies your input array (which might annoy your caller) and costs O(n log n) time.
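Here is a sketch of the sorting approach in Java. The `clone()` call is one way to avoid mutating the caller's array, at the cost of O(n) extra space; drop it if in-place sorting is acceptable (the class name is illustrative):

```java
import java.util.Arrays;

class SortingSolution {
    // Sort first; any duplicates must then sit next to each other
    public boolean containsDuplicate(int[] nums) {
        int[] copy = nums.clone(); // avoid mutating the caller's input
        Arrays.sort(copy);         // O(n log n)
        for (int i = 0; i < copy.length - 1; i++) {
            if (copy[i] == copy[i + 1]) {
                return true; // adjacent equal values = duplicate
            }
        }
        return false;
    }
}
```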
This is the standard "correct" answer in an interview. We use a Hash Set.
Since Sets are designed to enforce uniqueness, they do the heavy lifting for us. We iterate through the array and try to add each number to the Set. The moment we hit a number that's already in there? Boom, we found our duplicate.
With an input like [1, 2, 3, 1], the algorithm stops the moment it meets the second 1; we don't even need to look at the rest of the list.
The hash set approach uses a Set data structure to track numbers we've already seen:
Time Complexity: O(n)
Space Complexity: O(n)
Key Concept: The Hash Set gives us O(1) average time for lookups. This brings our total runtime down to linear O(n).
Here is the production-ready code for the Hash Set approach. Notice the early return: we bail out the second a duplicate is found rather than finishing the loop.
import java.util.HashSet;
import java.util.Set;
class Solution {
public boolean containsDuplicate(int[] nums) {
// Initialize a HashSet to store unique elements
Set<Integer> seen = new HashSet<>();
for (int num : nums) {
// Set.add() returns false if the element is already present
if (!seen.add(num)) {
return true;
}
}
// If loop finishes, no duplicates were found
return false;
}
}

Understanding the trade-offs here is crucial for system design (and passing the interview).
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Way too slow for big datasets. |
| Sorting | O(n log n) | O(1) or O(n) | Changes the input order; space depends on the sort logic. |
| Hash Set | O(n) | O(n) | Fastest, but eats up more RAM. |
Why O(n) Space? In the worst case (where every number is unique), our Hash Set has to memorize every single integer from the input. If the input has 1 million integers, the Set grows to size 1 million.
This isn't just LeetCode trivia; duplicate detection is everywhere in real engineering. A few practical caveats before you pick an approach:

- If the input can contain negative numbers, any trick that indexes an array directly by value (arr[num]) will crash, because arrays can't have negative indexes. Hash Sets handle this automatically.
- The sorting approach only stays at O(1) space if you're allowed to mutate the nums array. If not, you have to copy it first, which increases your memory usage anyway.

💡 Pro Tip: In Python, len(nums) != len(set(nums)) is super clean and Pythonic. However, in an interview, writing out the loop shows you understand the mechanism of early termination. It's actually more efficient because you stop immediately if the second item is a duplicate, rather than building the whole set first.
Contains Duplicate is basically the "Hello World" of Hash-based algorithms. We trade a little bit of memory (O(n) space) for a massive speed boost (O(n) time).
Unless you are working on embedded systems where RAM is incredibly tight, the Hash Set is almost always the way to go. It's cleaner, faster, and easier to read.