Master the O(n) solution for finding the longest consecutive sequence in an unsorted array using Hash Sets.
The Longest Consecutive Sequence problem is one of those interview questions that separates the āI can codeā folks from the āI understand algorithmsā folks. Hereās what you need to do: given an unsorted array of integers, find the length of the longest consecutive elements sequence.
Sounds simple, right? Hereās the catchāyou need to solve it in O(n) time.
That one constraint changes everything. It immediately throws out the obvious solution of sorting the array first, since comparison-based sorting takes at least . Instead, youāll need to think about data structures that give you constant-time lookups. Thatās where things get interesting.
The Shuffled Deck: Picture a deck of cards scattered all over your table. Youāre trying to find the longest straight (like 4, 5, 6, 7). Sure, you could spend time organizing all the cards in order first, but thatās slow. Instead, you can pick up any cardāsay, a 5āand just glance around to see if thereās a 4 or 6 nearby. By checking whatās available instantly, you can build sequences without organizing the whole mess.
Letās start with why sorting feels natural (even though it wonāt work for our time constraint).
If we sort [100, 4, 200, 1, 3, 2], we get [1, 2, 3, 4, 100, 200]. Now we can just walk through the array and count consecutive numbers that differ by 1. Easy enough.
The logicās solid, but hereās the problem: the sorting step kills our performance. We end up with complexity. For huge datasets, that difference between and really adds up.
To hit that O(n) sweet spot, we need to ditch sorting entirely. Enter the Hash Setāour secret weapon for O(1) lookups.
Hereās how it works:
num, check if num - 1 exists.num - 1 exists, then num isnāt the start of a sequence (itās part of one that started earlier). Skip it.num - 1 doesnāt exist, bingoānum is the start of a new sequence.while loop to check for num + 1, num + 2, etc., until the sequence breaks. Track the length.Watch how the algorithm identifies sequence starts and walks through them efficiently.
Click "Start" to begin visualization
0
-
0
-
Time Complexity: O(n) - Each number is visited at most twice
Space Complexity: O(n) - HashSet stores all numbers
Key Insight:
Only start counting from numbers that are the beginning of a sequence (num-1 not in set)
I know what youāre thinkingāthereās a while loop inside a for loop. Shouldnāt that be ?
Hereās the thing: the while loop only runs when we find the start of a sequence. And hereās the key insightāeach number gets processed at most twice: once when we check if itās a start, and once when we count it as part of a sequence. We never process the same sequence multiple times.
The Magic Trick: That if (num - 1) not in set check ensures we only do the expensive counting work exactly once per sequence. No redundant work.
Hereās the code in Java, C++, and Python. Itās production-ready and handles all the edge cases.
import java.util.HashSet;
import java.util.Set;
class Solution {
public int longestConsecutive(int[] nums) {
// Handle edge case: empty array
if (nums == null || nums.length == 0) {
return 0;
}
// Add all numbers to a HashSet for O(1) lookup
Set<Integer> numSet = new HashSet<>();
for (int num : nums) {
numSet.add(num);
}
int longestStreak = 0;
for (int num : numSet) {
// Only attempt to build a sequence if 'num' is the start
// (i.e., num - 1 is not in the set)
if (!numSet.contains(num - 1)) {
int currentNum = num;
int currentStreak = 1;
// Count consecutive numbers
while (numSet.contains(currentNum + 1)) {
currentNum += 1;
currentStreak += 1;
}
longestStreak = Math.max(longestStreak, currentStreak);
}
}
return longestStreak;
}
} Hereās something interestingāyou can also solve this using Union-Find (Disjoint Set Union). Think of each number as a node in a graph. When you find adjacent numbers (num and num+1), you union them. The answer becomes the size of the largest connected component.
See how connecting adjacent numbers creates components, where the biggest component is your answer.
Click "Start" to begin visualization
0
0 / -1
Time Complexity: O(n α(n)) ā O(n) - α is inverse Ackermann function
Space Complexity: O(n) - Parent and rank arrays
How it works:
Union-Find is a neat approach conceptually, but honestly? Itās more complex to implement and has slightly more overhead than the Hash Set solution. Iād stick with Hash Sets for interviews unless youāre specifically asked about Union-Find.
Letās break down why the Hash Set solution is so efficient.
Time Complexity: O(n)
Yeah, thereās a while loop inside a for loop, but donāt let that fool you. The inner loop only runs when num - 1 isnāt in the set. This means each number participates in the counting process at most once.
Space Complexity: O(n)
Weāre storing all unique elements in the Hash Set. Worst case (all elements are unique)? Linear space.
Let me save you from some mistakes Iāve seen (and made):
Forgetting the āStartā Check : If you skip the if (!numSet.contains(num - 1)) check, your algorithm becomes in the worst case. Imagine the input [1, 2, 3, 4, 5]āyouād recount the sequence 2,3,4,5, then 3,4,5, and so on. Not good.
Using a List for Lookups :
if num in list:
Checking if something exists in a List is . Do that inside a loop and youāre back to . Always use a Set for this problem.
Handling Duplicates: The problem wants consecutive integers. So [1, 2, 2, 3] has a longest sequence of 3 (1, 2, 3). The Hash Set automatically handles this by deduplicating the input. Nice.
Integer Overflow (the sneaky one): In C++ or Java, watch out for numbers near their max values. If num is Integer.MAX_VALUE, then num + 1 wraps around (overflow). If that wrapped value happens to be in your set, your loop might behave unexpectedly. Itās rare in typical test cases, but mentioning it in interviews shows you think about edge cases.
The Longest Consecutive Sequence problem is a perfect example of the classic space-time tradeoff. We use extra space (the Hash Set) to unlock time complexity, completely sidestepping the limitations of sorting. The real trick? Recognizing sequence boundaries so we donāt waste time recounting the same sequences.
This patternāusing a Hash Set to enable linear-time traversals of unsorted dataācomes up all the time in algorithm problems. Master it, and youāll have a powerful tool for your next interview.