The question is simple: how do you find the kth largest element in an unsorted array? They love asking it, and you sure as hell should know it. Here I’ll just list the possible solutions:
- Just sort it, easy peasy
O(n*log(n))
. - Min heap,
O(k+(n-k)*log(k))
. - Max heap,
O(n+k*log(n))
. - Some gimmick quick sort,
O(n^2)
worst,O(n)
average.
For min heap:
- build a min heap using k elements.
- for the rest elements:
- if it’s smaller than heap.min ignore it
- otherwise pop the heap and put the new one in.
- in the end heap.min is the kth largest.
For max heap:
- build a max heap using all n elements.
- pop k elements and the last pop is the kth largest.
Here are some questions you should be able to answer with ease:
-
Q: What about an almost sorted array?
A: use insertion sort.
-
Q: What’s the difference of min heap and max heap.
A: use min heap if you have a data stream.
-
Q: How do we guarantee that we always get the average time using quick sort.
A: shuffle the array first.
Now since this problem is so simple, interviewer often wants you to implement the binary heap, it is crucial that you know how to construct the heap in O(n)
time. Since ruby doesn’t have build-in heap, it’s good practice to write one yourself, (watch this):
class MinHeap
def initialize(arr=[])
@arr = [nil]
@arr = @arr.concat(arr)
(arr.size/2).downto(1).each { |i| sink(i) }
end
def insert(x)
@arr << x
swim(size)
end
def pop
x = @arr[1]
@arr[1] = @arr[-1]
@arr.pop
sink(1)
x
end
def size
@arr.size - 1
end
def min
@arr[1]
end
private
def swim(i)
while i > 1 && @arr[i/2] > @arr[i]
swap(i, i/2)
i = i/2
end
end
def sink(i)
while i*2 <= size
j = i*2
j += 1 if j < size && @arr[j+1] < @arr[j]
break if @arr[j] > @arr[i]
swap(i, j)
i = j
end
end
def swap(i, j)
@arr[i], @arr[j] = @arr[j], @arr[i]
end
end
Then to solve the kth largest using MinHeap
:
def kth_largest(arr, k)
h = MinHeap.new(arr[0..(k-1)])
arr[k..].each do |x|
if x > h.min
h.pop
h.insert(x)
end
end
h.min
end
Max heap is basically the same, I’ll leave it to you.
The best answer is the quick sort method, and it doesn’t require additional space. However I can only hope the interviewer doesn’t want you to analyze the time complexity because I myself have trouble understanding it. Anyway, the idea is simple, we simply abandon ‘half’ of the input on every recursion and stop when we find it. The key is the three_way_partition
method, it handles duplicates and it’s easy to implement, for example:
k = 3, nums = [5, 4, 1, 1, 4, 3, 5, 7]
three way partition, v = 5
[5, 4, 1, 1, 4, 3, 5, 7]
| | |
lt i gt
4<5, swap(lt,i) then lt++, i++
[4, 5, 1, 1, 4, 3, 5, 7]
| | |
lt i gt
1<5, swap(lt,i) then lt++, i++
[4, 1, 5, 1, 4, 3, 5, 7]
| | |
lt i gt
1<5, swap(lt,i) then lt++, i++
[4, 1, 1, 5, 4, 3, 5, 7]
| | |
lt i gt
4<5, swap(lt,i) then lt++, i++
[4, 1, 1, 4, 5, 3, 5, 7]
| | |
lt i gt
3<5, swap(lt,i) then lt++, i++
[4, 1, 1, 4, 3, 5, 5, 7]
| | |
lt i gt
5==5, i++
[4, 1, 1, 4, 3, 5, 5, 7]
| /\
lt i gt
7>5, swap(gt,i), then gt--
[4, 1, 1, 4, 3, 5, 5, 7]
| | |
lt gt i
i > gt, break, return [5, 6]
This is the first iteration of three way partition, we found that since k==3, length-k==5, it’s between [5,6] so we found the kth largest item. if k>3, we just repeat above step within [0,4], aka the left side else we search the right side.
def find_kth_largest(nums, k)
nums.shuffle!
helper(nums, 0, nums.length-1, k)
end
def helper(nums, lo, hi, k)
left, right = three_way_partition(nums, lo, hi)
pos = nums.length-k
if pos < left
helper(nums, lo, left-1, k)
elsif pos > right
helper(nums, right+1, hi, k)
else
nums[left]
end
end
def three_way_partition(nums, lo, hi)
i, lt, gt, v = lo+1, lo, hi, nums[lo]
while i <= gt
if nums[i] > v
swap(nums, i, gt)
gt -= 1
elsif nums[i] < v
swap(nums, i, lt)
i += 1
lt += 1
else
i += 1
end
end
[lt, gt]
end
def swap(nums, i, j)
nums[i], nums[j] = nums[j], nums[i]
end
There are some other gimmick questions, like what if we can’t load all the numbers into memory at once. Min heap is a good answer since it only has O(k)
space requirement. Anyway knowing all this should help you regardless.