Length of Shortest sub-array containing ‘k’ distinct elements of another Array | Sliding Window Problem

I came across this variant of the Sliding Window Problem during the coding round of a software engineering interview. I couldn’t find any references to it on the internet, hence this article. 😀

Problem Statement:
Given two arrays of integers (array A and array B) and a number (K), find the length of the shortest sub-array of array A which contains exactly K distinct elements of array B.

The problem statement might be a bit difficult to grasp at a single glance (and the original question used fancy words like stream of network data…), but the below details should make it clearer

Input Format:
The first line contains 3 space separated integers N and M and K, in this particular order. The next line contains N integers i.e. the elements of the array A, and the next line contains M integers denoting the elements of the array B.

Input Constraints:
1 ≤ KMN ≤ 2 x 10⁵
1 ≤ A[i], B[i] ≤ 10⁹

Sample Input:
5 3 2
1 1 1 2 1
1 2 3

Sample Output:
2

Explanation:
In this case K = 2, so the required sub-array of array A should contain 2 distinct elements of array B. The shortest sub-array fulfilling our requirements would be sub array [4…5] i.e. {2, 1}

Solution:
On analyzing the problem statement, it’s evident that this is another instance of the sliding window problem, with the window in this case representing the sub-array of array A.

To read more about sliding window problems, you can refer to this medium article by outco.io

Let’s have a look at the solution and then go through it’s major points

• We’ve created a set and added all elements of arrayB in the set. We’ll use this set to determine if the number being considered is part of arrayB or not
• The map will store numbers from arrayA, along with the indices at which the numbers occur. Every time we encounter a number again, we’ll put the number and the new index into the map. Hence the map will always store the index of the right-most occurrence of the number (that we’ve encountered so far)
• We’ll only insert a number in the map if it is also part of the set. Hence the size of the map will give us the number of distinct elements in the sub-array, which are also part of arrayB
• The variable minLen is initialized with the value Integer.MAX_VALUEand holds our interim result
• The variables start and end represent the starting index and ending index of the window
• With every iteration of the for loop (line 10), we increment (slide) the end of the window, increasing the length of the sub-array by 1
• If the size of the map is less than k, we’ll continue to the next iteration of the loop
• If the size of the map is greater than k, we’ll keep sliding the start of the window to remove elements which do not impact the size of the map. We ensure this by using the following conditions — either the number at the start of the window is not part of the set or it’s not the first occurrence of that number in our window (line 17)
• Then we’ll remove the number at the start of the window and increment the start by 1. This will make the size of the map equal to k (Since at every step since we only slide the end of the window by 1, the size of the map will never exceed k by more than 1)
• Line 20 will repeat the same operation as line 17, sliding the start of the window to the right, without affecting the size of the map
• We’ll update the interim result variable minLen by comparing it with the current length of the window
• At the end of the for loop, we’ll return the result if it holds a valid value. Else we’ll return -1 to indicate that no sub-array could be found which satisfies our requirements

Below is a runnable version of the same method, complete with input and output statements

Part time Tech. Enthusiast! Full time Software Engineer.

More from Sandeep Pandey

Part time Tech. Enthusiast! Full time Software Engineer.