Reservoir Sampling

By | May 26, 2015
Share the joy
  •  
  •  
  •  
  •  
  •  
  •  

I have a post about how to return a value by it’s weight by inputing a stream. http://allenlipeng47.com/PersonalPage/index/view/129/nkey

Based on that, I will develop this to another problem: Reservoir Sampling.

The probelm of Reservoir Sampling is like this. There are elements inputted by stream, which lenght is n. How can we uniformly choose m out of n elements. We assume n is too large to fit in memory.

The solution is similar to the previous post. We define an ResultArray of length m. At beginning, we push [0,…,m-1] elements to ResultArray. Then loop element from [m,…,n-1]. For element i, we generate a random number with range [0,…,i-1]. If random number is less than m, then we put element i to ResultArray[i]. Otherwise, we continue the loop.

package com.pli.project.algorithm.probability;

import java.util.Arrays;

/**
 * Created by lipeng on 2015/5/25.
 */
public class ReservoirSampling {

    /**
     * Suppose arr is a very large array, can't be stored in memory.
     * Randomly select m element among arr.
     * @param arr, the input array. Very large, can't be stored in memory.
     * @param m, the size of reservoir
     * @return, the sampling result.
     */
    public static int[] reservoirSampling(int[] arr, int m){
        if(arr==null || m > arr.length){
            return null;
        }
        int[] result = Arrays.copyOf(arr, m);
        for(int i = m; i < arr.length; i++){
            int randPos = (int)(Math.random() * (i + 1));
            if(randPos < m){
                result[randPos] = arr[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] arr = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] result = reservoirSampling(arr, 3);
        System.out.println(Arrays.toString(result));
    }

}