I have a post about how to return a value by it’s weight by inputing a stream. http://allenlipeng47.com/PersonalPage/index/view/129/nkey
Based on that, I will develop this to another problem: Reservoir Sampling.
The probelm of Reservoir Sampling is like this. There are elements inputted by stream, which lenght is n. How can we uniformly choose m out of n elements. We assume n is too large to fit in memory.
The solution is similar to the previous post. We define an ResultArray of length m. At beginning, we push [0,…,m-1] elements to ResultArray. Then loop element from [m,…,n-1]. For element i, we generate a random number with range [0,…,i-1]. If random number is less than m, then we put element i to ResultArray[i]. Otherwise, we continue the loop.
package com.pli.project.algorithm.probability; import java.util.Arrays; /** * Created by lipeng on 2015/5/25. */ public class ReservoirSampling { /** * Suppose arr is a very large array, can't be stored in memory. * Randomly select m element among arr. * @param arr, the input array. Very large, can't be stored in memory. * @param m, the size of reservoir * @return, the sampling result. */ public static int[] reservoirSampling(int[] arr, int m){ if(arr==null || m > arr.length){ return null; } int[] result = Arrays.copyOf(arr, m); for(int i = m; i < arr.length; i++){ int randPos = (int)(Math.random() * (i + 1)); if(randPos < m){ result[randPos] = arr[i]; } } return result; } public static void main(String[] args) { int[] arr = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; int[] result = reservoirSampling(arr, 3); System.out.println(Arrays.toString(result)); } }