Based on last Reservoir Sampling problem, I will develop it a little bit more. What if each element has a weight. Let’s say we have below elements. The struture is: [id, weight]:
e0=[0, 2], e1=[1, 1], e2=[2, 1], e3=[3, 1], e4=[4, 1].
And we want to sample 2 of them. Let’s compute the possibilite that e0 will be sampled. Out of 5 elements, the possibility that first time we choose e0 is 2/(2+1+1+1+1) = 2/6. And we have chance that the first time we didn’t choose e0, it is 4/6. And the possibility that we choose e0 at 2nd time is (4/6)*(2/5). So the total possibility we choose e0 is 2/6 + (4/6)*(2/5).
Above should be the result of the sampling. But how do we handle this problem with weight? Actually, this is a weighted reservoir sampling problem which is described in this paper http://utopia.duth.gr/~pefraimi/research/data/2007EncOfAlg.pdf.
The idea is that we compute a value ki of element i, by combining weight wi and a random value ui. The ki will be between (0,…,1):
And we read from stream, keep top m ki value of element by using heap.
package com.pli.project.algorithm.probability; import java.util.*; /** * Created by lipeng on 2015/5/25. */ public class WeightedReservoirSampling { public static int[] weightedReservoirSampling(int[][] items, int m){ if(items==null || m > items.length){ System.out.println("incorrect input."); return null; } PriorityQueue<WrsElement> heap = new PriorityQueue<WrsElement>(10); /** Transform weight into a double between (0,1). Put it in min heap. **/ for(int i=0; i < items.length ; i++){ double key = Math.pow((Math.random()), 1/(double)items[i][1]); WrsElement element = new WrsElement(items[i][0], key); heap.add(element); if(heap.size() > m){ heap.poll(); } } /** Output result. **/ int[] result = new int[m]; for(int i=0;i<m; i++){ result[i] = heap.poll().value; } return result; } static class WrsElement implements Comparable<WrsElement>{ int value; double key; public WrsElement(int value, double key){ this.value = value; this.key = key; } public int compareTo(WrsElement wrsElement) { return Double.compare(this.key, wrsElement.key); } @Override public String toString() { return "WrsElement{" + "value=" + value + ", key=" + key + '}'; } } public static void main(String[] args) { /** each item = {id, weight} **/ int[][] items = { {1, 1}, {2, 5}, {3, 20}, {4, 5}, {5, 10}, {6, 3}, {7, 3}, {8, 3} }; int[] result = weightedReservoirSampling(items, 3); System.out.println(Arrays.toString(result)); } public static void test(){ int[][] items = {{0, 2}, {1, 1}, {2, 1}, {3, 1}, {4, 1} }; int m = 2; int[] probabilities = new int[items.length]; for(int i=0; i< 1000; i++) { int[] result = weightedReservoirSampling(items, m); for(int j = 0; j < m; j++){ probabilities[result[j]]++; } } System.out.println(Arrays.toString(probabilities)); } }
Here are some other referene to learn:
http://gregable.com/2007/10/reservoir-sampling.html
http://stackoverflow.com/questions/17117872/is-there-an-algorithm-for-weighted-reservoir-sampling