One method for selecting a sample of m different elements from a file of n records is to repeatedly select a random element until we have m different ones. We show that the number of selections is on average smaller than 2 ln (2) m and that the algorithm has a linear running time if we use a hash table for the elements already selected in the sample.

You do not currently have access to this article.