The Hypergeometric Distribution: Sampling Without Replacement

Imagine drawing a certain number of objects from a finite population where some objects are of interest and the order of drawing doesn’t matter. This scenario, where sampling occurs without replacement, is precisely where the hypergeometric distribution steps in.

What is the Hypergeometric Distribution?

The hypergeometric distribution is a discrete probability distribution used to calculate the probability of obtaining a specific number of successes (desired objects) in a fixed sample size drawn without replacement from a finite population that contains both successes and failures (undesired objects).

Key Features:

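Discrete: The random variable (number of successes in the sample) can only take whole-number values. It can be no larger than the smaller of the sample size and the number of successes in the population, and no smaller than the larger of 0 and the sample size minus the number of failures in the population.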
Sampling without replacement: Once drawn, an object is not returned to the population, impacting the probability of drawing other objects in subsequent draws.

Understanding Key Concepts:

Population (N): The entire set of objects from which the sample is drawn; N denotes its size.
Successes: Objects of interest in the population.
Failures: Objects not of interest in the population.
Sample size (n): The total number of objects drawn from the population.
Number of successes in the population (K): The total number of success objects in the population.
Number of failures in the population (N-K): The total number of failure objects in the population (population size minus successes).
Number of successes in the sample (k): The specific number of success objects you want to calculate the probability of obtaining.

What Does the Hypergeometric Distribution Tell Us?

This distribution allows us to calculate:

The probability of obtaining k successes in a sample of size n drawn without replacement from a population with K successes and N-K failures.
The expected number (mean) of successes in the sample.

Understanding Key Formulas

Probability of k successes in a sample of size n:

P(X = k) = (C(k, K) * C(n - k, N - K)) / C(n, N)

X represents the random variable (number of successes).
k represents the number of successes in the sample you’re calculating the probability for.
C(x, y) represents the binomial coefficient, which calculates the number of ways to choose x objects from a group of y objects.
N, K, and n are defined above.
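
The formula translates almost directly into code. The following is a minimal Python sketch, not a reference implementation; the function name hypergeom_pmf and the sample values N = 50, K = 12, n = 10 are chosen here for illustration. Note that Python's math.comb(a, b) means "choose b from a", which is the reverse of the argument order used for C(x, y) in this article.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): exactly k successes in n draws without replacement."""
    # Outside the feasible range the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # math.comb(a, b) is "choose b from a", i.e. C(b, a) in the article's notation.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Sanity check: the probabilities over all possible k should sum to 1.
total = sum(hypergeom_pmf(k, N=50, K=12, n=10) for k in range(0, 11))
print(total)
```

Summing the probabilities over every possible k is a quick way to catch a swapped argument or an off-by-one error.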

Expected number of successes (mean):

E(X) = (n * K) / N

Interpreting the Formulas:

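The binomial coefficient terms count combinations: C(k, K) is the number of ways to choose the k successes from the K available, C(n - k, N - K) is the number of ways to choose the remaining n - k objects from the N - K failures, and C(n, N) is the total number of possible samples of size n. The probability is the share of all samples that contain exactly k successes.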
The expected number (mean) indicates the average number of successes you can expect to draw in repeated samples of size n from the given population.
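
One way to convince yourself that the mean formula is right is to compute the average directly from the probabilities and compare. The sketch below does this in Python; the function name expected_successes and the values N = 30, K = 8, n = 6 are illustrative assumptions, not taken from the examples in this article.

```python
from math import comb

def expected_successes(N: int, K: int, n: int) -> float:
    """Mean computed from the definition: the sum of k * P(X = k)."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    return sum(
        k * comb(K, k) * comb(N - K, n - k) / comb(N, n)
        for k in range(lo, hi + 1)
    )

N, K, n = 30, 8, 6  # hypothetical values chosen for illustration
mean_from_sum = expected_successes(N, K, n)
mean_from_formula = n * K / N

print(mean_from_sum, mean_from_formula)
```

The two values should agree up to floating-point rounding.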

Examples:

Drawing Colored Balls: A bag contains 5 red balls (successes) and 10 white balls (failures). You draw 3 balls without replacement. What is the probability of getting exactly 2 red balls?

P(X = 2) = (C(2, 5) * C(1, 10)) / C(3, 15) = (10 * 10) / 455 ≈ 0.220
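
To see this result in context, it can help to list the probability of every possible number of red balls. The Python sketch below uses exact fractions so no rounding is involved; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N, K, n = 15, 5, 3  # 15 balls in total, 5 of them red, 3 drawn

# Exact probability of each possible number of red balls in the sample.
distribution = {
    k: Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
    for k in range(0, min(n, K) + 1)
}

for k, p in distribution.items():
    print(f"P(X = {k}) = {p} ≈ {float(p):.4f}")
```

For k = 2 this gives 20/91, which is about 0.2198, and the four probabilities add up to exactly 1.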

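Defective Products: A factory produces 100 items, of which 20 are defective. Because we are counting defective items, they play the role of "successes" here (K = 20, N = 100). If 15 items are randomly chosen without replacement for quality inspection (n = 15), what is the expected number of defective items found?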

E(X) = (15 * 20) / 100 = 3
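
The expected value can also be checked empirically by simulating many inspections. The following Python sketch is one way to do that, under the assumption that each inspection is a uniformly random draw of 15 distinct items; the function name, the number of trials, and the seed are arbitrary choices made for illustration.

```python
import random

def simulate_mean_defective(N=100, K=20, n=15, trials=20_000, seed=0):
    """Average number of defective items over many simulated inspections."""
    rng = random.Random(seed)
    # Label the population: 1 marks a defective item, 0 a good one.
    population = [1] * K + [0] * (N - K)
    total = 0
    for _ in range(trials):
        # rng.sample picks n distinct positions, i.e. draws without replacement.
        total += sum(rng.sample(population, n))
    return total / trials

estimate = simulate_mean_defective()
print(estimate)
```

With this many trials the estimate should land close to 3, though it will not match exactly.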

Limitations of the Hypergeometric Distribution

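It applies only to finite populations sampled without replacement, where each draw changes the probabilities of later draws. If objects are returned to the population after each draw, the draws are independent and the binomial distribution is the appropriate model instead.
Calculating probabilities by hand can become cumbersome for large values of n, K, and N, because the binomial coefficients grow very quickly.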
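
A common workaround for the second point is to work with logarithms of the binomial coefficients, so that no huge intermediate number is ever formed. The Python sketch below uses math.lgamma for this; the function names log_comb and hypergeom_pmf_log are illustrative. Python's own integers do not overflow, so the technique matters more for speed, or in languages with fixed-width numbers.

```python
from math import comb, exp, lgamma

def log_comb(a: int, b: int) -> float:
    """Natural log of "a choose b", via the log-gamma function."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def hypergeom_pmf_log(k: int, N: int, K: int, n: int) -> float:
    """P(X = k), computed in log space to avoid huge intermediate values."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    log_p = log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)
    return exp(log_p)

# Cross-check against the exact formula on a small case.
exact = comb(5, 2) * comb(10, 1) / comb(15, 3)
approx = hypergeom_pmf_log(2, N=15, K=5, n=3)
print(exact, approx)

# A larger case that would be tedious to work out by hand.
large = hypergeom_pmf_log(500, N=1_000_000, K=100_000, n=5_000)
print(large)
```

The log-space result is a floating-point approximation, so expect agreement with the exact formula to many decimal places rather than bit-for-bit.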

Conclusion

The hypergeometric distribution offers a valuable tool for situations involving sampling without replacement from finite populations. By understanding its concepts, formulas, and limitations, you can calculate probabilities and gain insights into the likelihood of obtaining specific outcomes in such scenarios.
