The t-distribution is a probability distribution commonly used in statistical inference when the sample size is small and the population standard deviation is unknown. It is a bell-shaped distribution similar to the standard normal distribution, but it has heavier tails, meaning it assigns more probability to values far from the mean and therefore reflects the extra uncertainty that comes with small samples.
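To see the effect of the heavier tails, here is a minimal sketch using the `scipy.stats` module; the choice of 4 degrees of freedom and the cutoff of 2 are arbitrary illustration values.
import scipy.stats as stats

# Probability of observing a value greater than 2 under each distribution
tail_t = stats.t.sf(2, df=4)    # upper-tail probability for t with 4 degrees of freedom
tail_norm = stats.norm.sf(2)    # upper-tail probability for the standard normal

print(f"P(T > 2), t with 4 df: {tail_t:.3f}")       # roughly 0.058
print(f"P(Z > 2), standard normal: {tail_norm:.3f}")  # roughly 0.023
The t-distribution places more than twice as much probability beyond 2 as the standard normal does, which is exactly the heavier-tail behaviour described above.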
The t-distribution is defined by a single parameter, the degrees of freedom (df), which for a one-sample test is equal to the sample size minus one. The t-score (the value of the t-statistic) is the difference between the sample mean and the hypothesized population mean, divided by the standard error of the mean:
$$t = \frac{\bar{x} - \mu}{\mathrm{SEM}}$$
where $\bar{x}$ is the sample mean, $\mu$ is the population mean, and SEM is the standard error of the mean.
The t-distribution is used for hypothesis testing, confidence interval estimation, and for constructing t-tests. In hypothesis testing, we compare the t-score to the critical value from the t-distribution with the given degrees of freedom, or equivalently compute the p-value of the observed t-score. If the p-value is less than the significance level, we reject the null hypothesis.
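Both the critical value and a confidence interval can be read off the t-distribution directly with `scipy.stats`; the sketch below assumes a 95% confidence level and reuses the small sample from the worked example further down.
import numpy as np
import scipy.stats as stats

x = [10, 12, 15, 16, 18]   # same sample as in the example below
df = len(x) - 1

# Two-tailed critical value at the 5% significance level
t_crit = stats.t.ppf(0.975, df)
print(f"critical value: {t_crit:.3f}")   # about 2.776

# 95% confidence interval for the population mean
ci = stats.t.interval(0.95, df, loc=np.mean(x), scale=stats.sem(x))
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")   # roughly (10.2, 18.2)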
In Python, we can use the `scipy.stats` module to work with the t-distribution and its related statistics. Here is an example of how to calculate the t-score and p-value for a one-tailed, one-sample t-test:
import numpy as np
import scipy.stats as stats

# Sample data
x = [10, 12, 15, 16, 18]
mu = 15            # hypothesized population mean
df = len(x) - 1    # degrees of freedom

# Calculate t-score and one-tailed p-value (alternative: population mean < 15)
t_score = (np.mean(x) - mu) / stats.sem(x)
p_value = stats.t.cdf(t_score, df)

print(f"t-score: {t_score:.3f}")
print(f"p-value: {p_value:.3f}")
Output:
t-score: -0.560
p-value: 0.303
In this example, we have a sample of size 5, and we test the null hypothesis that the population mean is 15 against the one-sided alternative that it is less than 15. The calculated t-score is -0.560 and the p-value is 0.303. Since the p-value is greater than the significance level of 0.05, we fail to reject the null hypothesis: the data do not provide enough evidence that the population mean is below 15.
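As a sanity check, the same t-score and p-value can be obtained in a single call with `scipy.stats.ttest_1samp`; note that the `alternative` argument used here requires SciPy 1.6 or later.
import scipy.stats as stats

x = [10, 12, 15, 16, 18]

# One-sample t-test against mu = 15, one-sided alternative (mean < 15)
t_score, p_value = stats.ttest_1samp(x, popmean=15, alternative='less')
print(f"t-score: {t_score:.3f}, p-value: {p_value:.3f}")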
A common variant is Welch's t-test, which compares the means of two samples without assuming equal variances; it uses the Welch–Satterthwaite approximation to estimate the effective degrees of freedom. For large samples, the t-distribution converges to the standard normal distribution, so z-based methods give essentially the same results.
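In `scipy.stats`, Welch's t-test is performed by `ttest_ind` with `equal_var=False`; the two samples below are made-up illustration data.
import scipy.stats as stats

a = [10, 12, 15, 16, 18]        # first sample (illustration data)
b = [20, 22, 19, 25, 28, 24]    # second sample (illustration data)

# Welch's t-test: two-sample t-test that does not assume equal variances
t_score, p_value = stats.ttest_ind(a, b, equal_var=False)
print(f"t-score: {t_score:.3f}, p-value: {p_value:.3f}")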