This tutorial demonstrates how to perform a one-way ANOVA in Python using the sklearn.feature_selection.f_classif function from the scikit-learn library. As an example, we'll use the iris dataset that ships with scikit-learn. For each of its four measurements, we'll test whether the mean differs between the three species.
Importing Libraries:
```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif
import pandas as pd
```
Loading the Iris Dataset:
```python
iris = load_iris()
x = iris.data    # Features
y = iris.target  # Target variable (species)
```
One-Way ANOVA with scikit-learn:
```python
# Perform ANOVA
f_statistic, p_value = f_classif(x, y)

# Print results
print("F-statistic:", f_statistic)
print("p-value:", p_value)
# F-statistic: [ 119.26450218   49.16004009 1180.16118225  960.0071468 ]
# p-value: [1.66966919e-31 4.49201713e-17 2.85677661e-91 4.16944584e-85]
```
Explanation:
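- f_classif takes two arguments: x (the features) and y (the target variable).
- It returns two arrays with one entry per feature (column of x). f_statistic is the ratio of the variance between the species means to the variance within each species. p_value is the probability of an F-statistic at least that large if the null hypothesis (all species share the same mean for that feature) were true.
- Smaller p-values (conventionally below 0.05) indicate statistically significant differences between group means.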
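To make the "ratio of variances" concrete, here is a minimal sketch that computes the one-way ANOVA F-statistic for each feature by hand: F = [SS_between / (k - 1)] / [SS_within / (n - k)]. In this formula, k is the number of species and n is the number of flowers. The helper one_way_f is hypothetical and was written only for this illustration; it is not part of scikit-learn. The p-values come from scipy.stats.f.sf, which assumes SciPy is available (it is installed as a scikit-learn dependency). The last lines check the hand-computed values against f_classif instead of assuming they match.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

iris = load_iris()
x, y = iris.data, iris.target

def one_way_f(x, y):
    """Hypothetical helper: one-way ANOVA F-statistic for every column of x."""
    classes = np.unique(y)
    n, k = x.shape[0], len(classes)
    grand_mean = x.mean(axis=0)
    ss_between = np.zeros(x.shape[1])
    ss_within = np.zeros(x.shape[1])
    for c in classes:
        group = x[y == c]
        group_mean = group.mean(axis=0)
        # Between-group: group size times squared distance of group mean from grand mean
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Within-group: squared deviations of each flower from its own species mean
        ss_within += ((group - group_mean) ** 2).sum(axis=0)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f_manual, df_between, df_within = one_way_f(x, y)
# Upper-tail probability of the F distribution gives the p-value
p_manual = stats.f.sf(f_manual, df_between, df_within)

f_sklearn, p_sklearn = f_classif(x, y)
print(np.allclose(f_manual, f_sklearn))  # True
print(np.allclose(p_manual, p_sklearn))  # True
```

Here the degrees of freedom are 2 (three species minus one) and 147 (150 flowers minus three species).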
Interpreting Results:
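In this example, all four p-values are far below 0.05, so we reject the null hypothesis for every feature. For each measurement, at least one of the three iris species has a mean that differs significantly from the others. The petal features have by far the largest F-statistics (about 1180 for petal length and 960 for petal width), so that is where the differences between species are largest relative to the spread within each species. Sepal width (about 49) separates the species least. Note that ANOVA alone does not tell you which species differ from which.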
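Because the returned arrays are positional, it helps to line up each F/p pair with its feature name. Below is a small sketch using pandas. The cutoff stored in alpha is the conventional 0.05 threshold mentioned above. It is our own choice, and f_classif does not apply it.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

iris = load_iris()
x, y = iris.data, iris.target
f_statistic, p_value = f_classif(x, y)

# One row per feature, so each F/p pair is tied to the measurement it tests
results = pd.DataFrame(
    {"F": f_statistic, "p": p_value},
    index=iris.feature_names,
)

# Flag features whose p-value falls below the chosen significance level
alpha = 0.05
results["significant"] = results["p"] < alpha
print(results.sort_values("F", ascending=False))
```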
Additional Notes:
- This is a basic example. You can further analyze the results using post-hoc tests like Tukey’s HSD to identify specific groups that differ.
- Remember to check the assumptions of normality within each group and homogeneity of variance across groups before performing ANOVA.
- Consider visualizing the data using boxplots or violin plots to get a better understanding of the distributions within each group.
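Here is one way to act on the first two notes, sketched with SciPy. SciPy is not used elsewhere in this tutorial, but it is installed alongside scikit-learn. Shapiro-Wilk and Levene are just two common choices of assumption test. scipy.stats.tukey_hsd requires SciPy 1.8 or later. The checks dictionary is a hypothetical container used only for this example. The rows and columns of each Tukey matrix follow the order of iris.target_names (setosa, versicolor, virginica).

```python
from scipy import stats
from sklearn.datasets import load_iris

iris = load_iris()
x, y = iris.data, iris.target

checks = {}
for j, name in enumerate(iris.feature_names):
    # One sample of this feature per species
    groups = [x[y == c, j] for c in range(len(iris.target_names))]
    checks[name] = {
        # Shapiro-Wilk per species: a small p-value suggests non-normal data
        "shapiro_p": [stats.shapiro(g).pvalue for g in groups],
        # Levene's test: a small p-value suggests unequal variances across species
        "levene_p": stats.levene(*groups).pvalue,
        # Tukey's HSD: k x k matrix of pairwise p-values (needs SciPy >= 1.8)
        "tukey_p": stats.tukey_hsd(*groups).pvalue,
    }

for name, res in checks.items():
    print(name)
    print("  Shapiro p per species:", [round(float(p), 4) for p in res["shapiro_p"]])
    print("  Levene p:", round(float(res["levene_p"]), 4))
    print("  Tukey pairwise p-values:")
    print(res["tukey_p"].round(4))
```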
Bonus: Visualizing with Pandas:
```python
import matplotlib.pyplot as plt

# Create pandas DataFrame
df = pd.DataFrame(x, columns=iris.feature_names)
df['species'] = iris.target_names[y]

# One boxplot panel per feature, with one box per species
df.boxplot(by='species', layout=(1, 4), rot=45, figsize=(14, 4))
plt.show()
```

Leave a Reply