Tutorial: Creating a Linear Regression in Python

This tutorial walks through building a simple linear regression model in Python: importing libraries, loading data, fitting the model, interpreting the results, and plotting the fitted line.

1. Import Libraries

import pandas as pd
from sklearn.linear_model import LinearRegression

2. Load Data

This example uses the Iris dataset bundled with scikit-learn; replace it with your own data source as needed:

# Import Dataset from sklearn
from sklearn import datasets

# Load Iris Data
iris = datasets.load_iris()

# Create pandas DataFrames for the features and the target
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
target_df = pd.DataFrame(data=iris.target, columns=["species"])

def converter(species_code):
    """Map the integer target code to its species name."""
    if species_code == 0:
        return "setosa"
    elif species_code == 1:
        return "versicolor"
    else:
        return "virginica"

target_df["species"] = target_df["species"].apply(converter)

# Concatenate the DataFrames column-wise
iris_df = pd.concat([iris_df, target_df], axis=1)

# Keep only the setosa rows; .copy() lets us add a column later without a pandas warning
iris_df_setosa = iris_df[iris_df["species"] == "setosa"].copy()
iris_df_setosa.head()
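
As a more compact alternative to the loading steps above, the sketch below assumes scikit-learn 0.23 or later, where load_iris accepts as_frame=True and returns the data as a pandas DataFrame. The column name "target" comes from scikit-learn; the "species" column is added here to mirror the code above.

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects instead of NumPy arrays
# (assumption: scikit-learn 0.23 or later is installed)
iris = load_iris(as_frame=True)

# iris.frame holds the four feature columns plus an integer "target" column
iris_df = iris.frame.copy()

# Translate the integer codes 0, 1, 2 into the species names shipped with the dataset
iris_df["species"] = iris.target_names[iris_df["target"].to_numpy()]

# Keep only the setosa rows, as in the tutorial
iris_df_setosa = iris_df[iris_df["species"] == "setosa"].copy()
print(iris_df_setosa.shape)
```

Looking up names in iris.target_names avoids hard-coding the species in a converter function, at the cost of depending on the as_frame option.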

3. Fit the Model

Create a LinearRegression instance and fit it to the setosa rows, using sepal length to predict sepal width:

# Select the predictor (x) and the response (y) for the setosa rows
x = iris_df_setosa["sepal length (cm)"]
y = iris_df_setosa["sepal width (cm)"]

# scikit-learn expects 2-D arrays, so reshape each Series into a single column
X = x.values.reshape(-1, 1)
Y = y.values.reshape(-1, 1)

model = LinearRegression()
model.fit(X, Y)
iris_df_setosa["predicted_y"] = model.predict(X).ravel()

print(f"intercept: {model.intercept_}")
print(f"slope: {model.coef_}")
# intercept: [-0.56943267]
# slope: [[0.7985283]]

# R-squared: the share of the variance in sepal width explained by sepal length
r_sq = model.score(X, Y)
print(r_sq)
# 0.5513755803923133
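
To make the printed numbers easier to interpret, the sketch below recomputes the slope, intercept, and R-squared with the textbook least-squares formulas in NumPy and compares them with scikit-learn. It is a cross-check written for this tutorial, not a description of how LinearRegression works internally. It reloads the setosa rows so that it runs on its own, and the 5.0 cm query value is purely illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

# Reload the setosa rows (target code 0) so this block is self-contained
iris = load_iris()
setosa = iris.data[iris.target == 0]
x = setosa[:, 0]  # sepal length (cm)
y = setosa[:, 1]  # sepal width (cm)

# Simple linear regression in closed form:
# slope = covariance(x, y) / variance(x); the line passes through the means
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# R-squared = 1 - (residual sum of squares / total sum of squares)
residuals = y - (intercept + slope * x)
r_sq = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Fit the same model with scikit-learn for comparison
X = x.reshape(-1, 1)
model = LinearRegression().fit(X, y)

print("slope matches:", np.isclose(slope, model.coef_[0]))
print("intercept matches:", np.isclose(intercept, model.intercept_))
print("R-squared matches:", np.isclose(r_sq, model.score(X, y)))

# Use the fitted line to predict sepal width for a hypothetical 5.0 cm sepal length
predicted_width = model.predict(np.array([[5.0]]))[0]
print(f"predicted sepal width at 5.0 cm: {predicted_width:.2f}")
```

Read this way, the slope says that among setosa flowers each additional centimetre of sepal length is associated with roughly 0.8 cm more sepal width, and the R-squared of about 0.55 says that sepal length accounts for a little over half of the variation in sepal width.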

4. Plot the Results

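# plotnine may need to be installed first. Run this in a terminal, not in Python:
#     pip install plotnine
# (in a Jupyter notebook, use: %pip install plotnine)
from plotnine import ggplot, aes, geom_point, geom_line, labs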

plot = (
    ggplot(iris_df_setosa, aes(x="sepal length (cm)", y="sepal width (cm)"))
    + geom_point()
    + geom_line(aes(x="sepal length (cm)", y="predicted_y"))
    + labs(title="Sepal Length and Sepal Width", x="Sepal Length (cm)", y="Sepal Width (cm)")
)

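# Render the plot. If this prints only a text representation
# (as may happen on newer plotnine versions), call plot.show() instead.
print(plot)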

References

https://medium.com/analytics-vidhya/linear-regression-using-iris-dataset-hello-world-of-machine-learning-b0feecac9cc1

https://realpython.com/linear-regression-in-python/
