How to Create a Scatterplot with a Regression Line in Python

A scatterplot is a graphical representation of the relationship between two continuous variables. It is a powerful tool for exploratory data analysis, helping to identify trends, patterns, and correlations. One common enhancement to a scatterplot is the addition of a regression line, which represents the linear relationship between the two variables. In this article, we will discuss how to create a scatterplot with a regression line using Python.

The Statistical Concept: Simple Linear Regression

Simple linear regression is a statistical method used to model the relationship between a dependent variable and a single independent variable. The goal is to find the best-fit line, which in ordinary least squares is the line that minimizes the sum of squared vertical distances (residuals) between the observed points and the line. The mathematical formula for the simple linear regression model is:

    \[y = \beta_0 + \beta_1x + \epsilon\]

where:

  • \(y\) is the dependent variable,
  • \(x\) is the independent variable,
  • \(\beta_0\) is the intercept,
  • \(\beta_1\) is the slope, and
  • \(\epsilon\) is the error term.
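
To make the roles of the intercept and slope concrete, here is a minimal sketch that estimates both with the standard ordinary least-squares formulas: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the fact that the fitted line passes through the point of means. The function name `least_squares_line` and the four data points are made up for illustration and do not appear elsewhere in this article.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Noise-free points that lie exactly on y = 1 + 2x
b0, b1 = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

Because these points have no noise, the estimates match the intercept and slope of the line that generated them. With noisy data the estimates will only approximate the underlying values.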

Python Implementation

To create a scatterplot with a regression line in Python, we will use two popular libraries: NumPy for the numerical work and Matplotlib for plotting. Let’s start by importing these libraries and creating some sample data:

import numpy as np
import matplotlib.pyplot as plt

# Sample data
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + np.random.normal(size=len(x))
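
Because the noise in the sample data is drawn at random, each run produces slightly different points and a slightly different fitted line. If you want the same data every time, one option is to draw the noise from a seeded generator, as in the sketch below. The seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# Seeded generator so the noise is identical on every run
# (the seed value is arbitrary)
rng = np.random.default_rng(seed=42)

x = np.linspace(0, 10, 100)
# Same model as above: true slope 2, true intercept 1, standard normal noise
y = 2 * x + 1 + rng.normal(size=len(x))
```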

Now, let’s create the scatterplot with the regression line:

# Create scatterplot
plt.scatter(x, y)

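# Fit a degree-1 polynomial by least squares; z holds [slope, intercept]
z = np.polyfit(x, y, 1)

# Build a callable polynomial p(x) = slope * x + intercept from the coefficients
p = np.poly1d(z)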

# Plot regression line
plt.plot(x, p(x), color='red')

# Label the axes, add a title, and display the plot
plt.xlabel('Independent Variable (x)')
plt.ylabel('Dependent Variable (y)')
plt.title('Scatterplot with Regression Line')
plt.show()
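
The plotting code above draws the fitted line but does not report its coefficients. The sketch below shows one way to read the slope and intercept from `np.polyfit` and to build a text label for the line. It also computes the coefficient of determination (R^2), which this article does not otherwise cover and is included here only as an optional extra. The seeded data and the variable names `slope`, `intercept`, `r_squared`, and `label` are assumptions made for this example.

```python
import numpy as np

# Reproducible sample data following the same model as above
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + rng.normal(size=len(x))

# np.polyfit returns coefficients from the highest degree down: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# R^2: share of the variance in y that the fitted line accounts for
residuals = y - (slope * x + intercept)
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Text label describing the fitted line
label = f"y = {slope:.2f}x + {intercept:.2f} (R^2 = {r_squared:.2f})"
print(label)
```

To show this text on the chart, pass it as the `label` argument of `plt.plot` when drawing the regression line and call `plt.legend()` before `plt.show()`.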

Summary

In this article, we discussed how to create a scatterplot with a regression line using Python. We introduced the concept of simple linear regression, and we implemented it using NumPy and Matplotlib. By following these steps, you will be able to explore the relationship between two continuous variables and visualize the linear trend in your data.
