Curve Fitting in Python

Curve fitting is a statistical method for finding a continuous mathematical function, and the values of its parameters, that best approximates a set of data points. The fitted curve can then be used to make predictions and analyze trends. In this article, we will explore the concept of curve fitting, its underlying statistical principles, and how to implement it using Python.

Statistical Background: Linear Regression and Polynomial Regression

Before diving into curve fitting, it’s essential to understand two fundamental regression techniques: linear regression and polynomial regression. These methods form the foundation for many curve fitting techniques.

Linear Regression

Linear regression is a statistical method that models the relationship between a dependent variable y and one or more independent variables x. In simple linear regression, with a single independent variable, the goal is to find the straight line that best fits the data, that is, the line that minimizes the sum of the squared differences between the observed and predicted values of y. The equation of the simple linear regression line is:

$y = \beta_0 + \beta_1x$

where:

$\beta_0$ is the intercept, and
$\beta_1$ is the slope.

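To estimate the coefficients, we use the method of least squares. In Python, the scipy.stats.linregress function performs simple linear regression: given arrays x and y, it returns the least-squares slope and intercept together with the correlation coefficient and related statistics.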
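Here is a minimal sketch that uses synthetic data (a line with intercept 2 and slope 0.5 plus Gaussian noise, chosen purely for illustration). It fits the line with linregress and then recomputes the same estimates from the closed-form least-squares formulas $\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$:

```python
import numpy as np
from scipy.stats import linregress

# Synthetic data (illustrative only): true line y = 2 + 0.5x plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, size=x.size)

# SciPy's least-squares fit for a single independent variable
result = linregress(x, y)
print(f"linregress: intercept={result.intercept:.3f}, slope={result.slope:.3f}, r={result.rvalue:.3f}")

# The same estimates from the closed-form least-squares formulas
x_mean, y_mean = x.mean(), y.mean()
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
beta0 = y_mean - beta1 * x_mean
print(f"closed form: intercept={beta0:.3f}, slope={beta1:.3f}")
```

Both approaches should agree to floating-point precision. linregress is convenient because it also reports rvalue, pvalue and stderr for the slope.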

Polynomial Regression

Polynomial regression extends linear regression to model non-linear relationships between variables. It models the relationship between a dependent variable y and an independent variable x using a polynomial of degree n. Although the curve is non-linear in x, the model is still linear in the coefficients, so they can be estimated with the same least-squares method. The equation of a polynomial regression curve of degree n is:

$y = \beta_0 + \beta_1x + \beta_2x^2 + \beta_3x^3 + \dots + \beta_nx^n$

where:

$\beta_0$ is the intercept,
$\beta_1, \dots, \beta_n$ are the coefficients of the successive powers of $x$.
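Because the model is linear in $\beta_0, \dots, \beta_n$, stacking $m$ observations $(x_i, y_i)$ turns polynomial regression into an ordinary least-squares problem. In the matrix form below, $X$ is the Vandermonde matrix of the data. The closed-form solution assumes $X^\top X$ is invertible, which requires at least $n + 1$ distinct $x$ values. Numerical libraries typically solve the problem with QR or SVD factorizations rather than forming the inverse explicitly.

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}
\approx
\underbrace{\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^n \\
1 & x_2 & x_2^2 & \cdots & x_2^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_m & x_m^2 & \cdots & x_m^n
\end{pmatrix}}_{X}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{pmatrix},
\qquad
\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta} \lVert \mathbf{y} - X\boldsymbol\beta \rVert_2^2 = (X^\top X)^{-1} X^\top \mathbf{y}
$$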

NumPy’s numpy.polynomial.polynomial module (for example, its polyfit function) can be used to fit a polynomial curve to data by least squares.
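As a sketch, assume synthetic data drawn from the quadratic y = 1 - 2x + 0.5x^2 plus noise (illustrative values only). The following fits a degree-2 polynomial with numpy.polynomial.polynomial.polyfit. It then cross-checks the result against a direct least-squares solve on the Vandermonde matrix built by np.vander:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Synthetic data (illustrative only): y = 1 - 2x + 0.5x^2 plus Gaussian noise
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 60)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.2, size=x.size)

# polyfit returns coefficients in increasing order: [beta_0, beta_1, beta_2]
coef = P.polyfit(x, y, deg=2)
print("polyfit:", np.round(coef, 3))

# Evaluate the fitted polynomial on a finer grid, e.g. for prediction or plotting
x_fine = np.linspace(-3, 3, 200)
y_fit = P.polyval(x_fine, coef)

# Cross-check: solve the same least-squares problem on an explicit Vandermonde matrix
X = np.vander(x, N=3, increasing=True)  # columns: 1, x, x^2
coef_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print("lstsq:  ", np.round(coef_lstsq, 3))
```

polyfit returns coefficients in increasing order of degree, matching $\beta_0, \dots, \beta_n$ above. The older np.polyfit returns them in decreasing order instead. NumPy also provides the class-based numpy.polynomial.Polynomial.fit, which maps x onto a scaled window internally for numerical stability.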

Curve Fitting in Python: SciPy’s curve_fit Function

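SciPy’s curve_fit function, in the scipy.optimize module, fits an arbitrary and possibly non-linear model function to data by non-linear least squares. It adjusts the model’s parameters to minimize the sum of the squared differences between the observed and predicted values. For unconstrained problems it uses the Levenberg-Marquardt algorithm ('lm') by default. When parameter bounds are supplied, it switches to the Trust Region Reflective method ('trf').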
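The sketch below illustrates both default methods on a hypothetical exponential-decay model. The decay function and its true parameters (a = 3, k = 1.2, c = 0.5) are invented for illustration. Without bounds, curve_fit uses 'lm'. Passing bounds, here used to keep a and k non-negative, makes it use 'trf':

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model (illustrative only): exponential decay toward an offset c
def decay(t, a, k, c):
    return a * np.exp(-k * t) + c

# Synthetic noisy observations from known parameters
rng = np.random.default_rng(2)
t = np.linspace(0, 5, 80)
y = decay(t, 3.0, 1.2, 0.5) + rng.normal(0, 0.05, size=t.size)

# Unconstrained fit: curve_fit defaults to method='lm' (Levenberg-Marquardt)
popt_lm, pcov_lm = curve_fit(decay, t, y, p0=[1.0, 1.0, 0.0])

# Bounded fit: with bounds given, curve_fit defaults to method='trf'
popt_trf, pcov_trf = curve_fit(
    decay, t, y, p0=[1.0, 1.0, 0.0],
    bounds=([0.0, 0.0, -np.inf], [np.inf, np.inf, np.inf]),  # a >= 0, k >= 0, c free
)

print("lm: ", np.round(popt_lm, 3))
print("trf:", np.round(popt_trf, 3))
```

When the bounds are not active at the optimum, as here, both methods should converge to essentially the same parameters. Bounds mainly help keep the optimizer in a physically meaningful region.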

Example: Fitting a Gaussian Curve

Let’s consider an example where we want to fit a Gaussian (bell-shaped) curve to some noisy data:

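import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Model: Gaussian curve with amplitude a, mean mu and standard deviation sigma
def gaussian(x, a, mu, sigma):
    return a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Generate data: a Gaussian curve centred at x = -1, plus Gaussian noise
rng = np.random.default_rng(0)  # seeded so the example is reproducible
x = np.linspace(-3, 3, 100)
y = np.exp(-(x + 1) ** 2 / 2)
y_noise = y + rng.normal(0, 0.1, size=len(x))

# Fit the Gaussian to the noisy data, starting from the initial guess p0 = [a, mu, sigma]
p0 = [1, 1, 1]
popt, pcov = curve_fit(gaussian, x, y_noise, p0=p0)

# Plot the noisy data and the fitted curve
plt.plot(x, y_noise, '.', label='Data')
plt.plot(x, gaussian(x, *popt), '-', label='Fit')
plt.legend()
plt.show()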

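In this example, we generate data by adding Gaussian noise (standard deviation 0.1) to a Gaussian curve with mean -1 and standard deviation 1. We then use curve_fit to find the parameters that best fit the noisy data. popt holds the fitted parameter values, in the order the model function takes them, and pcov holds their estimated covariance matrix. Finally, we plot the data together with the fitted curve.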
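Beyond plotting, the two return values are worth inspecting. The sketch below repeats the fit without plotting. It assumes the common three-parameter Gaussian a * exp(-(x - mu)^2 / (2 sigma^2)), where the names a, mu and sigma are illustrative, and reads standard errors from the diagonal of pcov. Because sigma enters the model only squared, the optimizer may return a negative value, so the sketch reports its absolute value:

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed model (illustrative parameterization): amplitude a, mean mu, std. dev. sigma
def gaussian(x, a, mu, sigma):
    return a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Noisy samples of a Gaussian curve with a = 1, mu = -1, sigma = 1
rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 100)
y_noise = gaussian(x, 1.0, -1.0, 1.0) + rng.normal(0, 0.1, size=x.size)

popt, pcov = curve_fit(gaussian, x, y_noise, p0=[1, 0, 1])
a_fit, mu_fit, sigma_fit = popt

# sigma appears only squared, so its sign is not identifiable; report |sigma|
sigma_fit = abs(sigma_fit)

# One-standard-deviation uncertainties: square roots of the covariance diagonal
perr = np.sqrt(np.diag(pcov))

# Residual sum of squares as a simple check of fit quality
rss = np.sum((y_noise - gaussian(x, *popt)) ** 2)

print(f"a     = {a_fit:.3f} +/- {perr[0]:.3f}")
print(f"mu    = {mu_fit:.3f} +/- {perr[1]:.3f}")
print(f"sigma = {sigma_fit:.3f} +/- {perr[2]:.3f}")
print(f"RSS   = {rss:.3f}")
```

These uncertainties are only meaningful if the model is appropriate and the noise is roughly independent with constant variance. Passing sigma= to curve_fit lets you weight points with known measurement errors.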

Conclusion

Curve fitting is a versatile statistical method for modeling data and understanding relationships between variables. Python covers the common cases well: scipy.stats.linregress for straight lines, numpy.polynomial.polynomial for polynomials, and scipy.optimize.curve_fit for arbitrary non-linear models. Together they make it straightforward to build curve fitting into your data analysis workflows.