How to Calculate Partial Correlation in Python

Correlation measures the strength and direction of the linear relationship between two variables. Sometimes, however, we want to know how two variables are related after accounting for the effect of a third variable. This is where partial correlation comes in. In this article, we discuss the concept of partial correlation and how to calculate it in Python.

What is Partial Correlation?

Partial correlation measures the linear relationship between two variables after the linear effect of a third variable has been removed from both. It tells us how strongly the two variables are related once the part of each that can be predicted from the third variable is taken out. In other words, it helps us understand the unique relationship between two variables while holding the third variable constant.
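
To make the idea concrete, here is a minimal sketch using only NumPy. The data are hypothetical: a variable z drives both x and y, which have no direct link to each other. The helper `residuals` is an illustrative name, not a library function. It removes the part of a variable that a straight-line fit on z can predict.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: z influences both x and y; x and y are otherwise unrelated.
z = rng.standard_normal(n)
x = z + rng.standard_normal(n)
y = z + rng.standard_normal(n)

def residuals(target, control):
    """Residuals of a least-squares fit of target on control (with intercept)."""
    design = np.column_stack([np.ones_like(control), control])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

# Ordinary correlation picks up the shared influence of z.
raw_r = np.corrcoef(x, y)[0, 1]

# Correlating the residuals removes the linear effect of z from both variables.
partial_r = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

print(f"raw correlation:     {raw_r:.3f}")
print(f"partial correlation: {partial_r:.3f}")
```

In this setup the raw correlation should come out clearly positive, while the partial correlation should sit near zero, because the only link between x and y runs through z.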

Mathematical Formula for Partial Correlation

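The formula for the partial correlation between x and y, controlling for z, is:

    \[r_{xy|z} = \frac{\sigma_{xy}\sigma_{zz} - \sigma_{xz}\sigma_{yz}}{\sqrt{\sigma_{xx}\sigma_{zz} - \sigma^2_{xz}}\sqrt{\sigma_{yy}\sigma_{zz} - \sigma^2_{yz}}}\]

where:

  • \(\sigma_{xy}\) is the covariance between variables x and y.
  • \(\sigma_{xz}\) is the covariance between variables x and z.
  • \(\sigma_{yz}\) is the covariance between variables y and z.
  • \(\sigma_{xx}\) is the variance of variable x.
  • \(\sigma_{yy}\) is the variance of variable y.
  • \(\sigma_{zz}\) is the variance of variable z.

Dividing the numerator and denominator by the standard deviations gives the equivalent and more familiar form in terms of the pairwise Pearson correlations \(r_{xy}\), \(r_{xz}\) and \(r_{yz}\):

    \[r_{xy|z} = \frac{r_{xy} - r_{xz}r_{yz}}{\sqrt{1 - r^2_{xz}}\sqrt{1 - r^2_{yz}}}\]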
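
The correlation-based form of the formula can be evaluated directly from the three pairwise Pearson correlations. The sketch below is one possible implementation using NumPy. The function name `partial_corr_from_pairwise` and the sample data are illustrative assumptions, not part of any library.

```python
import numpy as np

def partial_corr_from_pairwise(x, y, z):
    """First-order partial correlation of x and y given z, computed as
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))."""
    r = np.corrcoef([x, y, z])  # 3x3 matrix of pairwise Pearson correlations
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical sample data: x depends on z, and y depends on both x and z.
rng = np.random.default_rng(42)
z = rng.standard_normal(500)
x = 0.8 * z + rng.standard_normal(500)
y = 0.5 * x + 0.3 * z + rng.standard_normal(500)

print(partial_corr_from_pairwise(x, y, z))
```

The formula breaks down when x or y is perfectly correlated with z, since the denominator becomes zero. A more defensive version would check for that case.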

Calculating Partial Correlation in Python

To calculate partial correlation in Python, we can use the statsmodels library. Here’s an example:

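import numpy as np
import statsmodels.api as sm

# Fix the seed so the example is reproducible
np.random.seed(123)

X = np.random.randn(100, 1)            # first variable
Y = np.random.randn(100, 1) + 2 * X    # depends on X plus noise
Z = np.random.randn(100, 1)            # control variable, independent of X and Y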

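Now we can calculate the partial correlation between X and Y while controlling for Z. One standard way to do this with statsmodels is to regress X and Y separately on Z and then correlate the two sets of residuals:

Z_const = sm.add_constant(Z)                      # add an intercept column
res_x = sm.OLS(X.ravel(), Z_const).fit().resid    # part of X not explained by Z
res_y = sm.OLS(Y.ravel(), Z_const).fit().resid    # part of Y not explained by Z
print(np.corrcoef(res_x, res_y)[0, 1])

This prints a single number: the partial correlation between X and Y while controlling for Z. Because Y is generated as 2 * X plus independent unit-variance noise, and Z is independent of both, the value should be strongly positive, close to the theoretical correlation of 2/√5 ≈ 0.89.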
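
As a cross-check that does not depend on any particular library function, partial correlations can also be read off the inverse of the correlation matrix (a different but equivalent technique). The following is a minimal NumPy sketch on the same kind of synthetic data. The variable names are illustrative, and with only three variables each off-diagonal entry is the partial correlation of that pair given the remaining variable.

```python
import numpy as np

np.random.seed(123)
X = np.random.randn(100)
Y = np.random.randn(100) + 2 * X
Z = np.random.randn(100)

# Rows are observations; columns are the variables X, Y, Z.
data = np.column_stack([X, Y, Z])

corr = np.corrcoef(data, rowvar=False)   # 3x3 Pearson correlation matrix
precision = np.linalg.inv(corr)          # inverse of the correlation matrix

# Partial correlation of i and j given the rest: -p_ij / sqrt(p_ii * p_jj)
scale = np.sqrt(np.diag(precision))
partial = -precision / np.outer(scale, scale)
np.fill_diagonal(partial, 1.0)

print(np.round(partial, 3))
```

Here `partial[0, 1]` is the partial correlation between X and Y controlling for Z. This approach extends naturally to more than one control variable, at the cost of requiring an invertible correlation matrix.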

Conclusion

Partial correlation is a useful concept in statistics, allowing us to measure the correlation between two variables while controlling for the effect of a third variable. In this article, we discussed the concept of partial correlation and demonstrated how to calculate it using Python and the statsmodels library.
