Visualize Regression Models with Seaborn

This is the second article in a series on data visualization with the Seaborn package of Python. In this article, I explain how to visualize regression models with Seaborn. Before proceeding, I will briefly explain regression analysis.

What is Regression?

When you have one dependent variable and one or more independent variables, regression analysis lets you estimate how strongly they are related and how the dependent variable changes with the independent ones. Regression has applications in many fields, including time-series analysis, forecasting, and finance.
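As a minimal sketch of the idea, the snippet below fits a straight line to a small synthetic dataset with NumPy and reports the slope, the intercept, and the correlation coefficient. The data and variable names are invented for illustration and are not part of the stroke dataset used later in this article.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends linearly on x plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Least-squares fit of a straight line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

# The Pearson correlation coefficient measures the strength of the linear relationship
r = np.corrcoef(x, y)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

The slope describes how much the dependent variable changes per unit of the independent variable, while the correlation coefficient describes how tightly the points follow the line.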

Regression Methods in Seaborn Package

The following regression methods are available in the seaborn package.

  • regplot
  • lmplot
  • residplot

regplot Method

The regplot() method fits a linear regression model and plots the data together with the fitted line. It accepts the x and y variables in several forms, for example as column names of a DataFrame passed through data, or as arrays. You can also specify other parameters such as color, scatter, and ci (the size of the confidence interval).
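The two ways of passing the variables can be sketched as follows. The DataFrame demo and its columns x and y are hypothetical, and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it to use plt.show()
import numpy as np
import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

# Synthetic data with hypothetical column names "x" and "y"
rng = np.random.default_rng(1)
demo = pd.DataFrame({"x": np.linspace(0, 10, 40)})
demo["y"] = 3.0 * demo["x"] + rng.normal(scale=2.0, size=len(demo))

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# Form 1: column names plus the data parameter, with a 95% confidence band
sb.regplot(x="x", y="y", data=demo, color="g", ci=95, ax=left)
left.set_title("Column names with data=")

# Form 2: arrays passed directly; scatter=False and ci=None draw only the line
sb.regplot(x=demo["x"].to_numpy(), y=demo["y"].to_numpy(),
           scatter=False, ci=None, ax=right)
right.set_title("Arrays, line only")

# In an interactive session, call plt.show() here to display the figure
```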

lmplot Method

The lmplot() method draws a 2D scatter plot of two variables with an optional regression line. It combines regplot() with a FacetGrid, so you can also compare the relationship across subsets of the data using parameters such as hue, col, and row.
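A minimal sketch of lmplot() on synthetic data follows. The column names x, y, and group are assumptions made for illustration. The first call colors two subsets with hue, and the second call places them in separate columns and turns the regression line off with fit_reg=False.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it to use plt.show()
import numpy as np
import pandas as pd
import seaborn as sb

# Synthetic data: two hypothetical groups with different slopes
rng = np.random.default_rng(2)
n = 60
demo = pd.DataFrame({
    "x": rng.uniform(0, 10, size=n),
    "group": np.repeat(["a", "b"], n // 2),
})
slopes = demo["group"].map({"a": 1.0, "b": 2.5})
demo["y"] = slopes * demo["x"] + rng.normal(scale=1.5, size=n)

# One plot, one regression line per group
g = sb.lmplot(x="x", y="y", hue="group", data=demo, ci=None)

# One panel per group, scatter only (the regression line is optional)
g2 = sb.lmplot(x="x", y="y", col="group", data=demo, fit_reg=False)

# In an interactive session, call matplotlib.pyplot.show() to display the figures
```

Unlike regplot(), which draws on a single Axes, lmplot() returns a FacetGrid that manages its own figure.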

residplot Method

Sometimes you want to plot the residuals of a linear regression. In that case, you can use the residplot() method of seaborn. A residual plot shows how far each data point lies from the fitted regression line, which helps to check whether a linear model suits the data.
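The sketch below illustrates this with synthetic data that follows a curve rather than a straight line. The data is invented for illustration. The residuals are also computed by hand with NumPy so that the pattern visible in the plot can be checked numerically.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it to use plt.show()
import numpy as np
import seaborn as sb
from matplotlib import pyplot as plt

# Synthetic data with a quadratic relationship (an assumption for illustration)
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = 0.5 * x**2 + rng.normal(scale=0.5, size=x.size)

# residplot fits a linear regression of y on x and plots the residuals
ax = sb.residplot(x=x, y=y, color="r")
ax.set_title("Residuals of a linear fit to curved data")

# The same residuals computed by hand: observed value minus fitted value
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Positive at both ends and negative in the middle: a straight line misses the curve
print(residuals[0] > 0, residuals[len(x) // 2] < 0, residuals[-1] > 0)
```

A clear pattern in the residuals, such as the U shape here, suggests that a higher-order model may fit better.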

Examples to Visualize Regression Models with Seaborn

This section provides examples of linear regression, polynomial regression, and logistic regression.

Example of regplot

The following example uses a dataset of patient records containing several parameters associated with stroke. The variables used in this example are BMI, age, hypertension, heart disease, average glucose level, and stroke. Each call to regplot() produces one of the plots shown in the output.

import seaborn as sb
from matplotlib import pyplot as plt
import pandas as pd

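# Load the stroke dataset and inspect the first rows and the column names
df1 = pd.read_csv("stroke_data.csv")
print(df1.head())
print(list(df1))

# Handle missing values by dropping incomplete rows
df2 = df1.dropna()
print(df2)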

#Regression example 1
sb.regplot(x="heart_disease", y="hypertension", data=df2, color="y", ci=70)
plt.title("Heart Disease vs. Hypertension")
plt.show()

#Regression example 2
sb.regplot(x="avg_glucose_level", y="bmi", data=df2, color="g", ci=70)
plt.title("Average Glucose Level vs. BMI")
plt.show()

#Regression example 3
sb.regplot(x="age", y="bmi", data=df2, color="r", ci=70)
plt.title("Age vs. BMI")
plt.show()

#Regression example 4
sb.regplot(x="hypertension", y="stroke", data=df2, color="b", ci=70)
plt.title("Hypertension vs. Stroke")
plt.show()

#Regression example 5
sb.regplot(x="heart_disease", y="stroke", data=df2, color="b", ci=70)
plt.title("Heart Disease vs. Stroke")
plt.show()

Output

Demonstration of regplot Method in Seaborn

The Stroke Dataset

As shown in the following image, the dataset contains several fields. However, these examples use only the fields mentioned above.

The Stroke Dataset

Polynomial Regression

Sometimes the relationship between the independent variable and the dependent variable can't be represented as a straight line. Instead, a polynomial of nth degree can model the relationship better. In that case, we can use polynomial regression. The same regplot() method has a parameter called order that sets the degree of the polynomial.
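According to the seaborn documentation, regplot() uses numpy.polyfit when order is greater than 1. The following sketch shows that kind of fit directly on synthetic data and compares the squared error of a quadratic fit with that of a straight line. The data and the coefficients used to generate it are invented for illustration.

```python
import numpy as np

# Synthetic data following a quadratic curve plus noise (an assumption for illustration)
rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 60)
y = 1.5 * x**2 - 2.0 * x + 4.0 + rng.normal(scale=0.5, size=x.size)

# Fit a polynomial of degree 2; coefficients are returned highest power first
coeffs = np.polyfit(x, y, deg=2)

# Evaluate the fitted curve on a grid, as a plotting routine would
grid = np.linspace(x.min(), x.max(), 100)
fitted = np.polyval(coeffs, grid)

# Compare the sum of squared errors with that of a straight-line fit
linear = np.polyfit(x, y, deg=1)
sse_quadratic = np.sum((y - np.polyval(coeffs, x)) ** 2)
sse_linear = np.sum((y - np.polyval(linear, x)) ** 2)

print("quadratic coefficients:", np.round(coeffs, 2))
print("quadratic fits better:", sse_quadratic < sse_linear)
```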

Example of Polynomial Regression

The following example shows a polynomial regression plot between the variables age and bmi. The value of order is set to 2, which represents a quadratic polynomial. The scatter points and the confidence interval are turned off, so only the fitted curve is drawn.


import seaborn as sb
from matplotlib import pyplot as plt
import pandas as pd

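# Load the stroke dataset and inspect the first rows and the column names
df1 = pd.read_csv("stroke_data.csv")
print(df1.head())
print(list(df1))

# Handle missing values by dropping incomplete rows
df2 = df1.dropna()
print(df2)

# order=2 fits a quadratic; scatter=False and ci=None draw only the fitted curve
sb.regplot(x="age", y="bmi", data=df2,
           order=2, ci=None, scatter=False)
plt.title("Polynomial Regression")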
plt.show()

Output

Polynomial Regression

Logistic Regression

When the dependent variable is binary, we can use logistic regression. Logistic regression models the probability that the binary dependent variable equals 1 as a function of an independent variable.
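The shape of that relationship can be sketched with the logistic function itself. The helper logistic_probability and the coefficients below are hypothetical and chosen only for illustration; they are not estimated from the stroke dataset.

```python
import numpy as np

def logistic_probability(x, intercept, slope):
    """Return P(y = 1 | x) under a logistic model with the given coefficients."""
    return 1.0 / (1.0 + np.exp(-(intercept + slope * x)))

# Hypothetical coefficients, not fitted to any real data
intercept, slope = -4.0, 0.1

# Probabilities for a few values of the independent variable
values = np.array([20.0, 30.0, 40.0, 50.0])
probs = logistic_probability(values, intercept, slope)

for v, p in zip(values, probs):
    print(f"x={v:.0f} -> P(y=1)={p:.3f}")
```

Whatever the coefficients, the output stays between 0 and 1, which is why the fitted curve in a logistic regression plot has an S shape instead of being a straight line.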

Example of Logistic Regression

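The following example demonstrates logistic regression of the binary variable stroke on bmi. The variable stroke takes a value of either 0 or 1, representing whether the patient experienced a stroke. The y_jitter parameter adds a little vertical noise to the plotted points so that overlapping 0 and 1 values are easier to see, and n_boot sets the number of bootstrap resamples used to estimate the confidence interval.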


import seaborn as sb
from matplotlib import pyplot as plt
import pandas as pd

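# Load the stroke dataset and inspect the first rows and the column names
df1 = pd.read_csv("stroke_data.csv")
print(df1.head())
print(list(df1))

# Handle missing values by dropping incomplete rows
df2 = df1.dropna()
print(df2)

# logistic=True fits a logistic regression (requires statsmodels);
# y_jitter only spreads the plotted 0/1 points vertically for readability
sb.regplot(x="bmi", y="stroke", data=df2,
           logistic=True, n_boot=500, y_jitter=.03)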
plt.title("Logistic Regression")
plt.show()

Output

Logistic Regression

Note: Logistic regression in regplot() requires the statsmodels module to be installed.

Summary

To sum up, in this article I have demonstrated the regplot() method for visualizing regression models with Seaborn. The regplot() method can fit a linear regression, a polynomial regression, and a logistic regression. The seaborn package has two more methods for visualizing regression, lmplot() and residplot(), which I will cover in a future article.

