
Visualizing Regression Models with lmplot() and residplot() in Seaborn

Programmingempire

This article explains two functions of the Seaborn package, lmplot() and residplot(), and shows how to use them to visualize regression models.

What is a FacetGrid?

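FacetGrid is a class in the Seaborn package that lays out several plots of the same kind in a grid of panels, where each panel shows a subset of the data. This makes it possible to visualize the distribution of one variable, or the relationship between several variables, across those subsets. A FacetGrid has up to three dimensions: row, col, and hue. The row and col variables determine the panel layout, and hue distinguishes subsets by color within each panel.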

Example of Using lmplot() Method

The following code example uses a Stroke Prediction Dataset, loaded from the file stroke_data.csv. The stroke dataset contains the following fields:

['id', 'gender', 'age', 'hypertension', 'heart_disease', 'ever_married', 'work_type', 'Residence_type', 'avg_glucose_level', 'bmi', 'smoking_status', 'stroke']

The lmplot() function draws a scatter plot together with a fitted regression line, and it does so on a FacetGrid. The function takes several arguments. The x and y arguments name the columns to plot, and the data parameter takes the data frame itself. The hue parameter adds another dimension by showing a further column in the same plot using color, with a separate regression line fitted for each hue level. The code example below shows the relationship between two variables, with hue representing a third column of the CSV file:

import seaborn as sb
from matplotlib import pyplot as plt
import pandas as pd

# Load the stroke dataset and inspect the first rows and the column names
df1 = pd.read_csv("stroke_data.csv")
print(df1.head())
print(list(df1))

# Example of using lmplot(): each (x, y) pair is plotted once per hue column
xy_pairs = [("age", "bmi"), ("heart_disease", "hypertension"), ("bmi", "hypertension")]
hue_columns = ["gender", "ever_married", "work_type", "smoking_status"]

for hue in hue_columns:
    for x, y in xy_pairs:
        sb.lmplot(x=x, y=y, hue=hue, data=df1)
        plt.show()

Output

Figure: Demonstration of lmplot()
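Because lmplot() draws on a FacetGrid, it can also split the data into separate panels with the col parameter. The following is a minimal sketch on synthetic data. The values are made up, and only the column names age, bmi, gender, and ever_married mirror the stroke dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sb

# Synthetic stand-in for the stroke data; the values are invented for illustration
rng = np.random.default_rng(1)
n = 200
age = rng.uniform(18, 80, n)
df = pd.DataFrame({
    "age": age,
    "bmi": 22 + 0.08 * age + rng.normal(0, 3, n),
    "gender": rng.choice(["Male", "Female"], n),
    "ever_married": rng.choice(["Yes", "No"], n),
})

# col creates one panel per category; hue fits a separate line per gender
g = sb.lmplot(x="age", y="bmi", hue="gender", col="ever_married", data=df)
g.set_axis_labels("Age (years)", "BMI")

# lmplot() returns the FacetGrid it drew on, so the grid can be customized further
print(type(g).__name__)
```

Using col instead of a second hue keeps each panel readable when the categories overlap heavily.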

Residual of a Linear Regression

When we draw the regression line on a scatter plot, not all the points fall on the line. The residual of a data point is the vertical distance from the point to the regression line, that is, the observed value minus the value predicted by the line. A point above the regression line has a positive residual. Similarly, a point below the line has a negative residual, and a point on the line has a residual of zero.
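The idea of a residual can be sketched numerically. The example below fits a straight line with NumPy's polyfit() and subtracts the predicted values from the observed ones. The five data points are invented for illustration.

```python
import numpy as np

# Made-up sample points that lie roughly on a straight line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a degree-1 polynomial (a straight line) by least squares
slope, intercept = np.polyfit(x, y, deg=1)
predicted = slope * x + intercept

# Residual = observed value minus predicted value
residuals = y - predicted

for xi, r in zip(x, residuals):
    side = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"x={xi:.0f}: residual={r:+.3f} ({side} the line)")
```

For a least-squares line with an intercept, the residuals sum to zero up to floating-point error, so positive and negative residuals balance out.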

Benefits of Determining Residuals

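Since residuals tell us how far the actual values deviate from the predicted values, they help us judge how well the regression model fits the data. Residuals scattered randomly around zero suggest that a linear model is appropriate, whereas a visible pattern suggests that it is not.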
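One possible way to turn residuals into a single number is to compute the root mean squared error (RMSE) and the coefficient of determination (R squared). The helper residual_summary() below is a hypothetical function written for this sketch, and the sample data is synthetic.

```python
import numpy as np

def residual_summary(x, y):
    """Fit a straight line to (x, y) and return (rmse, r_squared)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)        # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return rmse, 1 - ss_res / ss_tot

# Synthetic data: a linear trend plus random noise
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0, 1.5, x.size)

rmse, r2 = residual_summary(x, y)
print(f"RMSE = {rmse:.3f}, R^2 = {r2:.3f}")
```

A smaller RMSE and an R squared closer to 1 indicate that the line follows the data more closely, but neither number replaces looking at the residual plot itself.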

Example of Using residplot() Method

The following example uses a dataset of the Daily Temperature of Major Cities, loaded from the file city_temperature.csv. Since the dataset has a very large number of rows, we apply a series of filters. First, the rows of the Asia region are retrieved. Next, India is selected for the Country field and Delhi for the City field. We then select the rows for the years 2020 and 1995 into two separate data frames, keeping only the months up to May for 1995. Finally, we draw the regression plots and the residual plots for both data frames.

import seaborn as sb
from matplotlib import pyplot as plt
import pandas as pd

df1=pd.read_csv("city_temperature.csv")
print(df1.head())

print(list(df1))

#Printing unique values of the Region field
print(df1['Region'].unique())

print(df1.loc[df1["Region"]=="Asia"])
#Retrieve data where the Region is Asia
df2=df1.loc[df1["Region"]=="Asia"]
print(df2['Country'].unique())

# From the Asia region, retrieve the data where Country is India
df3=df2.loc[df2["Country"]=="India"]
print(df3)

# Fetch data from the City=Delhi
df4=df3.loc[df3["City"]=="Delhi"]
print(df4)

# Data from the Year=2020
df5=df4.loc[df4["Year"]==2020]
print(df5)

# Draw Regression Plot
sb.regplot(y="AvgTemperature", x="Month", data=df5)
plt.title("Regression Plot on Month-wise Average Temperature for the year 2020")
plt.show()

# Data from the Year=1995
df6=df4.loc[df4["Year"]==1995]
print(df6)

# Data from Year=1995 and Month up to 5
df7=df6.loc[df6["Month"]<=5]
print(df7)

# Draw Regression Plot
lim=sb.regplot(y="AvgTemperature", x="Month", data=df7)
lim.set(ylim=(50, 100))
plt.title("Regression Plot on Month-wise Average Temperature for the year 1995")
plt.show()

# Draw Residual Plot for 2020
sb.residplot(y="AvgTemperature", x="Month", data=df5)
plt.title("Residual Plot on Month-wise Average Temperature for the year 2020")
plt.show()
# Draw Residual Plot for 1995
sb.residplot(y="AvgTemperature", x="Month", data=df7)
plt.title("Residual Plot on Month-wise Average Temperature for the year 1995")
plt.show()

Output

Figure: Regression Plots and Residual Plots
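A residual plot is easiest to read next to the fit it came from. The sketch below places regplot() and residplot() side by side on synthetic data. The seasonal temperature curve is an assumption made up for this example, and only the column names Month and AvgTemperature match the dataset above.

```python
import numpy as np
import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

# Synthetic seasonal temperatures: warm in mid-year, cooler at both ends
rng = np.random.default_rng(2)
month = np.tile(np.arange(1, 13), 10).astype(float)
temp = 60 + 30 * np.sin((month - 1) / 11 * np.pi) + rng.normal(0, 2, month.size)
df = pd.DataFrame({"Month": month, "AvgTemperature": temp})

# Left panel: linear fit; right panel: residuals of that same linear fit
fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(10, 4))
sb.regplot(x="Month", y="AvgTemperature", data=df, ax=ax_fit)
sb.residplot(x="Month", y="AvgTemperature", data=df, ax=ax_res)
ax_fit.set_title("Linear fit")
ax_res.set_title("Residuals")
fig.tight_layout()
```

With data shaped like this, the residuals form an arch instead of a random band around zero, which hints that a straight line is a poor model for a full year of temperatures.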

Filtering the Dataset

Figure: Filtering the Dataset
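The step-by-step filters can also be combined into a single boolean mask or a query() expression. The sketch below assumes the same column names as the temperature dataset, but the six rows and their values are invented so that the example runs without the CSV file.

```python
import pandas as pd

# Tiny made-up sample with the same column names as the temperature dataset
df = pd.DataFrame({
    "Region": ["Asia", "Asia", "Europe", "Asia", "Asia", "Asia"],
    "Country": ["India", "India", "France", "India", "Japan", "India"],
    "City": ["Delhi", "Delhi", "Paris", "Mumbai", "Tokyo", "Delhi"],
    "Year": [2020, 1995, 2020, 2020, 2020, 1995],
    "Month": [1, 3, 1, 1, 1, 8],
    "AvgTemperature": [57.2, 72.5, 40.1, 75.3, 45.0, 88.0],
})

# Combine all conditions with & ; each condition needs its own parentheses
mask = (
    (df["Region"] == "Asia")
    & (df["Country"] == "India")
    & (df["City"] == "Delhi")
    & (df["Year"] == 2020)
)
delhi_2020 = df.loc[mask]

# The same kind of filter written as a query string, here for 1995 up to May
delhi_1995 = df.query("City == 'Delhi' and Year == 1995 and Month <= 5")

print(delhi_2020)
print(delhi_1995)
```

The chained version prints each intermediate frame, which is handy while exploring. The combined mask is shorter once the filter is settled.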

Summary

This article demonstrated two functions from the regression API of the Seaborn package. In general, the lmplot() function shows the linear relationship between two variables, optionally split by further variables, whereas the residplot() function plots the residuals of a regression, which helps in assessing how well the model fits the data.

