February 24, 2021

Programmingempire

In this article on Visualizing Regression Models with lmplot() and residplot() in Seaborn, I will explain these two functions of the Seaborn package.

What is a FacetGrid?

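A FacetGrid is a class in the Seaborn package that draws a grid of panels (subplots). Each panel shows the same kind of plot for a different subset of the data. This lets us visualize the distribution of one variable, or the relationship between multiple variables, separately within each subset. A FacetGrid has three dimensions: row, col, and hue. The row and col variables decide which panel a data point falls into, while the hue variable gives each of its levels a different color within a panel.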

Example of Using lmplot() Method

The following code example uses the Stroke Prediction Dataset, available here. The dataset contains the following fields:

['id', 'gender', 'age', 'hypertension', 'heart_disease', 'ever_married', 'work_type', 'Residence_type', 'avg_glucose_level', 'bmi', 'smoking_status', 'stroke']

The lmplot() function draws a scatter plot together with a fitted linear regression line, and it does so on a FacetGrid. The arguments x and y name the columns to plot, and the data parameter takes the DataFrame. In addition, the hue parameter adds another dimension: it maps a third column to color, and a separate regression line is fitted for each level of that column. The code example given below shows the relationship between two variables, with hue representing a third column of the CSV file:
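To see that lmplot() fits one regression line per hue level, the sketch below generates synthetic data in which two groups follow different linear trends (the slopes 0.1 and 0.2 are invented for illustration), draws them with lmplot(), and recovers each group's slope with numpy.polyfit(). The numeric fits are a separate cross-check; they are not read back from the plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sb

rng = np.random.default_rng(1)
n = 100
age = rng.uniform(20, 80, 2 * n)
gender = np.repeat(["Female", "Male"], n)

# Hypothetical trends: slope 0.1 for Female, 0.2 for Male, plus Gaussian noise
true_slope = np.where(gender == "Female", 0.1, 0.2)
bmi = 20 + true_slope * age + rng.normal(0, 1, 2 * n)
df = pd.DataFrame({"age": age, "bmi": bmi, "gender": gender})

# lmplot() returns a FacetGrid; ci=None skips the bootstrap confidence band
g = sb.lmplot(x="age", y="bmi", hue="gender", data=df, ci=None)

# Fit the same per-group straight lines numerically
slopes = {}
for level, group in df.groupby("gender"):
    slope, intercept = np.polyfit(group["age"], group["bmi"], deg=1)
    slopes[level] = slope
    print(f"{level}: slope={slope:.3f}, intercept={intercept:.3f}")
```

The printed slopes should land close to 0.1 and 0.2, matching the two differently tilted lines that lmplot() draws.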

import seaborn as sb
from matplotlib import pyplot as plt
import pandas as pd

# Load the stroke dataset and inspect it
df1 = pd.read_csv("stroke_data.csv")
print(df1.head())
print(list(df1))

# Example of using lmplot(): every (x, y) pair is plotted once for each hue column
hue_columns = ["gender", "ever_married", "work_type", "smoking_status"]
xy_pairs = [("age", "bmi"), ("heart_disease", "hypertension"), ("bmi", "hypertension")]

for hue in hue_columns:
    for x, y in xy_pairs:
        sb.lmplot(x=x, y=y, hue=hue, data=df1)
        plt.show()
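Because lmplot() draws on a FacetGrid, separate figures like the ones above can also be combined into panels with the col (or row) parameter. A minimal sketch on synthetic data follows; the column names mirror the stroke dataset, but the values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sb

# Hypothetical data shaped like a few columns of the stroke dataset
rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({
    "age": rng.uniform(20, 80, n),
    "bmi": rng.normal(28, 5, n),
    "gender": rng.choice(["Male", "Female"], n),
    "smoking_status": rng.choice(["never smoked", "smokes", "formerly smoked"], n),
})

# One panel per smoking_status value, each with one regression line per gender
g = sb.lmplot(x="age", y="bmi", hue="gender", col="smoking_status", data=df, ci=None)
print(g.axes.shape)  # one row of panels, one column per smoking_status level
```

This trades several separate windows for a single figure in which the panels share axes, which makes comparisons across groups easier.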

Output

Residual of a Linear Regression

When we draw a regression line through a scatter plot, usually not all the points fall on the line. The residual of a point is the vertical distance from that point to the regression line, that is, the observed value minus the value predicted by the line (e = y − ŷ). A point above the regression line has a positive residual. Similarly, a point below the line has a negative residual, and a point exactly on the line has a zero residual.
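A small worked example of this definition, using five invented points: fit a least-squares line with numpy.polyfit() and compute each residual as the observed value minus the predicted value.

```python
import numpy as np

# Hypothetical toy data (not from either dataset in this article)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.5, 5.5, 8.5, 9.5])

# Least-squares line: y_hat = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# Residual = observed value - predicted value
residuals = y - y_hat

for xi, yi, r in zip(x, y, residuals):
    if np.isclose(r, 0.0):
        position = "on the line (zero residual)"
    elif r > 0:
        position = "above the line (positive residual)"
    else:
        position = "below the line (negative residual)"
    print(f"x={xi:.0f}, y={yi:.1f}, residual={r:+.2f}: {position}")
```

For these points the fitted line is y = 1.9x + 0.3, so the residuals are -0.2, 0.4, -0.5, 0.6 and -0.3. For a least-squares line with an intercept, the residuals always sum to zero.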

Benefits of Determining Residuals

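Since residuals measure how far each actual value deviates from the value predicted by the model, they help us judge how accurately our regression model fits the data. Small residuals scattered randomly around zero suggest a good linear fit, while a clear pattern in the residuals suggests that a straight line may not be an appropriate model.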
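As one hedged illustration of how residuals quantify accuracy, the sketch below summarises them with two common measures: the root-mean-square error, RMSE = sqrt(Σe² / n), and the coefficient of determination, R² = 1 − Σe² / Σ(y − ȳ)². The toy data are invented, and these metrics are a standard choice rather than something the article itself computes.

```python
import numpy as np

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.5, 5.5, 8.5, 9.5])

slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Sum of squared residuals (variation the line does not explain)
sse = np.sum(residuals ** 2)
# Total variation of y around its mean
sst = np.sum((y - y.mean()) ** 2)

rmse = np.sqrt(sse / len(y))  # typical size of a residual, in the units of y
r2 = 1 - sse / sst            # share of the variation in y explained by the line

print(f"RMSE = {rmse:.3f}, R^2 = {r2:.3f}")
```

Here the sum of squared residuals is 0.9 and the total sum of squares is 37, so R² is about 0.976: the line explains almost all of the variation in these five points.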

Example of Using residplot() Method

The following example uses the Daily Temperature of Major Cities dataset, which is available here. Since the dataset contains a very large number of rows, we filter it in several steps. First, we retrieve the rows for the Asia region. From these, we select India in the Country field and then Delhi in the City field. Next, we create two data frames: one with the rows for the year 2020, and one with the rows for the year 1995, restricted to the months January to May (Month <= 5). Finally, we draw a regression plot and a residual plot for each data frame.
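The step-by-step filtering described above can also be written as a single boolean mask. The sketch below uses a tiny invented DataFrame with the same column names as the temperature dataset (Region, Country, City, Month, Year, AvgTemperature); the rows and temperatures are hypothetical. It checks that both approaches select the same rows.

```python
import pandas as pd

# Hypothetical miniature version of the temperature data
df1 = pd.DataFrame({
    "Region": ["Asia", "Asia", "Asia", "Europe", "Asia"],
    "Country": ["India", "India", "Japan", "France", "India"],
    "City": ["Delhi", "Delhi", "Tokyo", "Paris", "Mumbai"],
    "Month": [1, 6, 1, 1, 3],
    "Year": [2020, 1995, 2020, 2020, 2020],
    "AvgTemperature": [58.1, 93.4, 41.0, 40.2, 80.5],
})

# Step by step: Region, then Country, then City, then Year
step = df1.loc[df1["Region"] == "Asia"]
step = step.loc[step["Country"] == "India"]
step = step.loc[step["City"] == "Delhi"]
step_2020 = step.loc[step["Year"] == 2020]

# Equivalent single mask; each condition needs its own parentheses
mask = (
    (df1["Region"] == "Asia")
    & (df1["Country"] == "India")
    & (df1["City"] == "Delhi")
    & (df1["Year"] == 2020)
)
combined_2020 = df1.loc[mask]

print(combined_2020.equals(step_2020))
```

The step-by-step version is easier to inspect while exploring, since each intermediate frame can be printed; the single mask is more compact once the filters are settled.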

import seaborn as sb
from matplotlib import pyplot as plt
import pandas as pd

df1 = pd.read_csv("city_temperature.csv")
print(df1.head())

print(list(df1))

# Print the unique values of the Region field
print(df1['Region'].unique())

# Retrieve the rows where the Region is Asia
df2 = df1.loc[df1["Region"] == "Asia"]
print(df2)
print(df2['Country'].unique())

# From the Asia region, retrieve the rows where the Country is India
df3 = df2.loc[df2["Country"] == "India"]
print(df3)

# Keep only the rows for the city of Delhi
df4 = df3.loc[df3["City"] == "Delhi"]
print(df4)

# Rows for the year 2020
df5 = df4.loc[df4["Year"] == 2020]
print(df5)

# Draw the regression plot for 2020
sb.regplot(x="Month", y="AvgTemperature", data=df5)
plt.title("Regression Plot on Month-wise Average Temperature for the year 2020")
plt.show()

# Rows for the year 1995
df6 = df4.loc[df4["Year"] == 1995]
print(df6)

# Rows for the year 1995, months 1 to 5 only
df7 = df6.loc[df6["Month"] <= 5]
print(df7)

# Draw the regression plot for 1995; regplot() returns the Axes it drew on
ax = sb.regplot(x="Month", y="AvgTemperature", data=df7)
ax.set(ylim=(50, 100))  # fix the y-axis range to 50-100
plt.title("Regression Plot on Month-wise Average Temperature for the year 1995")
plt.show()

# Draw the residual plot for 2020
sb.residplot(x="Month", y="AvgTemperature", data=df5)
plt.title("Residual Plot on Month-wise Average Temperature for the year 2020")
plt.show()

# Draw the residual plot for 1995 (January to May)
sb.residplot(x="Month", y="AvgTemperature", data=df7)
plt.title("Residual Plot on Month-wise Average Temperature for the year 1995")
plt.show()
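To connect residplot() with the definition of a residual, the sketch below draws a residual plot for invented monthly data and compares the plotted y-values with residuals computed directly by numpy.polyfit(). Reading the points back through ax.collections assumes that seaborn draws them as a single scatter collection; that is an implementation detail and may differ between versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sb

# Hypothetical monthly data whose trend rises and then falls (not real temperatures)
month = np.arange(1, 13, dtype=float)
temp = 60 + 30 * np.sin((month - 1) * np.pi / 11)
df = pd.DataFrame({"Month": month, "AvgTemperature": temp})

# residplot() fits a straight line of AvgTemperature on Month and plots y - y_hat
ax = sb.residplot(x="Month", y="AvgTemperature", data=df)

# The same residuals computed directly from a least-squares line
slope, intercept = np.polyfit(df["Month"], df["AvgTemperature"], deg=1)
residuals = df["AvgTemperature"] - (slope * df["Month"] + intercept)

# y-coordinates of the plotted points
plotted = np.asarray(ax.collections[0].get_offsets())[:, 1]
print(np.allclose(plotted, residuals))
```

Because the invented values rise and then fall, the residuals are negative at both ends and positive in the middle. A curved pattern like this is the kind of signal that a straight line is not a good model for the data.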

Output

Filtering the Dataset

Summary

This article on Visualizing Regression Models with lmplot() and residplot() in Seaborn demonstrates the use of both of these regression plotting functions of the Seaborn package. In general, the lmplot() function shows the relationship between two variables along with a fitted regression line, whereas the residplot() function plots the residuals of a linear regression, which helps us assess how accurately the model fits the data.
