Data Visualization with Seaborn
4. Data Visualization Techniques
Seaborn is a statistical data visualization library built on top of Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. In this lesson, you'll explore advanced plotting techniques using Seaborn.
Seaborn's Statistical Plot Types
Seaborn offers several specialized plot types for visualizing statistical relationships. Let's explore some of these:
1. KDE (Kernel Density Estimation) Plot
KDE plots show a smoothed estimate of a dataset's distribution. With two variables, the estimate is drawn as density contours:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
plt.switch_backend('Agg')
# Generate data
data = np.random.multivariate_normal([0, 0], [[1, .5], [.5, 1]], size=200)
x, y = data.T
# Create KDE plot
sns.set_style("whitegrid")
plt.figure(figsize=(10, 6))
# fill=True shades the area between contour levels
sns.kdeplot(x=x, y=y, cmap="viridis", fill=True, cbar=True)
plt.title("2D Kernel Density Estimation")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
print("Plot created successfully.")
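To make concrete what a KDE plot estimates, here is a minimal NumPy sketch of a one-dimensional Gaussian KDE. It is not Seaborn's implementation: the helper name `gaussian_kde_1d` is hypothetical, and the bandwidth of 0.4 is an arbitrary choice made for illustration (Seaborn chooses its bandwidth automatically).

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Hypothetical helper: average of Gaussian bumps centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every sample: shape (grid, samples)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Averaging the kernels and dividing by the bandwidth yields a density
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=200)
grid = np.linspace(-6, 6, 601)

density = gaussian_kde_1d(samples, grid, bandwidth=0.4)  # bandwidth is an assumption

# A density should integrate to roughly 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(f"Approximate area under the curve: {area:.3f}")
```

A smaller bandwidth gives a bumpier curve that follows individual samples, while a larger one gives a smoother curve that can hide real structure.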
2. Pair Plot
Pair plots are useful for exploring relationships between multiple variables:
import seaborn as sns
import matplotlib.pyplot as plt
plt.switch_backend('Agg')
# Load dataset
iris = sns.load_dataset("iris")
# Create pair plot
sns.set_style("ticks")
# pairplot builds its own figure, so set the size per subplot with height (in inches)
sns.pairplot(iris, hue="species", markers=["o", "s", "D"], height=2.5)
plt.suptitle("Iris Dataset - Pair Plot", y=1.02)
plt.show()
print("Plot created successfully.")
3. Categorical Plot: Swarm Plot
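A swarm plot is a strip plot whose points are shifted along the categorical axis so that they do not overlap. It shows every observation while still hinting at the shape of the distribution within each category, much as a violin plot does: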
import seaborn as sns
import matplotlib.pyplot as plt
plt.switch_backend('Agg')
# Load dataset
tips = sns.load_dataset("tips")
# Create swarm plot
sns.set_style("whitegrid")
plt.figure(figsize=(12, 6))
sns.swarmplot(x="day", y="total_bill", hue="sex", data=tips, palette="Set3")
plt.title("Distribution of Total Bill by Day and Gender")
plt.xlabel("Day of the Week")
plt.ylabel("Total Bill ($)")
plt.show()
print("Plot created successfully.")
Key Seaborn Features
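1. Example Datasets: sns.load_dataset() fetches sample datasets such as iris, tips, and titanic from Seaborn's online repository and caches them locally, so the first load needs an internet connection.
2. Statistical Estimation: Many Seaborn plots compute statistical summaries by default. A bar plot, for example, shows the mean of each group with error bars.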
3. Aesthetics: Seaborn provides beautiful default styles and color palettes.
4. Categorical Data: Specialized plots, such as swarm and violin plots, for comparing a numeric variable across categories.
5. Integration with Pandas: Seamless integration with Pandas DataFrames for easy plotting.
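The statistical estimation and Pandas integration points can be illustrated together. In this sketch, sns.barplot() receives a DataFrame and column names, then aggregates the rows of each group into a mean on its own. The DataFrame and its `group` and `value` columns are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the lesson's examples
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: three repeated measurements per group
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

fig, ax = plt.subplots(figsize=(6, 4))
# Columns are referenced by name; Seaborn aggregates each group to its mean
sns.barplot(data=df, x="group", y="value", ax=ax)

# Compare the drawn bar heights with the means computed directly in Pandas
bar_heights = sorted(patch.get_height() for patch in ax.patches)
group_means = sorted(df.groupby("group")["value"].mean())
print(bar_heights, group_means)
```

The error bars drawn on top of each bar come from the same default estimation step, so no manual aggregation is needed before plotting.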
Practice Exercises
Now it's time to apply what you've learned about Seaborn!
Exercise 1: Create a Violin Plot
Using the 'titanic' dataset, create a violin plot to show the distribution of age for each passenger class.
- Data Loading: Load the 'titanic' dataset using Seaborn.
- Violin Plot Creation: Use sns.violinplot() to create the plot.
- Customization: Add appropriate labels, title, and adjust the figure size.
import seaborn as sns
import matplotlib.pyplot as plt
plt.switch_backend('Agg')
# Load the dataset
titanic = sns.load_dataset("___")
# Create the violin plot
plt.figure(figsize=(12, 6))
sns.___(___) # Add your code here
plt.title("___")
plt.xlabel("___")
plt.ylabel("___")
plt.show()
print("Plot created successfully.")
# Your interpretation here:
interpretation = "___" # Replace with your interpretation of the plot
print(interpretation)
Exercise 2: Create a Heatmap
Using the 'flights' dataset, create a heatmap to visualize the number of passengers for each month across different years.
- Data Preparation: Load the 'flights' dataset and pivot it to create a matrix.
- Heatmap Creation: Use sns.heatmap() to create the heatmap.
- Customization: Add a color bar, labels, and title to the heatmap.
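As a hint for the data preparation step, here is a small sketch of how pivoting turns a long table into a matrix. It deliberately uses a made-up table rather than the 'flights' data; the `city`, `quarter`, and `sales` columns are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, quarter) pair
long_df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "sales": [10, 12, 7, 9],
})

# index becomes the rows, columns becomes the column headers, values fills the cells
wide_df = long_df.pivot(index="city", columns="quarter", values="sales")
print(wide_df)
```

The resulting matrix has one row per city and one column per quarter, which is the shape a heatmap expects.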
import seaborn as sns
import matplotlib.pyplot as plt
plt.switch_backend('Agg')
# Load and prepare the data
flights = sns.load_dataset("flights")
# Recent Pandas versions require keyword arguments for pivot()
flights_pivot = flights.pivot(index="___", columns="___", values="___")
# Create the heatmap
plt.figure(figsize=(12, 8))
sns.___(___) # Add your code here
plt.title("___")
plt.show()
print("Plot created successfully.")
# Your analysis here:
analysis = "___" # Replace with your analysis of the heatmap
print(analysis)
Summary
Seaborn provides powerful tools for creating informative and aesthetically pleasing statistical visualizations. By leveraging its high-level interface and integration with Pandas, you can quickly explore and communicate complex data relationships. Continue practicing with different Seaborn plot types and customization options to enhance your data visualization skills and effectively convey insights from your data.