Time Series Data Analysis
5. Advanced Data Manipulation with Python
Time series analysis is crucial for understanding data that changes over time. In this lesson, you'll learn how to work with time series data using Pandas and apply techniques such as resampling, rolling statistics, and seasonal decomposition.
Introduction to Time Series Data
Time series data is a sequence of data points indexed in time order. Common examples include stock prices, weather data, and sales figures. Pandas handles such data through the DatetimeIndex, which supports date-based selection, resampling, and rolling-window calculations.
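In practice, dates often arrive as strings, for example from a CSV file. The minimal sketch below assumes a small hand-made table with hypothetical `date` and `sales` columns and shows how pd.to_datetime() and set_index() turn it into a time-indexed DataFrame:

```python
import pandas as pd

# Hypothetical raw data: dates stored as plain strings, out of order
raw = pd.DataFrame({
    "date": ["2023-01-03", "2023-01-01", "2023-01-02"],
    "sales": [120, 100, 110],
})

# Parse the strings into datetime64 values, then use them as a sorted index
raw["date"] = pd.to_datetime(raw["date"])
sales = raw.set_index("date").sort_index()

print(type(sales.index).__name__)  # DatetimeIndex
print(sales)
```

Sorting the index matters: date slicing and time-based rolling windows expect a monotonically increasing DatetimeIndex.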
Creating and Manipulating Time Series Data
Let's start by creating a simple time series dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.switch_backend('Agg')  # non-interactive backend: figures render off-screen, so plt.show() opens no window
# Create a date range
dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
# Create a series with random walk data
np.random.seed(0)
data = np.random.randn(len(dates)).cumsum()
ts = pd.Series(data, index=dates)
print("First 10 rows of the time series:")
print(ts.head(10))
# Plot the time series
plt.figure(figsize=(12, 6))
ts.plot()
plt.title('Random Walk Time Series')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()
print("Plot created successfully.")
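Manipulating a time series often means comparing each value with an earlier one. This short sketch uses a toy series (the values are made up for illustration) to show shift(), which moves values forward along the index, and diff(), which subtracts the previous value:

```python
import pandas as pd

# Toy daily series for illustration
s = pd.Series([1.0, 3.0, 6.0, 10.0],
              index=pd.date_range("2023-01-01", periods=4, freq="D"))

lagged = s.shift(1)   # each value moved one day later; the first entry becomes NaN
changes = s.diff()    # equivalent to s - s.shift(1)

print(pd.DataFrame({"value": s, "lagged": lagged, "change": changes}))
```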
Resampling and Frequency Conversion
Resampling allows you to change the frequency of your time series data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.switch_backend('Agg')
# Using the same time series as before
dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
np.random.seed(0)
data = np.random.randn(len(dates)).cumsum()
ts = pd.Series(data, index=dates)
# Resample to monthly frequency ('ME' = month end; on pandas < 2.2 use 'M')
monthly_mean = ts.resample('ME').mean()
# Plot original and resampled data
plt.figure(figsize=(12, 6))
ts.plot(label='Daily')
monthly_mean.plot(label='Monthly Mean')
plt.title('Original vs Resampled Time Series')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()
print("Plot created successfully.")
print("\nFirst 5 rows of monthly mean:")
print(monthly_mean.head())
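The example above downsamples (daily to monthly). Resampling also works in the other direction: upsampling to a finer frequency creates new timestamps that need values. A minimal sketch with made-up weekly data compares leaving the gaps empty, forward-filling, and linear interpolation:

```python
import pandas as pd

# Three weekly observations; 2023-01-01 is a Sunday, matching the 'W' (W-SUN) anchor
weekly = pd.Series([10.0, 12.0, 11.0],
                   index=pd.date_range("2023-01-01", periods=3, freq="W"))

daily_empty = weekly.resample("D").asfreq()        # new days are NaN
daily_ffill = weekly.resample("D").ffill()         # carry the last known value forward
daily_interp = weekly.resample("D").interpolate()  # straight line between known points

print(pd.DataFrame({"asfreq": daily_empty, "ffill": daily_ffill,
                    "interpolate": daily_interp}).head(8))
```

Forward-filling suits values that stay constant until updated (such as a posted price), while interpolation assumes a smooth change between observations.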
Rolling Statistics and Moving Averages
Rolling statistics are useful for smoothing time series and identifying trends:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.switch_backend('Agg')
# Using the same time series as before
dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
np.random.seed(0)
data = np.random.randn(len(dates)).cumsum()
ts = pd.Series(data, index=dates)
# Calculate the 30-day moving average (the first 29 values are NaN until the window is full)
rolling_mean = ts.rolling(window=30).mean()
# Plot original data and moving average
plt.figure(figsize=(12, 6))
ts.plot(label='Original')
rolling_mean.plot(label='30-day Moving Average')
plt.title('Time Series with Moving Average')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()
print("Plot created successfully.")
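A fixed 30-observation window is only one option. The sketch below, on a toy five-day series, contrasts a fixed window, a window that accepts partial data via min_periods, a window defined by elapsed time ('3D'), and an exponentially weighted average (ewm) that discounts older values instead of dropping them:

```python
import pandas as pd

# Toy daily series for illustration
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range("2023-01-01", periods=5, freq="D"))

fixed = s.rolling(window=3).mean()                   # NaN until 3 values are available
partial = s.rolling(window=3, min_periods=1).mean()  # average whatever is available
by_time = s.rolling("3D").mean()                     # window covers the last 3 days
smooth = s.ewm(span=3, adjust=False).mean()          # alpha = 2 / (span + 1) = 0.5

print(pd.DataFrame({"fixed": fixed, "partial": partial,
                    "by_time": by_time, "ewm": smooth}))
```

Time-based windows such as '3D' are useful when observations are irregularly spaced, because the window always spans the same length of time rather than the same number of rows.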
Key Time Series Concepts
1. Date Range: Creating and manipulating date ranges with pd.date_range().
2. Resampling: Changing the frequency of time series data.
3. Rolling Statistics: Calculating moving averages and other rolling computations.
4. Time-based Indexing: Efficiently selecting data based on dates or date ranges.
5. Seasonality and Trends: Identifying patterns and long-term movements in data.
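Time-based indexing and a basic seasonal profile are not shown in the earlier examples. The sketch below rebuilds the lesson's random walk and selects data with partial date strings, label slices, and index attributes, then averages by calendar month. Because a random walk has no true seasonality, the monthly averages here only illustrate the mechanics:

```python
import numpy as np
import pandas as pd

# Same construction as the lesson's random walk (daily, 2023)
dates = pd.date_range(start="2023-01-01", end="2023-12-31", freq="D")
np.random.seed(0)
ts = pd.Series(np.random.randn(len(dates)).cumsum(), index=dates)

march = ts.loc["2023-03"]                    # partial string: every day in March
mid_jan = ts.loc["2023-01-10":"2023-01-15"]  # label slice, both ends included
weekdays_only = ts[ts.index.dayofweek < 5]   # Monday=0 ... Friday=4

# A simple seasonal profile: the average value for each calendar month
monthly_profile = ts.groupby(ts.index.month).mean()
print(monthly_profile)
```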
Practice Exercises
Now it's time to apply what you've learned about time series analysis!
Exercise 1: Stock Price Analysis
Using the stock price dataset generated below, perform the following analyses:
- Calculate and plot the daily returns of the stock.
- Compute and visualize the 20-day and 50-day moving averages.
- Resample the data to weekly frequency and calculate the weekly price range (the highest minus the lowest price in each week).
Hints:
- Use pct_change() to calculate daily returns.
- Use rolling() for moving averages.
- Apply resample() for weekly data and agg() for custom calculations.
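Before tackling the full dataset, the methods named above can be tried on a tiny made-up price series. This is only an illustration of the calls, not the exercise solution; in the exercise the prices live in the df['price'] column:

```python
import pandas as pd

# Made-up prices on three consecutive business days (Mon-Wed)
prices = pd.Series([100.0, 110.0, 99.0],
                   index=pd.date_range("2023-01-02", periods=3, freq="B"))

returns = prices.pct_change()  # (p_t - p_{t-1}) / p_{t-1}; the first value is NaN

# Weekly range: aggregate each week to its max and min, then subtract
weekly_stats = prices.resample("W").agg(["max", "min"])
weekly_range = weekly_stats["max"] - weekly_stats["min"]

print(returns)
print(weekly_range)
```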
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.switch_backend('Agg')
# Create a sample stock price dataset
dates = pd.date_range(start='2022-01-01', end='2023-12-31', freq='B')
np.random.seed(0)
prices = 100 + np.random.randn(len(dates)).cumsum()
df = pd.DataFrame({'price': prices}, index=dates)
# Your code here
# 1. Calculate and plot daily returns
daily_returns = ___
# 2. Compute and visualize moving averages
ma_20 = ___
ma_50 = ___
# 3. Resample to weekly frequency and calculate price range
weekly_range = ___
# Plotting
plt.figure(figsize=(12, 8))
# Plot your results here
plt.show()
print("Plot created successfully.")
# Print results
print("First 5 rows of daily returns:")
print(___)
print("\nFirst 5 rows of weekly price range:")
print(___)
Exercise 2: Advanced Time Series Analysis
Using the synthetic temperature dataset created below, perform these advanced analyses:
- Identify and visualize the seasonal pattern in the data.
- Decompose the time series into trend, seasonal, and residual components.
- Forecast the next 30 days of temperatures using a simple moving average model.
Hints:
- Use groupby() and mean() to identify seasonal patterns.
- Apply seasonal_decompose from statsmodels for time series decomposition.
- Implement a simple forecast based on historical averages.
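One way to read "simple moving average model" is a flat forecast: every future day receives the mean of the most recent observations. The sketch below assumes a toy 60-day history in place of df['temperature'] and a 7-day look-back window; both are illustrative choices, not values prescribed by the exercise:

```python
import numpy as np
import pandas as pd

# Toy daily history standing in for the exercise's temperature column
history = pd.Series(np.arange(60, dtype=float),
                    index=pd.date_range("2023-01-01", periods=60, freq="D"))

window = 7    # assumed look-back length; tune for your data
horizon = 30  # number of days to forecast

# Flat forecast: every future day gets the mean of the last `window` observations
level = history.iloc[-window:].mean()
future_index = pd.date_range(history.index[-1] + pd.Timedelta(days=1),
                             periods=horizon, freq="D")
forecast = pd.Series(level, index=future_index)
print(forecast.head())
```

A flat forecast ignores the yearly cycle, so it drifts away from the truth over a 30-day horizon for strongly seasonal data; averaging the same calendar days from previous years is a natural alternative to compare against.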
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
plt.switch_backend('Agg')
# Create a sample temperature dataset
dates = pd.date_range(start='2020-01-01', end='2023-12-31', freq='D')
np.random.seed(0)
temp = 20 + 10 * np.sin(np.arange(len(dates)) * 2 * np.pi / 365) + np.random.randn(len(dates)) * 3
df = pd.DataFrame({'temperature': temp}, index=dates)
# Your code here
# 1. Identify and visualize seasonal pattern
seasonal_pattern = ___
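# 2. Decompose the time series (daily data with a yearly cycle: pass period=365,
#    otherwise seasonal_decompose infers a 7-day period from the daily index)
decomposition = ___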
# 3. Forecast next 30 days
forecast = ___
# Plotting
plt.figure(figsize=(12, 10))
# Plot your results here
plt.show()
print("Plot created successfully.")
# Print results
print("Seasonal Pattern:")
print(___)
print("\nForecast for next 30 days:")
print(___)
Summary
Time series analysis is a powerful tool for understanding and forecasting data that changes over time. By mastering techniques such as resampling, rolling statistics, and decomposition, you can extract valuable insights from temporal data and make informed decisions based on historical patterns and trends. Continue practicing with different datasets and exploring more advanced time series techniques to enhance your skills in this crucial area of data analysis.