The Ultimate Summary: Ten Time Series Algorithms!!

昵稱69125444 2024-07-23


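Hello, I'm Xiaobai~

Today let's take a look at time series analysis algorithms.

Time series analysis matters mainly because of:

Trend identification: it helps us identify long-term trends and seasonal variation in the data, which gives forecasts something to stand on.
Anomaly detection: it can detect outliers and abnormal patterns, so potential problems are caught early.
Decision support: it provides data-driven insight for policy-making and strategic planning, which helps improve decisions.

Given that importance, here is what we cover today:

Autoregression (AR)
Moving average (MA)
Autoregressive moving average (ARMA)
Autoregressive integrated moving average (ARIMA)
Seasonal autoregressive integrated moving average (SARIMA)
Vector autoregression (VAR)
Vector autoregressive moving average (VARMA)
Long short-term memory networks (LSTM)
Prophet
Variational autoencoders (VAE)

Let's dive in~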

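1. Autoregression (AR, Autoregressive Model)

An autoregressive model assumes that the current value is a linear combination of the values at a number of past time steps.

Principle

The model uses the data at the previous p time points to predict the value at the current time point.

Core formula

$$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t$$

where:

$c$ is the constant term,
$\phi_i$ are the autoregressive coefficients,
$\varepsilon_t$ is the white-noise error.

The AR model can be estimated by running a linear regression of the series on its own lags. Over a sample $y_1, \dots, y_T$, that is:

$$\min_{c,\,\phi_1,\dots,\phi_p} \sum_{t=p+1}^{T} \Big( y_t - c - \sum_{i=1}^{p} \phi_i y_{t-i} \Big)^2$$

The coefficients $\phi_i$ are solved by least squares.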
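To make the least-squares step concrete, here is a minimal sketch that builds the lagged design matrix by hand and solves it with NumPy. The helper name `fit_ar_ols` and the simulated AR(2) coefficients are illustrative assumptions, not something from the original post.

```python
import numpy as np

def fit_ar_ols(y, p):
    """Estimate the constant and lag coefficients of an AR(p) model by ordinary least squares."""
    y = np.asarray(y, dtype=float)
    # Column i holds y_{t-i} for t = p..n-1; a leading column of ones estimates the constant.
    lagged = [y[p - i:len(y) - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y) - p)] + lagged)
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[0], coef[1:]

# Simulate an AR(2) series with known (made-up) coefficients.
rng = np.random.default_rng(0)
true_c, true_phi = 0.5, np.array([0.6, -0.3])
n = 5000
y = np.zeros(n)
for t in range(2, n):
    y[t] = true_c + true_phi[0] * y[t - 1] + true_phi[1] * y[t - 2] + rng.normal()

c_hat, phi_hat = fit_ar_ols(y, p=2)
print("estimated constant:", round(float(c_hat), 3))
print("estimated coefficients:", np.round(phi_hat, 3))
```

With a few thousand observations the estimates should land close to the values used in the simulation, which is a quick sanity check before moving to a library implementation.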

Core example

This example uses stock price data to show how to fit an AR model and forecast with it, producing several diagnostic plots along the way.

We take a stock price time series, preprocess it, fit an autoregressive model, and then draw the time series plot, the ACF plot, the PACF plot, and the forecast plot.

The yfinance library is used to download the stock prices.

import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Download the data
ticker = 'AAPL'  # Apple stock as the example
start_date = '2020-01-01'
end_date = '2023-01-01'
data = yf.download(ticker, start=start_date, end=end_date)
close_prices = data['Close'].squeeze()  # newer yfinance versions may return a one-column DataFrame here

# Plot the time series
plt.figure(figsize=(14, 7))
plt.plot(close_prices, label='Close Price')
plt.title('Apple Stock Close Prices')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.legend()
plt.show()

# ACF and PACF plots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))
plot_acf(close_prices, ax=ax1, lags=50)
plot_pacf(close_prices, ax=ax2, lags=50)
plt.show()

# Fit the autoregressive model
lags = 30
model = AutoReg(close_prices, lags=lags)
model_fit = model.fit()

# Forecast beyond the end of the sample
pred_start = len(close_prices)
pred_end = pred_start + 50
predictions = model_fit.predict(start=pred_start, end=pred_end)

# Plot the forecast
plt.figure(figsize=(14, 7))
plt.plot(close_prices, label='Observed')
plt.plot(predictions, label='Forecast', linestyle='--')
plt.title('Apple Stock Close Price Forecast')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.legend()
plt.show()

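Time series plot: shows Apple's daily closing price.
ACF and PACF plots: used to inspect the autocorrelation and partial autocorrelation of the series, which helps choose the order of the AR model.
Forecast plot: shows the stock price forecast for the next 50 days from the fitted AR model.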

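2. Moving Average (MA, Moving Average Model)

A moving average model assumes that the current value is a linear combination of the errors at a number of past time steps.

Principle

The model uses the errors at the previous q time points to predict the value at the current time point.

Core formula

$$y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

where:

$\mu$ is the constant term,
$\theta_j$ are the moving average coefficients,
$\varepsilon_t$ is the white-noise error.

Assuming the error terms $\varepsilon_t$ are independent, identically distributed white noise, the coefficients $\theta_j$ can be estimated by least squares (nonlinear, since the errors are not observed directly) or by maximum likelihood.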
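One practical consequence of the MA(q) definition is that the autocorrelation is zero beyond lag q. The sketch below simulates an MA(1) series and compares the sample autocorrelation with the textbook value $\theta / (1 + \theta^2)$ at lag 1. The helper `sample_acf` and the parameter values are illustrative assumptions.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
mu, theta = 10.0, 0.6          # made-up constant and MA coefficient
n = 20000
eps = rng.normal(size=n + 1)

# MA(1): y_t = mu + eps_t + theta * eps_{t-1}
y = mu + eps[1:] + theta * eps[:-1]

acf = sample_acf(y, max_lag=3)
rho1_theory = theta / (1 + theta ** 2)
print("sample ACF at lags 0-3:", np.round(acf, 3))
print("theoretical lag-1 autocorrelation:", round(rho1_theory, 3))
```

The lag-1 value should sit near the theoretical one while lags 2 and 3 hover around zero, which is the cut-off pattern people look for in an ACF plot when choosing q.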

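Core example

Suppose we have monthly sales data containing some seasonality and random fluctuation. We smooth the data with a 12-month rolling mean and compare the smoothed series with the original, then compute and plot the residuals. Note that a rolling mean is a smoothing technique that happens to share the name "moving average"; it is not the same thing as fitting the MA(q) model defined above.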

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Generate simulated time series data
np.random.seed(42)
n_periods = 120
date_range = pd.date_range(start='2010-01', periods=n_periods, freq='M')
seasonal_pattern = np.sin(2 * np.pi * date_range.month / 12)
random_noise = np.random.normal(scale=0.5, size=n_periods)
sales = 10 + seasonal_pattern + random_noise

# Build the DataFrame
data = pd.DataFrame({'Date': date_range, 'Sales': sales})
data.set_index('Date', inplace=True)

# Plot the raw data
plt.figure(figsize=(14, 6))
plt.plot(data.index, data['Sales'], label='Original Sales Data')
plt.title('Monthly Sales Data')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.legend()
plt.show()

# Smooth with a rolling mean (moving average)
window_size = 12
data['Sales_MA'] = data['Sales'].rolling(window=window_size).mean()

# Plot the smoothed data
plt.figure(figsize=(14, 6))
plt.plot(data.index, data['Sales'], label='Original Sales Data')
plt.plot(data.index, data['Sales_MA'], label=f'{window_size}-month Moving Average', color='red')
plt.title('Monthly Sales Data with Moving Average')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.legend()
plt.show()

# Compute the residuals
data['Residual'] = data['Sales'] - data['Sales_MA']

# Plot the residuals
plt.figure(figsize=(14, 6))
plt.plot(data.index, data['Residual'], label='Residuals', color='green')
plt.title('Residuals from Moving Average Model')
plt.xlabel('Date')
plt.ylabel('Residual')
plt.legend()
plt.show()

# Plot the ACF and PACF of the residuals
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
plot_acf(data['Residual'].dropna(), ax=axes[0], lags=40)
plot_pacf(data['Residual'].dropna(), ax=axes[1], lags=40)
axes[0].set_title('ACF of Residuals')
axes[1].set_title('PACF of Residuals')
plt.show()

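Generate the time series: a sine function produces the seasonal pattern, with random noise added.
Plot the raw data: draw the original monthly sales data.
Compute the moving average: use a rolling window to compute the moving average.
Plot the smoothed data: compare the smoothed series with the original.
Compute and plot the residuals: compute the residuals and plot them over time.
Plot the ACF and PACF: analyze the autocorrelation and partial autocorrelation of the residuals.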

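3. Autoregressive Moving Average (ARMA, Autoregressive Moving Average Model)

The ARMA model combines the autoregressive and moving average models.

Principle

The model uses the data at the previous p time points and the errors at the previous q time points to predict the value at the current time point.

Core formula

$$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

ARMA joins the autoregressive and moving average models, and the parameters of both parts are estimated jointly using the same estimation approaches as for the individual models.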
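As a rough illustration of what joint estimation works with, the sketch below simulates an ARMA(1,1) series and then recovers the unobserved errors recursively from the data and a set of parameters. The sum of squared recovered errors is the kind of objective a least-squares fit would minimize over the parameters. The function names and parameter values are illustrative assumptions, and the pre-sample error is assumed to be zero.

```python
import numpy as np

def simulate_arma11(c, phi, theta, eps):
    """Generate y_t = c + phi*y_{t-1} + eps_t + theta*eps_{t-1}, starting from y_0 = c + eps_0."""
    y = np.zeros(len(eps))
    y[0] = c + eps[0]
    for t in range(1, len(eps)):
        y[t] = c + phi * y[t - 1] + eps[t] + theta * eps[t - 1]
    return y

def arma11_residuals(y, c, phi, theta):
    """Recover the errors recursively for given parameters (pre-sample error taken as zero)."""
    e = np.zeros(len(y))
    e[0] = y[0] - c
    for t in range(1, len(y)):
        e[t] = y[t] - c - phi * y[t - 1] - theta * e[t - 1]
    return e

def css(y, c, phi, theta):
    """Conditional sum of squares for a candidate parameter set."""
    return float(np.sum(arma11_residuals(y, c, phi, theta) ** 2))

rng = np.random.default_rng(7)
eps = rng.normal(size=500)
c, phi, theta = 1.0, 0.7, 0.4      # made-up parameters
y = simulate_arma11(c, phi, theta, eps)

resid = arma11_residuals(y, c, phi, theta)
print("max |recovered error - true error|:", np.max(np.abs(resid - eps)))
print("CSS at the true parameters:", round(css(y, c, phi, theta), 2))
print("CSS at perturbed parameters:", round(css(y, c, 0.2, -0.3), 2))
```

At the true parameters the recursion reproduces the simulated errors, and moving the parameters away from the truth increases the sum of squares, which is what an optimizer exploits.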

Core example

The code uses Python's statsmodels library for ARMA modeling and analysis, and matplotlib for the plots.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Generate example time series data
np.random.seed(42)
n = 200
ar_params = np.array([0.75, -0.25])
ma_params = np.array([0.65, 0.35])
ar = np.r_[1, -ar_params]  # add zero-lag and negate
ma = np.r_[1, ma_params]   # add zero-lag
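# Simulate an ARMA(2, 2) process that actually uses both the AR and MA polynomials
from statsmodels.tsa.arima_process import arma_generate_sample
x = arma_generate_sample(ar, ma, nsample=n)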
time_series = pd.Series(x)

# Plot the original time series
plt.figure(figsize=(12, 6))
plt.plot(time_series, label='Original Time Series')
plt.title('Original Time Series')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()

# Plot the autocorrelation (ACF) and partial autocorrelation (PACF)
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
plot_acf(time_series, ax=axes[0], title='Autocorrelation Function (ACF)')
plot_pacf(time_series, ax=axes[1], title='Partial Autocorrelation Function (PACF)')
plt.show()

# Build and fit the ARMA model (ARIMA with d = 0)
model = ARIMA(time_series, order=(2, 0, 2))
arma_result = model.fit()

# Print the model summary
print(arma_result.summary())

# Plot the original series against the fitted values
plt.figure(figsize=(12, 6))
plt.plot(time_series, label='Original Time Series')
plt.plot(arma_result.fittedvalues, color='red', label='Fitted Values')
plt.title('Original and Fitted Time Series')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()

# Plot the residuals
residuals = arma_result.resid
plt.figure(figsize=(12, 6))
plt.plot(residuals, label='Residuals')
plt.title('Residuals of ARMA Model')
plt.xlabel('Time')
plt.ylabel('Residual')
plt.legend()
plt.show()

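Generate example data: we generate a time series of 200 points from the autoregressive parameters ar_params and the moving average parameters ma_params.
Plot the original series: matplotlib draws the raw time series.
Plot the ACF and PACF: the plot_acf and plot_pacf functions from statsmodels draw the autocorrelation function (ACF) and partial autocorrelation function (PACF).
Build and fit the ARMA model: the ARIMA class from statsmodels, with d = 0, builds and fits the ARMA model.
Print the model summary: the summary of the fit includes the model parameters and statistics.
Plot the fit and the residuals: one plot compares the original series with the fitted values, and another shows the model residuals.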

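4. Autoregressive Integrated Moving Average (ARIMA, Autoregressive Integrated Moving Average Model)

The ARIMA model extends ARMA to non-stationary time series.

Principle

The model differences the data to make it stationary and then applies an ARMA model.

Core formula

$$\Big(1 - \sum_{i=1}^{p} \phi_i L^i\Big)(1 - L)^d y_t = c + \Big(1 + \sum_{j=1}^{q} \theta_j L^j\Big)\varepsilon_t$$

where:

$L$ is the lag operator,
$d$ is the number of differences.

The original series is differenced d times to obtain a stationary series, and the ARMA parameters are then estimated on that stationary series.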
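The differencing step can be tried on its own. This minimal sketch differences a simulated random walk with drift once, then rebuilds the levels from the differences, which is the step needed to turn forecasts of differences back into forecasts of the original series. The drift value and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Random walk with drift: non-stationary in levels, stationary after one difference.
steps = 0.2 + rng.normal(size=n)
y = np.cumsum(steps)

d = 1
dy = np.diff(y, n=d)            # applies (1 - L)^d to the series

# Undo the differencing: start from the first level and cumulate the differences.
y_rebuilt = np.concatenate([[y[0]], y[0] + np.cumsum(dy)])

print("length before and after differencing:", len(y), len(dy))
print("variance of levels:", round(float(np.var(y)), 2))
print("variance of first differences:", round(float(np.var(dy)), 2))
```

Each difference shortens the series by one observation, so a model fitted with d differences works with n - d points.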

Core example

Below is a time series example that uses an ARIMA model to analyze and forecast data, built with Python's pandas, numpy, matplotlib, and statsmodels libraries. The series is simulated to stand in for economic data such as a stock price or an economic indicator.

Example overview

Create a simulated time series, fit an ARIMA model to it and forecast, and plot the original data, the ACF (autocorrelation function) plot, and the forecast.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# 1. Create simulated time series data
np.random.seed(42)
n = 200
time = np.arange(n)
data = np.sin(0.1 * time) + 0.5 * np.random.randn(n)

# Convert to a pandas DataFrame
df = pd.DataFrame(data, columns=['Value'])
df.index = pd.date_range(start='2020-01-01', periods=n, freq='D')

# 2. Plot the original time series
plt.figure(figsize=(14, 7))
plt.subplot(3, 1, 1)
plt.plot(df.index, df['Value'], label='Original Data')
plt.title('Original Time Series')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()

# 3. Plot the autocorrelation (ACF) and partial autocorrelation (PACF)
plt.subplot(3, 1, 2)
plot_acf(df['Value'], ax=plt.gca(), lags=30)
plt.title('ACF of Time Series')

plt.subplot(3, 1, 3)
plot_pacf(df['Value'], ax=plt.gca(), lags=30)
plt.title('PACF of Time Series')

plt.tight_layout()
plt.show()

# 4. Apply the ARIMA model
from statsmodels.tsa.arima.model import ARIMA

# Fit the ARIMA model
model = ARIMA(df['Value'], order=(5, 0, 0))  # (p, d, q); d=0 because the data is not differenced
model_fit = model.fit()

# Print the model summary
print(model_fit.summary())

# 5. Forecast the next 20 time points
forecast = model_fit.forecast(steps=20)

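# Build a date-indexed DataFrame for the forecast
forecast_index = pd.date_range(start=df.index[-1] + pd.Timedelta(days=1), periods=20, freq='D')
# Pass the raw values: the forecast Series is named 'predicted_mean', so columns=['Forecast'] would yield NaN
forecast_df = pd.DataFrame({'Forecast': np.asarray(forecast)}, index=forecast_index)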

# 6. Plot the forecast
plt.figure(figsize=(14, 7))
plt.plot(df.index, df['Value'], label='Original Data')
plt.plot(forecast_df.index, forecast_df['Forecast'], color='red', linestyle='--', label='Forecast')
plt.title('Forecast using ARIMA Model')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

Data generation: a sine function plus random noise produces the simulated series.
Plots:

Original time series plot: shows the generated data.
ACF and PACF plots: used to choose the ARIMA parameters.

ARIMA model:

The ARIMA class fits the model and produces the forecast.
The model summary is printed to inspect the fit.

Forecast plot: shows the ARIMA forecast alongside the original data.

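5. Seasonal Autoregressive Integrated Moving Average (SARIMA, Seasonal ARIMA)

The SARIMA model extends ARIMA to seasonal time series.

Principle

The model adds seasonal autoregressive, seasonal differencing, and seasonal moving average components.

Core formula

$$\Phi_P(L^s)\,\phi_p(L)\,(1 - L)^d (1 - L^s)^D y_t = \Theta_Q(L^s)\,\theta_q(L)\,\varepsilon_t$$

where:

$s$ is the seasonal period,
$D$ is the number of seasonal differences.

Here $\phi_p$ and $\theta_q$ are the non-seasonal AR and MA polynomials in the lag operator $L$, and $\Phi_P$ and $\Theta_Q$ are their seasonal counterparts.

The seasonal autoregressive, seasonal differencing, and seasonal moving average components are added to the ARIMA model, and all parameters are estimated together.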
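Seasonal differencing is the least familiar piece here, so a small sketch may help. It subtracts the value one season earlier, which removes a repeating pattern of period s, and then applies an ordinary difference to remove what the trend leaves behind. The helper `seasonal_difference` and the noise-free toy series are illustrative assumptions.

```python
import numpy as np

def seasonal_difference(y, s, D=1):
    """Apply (1 - L^s)^D: subtract the value observed s steps earlier, D times."""
    y = np.asarray(y, dtype=float)
    for _ in range(D):
        y = y[s:] - y[:-s]
    return y

s = 12                                    # monthly data with a yearly cycle
t = np.arange(120)
trend = 0.3 * t
seasonal = 5 * np.sin(2 * np.pi * t / s)
y = trend + seasonal                      # noise-free toy series

after_seasonal = seasonal_difference(y, s)   # seasonal pattern cancels, trend becomes a constant
after_both = np.diff(after_seasonal)         # ordinary difference removes the constant

print("after seasonal differencing (first 3):", np.round(after_seasonal[:3], 3))
print("after both differences (max abs):", float(np.max(np.abs(after_both))))
```

For this toy series the seasonal difference leaves only the trend increment over one season (slope times s), and the extra ordinary difference leaves essentially zero.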

Core example

Below is a case study on monthly airline passenger counts. The dataset covers January 1949 through December 1960. We fit a SARIMA model to it and draw the related plots.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller, acf, pacf

# Load the airline passengers dataset
file = 'airline-passengers.csv'
data = pd.read_csv(file, index_col='Month', parse_dates=True)
data.index.freq = 'MS'

# Plot the raw data
plt.figure(figsize=(10, 6))
plt.plot(data, label='Monthly Airline Passengers')
plt.title('Monthly Airline Passengers from 1949 to 1960')
plt.xlabel('Date')
plt.ylabel('Number of Passengers')
plt.legend()
plt.show()

# Run the ADF test
adf_result = adfuller(data['Passengers'])
print(f'ADF Statistic: {adf_result[0]}')
print(f'p-value: {adf_result[1]}')

# Plot the ACF and PACF
lag_acf = acf(data['Passengers'], nlags=40)
lag_pacf = pacf(data['Passengers'], nlags=40, method='ols')

plt.figure(figsize=(12, 6))
plt.subplot(121)
plt.stem(range(len(lag_acf)), lag_acf, linefmt='b-', markerfmt='bo', basefmt='r-')
plt.axhline(y=0, linestyle='--', color='gray')
plt.axhline(y=-1.96/np.sqrt(len(data)), linestyle='--', color='gray')
plt.axhline(y=1.96/np.sqrt(len(data)), linestyle='--', color='gray')
plt.title('Autocorrelation Function')

plt.subplot(122)
plt.stem(range(len(lag_pacf)), lag_pacf, linefmt='b-', markerfmt='bo', basefmt='r-')
plt.axhline(y=0, linestyle='--', color='gray')
plt.axhline(y=-1.96/np.sqrt(len(data)), linestyle='--', color='gray')
plt.axhline(y=1.96/np.sqrt(len(data)), linestyle='--', color='gray')
plt.title('Partial Autocorrelation Function')
plt.tight_layout()
plt.show()

# Fit the SARIMA model
model = SARIMAX(data['Passengers'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
results = model.fit()

# Print the model summary
print(results.summary())

# Plot the forecast
data['forecast'] = results.predict(start=120, end=144, dynamic=True)
plt.figure(figsize=(10, 6))
plt.plot(data['Passengers'], label='Actual Passengers')
plt.plot(data['forecast'], label='Forecasted Passengers', color='red')
plt.title('Actual vs Forecasted Passengers')
plt.xlabel('Date')
plt.ylabel('Number of Passengers')
plt.legend()
plt.show()

Load the monthly airline passenger data and plot it.
Check stationarity with the ADF test.
Plot the autocorrelation function (ACF) and partial autocorrelation function (PACF) to help choose the model orders.
Fit a SARIMA model and print the model summary.
Plot the actual data against the forecast.

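6. Vector Autoregression (VAR, Vector Autoregression)

The VAR model is the multivariate extension of the autoregressive model and applies to multivariate time series.

Principle

The model uses the history of several time series to forecast them jointly.

Core formula

$$\mathbf{y}_t = \mathbf{c} + \sum_{i=1}^{p} A_i \mathbf{y}_{t-i} + \boldsymbol{\varepsilon}_t$$

where:

$\mathbf{y}_t$ is the multivariate time series vector,
$A_i$ are the coefficient matrices.

The multivariate series is regressed linearly on its own lags, and the coefficient matrices are solved by least squares.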
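The regression behind a VAR can be written in a few lines. This sketch simulates a two-variable VAR(1) with a known coefficient matrix and recovers it with one least-squares solve, regressing the current vector on a constant and the previous vector. The matrix `A_true`, the intercept, and the variable names are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
A_true = np.array([[0.5, 0.1],
                   [-0.2, 0.4]])      # made-up, stable coefficient matrix
c_true = np.array([1.0, -0.5])
n = 5000

# Simulate y_t = c + A y_{t-1} + eps_t
Y = np.zeros((n, 2))
for t in range(1, n):
    Y[t] = c_true + A_true @ Y[t - 1] + rng.normal(size=2)

# Regress Y_t on [1, Y_{t-1}]; each column of B holds one equation's coefficients.
X = np.column_stack([np.ones(n - 1), Y[:-1]])
B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
c_hat = B[0]
A_hat = B[1:].T                       # transpose so that row k is equation k

print("estimated intercepts:", np.round(c_hat, 3))
print("estimated coefficient matrix:")
print(np.round(A_hat, 3))
```

Because every equation shares the same regressors, solving all equations at once like this gives the same result as running a separate regression per variable.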

Core example

We use the US macroeconomic dataset that ships with the statsmodels library. It contains quarterly series, from which we take real GDP, real consumption, and real investment.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.api import VAR

# Load the dataset
from statsmodels.datasets.macrodata import load_pandas
data = load_pandas().data

# Select the variables of interest (copy so the index can be set without a chained-assignment warning)
df = data[['realgdp', 'realcons', 'realinv']].copy()

# Set the time index
dates = pd.date_range(start='1959Q1', periods=len(df), freq='Q')
df.index = dates

# Plot the raw series
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 8))
df.plot(subplots=True, ax=axes)
plt.tight_layout()
plt.show()

# Compute first differences
df_diff = df.diff().dropna()

# Plot the differenced data
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 8))
df_diff.plot(subplots=True, ax=axes)
plt.tight_layout()
plt.show()

# Build and fit the VAR model
model = VAR(df_diff)
results = model.fit(maxlags=15, ic='aic')

# Print the model summary
print(results.summary())

# Plot the model residuals over time
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 8))
for i, column in enumerate(df_diff.columns):
    axes[i].plot(np.asarray(results.resid)[:, i])  # works whether resid is a DataFrame or an array
    axes[i].set_title(f'Residuals of {column}')
plt.tight_layout()
plt.show()

# Forecast future values; the model forecasts differences, so cumulate them back to levels
lag_order = results.k_ar
forecast = results.forecast(df_diff.values[-lag_order:], steps=10)
forecast_index = pd.date_range(start=df.index[-1], periods=11, freq='Q')[1:]  # start after the last observation
forecast_df = pd.DataFrame(forecast, index=forecast_index, columns=df.columns).cumsum() + df.iloc[-1]

# Plot the forecast
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 8))
for i, column in enumerate(df.columns):
    axes[i].plot(df.index, df[column], label='Original')
    axes[i].plot(forecast_df.index, forecast_df[column], label='Forecast')
    axes[i].set_title(f'{column} - Original vs Forecast')
    axes[i].legend()
plt.tight_layout()
plt.show()

Load the macroeconomic dataset of US economic data.
Select real GDP, real consumption, and real investment as the variables to analyze.
Plot the raw series and their first differences.
Build and fit the VAR model, and print the model summary.
Plot the model residuals over time.
Forecast the next 10 quarters and plot the forecast against the original data.

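7. Vector Autoregressive Moving Average (VARMA, Vector Autoregressive Moving Average)

The VARMA model combines the vector autoregressive and moving average models.

Principle

The model uses the history and the past errors of several time series to forecast them jointly.

Core formula

$$\mathbf{y}_t = \mathbf{c} + \sum_{i=1}^{p} A_i \mathbf{y}_{t-i} + \boldsymbol{\varepsilon}_t + \sum_{j=1}^{q} M_j \boldsymbol{\varepsilon}_{t-j}$$

where:

$M_j$ are the error coefficient matrices.

The parameters of the VAR part and the MA part are estimated jointly, combining the approaches used for each model.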

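Core example

In time series analysis, the vector autoregressive moving average (VARMA) model is typically used to analyze the dynamic relationships among several related time series.

Suppose we have two variables, sales and advertising spend, and want to analyze the dynamic relationship between them. We build a VARMA(1,1) model to illustrate.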

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.varmax import VARMAX
from statsmodels.tsa.vector_ar.var_model import VAR

# Generate simulated data
np.random.seed(42)
n_obs = 100
sales = np.random.normal(loc=100, scale=15, size=n_obs)
advertising = np.random.normal(loc=50, scale=10, size=n_obs)

# Build the DataFrame
data = pd.DataFrame({'Sales': sales, 'Advertising': advertising})

# Split the data into training and test sets
train = data.iloc[:80]
test = data.iloc[80:]

# Fit the VARMA model
model = VARMAX(train, order=(1, 1))
results = model.fit(maxiter=1000, disp=False)
print(results.summary())

# Forecast future values
forecast = results.forecast(steps=len(test))

# Plot the sales and advertising series with their forecasts
plt.figure(figsize=(14, 7))

plt.subplot(2, 1, 1)
plt.plot(train['Sales'], label='Actual Sales (Train)')
plt.plot(test.index, forecast['Sales'], label='Forecasted Sales')
plt.title('Sales Forecast using VARMA(1,1)')
plt.legend()

plt.subplot(2, 1, 2)
plt.plot(train['Advertising'], label='Actual Advertising (Train)')
plt.plot(test.index, forecast['Advertising'], label='Forecasted Advertising')
plt.title('Advertising Forecast using VARMA(1,1)')
plt.legend()

plt.tight_layout()
plt.show()

We first generate simulated data for the sales and advertising spend series.
The data is split into a training set and a test set.
The VARMAX class fits a VARMA(1,1) model.
We forecast over the test period and plot the training data for sales and advertising spend alongside the forecasts.

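8. Long Short-Term Memory Networks (LSTM, Long Short-Term Memory)

LSTM is a special kind of recurrent neural network (RNN) suited to capturing long-range dependencies.

Principle

By introducing a memory cell and gating mechanisms, LSTM effectively mitigates the vanishing gradient problem of RNNs.

Core formula

$$\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}$$

LSTM controls the flow of information through a memory cell $c_t$ and three gates (the forget gate $f_t$, the input gate $i_t$, and the output gate $o_t$). The parameters are optimized with the backpropagation algorithm.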
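To see the gates at work without a deep learning framework, here is a minimal NumPy sketch of a single LSTM time step. It stacks the four weight blocks into one matrix in the order forget, input, output, candidate, which is a layout chosen for this sketch rather than the convention of any particular library. The function name `lstm_step` and the random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); block order is forget, input, output, candidate."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])              # forget gate: how much of the old cell state to keep
    i = sigmoid(z[H:2 * H])          # input gate: how much of the candidate to write
    o = sigmoid(z[2 * H:3 * H])      # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * H:4 * H])      # candidate cell state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(5)
D, H, T = 1, 4, 10                   # input size, hidden size, sequence length
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)

# Run the cell over a short sine sequence, carrying h and c forward.
x_seq = np.sin(np.linspace(0, 2 * np.pi, T)).reshape(T, D)
h, c = np.zeros(H), np.zeros(H)
for x_t in x_seq:
    h, c = lstm_step(x_t, h, c, W, U, b)
print("final hidden state:", np.round(h, 3))
```

The additive update of the cell state is the part that lets gradients survive across many steps: when the forget gate stays near 1, the old cell state passes through almost unchanged.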

Core example

We use Keras to build and train the LSTM model and Matplotlib to draw the plots. A simple simulated time series demonstrates the model; the data stands in for something like daily temperature changes.

import numpy as np
import matplotlib.pyplot as plt

# Generate simulated time series data
np.random.seed(0)
time_steps = 100
data = np.sin(np.linspace(0, 10 * np.pi, time_steps)) + np.random.normal(0, 0.5, time_steps)

# Plot the raw data
plt.figure(figsize=(14, 6))
plt.plot(data, label='Original Data')
plt.title('Simulated Time Series Data')
plt.xlabel('Time Step')
plt.ylabel('Value')
plt.legend()
plt.show()

# Imports for the LSTM model
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Prepare the training data: each sample is look_back past values, the target is the next value
def create_dataset(data, look_back=1):
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data[i:(i + look_back)])
        y.append(data[i + look_back])
    return np.array(X), np.array(y)

look_back = 3
X, y = create_dataset(data, look_back)
X = X.reshape(X.shape[0], X.shape[1], 1)  # LSTM expects 3D input: (samples, time steps, features)

# Build the LSTM model
model = Sequential()
model.add(LSTM(50, input_shape=(look_back, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# Train the model
model.fit(X, y, epochs=100, batch_size=1, verbose=2)

# Predict and align the predictions with the original time axis
train_predict = model.predict(X)
train_predict_plot = np.empty_like(data)
train_predict_plot[:] = np.nan
train_predict_plot[look_back:len(train_predict) + look_back] = train_predict.flatten()

# Plot the original data against the predictions
plt.figure(figsize=(14, 6))
plt.plot(data, label='Original Data')
plt.plot(train_predict_plot, label='LSTM Prediction')
plt.title('LSTM Prediction vs Original Data')
plt.xlabel('Time Step')
plt.ylabel('Value')
plt.legend()
plt.show()

# Plot the training loss
history = model.history
loss = history.history['loss']

plt.figure(figsize=(14, 6))
plt.plot(loss, label='Training Loss')
plt.title('Model Training Loss Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

Besides the raw data plot, the code produces two analysis plots: one compares the original data with the LSTM predictions, and the other shows how the training loss changes over epochs.

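9. Prophet

Prophet is a time series forecasting tool developed by Facebook. It suits data with holiday effects, trend changes, and periodic patterns.

Principle

The model decomposes a time series into trend, seasonality, and holiday effect components.

Core formula

$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t$$

where:

$g(t)$ is the trend component,
$s(t)$ is the seasonal component,
$h(t)$ is the holiday effect.

Each component is given its own functional form and parameters, and the components are summed to produce the forecast.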
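The additive idea can be sketched without Prophet itself. The example below is not Prophet's algorithm (Prophet uses a piecewise trend with changepoints and its own fitting procedure); it is a simplified stand-in that fits a straight-line trend plus Fourier terms for a weekly cycle by ordinary least squares, then splits the fit into the two components. The helper `fourier_features` and all data values are illustrative assumptions.

```python
import numpy as np

def fourier_features(t, period, order):
    """Sine and cosine columns for the first `order` harmonics of a cycle with the given period."""
    cols = []
    for k in range(1, order + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

rng = np.random.default_rng(4)
t = np.arange(730, dtype=float)                      # two years of daily data
true_trend = 200 + 0.1 * t
true_weekly = 10 * np.sin(2 * np.pi * t / 7)
y = true_trend + true_weekly + rng.normal(scale=2.0, size=t.size)

# Design matrix: [1, t] for the trend, Fourier terms for the weekly seasonality.
X = np.column_stack([np.ones_like(t), t, fourier_features(t, period=7, order=3)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

trend_hat = X[:, :2] @ beta[:2]                      # g(t) in the formula above
seasonal_hat = X[:, 2:] @ beta[2:]                   # s(t) in the formula above

print("estimated slope:", round(float(beta[1]), 4))
print("max seasonal error:", round(float(np.max(np.abs(seasonal_hat - true_weekly))), 3))
```

Splitting the fitted values by column group is what makes a component plot possible: each group of columns contributes one curve.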

Core example

Below is an example of time series analysis with Facebook's Prophet library. We use Prophet to model and forecast a time series, and draw several plots to present the results.

We use an example dataset containing a website's daily visit counts, and go through the following steps:

Import and preprocess the data.
Model and forecast with Prophet.
Plot the original data and the forecast.
Plot the trend and seasonal components of the forecast.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from prophet import Prophet

# Generate example data
dates = pd.date_range(start='2020-01-01', periods=730, freq='D')
data = np.random.poisson(lam=200, size=730) + np.linspace(0, 100, 730)
df = pd.DataFrame({'ds': dates, 'y': data})

# Initialize and fit the Prophet model
model = Prophet(yearly_seasonality=True, daily_seasonality=False)
model.fit(df)

# Build the future DataFrame and forecast
future = model.make_future_dataframe(periods=365)
forecast = model.predict(future)

# Plot the original data and the forecast
fig1 = model.plot(forecast)
plt.title('Original Data and Forecast')
plt.xlabel('Date')
plt.ylabel('Website Traffic')

# Plot the trend and seasonal components
fig2 = model.plot_components(forecast)
plt.show()

# Plot actual values against the forecast
plt.figure(figsize=(10, 6))
plt.plot(df['ds'], df['y'], label='Actual')
plt.plot(forecast['ds'], forecast['yhat'], label='Forecast', linestyle='--')
plt.fill_between(forecast['ds'], forecast['yhat_lower'], forecast['yhat_upper'], color='gray', alpha=0.2)
plt.title('Actual vs Forecast')
plt.xlabel('Date')
plt.ylabel('Website Traffic')
plt.legend()
plt.show()

# Plot the residuals
residuals = df['y'] - forecast['yhat'][:len(df)]
plt.figure(figsize=(10, 6))
plt.plot(df['ds'], residuals)
plt.axhline(0, linestyle='--', color='red')
plt.title('Residuals')
plt.xlabel('Date')
plt.ylabel('Residual')
plt.show()

Data generation and preprocessing: pandas generates a 730-day date range starting on 1 January 2020, and an example dataset simulates the website's daily visits.
Model training: the Prophet model is initialized with yearly seasonality enabled and daily seasonality disabled, then fitted to the example data.
Forecast: a DataFrame covering the next 365 days is created and the fitted model forecasts over it. The plot of original data and forecast shows how the two compare.
Trend and seasonal components: the component plot shows the long-term trend and the seasonal fluctuation in the forecast.
Actual versus forecast: comparing the actual data with the forecast shows how well the model fits.
Residual plot: the residuals (actual minus forecast) are computed and plotted to show the model's forecast error.

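10. Variational Autoencoders (VAE, Variational Autoencoders)

VAE is a generative model suited to capturing the latent structure of complex time series data.

Principle

Using variational inference, a VAE maps complex time series data into a latent space and then reconstructs the data from that latent space.

Core formula

A VAE uses variational inference to maximize a lower bound on the log-likelihood:

$$\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)$$

where:

$D_{KL}$ is the Kullback-Leibler divergence.

A VAE parameterizes $q_\phi(z \mid x)$ and $p_\theta(x \mid z)$ with neural networks and optimizes them with the backpropagation algorithm.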
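Two pieces of this objective are easy to check numerically: the closed-form KL divergence between a diagonal Gaussian and the standard normal prior, and the reparameterization trick that keeps sampling differentiable. The NumPy sketch below shows both; the function names and the example mean and variance are illustrative assumptions.

```python
import numpy as np

def kl_diag_gaussian_to_standard(mean, logvar):
    """KL( N(mean, diag(exp(logvar))) || N(0, I) ), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + logvar - mean ** 2 - np.exp(logvar), axis=-1)

def reparameterize(mean, logvar, rng):
    """Draw z = mean + std * eps with eps ~ N(0, I), so randomness is separated from the parameters."""
    std = np.exp(0.5 * logvar)
    eps = rng.standard_normal(mean.shape)
    return mean + eps * std

rng = np.random.default_rng(6)
# Made-up encoder outputs, repeated for many samples to check the sampling statistics.
mean = np.tile(np.array([2.0, -1.0]), (100_000, 1))
logvar = np.tile(np.log(np.array([0.25, 4.0])), (100_000, 1))

z = reparameterize(mean, logvar, rng)
print("sample mean:", np.round(z.mean(axis=0), 2))
print("sample std:", np.round(z.std(axis=0), 2))
print("KL for one sample:", round(float(kl_diag_gaussian_to_standard(mean[0], logvar[0])), 4))
```

The KL term is zero only when the encoder outputs exactly the prior (zero mean, unit variance), so it acts as a penalty that pulls the latent codes toward the standard normal.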

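Each of these methods has its own use cases, strengths, and weaknesses; choosing the right one depends on the characteristics of the data and the goals of the analysis.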

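Core example

Here is an example of time series analysis with a variational autoencoder (VAE):

Generate sine-wave time series data.
Build and train the variational autoencoder (VAE).
Encode and decode the time series with the VAE.
Plot the original time series against the reconstructed one.
Plot a visualization of the VAE latent space.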

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

# Generate sine-wave time series data
def generate_sine_wave(seq_length, num_samples):
    x = np.linspace(0, np.pi * 2 * num_samples, seq_length * num_samples)
    y = np.sin(x)
    data = y.reshape(num_samples, seq_length, 1)
    return data

seq_length = 50
num_samples = 1000
data = generate_sine_wave(seq_length, num_samples)

# Dataset class
class TimeSeriesDataset(Dataset):
    def __init__(self, data):
        self.data = torch.tensor(data, dtype=torch.float32)
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        return self.data[idx]

dataset = TimeSeriesDataset(data)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Define the VAE model
class VAE(nn.Module):
    def __init__(self, input_dim, hidden_dim, latent_dim):
        super(VAE, self).__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim * 2)  # outputs mean and logvar
        )
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Tanh()  # sine values lie in [-1, 1]; a Sigmoid output could never reach the negative half
        )

    def reparameterize(self, mean, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mean + eps * std

    def forward(self, x):
        x = x.view(x.size(0), -1)
        mean_logvar = self.encoder(x)
        mean, logvar = mean_logvar[:, :latent_dim], mean_logvar[:, latent_dim:]
        z = self.reparameterize(mean, logvar)
        x_recon = self.decoder(z)
        return x_recon, mean, logvar

# Train the VAE model
input_dim = seq_length
hidden_dim = 128
latent_dim = 16
num_epochs = 50
learning_rate = 0.001

vae = VAE(input_dim, hidden_dim, latent_dim)
optimizer = optim.Adam(vae.parameters(), lr=learning_rate)
criterion = nn.MSELoss()

def loss_function(recon_x, x, mean, logvar):
    recon_loss = criterion(recon_x, x)
    kld_loss = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp())
    return recon_loss + kld_loss

for epoch in range(num_epochs):
    vae.train()
    train_loss = 0
    for batch in dataloader:
        optimizer.zero_grad()
        recon_batch, mean, logvar = vae(batch)
        loss = loss_function(recon_batch, batch.view(-1, input_dim), mean, logvar)
        loss.backward()
        train_loss += loss.item()
        optimizer.step()
    print(f'Epoch {epoch + 1}, Loss: {train_loss / len(dataloader.dataset)}')

# Plot the original time series against the reconstruction
vae.eval()
with torch.no_grad():
    for batch in dataloader:
        recon_batch, mean, logvar = vae(batch)
        break

recon_batch = recon_batch.view(-1, seq_length).numpy()
original_batch = batch.numpy()

plt.figure(figsize=(12, 6))
plt.plot(original_batch[0], label='Original')
plt.plot(recon_batch[0], label='Reconstructed')
plt.legend()
plt.title('Original vs Reconstructed Time Series')
plt.show()


# Plot a visualization of the VAE latent space
latent_vectors = []
vae.eval()
with torch.no_grad():
    for batch in dataloader:
        _, mean, _ = vae(batch)
        latent_vectors.append(mean.numpy())

latent_vectors = np.concatenate(latent_vectors, axis=0)

plt.figure(figsize=(10, 8))
plt.scatter(latent_vectors[:, 0], latent_vectors[:, 1], alpha=0.5)
plt.title('Latent Space Visualization')
plt.xlabel('Latent Dimension 1')
plt.ylabel('Latent Dimension 2')
plt.show()

Finally

That's everything for today.

If you found it useful, remember to like and bookmark it, and take your time studying~

More practical content is coming next time!~
