Efficient Time-Series Analysis Using Python’s Pmdarima Library | by Muriel Kosaka | Jan, 2021

From our visualizations, I determined that the p parameter is 0 and the q parameter is 2, so the (p,d,q) parameters for the ARIMA model will be (0,2,2). After splitting the data into training and test sets, fitting the ARIMA model on the training set, and predicting the test set, we obtained an r² value of -1.52. A negative r² means the forecasts fit the test data worse than simply predicting its mean would, telling us that the model did not follow the trend of the data at all.
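To see how an r² value can drop below zero, here is a minimal sketch in plain Python. It assumes the usual definition r² = 1 - SS_res / SS_tot. The helper name and the numbers are hypothetical and are not taken from the airline dataset or from the scoring code behind the figure above.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    mean_actual = sum(actual) / len(actual)
    # Squared error of the forecasts
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always predicting the mean of the observed values
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical values for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
flat_forecast = [5.0, 5.0, 5.0, 5.0]      # misses both the level and the trend
mean_forecast = [13.0, 13.0, 13.0, 13.0]  # always predicts the mean of `actual`

r2_flat = r2_score(actual, flat_forecast)
r2_mean = r2_score(actual, mean_forecast)
print(r2_flat, r2_mean)
```

The mean forecast scores 0, and any forecast with a larger squared error than that baseline scores below 0.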

I most likely calculated the p,d,q values incorrectly, which caused the r² value to be negative, but in the meantime let’s try building another ARIMA model using pmdarima.

Using pmdarima for Auto ARIMA model

In the previous method, checking for stationarity, making the data stationary if necessary, and determining the values of p and q from the ACF/PACF plots is time-consuming and inefficient. pmdarima’s auto_arima() function makes this task easier by eliminating steps 2 and 3 of implementing an ARIMA model. Let’s try it with the current dataset.

After loading and preparing the data, we can use pmdarima’s ADFTest class to conduct an Augmented Dickey-Fuller test.

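from pmdarima.arima import ADFTest

# should_diff() returns (p-value, whether differencing is recommended)
adf_test = ADFTest(alpha=0.05)
adf_test.should_diff(df)
# Output
(0.01, False)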

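should_diff() returns the p-value and a flag indicating whether the test recommends differencing. Here the p-value of 0.01 is below alpha and the flag is False, so this test on its own does not call for differencing. The series still has a clear trend and seasonal pattern, however, so I treat it as not stationary and use the “Integrated (I)” concept (d parameter) to make the data stationary while building the Auto ARIMA model.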
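To illustrate what the d parameter does, here is a minimal sketch of differencing in plain Python. The helper name and the sample series are hypothetical. In practice the ARIMA model applies the differencing internally, so this is only meant to show the idea.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series with a steady upward trend (not the airline data).
trending = [100, 103, 106, 109, 112, 115]

first_diff = difference(trending)      # corresponds to d=1
second_diff = difference(first_diff)   # corresponds to d=2

print(first_diff)   # [3, 3, 3, 3, 3]
print(second_diff)  # [0, 0, 0, 0]
```

One round of differencing removes the linear trend here. Calling the same helper with lag=12 on monthly data would give a seasonal difference, which is what the seasonal D parameter controls.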

Next, I split the dataset into training and test sets (roughly 80%/20%) to build the Auto ARIMA model on the training set and evaluate its forecasts against the test set.

import matplotlib.pyplot as plt

train = df[:114]   # first 114 observations
test = df[-30:]    # last 30 observations
plt.plot(train)
plt.plot(test)
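The two slices above only cover the data without overlapping when it has exactly 144 rows. As a hedged alternative, the sketch below derives the split point from the series length. The helper name is hypothetical, and the list of 144 integers is an assumed stand-in for the monthly observations.

```python
def chronological_split(series, n_test):
    """Split an ordered sequence so that the last n_test points form the test set."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    split_at = len(series) - n_test
    # Slicing keeps the time order and guarantees no overlap between the parts.
    return series[:split_at], series[split_at:]


# Assumed stand-in for 144 monthly observations.
observations = list(range(144))
train_part, test_part = chronological_split(observations, n_test=30)
print(len(train_part), len(test_part))  # 114 30
```

The same slicing works on a pandas DataFrame or Series, since both support positional slices like df[:split_at].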

Then, we build the Auto ARIMA model using pmdarima’s auto_arima() function. The function takes lowercase p,d,q arguments, which represent the non-seasonal components, and uppercase P,D,Q arguments, which represent the seasonal components. Like other hyperparameter tuning methods, auto_arima() tries different combinations of these orders to find the optimal values. The final values come from the candidate model with the lowest information criterion, such as AIC or BIC.

from pmdarima import auto_arima

model = auto_arima(train, start_p=0, d=1, start_q=0,
                   max_p=5, max_d=5, max_q=5, start_P=0,
                   D=1, start_Q=0, max_P=5, max_D=5,
                   max_Q=5, m=12, seasonal=True,
                   error_action='warn', trace=True,
                   suppress_warnings=True, stepwise=True,
                   random_state=20, n_fits=50)
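The idea of picking the model with the lowest information criterion can be sketched in plain Python. This is only an illustration of the selection rule and not pmdarima's internal code. The candidate orders, log-likelihoods, and parameter counts are hypothetical numbers chosen for the example.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical candidates: (p, d, q) -> (log-likelihood, estimated parameters)
candidates = {
    (0, 1, 1): (-380.0, 2),
    (1, 1, 1): (-377.5, 3),
    (2, 1, 2): (-377.0, 5),
}

# Score every candidate and keep the order with the lowest AIC.
scores = {order: aic(ll, k) for order, (ll, k) in candidates.items()}
best_order = min(scores, key=scores.get)
print(best_order, scores[best_order])  # (1, 1, 1) 761.0
```

The largest model has the best log-likelihood, but the penalty for its extra parameters means the middle candidate wins. BIC applies a heavier penalty once the number of observations exceeds about 7, because ln(n) is then greater than 2.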

We can view the model summary by calling model.summary().

Next, we can use the trained model to forecast the number of airline passengers over the test period and create a visualization.

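import pandas as pd

# Forecast 30 periods ahead and align the forecasts with the test dates
prediction = pd.DataFrame(model.predict(n_periods=30), index=test.index)
prediction.columns = ['predicted_passengers']

# Plot the training data, the test data, and the forecasts together
plt.figure(figsize=(8, 5))
plt.plot(train, label="Training")
plt.plot(test, label="Test")
plt.plot(prediction, label="Predicted")
plt.legend(loc='upper left')
plt.savefig('SecondPrection.jpg')
plt.show()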

The Auto ARIMA model gave us an r² value of 0.65. This model did a much better job of capturing the trend in the data than my first implementation of the ARIMA model.
