Is Gold a Good Investment? Gold Price Prediction Using Machine Learning Techniques
This blog post is adapted from a capstone project created by Aihui Ong for Springboard’s Data Science Career Track. This post originally appeared on Aihui’s Medium page.
Is gold a good investment in general?
The SPDR Gold Trust (GLD) exchange-traded fund (ETF) tracks the price movement of gold and is a cost-effective and convenient way to invest in gold without buying physical gold.
I’ve owned GLD since 2011, when the price was $151. Honestly, at the time I was still new to investing, and all I had read was that investing in gold would diversify and recession-proof my portfolio, and that gold is a safe haven against a possible crisis. Clearly, it hasn’t panned out that way!
To sell or not to sell?
From the graph below, the GLD price has declined since 2011. So, should I sell and cut my losses, or should I hold on? When I asked myself this question, it was November 2019 and the price of GLD was $138. I decided to take a more data-driven approach to answer it.
Before deciding whether to sell or hold GLD, let’s see if gold is still a good investment using a financial analysis approach. If not, it may be better to cut my losses and put the money into a higher-growth investment.
Past 15-year performance
Let’s compare the performance of GLD vs. an S&P 500 index fund (e.g. SPY) for the past 15 years, from 2004 to 2019.
GLD started trading in 2004. If you had invested $10,000 in both GLD and the S&P 500 in 2004, this is what you would have made 15 years later.
Past 5-year performance
If you had invested $10,000 in both GLD and the S&P 500 in 2015, this is what you would have made 5 years later in 2019.
GLD vs. SPY: 10-year risk analysis
The SPY ETF is one of the most popular funds that aims to track the S&P 500 Index.
So: is gold still a good investment?
Based on this analysis, gold is a volatile investment. The performance analysis shows a big swing in gains depending on when you invested in GLD. Investing 15 years ago would have produced a CAGR of 7.69%, whereas investing 5 years ago would have produced a CAGR of only 2.97%. That’s a 61% drop!
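For readers who want to check numbers like these, here is a minimal sketch of the CAGR formula and of the relative drop between the two CAGR figures. The function names and the $10,000-to-$20,000 example are my own illustration, not taken from the project code.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.05 means 5% per year)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def relative_change(old: float, new: float) -> float:
    """Change from old to new, expressed as a fraction of old."""
    return (new - old) / old


# Hypothetical example: $10,000 growing to $20,000 over 10 years.
example_cagr = cagr(10_000, 20_000, 10)
print(f"Example CAGR: {example_cagr:.2%}")

# The drop between the 15-year and 5-year CAGR figures quoted above.
cagr_drop = relative_change(7.69, 2.97)
print(f"Change in CAGR: {cagr_drop:.0%}")
```

The second calculation is where the "61% drop" comes from: it is the drop in the CAGR itself, relative to the 15-year figure, not a drop in the price of GLD.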
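The Sharpe Ratio, which measures the average return earned in excess of the risk-free rate per unit of volatility, is also much lower for GLD than for SPY. That indicates GLD delivered less return for the risk taken, making it the riskier investment on a risk-adjusted basis.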
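As a rough illustration of how a Sharpe Ratio can be computed, here is a small sketch using only the standard library. The return series are made up, and the choices of a zero risk-free rate and 252 trading periods per year are assumptions for the example, not necessarily what was used in the analysis.

```python
import math
import statistics


def sharpe_ratio(period_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns.

    risk_free_rate is expressed per period, in the same units as the returns.
    """
    excess = [r - risk_free_rate for r in period_returns]
    mean_excess = statistics.mean(excess)
    volatility = statistics.stdev(excess)  # sample standard deviation
    return mean_excess / volatility * math.sqrt(periods_per_year)


# Two hypothetical return series with the same average return (1% per period)
# but very different volatility.
steady = [0.010, 0.012, 0.008, 0.011, 0.009]
choppy = [0.050, -0.030, 0.040, -0.020, 0.010]

print("steady:", round(sharpe_ratio(steady), 2))
print("choppy:", round(sharpe_ratio(choppy), 2))
```

Both series earn the same on average, but the choppy one gets a much lower ratio because its returns swing more. That is the sense in which a lower Sharpe Ratio points to a worse trade-off between risk and return.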
Using ARIMA and Facebook Prophet to Predict the Price of Gold
GLD Data Set
Next, let’s predict the future price of gold using a more data science approach. The historical prices of SPDR® Gold Shares (NYSE Arca: GLD) were downloaded from Yahoo. The data spans from the inception of this share on 11/18/2004 to the date of download, 11/22/2019.
Exploratory Data Analysis
The data frame shows that the data doesn’t have stock prices for every date. That’s because the stock market is closed on weekends and holidays. We need to fill in the missing days to make this dataset a truly daily time series.
Forward Fill Missing Data
For days where there is no pricing information, we resample to daily frequency and fill in the missing values from the previous day.
ffill_data = ffill_data.resample("D").ffill().reset_index()
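To see what this one-liner does, here is a tiny self-contained sketch with made-up prices for a Friday and the following Monday. The column name "Open" matches the article; the values and the variable names are hypothetical. Note that resample needs the dates to be the index.

```python
import pandas as pd

# Hypothetical prices: Friday 11/15/2019 and Monday 11/18/2019, weekend missing.
prices = pd.DataFrame(
    {"Open": [100.0, 101.5]},
    index=pd.to_datetime(["2019-11-15", "2019-11-18"]),
)
prices.index.name = "Date"

# Resample to calendar days and carry the last known price forward.
daily = prices.resample("D").ffill().reset_index()
print(daily)
```

Saturday and Sunday now appear as rows, each carrying Friday’s price, so the series has one value per calendar day.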
Is the Time Series Data Stationary?
Stationarity is important in time series analysis because stationary processes are easier to analyze. Here’s an article that explains the details. To check if the data is stationary, run a Dickey-Fuller test on the Open price.
from statsmodels.tsa.stattools import adfuller
adfuller_result = adfuller(data['Open'])
print('ADF Statistic: ', adfuller_result[0])
print('p-value: ', adfuller_result[1])
ADF Statistic: -1.8025174981713052
p-value: 0.3792125439071432
The p-value is greater than 0.05, so we cannot reject the null hypothesis of a unit root. In other words, we treat the data as not stationary.
Making the Time Series Data Stationary
There are different ways to make time series data stationary. A few methods include differencing once, differencing twice, and taking the square root. Let’s use all three and pick the best method. After applying each method, we run the Dickey-Fuller test; the results are below:
The square root method didn’t produce a p-value less than 0.05, so we should eliminate it. Both differencing once and differencing twice produced a p-value less than 0.05, but differencing twice produced a much more negative ADF statistic. That’s what we want: the more negative, the better.
Let’s compare the time series data before and after differencing twice.
Train-Test Data Split
Because we’re dealing with time series data, we cannot split the data randomly. The earlier data should always be the training data and the later data should be in the test set.
There are 15 years of data. We’re going to use the first 12 years (2004–2016) as training data and the last 3 years (2017–2019) as test data.
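A chronological split like this can be sketched with date-based slicing in pandas. The data frame below is a placeholder covering the same date range as the GLD data, with dummy values in the "Open" column; the exact cut-off dates are my reading of the 2004–2016 and 2017–2019 ranges.

```python
import pandas as pd

# Placeholder daily series covering the same date range as the GLD data.
dates = pd.date_range("2004-11-18", "2019-11-22", freq="D")
data = pd.DataFrame({"Open": range(len(dates))}, index=dates)

# Earlier data for training, later data for testing. No shuffling.
train = data.loc[:"2016-12-31"]
test = data.loc["2017-01-01":]

print(train.index.min().date(), "to", train.index.max().date())
print(test.index.min().date(), "to", test.index.max().date())
```

Slicing by date keeps every training observation strictly earlier than every test observation, which is the property a random split would break.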
Modeling Using ARIMA and Auto ARIMA
A widely used statistical method for time series forecasting is the ARIMA (AutoRegressive Integrated Moving Average) model. An extension to ARIMA that supports direct modeling of the seasonal component of the series is called SARIMA.
model = SARIMAX(df, order = (p,d,q))
- p = number of autoregressive lags
- d = order of differencing
- q = number of moving average lags
For the SARIMAX model, we need to determine p, d and q manually. For d, we’ve already determined that differencing the data twice is the best way to make the data stationary. p and q can be determined using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). Refer to the code on Github to see how this is done.
Instead of manually determining the right values for p, d and q, we can use Auto ARIMA to perform an automatic search for the optimal order of an ARIMA model. It will also highlight if there is any seasonality in the data.
from pmdarima import auto_arima  # assumed source of auto_arima
results = auto_arima(arima_data,
                     seasonal=True,
                     start_p=1,
                     start_q=1,
                     start_P=1,
                     start_Q=1,
                     max_P=3,
                     max_Q=3,
                     m=7,  # seasonal period
                     information_criterion='aic',
                     trace=True,  # print each model tried
                     error_action='ignore',
                     stepwise=True)
Based on the results generated by auto_arima, the model with the lowest AIC score, the best-fit ARIMA, is: order=(0, 1, 0) seasonal_order=(0, 0, 0, 7). Compare this to the manual approach of deriving (p,d,q) using AIC, BIC and differencing, where the best-fit ARIMA is order=(0,2,1).
Comparing the Results Between ARIMA and Auto ARIMA
There’s not much difference between the 2 models; however, the mean absolute error is lower for Auto ARIMA. We’re going to use the Auto ARIMA model to forecast future prices.
Using Auto ARIMA to Forecast Opening Price For Last 365 Days of Training Data
We fit the model using the best-fit ARIMA order and used it to predict the open price of the gold ETF for the last 365 days of the training data. We then compared the prediction against the true prices for the last 365 days of the training set.
The forecast (red line) aligns very well with the true values (blue line) for the last 365 days of the training data, and it falls within the confidence intervals (pink area).
MAE, MSE and RMSE Scores
- Mean Absolute Error (MAE): 0.63
- Root Mean Squared Error (RMSE): 1.02
In other words, over the last 365 days of the training set, the model’s forecast of the daily open price is off by about $0.63 on average, with a root mean squared error of $1.02 against the true open prices.
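For anyone unfamiliar with these two scores, here is a minimal sketch of how they are calculated, using plain Python and a few made-up prices. The helper names and values are hypothetical and are not the ones used in the project code.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average size of the errors, ignoring their direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error; penalizes large misses more."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical open prices and forecasts for four days.
actual = [120.0, 121.0, 119.0, 122.0]
predicted = [119.0, 121.0, 121.0, 122.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

The RMSE comes out higher than the MAE here because the single $2 miss is squared before averaging. The same pattern shows up in my scores, where the RMSE is above the MAE.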
Using Auto ARIMA to Forecast Opening Price and Compare with Test Data
Let’s use the model fitted with auto_arima to forecast the opening prices from 1/1/2017–11/22/2019 and compare the forecasted results against the test data set.
MAE, MSE and RMSE Scores
- Mean Absolute Error (MAE): 6.70
- Root Mean Squared Error (RMSE): 8.02
The forecasted data (red line) showed an upward trend, which is aligned with the test data (brown line). It correctly predicted that gold prices would go up from 2017–2019. The test data is also within the confidence interval. However, the interval is large (pink area). This shows that it’s hard to predict the price of gold day-to-day, but the model is able to predict a general trend over time.
Modeling Using Facebook Prophet and Comparing to Auto ARIMA
Facebook Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data.
Looking at the above plot produced by Prophet, the forecasted data (red line) shows a flat line. The forecasted data is not aligned with the test data (brown line), which shows an upward trend.
Prophet doesn’t seem to be as accurate as the ARIMA model. The MAE, MSE, and RMSE of Prophet are also higher than those of the ARIMA model.
Predicting the Price of Gold ETFs For the Next 2 Years
Having validated our forecast results with our test data, we’re going to perform an out-of-sample forecast using the auto_arima method. We have data on gold prices up to 11/22/2019. We’re going to forecast the price of gold for the next 2 years, from 11/23/2019–11/21/2021.
Based on the earlier best-fit order recommended by auto_arima, our model should have these parameters:
all_auto_arima_model = SARIMAX(arima_data,
                               seasonal=True,
                               order=(0,1,0),
                               seasonal_order=(0,0,0,7),
                               trend='c')
We used the above model to forecast 2 years out and also used the model to predict the price for the entire period from 11/18/2004–11/21/2021.
The red line is the prediction results from 11/18/2004–11/21/2021. You can see that the predicted prices are very well aligned with the actual prices shown by the black line. The 2-year forecast does indicate an upward trend in gold ETF prices over the next 2 years.
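It may help to see why a model with order=(0,1,0) and a constant produces a steadily rising forecast. As far as I understand it, that specification is a random walk with drift: each forecast is the last observed price plus the average daily change, accumulated over the horizon. The sketch below reproduces that idea with NumPy on made-up prices. It is not the SARIMAX fit itself, which estimates the drift by maximum likelihood, though the estimate should be close to the mean of the first differences.

```python
import numpy as np


def random_walk_drift_forecast(history, steps):
    """Point forecasts for a random walk with drift.

    The drift is estimated as the mean of the first differences, and the
    forecast h steps ahead is the last observation plus h times the drift.
    """
    history = np.asarray(history, dtype=float)
    drift = np.diff(history).mean()
    horizons = np.arange(1, steps + 1)
    return history[-1] + drift * horizons


# Hypothetical opening prices, not GLD data.
history = [100.0, 101.0, 103.0, 104.0]
forecast = random_walk_drift_forecast(history, steps=3)
print(forecast)
```

Because the forecast is a straight line from the last price, it can capture a general direction but says little about day-to-day moves, which fits the wide confidence intervals seen earlier.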
Forecasting Conclusion
Based on this analysis, between ARIMA and Facebook Prophet, ARIMA shows a better fit between actual data and predicted data. The analysis also helps us reach a couple of key takeaways.
- In the out-of-sample forecast, the ARIMA model shows an upward trend in gold prices for the next 2 years, forecasting the price of gold to be at $150.80 by 11/21/2021. That’s a 9.28% increase from the current price of $138 (as of analysis date: 11/22/2019) and a compound annual growth rate (CAGR) of 4.53%.
- Based on the Sharpe Ratio analysis, gold has shown to be a riskier investment than the S&P 500. However, it has shown a CAGR of around 3% over the last 5 years. The ARIMA model predicted a 4.53% CAGR over the next 2 years.
I decided not to sell GLD based on my analysis above, which was in November 2019. As of today, July 10, 2020, the price of GLD is $169, way higher than my out-of-sample prediction! This analysis and modeling didn’t take into account external events or catastrophes, like the current COVID-19 pandemic. Many investors buy gold as a safe haven to protect themselves against a possible crisis. As seen in this Google Trends data, searches for the term “gold price” peaked Mar 15–21 2020 when the stock market plunged.
Ways to Improve the Model
There are other factors and machine learning models that may produce better predictions.
- Account for external events/crises in the model
- Long Short-Term Memory (LSTM)
- Extreme Gradient Boosting (XGBoost)
I hope you found this analysis useful. To help you get started with your own modeling, here’s the code on Github.
Want to learn more? Check out Springboard’s Data Science Career Track.
Disclaimer: This article is for entertainment and educational purposes ONLY. It’s not intended to be any form of financial advice. I’m new to Data Science and enjoy using my newfound skills to tackle fun real-life conundrums.
Thanks to my mentors at Springboard, Amir Ziai and Benjamin Bell, for supporting me and imparting your data science knowledge and wisdom.