# Predict Stock Price with Multiple Regression and R

A plethora of studies have tried to forecast stock prices using predictive algorithms and other statistical techniques. As a novice in the field of machine learning, I was curious to see how a stock price can be predicted using multiple regression.

For this, I pulled some data from nseindia.com and processed it to suit my needs. This page gives a quick summary of my study and findings.

## What did I expect before starting the study?

My intention was to find the key variables of a stock that can help predict its price for the next day. I anticipated that the stock price depends heavily on the following 3 variables:

• India volatility index (India VIX) – This index is maintained by NSE and indicates overall stock market volatility.
• Delivery percentage of the stock – The percentage of traded shares that are taken for delivery rather than traded intraday.
• Overall volume of the stock – The number of shares traded; typically, increasing volume should confirm the trend.

Let's check how much of this understanding holds up in the model.

## Data sample from NSE, India

I have taken 3 different datasets for the analysis. The data covers the two years 2015 and 2016.

• HINDALCO stock data
• NIFTY index data
• Volatility index VIX data

These 3 data sets are used to derive a combined data set containing the most relevant variables, chosen using stock market domain knowledge:

• Date – Date to which all the values apply
• HINDALCO_NextDayclose – HINDALCO closing price on the NEXT trading day after Date (the value to be predicted)
• IndiaVIX_Close – VIX index closing value on Date
• NIFTY_Change – Change in the NIFTY index compared to the previous day
• Price_5SMA – 5-day simple moving average of the HINDALCO closing price
• Price_14RSI – 14-day relative strength index (RSI) based on the HINDALCO closing price
• Volume_5SMA – 5-day simple moving average of the HINDALCO daily volume
• Volume_14RSI – 14-day relative strength index (RSI) based on the HINDALCO daily volume
• Delivery_Percent_3SMA – Percentage of HINDALCO shares delivered relative to the number of shares traded (going by the name, smoothed as a 3-day simple moving average)
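As a rough sketch of how such derived columns might be built in base R, the snippet below computes a moving average, an RSI, and a next-day close on a synthetic price series and merges in a VIX column by date. The helper names `sma()` and `rsi()`, the made-up data, and the plain-average variant of RSI are all assumptions for illustration, not the preprocessing actually used in this study.

```r
# Hypothetical helpers; not taken from the original study.

# Trailing simple moving average over the last n observations
# (NA until n values are available).
sma <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

# RSI using plain averages of gains and losses over n periods.
# Wilder's smoothed variant would give slightly different values.
rsi <- function(x, n = 14) {
  change <- c(NA, diff(x))
  gain <- pmax(change, 0)
  loss <- pmax(-change, 0)
  100 - 100 / (1 + sma(gain, n) / sma(loss, n))
}

# Synthetic data: consecutive calendar days stand in for trading days.
set.seed(1)
n_days <- 60
stock <- data.frame(
  Date   = as.Date("2015-01-01") + 0:(n_days - 1),
  Close  = 150 + cumsum(rnorm(n_days)),
  Volume = round(runif(n_days, 1e6, 5e6))
)
vix <- data.frame(Date = stock$Date, IndiaVIX_Close = runif(n_days, 13, 20))

stock$Price_5SMA  <- sma(stock$Close, 5)
stock$Price_14RSI <- rsi(stock$Close, 14)
stock$Volume_5SMA <- sma(stock$Volume, 5)
# Next day's close: shift the closing price up by one row.
stock$HINDALCO_NextDayclose <- c(stock$Close[-1], NA)

# Join on Date and drop rows where an indicator or the target is undefined.
combined <- na.omit(merge(stock, vix, by = "Date"))
head(combined)
```

Rows at the start (where the 14-day window is not yet full) and the last row (which has no next-day close) drop out in the `na.omit()` step.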

### R model using multiple regression

Below is the model result in R with all the independent variables:

If you observe the model output, some facts are obvious:

• IndiaVIX negatively influences the stock price. That seems right, because an increase in volatility usually accompanies a fall in stock and index values. However, IndiaVIX is not a statistically significant variable for this prediction.
• NIFTY_Change, Price_5SMA, and Price_14RSI are the significant variables.
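For readers who want to try this kind of fit themselves, here is a minimal sketch of a full model with all the independent variables using `lm()`. The data frame is synthetic and the relationship used to generate the response is an assumption made only so the example runs, so the coefficients and p-values will not match the ones from this study.

```r
set.seed(42)
n <- 200

# Synthetic stand-in for the combined data set (all values are made up).
d <- data.frame(
  IndiaVIX_Close        = runif(n, 13, 20),
  NIFTY_Change          = rnorm(n, 0, 50),
  Price_5SMA            = 150 + cumsum(rnorm(n)),
  Price_14RSI           = runif(n, 20, 80),
  Volume_5SMA           = runif(n, 1e6, 5e6),
  Volume_14RSI          = runif(n, 20, 80),
  Delivery_Percent_3SMA = runif(n, 20, 60)
)

# Assumed relationship, used only to generate a response for the demo.
d$HINDALCO_NextDayclose <- d$Price_5SMA + 0.02 * d$NIFTY_Change +
  0.05 * d$Price_14RSI + rnorm(n)

# Regress next-day close on every other column.
full_model <- lm(HINDALCO_NextDayclose ~ ., data = d)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- summary(full_model)$coefficients
print(round(coefs, 4))

# Terms whose p-value falls below the conventional 0.05 threshold.
significant <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]
print(significant)
```

The sign of each estimate shows the direction of influence, and the `Pr(>|t|)` column is what separates significant variables from the rest.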

Further refining this initial model led me to a final model in which all the variables are significant and the R-squared value is highest. The final model's R output is as below:

The final model is built using only the most significant independent variables:

• NIFTY_Change
• Price_5SMA
• Price_14RSI

Overall regression accuracy is quite high. The adjusted R-squared is 0.9871, which means that about 98.71% of the variance in the HINDALCO next-day closing price is explained by the input variables. The F-statistic of 6081 is also very high, indicating that the model as a whole is statistically significant.
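To make these two statistics concrete, the sketch below fits a three-variable model on synthetic data and reads the adjusted R-squared and F-statistic out of `summary()`. The data and the generating relationship are assumptions for illustration, so the printed numbers will differ from 0.9871 and 6081.

```r
set.seed(7)
n <- 200

# Synthetic stand-in for the three predictors (values are made up).
d <- data.frame(
  NIFTY_Change = rnorm(n, 0, 50),
  Price_5SMA   = 150 + cumsum(rnorm(n)),
  Price_14RSI  = runif(n, 20, 80)
)
# Assumed relationship, used only to generate a response for the demo.
d$HINDALCO_NextDayclose <- d$Price_5SMA + 0.02 * d$NIFTY_Change +
  0.05 * d$Price_14RSI + rnorm(n)

final_model <- lm(
  HINDALCO_NextDayclose ~ NIFTY_Change + Price_5SMA + Price_14RSI,
  data = d
)
s <- summary(final_model)

adj_r2 <- s$adj.r.squared
f_stat <- s$fstatistic  # named vector: value, numdf, dendf

# p-value of the overall F test (is the model better than intercept-only?).
f_p <- pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
          lower.tail = FALSE)

# Adjusted R-squared recomputed by hand from plain R-squared:
# 1 - (1 - R2) * (n - 1) / (n - p - 1), with p = number of predictors.
p <- 3
adj_r2_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

cat("Adjusted R-squared:", round(adj_r2, 4), "\n")
cat("F statistic:", round(f_stat["value"], 1),
    "p-value:", format.pval(f_p), "\n")
```

The adjustment penalizes extra predictors, which is why adjusted R-squared is the fairer number to compare between the full and the reduced model.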

## RESULTS: How accurate are this model's predictions?

If we plot the actual and predicted values for the validation data set (as in the diagram below), the values look very close. This is a good sign for the model.
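A comparison like this could be reproduced roughly as follows. This is only a sketch on synthetic data: the 80/20 chronological split and the RMSE and MAPE error measures are my assumptions, not details taken from the study above.

```r
set.seed(123)
n <- 250

# Synthetic stand-in for the combined data set (values are made up).
d <- data.frame(
  NIFTY_Change = rnorm(n, 0, 50),
  Price_5SMA   = 150 + cumsum(rnorm(n)),
  Price_14RSI  = runif(n, 20, 80)
)
d$HINDALCO_NextDayclose <- d$Price_5SMA + 0.02 * d$NIFTY_Change +
  0.05 * d$Price_14RSI + rnorm(n)

# Chronological split (assumed): first 80% to fit, last 20% to validate.
split_at <- floor(0.8 * n)
train <- d[1:split_at, ]
valid <- d[(split_at + 1):n, ]

fit <- lm(
  HINDALCO_NextDayclose ~ NIFTY_Change + Price_5SMA + Price_14RSI,
  data = train
)
pred <- predict(fit, newdata = valid)

# Error measures on data the model has not seen.
actual <- valid$HINDALCO_NextDayclose
rmse <- sqrt(mean((actual - pred)^2))
mape <- mean(abs((actual - pred) / actual)) * 100
cat("RMSE:", round(rmse, 3), " MAPE (%):", round(mape, 3), "\n")

# Actual vs predicted on the validation rows.
plot(actual, type = "l", xlab = "Validation day", ylab = "Price")
lines(as.numeric(pred), col = "red", lty = 2)
legend("topleft", legend = c("Actual", "Predicted"),
       col = c("black", "red"), lty = c(1, 2))
```

Putting a number such as RMSE next to the plot makes "very close" easier to judge, since two lines can look nearly identical on a chart while still differing by an amount that matters for trading.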