Predicting NSE Stock Price by Using Multiple Regression Technique with R

A plethora of studies have tried to forecast stock prices using predictive algorithms and other statistical techniques. As a novice in the field of machine learning, I was curious to see how a stock price can be predicted using multiple regression.

For this, I pulled some data from nseindia.com and processed it to suit my needs. This page gives a quick summary of my study and findings.

What did I expect before starting the study?

My intention was to find the key variables of a stock that can help predict its price for the next day. I anticipated that the stock price depends heavily on the 3 variables below:

  • India volatility index (India VIX) - This index is maintained by NSE and indicates overall stock market volatility.
  • Delivery percentage of the stock - This tells what percentage of the shares traded were taken for delivery rather than traded intraday.
  • Overall volume of the stock - Number of shares traded; typically an increasing volume should confirm the trend.

Let's check how much of this understanding holds up in the model.

Data sample from NSE, India

I have taken 3 different datasets to do the analysis. Data is extracted for the two years 2015 and 2016.

  • HINDALCO stock data
  • NIFTY index data
  • Volatility index VIX data

These 3 data sets are used to derive a combined data set containing the most relevant variables, chosen based on stock market domain knowledge:

  • Date – Date on which all the values are applicable
  • HINDALCO_NextDayclose – HINDALCO closing price on the NEXT trading day relative to Date (this is the value being predicted)
  • IndiaVIX_Close – VIX index closing value on the given Date
  • NIFTY_Change – Change in NIFTY index price compared to previous day
  • Price_5SMA – 5 day simple moving average for HINDALCO closing price
  • Price_14RSI – 14 day relative strength index (RSI) based on HINDALCO closing price
  • Volume_5SMA – 5 day simple moving average of HINDALCO traded volume
  • Volume_14RSI – 14 day relative strength index (RSI) based on HINDALCO traded volume
  • Delivery_Percent_3SMA – 3 day simple moving average of the percentage of HINDALCO shares delivered relative to the number of shares traded
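
The derived columns above can be built with a few lines of base R. The following is only a rough sketch under stated assumptions: the prices are synthetic random-walk values rather than NSE data, the helper names sma and rsi are hypothetical, and the RSI here uses a plain average of gains and losses (Wilder's original definition uses a smoothed average, so values will differ slightly from charting tools).

```r
# Synthetic closing prices for illustration only, NOT real NSE data
set.seed(42)
close <- 100 + cumsum(rnorm(60))

# Trailing n-day simple moving average; the first n - 1 values are NA
sma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))

# n-day RSI based on simple averages of daily gains and losses
rsi <- function(x, n = 14) {
  delta <- diff(x)
  avg_gain <- sma(pmax(delta, 0), n)
  avg_loss <- sma(pmax(-delta, 0), n)
  c(NA, 100 - 100 / (1 + avg_gain / avg_loss))
}

features <- data.frame(
  Close = close,
  NextDayClose = c(close[-1], NA),  # target: the following day's close
  Price_5SMA = sma(close, 5),
  Price_14RSI = rsi(close, 14)
)

# Drop warm-up rows (indicators not yet defined) and the last row (no next day)
features <- na.omit(features)
head(features)
```

The same two helpers can be applied to the volume series to get the volume-based columns.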

R model using multiple regression

Below is the R model output with all the independent variables included:

Multiple regression output from R code
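
The R code behind this output is not reproduced on this page, so here is a minimal sketch of how such a full model could be fitted with lm(). The data frame name stock, the simulated values, and the relationship used to generate the target are all assumptions made purely so the example runs; only the column names come from the list above.

```r
# Synthetic stand-in data, NOT real NSE values
set.seed(1)
n <- 200
stock <- data.frame(
  IndiaVIX_Close = runif(n, 12, 25),
  NIFTY_Change = rnorm(n, 0, 60),
  Price_5SMA = 100 + cumsum(rnorm(n)),
  Price_14RSI = runif(n, 20, 80),
  Volume_5SMA = runif(n, 5e6, 1e7),
  Volume_14RSI = runif(n, 20, 80),
  Delivery_Percent_3SMA = runif(n, 20, 60)
)

# Assumed relationship, chosen only so the regression has something to find
stock$HINDALCO_NextDayclose <- with(
  stock,
  Price_5SMA + 0.01 * NIFTY_Change + 0.05 * Price_14RSI + rnorm(n)
)

# Multiple regression of next-day close on all candidate predictors
full_model <- lm(
  HINDALCO_NextDayclose ~ IndiaVIX_Close + NIFTY_Change + Price_5SMA +
    Price_14RSI + Volume_5SMA + Volume_14RSI + Delivery_Percent_3SMA,
  data = stock
)

# Coefficients, standard errors, t values and p-values for each variable
summary(full_model)
```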

If you observe the model output, some facts are obvious:

  • IndiaVIX has a negative coefficient, that is, higher volatility goes with a lower stock price. That seems correct, because an increase in volatility is usually observed together with a decrease in stock and index values. However, the IndiaVIX variable is not a significant one for our prediction.
  • NIFTY_Change, Price_5SMA, and Price_14RSI are significant variables for us.

Further refining this initial model led me to the final model, where all the variables are significant and the R square value is highest. The final model's R output is as below:

The final model is built by considering only the most significant independent variables:

  • NIFTY_Change
  • Price_5SMA
  • Price_14RSI

Final output from R model

The overall fit is quite strong. Adjusted R square is 0.9871, which means that about 98.71% of the variance in the HINDALCO next-day closing price is explained by the input variables, after adjusting for the number of predictors. The F statistic of 6081 is large, which indicates that the model as a whole is statistically significant.
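
For readers who want to pull these numbers out of a fitted model rather than read them off the printed summary, a sketch along these lines should work. The data here is simulated and the object names d, final_model and s are hypothetical, so the values it prints will not match the ones quoted above.

```r
# Synthetic stand-in data, NOT real NSE values
set.seed(2)
n <- 200
d <- data.frame(
  NIFTY_Change = rnorm(n, 0, 60),
  Price_5SMA = 100 + cumsum(rnorm(n)),
  Price_14RSI = runif(n, 20, 80)
)
d$HINDALCO_NextDayclose <- with(
  d,
  Price_5SMA + 0.01 * NIFTY_Change + 0.05 * Price_14RSI + rnorm(n)
)

# Reduced model with the three predictors kept in the final model
final_model <- lm(
  HINDALCO_NextDayclose ~ NIFTY_Change + Price_5SMA + Price_14RSI,
  data = d
)
s <- summary(final_model)

s$adj.r.squared  # R square penalised for the number of predictors
s$fstatistic     # F value plus numerator and denominator degrees of freedom

# p-value of the overall F test
f <- s$fstatistic
pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Per-variable p-values, used to judge which predictors are significant
coef(s)[, "Pr(>|t|)"]
```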

RESULTS: How accurate is this model to predict?

If we plot the actual and predicted values on the validation data set (as in the diagram below), the two series look very close. This is a good sign for the model.

Stock Price: Actual vs Predicted
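
A comparison like this can be reproduced roughly as follows. This is a sketch on simulated data, assuming a simple chronological split (first 150 rows for training, last 50 for validation); the split sizes, the object names, and the choice of RMSE and MAPE as error measures are my own illustration, not details taken from the study.

```r
# Synthetic stand-in data, NOT real NSE values
set.seed(3)
n <- 200
d <- data.frame(
  NIFTY_Change = rnorm(n, 0, 60),
  Price_5SMA = 100 + cumsum(rnorm(n)),
  Price_14RSI = runif(n, 20, 80)
)
d$HINDALCO_NextDayclose <- with(
  d,
  Price_5SMA + 0.01 * NIFTY_Change + 0.05 * Price_14RSI + rnorm(n)
)

# Chronological split: fit on earlier rows, validate on later rows
train <- d[1:150, ]
valid <- d[151:n, ]

fit <- lm(
  HINDALCO_NextDayclose ~ NIFTY_Change + Price_5SMA + Price_14RSI,
  data = train
)
pred <- predict(fit, newdata = valid)
actual <- valid$HINDALCO_NextDayclose

# Error measures on data the model has not seen
rmse <- sqrt(mean((actual - pred)^2))
mape <- mean(abs((actual - pred) / actual)) * 100
c(RMSE = rmse, MAPE = mape)

# Actual vs predicted on the validation set
plot(actual, type = "l", xlab = "Validation day", ylab = "Price")
lines(pred, col = "red", lty = 2)
legend("topleft", legend = c("Actual", "Predicted"),
       col = c("black", "red"), lty = c(1, 2))
```

Putting a number such as RMSE next to the plot helps, because two lines can look close on a chart even when the day-to-day error is too large to trade on.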

Did my initial expectation prove to be true?

The answer is no.

It turned out that Volume_5SMA, Volume_14RSI and Delivery_Percent_3SMA are not significant variables for the final model, and neither is the IndiaVIX variable, as noted earlier. This is contrary to common understanding.

However, this result needs further verification by back-testing with a larger data set. Another possibility is that the 5 day SMA variable may already contain the impact of these variables.
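
One way to probe this further is a partial F test between the reduced and the full model, together with a look at how strongly the dropped variables correlate with Price_5SMA. The sketch below runs on simulated data in which the volume and delivery columns are pure noise by construction, so it only illustrates the mechanics; the object names are hypothetical.

```r
# Synthetic stand-in data, NOT real NSE values
set.seed(4)
n <- 200
d <- data.frame(
  NIFTY_Change = rnorm(n, 0, 60),
  Price_5SMA = 100 + cumsum(rnorm(n)),
  Price_14RSI = runif(n, 20, 80),
  Volume_5SMA = runif(n, 5e6, 1e7),
  Volume_14RSI = runif(n, 20, 80),
  Delivery_Percent_3SMA = runif(n, 20, 60)
)
d$HINDALCO_NextDayclose <- with(
  d,
  Price_5SMA + 0.01 * NIFTY_Change + 0.05 * Price_14RSI + rnorm(n)
)

reduced <- lm(
  HINDALCO_NextDayclose ~ NIFTY_Change + Price_5SMA + Price_14RSI,
  data = d
)
# Same model plus the three variables that were dropped
full <- update(reduced, . ~ . + Volume_5SMA + Volume_14RSI + Delivery_Percent_3SMA)

# Partial F test: do the extra variables reduce the residual error significantly?
cmp <- anova(reduced, full)
cmp

# Correlation of each dropped variable with the 5 day price SMA
cor(d$Price_5SMA, d[, c("Volume_5SMA", "Volume_14RSI", "Delivery_Percent_3SMA")])
```

A large p-value in the anova table would support dropping the extra variables, while a high correlation with Price_5SMA would support the idea that their effect is already captured by the moving average.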

Scope for enhancing the model

Though the model has done a good job of explaining the stock price, it is worth considering the following enhancements:

  • This model is built on the closing price. We can build another model on the opening price. Together they would give a predicted price range, which is much more helpful for making trading decisions.
  • This model (or an enhanced model) needs to be tested on larger sets of data for multiple stocks. That will help to fine-tune the model accuracy.
  • It would be interesting to vet this model by adding a few more technical indicators such as EMA (exponential moving average), MACD (moving average convergence divergence), and ADX (average directional index).
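
As a starting point for the indicator idea, EMA and MACD can be computed in base R along these lines. This is a sketch only: the function names ema and macd are hypothetical, the EMA is seeded with the first observation (some libraries seed with an n-day SMA instead), the 12/26/9 periods are the commonly used MACD defaults rather than anything tuned for HINDALCO, and the prices are simulated.

```r
# n-day exponential moving average, seeded with the first observation
ema <- function(x, n) {
  alpha <- 2 / (n + 1)
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

# MACD line (fast EMA minus slow EMA) and its signal line (EMA of the MACD line)
macd <- function(x, fast = 12, slow = 26, signal = 9) {
  line <- ema(x, fast) - ema(x, slow)
  data.frame(macd = line, signal = ema(line, signal))
}

# Synthetic closing prices for illustration only, NOT real NSE data
set.seed(7)
close <- 100 + cumsum(rnorm(60))
tail(macd(close))
```

The resulting columns could then be added to the regression as extra predictors and checked for significance in the same way as the existing ones.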