AboveIndex

About - Blog - Contact

Calculating Stock Beta, Alpha and R^2 using Pandas and Statsmodels

According to investopedia Beta is a measure of a volatility of an asset (stock or a portfolio) in relation to the overall market. Beta is a main part of the capital asset pricing model (CAPM) which calculates expected return of an asset, follow the links if you wanna learn more about CAPM and Beta.

In this article we're going to implement a way to calculate Beta using Pandas and Numpy

so, How do we calculate Beta?

Beta is simply calculated using regression analysis, which means we're going to create a scatter plot of the asset (stock) return vs the market return, fit a line to it, the slope of that line is the asset beta.

Relax though, we're going to use tools like pandas and statsmodels to do most of the heavy lifting here.

Let's get started by importing Numpy, Pandas and statsmodels , plus matplotlib to generate visualization and DataReader to grab the data from Yahoo Finance

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# statsmodels is used to do OLS regression
import statsmodels.api as sm

# we're using yahoo finance data, pandas datareader will import the data we need
from pandas_datareader.data import DataReader

preparing the data

let's load the data from Yahoo Finance, resample it down to monthly prices and calculate the monthly returns

start_date = "2013-12-01"
symbols = ["MSFT", "^GSPC"]
# df is the main dataframe that'll hold the Adjusted closing prices
df = pd.DataFrame()

for symbol in symbols:
    dftemp = DataReader(symbol,"yahoo",start_date)
    # we only need the adjusted close price.
    df[symbol] = dftemp["Adj Close"]

# lets take a look,
print df.head()
                 MSFT        ^GSPC
Date                              
2013-12-02  35.460313  1800.900024
2013-12-03  35.331199  1795.150024
2013-12-04  35.912211  1792.810059
2013-12-05  35.045302  1785.030029
2013-12-06  35.377311  1805.089966

Seems Legit, moving on to resampling

# resample the data from daily to monthly
dfm = df.resample('M').last()
# print "dfm:\n %s " % dfm.head()
# this is equal to the 2 lines above. uncomment if you wanna test that.
# dfm2 = df.asfreq('M',method='ffill')
# print "dfm2:\n %s" % dfm2.head()

# calculating monthly returns
dfmret = dfm.pct_change()
dfmret = dfmret.dropna()
print dfmret.head()
                MSFT     ^GSPC
Date                          
2014-01-31  0.011494 -0.035583
2014-02-28  0.020013  0.043117
2014-03-31  0.069956  0.006932
2014-04-30 -0.014394  0.006201
2014-05-31  0.020515  0.021030

Now onto the fun part.

Visualizing The data

This isn't necessary but it helped me understand the concept better, We'll use a scatter plot to look at the Asset returns compared to the market index

# create a plot 15 by 6
fig, ax1 = plt.subplots(1,figsize=(15,6))
ax1.scatter(dfmret["^GSPC"],dfmret.iloc[:,0],label="monthly returns", color='blue', edgecolors='none', alpha=0.7)
ax1.grid(True)

ax1.set_xlabel("S&P500 Monthly Returns")
ax1.set_ylabel("%s Monthly Returns" % (dfmret.columns[0]))

plt.show()

png

So far, so good. Now that we have this, we can fit a line to it, calculate its slope and we got our Beta, To do so we need to use a linear regression model, I usually use scikit-learn for my linear regression needs, but for today, lets use statsmodels which integrates nicely with Pandas, it was a part of it as a matter of fact.

Fitting the Linear Model

The idea is pretty straight forward, the model Takes X and Y values and returns a line that best fits these values. we can then compute the beta and alpha from the generated line.

Y could also be a formula for the line, but luckily an ndarray works fine.

first, we gotta add a constant line, called the intercept, it's a simple array of 1.0

# create the X we're gonna feed to the model
# dfmret.iloc[:,1] is slicing to include all the rows ":" in the second column
# then we add the constant.
X = sm.add_constant(dfmret.iloc[:,1])
# y is the values of the monthly returns of the stock = the first column
y = dfmret.iloc[:,0]

# creating the model
model = sm.OLS(y,X).fit()

# lets see what we got so far
print model.summary()
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   MSFT   R-squared:                       0.361
Model:                            OLS   Adj. R-squared:                  0.342
Method:                 Least Squares   F-statistic:                     19.23
Date:                Tue, 03 Jan 2017   Prob (F-statistic):           0.000106
Time:                        01:10:22   Log-Likelihood:                 54.239
No. Observations:                  36   AIC:                            -104.5
Df Residuals:                      34   BIC:                            -101.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.0110      0.009      1.172      0.249        -0.008     0.030
^GSPC          1.3264      0.302      4.385      0.000         0.712     1.941
==============================================================================
Omnibus:                        5.515   Durbin-Watson:                   2.498
Prob(Omnibus):                  0.063   Jarque-Bera (JB):                4.536
Skew:                           0.511   Prob(JB):                        0.104
Kurtosis:                       4.407   Cond. No.                         32.9
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The Regression results tells us alot, including Beta, Alpha and R^2 , beta is the market coefficient or ^GSPC coef , alpha is the constant coef or const coef since it's the distance between X axis and the line.

now let's see how the line looks when plotted to the scatter plot

# THATS IT, here is beta
beta = model.params["^GSPC"]
alpha = model.params["const"]
rsqr = model.rsquared

# adding the line to the plot

# From the earlier step
fig, ax1 = plt.subplots(1,figsize=(15,6))
ax1.scatter(dfmret["^GSPC"],dfmret.iloc[:,0],label="monthly returns", color='blue', edgecolors='none', alpha=0.7)
ax1.grid(True)
ax1.set_xlabel("S&P500 Monthly Returns")
ax1.set_ylabel("%s Monthly Returns" % (dfmret.columns[0]))

# create X points of the line, by using the min and max points to generate sequence
line_x = np.linspace(dfmret.iloc[:,0].min(), dfmret.iloc[:,0].max())

# generate y points by multiplying x by the slope
ax1.plot(line_x, line_x*model.params["^GSPC"], color="red", label="beta")

# add legend
ax1.legend(loc='upper center', ncol=2, fontsize='large')
# add Beta and R2 stats to plot
ax1.text(-0.12, 0.12, 'Beta: %.2f' % beta, fontsize=12)
ax1.text(-0.12, 0.09, 'Alpha: %.2f' % alpha, fontsize=12)
ax1.text(-0.12, 0.06, 'r^2: %.2f' % rsqr, fontsize=12)


plt.show()

png

Rolling Beta, Alpha and R^2

using the same code we can generate a rolling stats for the stock, It's useful to look at the moving stats to see how relation to the market changed over time.

def get_monthly_returns(symbols,start_date):
    # df is the main dataframe that'll hold the Adjusted closing prices
    df = pd.DataFrame()

    for symbol in symbols:
        dftemp = DataReader(symbol,"yahoo",start_date)
        # we only need the adjusted close price.
        df[symbol] = dftemp["Adj Close"]

    dfm = df.resample('M').last()
    dfm_returns = dfm.pct_change()
    return dfm_returns
# wrapping up the code in a simple function
def calc_stats(df):
    '''
        requires a DataFrame of monthly returns where
        first column is the stock
        second column is the market index
    '''
    X = sm.add_constant(df.iloc[:,1])
    model = sm.OLS(df.iloc[:,0],X).fit()

    beta = model.params["^GSPC"]
    alpha = model.params["const"]
    r2 = model.rsquared
    return beta, alpha, r2
def rolling_stats(df, window=5):
    #     dataframe to hold the results
    res = pd.DataFrame(index=df.index)

    for i in xrange(0,len(df.index)):

        if len(df) - i >= window:
            # break the df into smaller chunks
            chunk = df.iloc[i:window+i,:]
            # calc_stats is a function created from the code above,
            # refer to the Gist at the end of the article.
            beta,alpha,r2 = calc_stats(chunk)
            res.set_value(chunk.tail(1).index[0],"beta",beta)
            res.set_value(chunk.tail(1).index[0],"alpha",alpha)
            res.set_value(chunk.tail(1).index[0],"r2",r2)
            # print "%s beta: %.4f \t alpha: %.4f" % (chunk.tail(1).index[0],b,a)
    res = res.dropna()
    return res
#testing the rolling stats
dftest = get_monthly_returns(["AAPL","^GSPC"],"2008-01-01")
df_rolling = rolling_stats(dftest, window=10)

print df_rolling.head(20)

# plotting the rolling beta
ax = df_rolling[["beta"]].plot(title="AAPL Rolling Beta")
ax.grid(True)
ax.legend(loc='upper left', ncol=2, fontsize='small')
plt.show()
                beta     alpha        r2
Date                                    
2008-11-30  1.673596  0.042146  0.471210
2008-12-31  1.527930  0.029536  0.414000
2009-01-31  1.248017  0.020263  0.316976
2009-02-28  0.675371 -0.016330  0.115726
2009-03-31  0.931183  0.001146  0.255890
2009-04-30  1.112707  0.021580  0.385105
2009-05-31  1.126251  0.027955  0.421310
2009-06-30  1.120403  0.027354  0.416387
2009-07-31  0.901389  0.053803  0.546569
2009-08-31  1.034426  0.042763  0.513715
2009-09-30  0.815390  0.059506  0.436485
2009-10-31  0.809741  0.071444  0.657997
2009-11-30  0.950798  0.056032  0.690917
2009-12-31  1.483326  0.027239  0.759880
2010-01-31  1.670954  0.013031  0.804248
2010-02-28  1.501126  0.014937  0.725170
2010-03-31  1.674662  0.016668  0.752001
2010-04-30  1.676677  0.020403  0.681627
2010-05-31  1.142485  0.036008  0.559702
2010-06-30  1.198003  0.040853  0.650188

png

Thats it for now, Feel free to leave a tweet If you have a better way of implementing this or if I made a mistake.

Here's the Gist of the full script

Happy Trading

references

Get in touch with the author of this post: @ya7ya