Tuesday, January 2, 2018

Deriving OLS with Matrices

In econometrics and statistics, ordinary least squares (OLS), or linear least squares, is a method for estimating the unknown parameters ($\beta$) of a linear regression model. The goal is to minimize the differences ($\varepsilon$ or $e$) between the observed values and the values predicted by a linear approximation. In this post I derive the OLS solution for $\beta$ using matrices.

We know that a simple OLS regression looks like $y={{\beta }_{1}}+{{\beta }_{2}}X+\varepsilon$. Suppose we have N observations and K explanatory variables (including the constant). Converting the model into matrix form, it looks like:

\[\underbrace{\begin{bmatrix} {{y}_{1}} \\ {{y}_{2}} \\ \vdots \\ {{y}_{N}} \end{bmatrix}}_{N\times 1}=\underbrace{\begin{bmatrix} 1 & {{x}_{12}} & \cdots & {{x}_{1K}} \\ 1 & {{x}_{22}} & \cdots & {{x}_{2K}} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & {{x}_{N2}} & \cdots & {{x}_{NK}} \end{bmatrix}}_{N\times K}\underbrace{\begin{bmatrix} {{\beta }_{1}} \\ {{\beta }_{2}} \\ \vdots \\ {{\beta }_{K}} \end{bmatrix}}_{K\times 1}+\underbrace{\begin{bmatrix} {{\varepsilon }_{1}} \\ {{\varepsilon }_{2}} \\ \vdots \\ {{\varepsilon }_{N}} \end{bmatrix}}_{N\times 1}\]

From the above representation we can see that Y (the dependent variable) is N by 1, X (the matrix of independent variables) is N by K, and e (the vector of prediction errors) is N by 1. This tells us that beta ($\beta$) must be K by 1, i.e. beta has K rows and 1 column.
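To make these dimensions concrete, here is a minimal NumPy sketch with made-up data (the variable names and numbers are illustrative assumptions, not taken from the post):

```python
import numpy as np

np.random.seed(0)
N, K = 50, 3  # 50 observations, 3 columns in X (a constant plus 2 regressors)

# Build X as a column of ones (intercept) plus two explanatory variables
X = np.column_stack([np.ones(N), np.random.randn(N, K - 1)])
beta_true = np.array([[1.0], [2.0], [-0.5]])   # K-by-1 vector of true coefficients
eps = np.random.randn(N, 1)                    # N-by-1 vector of disturbances
y = X @ beta_true + eps                        # N-by-1 dependent variable

print(y.shape, X.shape, beta_true.shape, eps.shape)  # (50, 1) (50, 3) (3, 1) (50, 1)
```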

The whole purpose of this post is to derive beta from this structure by minimizing the sum of squared residuals in matrix notation.

(Step 1) Because we can stack all observations together, we can write the model (no matter how many explanatory variables we have) in the compact form: \[y=x\beta +\varepsilon\]

This equation, in turn, can be written in terms of sample estimates as: \[y=xb+e\] where e is the vector of residuals and b is the OLS estimator (so xb is the vector of fitted values).

(Step 2) We know that the residual vector e is \[e=y-\hat{y}=y-xb\] (y is the observed value and $\hat{y}=xb$ is the fitted value; we minimize the sum of squared residuals).

Note: e is not a scalar; it is an N by 1 column vector, and its transpose ${{e}^{T}}$ is 1 by N.

So the sum of squared residuals is ${{e}^{T}}e$, which has dimension 1 by 1, i.e. it is a scalar.
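Continuing the NumPy sketch above, we can confirm that for any candidate coefficient vector b (here an arbitrary vector of zeros, purely for illustration), e is N by 1 while ${{e}^{T}}e$ collapses to a 1 by 1 scalar:

```python
b = np.zeros((K, 1))   # an arbitrary K-by-1 candidate coefficient vector
e = y - X @ b          # residual vector: N-by-1
S = e.T @ e            # e'e: 1-by-1, the sum of squared residuals
print(e.shape, S.shape, S.item())  # (50, 1) (1, 1) and the scalar value of S(b)
```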

(Step 3) We write the objective as $\text{Minimize }S(b)=\sum{e_{i}^{2}}={{e}^{T}}e$, and after substituting the values of e and ${{e}^{T}}$ it becomes: $S(b)={{(y-xb)}^{T}}(y-xb)$

Note: ${{e}^{T}}={{(y-xb)}^{T}}$ and $e=(y-xb)$.

We expand this further with the help of the matrix rules ${{(A+B)}^{T}}={{A}^{T}}+{{B}^{T}}$ and ${{(AB)}^{T}}={{B}^{T}}{{A}^{T}}$.

(Step 4) Applying these rules, we expand (Step 3) as follows: $\begin{align} & S(b)=[{{y}^{T}}-{{b}^{T}}{{x}^{T}}][y-xb] \\ & ={{y}^{T}}y-{{b}^{T}}{{x}^{T}}y-{{y}^{T}}xb+{{b}^{T}}{{x}^{T}}xb \\ \end{align}$

This means that sum of square residuals $S={{y}^{T}}y-{{b}^{T}}{{x}^{T}}y-{{y}^{T}}xb+{{b}^{T}}{{x}^{T}}xb$
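The expansion can be verified numerically; continuing the sketch, both forms of S(b) agree up to floating-point error:

```python
lhs = (y - X @ b).T @ (y - X @ b)                                # (y - xb)'(y - xb)
rhs = y.T @ y - b.T @ X.T @ y - y.T @ X @ b + b.T @ X.T @ X @ b  # expanded form
print(np.allclose(lhs, rhs))  # True
```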

(Step 5) Now, in order to minimize the sum of squared residuals, we take the first-order derivative with respect to b:

\[\frac{\partial S(b)}{\partial b}=\frac{\partial ({{y}^{T}}y-{{b}^{T}}{{x}^{T}}y-{{y}^{T}}xb+{{b}^{T}}{{x}^{T}}xb)}{\partial b}\]

There are two rules for derivatives of matrices that we need here:

\[\text{(i) }\frac{\partial (Ax)}{\partial x}={{A}^{T}}\text{;  (ii) }\frac{\partial ({{x}^{T}}Ax)}{\partial x}=2Ax\text{ (if }A\text{ is symmetric) and }({{A}^{T}}+A)x\text{ (if }A\text{ is not symmetric)}\]
(Symmetric: a matrix A is symmetric if it equals its transpose, i.e. $A={{A}^{T}}$.)

(Step 6) First order condition leaves us with:
 \[\begin{align} & \frac{\partial S(b)}{\partial b}=\frac{\partial ({{y}^{T}}y-{{b}^{T}}{{x}^{T}}y-{{y}^{T}}xb+{{b}^{T}}{{x}^{T}}xb)}{\partial b} \\ & \frac{\partial S(b)}{\partial b}=0-{{x}^{T}}y-{{({{y}^{T}}x)}^{T}}+2{{x}^{T}}xb \\ & \frac{\partial S(b)}{\partial b}=2{{x}^{T}}xb-{{x}^{T}}y-{{({{y}^{T}}x)}^{T}} \\ \end{align}\]
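Before setting this derivative to zero, it can be sanity-checked against a central finite-difference approximation (continuing the NumPy sketch; the step size h is an arbitrary choice):

```python
def ssr(b_vec):
    """Sum of squared residuals S(b) for a K-by-1 coefficient vector."""
    r = y - X @ b_vec
    return (r.T @ r).item()

grad_analytic = 2 * X.T @ X @ b - 2 * X.T @ y   # 2x'xb - x'y - (y'x)' = 2x'xb - 2x'y
h = 1e-6
grad_numeric = np.array([
    (ssr(b + h * np.eye(K)[:, [i]]) - ssr(b - h * np.eye(K)[:, [i]])) / (2 * h)
    for i in range(K)
]).reshape(K, 1)
print(np.allclose(grad_analytic, grad_numeric))  # True (up to numerical error)
```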

Now, we set this expression equal to zero, as per the first-order condition for a minimum, to obtain an expression for b.
\[\begin{align} & 2{{x}^{T}}xb-{{x}^{T}}y-{{({{y}^{T}}x)}^{T}}=0 \\ & or\text{ }2{{x}^{T}}xb-{{x}^{T}}y-{{x}^{T}}y=0 \\ & or\text{ }2{{x}^{T}}xb={{x}^{T}}y+{{x}^{T}}y \\ & or\text{ }2{{x}^{T}}xb=2{{x}^{T}}y \\ & or\text{ }{{({{x}^{T}}x)}^{-1}}{{x}^{T}}xb={{({{x}^{T}}x)}^{-1}}{{x}^{T}}y \\ & or\text{ }Ib={{({{x}^{T}}x)}^{-1}}{{x}^{T}}y \\ & \therefore b={{({{x}^{T}}x)}^{-1}}{{x}^{T}}y \\ \end{align}\]
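Continuing the sketch, the closed-form solution can be computed directly and cross-checked against NumPy's least-squares solver:

```python
b_ols = np.linalg.inv(X.T @ X) @ X.T @ y          # b = (x'x)^(-1) x'y
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # independent cross-check
print(np.allclose(b_ols, b_lstsq))                # True
print(b_ols.ravel())                              # close to the true (1.0, 2.0, -0.5)
```

In practice, solving the normal equations with `np.linalg.solve(X.T @ X, X.T @ y)` (or using a QR-based solver) is numerically preferable to forming the explicit inverse, but the formula is the same.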

(Step 7) Now we check the second-order condition for a minimum, which requires the second derivative (here a matrix) to be positive definite:


\[\frac{{{\partial }^{2}}S(b)}{\partial b\,\partial {{b}^{T}}}=2{{x}^{T}}x\text{ (which is positive definite)}\]

Since $2{{x}^{T}}x$ is positive definite (provided x has full column rank), we confirm that the expression derived for beta, i.e. $b={{({{x}^{T}}x)}^{-1}}{{x}^{T}}y$, gives the coefficient values that minimize the sum of squared errors of prediction when y is regressed on x.
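The second-order condition can also be checked numerically: as long as x has full column rank, every eigenvalue of $2{{x}^{T}}x$ is strictly positive. Continuing the sketch:

```python
eigvals = np.linalg.eigvalsh(2 * X.T @ X)  # eigenvalues of the symmetric Hessian
print(eigvals)                             # all strictly positive
print(bool(np.all(eigvals > 0)))           # True => positive definite, so b is a minimizer
```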

The next post will focus on the properties of the OLS solution (with matrices).
