In econometrics and statistics, ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters ($\beta$) in a linear regression model. The goal is to minimize the differences ($\varepsilon$ or $e$) between the observed values and the values predicted by a linear approximation. In this post I make an effort to derive the OLS solution for $\beta$ using matrices.
From the above representation we can infer that the size of $y$ (the dependent variable) is N by 1, the size of $x$ (the independent variables) is N by K, and the size of $e$ (the errors of prediction) is N by 1. This tells us that the size of beta ($b$) is K by 1, i.e. beta has K rows and 1 column.
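As a quick sanity check on these dimensions, here is a minimal NumPy sketch; the sample size, coefficient values, and simulated data below are made up purely for illustration:

```python
import numpy as np

N, K = 100, 3                            # N observations, K regressors (illustrative values)
rng = np.random.default_rng(0)

x = rng.normal(size=(N, K))              # independent variables: N by K
beta_true = np.array([1.0, -2.0, 0.5])   # hypothetical "true" coefficients: K by 1
e = rng.normal(size=N)                   # errors of prediction: N by 1
y = x @ beta_true + e                    # dependent variable: N by 1

print(y.shape, x.shape, e.shape)         # (100,) (100, 3) (100,)
```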
The whole purpose of this post is to derive beta from this structure by minimizing the sum of squared residuals in matrix notation.
(Step 1) Because we can stack all observations together, we can write the model (no matter how many explanatory variables we have) in the form: \[y=x\beta +\varepsilon\]
This equation can in turn be written in terms of its sample counterpart as: \[y=xb+e\] where $e$ is the residual vector and $b$ is the OLS estimator, so that $xb$ is the vector of fitted values.
(Step 2) We know that the residual vector $e$ is \[e=y-\hat{y}=y-xb\] ($y$ is the observed value and $\hat{y}=xb$ is the fitted, or predicted, value; we minimize the sum of squared residuals).
Note: $e$ is not a scalar but an N by 1 vector, and its transpose ${{e}^{T}}$ has dimension 1 by N.
So the sum of squared residuals is ${{e}^{T}}e$, which has dimension 1 by 1, i.e. it is a scalar.
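In code, the fact that ${{e}^{T}}e$ collapses an N by 1 vector into a single number can be seen with a toy residual vector (the numbers below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.normal(size=(5, 1))              # a toy N-by-1 residual vector (N = 5 here)

ss_res = (e.T @ e).item()                # e'e is 1 by 1, i.e. a scalar
print(np.isclose(ss_res, np.sum(e**2)))  # True: e'e equals the sum of squared residuals
```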
(Step 3) We write the sum of squared residuals to be minimized as $\text{Minimize } S(b)=\sum{e_{i}^{2}}={{e}^{T}}e$, and after substituting the values of $e$ and ${{e}^{T}}$ it becomes: $S(b)={{[y-xb]}^{T}}[y-xb]$
Note: ${{e}^{T}}={{[y-xb]}^{T}}$ and $e=[y-xb]$.
We need to expand this further with the help of the matrix rules which state that: ${{(A+B)}^{T}}={{A}^{T}}+{{B}^{T}}$ and ${{(AB)}^{T}}={{B}^{T}}{{A}^{T}}$
(Step 4) Working with these rules, we expand (Step 3) as follows: $\begin{align} & S(b)=[{{y}^{T}}-{{b}^{T}}{{x}^{T}}][y-xb] \\ & ={{y}^{T}}y-{{b}^{T}}{{x}^{T}}y-{{y}^{T}}xb+{{b}^{T}}{{x}^{T}}xb \\ \end{align}$
This means that the sum of squared residuals is $S(b)={{y}^{T}}y-{{b}^{T}}{{x}^{T}}y-{{y}^{T}}xb+{{b}^{T}}{{x}^{T}}xb$
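This expansion can be verified numerically with arbitrary simulated matrices; the snippet below is only a sanity check of the algebra, not part of the derivation:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 20, 3
x = rng.normal(size=(N, K))
y = rng.normal(size=(N, 1))
b = rng.normal(size=(K, 1))              # any candidate coefficient vector

e = y - x @ b
lhs = (e.T @ e).item()                   # [y - xb]'[y - xb]
rhs = (y.T @ y - b.T @ x.T @ y - y.T @ x @ b + b.T @ x.T @ x @ b).item()
print(np.isclose(lhs, rhs))              # True: the expansion matches e'e
```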
(Step 5) Now, in order to minimize the sum of squared residuals, we take the first-order derivative with respect to $b$:
\[\frac{\partial S(b)}{\partial b}=\frac{\partial ({{y}^{T}}y-{{b}^{T}}{{x}^{T}}y-{{y}^{T}}xb+{{b}^{T}}{{x}^{T}}xb)}{\partial b}\]
There is a rule for derivatives of matrix expressions with respect to a vector which states that: \[\text{(i) }\frac{\partial (Ax)}{\partial x}={{A}^{T}}\text{; (ii) }\frac{\partial ({{x}^{T}}Ax)}{\partial x}=2Ax\text{ (if }A\text{ is symmetric) and }=({{A}^{T}}+A)x\text{ (if }A\text{ is not symmetric)}\]
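Rule (ii) is the one that does the heavy lifting in the next step. As a sketch for intuition only, it can be checked against a finite-difference gradient using an arbitrary symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.normal(size=(n, n))
A = M + M.T                              # an arbitrary symmetric matrix
x = rng.normal(size=n)

def f(v):
    return v @ A @ v                     # the quadratic form x'Ax

# Numerical gradient of f at x via central differences
h = 1e-6
I = np.eye(n)
num_grad = np.array([(f(x + h * I[i]) - f(x - h * I[i])) / (2 * h) for i in range(n)])

print(np.allclose(num_grad, 2 * A @ x, atol=1e-5))   # True: d(x'Ax)/dx = 2Ax when A is symmetric
```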
Symmetric: if $A$ is equal to ${{A}^{T}}$, then $A$ is symmetric. Since ${{x}^{T}}x$ is symmetric, rule (ii) applies here with the factor of 2.
(Step 6) The first-order condition leaves us with:
\[\begin{align} & \frac{\partial S(b)}{\partial b}=\frac{\partial ({{y}^{T}}y-{{b}^{T}}{{x}^{T}}y-{{y}^{T}}xb+{{b}^{T}}{{x}^{T}}xb)}{\partial b} \\ & \frac{\partial S(b)}{\partial b}=0-{{x}^{T}}y-{{({{y}^{T}}x)}^{T}}+2{{x}^{T}}xb \\ & \frac{\partial S(b)}{\partial b}=2{{x}^{T}}xb-{{x}^{T}}y-{{({{y}^{T}}x)}^{T}} \\ \end{align}\]
Now, we set this expression equal to zero, as per the first-order condition of minimization, to get an expression for $b$.
\[\begin{align}
& 2{{x}^{T}}xb-{{x}^{T}}y-{{({{y}^{T}}x)}^{T}}=0 \\
& or\text{ }2{{x}^{T}}xb-{{x}^{T}}y-{{x}^{T}}y=0 \\
& or\text{ }2{{x}^{T}}xb={{x}^{T}}y+{{x}^{T}}y \\
& or\text{ }2{{x}^{T}}xb=2{{x}^{T}}y \\
& or\text{ }{{({{x}^{T}}x)}^{-1}}{{x}^{T}}xb={{({{x}^{T}}x)}^{-1}}{{x}^{T}}y \\
& or\text{ }Ib={{({{x}^{T}}x)}^{-1}}{{x}^{T}}y \\
& \therefore b={{({{x}^{T}}x)}^{-1}}{{x}^{T}}y \\
\end{align}\]
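Note that the step from $2{{x}^{T}}xb=2{{x}^{T}}y$ to the final line assumes that ${{x}^{T}}x$ is invertible, i.e. that $x$ has full column rank. As a sketch of the formula in action (all names and simulated values below are made up for illustration), we can compare it against NumPy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 200, 3
x = rng.normal(size=(N, K))
beta_true = np.array([2.0, -1.0, 0.5])
y = x @ beta_true + rng.normal(size=N)

# b = (x'x)^{-1} x'y  -- the closed-form solution derived above
b = np.linalg.inv(x.T @ x) @ x.T @ y

# Cross-check against NumPy's least-squares solver
b_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)
print(np.allclose(b, b_lstsq))           # True
print(b)                                 # close to beta_true
```

In practice, solving the normal equations with np.linalg.solve(x.T @ x, x.T @ y), or using lstsq directly, is numerically preferable to forming the inverse explicitly; the explicit inverse is used above only to mirror the derived formula.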
(Step 7) Now we check the second-order condition of minimization, which requires the matrix of second derivatives to be positive definite: \[\frac{{{\partial }^{2}}S(b)}{\partial b\partial {{b}^{T}}}=2{{x}^{T}}x\] This matrix is positive definite provided $x$ has full column rank.
Since $2{{x}^{T}}x$ is positive definite, we confirm that the expression derived for beta, i.e. $b={{({{x}^{T}}x)}^{-1}}{{x}^{T}}y$, gives the coefficient values that minimize the sum of squared errors of prediction when $y$ is regressed on $x$.
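One way to see this numerically, again with arbitrary simulated data, is to check that all eigenvalues of $2{{x}^{T}}x$ are strictly positive:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=(200, 3))            # any full-column-rank design matrix

hessian = 2 * x.T @ x                    # matrix of second derivatives of S(b) from Step 7
eigvals = np.linalg.eigvalsh(hessian)    # eigenvalues of a symmetric matrix
print(np.all(eigvals > 0))               # True: all eigenvalues positive => positive definite
```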
The next post will focus on the properties of the OLS solution (with matrices).