Wednesday, January 10, 2018

Statistical Properties of OLS

Our statistical model is restricted to linearity (in the parameters).

The model we have is $y={{\beta }_{1}}+{{\beta }_{2}}X+\varepsilon$

As stated in the earlier post, we can stack everything together and write the equation (no matter how many explanatory variables we have) in matrix form: $y=X\beta +\varepsilon$. The matrix representation shows that the regression analysis is conditional upon the values of 'X'.
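To make the dimensions explicit, the stacked system (with the first column of 'X' holding the constant) can be written out as:

\[
\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}}_{N\times 1}=\underbrace{\begin{pmatrix} 1 & x_{12} & \cdots & x_{1K} \\ 1 & x_{22} & \cdots & x_{2K} \\ \vdots & \vdots & & \vdots \\ 1 & x_{N2} & \cdots & x_{NK} \end{pmatrix}}_{N\times K}\underbrace{\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix}}_{K\times 1}+\underbrace{\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{pmatrix}}_{N\times 1}
\]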

'Y' and 'X' are observed, 'beta' is a fixed unknown parameter, and '$\varepsilon $' is an unobserved random term.


The reason we use OLS is that, under assumptions A1-A7 of the classical linear regression model, the OLS estimator has several desirable statistical properties. This post examines these properties under the seven assumptions:

Seven assumptions of the model:

A1: Fixed Regressors: All the elements of the matrix 'X' with dimension N times K are fixed (or we can say non-stochastic/non-random/deterministic). Further, the assumption requires that N be greater than K (N is the number of observations and K is the number of coefficients), and that the N × K matrix 'X' has full rank, so that $X^TX$ is invertible (a quick numerical check is sketched after the list below).

Violations of these two conditions arise in the following forms:
a) Errors in variables
b) Autoregression
c) Simultaneous equations
d) Perfect multicollinearity (this one violates the full-rank condition)
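As a rough illustration (not from the original post; the variable names and numbers are made up), both conditions in A1 can be checked numerically with NumPy:

```python
import numpy as np

# Hypothetical example: N observations, K coefficients
N, K = 100, 3
rng = np.random.default_rng(0)

x2 = rng.normal(size=N)
# Third column is an exact multiple of the second: perfect multicollinearity
X = np.column_stack([np.ones(N), x2, 2 * x2])

print(N > K)                       # True: A1 requires more observations than coefficients
print(np.linalg.matrix_rank(X))    # 2, not K=3: X is rank-deficient, so X'X is not invertible
```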


A2: Random Disturbance with Zero Mean: It means that the expectation of the random term is equal to zero:
$E\left[ {{\varepsilon }_{i}} \right]=0$ for every $i$, or in vector form $E\left[ \varepsilon  \right]=0$.




A3: Homoscedasticity: (Homo = same and scedas = spread): It means that the variance of each error term is equal to sigma squared: $Var({{\varepsilon }_{i}})=E(\varepsilon _{i}^{2})={{\sigma }^{2}}$ for all $i$ (in matrix form, together with A4 below, $E(\varepsilon {{\varepsilon }^{T}})={{\sigma }^{2}}{{I}_{N}}$).

The variance is the same across the range of values of the independent variable.

Violation of A3 leads to heteroscedasticity: the variability of the error term is unequal across the range of values of the independent variable.

We can use assumption A2 ($E\left[ \varepsilon  \right]=0$) together with A3 to get more insight into the property $Var({{\varepsilon }_{i}})=E(\varepsilon _{i}^{2})={{\sigma }^{2}}$:



Step 1 is the formula of variance (actual minus mean): $Var({{\varepsilon }_{i}})=E[{{({{\varepsilon }_{i}}-E({{\varepsilon }_{i}}))}^{2}}]$
Step 2 is the rule of decomposition: $Var({{\varepsilon }_{i}})=E(\varepsilon _{i}^{2})-{{[E({{\varepsilon }_{i}})]}^{2}}$
Step 3: we substitute $E({{\varepsilon }_{i}})=0$ from A2, which leaves $Var({{\varepsilon }_{i}})=E(\varepsilon _{i}^{2})={{\sigma }^{2}}$
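A quick numerical sanity check of this result (a minimal NumPy sketch, not part of the original post; sigma is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0
# Draws with E(eps) = 0 (A2) and spread sigma (A3)
eps = rng.normal(loc=0.0, scale=sigma, size=1_000_000)

# With a zero mean, the variance reduces to E(eps^2), as in Step 3
print(eps.var())           # close to sigma^2 = 4
print((eps ** 2).mean())   # close to sigma^2 = 4
```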

A4: No Autocorrelation: The covariance between $\varepsilon_i$ and $\varepsilon_j$ is zero for $i\ne j$, or $Cov({{\varepsilon }_{i}},{{\varepsilon }_{j}})=0$

In other words, the error of the 'i'th observation and the error of the 'j'th observation do not move together.
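For intuition, independently drawn errors have a sample covariance close to zero; a minimal sketch (not from the original post):

```python
import numpy as np

rng = np.random.default_rng(2)
# eps_i and eps_j drawn independently, so they should not move together
eps = rng.normal(size=(1_000_000, 2))
print(np.cov(eps[:, 0], eps[:, 1])[0, 1])   # off-diagonal covariance close to 0
```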

A5: Constant Parameters: This means ${{\beta }_{K\times 1}}\text{ and }\sigma $ are fixed and unknown.

A6: Linear Model: As stated earlier, the model is linear in the parameters. Basically, it says that 'y' is generated through the process $y={{\beta }_{1}}+{{\beta }_{2}}X+\varepsilon$ (DGP = data generating process).

A7: Normality: The distribution of the error term '$\varepsilon $' is normal.

Now if we combine assumptions A2 (Zero Mean) + A3 (Homoscedasticity) + A4 (No Autocorrelation) + A7 (Normality), we get '$\varepsilon \sim N(0\text{ , }{{\sigma }^{2}}{{I}_{N\times N}})$'.
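Putting all seven assumptions together, a small Monte Carlo sketch (the parameter values here are made up for illustration, not from the original post) shows the OLS estimator centred on the true beta:

```python
import numpy as np

rng = np.random.default_rng(3)
N, sigma = 50, 1.5
beta1, beta2 = 2.0, 0.5              # fixed unknown parameters (A5)
x = np.linspace(0.0, 10.0, N)        # fixed regressors (A1)
X = np.column_stack([np.ones(N), x])

estimates = []
for _ in range(5000):
    eps = rng.normal(0.0, sigma, size=N)      # eps ~ N(0, sigma^2 I) from A2+A3+A4+A7
    y = beta1 + beta2 * x + eps               # the DGP of A6
    b = np.linalg.solve(X.T @ X, X.T @ y)     # OLS: solve (X'X) b = X'y
    estimates.append(b)

print(np.mean(estimates, axis=0))   # close to (2.0, 0.5): OLS is unbiased under A1-A7
```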


In econometrics, the assumptions are just like a catalogue: one has to go through the catalogue, and if there is any violation (of the assumptions), then we must treat the violation with a specific technique.
