Definition
Our objective is to explore the effect of regressor on the response variable . In this context we define our model as the following:
Here, y is the dependent and observable variable.
All of our assumptions are based on the , which is the error term in the model. It is also called as residual. Our assumptions are:
- ,
Observe that is a function of a random variable, . Thus itself is a random variable.
In the case of SLR we have one scalar response variable and one regressor. We define our model as following:
One can interpret as intercept and as the slope of the fitted line.
At the time we construct our model, we do not know parameters and and our aim is to estimate these from the data using Least Squares Estimation.
Model Analysis
Estimation by LSE
We estimate , and . To do so we say that data is paired as . Observe that
We define sum of squared errors as
LSE for
Derivative of a sum is sum of the derivatives.
LSE for
Substitute
where
LSE for
Here is the degrees of freedom.
Distribution of least squares estimates
We have estimated , and using random samples from the data. Thus, they are random variables too.
Since the LSE is BLUE we have:
- Then
- Then
t-values
Goodness-of-fit
Question
We question whether the data match the model or not.
One would say that the model is a good fit for the data if
Thus
Sums of squares
- , total deviation in .
- , sum of residuals.
- , deviation caused by regression.
If then .
Coefficient of determination
It is defined as:
represents the share due to in total variation in . So means that 95% variation in is due to and 5% of the variation is due to model residuals. Such a model is considered as a good fit.
Observe that Thus we say
- model is a good-fit
- model may not be a poor-fit. You need to conduct goodness-of-fit test.
Goodness-of-fit Test
Hypothesis
: , means that has no effect on . : , has effect on . We test this hypothesis with ANOVA.
We construct our ANOVA table:
Source | DoF | Sum of Squares | Mean of SS | |
---|---|---|---|---|
Regression | 1 | |||
Error | ||||
Total | ||||
Thus one can deduct that: |
Usual (significance level) values are:
- 0.01
- 0.05
- 0.10
Rule of decision
Let
- Reject if . Thus conclude that has statistically significant effect on at significance level.
- Fail to reject if . Therefore say that there is not enough evidence to state the effect of on .