Definition

Our objective is to explore the effect of a regressor $x$ on the response variable $y$. In this context we define our model as follows:

$$y = f(x) + \varepsilon$$

Here, $y$ is the observable dependent (response) variable and $x$ is the regressor.

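All of our assumptions concern $\varepsilon$, the error term of the model. The error is sometimes loosely called the residual. Strictly, the residual $e_i = y_i - \hat{y}_i$ is the observable estimate of the unobservable error $\varepsilon_i$. Our assumptions are: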

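  1. $E(\varepsilon_i) = 0$ and $\operatorname{Var}(\varepsilon_i) = \sigma^2$ for all $i$, with normally distributed errors, i.e. $\varepsilon_i \sim N(0, \sigma^2)$,
  2. the errors are uncorrelated, $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$ (together with joint normality, this makes them independent).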

Observe that $y$ is a function of the random variable $\varepsilon$. Thus $y$ itself is a random variable.

In the case of simple linear regression (SLR) we have one scalar response variable and one regressor. We define our model as follows:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n.$$

One can interpret $\beta_0$ as the intercept and $\beta_1$ as the slope of the regression line.

When we construct the model we do not know the parameters $\beta_0$ and $\beta_1$. Our aim is to estimate them from the data using least squares estimation (LSE).
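To make the data-generating process concrete, here is a minimal Python sketch that draws one sample from the SLR model. It assumes fixed $x$ values and independent $N(0, \sigma^2)$ errors. The helper name `simulate_slr` and the parameter values are hypothetical and serve only as an illustration.

```python
import random


def simulate_slr(xs, beta0, beta1, sigma, seed=None):
    """Draw y_i = beta0 + beta1 * x_i + eps_i with independent eps_i ~ N(0, sigma^2).

    Hypothetical helper for illustration; the x values are treated as fixed.
    """
    rng = random.Random(seed)  # private generator so runs are reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate_slr(xs, beta0=2.0, beta1=0.5, sigma=1.0, seed=42)
print(ys)  # random draws; in practice beta0, beta1 and sigma are unknown
```

In real data only `xs` and `ys` are observed. The estimation steps below try to recover `beta0` and `beta1` from them.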

Model Analysis

Estimation by LSE

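We estimate $\beta_0$, $\beta_1$ and $\sigma^2$. To do so we assume that the data come as $n$ pairs $(x_1, y_1), \dots, (x_n, y_n)$. Observe that

$$\varepsilon_i = y_i - \beta_0 - \beta_1 x_i.$$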

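We define the sum of squared errors as

$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$$

The least squares estimates $\hat\beta_0$ and $\hat\beta_1$ are the values that minimize $S$. We find them by setting both partial derivatives to zero.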

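LSE for $\beta_0$

Since the derivative of a sum is the sum of the derivatives,

$$\frac{\partial S}{\partial \beta_0} = \sum_{i=1}^{n} \frac{\partial}{\partial \beta_0} (y_i - \beta_0 - \beta_1 x_i)^2 = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i).$$

Setting this to zero at $(\hat\beta_0, \hat\beta_1)$ gives $\sum_{i=1}^{n} y_i - n\hat\beta_0 - \hat\beta_1 \sum_{i=1}^{n} x_i = 0$, so

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}.$$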

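LSE for $\beta_1$

$$\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0.$$

Substitute $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$:

$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) - \hat\beta_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}) = 0 \quad\Longrightarrow\quad \hat\beta_1 = \frac{S_{xy}}{S_{xx}},$$

where

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$

The last step uses $\sum_{i} \bar{x}(y_i - \bar{y}) = 0$ and $\sum_{i} \bar{x}(x_i - \bar{x}) = 0$.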
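The closed-form estimates translate directly into code. The following is a minimal Python sketch that uses no libraries. The function name `least_squares` and the toy data are made up for illustration.

```python
def least_squares(xs, ys):
    """Return (beta0_hat, beta1_hat) from the closed-form LSE formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x_i are equal, so the slope is not identifiable")
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar  # the line passes through (x_bar, y_bar)
    return beta0_hat, beta1_hat


b0, b1 = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, b1)  # approximately 2.2 and 0.6
```

For this toy data $\bar x = 3$, $\bar y = 4$, $S_{xx} = 10$ and $S_{xy} = 6$. That gives $\hat\beta_1 = 0.6$ and $\hat\beta_0 = 4 - 0.6 \cdot 3 = 2.2$.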

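Estimate of $\sigma^2$

$$\hat\sigma^2 = \frac{SS_E}{n - 2} = \frac{1}{n - 2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i.$$

Here $n - 2$ is the degrees of freedom: $n$ observations minus the two estimated parameters $\beta_0$ and $\beta_1$. Dividing by $n - 2$ rather than $n$ makes $\hat\sigma^2$ an unbiased estimator of $\sigma^2$.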
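Here is a short sketch of the variance estimate. It assumes the coefficient estimates are already computed. For the made-up data $x = 1, \dots, 5$ and $y = 2, 4, 5, 4, 5$ they are $\hat\beta_0 = 2.2$ and $\hat\beta_1 = 0.6$. The name `sigma2_hat` is hypothetical.

```python
def sigma2_hat(xs, ys, beta0_hat, beta1_hat):
    """Unbiased estimate of sigma^2: SS_E / (n - 2)."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need n > 2 observations")
    ss_e = sum((y - (beta0_hat + beta1_hat * x)) ** 2 for x, y in zip(xs, ys))
    return ss_e / (n - 2)  # two degrees of freedom used by beta0_hat, beta1_hat


print(sigma2_hat([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6))  # approximately 0.8
```

The residuals here are $-0.8, 0.6, 1.0, -0.6, -0.2$. So $SS_E = 2.4$ and $\hat\sigma^2 = 2.4 / 3 = 0.8$.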

Distribution of least squares estimates

We have estimated $\beta_0$, $\beta_1$ and $\sigma^2$ from a random sample of data. Thus the estimates $\hat\beta_0$, $\hat\beta_1$ and $\hat\sigma^2$ are random variables too.

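By the Gauss–Markov theorem, the least squares estimators are BLUE (best linear unbiased estimators). With the additional normality assumption on $\varepsilon$, their sampling distributions are:

  1. $\hat\beta_0 \sim N\!\left(\beta_0,\ \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right)$. Then $\dfrac{\hat\beta_0 - \beta_0}{\sigma \sqrt{1/n + \bar{x}^2 / S_{xx}}} \sim N(0, 1)$.
  2. $\hat\beta_1 \sim N\!\left(\beta_1,\ \frac{\sigma^2}{S_{xx}}\right)$. Then $\dfrac{\hat\beta_1 - \beta_1}{\sigma / \sqrt{S_{xx}}} \sim N(0, 1)$.

Here $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$. Moreover, $(n - 2)\hat\sigma^2 / \sigma^2 \sim \chi^2_{n-2}$, independently of $\hat\beta_0$ and $\hat\beta_1$.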
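The distribution of $\hat\beta_1$ can be checked by simulation. The sketch below works under the model assumptions: fixed $x$ values and independent $N(0, \sigma^2)$ errors. It repeatedly draws new $y$ samples and re-estimates the slope each time. It then compares the empirical mean and variance of these slopes with $\beta_1$ and $\sigma^2 / S_{xx}$. The helper name, the number of replications and the parameter values are arbitrary choices.

```python
import random
import statistics


def slope_sampling_check(xs, beta0, beta1, sigma, reps=20000, seed=0):
    """Simulate the sampling distribution of beta1_hat.

    Returns (empirical mean, empirical variance, theoretical variance sigma^2 / S_xx).
    """
    rng = random.Random(seed)
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slopes = []
    for _ in range(reps):
        # Fresh errors in every replication; the design points xs stay fixed
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        y_bar = sum(ys) / n
        s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        slopes.append(s_xy / s_xx)
    return statistics.fmean(slopes), statistics.variance(slopes), sigma ** 2 / s_xx


mean_b1, var_b1, var_theory = slope_sampling_check([1, 2, 3, 4, 5], 2.0, 0.5, 1.0)
print(mean_b1, var_b1, var_theory)  # mean near 0.5, both variances near 0.1
```

An empirical mean close to $\beta_1$ reflects unbiasedness. The agreement of the two variances reflects $\operatorname{Var}(\hat\beta_1) = \sigma^2 / S_{xx}$, which equals $1/10$ for this design.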

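t-values

Replacing the unknown $\sigma^2$ by its estimate $\hat\sigma^2$ turns the standardized estimates into $t$ statistics with $n - 2$ degrees of freedom:

$$\frac{\hat\beta_j - \beta_j}{\operatorname{se}(\hat\beta_j)} \sim t_{n-2}, \qquad \operatorname{se}(\hat\beta_0) = \sqrt{\hat\sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}, \quad \operatorname{se}(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{S_{xx}}}.$$

The t-value reported for each coefficient is this statistic evaluated at $\beta_j = 0$, i.e. $t_j = \hat\beta_j / \operatorname{se}(\hat\beta_j)$ for $j = 0, 1$.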
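The standard errors and t-values fit in a few lines of Python. This sketch recomputes the fit so that it stands alone. The function name `coefficient_t_values` and the toy data are hypothetical.

```python
import math


def coefficient_t_values(xs, ys):
    """Return (t0, t1) = (beta0_hat / se(beta0_hat), beta1_hat / se(beta1_hat))."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need n > 2 observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # sigma^2 estimated from the residuals with n - 2 degrees of freedom
    sigma2 = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se_b0 = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / s_xx))
    se_b1 = math.sqrt(sigma2 / s_xx)
    return b0 / se_b0, b1 / se_b1


t0, t1 = coefficient_t_values([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(t0, t1)  # about 2.345 and 2.121
```

Here $\operatorname{se}(\hat\beta_1) = \sqrt{0.8 / 10} \approx 0.283$, so $t_1 = 0.6 / 0.283 \approx 2.121$. Each t-value is compared with the $t_{n-2}$ distribution, which has $3$ degrees of freedom here.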

Goodness-of-fit

Question

We ask whether the data fit the model.

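One would say that the model is a good fit for the data if the fitted values are close to the observations, $\hat{y}_i \approx y_i$ for all $i$. In other words, the residuals $e_i = y_i - \hat{y}_i$ are small.

Thus we compare the variation left in the residuals with the total variation of $y$ around its mean $\bar{y}$.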

Sums of squares

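  • $SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2$, total variation in $y$.
  • $SS_E = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, sum of squared residuals.
  • $SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, variation explained by the regression.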

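For a model with an intercept these satisfy $SS_T = SS_R + SS_E$. If $SS_E = 0$ then $SS_T = SS_R$: the fitted line passes through every data point, and the regression explains all the variation in $y$.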
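The three sums of squares and their decomposition can be checked numerically. This sketch fits the least squares line to a made-up data set and then evaluates each sum directly from its definition. The function name is hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SS_T, SS_E, SS_R) for the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    ss_t = sum((y - y_bar) ** 2 for y in ys)                # total variation
    ss_e = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # residual variation
    ss_r = sum((f - y_bar) ** 2 for f in fitted)            # explained variation
    return ss_t, ss_e, ss_r


ss_t, ss_e, ss_r = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(ss_t, ss_e, ss_r)  # about 6.0, 2.4 and 3.6, so SS_T = SS_R + SS_E
```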

Coefficient of determination

It is defined as:

$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}.$$

$R^2$ is the share of the total variation in $y$ that is explained by $x$. So $R^2 = 0.95$ means that 95% of the variation in $y$ is explained by the regression on $x$, and the remaining 5% is left in the model residuals. Such a model is considered a good fit.
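In SLR, $\hat{y}_i - \bar{y} = \hat\beta_1 (x_i - \bar{x})$, so $SS_R = S_{xy}^2 / S_{xx}$. Writing $S_{yy} = \sum_i (y_i - \bar y)^2 = SS_T$, this gives $R^2 = S_{xy}^2 / (S_{xx} S_{yy})$, which is the squared sample correlation between $x$ and $y$. The sketch below uses that shortcut on made-up data. The function name is hypothetical.

```python
def r_squared(xs, ys):
    """R^2 = SS_R / SS_T, computed as S_xy^2 / (S_xx * S_yy) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)  # equals SS_T
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("R^2 is undefined when x or y is constant")
    # SS_R = beta1_hat^2 * S_xx = S_xy^2 / S_xx
    return s_xy ** 2 / (s_xx * s_yy)


print(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # 0.6
```

With $S_{xy} = 6$, $S_{xx} = 10$ and $S_{yy} = 6$, the result is $36 / 60 = 0.6$. In other words, $x$ explains 60% of the variation in $y$ for this toy data.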

Observe that $0 \le R^2 \le 1$. Thus we say:

  • $R^2$ close to 1: the model is a good fit.
  • $R^2$ close to 0: the model may be a poor fit, but $R^2$ alone does not settle it. You need to conduct a goodness-of-fit test.

Goodness-of-fit Test

Hypothesis

$H_0: \beta_1 = 0$, meaning that $x$ has no effect on $y$.
$H_1: \beta_1 \neq 0$, meaning that $x$ has an effect on $y$.

We test this hypothesis with ANOVA.

We construct our ANOVA table:

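| Source     | DoF     | Sum of Squares | Mean Square            |
|------------|---------|----------------|------------------------|
| Regression | $1$     | $SS_R$         | $MS_R = SS_R / 1$      |
| Error      | $n - 2$ | $SS_E$         | $MS_E = SS_E / (n - 2)$ |
| Total      | $n - 1$ | $SS_T$         |                        |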
Thus one can deduce that, under $H_0$,

$$F_0 = \frac{MS_R}{MS_E} \sim F_{1,\,n-2}.$$
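The ANOVA table can be assembled from the sums of squares. The sketch below works on a made-up data set, returns the table rows together with $F_0$, and prints them. The function name and the row layout are illustrative choices.

```python
def anova_table(xs, ys):
    """Return the SLR ANOVA rows (source, DoF, SS, MS) and F0 = MS_R / MS_E."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need n > 2 observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    ss_t = sum((y - y_bar) ** 2 for y in ys)
    ss_e = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_r = sum((f - y_bar) ** 2 for f in fitted)
    ms_r = ss_r / 1
    ms_e = ss_e / (n - 2)
    rows = [
        ("Regression", 1, ss_r, ms_r),
        ("Error", n - 2, ss_e, ms_e),
        ("Total", n - 1, ss_t, None),  # no mean square for the total row
    ]
    return rows, ms_r / ms_e


rows, f0 = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for source, dof, ss, ms in rows:
    ms_text = "" if ms is None else f"{ms:.3f}"
    print(f"{source:<10} {dof:>3} {ss:>8.3f} {ms_text:>8}")
print(f"F0 = {f0:.3f}")  # 4.500 for this toy data
```

For this toy data $MS_R = 3.6$ and $MS_E = 0.8$, so $F_0 = 4.5$. This is also the square of the slope's t-value, since $F_{1,\,n-2}$ is the distribution of the square of a $t_{n-2}$ variable.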

Commonly used values of the significance level $\alpha$ are:

  • 0.01
  • 0.05
  • 0.10

Decision rule

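Let $F_0 = MS_R / MS_E$ be the observed statistic. Let $F_{\alpha;1,n-2}$ be the critical value of the $F$ distribution with $1$ and $n - 2$ degrees of freedom, i.e. $P(F_{1,n-2} > F_{\alpha;1,n-2}) = \alpha$.

  • Reject $H_0$ if $F_0 > F_{\alpha;1,n-2}$. Then conclude that $x$ has a statistically significant effect on $y$ at significance level $\alpha$.
  • Fail to reject $H_0$ if $F_0 \le F_{\alpha;1,n-2}$. Then say that there is not enough evidence to conclude that $x$ has an effect on $y$.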
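The decision rule needs the upper critical value of the $F_{1,n-2}$ distribution. Statistical libraries provide it directly, for instance SciPy's `scipy.stats.f`. This sketch instead stays within the Python standard library. It computes $P(F > f)$ through the regularized incomplete beta function, using the standard continued-fraction expansion evaluated with the modified Lentz method. It then finds the critical value by bisection. All function names are hypothetical. The example plugs in $F_0 = 4.5$ with $n = 5$ from a made-up data set.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even coefficient
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd coefficient
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):  # region where the fraction converges fast
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """P(F > f) for F ~ F(d1, d2), using P(F > f) = I_{d2/(d2 + d1 f)}(d2/2, d1/2)."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def f_critical(alpha, d1, d2, tol=1e-10):
    """Upper-alpha critical value F_{alpha; d1, d2}, found by bisection on f_sf."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:  # f_sf decreases in f: grow hi until it brackets
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f_sf(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def decide(f0, n, alpha=0.05):
    """Apply the SLR decision rule; return (verdict, critical value, p-value)."""
    crit = f_critical(alpha, 1, n - 2)
    p_value = f_sf(f0, 1, n - 2)
    verdict = "reject H0" if f0 > crit else "fail to reject H0"
    return verdict, crit, p_value


verdict, crit, p = decide(4.5, n=5, alpha=0.05)
print(verdict, round(crit, 3), round(p, 3))
```

Here $F_0 = 4.5$ is below $F_{0.05;1,3} \approx 10.13$, so $H_0$ is not rejected at $\alpha = 0.05$. Equivalently, the p-value is above $\alpha$. With only $n = 5$ observations the test has little power, which is why a moderate $R^2$ alone is not conclusive.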

Test of hypothesis