
Simple Linear Regression (RLS)


In many problems, the researcher deals with two variables, one of which can be used to predict the future behavior of the other.

This prediction can be obtained from the equation of the regression line, built from the criterion variable (y, also called dependent or response) and the independent variable (x, also called predictor).

This situation is common in research and involves variables such as income, age, and expenses, among many others.

Equation of the line

y = a1 + a2·x

Where y is the dependent variable and x is the independent variable.

a1 is the value of y when x = 0 (the intercept) and a2 is the average change in y per unit of x (the slope).
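The coefficients of the line are usually estimated by ordinary least squares. The following is a minimal Python sketch of that estimate, not a method prescribed by this article; the function name `fit_line` and the sample points are illustrative assumptions.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return least-squares estimates (a1, a2) for the line y = a1 + a2*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x and sum of cross-products of deviations.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    a2 = sxy / sxx            # slope: average change in y per unit of x
    a1 = y_bar - a2 * x_bar   # intercept: value of y when x = 0
    return a1, a2


# Illustrative data only (not from the article): points lying on y = 1 + 2x.
a1, a2 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
prediction = a1 + a2 * 10     # predicted y for x = 10
print(a1, a2, prediction)
```

Once a1 and a2 are known, a prediction for a new x is just a1 + a2·x, as in the last lines of the sketch.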

The linear relationship between the two variables is measured by the correlation coefficient (R).

R ranges from -1 to 1: 1 indicates a perfect positive correlation and -1 a perfect negative correlation. Values close to zero indicate a weak linear correlation.
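As an illustration of how R behaves at the extremes, here is a minimal Python sketch of Pearson's correlation coefficient. The function name `pearson_r` and the small data sets are assumptions made for this example.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Illustrative data only (not from the article).
r_positive = pearson_r([1, 2, 3], [2, 4, 6])        # y rises with x
r_negative = pearson_r([1, 2, 3], [6, 4, 2])        # y falls as x rises
r_none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])    # no linear trend
print(r_positive, r_negative, r_none)
```

The third data set has a clear pattern but no linear trend, which is why R is a measure of linear association only.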

In the examples below, if R were high, y could be predicted from x for future events.

Y                         X
Fuel expenses             Km driven
Personal income           Years of study
Number of part defects    Hours of quality training

Calculating R is a very simple operation for software with statistical functions, so there is no need to go into the calculation procedures in depth.

In this type of analysis it is important to determine how well the regression line represents the data. For this, it is necessary to calculate R², the coefficient of determination (the square of Pearson's R).

With R² = 0.80, 80% of the variability of y is explained by x. Conversely, it can be said that 20% of the variance of y is not attributable to differences in x.
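This interpretation follows from the usual definition of the coefficient of determination, written here as a sketch. The symbols are assumptions introduced for illustration: ŷ_i is the value predicted by the line for observation i and ȳ is the mean of y.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

The fraction is the share of the variation of y left unexplained by the line, so a value of 0.20 for the fraction corresponds to R² = 0.80.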

To set up the hypothesis test, we formulate H0 and H1 as follows, where ρ is the population correlation coefficient:

H0: ρ = 0

H1: ρ ≠ 0

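The t statistic is calculated with the formula usually used for this test, where n is the number of observations:

t = R·√(n - 2) / √(1 - R²)

If the absolute value of the calculated t is greater than the tabulated t (with n - 2 degrees of freedom), the null hypothesis is rejected.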
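The decision rule can be sketched in Python. This is a minimal sketch, assuming the usual statistic for testing a zero correlation, t = R·sqrt(n - 2)/sqrt(1 - R²) with n - 2 degrees of freedom. The function name `t_statistic` and the tabulated value (two-tailed test, 5% significance, 21 degrees of freedom) are assumptions; R and n are the values reported in the regression statistics of the example in this article.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic for H0: rho = 0, given sample correlation r and n observations."""
    if n < 3:
        raise ValueError("at least 3 observations are needed")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


r = 0.993064678      # R reported in the example's regression statistics
n = 23               # number of observations in the example
T_TABULATED = 2.080  # assumed: t table, two-tailed, alpha = 0.05, 21 d.f.

t_calc = t_statistic(r, n)
reject_h0 = abs(t_calc) > T_TABULATED
print(f"t = {t_calc:.2f}, reject H0: {reject_h0}")
```

A different significance level or a one-tailed test would require a different tabulated value.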

Example

A driver wants to forecast his car expenses based on the kilometers he drives per month.

KM      EXPENSES (R$)
3203    400
3203    400
2603    340
3105    400
1305    150
804     100
1604    200
2706    300
805     100
1903    200
3203    400
3702    450
3203    400
3203    400
803     100
803     100
1102    130
3202    400
1604    150
1603    200
3203    400
3702    450
3403    440

Regression Statistics
Multiple R           0.993064678
R-Square             0.986177454
Adjusted R-Square    0.985519237
Standard error       127.508336
Observations         23

Looking at the table above, we can see a strong correlation between the variables, where R is very close to 1.

Kilometers driven explain about 98.6% of the variance of the expenses.
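The example can be reproduced with a short Python sketch that uses only the standard library. It is one possible way to redo the calculation, not the tool used to produce the table above. The variable names are illustrative, and expenses are treated as y and kilometers as x, as in the text. The 2000 km query at the end is a hypothetical value chosen for illustration.

```python
from math import sqrt
from statistics import mean

# The 23 monthly observations from the example table.
km = [3203, 3203, 2603, 3105, 1305, 804, 1604, 2706, 805, 1903, 3203, 3702,
      3203, 3203, 803, 803, 1102, 3202, 1604, 1603, 3203, 3702, 3403]
expenses = [400, 400, 340, 400, 150, 100, 200, 300, 100, 200, 400, 450,
            400, 400, 100, 100, 130, 400, 150, 200, 400, 450, 440]

x_bar, y_bar = mean(km), mean(expenses)
sxx = sum((x - x_bar) ** 2 for x in km)
syy = sum((y - y_bar) ** 2 for y in expenses)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(km, expenses))

a2 = sxy / sxx               # slope: R$ per additional kilometer
a1 = y_bar - a2 * x_bar      # intercept: expenses at 0 km
r = sxy / sqrt(sxx * syy)    # correlation coefficient R
r_squared = r ** 2           # coefficient of determination

# Hypothetical query: forecast expenses for a month with 2000 km driven.
forecast_2000_km = a1 + a2 * 2000

print(f"a1 = {a1:.2f}, a2 = {a2:.4f}")
print(f"R = {r:.4f}, R-square = {r_squared:.4f}")
print(f"forecast for 2000 km = R$ {forecast_2000_km:.2f}")
```

Forecasts are most trustworthy inside the range of kilometers observed in the data (roughly 800 to 3700 km).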