# Simple Linear Regression (RLS)

In many problems, the researcher works with two variables, one of which can be used to predict the future behavior of the other.

This prediction is made with the equation of the regression line, built from the criterion variable (y, also called the dependent or response variable) and the independent variable (x, also called the predictor or prognostic variable).

This situation is common in research and involves variables such as income, age, and expenses, among many others.

## Equation of the line

$$y = a_1 + a_2 x$$

Where y is the dependent variable and x is the independent variable.

$a_1$ is the value of y when x = 0 (the intercept), and $a_2$ is the mean change in y per unit of x (the slope).
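The text does not say how $a_1$ and $a_2$ are obtained. A minimal sketch, assuming the usual ordinary least squares estimates, is shown below; the function name `fit_line` and the sample data are hypothetical and only illustrate the idea.

```python
def fit_line(xs, ys):
    """Return (a1, a2) for y = a1 + a2 * x using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a2 = sxy / sxx
    # Intercept: the fitted line passes through the point (mean_x, mean_y).
    a1 = mean_y - a2 * mean_x
    return a1, a2


# Hypothetical data lying exactly on the line y = 1 + 2x.
a1, a2 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a1, a2)
```

With these points the fitted intercept and slope recover the line that generated them; with real data the points scatter around the line instead.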

The linear relationship between the two variables is measured by the correlation coefficient (R).

R ranges from -1 to 1, where 1 indicates a perfect positive correlation and -1 a perfect negative correlation. Values close to zero indicate a weak linear correlation.
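As an illustration of this range, the sketch below computes Pearson's correlation coefficient by hand for three small, made-up data sets. The function name `pearson_r` and the data are assumptions for the example, not part of the original text.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4]
r_pos = pearson_r(x, [2, 4, 6, 8])     # y rises exactly with x
r_neg = pearson_r(x, [8, 6, 4, 2])     # y falls exactly as x rises
r_zero = pearson_r(x, [1, -1, -1, 1])  # no linear trend at all
print(r_pos, r_neg, r_zero)
```

The three cases correspond to the two extremes and the midpoint of the scale described above.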

In each of the examples below, a high R would allow y to be predicted from x for future events.

| Y | X |
| --- | --- |
| Fuel expenses | Km driven |
| Personal income | Years of study |
| Number of defective parts | Hours of quality training |

Calculating R is a very simple operation for software with statistical functions, so it is not necessary to go deeper into the calculation procedure here.

In this type of analysis it is important to determine how well the regression line represents the data. For this, it is necessary to calculate R², the coefficient of determination (the square of Pearson's R).

With R² = 0.80, 80% of the variability of y is explained by x. Conversely, 20% of the variance of y is not attributable to differences in x.
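The "explained" and "unexplained" shares can be made concrete by comparing the residual variation around the fitted line with the total variation of y. The sketch below assumes a least-squares line; the function name `r_squared` and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line y = a1 + a2 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a2 = sxy / sxx
    a1 = mean_y - a2 * mean_x
    # Variation left over after the line is fitted (unexplained part).
    ss_res = sum((y - (a1 + a2 * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data, not taken from the text.
r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"explained: {r2:.0%}, unexplained: {1 - r2:.0%}")
```

For a simple regression with one x, this value equals the square of the correlation coefficient R.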

To test whether the correlation is statistically significant, we formulate H0 and H1 as follows, where ρ is the population correlation coefficient:

H0: ρ = 0

H1: ρ ≠ 0

The t statistic is calculated from R and the sample size. If the calculated t is greater than the tabulated t, the null hypothesis is rejected.
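The formula itself is missing from the text. A common choice for this test, assumed here, is $t = R\sqrt{n-2}/\sqrt{1-R^2}$ with n - 2 degrees of freedom. The sketch below applies it to the R and the number of observations reported in the example of this section; the 5% significance level and the function name `t_statistic` are assumptions.

```python
import math


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# R and number of observations as reported in the example's regression statistics.
t_calc = t_statistic(0.993064678, 23)

# Tabulated two-tailed critical value for alpha = 0.05 and 21 degrees of freedom
# (the significance level is an assumption; the text does not state one).
t_tab = 2.080

reject_h0 = abs(t_calc) > t_tab
print(f"t calculated = {t_calc:.2f}, reject H0: {reject_h0}")
```

Comparing the absolute value of t with the tabulated value matches the two-sided alternative H1: ρ ≠ 0.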

### Example

A driver wants to forecast his car expenses based on the kilometers he drives per month.

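| Km | Expenses (R\$) |
| --- | --- |
| 3203 | 400 |
| 3203 | 400 |
| 2603 | 340 |
| 3105 | 400 |
| 1305 | 150 |
| 804 | 100 |
| 1604 | 200 |
| 2706 | 300 |
| 805 | 100 |
| 1903 | 200 |
| 3203 | 400 |
| 3702 | 450 |
| 3203 | 400 |
| 3203 | 400 |
| 803 | 100 |
| 803 | 100 |
| 1102 | 130 |
| 3202 | 400 |
| 1604 | 150 |
| 1603 | 200 |
| 3203 | 400 |
| 3702 | 450 |
| 3403 | 440 |

| Regression statistics | Value |
| --- | --- |
| Multiple R | 0.993064678 |
| R-square | 0.986177454 |
| Adjusted R-square | 0.985519237 |
| Standard error | 127.508336 |
| Observations | 23 |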

Looking at the table above, we can see a strong correlation between the variables, where R is very close to 1.

Kilometers driven explain about 98.6% of the variance in expenses.
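The statistics above come from spreadsheet-style regression output. As a cross-check, R, R² and the fitted line can be recomputed directly from the 23 observations. The sketch below assumes an ordinary least squares fit of expenses on kilometers; the variable names are illustrative.

```python
import math

# The 23 (km, expenses) observations from the example table.
km = [3203, 3203, 2603, 3105, 1305, 804, 1604, 2706, 805, 1903, 3203, 3702,
      3203, 3203, 803, 803, 1102, 3202, 1604, 1603, 3203, 3702, 3403]
expenses = [400, 400, 340, 400, 150, 100, 200, 300, 100, 200, 400, 450,
            400, 400, 100, 100, 130, 400, 150, 200, 400, 450, 440]

n = len(km)
mean_x = sum(km) / n
mean_y = sum(expenses) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(km, expenses))
sxx = sum((x - mean_x) ** 2 for x in km)
syy = sum((y - mean_y) ** 2 for y in expenses)

r = sxy / math.sqrt(sxx * syy)   # correlation coefficient R
a2 = sxy / sxx                   # slope: extra expense per extra km
a1 = mean_y - a2 * mean_x        # intercept

print(f"n = {n}, R = {r:.4f}, R^2 = {r * r:.4f}")
print(f"expenses = {a1:.2f} + {a2:.4f} * km")
```

Once the line is fitted, a forecast for a given monthly distance is obtained by plugging that distance into `a1 + a2 * km`.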