In many problems, the researcher works with two variables, one of which can be used to predict the future behavior of the other.
This prediction can be made with the equation of the regression line, built from the criterion variable (y, also called dependent or response) and the independent variable (x, also called predictor).
This situation is common in research and involves variables such as income, age and expenses, among many others.
Equation of the line
y = a_{1} + a_{2}x
where y is the dependent variable and x is the independent variable.
a_{1} is the value of y when x = 0 (the intercept) and a_{2} is the mean change in y per unit of x (the slope).
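As a minimal sketch of how a_{1} and a_{2} can be estimated from data, the function below uses ordinary least squares. The text does not name the fitting method, so least squares is an assumption here, although it is the usual choice for a regression line. The function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(x, y):
    """Return (a1, a2) for the line y = a1 + a2 * x fitted by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a2 = sxy / sxx             # slope: mean change in y per unit of x
    a1 = mean_y - a2 * mean_x  # intercept: value of y when x = 0
    return a1, a2


# Illustrative data lying exactly on the line y = 1 + 2x
a1, a2 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a1, a2)
```

Because the sample points lie exactly on a line, the fitted coefficients recover the intercept and slope used to generate them.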
The strength of the linear relationship between the two variables is measured by the correlation coefficient (R).
R ranges from -1 to 1, where 1 indicates a perfect positive correlation and -1 a perfect negative correlation. Values close to zero indicate a weak linear correlation.
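The behavior of R at its extremes can be checked with a short sketch. It assumes R is Pearson's product-moment correlation, computed from sums of deviations. The function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # y rises exactly with x
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # y falls exactly with x
r_low = pearson_r([1, 2, 3, 4], [3, 1, 1, 3])  # no linear trend
print(r_pos, r_neg, r_low)
```

The first pair should give a value at or very near 1, the second at or very near -1, and the third a value at zero, since y there has no linear trend in x.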
In the examples below, if R were high, y could be predicted from x for future events.
Y | X |
Fuel expenses | Kilometers driven |
Personal income | Years of study |
Number of defective parts | Hours of quality training |
Calculating R is a simple operation for any software with statistical functions, so it is not necessary to go deeper into the calculation procedure here.
In this type of analysis it is important to determine how well the regression line represents the data. For this, it is necessary to calculate R^{2}, the coefficient of determination, which in simple linear regression is the square of Pearson's R.
With an R^{2} of 0.80, 80% of the variability of y is explained by x. Conversely, 20% of the variance of y is not attributable to differences in x.
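The split between explained and unexplained variability can be illustrated with a small sketch. It assumes a least-squares line and computes R^{2} as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the data are hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a1 + a2 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a2 = sxy / sxx
    a1 = mean_y - a2 * mean_x
    # Variability left over after fitting the line
    ss_res = sum((yi - (a1 + a2 * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variability of y around its mean
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"explained by x: {r2:.0%}, not explained by x: {1 - r2:.0%}")
```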
To test whether the correlation is significant, we formulate H_{0} and H_{1} as follows, where ρ is the population correlation coefficient:
H_{0}: ρ = 0
H_{1}: ρ ≠ 0
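The value of t is calculated with the formula
t = \frac{R \sqrt{n - 2}}{\sqrt{1 - R^{2}}}, with n - 2 degrees of freedom, where n is the number of observations.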
If the absolute value of the calculated t is greater than the tabulated t, the null hypothesis is rejected.
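A minimal sketch of this decision rule follows. It assumes the usual t statistic for a correlation coefficient, t = R√(n - 2) / √(1 - R²), with n - 2 degrees of freedom. The function name `t_statistic` and the values of R and n are hypothetical. The tabulated value 2.060 is the two-sided 5% critical value of Student's t for 25 degrees of freedom.

```python
import math


def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.6, 27   # hypothetical sample correlation and sample size
t_calc = t_statistic(r, n)
t_tab = 2.060    # two-sided 5% critical value, 25 degrees of freedom

reject_h0 = abs(t_calc) > t_tab
print(f"t calculated = {t_calc:.2f}, t tabulated = {t_tab}, reject H0: {reject_h0}")
```

With these illustrative numbers the calculated t exceeds the tabulated t, so the null hypothesis of no correlation would be rejected at the 5% level.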
Example
A driver wants to forecast his car expenses based on the kilometers he drives per month.
KM | EXPENSES (R$) |
3203 | 400 |
3203 | 400 |
2603 | 340 |
3105 | 400 |
1305 | 150 |
804 | 100 |
1604 | 200 |
2706 | 300 |
805 | 100 |
1903 | 200 |
3203 | 400 |
3702 | 450 |
3203 | 400 |
3203 | 400 |
803 | 100 |
803 | 100 |
1102 | 130 |
3202 | 400 |
1604 | 150 |
1603 | 200 |
3203 | 400 |
3702 | 450 |
3403 | 440 |
Regression Statistics | |
Multiple R | 0.993064678 |
R-squared | 0.986177454 |
Adjusted R-squared | 0.985519237 |
Standard error | 127.508336 |
Observations | 23 |
Looking at the table above, we can see a strong correlation between the variables, since R is very close to 1.
Kilometers driven explain about 98.6% of the variance in expenses.
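The figures for the driver's data can be reproduced with a short script. This is a sketch that assumes a least-squares fit with expenses as y and kilometers as x. The variable names, the `forecast` helper and the 2000 km query are hypothetical. R and R^{2} should come out close to the values in the regression statistics table, and the two-sided 5% tabulated t for 21 degrees of freedom is 2.080.

```python
import math

# (km driven, expenses in R$) pairs copied from the example table
data = [
    (3203, 400), (3203, 400), (2603, 340), (3105, 400), (1305, 150),
    (804, 100), (1604, 200), (2706, 300), (805, 100), (1903, 200),
    (3203, 400), (3702, 450), (3203, 400), (3203, 400), (803, 100),
    (803, 100), (1102, 130), (3202, 400), (1604, 150), (1603, 200),
    (3203, 400), (3702, 450), (3403, 440),
]
km = [row[0] for row in data]
expenses = [row[1] for row in data]

n = len(km)
mean_x = sum(km) / n
mean_y = sum(expenses) / n
sxx = sum((x - mean_x) ** 2 for x in km)
syy = sum((y - mean_y) ** 2 for y in expenses)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(km, expenses))

a2 = sxy / sxx                       # slope: R$ per additional km
a1 = mean_y - a2 * mean_x            # intercept
r = sxy / math.sqrt(sxx * syy)       # correlation coefficient
r2 = r ** 2                          # coefficient of determination
t_calc = r * math.sqrt(n - 2) / math.sqrt(1 - r2)
t_tab = 2.080                        # two-sided 5%, 21 degrees of freedom


def forecast(km_driven):
    """Predicted monthly expenses (R$) for a given distance, using the fitted line."""
    return a1 + a2 * km_driven


print(f"n = {n}, R = {r:.4f}, R^2 = {r2:.4f}")
print(f"expenses = {a1:.2f} + {a2:.4f} * km")
print(f"t calculated = {t_calc:.1f}, t tabulated = {t_tab}")
print(f"forecast for 2000 km: R$ {forecast(2000):.2f}")
```

Since the calculated t is far above the tabulated value, the correlation is statistically significant and the fitted line is a reasonable basis for forecasting within the observed range of about 800 to 3700 km.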