In many problems, the researcher works with two variables and wants to use one to predict future behavior of the other. This prediction is made through the equation of the regression line, which relates the criterion variable (y, also called the dependent or response variable) to the independent variable (x, also known as the predictor). Such situations are common across research, involving variables such as income, age, and expenses, among many others.
Equation of the line
y = a1 + a2·x
Where y is the dependent variable and x is the independent variable; a1 is the value of y when x = 0 (the intercept), and a2 is the mean change in y per unit of x (the slope).
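As a minimal sketch, the coefficients a1 and a2 can be estimated by ordinary least squares; the data points below are hypothetical, for illustration only:

```python
# Hypothetical sample data (x, y pairs).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope a2: covariance of x and y divided by the variance of x.
a2 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
     sum((x - mean_x) ** 2 for x in xs)
# Intercept a1: the value of y when x = 0.
a1 = mean_y - a2 * mean_x

print(a1, a2)
```

With these made-up points the fitted line is approximately y = 0.05 + 1.99·x.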
The linear relationship between the two variables is measured by the correlation coefficient (R).
R ranges from -1 to 1: a value of 1 indicates a perfect positive correlation, a value of -1 a perfect negative correlation, and values close to zero indicate a weak correlation.
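A minimal sketch of how R is computed, using hypothetical data with a strong positive relationship:

```python
import math

# Hypothetical sample data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Pearson's R: covariance of x and y over the product of their spreads.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))

r = cov / (sx * sy)
print(r)  # close to 1: strong positive linear relationship
```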
In the example below, if R were high, y could be predicted for future events.
[Figure: example scatter plots relating variables such as years of study, number of part defects, and hours of quality training]
Calculating R is a very simple operation for any software with statistical functions, so it is not necessary to go deeper into the manual calculation procedures here.
In this type of analysis it is also important to determine how well the regression line represents the data. For this, we calculate Pearson's R², the coefficient of determination. With R² = 0.80, 80% of the variability of y is explained by x; conversely, it can be said that 20% of the variance of y is not attributable to differences in x.
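The coefficient of determination can be sketched as one minus the ratio of unexplained to total variance; the data and fit below are hypothetical:

```python
# Hypothetical sample data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Fit the line y = a1 + a2*x by least squares.
a2 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
     sum((x - mean_x) ** 2 for x in xs)
a1 = mean_y - a2 * mean_x

ss_res = sum((y - (a1 + a2 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
ss_tot = sum((y - mean_y) ** 2 for y in ys)                     # total
r2 = 1 - ss_res / ss_tot
print(r2)  # fraction of y's variance explained by x
```

For simple linear regression, this R² equals the square of Pearson's R.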
To test the significance of the correlation, we formulate H0 and H1 as follows:
H0: ρ = 0
H1: ρ ≠ 0
The t statistic is calculated through the formula
t = r·√(n − 2) / √(1 − r²),
which, under H0, follows a t distribution with n − 2 degrees of freedom.
If the calculated t is greater than the tabulated (critical) t, the null hypothesis is rejected and the correlation is considered statistically significant.
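A minimal sketch of this test, using a hypothetical sample correlation and sample size:

```python
import math

# Hypothetical inputs: sample correlation and number of paired observations.
r = 0.95
n = 12

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-tailed critical value for 10 degrees of freedom at alpha = 0.05,
# taken from a standard t table.
t_crit = 2.228
print(t, t > t_crit)  # reject H0 when calculated t exceeds tabulated t
```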
A driver wants to forecast his car expenses based on the miles he drives per month.
[Table: miles driven per month vs. expenses (R$)]
Looking at the table above, we can see a strong correlation between the variables: R is very close to 1, and the miles driven explain 98% of the variance of the expenses.

Next: Multiple Linear Regression (MLR)
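The driver's forecast can be sketched end to end; since the original table's figures are not reproduced here, the miles/expenses pairs below are invented for illustration:

```python
# Hypothetical monthly data (the original table's values are not available).
miles =    [800, 1000, 1200, 1500, 1800]   # miles driven per month
expenses = [410,  500,  615,  740,  900]   # monthly expenses (R$)

n = len(miles)
mx = sum(miles) / n
my = sum(expenses) / n

# Fit expenses = a1 + a2 * miles by least squares.
a2 = sum((x - mx) * (y - my) for x, y in zip(miles, expenses)) / \
     sum((x - mx) ** 2 for x in miles)
a1 = my - a2 * mx

# Forecast the expense for a future month with, say, 1,400 miles.
forecast = a1 + a2 * 1400
print(round(forecast, 2))
```

The same fitted line can then be reused for any planned mileage, which is exactly the kind of prediction the driver wants.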