The standard example I use is predicting weight from shoe size. You can predict weight about equally well from the right shoe size or from the left shoe size, but using both together adds essentially nothing over either one alone. Strongly correlated variables like these show up as tight, nearly linear scatterplots in the panels next to the diagonal of a scatterplot matrix. In the simulated example, the overall F statistic is highly significant, yet none of the nine independent variables is, even without any adjustment for testing all nine of them. In a second regression using only a subset of these variables, some of them are highly significant, even with a Bonferroni adjustment. There is much more that could be said about these results, but it would take us away from the main point.
This sort of situation arises in time series analysis, where we can think of the subscripts as times. Because consecutive values are then strongly correlated, we lose little information by subsampling the series at regular intervals. One conclusion to draw is that when too many correlated variables are included in a model, they can mask the truly significant ones. The first sign of this is a highly significant overall F statistic accompanied by not-so-significant t-tests for the individual coefficients. (Even when some of the variables are individually significant, that does not automatically mean the others contribute nothing.)
That's one of the basic defects of stepwise regression strategies: they fall victim to this masking problem. Incidentally, the variance inflation factors in the first regression range upward from 2.
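Here is a minimal sketch of that kind of setup, assuming a NumPy/statsmodels environment; the series, the number of lags, and the noise level are my own choices rather than the figures from the regression described above. Lags of a strongly autocorrelated series stand in for the nine correlated predictors:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)

    # A strongly autocorrelated AR(1) series: consecutive values are nearly collinear.
    T, rho = 300, 0.95
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + rng.normal()

    # Nine consecutive lags serve as predictors; y depends on their average plus noise.
    k = 9
    lags = np.column_stack([x[i:T - k + i] for i in range(k)])
    y = lags.mean(axis=1) + rng.normal(size=lags.shape[0])

    fit = sm.OLS(y, sm.add_constant(lags)).fit()
    print(fit.f_pvalue)     # overall F-test: typically very small
    print(fit.pvalues[1:])  # individual t-tests: typically few, if any, below 0.05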
This happens when the predictors are highly correlated. Imagine a situation with only two predictors that are very highly correlated with each other, and each of which also correlates closely with the response variable. The F-test then has a low p-value: it is saying that, taken together, the predictors are highly significant in explaining the variation in the response variable. But the t-test for each predictor has a high p-value, because after allowing for the effect of the other predictor there is not much left for it to explain.
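To make that concrete, here is a tiny simulated version of the two-predictor case (my own illustration using statsmodels; the sample size and noise scale are arbitrary):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 50
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)  # x2 is nearly a copy of x1
    y = x1 + x2 + rng.normal(size=n)

    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(fit.f_pvalue)     # low: together the predictors clearly explain y
    print(fit.pvalues[1:])  # high: neither adds much once the other is included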
Yet all of the relationships are obviously there and easily detectable with regression analysis. You say you now understand the case of correlated variables and an insignificant regression better; that probably means you have been conditioned by frequent mentions of multicollinearity, but what you really need is a better grasp of the geometry of least squares. A keyword to search for would be "collinearity" or "multicollinearity".
VIFs are much easier to understand, but they cannot deal with collinearity involving the intercept. The answer you get depends on the question you ask.
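For reference, here is one common way to compute VIFs in Python; variance_inflation_factor is an existing statsmodels helper, but the simulated data below are my own:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(2)
    n = 100
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)  # collinear with x1
    x3 = rng.normal(size=n)                        # independent of the others

    # The conventional VIF regresses each predictor on the others *with* an
    # intercept, so the constant column has to be part of the design matrix.
    X = sm.add_constant(np.column_stack([x1, x2, x3]))
    vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
    print(vifs)  # large values (rules of thumb: > 5 or > 10) flag collinear predictors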
In addition to the points already made, the F values for the individual parameters and the overall model F value answer different questions, so they get different answers. I have seen this happen even when the individual F values are not all that close to significant, especially if the model has more than 2 or 3 IVs. I do not know of any way to combine the individual p-values and get anything meaningful, although there may be one.
One other thing to keep in mind is that the tests on the individual coefficients each assume that all of the other predictors are in the model. In other words, each predictor is not significant given that all of the other predictors are already in the model.
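A quick way to see this is that the test for one coefficient matches the partial F-test comparing the model with and without that predictor, with everything else kept in. A sketch, again with made-up data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 80
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

    Xc = sm.add_constant(X)
    full = sm.OLS(y, Xc).fit()

    # Drop the first predictor and compare: the partial F-test asks whether it
    # adds anything *given that the other predictors are already in the model*.
    reduced = sm.OLS(y, np.delete(Xc, 1, axis=1)).fit()
    f_stat, p_value, df_diff = full.compare_f_test(reduced)

    print(p_value, full.pvalues[1])  # the same question, so the same answer

For a single coefficient the partial F statistic is just the square of the t statistic, so the two p-values agree exactly.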
There must be some interaction or interdependence between two or more of your predictors. Another way to look at it is that an individual test tells you whether X is related to Y when controlling for the other variables, not whether X is related to Y on its own.
You say X relates to unique variance in Y. That is right, but the unique variance X explains in Y is different from the total variance it shares with Y. However, in a few cases the overall test and the individual tests can yield conflicting results: a significant overall F-test can indicate that the coefficients are jointly not all equal to zero, while the tests for the individual coefficients fail to distinguish any single one of them from zero. The overall F-test compares the model you specify to an intercept-only model; in the intercept-only model, all of the fitted values equal the mean of the response variable.
Therefore, if the P value of the overall F-test is significant, your regression model predicts the response variable better than the mean of the response.
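In code, the overall F-test is exactly this comparison between your fitted model and the intercept-only (mean-only) model. A sketch with simulated data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 60
    X = rng.normal(size=(n, 2))
    y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

    full = sm.OLS(y, sm.add_constant(X)).fit()
    null = sm.OLS(y, np.ones(n)).fit()  # intercept only: every fitted value is mean(y)

    f_stat, p_value, _ = full.compare_f_test(null)
    print(f_stat, full.fvalue)     # comparing against the mean *is* the overall F
    print(p_value, full.f_pvalue)  # and its p-value is the overall F-test p-value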
While R-squared provides an estimate of the strength of the relationship between your model and the response variable, it does not provide a formal hypothesis test for this relationship. The overall F-test determines whether this relationship is statistically significant.
If the P value for the overall F-test is less than your significance level, you can conclude that the R-squared value is significantly different from zero. To see how the F-test works using concepts and graphs, see my post about understanding the F-test. If your entire model is statistically significant, that's great news! However, be sure to check the residual plots so you can trust the results! If you're learning about regression, read my regression tutorial!