Wednesday, October 13, 2010

A Conversation

FLG: I don't understand why the hypothesis test for the slope of a regression line is a t-test.

Professor: What do you mean?

FLG: Well, the general form of a t-test is Z over the square root of a Chi-squared over its degrees of freedom. Your explanation has b1 - 0 in the numerator. I get how that stays Z. But then you're dividing that by SE, which equals s_b1, which equals sqrt [ Σ(yi - ŷi)² / (n - 2) ] / sqrt [ Σ(xi - x̄)² ]. Now, [ Σ(yi - ŷi)² / (n - 2) ] would make sense. That's a Chi-squared over its degrees of freedom.
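
Written out, the statistic in question is presumably the usual slope t-statistic, with the same pieces as above:

```latex
% The slope t-statistic as described above (b1 is the fitted slope, 0 the
% hypothesized slope under the null, n - 2 the residual degrees of freedom):
t = \frac{b_1 - 0}{SE(b_1)},
\qquad
SE(b_1) = s_{b_1}
        = \frac{\sqrt{\sum_i (y_i - \hat{y}_i)^2 / (n - 2)}}{\sqrt{\sum_i (x_i - \bar{x})^2}} .
```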

Professor: How do you know it's Chi-squared?

FLG: Uh, cuz it has to be. Σ(yi - ŷi)² matches the general form of a Chi-squared, which is Σ xi².

Professor: I'd have to think about that.

FLG: Well, I guess the easiest way to explain my confusion is this. sqrt [ Σ(yi - ŷi)² / (n - 2) ] makes sense. My issue is how sqrt [ Σ(xi - x̄)² ] doesn't affect the t-test. I mean, that's another Chi-squared.

Professor: I don't understand your question.

FLG: The general form of the Student-t distribution is Z over the square root of a Chi-squared divided by its degrees of freedom. (b1 - 0) / sqrt [ Σ(yi - ŷi)² / (n - 2) ] fits that. However, you're dividing the denominator by another Chi-squared.

Professor: Are you sure n-2 is the degrees of freedom?

FLG: No, I hadn't fully thought that out, but it makes sense as an assumption because it fits the general form of the Student-t. What concerns me is the effect of adding another Chi-square distribution in the denominator.

Professor: Wait a second. How do you know these are Chi-squared again?

FLG: Because they're the sum of squares, the underlying distribution of which is normal by assumption. Uh, forget it. I'm starting to confuse myself. I'll just think about it some more on my own.
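
If it helps to see it numerically, here is a quick simulation sketch of that t-statistic when the true slope is zero, compared against a t-distribution with n - 2 degrees of freedom. The sample size, noise level, and x values below are arbitrary choices, and the setup assumes the standard regression model with normal errors:

```python
# Sketch: simulate the slope t-statistic under the null hypothesis (true slope = 0)
# and compare it to a Student-t distribution with n - 2 degrees of freedom.
# Assumes the standard simple-regression setup: y = b0 + b1*x + normal noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, sigma, trials = 20, 2.0, 20000
x = np.linspace(0.0, 10.0, n)                       # fixed regressor values
t_stats = []
for _ in range(trials):
    y = 1.0 + 0.0 * x + rng.normal(0.0, sigma, n)   # null is true: slope = 0
    b1, b0 = np.polyfit(x, y, 1)                    # least-squares slope and intercept
    resid = y - (b0 + b1 * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))       # residual standard error
    se_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))
    t_stats.append(b1 / se_b1)                      # (b1 - 0) / SE(b1)

# Empirical quantiles should sit close to the theoretical t(n-2) quantiles.
qs = [0.025, 0.5, 0.975]
print(np.quantile(t_stats, qs))     # simulated
print(stats.t.ppf(qs, df=n - 2))    # theoretical
```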

13 comments:

Andrew Stevens said...

I guess I would respond that sqrt [ Σ(yi - ŷi)² ] / sqrt [ Σ(xi - x̄)² ] still looks like a Chi-squared distribution to me. Divide by its degrees of freedom and you get the same formula. So it's not clear to me why dividing by that equation creates any problems. Am I wrong here? If they're summing over the same ranges, why wouldn't the one divided by the other also be a Chi-squared distribution? Or am I missing something?

Andrew Stevens said...

By the way, we're dividing by the square root of a Chi-square distribution in the denominator, which I believe is the key here. I.e., Σ(yi - ŷi)² / Σ(xi - x̄)² is the Chi-square distribution. Then we divide by the degrees of freedom and take the square root.

Andrew Stevens said...

Now I'm second guessing myself though. I'm going to have to give it a little more thought.

FLG said...

Andrew:

That's actually helpful.

If we ignore the sqrts completely, which I think we can here, then Chi² / df / Chi² is really equal to Chi² * Chi² / df.

But is a Chi² times a Chi² still a Chi²? I don't know.

Or maybe we can't ignore the sqrt.

George Pal said...

Okay, this is funny but not as funny as it could be. Do it like Monty Python.

FLG said...

There was a dead parrot involved.

Andrew Stevens said...

You can definitely ignore the square root. The square root of x over the square root of y is equal to the square root of x/y, so we can leave that aside for the moment. And (Chi²/df)/Chi² = (Chi²/Chi²)/df. So we can get Chi²/Chi² as your Chi² distribution, so the question is - is that a Chi²? (Keep in mind that these are not independent distributions, by the way. It's the sum of the squares of the differences in the dependent variable divided by the sum of the squares of the differences in the independent variable.)

Anyway, I'm no longer sure that it is a Chi² distribution.

arethusa said...

I'm with George Pal (and Colin Mochrie). Do it like Japanese anime!

Andrew Stevens said...

Okay, I think I solved this for you. The formula in question is not necessarily a chi-squared distribution, but it might be a chi-squared distribution. The t-test is used because the t-score has a t-distribution if the null hypothesis is true. If it isn't, the t-test will fail because the formula we're considering is not a chi-squared distribution.

At least I think that's the answer. Our basement flooded a couple of months ago (it's all cleaned up now with a new carpet and everything) and we had to move around my textbooks so I wasn't able to find my old statistics textbook. So that's just my own thoughts on the matter.

FLG said...

Andrew:


"The null hypothesis stating that a partial regression coefficient is equal to
zero can be tested by using a standard F-test which tests the equivalent null
hypothesis stating that the associated partial coefficient of correlation is zero.
This F-test has v1 = 1 and v2 = N − K − 1 degrees of freedom (with N being
the number of observations and K being the number of predictors). Because v1
is equal to 1, the square root of F gives a Student-t test."

From here
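
In symbols, the relationship that quote is invoking is the standard one between F and Student-t:

```latex
% Standard F / Student-t relationship: if T has a t-distribution with \nu
% degrees of freedom, then T^2 is F(1, \nu), so the square root of an
% F(1, \nu) statistic behaves like |T|.
T \sim t_{\nu} \;\Longrightarrow\; T^{2} \sim F(1,\nu),
\qquad
\sqrt{F(1,\nu)} \;\overset{d}{=}\; |T| .
```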

Andrew Stevens said...

I see you had already figured it out then. The ratio of two independent chi-squared distributions divided by their respective degrees of freedom is an F-distribution. The square root of an F distribution with v1=1 and v2=v is a Student's t-distribution, allowing the use of the t-test to determine if you've got a t-distribution and therefore whether the null hypothesis should be rejected or not.

FLG said...

Andrew:

I'm not quite sure I follow you about testing whether it's a t-distribution. The t-test part relies upon it being a t-distribution, not testing whether it is.

The F distribution, as you say, is a (Chi² / df) / (Chi² / df).

If the numerator df = 1, then it becomes this:
(Chi² / 1) / (Chi² / df)

Or more simply:
Chi² / (Chi² / df)

Take the square root, and it is:
Z / sqrt(Chi² / df)


It's not testing whether it's a t-distribution, it is a t-distribution by the very nature of F-distributions and t-distributions.
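
A quick numerical sanity check of that same identity, as a sketch (the degrees-of-freedom value below is an arbitrary choice):

```python
# Sketch: check numerically that the square root of an F(1, df) variate
# has the same distribution as the absolute value of a t(df) variate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
df = 18                                   # arbitrary denominator degrees of freedom
sqrt_f = np.sqrt(stats.f.rvs(1, df, size=50000, random_state=rng))
abs_t = np.abs(stats.t.rvs(df, size=50000, random_state=rng))

qs = [0.25, 0.5, 0.75, 0.95]
print(np.quantile(sqrt_f, qs))            # quantiles of sqrt(F(1, df))
print(np.quantile(abs_t, qs))             # quantiles of |t(df)| -- should match closely
```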

Andrew Stevens said...

No, it's testing whether it is a t distribution. That's what the t-test does. If the null hypothesis is supported, then the t-test will pass and, if not, it will fail because the distribution being tested is not a t distribution. (Yes, yes, this is statistics, so we don't get simple passes and fails like that, I know.)

The question is whether that relationship that you posited holds or not. If the null hypothesis is correct, then the slope of the regression line will equal zero and the full formula will be a t-distribution with n-2 degrees of freedom.

The key is that Z is not simply x minus mu, which is what makes sense of the b1 - 0. The form of Z is x minus mu over sigma, and sigma is itself the square root of a sum of squares. The mysterious extra Chi-squared distribution you're worried about is actually the standard deviation of the purported normal distribution. If it's not normal, then the t-test will fail.

Sorry it took me so long to see the problem here. I assumed, as you did, that sigma could be ignored because it will just equal one in a standard normal distribution, and then I realized that we are testing for normality, not assuming it.
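
For reference, one standard way the pieces fit together, under the usual textbook assumptions (errors i.i.d. normal with variance σ², and the xi treated as fixed numbers rather than random):

```latex
% Standard accounting, assuming y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
% with \varepsilon_i i.i.d. N(0, \sigma^2) and the x_i fixed:
b_1 \sim N\!\Big(\beta_1,\ \tfrac{\sigma^2}{\sum_i (x_i - \bar{x})^2}\Big)
\;\Longrightarrow\;
Z = \frac{(b_1 - \beta_1)\,\sqrt{\sum_i (x_i - \bar{x})^2}}{\sigma} \sim N(0,1)

% Independently of b_1, the residual sum of squares gives the Chi-squared:
\frac{\sum_i (y_i - \hat{y}_i)^2}{\sigma^2} \sim \chi^2_{n-2},
\qquad
s^2 = \frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}

% The unknown \sigma cancels, leaving exactly Z over the square root of a
% Chi-squared over its degrees of freedom:
t = \frac{b_1 - \beta_1}{s / \sqrt{\sum_i (x_i - \bar{x})^2}}
  = \frac{Z}{\sqrt{\chi^2_{n-2} / (n - 2)}}
  \sim t_{n-2}
```

On that reading, the Σ(xi - x̄)² term is part of the standard deviation of b1 rather than a second Chi-squared (the xi are treated as fixed, not normal), which would be why it doesn't add another Chi-squared to the denominator.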
