
Multiple Linear Regression | A Quick Guide (Examples)

Published on February 20, 2020 by Rebecca Bevans. Revised on June 22, 2023.

Regression models are used to describe relationships between variables by fitting a line to the observed data. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.

Multiple linear regression is used to estimate the relationship between  two or more independent variables and one dependent variable . You can use multiple linear regression when you want to know:

  • How strong the relationship is between two or more independent variables and one dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).
  • The value of the dependent variable at a certain value of the independent variables (e.g. the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).

Table of contents

  • Assumptions of multiple linear regression
  • How to perform a multiple linear regression
  • Interpreting the results
  • Presenting the results
  • Other interesting articles
  • Frequently asked questions about multiple linear regression

Multiple linear regression makes all of the same assumptions as simple linear regression :

Homogeneity of variance (homoscedasticity) : the size of the error in our prediction doesn’t change significantly across the values of the independent variable.

Independence of observations : the observations in the dataset were collected using statistically valid sampling methods , and there are no hidden relationships among variables.

In multiple linear regression, it is possible that some of the independent variables are actually correlated with one another, so it is important to check these before developing the regression model. If two independent variables are too highly correlated (r2 > ~0.6), then only one of them should be used in the regression model.

Normality : The data follows a normal distribution .

Linearity : the line of best fit through the data points is a straight line, rather than a curve or some sort of grouping factor.


Multiple linear regression formula

The formula for a multiple linear regression is:

\(y = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n + \epsilon\)

  • \(y\) = the predicted value of the dependent variable
  • \(\beta_0\) = the y-intercept (the value of \(y\) when all other parameters are set to 0)
  • \(\beta_1 X_1\) = the regression coefficient (\(\beta_1\)) of the first independent variable (\(X_1\))
  • … = do the same for however many independent variables you are testing
  • \(\beta_n X_n\) = the regression coefficient of the last independent variable
  • \(\epsilon\) = model error (how much variation there is in our estimate of \(y\))

To find the best-fit line for each independent variable, multiple linear regression calculates three things:

  • The regression coefficients that lead to the smallest overall model error.
  • The t statistic of the overall model.
  • The associated p value (how likely it is that the t statistic would have occurred by chance if the null hypothesis of no relationship between the independent and dependent variables was true).

It then calculates the t statistic and p value for each regression coefficient in the model.

Multiple linear regression in R

While it is possible to do multiple linear regression by hand, it is much more commonly done via statistical software. We are going to use R for our examples because it is free, powerful, and widely available. Download the sample dataset to try it yourself.

Dataset for multiple linear regression (.csv)

Load the heart.data dataset into your R environment and run the following code:
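The code block itself is not shown here; a minimal sketch of the call, assuming the dataset's columns are named heart.disease, biking, and smoking:

```r
# Fit heart disease rate as a function of biking and smoking rates
heart.disease.lm <- lm(heart.disease ~ biking + smoking, data = heart.data)
```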

This code takes the data set heart.data and calculates the effect that the independent variables biking and smoking have on the dependent variable heart disease, using the linear model function lm().

Learn more by following the full step-by-step guide to linear regression in R .

To view the results of the model, you can use the summary() function:
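For instance, continuing with the model object named in the sketch above:

```r
# Print the coefficient table, residual summary, and fit statistics
summary(heart.disease.lm)
```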

This function takes the most important parameters from the linear model and puts them into a table that looks like this:

R multiple linear regression summary output

The summary first prints out the formula (‘Call’), then the model residuals (‘Residuals’). If the residuals are roughly centered around zero and have similar spread on either side, as these do (median 0.03, and min and max around -2 and 2), then the model probably fits the assumption of homoscedasticity.

Next are the regression coefficients of the model (‘Coefficients’). Row 1 of the coefficients table is labeled (Intercept) – this is the y-intercept of the regression equation. It’s helpful to know the estimated intercept in order to plug it into the regression equation and predict values of the dependent variable:

The most important things to note in this output table are the next two rows – the estimates for the independent variables.

The Estimate column is the estimated effect, also called the regression coefficient. The estimates in the table tell us that for every one percent increase in biking to work there is an associated 0.2 percent decrease in heart disease, and that for every one percent increase in smoking there is an associated 0.17 percent increase in heart disease.

The Std.error column displays the standard error of the estimate. This number shows how much variation there is around the estimates of the regression coefficient.

The t value column displays the test statistic . Unless otherwise specified, the test statistic used in linear regression is the t value from a two-sided t test . The larger the test statistic, the less likely it is that the results occurred by chance.

The Pr( > | t | ) column shows the p value . This shows how likely the calculated t value would have occurred by chance if the null hypothesis of no effect of the parameter were true.

Because these values are so low (p < 0.001 in both cases), we can reject the null hypothesis and conclude that both biking to work and smoking likely influence rates of heart disease.

When reporting your results, include the estimated effect (i.e. the regression coefficient), the standard error of the estimate, and the p value. You should also interpret your numbers to make it clear to your readers what the regression coefficient means.

Visualizing the results in a graph

It can also be helpful to include a graph with your results. Multiple linear regression is somewhat more complicated than simple linear regression, because there are more parameters than will fit on a two-dimensional plot.

However, there are ways to display your results that include the effects of multiple independent variables on the dependent variable, even though only one independent variable can actually be plotted on the x-axis.

Multiple regression in R graph

Here, we have calculated the predicted values of the dependent variable (heart disease) across the full range of observed values for the percentage of people biking to work.

To include the effect of smoking on the dependent variable, we calculated these predicted values while holding smoking constant at the minimum, mean, and maximum observed rates of smoking.


If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Chi square test of independence
  • Statistical power
  • Descriptive statistics
  • Degrees of freedom
  • Pearson correlation
  • Null hypothesis

Methodology

  • Double-blind study
  • Case-control study
  • Research ethics
  • Data collection
  • Hypothesis testing
  • Structured interviews

Research bias

  • Hawthorne effect
  • Unconscious bias
  • Recall bias
  • Halo effect
  • Self-serving bias
  • Information bias

A regression model is a statistical model that estimates the relationship between one dependent variable and one or more independent variables using a line (or a plane in the case of two or more independent variables).

A regression model can be used when the dependent variable is quantitative, except in the case of logistic regression, where the dependent variable is binary.

Multiple linear regression is a regression model that estimates the relationship between a quantitative dependent variable and two or more independent variables using a straight line.

Linear regression most often uses mean-square error (MSE) to calculate the error of the model. MSE is calculated by:

  • measuring the distance of the observed y-values from the predicted y-values at each value of x;
  • squaring each of these distances;
  • calculating the mean of the squared distances.

Linear regression fits a line to the data by finding the regression coefficient that results in the smallest MSE.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Bevans, R. (2023, June 22). Multiple Linear Regression | A Quick Guide (Examples). Scribbr. Retrieved August 13, 2024, from https://www.scribbr.com/statistics/multiple-linear-regression/



Multiple linear regression

RStudio: RMarkdown , Quarto

Case studies:

A. Effect of light on meadowfoam flowering

B. Studying the brain sizes of mammals

Specifying the model.

Fitting the model: least squares.

Interpretation of the coefficients.

\(F\) -statistic revisited

Matrix approach to linear regression.

Investigating the design matrix

Case study A:

A data.frame: 6 × 3

   Flowers  Time  Intensity
1     62.3     1        150
2     77.4     1        150
3     55.3     1        300
4     54.2     1        300
5     49.6     1        450
6     61.9     1        450

Researchers manipulate timing and intensity of light to investigate effect on number of flowers.

Case study B:

A data.frame: 6 × 4

                    Brain     Body  Gestation  Litter
Aardvark              9.6     2.20         31     5.0
Acouchis              9.9     0.78         98     1.2
African elephant   4480.0  2800.00        655     1.0
Agoutis              20.3     2.80        104     1.3
Axis deer           219.0    89.00        218     1.0
Badger               53.0     6.00         60     2.2

How are litter size and gestation period associated with brain size in mammals?

A model for the brains data

The figure depicts our model. To generate \(y_i\): first fix \(X=(X_1,\dots,X_p)\), form the mean \(\beta_0 + \sum_j \beta_j X_{j}\), then add an error \(\epsilon\).

A model for brains

Multiple linear regression model

A matrix: 4 × 4

                  Estimate    Std. Error     t value      Pr(>|t|)
(Intercept)   -225.2921328   83.05875218   -2.712443  7.971598e-03
Body             0.9858781    0.09428263   10.456624  2.517636e-17
Gestation        1.8087434    0.35444885    5.102974  1.790007e-06
Litter          27.6486394   17.41429351    1.587698  1.157857e-01

Another model for brains

A matrix: 3 × 4

                Estimate    Std. Error     t value      Pr(>|t|)
(Intercept)   145.028380   45.51865579    3.186131  1.963384e-03
Body            1.301621    0.08014619   16.240583  6.755835e-29
Litter        -29.022067   15.11200472   -1.920464  5.786311e-02

Fitting a multiple linear regression model

Just as in simple linear regression, the model is fit by minimizing the sum of squared errors

\[SSE(\beta) = \sum_{i=1}^n \Big(Y_i - \beta_0 - \sum_{j=1}^p \beta_j X_{ij}\Big)^2.\]

The minimizers \(\widehat{\beta} = (\widehat{\beta}_0, \dots, \widehat{\beta}_p)\) are the “least squares estimates”; they are also normally distributed, as in simple linear regression.

Estimating \(\sigma^2\)

As in simple regression, we use

\[\widehat{\sigma}^2 = \frac{SSE}{n-p-1}, \qquad \frac{(n-p-1)\,\widehat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-p-1},\]

independent of \(\widehat{\beta}\).

Why \(\chi^2_{n-p-1}\)? Typically, the degrees of freedom in the estimate of \(\sigma^2\) is \(n - \#\text{ of parameters in the regression function}\).

Interpretation of \(\beta_j\) in brains.lm

Take \(\beta_1=\beta_{\tt Body}\) for example. This is the amount by which average Brain weight increases for a one-kg increase in Body, keeping everything else constant.

We refer to this as the effect of Body allowing for, or controlling for, the other variables.

Let’s take the Beaked whale, artificially add a kg to its Body, and compute the predicted Brain weight.
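The code for that step is not reproduced here; a minimal sketch, assuming the fitted model is called brains.lm and the data frame is called brains with species as row names (both names are assumptions):

```r
# Prediction for the Beaked whale at its observed covariates
whale <- brains["Beaked whale", ]
p0 <- predict(brains.lm, newdata = whale)

# Same animal with one extra kg of Body; the difference equals the Body coefficient
whale_plus <- transform(whale, Body = Body + 1)
p1 <- predict(brains.lm, newdata = whale_plus)
p1 - p0   # should match coef(brains.lm)["Body"]
```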

Same example in simpler.lm

To emphasize that the parameters depend on the other variables in the model, let’s redo the calculation with the simpler.lm model.

\(R^2\) for multiple regression

\(R^2\) is now called the multiple correlation coefficient of the model, or the coefficient of multiple determination.

The sums of squares and \(R^2\) are defined analogously to those in simple linear regression.

Computing \(R^2\) by hand
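The original code chunk is omitted; a sketch of the by-hand computation, assuming brains.lm was fit as above:

```r
# R^2 = 1 - SSE/SST, computed directly from the fitted model
Y   <- model.response(model.frame(brains.lm))
SSE <- sum(resid(brains.lm)^2)
SST <- sum((Y - mean(Y))^2)
1 - SSE / SST     # should match summary(brains.lm)$r.squared
```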

Adjusted \(R^2\)

As we add more and more variables to the model – even random ones – \(R^2\) will increase toward 1.

Adjusted \(R^2\) tries to take this into account by replacing sums of squares with mean squares:

\[R^2_a = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)}.\]

Computing \(R^2_a\) by hand
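Again the original code is omitted; a self-contained sketch under the same assumption that brains.lm exists:

```r
# Adjusted R^2 replaces sums of squares with mean squares
Y   <- model.response(model.frame(brains.lm))
SSE <- sum(resid(brains.lm)^2)
SST <- sum((Y - mean(Y))^2)
n   <- length(Y)
p   <- length(coef(brains.lm)) - 1   # number of slopes
1 - (SSE / (n - p - 1)) / (SST / (n - 1))   # should match summary(brains.lm)$adj.r.squared
```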

\(F\)-test in summary(brains.lm)

Full model:

Brain ~ Body + Gestation + Litter

Reduced model:

Brain ~ 1 (intercept only)

Right triangle again

Sides of the triangle: \(df_R-df_F=3\) , \(df_F=92\)

Hypotenuse: \(df_R=95\)

Matrix formulation

\[Y_{n \times 1} = X_{n \times (p+1)}\,\beta_{(p+1) \times 1} + \varepsilon_{n \times 1}\]

\(X\) is called the design matrix of the model.

\(\varepsilon \sim N(0, \sigma^2 I_{n \times n})\) is multivariate normal.

\(SSE\) in matrix form

\[SSE(\beta) = (Y - X\beta)^T(Y - X\beta) = \|Y - X\beta\|^2\]

Design matrix

The design matrix is the \(n \times (p+1)\) matrix with a leading column of 1s (for the intercept) and one column per predictor, with entries \(X_{ij}\).

A matrix: 6 × 4

 (Intercept)     Body  Gestation  Litter
           1     2.20         31     5.0
           1     0.78         98     1.2
           1  2800.00        655     1.0
           1     2.80        104     1.3
           1    89.00        218     1.0
           1     6.00         60     2.2

The matrix X is the same as the one formed by R:

A matrix: 6 × 4

                   (Intercept)     Body  Gestation  Litter
Aardvark                     1     2.20         31     5.0
Acouchis                     1     0.78         98     1.2
African elephant             1  2800.00        655     1.0
Agoutis                      1     2.80        104     1.3
Axis deer                    1    89.00        218     1.0
Badger                       1     6.00         60     2.2

Math aside: least squares solution

Normal equations:

\[X^TX\widehat{\beta} = X^TY\]

Equivalent to (when \(X^TX\) is invertible):

\[\widehat{\beta} = (X^TX)^{-1}X^TY\]

Distribution: \(\widehat{\beta} \sim N(\beta, \sigma^2 (X^TX)^{-1}).\)

Math aside: multivariate normal

To obtain the distribution of \(\hat{\beta}\) we used the following fact about the multivariate normal.

Suppose \(Z \sim N(\mu,\Sigma)\). Then, for any fixed matrix \(A\),

\[AZ \sim N(A\mu, A\Sigma A^T).\]

Math aside: how did we derive the distribution of \(\hat{\beta}\)?

Above, we saw that \(\hat{\beta}\) is equal to a matrix times \(Y\). The matrix form of our model is \(Y \sim N(X\beta, \sigma^2 I)\), so applying the fact above with \(A = (X^TX)^{-1}X^T\) gives \(\widehat{\beta} \sim N(\beta, \sigma^2 (X^TX)^{-1})\).

Math aside: checking the equation
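The original check is not shown; a short sketch comparing the normal-equations solution against lm(), assuming brains.lm as above:

```r
# Rebuild the design matrix and solve the normal equations directly
X    <- model.matrix(brains.lm)
Y    <- model.response(model.frame(brains.lm))
beta <- solve(t(X) %*% X, t(X) %*% Y)
cbind(beta, coef(brains.lm))   # the two columns should agree
```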

Categorical variables

Recall case study A: the flower experiment.

A matrix: 2 × 4

               Estimate  Std. Error    t value      Pr(>|t|)
(Intercept)    50.05833    3.615510  13.845440  2.433418e-12
factor(Time)2  12.15833    5.113104   2.377877  2.652620e-02

Design matrix with categorical variables

R has used a binary column for factor(Time).

A matrix: 24 × 2

    (Intercept)  factor(Time)2
1             1              0
2             1              0
3             1              0
4             1              0
5             1              0
6             1              0
7             1              0
8             1              0
9             1              0
10            1              0
11            1              0
12            1              0
13            1              1
14            1              1
15            1              1
16            1              1
17            1              1
18            1              1
19            1              1
20            1              1
21            1              1
22            1              1
23            1              1
24            1              1

How categorical variables are encoded

We can change the columns in the design matrix:

A matrix: 24 × 2

    factor(Time)1  factor(Time)2
1               1              0
2               1              0
3               1              0
4               1              0
5               1              0
6               1              0
7               1              0
8               1              0
9               1              0
10              1              0
11              1              0
12              1              0
13              0              1
14              0              1
15              0              1
16              0              1
17              0              1
18              0              1
19              0              1
20              0              1
21              0              1
22              0              1
23              0              1
24              0              1

By default, R discards one of the columns. Why?

A matrix: 24 × 6

    (Intercept)  factor(Intensity)300  factor(Intensity)450  factor(Intensity)600  factor(Intensity)750  factor(Intensity)900
1             1                     0                     0                     0                     0                     0
2             1                     0                     0                     0                     0                     0
3             1                     1                     0                     0                     0                     0
4             1                     1                     0                     0                     0                     0
5             1                     0                     1                     0                     0                     0
6             1                     0                     1                     0                     0                     0
7             1                     0                     0                     1                     0                     0
8             1                     0                     0                     1                     0                     0
9             1                     0                     0                     0                     1                     0
10            1                     0                     0                     0                     1                     0
11            1                     0                     0                     0                     0                     1
12            1                     0                     0                     0                     0                     1
13            1                     0                     0                     0                     0                     0
14            1                     0                     0                     0                     0                     0
15            1                     1                     0                     0                     0                     0
16            1                     1                     0                     0                     0                     0
17            1                     0                     1                     0                     0                     0
18            1                     0                     1                     0                     0                     0
19            1                     0                     0                     1                     0                     0
20            1                     0                     0                     1                     0                     0
21            1                     0                     0                     0                     1                     0
22            1                     0                     0                     0                     1                     0
23            1                     0                     0                     0                     0                     1
24            1                     0                     0                     0                     0                     1
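Design matrices like these are what model.matrix() produces; a sketch, assuming the meadowfoam data frame from case study A is called flowers (the name is an assumption, the column names follow the tables above):

```r
# Treatment coding: Intensity 150 and Time 1 become the reference levels
model.matrix(~ factor(Intensity) + factor(Time), data = flowers)

# Dropping the intercept keeps an indicator for every level of the first factor
model.matrix(~ factor(Time) - 1, data = flowers)
```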

Some additional models

~ Intensity

A matrix: 2 × 4

                Estimate    Std. Error    t value      Pr(>|t|)
(Intercept)  77.38500000   4.161186164  18.596861  6.059011e-15
Intensity    -0.04047143   0.007123293  -5.681562  1.029503e-05

~ Intensity + factor(Time)

A matrix: 3 × 4

                   Estimate  Std. Error    t value      Pr(>|t|)
(Intercept)     71.30583333  3.27377202  21.780940  6.767274e-16
Intensity       -0.04047143  0.00513237  -7.885525  1.036787e-07
factor(Time)2   12.15833333  2.62955696   4.623719  1.463776e-04

~ factor(Intensity) + factor(Time)

A matrix: 7 × 4

                       Estimate  Std. Error    t value      Pr(>|t|)
(Intercept)            67.19583    3.628693  18.517916  1.049173e-12
factor(Intensity)300   -9.12500    4.751074  -1.920618  7.171506e-02
factor(Intensity)450  -13.37500    4.751074  -2.815153  1.191898e-02
factor(Intensity)600  -23.22500    4.751074  -4.888368  1.384982e-04
factor(Intensity)750  -27.75000    4.751074  -5.840784  1.965699e-05
factor(Intensity)900  -29.35000    4.751074  -6.177550  1.012760e-05
factor(Time)2          12.15833    2.743034   4.432440  3.648933e-04

Interactions

Suppose we believe that Flowers varies linearly with Intensity but the slope depends on Time.

We’d need two parameters for Intensity:

A matrix: 4 × 4

                             Estimate   Std. Error     t value      Pr(>|t|)
(Intercept)              71.623333333  4.343304599  16.4905158  4.143572e-13
Intensity                -0.041076190  0.007435051  -5.5246682  2.083392e-05
factor(Time)2            11.523333333  6.142360270   1.8760432  7.532164e-02
Intensity:factor(Time)2   0.001209524  0.010514750   0.1150312  9.095675e-01

What is the regression line when Time==1 ? And Time==2 ?

Different models across groups

Set \(\beta_1=\beta_{\tt Intensity}\), \(\beta_2=\beta_{\tt Time2}\), \(\beta_3=\beta_{\tt Time2:Intensity}\).

In the Time==1 group, a one-unit change in Intensity leads to \(\beta_1\) units of change in Flowers.

In the Time==2 group, a one-unit change in Intensity leads to \(\beta_1 + \beta_3\) units of change in Flowers.

Test \(H_0\): the slope is the same within each group.

Visualizing interaction


Multiple Linear Regression

Summarizing linear relationships in high dimensions

In the last lecture we built our first linear model: an equation of a line drawn through the scatter plot.

\[ \hat{y} = 96.2 - 0.89 x \]

While the idea is simple enough, there is a sea of terminology that floats around this method. A linear model is any model that explains the \(y\) , often called the response variable or dependent variable , as a linear function of the \(x\) , often called the explanatory variable or independent variable . There are many different methods that can be used to decide which line to draw through a scatter plot. The most commonly-used approach is called the method of least squares , a method we’ll look at closely when we turn to prediction. If we think more generally, a linear model fit by least squares is one example of a regression model , which refers to any model (linear or non-linear) used to explain a numerical response variable.

The reason for all of this jargon isn’t purely to infuriate students of statistics. Linear models are one of the most widely used statistical tools; you can find them in use in diverse fields like biology, business, and political science. Each field tends to adapt the tool and the language around them to their specific needs.

A reality of practicing statistics in these fields, however, is that most data sets are more complex than the example we saw in the last notes, where there were only two variables. Most phenomena have many different variables that relate to one another in complex ways. We need a more powerful tool to help guide us into these higher dimensions. A good starting point is to expand simple linear regression to include more than one explanatory variable!

To fit a multiple linear regression model using least squares in R, you can use the lm() function, with each additional explanatory variable separated by a + .

Multiple linear regression is powerful because there is no limit to the number of variables that we can include in the model. While Hans Rosling was able to fit 5 variables into a single graphic, what if we had 10 variables? Multiple linear regression allows us to understand high-dimensional linear relationships beyond what's possible using our visual system.

In today’s notes, we’ll discuss two specific examples where a multiple linear regression model might be applicable:

A scenario involving two numerical variables and one categorical variable

A scenario involving three numerical variables.

Two numerical, one categorical

The Zagat Guide was for many years the authoritative source of restaurant reviews. Their approach was very different from Yelp!. Zagat’s review of a restaurant was compiled by a professional restaurant reviewer who would visit a restaurant and rate it on a 30 point scale across three categories: food, decor, and service. They would also note the average price of a meal and write up a narrative review.

Here’s an example of a review from an Italian restaurant called Marea in New York City.

A picture of a Zagat review of the Italian restaurant Marea in New York City, with scores on food, decor, and service along with quotations from the narrative review.

In addition to learning about the food scores (27), and getting some helpful tips (“bring your bank manager”), we see they’ve also recorded a few more variables on this restaurant: the phone number and website, their opening hours, and the neighborhood (Midtown).

You might ask:

What is the relationship between the food quality and the price of a meal at an Italian restaurant? Are these two variables positively correlated, or is the best Italian meal in New York a simple and inexpensive slice of pizza?

To answer these questions, we need more data. The data frame below contains Zagat reviews from 168 Italian restaurants in Manhattan.

Applying the taxonomy of data, we see that for each restaurant we have recorded the price of an average meal, the food, decor, and service scores (all numerical variables) as well as a note regarding geography (a categorical nominal variable). geo captures whether the restaurant is located on the east side or the west side of Manhattan 1 .

Let’s summarize the relationship between food quality, price, and one categorical variable - geography - using a colored scatter plot.


It looks like if you want a very tasty meal, you’ll have to pay for it. There is a moderately strong, positive, and linear relationship between food quality and price. This plot, however, has a third variable in it: geography. The restaurants from the east and west sides are fairly well mixed, but to my eye the points on the west side might be a tad bit lower on price than the points from the east side. I could numerically summarize the relationship between these three variables by hand-drawing two lines, one for each neighborhood.


For a more systematic approach to drawing lines through the center of scatter plots, we need to return to the method of least squares, which is done in R using lm(). In this linear model, we wish to explain the \(y\) variable as a function of two explanatory variables, food and geo, both found in the zagat data frame. We can express that relationship using the formula notation.
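The chunk itself is not reproduced in these notes; a sketch of the call, assuming the zagat data frame has columns named price, food, and geo:

```r
# Explain price as a function of food score and side of Manhattan
m1 <- lm(price ~ food + geo, data = zagat)
coef(m1)
```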

It worked . . . or did it? If we extend our reasoning from the last notes, we should write this model as

\[\widehat{price} = -15.97 + 2.87 \times food - 1.45 \times geo\]

What does it mean to put a categorical variable, geo, into a linear model? And how do these three numbers translate into the two lines shown above?

Indicator variables

When working with linear models like the one above, the value of the explanatory variable, \(geowest\) , is multiplied by a slope, 1.45. According to the Taxonomy of Data, arithmetic functions like multiplication are only defined for numerical variables. While that would seem to rule out categorical variables for use as explanatory variables, statisticians have come up with a clever work-around: the indicator variable.

The categorical variable geo can be converted into an indicator variable by shifting the question from “Which side of Manhattan are you on?” to “Are you on the west side of Manhattan?” This is a mutate step.
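A minimal sketch of that mutate step (the variable name geowest follows the text; the dplyr pipeline itself is an assumption):

```r
library(dplyr)

# Turn the two-level categorical variable into a TRUE/FALSE (1/0) indicator
zagat <- zagat |>
  mutate(geowest = (geo == "west"))
```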

The new indicator variable geowest is a logical variable, so it has a dual representation as TRUE / FALSE as well as 1/0. Previously, this allowed us to do Boolean algebra. Here, it allows us to include an indicator variable in a linear model.

While you can create indicator variables by hand using mutate , in practice, you will not need to do this. That’s because they are created automatically whenever you put a categorical variable into lm() . Let’s revisit the linear model that we fit above with geowest in the place of geo .

\[\widehat{price} = -15.97 + 2.87 \times food - 1.45 \times geowest\]

To understand the geometry of this model, let’s focus on what the fitted values will be for any restaurant that is on the west side. For those restaurants, the geowest indicator variable will take a value of 1, so if we plug that in and rearrange,

\[\begin{eqnarray} \widehat{price} &= -15.97 + 2.87 \times food - 1.45 \times 1 \\ &= (-15.97 - 1.45) + 2.87 \times food \\ &= -17.42 + 2.87 \times food \end{eqnarray}\]

That is a familiar sight: that is an equation for a line.

Let’s repeat this process for the restaurants on the east side, where the geowest indicator variable will now take a value of 0.

\[\begin{eqnarray} \widehat{price} &= -15.97 + 2.87 \times food - 1.45 \times 0 \\ &= -15.97 + 2.87 \times food \end{eqnarray}\]

That is also the equation for a line.

If you look back and forth between these two equations, you’ll notice that they share the same slope and have different y-intercepts. Geometrically, this means that the output of lm() was describing the equation of two parallel lines :

  • one where geowest is 1 (for restaurants on the west side of town)
  • one where geowest is 0 (for restaurants on the east side of town).

That means we can use the output of lm() to replace my hand-drawn lines with ones that arise from the method of least squares.


Reference levels

One question you still might have: why did R include the indicator variable for the west side of town as opposed to the one for the east side? The answer lies in the type of variable that geo is recorded as in the zagat data frame. If you look closely at the initial output, you will see that geo is currently designated chr, which is short for character. geo is indeed a categorical variable with two levels: east and west.

Like in previous settings, R will determine the “order” of levels in a categorical variable registered as a character by way of the alphabet. This means that east will be tagged first and chosen as the reference level : the level of a categorical variable which does not have an indicator variable in the model. If you would like west to be the reference level, then you would need to reorder the levels using factor() inside of a mutate() so that west comes first. This would change the equation that results from then fitting a linear model with lm() , as you can see below!
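A sketch of that releveling step (again assuming the zagat data frame; the level names follow the text):

```r
library(dplyr)

# Make "west" the reference level, so R builds an indicator for "east" instead
zagat <- zagat |>
  mutate(geo = factor(geo, levels = c("west", "east")))

lm(price ~ food + geo, data = zagat)
```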

Now our equation looks a little bit different!

\[\widehat{price} = -17.43 + 2.87 \times food + 1.46 \times geoeast\]

In general, if you include a categorical variable with \(k\) levels in a regression model, there will be \(k-1\) indicator variables (and thus, coefficients) associated with it in the model: one for each level of the variable except the reference level 2 . Knowing the reference level also helps us interpret indicator variables that are part of the regression equation; we will see this in a moment. For now, let’s move to our second scenario.

Three numerical

While the standard scatter plot allows us to understand the association between two numerical variables like price and food , to understand the relationship between three numerical variables, we will need to build this scatterplot in 3D 3 .


Take a moment to explore this scatter plot 4 . Can you find the name of the restaurant with very bad decor but pretty good food and a price to match? (It’s Gennaro.) What about the restaurant with equally bad decor but rock-bottom prices, which is surprising given that its food quality is actually somewhat respectable? (It’s Lamarca.)

Instead of depicting the relationship between these three variables graphically, let’s do it numerically by fitting a linear model.
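The fitted call is not reproduced here; a sketch under the same assumed zagat column names:

```r
# Two numerical explanatory variables: food and decor
m2 <- lm(price ~ food + decor, data = zagat)
coef(m2)
```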

We can write the corresponding equation of the model as

\[ \widehat{price} = -24.5 + 1.64 \times food + 1.88 \times decor \]

To understand the geometry of this model, we can’t use the trick that we did with indicator variables. decor is a numerical variable just like food , so it takes more values than just 0 and 1.

Indeed this linear model is describing a plane .


If you inspect this plane carefully you’ll realize that the tilt of the plane is not quite the same in every dimension. The tilt in the decor dimension is just a little bit steeper than that in the food dimension, a geometric expression of the fact that the coefficient in front of decor, 1.88, is just a bit higher than the coefficient in front of food, 1.64.

Interpreting coefficients

When moving from simple linear regression, with one explanatory variable, to multiple linear regression, with many, the interpretation of the coefficients becomes trickier but also more insightful.

Mathematically, the coefficient in front of \(food\) , 1.64, can be interpreted a few different ways:

It is the difference that we would expect to see in the response variable, \(price\) , when two Italian restaurants are separated by a food rating of one and they have the same decor rating.

Controlling for \(decor\) , a one point increase in the food rating is associated with a $1.64 increase in the \(price\) .

Similarly for interpreting \(decor\) : controlling for the quality of the food, a one-point increase in \(decor\) is associated with a $1.88 increase in the \(price\) .

This conditional interpretation of the coefficients extends to the first setting we looked at, when one variable is numerical and the other is an indicator. Here is that model:

One might interpret \(food\) like this:

  • For two restaurants both on the same side of Manhattan, a one point increase in food score is associated with a $2.87 increase in the price of a meal.

As for \(geowest\) :

  • For two restaurants with the exact same quality of food, the restaurant on the west side is expected to be $1.45 cheaper than the restaurant on the east side.

We make the comparison to the east side since this level is the reference level according to the linear model shown. This is a useful bit of insight: it gives a sense of what the premium is for being on the east side.

It is also visible in the geometry of the model. When we’re looking at restaurants with the same food quality, we’re looking at a vertical slice of the scatter plot. Here the vertical gray line is indicating restaurants where the food quality gets a score of 18. The difference in expected price of meals on the east side and west side is the vertical distance between the red line and the blue line, which is exactly 1.45. We could draw this vertical line anywhere on the graph and the distance between the red line and the blue will still be exactly 1.45.


We began this unit on Summarizing Data with graphical and numerical summaries of just a single variable: histograms and bar charts, means and standard deviations. In the last set of notes we introduced our first bivariate numerical summaries: the correlation coefficient and the linear model. In these notes, we introduced multiple linear regression, a method that can numerically describe the linear relationships between an unlimited number of variables. The types of variables that can be included in these models are similarly broad. Numerical variables can be included directly, generalizing the geometry of a line into a plane in a higher dimension. Categorical variables can be included using the trick of creating indicator variables: logical variables that take a value of 1 where a particular condition is true. The interpretation of all of the coefficients that result from a multiple regression is challenging but rewarding: it allows us to answer questions about the relationship between two variables after controlling for the values of other variables.

If this felt like a deep dive into a multiple linear regression, don’t worry. Linear models are one of the most commonly used statistical tools, so we’ll be revisiting them throughout the course: investigating their use in making generalizations, causal claims, and predictions.

Fifth Avenue is the wide north-south street that divides Manhattan into an east side and a west side. ↩︎

This is the case for a model including an intercept term; these models will be our focus this semester and are the most commonly used. ↩︎

While ggplot2 is the best package for static statistical graphics, it does not have any interactive functionality. This plot was made using a system called plotly , which can be used both in R and Python. Read more about how it works at https://plotly.com/r/ . ↩︎

This is a screenshot from an interactive 3D scatter plot. We’ll see the interactive plot in class tomorrow. ↩︎

Recall, the degrees of freedom for the t-test is DFE = n – v – 1. There are only 2 explanatory variables left in the model, so the degrees of freedom for the t-tests = 10 – 2 – 1 = 7 .

B. A is not correct because it is possible that a backwards selection process will eliminate all variables. But, remember that we’ll stop eliminating variables once all remaining variables have p-values less than 0.05, which is the case here. Therefore, C is also incorrect.

C. Note that B is not correct – “keeping the number of radios and TV sets the same” is used in the interpretation of the coefficient of newspaper copies and is different than the phrase “after accounting for the effects of the number of radios and number of TV sets in the country.”

False. Whenever all explanatory variables in a model have p-values from the t-test less than 0.05 (or so), we stop the backwards selection process. Such a model would be considered our “final model”.

t6 = (0.0005421 − 0) / 0.0008653 = 0.6265. Some notes: 1) the degrees of freedom for the t-test is DFE = n – v – 1 = 6. As has been mentioned several times, when performing a t-test in regression, the degrees of freedom is ALWAYS DFE. 2) Notice in the output above, the t-statistic for this t-test is given in the row for newspaper copies and under “T” – it is rounded to two decimal places in the output. All the t-statistics (under “T”) in the regression output are calculated by dividing the “Coef” by “SE Coef”.

MSM = 0.16132 and MSE = 0.03477. 0.16132/0.03477 = 4.6396 or 4.64 rounded to two decimal places.

numerator df = DFM = # explanatory variables = 3. denominator df = n – v – 1 = 10 – 3 – 1 = 6. Both are highlighted in red in the output below:

Analysis of Variance
Source          DF       SS       MS     F      P
Regression       3  0.48397  0.16132  4.64  0.053
Residual Error   6  0.20859  0.03477
Total            9  0.69256

There is suggestive, but weak, evidence to indicate that at least one of number of daily newspaper copies, number of radios, and/or number of TV sets helps to explain a country’s literacy rate (p-value = 0.053). Some notes: 1) even though the evidence is weak, we should continue the analysis to find out for sure if there is at least one explanatory variable that is a significant predictor of literacy rate and, if so, which one or ones. Anytime the p-value is less than 0.1 for the F-test, we should continue the analysis. 2) Remember, the conclusion states that there is suggestive evidence that at least one explanatory variable is a significant predictor of literacy rate. It does NOT tell us how many or which one or ones are significant predictors of literacy rate – only that there is at least one that is. 3) If the F-test indicates no evidence to reject the null hypothesis, then there is no need to continue the analysis, as there is no evidence to indicate that any of the explanatory variables are helpful in explaining the response variable. However, if there is even the slightest bit of evidence to reject the null hypothesis from the F-test (i.e., p-value < 0.10), we should continue the analysis. This will involve doing t-tests on each explanatory variable, as we will see below.

First, we must check that we’re not extrapolating: all values of the explanatory variables are within the range of the data collected, so we’re okay. (To illustrate, 200 daily newspapers is between the minimum of 10 daily newspaper copies per 1000 people in Kenya and the maximum of 391 daily newspaper copies per 1000 people in Norway.) Second, make sure you put the right values in for the right x’s – recall that x1 = number of daily newspaper copies, x2 = number of radios, and x3 = number of TV sets (all per 1000 people): ŷ = 0.840. We’d predict about 84% of the residents to be literate in such a country.

B. Since the coefficient is negative, we’d expect the literacy rate to be lower for every additional radio per 1000 people in the population (for countries with the same number of daily newspaper copies and TV sets per 1000 people in the population).

Response variable: literacy rate. Explanatory variables: number of daily newspaper copies, number of radios, and number of TV sets (all per 1000 people in the population of the country).

ŷ = 0.51486 + 0.00054x1 − 0.00035x2 + 0.00199x3, where ŷ = predicted literacy rate, x1 = the number of daily newspaper copies in the country (per 1000 people), x2 = the number of radios in the country (per 1000 people), and x3 = the number of TV sets in the country (per 1000 people). Note where these numbers come from in the output – they are highlighted in red in the output below. It is important that we make sure we get the right coefficient with the right variable!

Predictor           Coef        SE Coef     T      P
Constant            0.51486     0.09368     5.50   0.002
newspaper copies    0.0005421   0.0008653   0.63   0.554
radios             -0.0003535   0.0003285  -1.08   0.323
television sets     0.001988    0.001550    1.28   0.247

The “constant” term. If all the x’s and the residual equal 0, the model would be: y = B0 + B1(0) + B2(0) + … + Bv(0) + 0 = B0.

Recall, the degrees of freedom for any hypothesis test or confidence interval that involves a t-statistic is DFE = n – v – 1, where v = the number of explanatory variables in the model. In our problem, n = 10 and v = 3. Therefore, the degrees of freedom for the t* critical value is 10 – 3 – 1 = 6.

b3 = 0.00199, SE(b3) = 0.00155, and t* = 2.447. Therefore, the lower bound = (0.00199) – (2.447)(0.00155) = −0.00180. The upper bound = (0.00199) + (2.447)(0.00155) = 0.00578. We write the 95% confidence interval for B3 as (−0.00180, 0.00578).

D. The interpretation is a combination of the interpretation of a confidence interval and the interpretation of the coefficient.

Both A and C are correct! The backwards selection process says to remove the variable with the highest p-value from the t-test as long as it’s greater than 0.05 (or so). All three variables have p-values greater than 0.05, and newspaper copies has the highest p-value, so it gets removed first since it is the “least significant” explanatory variable. So, C is correct. A is also correct because the closer a t-statistic is to 0, the higher its p-value. (Think about that – a t-statistic tells us how many standard errors an observation is from the mean. The more standard errors an observation is from the mean, the lower the tail area probability, which means a lower p-value.) It is important to note that the backwards selection process only eliminates one variable at a time! Therefore, E is not correct – again, we never remove more than one variable at a time!!


Introduction to Data Analysis in R

8 Multiple Linear Regression (MLR)

Multiple linear regression is the most common form of linear regression analysis. As a predictive analysis, multiple linear regression is used to explain the relationship between one continuous dependent variable (the response variable) and two or more independent variables (the predictor variables). The independent variables can be continuous OR categorical. Unlike simple linear regression, where we describe the relationship between X and Y (two dimensions) and can simply plot them against each other, we are now working with multiple X’s and Y, which is three-dimensional or more.

Here we are using the pie_crab data set again to develop a multiple linear regression model to predict crab size with additional variables from the data set, latitude , air_temp , and water_temp . Let’s first plot each of our predictor variables’ linear relationship with our response variable, crab size:


A multiple linear regression, at the location of each observation, incorporates each of our three variables’ simple linear relationships with crab size using the following equation:

\(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon\)

In this equation, y is our response variable, crab size, while each x represents one of our predictor variables. \(\beta_0\) represents the intercept; we can think of this as the value of y if all of our x’s were zero. Each \(\beta\) is called a partial regression coefficient; this is because we can think of each as the slope in that x’s dimension if all of our other x’s were held constant. Lastly, \(\varepsilon\) is the distance between our observation and what our model predicts for it (i.e., observed − predicted).

8.1 MLR in R

Running a multiple linear regression is very similar to the simple linear regression, but now we specify our multiple predictor variables by adding them together with a + sign (the order of our predictor variables does not matter). Here we are using the pie_crab data set again to develop a multiple linear regression model with additional variables from the data set:
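The original code chunk is not shown here; a minimal sketch, assuming the pie_crab columns are named size, latitude, air_temp, and water_temp:

```r
# Fit crab size as a function of latitude, air temperature, and water temperature
crab_mlr <- lm(size ~ latitude + air_temp + water_temp, data = pie_crab)
summary(crab_mlr)
```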

77.7460 is our line’s intercept (β0)

-1.0587 is the slope in the latitude dimension, or the estimated change in crab size for a unit change in latitude among crabs living with the same air temperature and water temperature conditions.

-2.4041 is the slope in the air temperature dimension, or the estimated change in crab size for a unit change in air temperature among crabs living with the same water temperature and latitude conditions.

0.7563 is the slope in the water temperature dimension, or the estimated change in crab size for a unit change in water temperature among crabs living with the same air temperature and latitude conditions.

\(y = -1.0587\,x_1 - 2.4041\,x_2 + 0.7563\,x_3 + 77.7460\)

In the model’s summary, our p-value is indicated in the Pr(>|t|) column for each variable: because our p-values are well below 0.01, we can deduce that each variable has a significant effect on crab size.

Our multiple R-squared (R²) is the squared Pearson correlation between the observed and the fitted (i.e., predicted) values. We can interpret this as: 42.06% of the variability in crab size is explained by the linear regression on water temperature, air temperature, and latitude. NOTE: R² always increases when an additional predictor is added to a linear model.

8.1.1 Predicting crab size

With this multiple linear equation, we can now predict crab size across different varieties of latitude, air temperature, and water temperature using the base R predict() function:
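A sketch of that prediction step, continuing from the crab_mlr model assumed above (the new predictor values below are made up purely for illustration):

```r
# Hypothetical new sites: latitude and temperature values chosen for illustration only
new_sites <- data.frame(
  latitude   = c(34, 42),
  air_temp   = c(18, 12),
  water_temp = c(22, 15)
)

predict(crab_mlr, newdata = new_sites)
```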

8.1.2 MLR Assumptions

An important aspect when building a multiple linear regression model is to make sure that the following key assumptions are met:

All observations are independent of one another.

There must be a linear relationship between the dependent and the independent variables.

The variance of the residual errors is similar across the value of each independent variable.


This “Residuals vs Fitted” (fitted meaning the predicted values) plot gives an indication of whether there are non-linear patterns. This is a bit subjective, but a good way of verifying that this assumption is met is by ensuring that no clear trend seems to exist. The residuals should also occupy equal space above and below the line, and along the length of the line.

The residual error values are normally distributed.


… also a bit subjective, but so long as the points on the Q-Q plot follow the dotted line, this assumption is fulfilled.

The independent variables are not highly correlated with each other.

Multicollinearity can lead to unreliable coefficient estimates, while adding more variables to the model will always increase the R² value, suggesting a higher proportion of variance explained by the model than is justified.

Normally, we should exclude variables that have a correlation coefficient greater than 0.7 (or less than −0.7). Alas, all of our variables are HIGHLY correlated with each other. Therefore, these predictors should not all be used in our model. Which is also to say… it is a good idea to check your predictor variables for collinearity before developing a model.
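A quick sketch of that collinearity check, again assuming the pie_crab column names used above:

```r
# Pairwise correlations among the candidate predictors
cor(pie_crab[, c("latitude", "air_temp", "water_temp")])
```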

8.2 Exercises

We are interested in developing a multiple linear regression model to predict mean annual stream flow across the Eastern US. For every state, we have a handful of watershed and site characteristic data associated with USGS stream gauging stations.

Download the ‘usgs’ folder on Canvas and store it in a ‘data’ folder in this assignment’s project directory. Here is a list of all of these files:

1. Read in each of the data sets associated with the assignment and combine them into a single data set. (HINT: What does map_dfr() do?) 2.5 pts.

2. Using our combined data set, plot each variable against mean annual stream flow to identify variables that seem to have a linear relationship with stream flow. 5 pts.

3. Develop a multiple linear regression model using any combination of the variables in the data set. What is your R-squared value? Which of your variables (if any) are significant predictors of stream flow? 5 pts.

4. Check to see if your model meets the model assumptions required for MLR. 2.5 pts.

5. Use your model to predict mean annual stream flow for two new sets of predictor data. 2.5 pts.

6. If your model does not meet the model’s assumptions, what are some ways of manipulating the data set so that it might? (HINT: review chapter 6) 2.5 pts.

8.3 Citations

Data Source: Johnson, D. 2019. Fiddler crab body size in salt marshes from Florida to Massachusetts, USA at PIE and VCR LTER and NOAA NERR sites during summer 2016. ver 1. Environmental Data Initiative. https://doi.org/10.6073/pasta/4c27d2e778d3325d3830a5142e3839bb (Accessed 2021-05-27).

Johnson DS, Crowley C, Longmire K, Nelson J, Williams B, Wittyngham S. The fiddler crab, Minuca pugnax, follows Bergmann’s rule. Ecol Evol. 2019;00:1–9. https://doi.org/10.1002/ece3.5883


Contains Solutions and Notes for the Machine Learning Specialization By Stanford University and Deeplearning.ai - Coursera (2022) by Prof. Andrew NG

greyhatguy007/Machine-Learning-Specialization-Coursera


Machine Learning Specialization (Coursera)


Contains Solutions and Notes for the Machine Learning Specialization by Andrew NG on Coursera

Note : If you would like to have a deeper understanding of the concepts by understanding all the math required, have a look at Mathematics for Machine Learning and Data Science

Course 1 : Supervised Machine Learning: Regression and Classification

  • Practice quiz: Regression
  • Practice quiz: Supervised vs unsupervised learning
  • Practice quiz: Train the model with gradient descent
  • Model Representation
  • Cost Function
  • Gradient Descent
  • Practice quiz: Gradient descent in practice
  • Practice quiz: Multiple linear regression
  • Numpy Vectorization
  • Multi Variate Regression
  • Feature Scaling
  • Feature Engineering
  • Sklearn Gradient Descent
  • Sklearn Normal Method
  • Linear Regression
  • Practice quiz: Cost function for logistic regression
  • Practice quiz: Gradient descent for logistic regression
  • Classification
  • Sigmoid Function
  • Decision Boundary
  • Logistic Loss
  • Scikit Learn - Logistic Regression
  • Overfitting
  • Regularization
  • Logistic Regression

Certificate Of Completion

Course 2 : Advanced Learning Algorithms

  • Practice quiz: Neural networks intuition
  • Practice quiz: Neural network model
  • Practice quiz: TensorFlow implementation
  • Practice quiz : Neural Networks Implementation in Numpy
  • Neurons and Layers
  • Coffee Roasting
  • Coffee Roasting Using Numpy
  • Neural Networks for Binary Classification
  • Practice quiz : Neural Networks Training
  • Practice quiz : Activation Functions
  • Practice quiz : Multiclass Classification
  • Practice quiz : Additional Neural Networks Concepts
  • Multiclass Classification
  • Neural Networks For Handwritten Digit Recognition - Multiclass
  • Practice quiz : Advice for Applying Machine Learning
  • Practice quiz : Bias and Variance
  • Practice quiz : Machine Learning Development Process
  • Advice for Applied Machine Learning
  • Practice quiz : Decision Trees
  • Practice quiz : Decision Trees Learning
  • Practice quiz : Decision Trees Ensembles
  • Decision Trees

Certificate of Completion

Course 3 : Unsupervised Learning, Recommenders, Reinforcement Learning

  • Practice quiz : Clustering
  • Practice quiz : Anomaly Detection
  • Anomaly Detection
  • Practice quiz : Collaborative Filtering
  • Practice quiz : Recommender systems implementation
  • Practice quiz : Content-based filtering
  • Collaborative Filtering RecSys
  • RecSys using Neural Networks
  • Practice quiz : Reinforcement learning introduction
  • Practice Quiz : State-action value function
  • Practice Quiz : Continuous state spaces
  • Deep Q-Learning - Lunar Lander

Specialization Certificate

Stargazers over time

Course Review :

This course is a great starting point on the way to becoming a Machine Learning Engineer. Even if you're an expert, many algorithms are covered in depth, such as decision trees, which may help you improve your skills further.

Special thanks to Professor Andrew Ng for structuring and tailoring this course.

A glimpse of what you might be able to accomplish by the end of this specialization:

Write a reinforcement learning algorithm to land the Lunar Lander using Deep Q-Learning

  • The lander was trained to land correctly on the surface, between the flags, after many unsuccessful attempts at learning how to do it.
  • The final landing after training the agent using appropriate parameters :

Write an algorithm for a Movie Recommender System

  • A movie database is collected based on genre.
  • A content-based filtering and collaborative filtering algorithm is trained, and the movie recommender system is implemented.
  • It gives movie recommendations based on the movie genre.

  • And Much More !!

Concluding, this is a course which I would recommend everyone to take. Not just because you learn many new things, but also because the assignments are real-life examples which are exciting to complete.

Happy Learning :))


Multiple Linear Regression With scikit-learn

In this article, let’s learn about multiple linear regression using scikit-learn in the Python programming language.

Regression is a statistical method for determining the relationship between features and an outcome variable or result. In machine learning, it’s utilized as a method for predictive modeling, in which an algorithm is employed to forecast continuous outcomes. Multiple linear regression, often known as multiple regression, is a statistical method that predicts the result of a response variable by combining numerous explanatory variables. Multiple regression is a variant of linear regression (ordinary least squares) in which more than one explanatory variable is used.

Mathematical Intuition:

To improve prediction, more independent factors are combined. The following is the linear relationship between the dependent and independent variables:

y = b0 + b1x1 + b2x2 + b3x3 + … + bnxn

Here, y is the dependent variable.

  • x1, x2, x3, … are independent variables.
  • b0 = intercept of the line.
  • b1, b2, … are coefficients.

A simple linear regression line is of the form y = mx + c. For example, take a simple case with three features: feature 1 = TV, feature 2 = radio, feature 3 = newspaper, and output variable = sales. The independent variables are the features feature 1, feature 2, and feature 3; the dependent variable is sales. The equation for this problem will be:

y = b0 + b1x1 + b2x2 + b3x3

where x1, x2, and x3 are the feature variables.

In this example, we use scikit-learn to perform linear regression. As we have multiple feature variables and a single outcome variable, it’s a Multiple linear regression. Let’s see how to do this step-wise.

Stepwise Implementation

Step 1: import the necessary packages.

The necessary packages such as pandas, NumPy, sklearn, etc… are imported.

Step 2: Import the CSV file:

The CSV file is imported using the pd.read_csv() method. The ‘No’ column is dropped, as an index is already present. The df.head() method is used to retrieve the first five rows of the dataframe. The df.columns attribute returns the names of the columns. The column names starting with ‘X’ are the independent features in our dataset. The column ‘Y house price of unit area’ is the dependent variable column. As the number of independent or exploratory variables is more than one, it is a multilinear regression.


 
   X1 transaction date  X2 house age  ...  X6 longitude  Y house price of unit area
0             2012.917          32.0  ...     121.54024                        37.9
1             2012.917          19.5  ...     121.53951                        42.2
2             2013.583          13.3  ...     121.54391                        47.3
3             2013.500          13.3  ...     121.54391                        54.8
4             2012.833           5.0  ...     121.54245                        43.1

[5 rows x 7 columns]

Index(['X1 transaction date', 'X2 house age',
       'X3 distance to the nearest MRT station',
       'X4 number of convenience stores', 'X5 latitude', 'X6 longitude',
       'Y house price of unit area'],
      dtype='object')

Step 3: Create a scatterplot to visualize the data:

A scatterplot is created to visualize the relation between the ‘X4 number of convenience stores’ independent variable and the ‘Y house price of unit area’ dependent feature.


Step 4: Create feature variables: 

To model the data we need to create feature variables, X variable contains independent variables and y variable contains a dependent variable. X and Y feature variables are printed to see the data.

     X1 transaction date  X2 house age  ...  X5 latitude  X6 longitude
0               2012.917          32.0  ...     24.98298     121.54024
1               2012.917          19.5  ...     24.98034     121.53951
2               2013.583          13.3  ...     24.98746     121.54391
3               2013.500          13.3  ...     24.98746     121.54391
4               2012.833           5.0  ...     24.97937     121.54245
..                   ...           ...  ...          ...           ...
409             2013.000          13.7  ...     24.94155     121.50381
410             2012.667           5.6  ...     24.97433     121.54310
411             2013.250          18.8  ...     24.97923     121.53986
412             2013.000           8.1  ...     24.96674     121.54067
413             2013.500           6.5  ...     24.97433     121.54310

[414 rows x 6 columns]

0      37.9
1      42.2
2      47.3
3      54.8
4      43.1
       ...
409    15.4
410    50.0
411    40.6
412    52.5
413    63.9
Name: Y house price of unit area, Length: 414, dtype: float64

Step 5: Split data into train and test sets:

Here, the train_test_split() method is used to create train and test sets; the feature variables are passed to the method. The test size is given as 0.3, which means 30% of the data goes into the test set, and the train set contains 70% of the data. The random state is given for data reproducibility.
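A sketch of this step, assuming X and y were created as above (the random_state value is arbitrary):

```python
from sklearn.model_selection import train_test_split

# Hold out 30% of the rows for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)
```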

Step 6: Create a linear regression model

A linear regression model is created. The LinearRegression() class is used to create the regression model; the class is imported from the sklearn.linear_model package.

Step 7: Fit the model with training data.

After creating the model, it is fitted to the training data using the fit() method, so that the model learns the relationship between the features and the target from the training set.
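For example:

model.fit(X_train, y_train)   # learn the coefficients from the training set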

Step 8: Make predictions on the test data set.

The model.predict() method is used to make predictions on X_test. Because the test data is unseen by the model, it gives an honest measure of predictive performance.
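For example:

predictions = model.predict(X_test)   # predict on the unseen test set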

Step 9: Evaluate the model with metrics.

The multiple linear regression model is evaluated with the mean_squared_error and mean_absolute_error metrics. Comparing these errors with the mean of the target variable gives a sense of how well the model is predicting. mean_squared_error is the mean of the squared residuals, and mean_absolute_error is the mean of the absolute residuals. The smaller the error, the better the model's performance.
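Using the scikit-learn metric functions, for example:

print('mean_squared_error : ', mean_squared_error(y_test, predictions))
print('mean_absolute_error : ', mean_absolute_error(y_test, predictions))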

mean absolute error (MAE): the mean of the absolute values of the residuals, MAE = (1/n) Σ |yᵢ − ŷᵢ|.

mean squared error (MSE): the mean of the squared residuals, MSE = (1/n) Σ (yᵢ − ŷᵢ)².

  • yᵢ = actual value
  • ŷᵢ (y hat) = predicted value
  • n = number of observations

If you want a metric that is less sensitive to outliers, MAE is the preferable choice; if you want large errors to be penalized more heavily, MSE (or its square root, RMSE) is the way to go. RMSE is always at least as large as MAE, and the two are equal only when all of the errors have the same magnitude.

Here is the full code, combining the steps above.
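A sketch of the combined script, under the same assumptions as above (the file name and random_state are placeholders):

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Step 2: load the data and drop the redundant 'No' index column
df = pd.read_csv('real_estate.csv')
df = df.drop('No', axis=1)
print(df.head())
print(df.columns)

# Step 3: scatterplot of one predictor against the target
plt.scatter(df['X4 number of convenience stores'], df['Y house price of unit area'])
plt.xlabel('X4 number of convenience stores')
plt.ylabel('Y house price of unit area')
plt.show()

# Step 4: feature variables
X = df.drop('Y house price of unit area', axis=1)
y = df['Y house price of unit area']

# Step 5: train/test split (70% / 30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

# Steps 6-8: create, fit, and predict
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Step 9: evaluate
print('mean_squared_error : ', mean_squared_error(y_test, predictions))
print('mean_absolute_error : ', mean_absolute_error(y_test, predictions))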

                     
Output (the dataframe listings repeat those shown above, followed by the evaluation metrics):

mean_squared_error :  46.21179783493418
mean_absolute_error :  5.392293684756571


Machine Learning - Multiple Regression

Multiple Regression

Multiple regression is like linear regression , but with more than one independent value, meaning that we try to predict a value based on two or more variables.

Take a look at the data set below; it contains some information about cars.

Car Model Volume Weight CO2
Toyota Aygo 1000 790 99
Mitsubishi Space Star 1200 1160 95
Skoda Citigo 1000 929 95
Fiat 500 900 865 90
Mini Cooper 1500 1140 105
VW Up! 1000 929 105
Skoda Fabia 1400 1109 90
Mercedes A-Class 1500 1365 92
Ford Fiesta 1500 1112 98
Audi A1 1600 1150 99
Hyundai I20 1100 980 99
Suzuki Swift 1300 990 101
Ford Fiesta 1000 1112 99
Honda Civic 1600 1252 94
Hyundai I30 1600 1326 97
Opel Astra 1600 1330 97
BMW 1 1600 1365 99
Mazda 3 2200 1280 104
Skoda Rapid 1600 1119 104
Ford Focus 2000 1328 105
Ford Mondeo 1600 1584 94
Opel Insignia 2000 1428 99
Mercedes C-Class 2100 1365 99
Skoda Octavia 1600 1415 99
Volvo S60 2000 1415 99
Mercedes CLA 1500 1465 102
Audi A4 2000 1490 104
Audi A6 2000 1725 114
Volvo V70 1600 1523 109
BMW 5 2000 1705 114
Mercedes E-Class 2100 1605 115
Volvo XC70 2000 1746 117
Ford B-Max 1600 1235 104
BMW 2 1600 1390 108
Opel Zafira 1600 1405 109
Mercedes SLK 2500 1395 120

We can predict the CO2 emission of a car based on the size of the engine, but with multiple regression we can throw in more variables, like the weight of the car, to make the prediction more accurate.

How Does it Work?

In Python we have modules that will do the work for us. Start by importing the Pandas module.

import pandas

Learn about the Pandas module in our Pandas Tutorial .

The Pandas module allows us to read csv files and return a DataFrame object.

The file is meant for testing purposes only, you can download it here: data.csv

df = pandas.read_csv("data.csv")

Then make a list of the independent values and call this variable X .

Put the dependent values in a variable called y .

X = df[['Weight', 'Volume']]
y = df['CO2']

Tip: It is common to name the list of independent values with an uppercase X, and the list of dependent values with a lowercase y.

We will use some methods from the sklearn module, so we will have to import that module as well:

from sklearn import linear_model

From the sklearn module we will use the LinearRegression() method to create a linear regression object.

This object has a method called fit() that takes the independent and dependent values as parameters and fills the regression object with data that describes the relationship:

regr = linear_model.LinearRegression()
regr.fit(X, y)

Now we have a regression object that is ready to predict CO2 values based on a car's weight and volume:

# predict the CO2 emission of a car where the weight is 2300 kg and the volume is 1300 cm3:
predictedCO2 = regr.predict([[2300, 1300]])

See the whole example in action:
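Assembled from the snippets above, the whole example looks like this:

import pandas
from sklearn import linear_model

df = pandas.read_csv("data.csv")

X = df[['Weight', 'Volume']]
y = df['CO2']

regr = linear_model.LinearRegression()
regr.fit(X, y)

# predict the CO2 emission of a car where the weight is 2300 kg and the volume is 1300 cm3:
predictedCO2 = regr.predict([[2300, 1300]])

print(predictedCO2)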


We have predicted that a car with a 1.3 liter engine and a weight of 2300 kg will release approximately 107 grams of CO2 for every kilometer it drives.


Coefficient

The coefficient is a factor that describes the relationship with an unknown variable.

Example: if x is a variable, then 2x is x two times. x is the unknown variable, and the number 2 is the coefficient.

In this case, we can ask for the coefficient value of weight against CO2, and for volume against CO2. The answer(s) we get tells us what would happen if we increase, or decrease, one of the independent values.

Print the coefficient values of the regression object:
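In scikit-learn the fitted coefficients are stored in the regression object's coef_ attribute:

print(regr.coef_)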

Result Explained

The result array represents the coefficient values of weight and volume.

Weight: 0.00755095 Volume: 0.00780526

These values tell us that if the weight increases by 1 kg, the CO2 emission increases by 0.00755095 g.

And if the engine size (Volume) increases by 1 cm3, the CO2 emission increases by 0.00780526 g.

I think that is a fair guess, but let's test it!

We have already predicted that if a car with a 1300 cm3 engine weighs 2300 kg, the CO2 emission will be approximately 107 g.

What if we increase the weight by 1000 kg?

Copy the example from before, but change the weight from 2300 to 3300:
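For example:

predictedCO2 = regr.predict([[3300, 1300]])
print(predictedCO2)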

We have predicted that a car with a 1.3 liter engine and a weight of 3300 kg will release approximately 115 grams of CO2 for every kilometer it drives.

Which shows that the coefficient of 0.00755095 is correct:

107.2087328 + (1000 * 0.00755095) = 114.75968


The Analytics Edge (MIT OpenCourseWare, Sloan School of Management, Prof. Dimitris Bertsimas)

2 Linear Regression

2.1 Welcome to Unit 2

  • 2.1.1 Welcome to Unit 2

2.2 The Statistical Sommelier: An Introduction to Linear Regression

  • 2.2.1 Video 1: Predicting the Quality of Wine
  • 2.2.2 Quick Question
  • 2.2.3 Video 2: One-Variable Linear Regression
  • 2.2.4 Quick Question
  • 2.2.5 Video 3: Multiple Linear Regression
  • 2.2.6 Quick Question
  • 2.2.7 Video 4: Linear Regression in R
  • 2.2.8 Quick Question
  • 2.2.9 Video 5: Understanding the Model
  • 2.2.10 Quick Question
  • 2.2.11 Video 6: Correlation and Multicollinearity
  • 2.2.12 Quick Question
  • 2.2.13 Video 7: Making Predictions
  • 2.2.14 Quick Question
  • 2.2.15 Video 8: Comparing the Model to the Experts

2.3 Moneyball: The Power of Sports Analytics

  • 2.3.1 A Quick Introduction to Baseball
  • 2.3.2 Video 1: The Story of Moneyball
  • 2.3.3 Video 2: Making it to the Playoffs
  • 2.3.4 Quick Question
  • 2.3.5 Video 3: Predicting Runs
  • 2.3.6 Quick Question
  • 2.3.7 Video 4: Using the Models to Make Predictions
  • 2.3.8 Quick Question
  • 2.3.9 Video 5: Winning the World Series
  • 2.3.10 Quick Question
  • 2.3.11 Video 6: The Analytics Edge in Sports
  • 2.3.12 Quick Question

2.4 Playing Moneyball in the NBA (Recitation)

  • 2.4.1 Welcome to Recitation 2
  • 2.4.2 Video 1: The Data
  • 2.4.3 Video 2: Playoffs and Wins
  • 2.4.4 Video 3: Points Scored
  • 2.4.5 Video 4: Making Predictions

2.5 Assignment 2

  • 2.5.1 Climate Change
  • 2.5.2 Reading Test Scores
  • 2.5.3 Detecting Flu Epidemics via Search Engine Query Data
  • 2.5.4 State Data


Video 1: Predicting the Quality of Wine

The slides from all videos in this Lecture Sequence can be downloaded here:  Introduction to Linear Regression (PDF - 1.3MB) .


Introduction to Baseball Video

If you are unfamiliar with the game of baseball, please watch this short video clip for a quick introduction to the game. You don’t need to be a baseball expert to understand this lecture, but basic knowledge of the game will be helpful to you.

TruScribe. “Baseball Rules of Engagement.” March 27, 2012. YouTube. This video is from TrueScribeVideos  and is not covered by our Creative Commons license .



Climate Change

There have been many studies documenting that the average global temperature has been increasing over the last century. The consequences of a continued rise in global temperature will be dire. Rising sea levels and an increased frequency of extreme weather events will affect billions of people.

In this problem, we will attempt to study the relationship between average global temperature and several other factors.

The file climate_change (CSV)  contains climate data from May 1983 to December 2008. The available variables include:

  • Year : the observation year.
  • Month : the observation month.
  • Temp : the difference in degrees Celsius between the average global temperature in that period and a reference value. This data comes from the Climatic Research Unit at the University of East Anglia .
  • CO2 ,  N2O , CH4 ,  CFC.11 , CFC.12 : atmospheric concentrations of carbon dioxide (CO2), nitrous oxide (N2O), methane  (CH4), trichlorofluoromethane (CCl3F; commonly referred to as CFC-11) and dichlorodifluoromethane (CCl2F2; commonly referred to as CFC-12), respectively. This data comes from the ESRL/NOAA Global Monitoring Division .
  • CO2, N2O and CH4 are expressed in ppmv (parts per million by volume  – i.e., 397 ppmv of CO2 means that CO2 constitutes 397 millionths of the total volume of the atmosphere)
  • CFC.11 and CFC.12 are expressed in ppbv (parts per billion by volume). 
  • Aerosols : the mean stratospheric aerosol optical depth at 550 nm. This variable is linked to volcanoes, as volcanic eruptions result in new particles being added to the atmosphere, which affect how much of the sun’s energy is reflected back into space. This data is from the Goddard Institute for Space Studies at NASA.
  • TSI : the total solar irradiance (TSI) in W/m2 (the rate at which the sun’s energy is deposited per unit area). Due to sunspots and other solar phenomena, the amount of energy that is given off by the sun varies substantially with time. This data is from the SOLARIS-HEPPA project website .  
  • MEI : multivariate El Nino Southern Oscillation index (MEI), a measure of the strength of the El Nino/La Nina-Southern Oscillation (a weather effect in the Pacific Ocean that affects global temperatures). This data comes from the ESRL/NOAA Physical Sciences Division .

Problem 1.1 - Creating Our First Model

We are interested in how changes in these variables affect future temperatures, as well as how well these variables explain temperature changes so far. To do this, first read the dataset climate_change.csv into R.

Then, split the data into a training set , consisting of all the observations up to and including 2006, and a testing set consisting of the remaining years (hint: use subset). A training set refers to the data that will be used to build the model (this is the data we give to the lm() function), and a testing set refers to the data we will use to test our predictive ability.

Next, build a linear regression model to predict the dependent variable Temp, using MEI, CO2, CH4, N2O, CFC.11, CFC.12, TSI, and Aerosols as independent variables ( Year and Month should NOT be used in the model). Use the training set to build the model.

Enter the model R2 (the “Multiple R-squared” value):


Explanation

First, read in the data and split it using the subset command:

climate = read.csv("climate_change.csv")

train = subset(climate, Year <= 2006)

test = subset(climate, Year > 2006)

Then, you can create the model using the command:

climatelm = lm(Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols, data=train)

Lastly, look at the model using summary(climatelm). The Multiple R-squared value is 0.7509.


Problem 1.2 - Creating Our First Model

Which variables are significant in the model? We will consider a variable significant only if the p-value is below 0.05. (Select all that apply.)

If you look at the model we created in the previous problem using summary(climatelm), all of the variables have at least one star except for CH4 and N2O. So MEI, CO2, CFC.11, CFC.12, TSI, and Aerosols are all significant.

Problem 2.1 - Understanding the Model

Current scientific opinion is that nitrous oxide and CFC-11 are greenhouse gases: gases that are able to trap heat from the sun and contribute to the heating of the Earth. However, the regression coefficients of both the N2O and CFC-11 variables are negative , indicating that increasing atmospheric concentrations of either of these two compounds is associated with lower global temperatures.

Which of the following is the simplest correct explanation for this contradiction?

 Climate scientists are wrong that N2O and CFC-11 are greenhouse gases - this regression analysis constitutes part of a disproof. 

 There is not enough data, so the regression coefficients being estimated are not accurate. 

 All of the gas concentration variables reflect human development - N2O and CFC.11 are correlated with other variables in the data set. 

The linear correlation of N2O and CFC.11 with other variables in the data set is quite large. The first explanation does not seem correct, as the warming effect of nitrous oxide and CFC-11 are well documented, and our regression analysis is not enough to disprove it. The second explanation is unlikely, as we have estimated eight coefficients and the intercept from 284 observations.

Problem 2.2 - Understanding the Model

Compute the correlations between all the variables in the training set. Which of the following independent variables is N2O highly correlated with (absolute correlation greater than 0.7)? Select all that apply.

Which of the following independent variables is CFC.11 highly correlated with? Select all that apply.

You can calculate all correlations at once using cor(train) where train is the name of the training data set.

Problem 3 - Simplifying the Model

Given that the correlations are so high, let us focus on the N2O variable and build a model with only MEI, TSI, Aerosols and N2O as independent variables. Remember to use the training set to build the model.

Enter the coefficient of N2O in this reduced model:

(How does this compare to the coefficient in the previous model with all of the variables?)

Enter the model R2:

We can create this simplified model with the command:

LinReg = lm(Temp ~ MEI + N2O + TSI + Aerosols, data=train)

You can get the coefficient for N2O and the model R-squared by typing summary(LinReg).

We have observed that, for this problem, when we remove many variables the sign of N2O flips. The model has not lost a lot of explanatory power (the model R2 is 0.7261 compared to 0.7509 previously) despite removing many variables. As discussed in lecture, this type of behavior is typical when building a model where many of the independent variables are highly correlated with each other. In this particular problem many of the variables (CO2, CH4, N2O, CFC.11 and CFC.12) are highly correlated, since they are all driven by human industrial development.



Multiple Regression Analysis using SPSS Statistics

Introduction.

Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable). The variables we are using to predict the value of the dependent variable are called the independent variables (or sometimes, the predictor, explanatory or regressor variables).

For example, you could use multiple regression to understand whether exam performance can be predicted based on revision time, test anxiety, lecture attendance and gender. Alternately, you could use multiple regression to understand whether daily cigarette consumption can be predicted based on smoking duration, age when started smoking, smoker type, income and gender.

Multiple regression also allows you to determine the overall fit (variance explained) of the model and the relative contribution of each of the predictors to the total variance explained. For example, you might want to know how much of the variation in exam performance can be explained by revision time, test anxiety, lecture attendance and gender "as a whole", but also the "relative contribution" of each independent variable in explaining the variance.

This "quick start" guide shows you how to carry out multiple regression using SPSS Statistics, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for multiple regression to give you a valid result. We discuss these assumptions next.


Assumptions.

When you choose to analyse your data using multiple regression, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using multiple regression. You need to do this because it is only appropriate to use multiple regression if your data "passes" eight assumptions that are required for multiple regression to give you a valid result. In practice, checking for these eight assumptions just adds a little bit more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as think a little bit more about your data, but it is not a difficult task.

Before we introduce you to these eight assumptions, do not be surprised if, when analysing your own data using SPSS Statistics, one or more of these assumptions is violated (i.e., not met). This is not uncommon when working with real-world data rather than textbook examples, which often only show you how to carry out multiple regression when everything goes well! However, don’t worry. Even when your data fails certain assumptions, there is often a solution to overcome this. First, let's take a look at these eight assumptions:

  • Assumption #1: Your dependent variable should be measured on a continuous scale (i.e., it is either an interval or ratio variable). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth. You can learn more about interval and ratio variables in our article: Types of Variable . If your dependent variable was measured on an ordinal scale, you will need to carry out ordinal regression rather than multiple regression. Examples of ordinal variables include Likert items (e.g., a 7-point scale from "strongly agree" through to "strongly disagree"), amongst other ways of ranking categories (e.g., a 3-point scale explaining how much a customer liked a product, ranging from "Not very much" to "Yes, a lot").
  • Assumption #2: You have two or more independent variables , which can be either continuous (i.e., an interval or ratio variable) or categorical (i.e., an ordinal or nominal variable). For examples of continuous and ordinal variables , see the bullet above. Examples of nominal variables include gender (e.g., 2 groups: male and female), ethnicity (e.g., 3 groups: Caucasian, African American and Hispanic), physical activity level (e.g., 4 groups: sedentary, low, moderate and high), profession (e.g., 5 groups: surgeon, doctor, nurse, dentist, therapist), and so forth. Again, you can learn more about variables in our article: Types of Variable . If one of your independent variables is dichotomous and considered a moderating variable, you might need to run a Dichotomous moderator analysis .
  • Assumption #3: You should have independence of observations (i.e., independence of residuals ), which you can easily check using the Durbin-Watson statistic, which is a simple test to run using SPSS Statistics. We explain how to interpret the result of the Durbin-Watson statistic, as well as showing you the SPSS Statistics procedure required, in our enhanced multiple regression guide.
  • Assumption #4: There needs to be a linear relationship between (a) the dependent variable and each of your independent variables, and (b) the dependent variable and the independent variables collectively . Whilst there are a number of ways to check for these linear relationships, we suggest creating scatterplots and partial regression plots using SPSS Statistics, and then visually inspecting these scatterplots and partial regression plots to check for linearity. If the relationship displayed in your scatterplots and partial regression plots are not linear, you will have to either run a non-linear regression analysis or "transform" your data, which you can do using SPSS Statistics. In our enhanced multiple regression guide, we show you how to: (a) create scatterplots and partial regression plots to check for linearity when carrying out multiple regression using SPSS Statistics; (b) interpret different scatterplot and partial regression plot results; and (c) transform your data using SPSS Statistics if you do not have linear relationships between your variables.
  • Assumption #5: Your data needs to show homoscedasticity , which is where the variances along the line of best fit remain similar as you move along the line. We explain more about what this means and how to assess the homoscedasticity of your data in our enhanced multiple regression guide. When you analyse your own data, you will need to plot the studentized residuals against the unstandardized predicted values. In our enhanced multiple regression guide, we explain: (a) how to test for homoscedasticity using SPSS Statistics; (b) some of the things you will need to consider when interpreting your data; and (c) possible ways to continue with your analysis if your data fails to meet this assumption.
  • Assumption #6: Your data must not show multicollinearity , which occurs when you have two or more independent variables that are highly correlated with each other. This leads to problems with understanding which independent variable contributes to the variance explained in the dependent variable, as well as technical issues in calculating a multiple regression model. Therefore, in our enhanced multiple regression guide, we show you: (a) how to use SPSS Statistics to detect for multicollinearity through an inspection of correlation coefficients and Tolerance/VIF values; and (b) how to interpret these correlation coefficients and Tolerance/VIF values so that you can determine whether your data meets or violates this assumption.
  • Assumption #7: There should be no significant outliers , high leverage points or highly influential points . Outliers, leverage and influential points are different terms used to represent observations in your data set that are in some way unusual when you wish to perform a multiple regression analysis. These different classifications of unusual points reflect the different impact they have on the regression line. An observation can be classified as more than one type of unusual point. However, all these points can have a very negative effect on the regression equation that is used to predict the value of the dependent variable based on the independent variables. This can change the output that SPSS Statistics produces and reduce the predictive accuracy of your results as well as the statistical significance. Fortunately, when using SPSS Statistics to run multiple regression on your data, you can detect possible outliers, high leverage points and highly influential points. In our enhanced multiple regression guide, we: (a) show you how to detect outliers using "casewise diagnostics" and "studentized deleted residuals", which you can do using SPSS Statistics, and discuss some of the options you have in order to deal with outliers; (b) check for leverage points using SPSS Statistics and discuss what you should do if you have any; and (c) check for influential points in SPSS Statistics using a measure of influence known as Cook's Distance, before presenting some practical approaches in SPSS Statistics to deal with any influential points you might have.
  • Assumption #8: Finally, you need to check that the residuals (errors) are approximately normally distributed (we explain these terms in our enhanced multiple regression guide). Two common methods to check this assumption include using: (a) a histogram (with a superimposed normal curve) and a Normal P-P Plot; or (b) a Normal Q-Q Plot of the studentized residuals. Again, in our enhanced multiple regression guide, we: (a) show you how to check this assumption using SPSS Statistics, whether you use a histogram (with superimposed normal curve) and Normal P-P Plot, or Normal Q-Q Plot; (b) explain how to interpret these diagrams; and (c) provide a possible solution if your data fails to meet this assumption.

You can check assumptions #3, #4, #5, #6, #7 and #8 using SPSS Statistics. Assumptions #1 and #2 should be checked first, before moving onto assumptions #3, #4, #5, #6, #7 and #8. Just remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running multiple regression might not be valid. This is why we dedicate a number of sections of our enhanced multiple regression guide to help you get this right. You can find out about our enhanced content as a whole on our Features: Overview page, or more specifically, learn how we help with testing assumptions on our Features: Assumptions page.

In the section, Procedure , we illustrate the SPSS Statistics procedure to perform a multiple regression assuming that no assumptions have been violated. First, we introduce the example that is used in this guide.

A health researcher wants to be able to predict "VO2 max", an indicator of fitness and health. Normally, to perform this procedure requires expensive laboratory equipment and necessitates that an individual exercise to their maximum (i.e., until they can no longer continue exercising due to physical exhaustion). This can put off those individuals who are not very active/fit and those individuals who might be at higher risk of ill health (e.g., older unfit subjects). For these reasons, it has been desirable to find a way of predicting an individual's VO2 max based on attributes that can be measured more easily and cheaply. To this end, a researcher recruited 100 participants to perform a maximum VO2 max test, but also recorded their "age", "weight", "heart rate" and "gender". Heart rate is the average of the last 5 minutes of a 20 minute, much easier, lower workload cycling test. The researcher's goal is to be able to predict VO2 max based on these four attributes: age, weight, heart rate and gender.

Setup in SPSS Statistics

In SPSS Statistics, we created six variables: (1) VO2 max, which is the maximal aerobic capacity; (2) age, which is the participant's age; (3) weight, which is the participant's weight (technically, it is their 'mass'); (4) heart_rate, which is the participant's heart rate; (5) gender, which is the participant's gender; and (6) caseno, which is the case number. The caseno variable is used to make it easy for you to eliminate cases (e.g., "significant outliers", "high leverage points" and "highly influential points") that you have identified when checking for assumptions. In our enhanced multiple regression guide, we show you how to correctly enter data in SPSS Statistics to run a multiple regression when you are also checking for assumptions. You can learn about our enhanced data setup content on our Features: Data Setup page. Alternately, see our generic, "quick start" guide: Entering Data in SPSS Statistics.

Test Procedure in SPSS Statistics

The seven steps below show you how to analyse your data using multiple regression in SPSS Statistics when none of the eight assumptions in the previous section, Assumptions , have been violated. At the end of these seven steps, we show you how to interpret the results from your multiple regression. If you are looking for help to make sure your data meets assumptions #3, #4, #5, #6, #7 and #8, which are required when using multiple regression and can be tested using SPSS Statistics, you can learn more in our enhanced guide (see our Features: Overview page to learn more).

Note: The procedure that follows is identical for SPSS Statistics versions 18 to 28 , as well as the subscription version of SPSS Statistics, with version 28 and the subscription version being the latest versions of SPSS Statistics. However, in version 27 and the subscription version , SPSS Statistics introduced a new look to their interface called " SPSS Light ", replacing the previous look for versions 26 and earlier versions , which was called " SPSS Standard ". Therefore, if you have SPSS Statistics versions 27 or 28 (or the subscription version of SPSS Statistics), the images that follow will be light grey rather than blue. However, the procedure is identical .

Menu for a multiple regression analysis in SPSS Statistics

Published with written permission from SPSS Statistics, IBM Corporation.

Note: Don't worry that you're selecting Analyze > Regression > Linear... on the main menu or that the dialogue boxes in the steps that follow have the title, Linear Regression. You have not made a mistake. You are in the correct place to carry out the multiple regression procedure. This is just the title that SPSS Statistics gives, even when running a multiple regression procedure.

'Linear Regression' dialogue box for a multiple regression analysis in SPSS Statistics. All variables on the left

Interpreting and Reporting the Output of Multiple Regression Analysis

SPSS Statistics will generate quite a few tables of output for a multiple regression analysis. In this section, we show you only the three main tables required to understand your results from the multiple regression procedure, assuming that no assumptions have been violated. A complete explanation of the output you have to interpret when checking your data for the eight assumptions required to carry out multiple regression is provided in our enhanced guide. This includes relevant scatterplots and partial regression plots, histogram (with superimposed normal curve), Normal P-P Plot and Normal Q-Q Plot, correlation coefficients and Tolerance/VIF values, casewise diagnostics and studentized deleted residuals.

However, in this "quick start" guide, we focus only on the three main tables you need to understand your multiple regression results, assuming that your data has already met the eight assumptions required for multiple regression to give you a valid result:

Determining how well the model fits

The first table of interest is the Model Summary table. This table provides the R, R², adjusted R², and the standard error of the estimate, which can be used to determine how well a regression model fits the data:

'Model Summary' table for a multiple regression analysis in SPSS. 'R', 'R Square' & 'Adjusted R Square' highlighted

The " R " column represents the value of R , the multiple correlation coefficient . R can be considered to be one measure of the quality of the prediction of the dependent variable; in this case, VO 2 max . A value of 0.760, in this example, indicates a good level of prediction. The " R Square " column represents the R 2 value (also called the coefficient of determination), which is the proportion of variance in the dependent variable that can be explained by the independent variables (technically, it is the proportion of variation accounted for by the regression model above and beyond the mean model). You can see from our value of 0.577 that our independent variables explain 57.7% of the variability of our dependent variable, VO 2 max . However, you also need to be able to interpret " Adjusted R Square " ( adj. R 2 ) to accurately report your data. We explain the reasons for this, as well as the output, in our enhanced multiple regression guide.

Statistical significance

The F-ratio in the ANOVA table (see below) tests whether the overall regression model is a good fit for the data. The table shows that the independent variables statistically significantly predict the dependent variable, F(4, 95) = 32.393, p < .0005 (i.e., the regression model is a good fit of the data).

'ANOVA' table for a multiple regression analysis in SPSS Statistics. 'df', 'F' & 'Sig.' highlighted

Estimated model coefficients

The general form of the equation to predict VO2 max from age, weight, heart_rate and gender is:

predicted VO2 max = 87.83 – (0.165 × age) – (0.385 × weight) – (0.118 × heart_rate) + (13.208 × gender)

This is obtained from the Coefficients table, as shown below:

'Coefficients' table for a multiple regression analysis in SPSS Statistics. 'Unstandardized Coefficients B' highlighted

Unstandardized coefficients indicate how much the dependent variable varies with an independent variable when all other independent variables are held constant. Consider the effect of age in this example. The unstandardized coefficient, B1, for age is equal to -0.165 (see Coefficients table). This means that for each one year increase in age, there is a decrease in VO2 max of 0.165 ml/min/kg.
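As a worked illustration (the participant's values here are hypothetical, and gender is assumed to be coded as 1 for this calculation): for someone aged 30, weighing 70 kg, with a heart rate of 130, the equation gives predicted VO2 max = 87.83 – (0.165 × 30) – (0.385 × 70) – (0.118 × 130) + (13.208 × 1) ≈ 53.8 ml/min/kg.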

Statistical significance of the independent variables

You can test for the statistical significance of each of the independent variables. This tests whether the unstandardized (or standardized) coefficients are equal to 0 (zero) in the population. If p < .05, you can conclude that the coefficients are statistically significantly different to 0 (zero). The t -value and corresponding p -value are located in the " t " and " Sig. " columns, respectively, as highlighted below:

'Coefficients' table for a multiple regression analysis in SPSS Statistics. 't' & 'Sig.' highlighted

You can see from the "Sig." column that all independent variable coefficients are statistically significantly different from 0 (zero). Although the intercept, B0, is tested for statistical significance, this is rarely an important or interesting finding.

Putting it all together

You could write up the results as follows:

A multiple regression was run to predict VO2 max from gender, age, weight and heart rate. These variables statistically significantly predicted VO2 max, F(4, 95) = 32.393, p < .0005, R² = .577. All four variables added statistically significantly to the prediction, p < .05.

If you are unsure how to interpret regression equations or how to use them to make predictions, we discuss this in our enhanced multiple regression guide. We also show you how to write up the results from your assumptions tests and multiple regression output if you need to report this in a dissertation/thesis, assignment or research report. We do this using the Harvard and APA styles. You can learn more about our enhanced content on our Features: Overview page.


Linear vs. Multiple Regression: What's the Difference?

Thomas J Catalano is a CFP and Registered Investment Adviser with the state of South Carolina, where he launched his own financial advisory firm in 2018. Thomas' experience gives him expertise in a variety of areas including investments, retirement, insurance, and financial planning.


Linear Regression vs. Multiple Regression: An Overview

Linear regression (also called simple regression) is one of the most common techniques of regression analysis. Multiple regression is a broader class of regression analysis, which encompasses both linear and nonlinear regressions with multiple explanatory variables.

Regression analysis is a statistical method used in finance and investing . Regression analysis pools data together to help people and companies make informed decisions. There are different variables at play in this type of statistical analysis, including a dependent variable—the main variable that you're trying to understand—and an independent variable(s)—factors that may have an impact on the dependent variable.

There are several main reasons people use regression analysis:

  • To predict future economic conditions, trends, or values.
  • To determine the relationship between two or more variables.
  • To understand how one variable changes when another changes.

While there are many different kinds of regression analysis, this article will examine two different types: linear regression and multiple regression.

Key Takeaways

  • Regression analysis is a common statistical method used in finance and investing.
  • Linear regression (also called simple regression) is one of the most common techniques of regression analysis; in linear regression, there are only two variables: the independent variable and the dependent variable.
  • Whereas linear regression only has one independent variable, multiple regression encompasses both linear and nonlinear regressions and incorporates multiple independent variables.
  • Each independent variable in multiple regression has its own coefficient to ensure each variable is weighted appropriately.

Also called simple regression, linear regression establishes the relationship between two variables. Linear regression is graphically depicted using a straight line; the slope defines how the change in one variable impacts a change in the other. The y-intercept of a linear regression relationship represents the value of one variable, when the value of the other is 0.

In linear regression, every dependent value has a single corresponding independent variable that drives its value. For example, in the linear regression formula of y = 3x + 7, there is only one possible outcome of "y" if "x" is defined as 2.
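For instance, with x defined as 2, the formula gives y = 3(2) + 7 = 13; no other value of y is possible for that x.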

If the relationship between two variables does not follow a straight line, nonlinear regression may be used instead. Linear and nonlinear regression are similar in that both track a particular response from a set of variables. As the relationship between the variables becomes more complex, nonlinear models have greater flexibility and capability of depicting the non-constant slope.

For complex connections between data, the relationship might be explained by more than one variable. In this case, an analyst uses multiple regression; multiple regression attempts to explain a dependent variable using more than one independent variable.

There are two main uses for multiple regression analysis. The first is to determine the dependent variable based on multiple independent variables. For example, you may be interested in determining what a crop yield will be based on temperature, rainfall, and other independent variables. The second is to determine how strong the relationship is between each variable. For example, you may be interested in knowing how a crop yield will change if rainfall increases—or the temperature decreases.

Multiple regression assumes there is not a strong relationship among the independent variables themselves. It also assumes there is a correlation between each independent variable and the single dependent variable. Each of these relationships is weighted to ensure more impactful independent variables drive the dependent value by adding a unique regression coefficient to each independent variable.

A company can not only use regression analysis to understand certain situations, like why customer service calls are dropping, but also to make forward-looking predictions, like sales figures in the future.

Linear Regression vs. Multiple Regression Example

Consider an analyst who wishes to establish a relationship between the daily change in a company's stock prices and the daily change in trading volume . Using linear regression, the analyst can attempt to determine the relationship between the two variables:

Daily Change in Stock Price = (Coefficient)(Daily Change in Trading Volume) + (y-intercept)

If the stock price increases $0.10 before any trades occur and increases $0.01 for every share sold, the linear regression outcome is:

Daily Change in Stock Price = ($0.01)(Daily Change in Trading Volume) + $0.10
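For instance, on a day when trading volume rises by 50 shares, this model predicts the stock price to rise by ($0.01)(50) + $0.10 = $0.60.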

However, the analyst realizes there are several other factors to consider including the company's P/E ratio, dividends, and prevailing inflation rate. The analyst can perform multiple regression to determine which—and how strongly—each of these variables impacts the stock price:

Daily Change in Stock Price = (Coefficient)(Daily Change in Trading Volume) + (Coefficient)(Company's P/E Ratio) + (Coefficient)(Dividend) + (Coefficient)(Inflation Rate)

Is Multiple Linear Regression Better Than Simple Linear Regression?

Multiple linear regression is a more specific calculation than simple linear regression. For straightforward relationships, simple linear regression may easily capture the relationship between the two variables. For more complex relationships requiring more consideration, multiple linear regression is often better.

When Should You Use Multiple Linear Regression?

Multiple linear regression should be used when multiple independent variables determine the outcome of a single dependent variable. This is often the case when forecasting more complex relationships.

How Do You Interpret Multiple Regression?

A multiple regression formula has multiple slopes (one for each variable) and one y-intercept. It is interpreted the same as a simple linear regression formula—except there are multiple variables that all impact the slope of the relationship.

Regression analysis is a statistical method. There are many different types of regression analysis, including linear regression and multiple regression (among others). Linear regression captures the relationship between two variables—for example, the relationship between the daily change in a company's stock prices and the daily change in trading volume. Multiple linear regression is a more specific (and complex) calculation than simple linear regression. It incorporates multiple independent variables. For example, multiple regression could capture how the daily change in a company's stock price is impacted by the company's P/E ratio, dividends, the prevailing inflation rate, and the daily change in trading volume.


