α         Z
0.10      1.282
0.05      1.645
0.025     1.960
0.010     2.326
0.005     2.576
0.001     3.090
0.0001    3.719

Rejection Region for Lower-Tailed Z Test (H 1 : μ < μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.645.

α         Z
0.10      -1.282
0.05      -1.645
0.025     -1.960
0.010     -2.326
0.005     -2.576
0.001     -3.090
0.0001    -3.719

Rejection Region for Two-Tailed Z Test (H 1 : μ ≠ μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.960 or if Z > 1.960.

α         Z
0.20      1.282
0.10      1.645
0.05      1.960
0.010     2.576
0.001     3.291
0.0001    3.891

The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values in "Other Resources." Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."

Step 4. Compute the test statistic.
Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2. The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely). If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H 0 .

Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can be abbreviated. The hypotheses (step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α =0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following: If p < α then reject H 0 .

Step 1. Set up hypotheses and determine level of significance
H 0 : μ = 191 H 1 : μ > 191 α =0.05 The research hypothesis is that weights have increased, and therefore an upper tailed test is used.  Step 2. Select the appropriate test statistic.
Because the sample size is large (n > 30) the appropriate test statistic is

$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

Step 3. Set up decision rule.
In this example, we are performing an upper tailed test (H 1 : μ > 191), with a Z test statistic and selected α =0.05. Reject H 0 if Z > 1.645.

We now substitute the sample data into the formula for the test statistic identified in Step 2, which gives Z = 2.38.

We reject H 0 because 2.38 > 1.645. We have statistically significant evidence at α =0.05 to show that the mean weight in men in 2006 is more than 191 pounds. Because we rejected the null hypothesis, we now approximate the p-value, which is the likelihood of observing sample data at least this extreme if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance where we can still reject H 0 . In this example, we observed Z=2.38 and for α=0.05, the critical value was 1.645. Because 2.38 exceeded 1.645 we rejected H 0 . In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance. Using the table of critical values for upper tailed tests, we can approximate the p-value. If we select α=0.025, the critical value is 1.960, and we still reject H 0 because 2.38 > 1.960. If we select α=0.010 the critical value is 2.326, and we still reject H 0 because 2.38 > 2.326. However, if we select α=0.005, the critical value is 2.576, and we cannot reject H 0 because 2.38 < 2.576. Therefore, the smallest tabled α at which we still reject H 0 is 0.010, and we use this as our approximation of the p-value. A statistical computing package would produce a more precise p-value which would be in between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.

Type I and Type II Errors

In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H 0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not).
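Returning to the p-value approximation for Z = 2.38 in the weight example: the table lookup can be checked numerically. The sketch below is illustrative only; it uses the standard normal distribution from Python's standard library, and the function names are hypothetical helpers rather than anything from the module.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def upper_tail_critical_value(alpha: float) -> float:
    """Z value that leaves probability alpha in the upper tail."""
    return STD_NORMAL.inv_cdf(1 - alpha)


def upper_tail_p_value(z: float) -> float:
    """P(Z >= z) when the null hypothesis is true."""
    return 1 - STD_NORMAL.cdf(z)


# Reproduce the upper-tailed critical values used in the text.
for alpha in (0.05, 0.025, 0.010, 0.005):
    print(f"alpha={alpha:<6} critical Z={upper_tail_critical_value(alpha):.3f}")

# Exact upper-tailed p-value for the observed statistic Z = 2.38.
print(f"p-value for Z=2.38: {upper_tail_p_value(2.38):.4f}")
```

The computed p-value falls between 0.005 and 0.010, which agrees with the bracketing obtained from the table.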
When we run a test of hypothesis and decide to reject H 0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).

Table: Conclusions in Test of Hypothesis

                 Do Not Reject H 0     Reject H 0
H 0 is True      Correct Decision      Type I Error
H 0 is False     Type II Error         Correct Decision

In the first step of the hypothesis test, we select a level of significance, α, and α = P(Type I error). Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α=0.05, then when H 0 is true there is a 5% probability that our test tells us to reject it, i.e., that we commit a Type I error. Most investigators are very comfortable with this and are confident when rejecting H 0 that the research hypothesis is true (as it is the more likely scenario when we reject H 0 ).

When we run a test of hypothesis and decide not to reject H 0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β = P(Type II error) = P(Do not reject H 0 | H 0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H 0 , it may be very likely that we are committing a Type II error (i.e., failing to reject H 0 when in fact it is false).
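To illustrate how β depends on the sample size and α, the following sketch computes β for an upper-tailed Z test under simplifying assumptions: the population standard deviation is treated as known and the sample mean as normally distributed. The numbers used (a true mean of 195 against a null value of 191, and a standard deviation of 25) are hypothetical and chosen only for illustration, as is the function name.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_ii_error(mu0: float, mu_true: float, sigma: float, n: int,
                  alpha: float = 0.05) -> float:
    """Beta for an upper-tailed Z test of H0: mu = mu0 when the true mean is mu_true.

    Assumes a known standard deviation sigma (a simplification).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)          # reject H0 if Z > z_crit
    shift = (mu_true - mu0) / (sigma / sqrt(n))     # how far the truth moves Z
    return STD_NORMAL.cdf(z_crit - shift)           # P(do not reject | H0 false)


# Hypothetical scenario: beta shrinks as the sample size grows.
for n in (25, 100, 400):
    print(f"n={n:<4} beta={type_ii_error(191, 195, 25, n):.2f}")
```

With a small sample, β can be large even when the null hypothesis is false, which is why failing to reject H 0 is not treated as evidence that H 0 is true.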
Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H 0 , we conclude that we do not have significant evidence to show that H 1 is true. We do not conclude that H 0 is true. The most common reason for a Type II error is a small sample size.

Tests with One Sample, Continuous Outcome

Hypothesis testing applications with a continuous outcome variable in a single population are performed according to the five-step procedure outlined above. A key component is setting up the null and research hypotheses. The objective is to compare the mean in a single population to a known mean (μ 0 ). The known value is generally derived from another study or report, for example a study in a similar, but not identical, population or a study performed some years ago. The latter is called a historical control. It is important in setting up the hypotheses in a one sample test that the mean specified in the null hypothesis is a fair and reasonable comparator. This will be discussed in the examples that follow.

Test Statistics for Testing H 0 : μ = μ 0

if n > 30: $Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

if n < 30: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where df = n-1

Note that statistical computing packages will use the t statistic exclusively and make the necessary adjustments for comparing the test statistic to appropriate values from probability tables to produce a p-value.

The National Center for Health Statistics (NCHS) published a report in 2005 entitled Health, United States, containing extensive information on major trends in the health of Americans. Data are provided for the US population as a whole and for specific ages, sexes and races. The NCHS report indicated that in 2002 Americans paid an average of $3,302 per year on health care and prescription drugs. An investigator hypothesizes that in 2005 expenditures have decreased primarily due to the availability of generic drugs.
To test the hypothesis, a sample of 100 Americans is selected and their expenditures on health care and prescription drugs in 2005 are measured. The sample data are summarized as follows: n=100, x̄ =$3,190 and s=$890. Is there statistical evidence of a reduction in expenditures on health care and prescription drugs in 2005? Is the sample mean of $3,190 evidence of a true reduction in the mean or is it within chance fluctuation? We will run the test using the five-step approach.

Step 1. Set up hypotheses and determine level of significance
H 0 : μ = 3,302 H 1 : μ < 3,302 α =0.05

The research hypothesis is that expenditures have decreased, and therefore a lower-tailed test is used.

Step 2. Select the appropriate test statistic.
Because the sample size is large (n > 30) the appropriate test statistic is

$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

Step 3. Set up decision rule.
This is a lower-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.645.

Step 4. Compute the test statistic.
$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3{,}190 - 3{,}302}{890/\sqrt{100}} = -1.26$

We do not reject H 0 because -1.26 > -1.645. We do not have statistically significant evidence at α=0.05 to show that the mean expenditures on health care and prescription drugs are lower in 2005 than the mean of $3,302 reported in 2002.

Recall that when we fail to reject H 0 in a test of hypothesis, either the null hypothesis is true (here the mean expenditures in 2005 are the same as those in 2002 and equal to $3,302) or we committed a Type II error (i.e., we failed to reject H 0 when in fact it is false). In summarizing this test, we conclude that we do not have sufficient evidence to reject H 0 . We do not conclude that H 0 is true, because there may be a moderate to high probability that we committed a Type II error. It is possible that the sample size is not large enough to detect a difference in mean expenditures.

The NCHS reported that the mean total cholesterol level in 2002 for all adults was 203. Total cholesterol levels in participants who attended the seventh examination of the Offspring in the Framingham Heart Study are summarized as follows: n=3,310, x̄ =200.3, and s=36.8. Is there statistical evidence of a difference in mean cholesterol levels in the Framingham Offspring? Here we want to assess whether the sample mean of 200.3 in the Framingham sample is statistically significantly different from 203 (i.e., beyond what we would expect by chance). We will run the test using the five-step approach.

Step 1. Set up hypotheses and determine level of significance
H 0 : μ = 203 H 1 : μ ≠ 203 α=0.05

The research hypothesis is that cholesterol levels are different in the Framingham Offspring, and therefore a two-tailed test is used.

Step 2. Select the appropriate test statistic.
Because the sample size is large (n > 30) the appropriate test statistic is

$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

Step 3. Set up decision rule.
This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.

$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{200.3 - 203}{36.8/\sqrt{3{,}310}} = -4.22$

We reject H 0 because -4.22 < -1.960. We have statistically significant evidence at α=0.05 to show that the mean total cholesterol level in the Framingham Offspring is different from the national average of 203 reported in 2002. Because we reject H 0 , we also approximate a p-value. Using the two-sided significance levels, p < 0.0001.

Statistical Significance versus Clinical (Practical) Significance

This example raises an important concept of statistical versus clinical or practical significance. From a statistical standpoint, the total cholesterol levels in the Framingham sample are highly statistically significantly different from the national average with p < 0.0001 (i.e., if the null hypothesis were true, a difference at least this large would be observed less than 0.01% of the time). However, the sample mean in the Framingham Offspring study is 200.3, less than 3 units different from the national mean of 203. The reason that the data are so highly statistically significant is the very large sample size. It is always important to assess both statistical and clinical significance of data. This is particularly relevant when the sample size is large. Is a 3 unit difference in total cholesterol a meaningful difference?

Consider again the NCHS-reported mean total cholesterol level in 2002 for all adults of 203. Suppose a new drug is proposed to lower total cholesterol. A study is designed to evaluate the efficacy of the drug in lowering cholesterol. Fifteen patients are enrolled in the study and asked to take the new drug for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows: n=15, x̄ =195.9 and s=28.7. Is there statistical evidence of a reduction in mean total cholesterol in patients after using the new drug for 6 weeks? We will run the test using the five-step approach.
H 0 : μ= 203 H 1 : μ< 203 α=0.05  Step 2. Select the appropriate test statistic.
Because the sample size is small (n<30) the appropriate test statistic is

$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

This is a lower tailed test, using a t statistic and a 5% level of significance. In order to determine the critical value of t, we need degrees of freedom, df, defined as df=n-1. In this example df=15-1=14. The critical value for a lower tailed test with df=14 and α =0.05 is -1.761 and the decision rule is as follows: Reject H 0 if t < -1.761.

$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{195.9 - 203}{28.7/\sqrt{15}} = -0.96$

We do not reject H 0 because -0.96 > -1.761. We do not have statistically significant evidence at α=0.05 to show that the mean total cholesterol level is lower than the national mean in patients taking the new drug for 6 weeks. Again, because we failed to reject the null hypothesis we make a weaker concluding statement allowing for the possibility that we may have committed a Type II error (i.e., failed to reject H 0 when in fact the drug is efficacious).

This example raises an important issue in terms of study design. In this example we assume in the null hypothesis that the mean cholesterol level is 203. This is taken to be the mean cholesterol level in patients without treatment. Is this an appropriate comparator? Alternative and potentially more efficient study designs to evaluate the effect of the new drug could involve two treatment groups, where one group receives the new drug and the other does not, or we could measure each patient's baseline or pre-treatment cholesterol level and then assess changes from baseline to 6 weeks post-treatment. These designs are discussed in the sections that follow.

Tests with One Sample, Dichotomous Outcome

Hypothesis testing applications with a dichotomous outcome variable in a single population are also performed according to the five-step procedure. Similar to tests for means, a key component is setting up the null and research hypotheses.
The objective is to compare the proportion of successes in a single population to a known proportion (p 0 ). That known proportion is generally derived from another study or report and is sometimes called a historical control. It is important in setting up the hypotheses in a one sample test that the proportion specified in the null hypothesis is a fair and reasonable comparator.

In one sample tests for a dichotomous outcome, we set up our hypotheses against an appropriate comparator. We select a sample and compute descriptive statistics on the sample data. Specifically, we compute the sample size (n) and the sample proportion, which is computed by taking the ratio of the number of successes (x) to the sample size,

$\hat{p} = \frac{x}{n}$

We then determine the appropriate test statistic (Step 2) for the hypothesis test. The formula for the test statistic is given below.

Test Statistic for Testing H 0 : p = p 0

if min(np 0 , n(1-p 0 )) ≥ 5: $Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

The formula above is appropriate for large samples, defined as when the smaller of np 0 and n(1-p 0 ) is at least 5. This is similar, but not identical, to the condition required for appropriate use of the confidence interval formula for a population proportion, i.e., min(n$\hat{p}$, n(1-$\hat{p}$)) ≥ 5. Here we use the proportion specified in the null hypothesis as the true proportion of successes rather than the sample proportion. If we fail to satisfy the condition, then alternative procedures, called exact methods, must be used to test the hypothesis about the population proportion.

Example: The NCHS report indicated that in 2002 the prevalence of cigarette smoking among American adults was 21.1%. Data on prevalent smoking in n=3,536 participants who attended the seventh examination of the Offspring in the Framingham Heart Study indicated that 482/3,536 = 13.6% of the respondents were currently smoking at the time of the exam. Suppose we want to assess whether the prevalence of smoking is lower in the Framingham Offspring sample given the focus on cardiovascular health in that community.
Is there evidence of a statistically significantly lower prevalence of smoking in the Framingham Offspring study as compared to the prevalence among all Americans?

H 0 : p = 0.211 H 1 : p < 0.211 α=0.05

We must first check that the sample size is adequate. Specifically, we need to check min(np 0 , n(1-p 0 )) = min(3,536(0.211), 3,536(1-0.211)) = min(746, 2790) = 746. The sample size is more than adequate so the following formula can be used:

$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

This is a lower tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.645.

$Z = \frac{0.136 - 0.211}{\sqrt{0.211(1-0.211)/3{,}536}} = -10.93$

We reject H 0 because -10.93 < -1.645. We have statistically significant evidence at α=0.05 to show that the prevalence of smoking in the Framingham Offspring is lower than the prevalence nationally (21.1%). Here, p < 0.0001.

The NCHS report indicated that in 2002, 75% of children aged 2 to 17 saw a dentist in the past year. An investigator wants to assess whether use of dental services is similar in children living in the city of Boston. A sample of 125 children aged 2 to 17 living in Boston are surveyed and 64 reported seeing a dentist over the past 12 months. Is there a significant difference in use of dental services between children living in Boston and the national data? Calculate this on your own before checking the answer.

Tests with Two Independent Samples, Continuous Outcome

There are many applications where it is of interest to compare two independent groups with respect to their mean scores on a continuous outcome. Here we compare means between groups, but rather than generating an estimate of the difference, we will test whether the observed difference (increase, decrease or difference) is statistically significant or not. Remember that hypothesis testing gives an assessment of statistical significance, whereas estimation gives an estimate of effect, and both are important.
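Returning to the one-sample tests for means worked through above (health expenditures, Framingham cholesterol, and the 15-patient drug study): the same formula, (x̄ - μ0)/(s/√n), yields either Z or t depending on the sample size. The sketch below recomputes the three test statistics from the summary data given in the text. The function names are hypothetical, and only the normal-based p-values are computed because the Python standard library does not include the t distribution.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def one_sample_statistic(n: int, xbar: float, s: float, mu0: float) -> float:
    """Test statistic for H0: mu = mu0 (read as Z if n > 30, as t with df = n-1 otherwise)."""
    return (xbar - mu0) / (s / sqrt(n))


def lower_tail_p(z: float) -> float:
    """P(Z <= z) for a lower-tailed Z test."""
    return STD_NORMAL.cdf(z)


def two_tailed_p(z: float) -> float:
    """P(|Z| >= |z|) for a two-tailed Z test."""
    return 2 * STD_NORMAL.cdf(-abs(z))


z_expenditures = one_sample_statistic(100, 3190, 890, 3302)   # lower-tailed Z test
z_cholesterol = one_sample_statistic(3310, 200.3, 36.8, 203)  # two-tailed Z test
t_drug = one_sample_statistic(15, 195.9, 28.7, 203)           # t statistic, df = 14

print(f"expenditures: Z={z_expenditures:.2f}, p={lower_tail_p(z_expenditures):.3f}")
print(f"cholesterol:  Z={z_cholesterol:.2f}, p={two_tailed_p(z_cholesterol):.6f}")
print(f"drug study:   t={t_drug:.2f} (compare with a t table, df=14)")
```

The drug-study statistic is left for comparison against a t table because its sample size is below 30.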
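Returning to the one-sample test for a proportion in the smoking example above: the sketch below checks the sample-size condition and computes the Z statistic. The function name is hypothetical. Note that the text rounds the sample proportion to 0.136 before substituting; using the unrounded 482/3,536 gives a slightly different value (about -10.9 either way), which does not change the conclusion.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p_hat: float, n: int, p0: float) -> float:
    """Z statistic for H0: p = p0, valid when min(n*p0, n*(1-p0)) >= 5."""
    if min(n * p0, n * (1 - p0)) < 5:
        raise ValueError("sample too small for the Z test; use an exact method")
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Smoking example: n = 3,536, p0 = 0.211, sample proportion rounded as in the text.
z_rounded = one_proportion_z(0.136, 3536, 0.211)
z_unrounded = one_proportion_z(482 / 3536, 3536, 0.211)
p_value = NormalDist().cdf(z_rounded)  # lower-tailed p-value

print(f"Z (p_hat rounded to 0.136) = {z_rounded:.2f}")
print(f"Z (p_hat = 482/3536)       = {z_unrounded:.2f}")
print(f"reject H0 at alpha=0.05: {z_rounded < -1.645}")
```

Raising an error when the condition fails mirrors the text's instruction to switch to exact methods in that case.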
Here we discuss the comparison of means when the two comparison groups are independent or physically separate. The two groups might be determined by a particular attribute (e.g., sex, diagnosis of cardiovascular disease) or might be set up by the investigator (e.g., participants assigned to receive an experimental treatment or placebo). The first step in the analysis involves computing descriptive statistics on each of the two samples. Specifically, we compute the sample size, mean and standard deviation in each sample and we denote these summary statistics as follows:

for sample 1: $n_1$, $\bar{x}_1$, $s_1$
for sample 2: $n_2$, $\bar{x}_2$, $s_2$

The designation of sample 1 and sample 2 is arbitrary. In a clinical trial setting the convention is to call the treatment group 1 and the control group 2. However, when comparing men and women, for example, either group can be 1 or 2.

In the two independent samples application with a continuous outcome, the parameter of interest in the test of hypothesis is the difference in population means, μ 1 - μ 2 . The null hypothesis is always that there is no difference between groups with respect to means, i.e., H 0 : μ 1 - μ 2 = 0. The null hypothesis can also be written as follows: H 0 : μ 1 = μ 2 . In the research hypothesis, an investigator can hypothesize that the first mean is larger than the second (H 1 : μ 1 > μ 2 ), that the first mean is smaller than the second (H 1 : μ 1 < μ 2 ), or that the means are different (H 1 : μ 1 ≠ μ 2 ). The three different alternatives represent upper, lower, and two-tailed tests, respectively. The following test statistics are used to test these hypotheses.

Test Statistics for Testing H 0 : μ 1 = μ 2

if n 1 > 30 and n 2 > 30: $Z = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

if n 1 < 30 or n 2 < 30: $t = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where df = n 1 + n 2 - 2
NOTE: The formulas above assume equal variability in the two populations (i.e., the population variances are equal, or σ 1 ² = σ 2 ²). This means that the outcome is equally variable in each of the comparison populations. For analysis, we have samples from each of the comparison populations. If the sample variances are similar, then the assumption about variability in the populations is probably reasonable. As a guideline, if the ratio of the sample variances, s 1 ²/s 2 ², is between 0.5 and 2 (i.e., if one variance is no more than double the other), then the formulas above are appropriate. If the ratio of the sample variances is greater than 2 or less than 0.5 then alternative formulas must be used to account for the heterogeneity in variances.

The test statistics include Sp, which is the pooled estimate of the common standard deviation (again assuming that the variances in the populations are similar) computed as the weighted average of the standard deviations in the samples as follows:

$S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

Because we are assuming equal variances between groups, we pool the information on variability (sample variances) to generate an estimate of the variability in the population. (Note: Because Sp is a weighted average of the standard deviations in the sample, Sp will always be in between s 1 and s 2 .)

Data measured on n=3,539 participants who attended the seventh examination of the Offspring in the Framingham Heart Study are shown below.

                              Men                       Women
Characteristic                n       x̄       s         n       x̄       s
Systolic Blood Pressure       1,623   128.2   17.5      1,911   126.5   20.1
Diastolic Blood Pressure      1,622   75.6    9.8       1,910   72.6    9.7
Total Serum Cholesterol       1,544   192.4   35.2      1,766   207.1   36.7
Weight                        1,612   194.0   33.8      1,894   157.7   34.6
Height                        1,545   68.9    2.7       1,781   63.4    2.5
Body Mass Index               1,545   28.8    4.6       1,781   27.6    5.9

Suppose we now wish to assess whether there is a statistically significant difference in mean systolic blood pressures between men and women using a 5% level of significance.
H 0 : μ 1 = μ 2 H 1 : μ 1 ≠ μ 2 α=0.05

Because both samples are large ( > 30), we can use the Z test statistic as opposed to t. Note that statistical computing packages use t throughout. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The guideline suggests investigating the ratio of the sample variances, s 1 ²/s 2 ². Suppose we call the men group 1 and the women group 2. Again, this is arbitrary; it only needs to be noted when interpreting the results. The ratio of the sample variances is 17.5²/20.1² = 0.76, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is

$Z = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

This is a two-tailed test, so we reject H 0 if Z < -1.960 or if Z > 1.960.

We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we will first compute Sp, the pooled estimate of the common standard deviation.

$S_p = \sqrt{\frac{(1{,}623 - 1)(17.5)^2 + (1{,}911 - 1)(20.1)^2}{1{,}623 + 1{,}911 - 2}} = \sqrt{359.12} = 19.0$

Notice that the pooled estimate of the common standard deviation, Sp, falls in between the standard deviations in the comparison groups (i.e., 17.5 and 20.1). Sp is slightly closer in value to the standard deviation in the women (20.1) as there were slightly more women in the sample. Recall, Sp is a weighted average of the standard deviations in the comparison groups, weighted by the respective sample sizes.

Now the test statistic:

$Z = \frac{128.2 - 126.5}{19.0\sqrt{\frac{1}{1{,}623} + \frac{1}{1{,}911}}} = \frac{1.7}{0.64} = 2.66$

We reject H 0 because 2.66 > 1.960. We have statistically significant evidence at α=0.05 to show that there is a difference in mean systolic blood pressures between men and women. The p-value is p < 0.010.

Here again we find that there is a statistically significant difference in mean systolic blood pressures between men and women at p < 0.010. Notice that there is a very small difference in the sample means (128.2 - 126.5 = 1.7 units), but this difference is beyond what would be expected by chance. Is this a clinically meaningful difference? The large sample size in this example is driving the statistical significance.
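The systolic blood pressure comparison above can be reproduced from the summary statistics in the Framingham table. The sketch below is illustrative: the function names are hypothetical, and the p-value uses the standard normal distribution because both samples are large.

```python
from math import sqrt
from statistics import NormalDist


def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled estimate of the common standard deviation, Sp."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def two_sample_statistic(n1, xbar1, s1, n2, xbar2, s2) -> float:
    """Test statistic for H0: mu1 = mu2, assuming equal population variances."""
    sp = pooled_sd(n1, s1, n2, s2)
    return (xbar1 - xbar2) / (sp * sqrt(1 / n1 + 1 / n2))


# Men are group 1 and women group 2, as in the text.
men = (1623, 128.2, 17.5)
women = (1911, 126.5, 20.1)

variance_ratio = men[2] ** 2 / women[2] ** 2   # guideline: between 0.5 and 2
sp = pooled_sd(men[0], men[2], women[0], women[2])
z = two_sample_statistic(*men, *women)
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(f"variance ratio = {variance_ratio:.2f}")
print(f"Sp = {sp:.1f}, Z = {z:.2f}, two-sided p = {p_two_sided:.4f}")
```

Keeping the pooled standard deviation in its own function makes it easy to confirm that Sp lies between the two sample standard deviations.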
A 95% confidence interval for the difference in mean systolic blood pressures is: 1.7 ± 1.26 or (0.44, 2.96). The confidence interval provides an assessment of the magnitude of the difference between means whereas the test of hypothesis and p-value provide an assessment of the statistical significance of the difference.

Above we performed a study to evaluate a new drug designed to lower total cholesterol. The study involved one sample of patients, each patient took the new drug for 6 weeks and had their cholesterol measured. As a means of evaluating the efficacy of the new drug, the mean total cholesterol following 6 weeks of treatment was compared to the NCHS-reported mean total cholesterol level in 2002 for all adults of 203. At the end of the example, we discussed the appropriateness of the fixed comparator as well as an alternative study design to evaluate the effect of the new drug involving two treatment groups, where one group receives the new drug and the other does not. Here, we revisit the example with a concurrent or parallel control group, which is very typical in randomized controlled trials or clinical trials (refer to the EP713 module on Clinical Trials).

A new drug is proposed to lower total cholesterol. A randomized controlled trial is designed to evaluate the efficacy of the medication in lowering cholesterol. Thirty participants are enrolled in the trial and are randomly assigned to receive either the new drug or a placebo. The participants do not know which treatment they are assigned. Each participant is asked to take the assigned treatment for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows.

Treatment    Sample Size    Mean     Standard Deviation
New Drug     15             195.9    28.7
Placebo      15             227.4    30.3

Is there statistical evidence of a reduction in mean total cholesterol in patients taking the new drug for 6 weeks as compared to participants taking placebo? We will run the test using the five-step approach.
H 0 : μ 1 = μ 2 H 1 : μ 1 < μ 2 α=0.05

Because both samples are small (< 30), we use the t test statistic. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The ratio of the sample variances, s 1 ²/s 2 ² = 28.7²/30.3² = 0.90, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is:

$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

This is a lower-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in the t Table (in More Resources). In order to determine the critical value of t we need degrees of freedom, df, defined as df = n 1 + n 2 - 2 = 15 + 15 - 2 = 28. The critical value for a lower tailed test with df=28 and α=0.05 is -1.701 and the decision rule is: Reject H 0 if t < -1.701.

We first compute the pooled estimate of the common standard deviation:

$S_p = \sqrt{\frac{(15 - 1)(28.7)^2 + (15 - 1)(30.3)^2}{15 + 15 - 2}} = \sqrt{870.89} = 29.5$

Now the test statistic,

$t = \frac{195.9 - 227.4}{29.5\sqrt{\frac{1}{15} + \frac{1}{15}}} = \frac{-31.5}{10.77} = -2.92$

We reject H 0 because -2.92 < -1.701. We have statistically significant evidence at α=0.05 to show that the mean total cholesterol level is lower in patients taking the new drug for 6 weeks as compared to patients taking placebo, p < 0.005.

The clinical trial in this example finds a statistically significant reduction in total cholesterol, whereas in the previous example where we had a historical control (as opposed to a parallel control group) we did not demonstrate efficacy of the new drug. Notice that the mean total cholesterol level in patients taking placebo is 227.4, which is very different from the mean cholesterol reported among all Americans in 2002 of 203 and used as the comparator in the prior example. The historical control value may not have been the most appropriate comparator as cholesterol levels have been increasing over time. In the next section, we present another design that can be used to assess the efficacy of the new drug.
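The five steps of the drug versus placebo comparison can be sketched in a few lines. This is a minimal illustration with hypothetical names; the critical value of 1.701 for df = 28 is the t-table value quoted in the text and is passed in by hand, since the Python standard library has no t distribution.

```python
from math import sqrt


def pooled_t_test(n1, xbar1, s1, n2, xbar2, s2):
    """Return (t, df, variance_ratio) for H0: mu1 = mu2 with pooled variances."""
    variance_ratio = s1**2 / s2**2
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t = (xbar1 - xbar2) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, df, variance_ratio


# Group 1 = new drug, group 2 = placebo (summary statistics from the table).
t, df, ratio = pooled_t_test(15, 195.9, 28.7, 15, 227.4, 30.3)

T_CRITICAL = 1.701  # one-sided 5% critical value for df = 28, from the t table
equal_variance_ok = 0.5 <= ratio <= 2
reject_h0 = t < -T_CRITICAL  # lower-tailed decision rule

print(f"variance ratio = {ratio:.2f} (pooling reasonable: {equal_variance_ok})")
print(f"t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```

Returning the variance ratio alongside the statistic keeps the equal-variance check visible next to the result it justifies.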
Video  Comparison of Two Independent Samples With a Continuous Outcome (8:02) Tests with Matched Samples, Continuous OutcomeIn the previous section we compared two groups with respect to their mean scores on a continuous outcome. An alternative study design is to compare matched or paired samples. The two comparison groups are said to be dependent, and the data can arise from a single sample of participants where each participant is measured twice (possibly before and after an intervention) or from two samples that are matched on specific characteristics (e.g., siblings). When the samples are dependent, we focus on difference scores in each participant or between members of a pair and the test of hypothesis is based on the mean difference, μ d . The null hypothesis again reflects "no difference" and is stated as H 0 : μ d =0 . Note that there are some instances where it is of interest to test whether there is a difference of a particular magnitude (e.g., μ d =5) but in most instances the null hypothesis reflects no difference (i.e., μ d =0). The appropriate formula for the test of hypothesis depends on the sample size. The formulas are shown below and are identical to those we presented for estimating the mean of a single sample presented (e.g., when comparing against an external or historical control), except here we focus on difference scores. Test Statistics for Testing H 0 : μ d =0 A new drug is proposed to lower total cholesterol and a study is designed to evaluate the efficacy of the drug in lowering cholesterol. Fifteen patients agree to participate in the study and each is asked to take the new drug for 6 weeks. However, before starting the treatment, each patient's total cholesterol level is measured. The initial measurement is a pretreatment or baseline value. After taking the drug for 6 weeks, each patient's total cholesterol level is measured again and the data are shown below. 
The rightmost column contains difference scores for each patient, computed by subtracting the 6 week cholesterol level from the baseline level. The differences represent the reduction in total cholesterol over 4 weeks. (The differences could have been computed by subtracting the baseline total cholesterol level from the level measured at 6 weeks. The way in which the differences are computed does not affect the outcome of the analysis only the interpretation.)     1  215  205  10  2  190  156  34  3  230  190  40  4  220  180  40  5  214  201  13  6  240  227  13  7  210  197  13  8  193  173  20  9  210  204  6  10  230  217  13  11  180  142  38  12  260  262  2  13  210  207  3  14  190  184  6  15  200  193  7  Because the differences are computed by subtracting the cholesterols measured at 6 weeks from the baseline values, positive differences indicate reductions and negative differences indicate increases (e.g., participant 12 increases by 2 units over 6 weeks). The goal here is to test whether there is a statistically significant reduction in cholesterol. Because of the way in which we computed the differences, we want to look for an increase in the mean difference (i.e., a positive reduction). In order to conduct the test, we need to summarize the differences. In this sample, we have The calculations are shown below.    1  10  100  2  34  1156  3  40  1600  4  40  1600  5  13  169  6  13  169  7  13  169  8  20  400  9  6  36  10  13  169  11  38  1444  12  2  4  13  3  9  14  6  36  15  7  49     Is there statistical evidence of a reduction in mean total cholesterol in patients after using the new medication for 6 weeks? We will run the test using the fivestep approach. H 0 : μ d = 0 H 1 : μ d > 0 α=0.05 NOTE: If we had computed differences by subtracting the baseline level from the level measured at 6 weeks then negative differences would have reflected reductions and the research hypothesis would have been H 1 : μ d < 0.  Step 2 . 
Select the appropriate test statistic.

Because the sample size is small (n < 30), the appropriate test statistic is t = (X̄d - μd) / (s_d / √n).

Step 3. Set up decision rule.

This is an upper-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in the t Table at the right, with df = 15 - 1 = 14. The critical value for an upper-tailed test with df = 14 and α = 0.05 is 1.761 (2.145 is the two-tailed value for the same df and α), and the decision rule is: Reject H0 if t > 1.761.

Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2:

t = (16.9 - 0) / (14.2 / √15) = 4.61

Step 5. Conclusion.

We reject H0 because 4.61 > 1.761. We have statistically significant evidence at α = 0.05 to show that there is a reduction in cholesterol levels over 6 weeks.

Here we illustrate the use of a matched design to test the efficacy of a new drug to lower total cholesterol. We also considered a parallel design (randomized clinical trial) and a study using a historical comparator. It is extremely important to design studies that are best suited to detect a meaningful difference when one exists. There are often several alternatives, and investigators work with biostatisticians to determine the best design for each application. It is worth noting that the matched design used here can be problematic in that observed differences may only reflect a "placebo" effect. All participants took the assigned medication, but is the observed reduction attributable to the medication or a result of participation in a study?

Tests with Two Independent Samples, Dichotomous Outcome

There are several approaches that can be used to test hypotheses concerning two independent proportions. Here we present one approach; the chi-square test of independence is an alternative, equivalent, and perhaps more popular approach to the same analysis. Hypothesis testing with the chi-square test is addressed in the third module in this series: BS704_HypothesisTestingChiSquare.
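Looking back at the matched-sample cholesterol example, the summary statistics and the test statistic can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the data are the 15 baseline and 6-week values from the table in that example.

```python
import math
import statistics

# Baseline and 6-week total cholesterol for the 15 participants.
baseline = [215, 190, 230, 220, 214, 240, 210, 193, 210, 230, 180, 260, 210, 190, 200]
week6 = [205, 156, 190, 180, 201, 227, 197, 173, 204, 217, 142, 262, 207, 184, 193]

# Difference = baseline - 6 weeks, so positive values are reductions.
diffs = [b - w for b, w in zip(baseline, week6)]

n = len(diffs)
mean_d = statistics.mean(diffs)    # mean of the difference scores
sd_d = statistics.stdev(diffs)     # sample SD (n - 1 in the denominator)
se_d = sd_d / math.sqrt(n)         # standard error of the mean difference
t_stat = (mean_d - 0) / se_d       # null value of the mean difference is 0

print(f"n = {n}, mean difference = {mean_d:.2f}, SD = {sd_d:.2f}")
print(f"t = {t_stat:.2f} with df = {n - 1}")
```

Run on the unrounded data, this gives a mean difference of about 16.9, a standard deviation of about 14.2, and t of about 4.63. The 4.61 reported in the example comes from substituting the rounded summaries 16.9 and 14.2, and the conclusion is the same either way.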
In tests of hypothesis comparing proportions between two independent groups, one test is performed and results can be interpreted to apply to a risk difference, relative risk or odds ratio. As a reminder, the risk difference is computed by taking the difference in proportions between comparison groups, the risk ratio is computed by taking the ratio of proportions, and the odds ratio is computed by taking the ratio of the odds of success in the comparison groups. Because the null values for the risk difference, the risk ratio and the odds ratio are different, the hypotheses in tests of hypothesis look slightly different depending on which measure is used. When performing tests of hypothesis for the risk difference, relative risk or odds ratio, the convention is to label the exposed or treated group 1 and the unexposed or control group 2.

For example, suppose a study is designed to assess whether there is a significant difference in proportions in two independent comparison groups. The test of interest is as follows: H0: p1 = p2 versus H1: p1 ≠ p2. The following are the hypotheses for testing for a difference in proportions using the risk difference, the risk ratio and the odds ratio. First, the hypotheses above are equivalent to the following:
 For the risk difference, H0: p1 - p2 = 0 versus H1: p1 - p2 ≠ 0, which are, by definition, equal to H0: RD = 0 versus H1: RD ≠ 0.
 If an investigator wants to focus on the risk ratio, the equivalent hypotheses are H0: RR = 1 versus H1: RR ≠ 1.
 If the investigator wants to focus on the odds ratio, the equivalent hypotheses are H0: OR = 1 versus H1: OR ≠ 1.
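The equivalence of the three null values listed above can be seen numerically. The sketch below computes the risk difference, risk ratio and odds ratio from two proportions; the function name and the proportions passed to it are hypothetical, chosen only for illustration.

```python
def risk_measures(p1, p2):
    """Return (risk difference, risk ratio, odds ratio) for group 1 vs group 2."""
    rd = p1 - p2                                     # difference in proportions
    rr = p1 / p2                                     # ratio of proportions
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))   # ratio of odds
    return rd, rr, odds_ratio

# Hypothetical proportions, not taken from any study in this module.
equal = risk_measures(0.10, 0.10)    # the null situation, p1 = p2
unequal = risk_measures(0.20, 0.10)  # group 1 has the larger proportion

print("p1 = p2: RD = %.3f, RR = %.3f, OR = %.3f" % equal)
print("p1 > p2: RD = %.3f, RR = %.3f, OR = %.3f" % unequal)
```

When the two proportions are equal, the risk difference is 0 while the risk ratio and odds ratio are both 1. When p1 exceeds p2, all three measures move above their null values together, which is why a single test covers all three.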
Suppose a test is performed to test H0: RD = 0 versus H1: RD ≠ 0 and the test rejects H0 at α = 0.05. Based on this test we can conclude that there is significant evidence, at α = 0.05, of a difference in proportions, significant evidence that the risk difference is not zero, and significant evidence that the risk ratio and odds ratio are not one.

The risk difference is analogous to the difference in means when the outcome is continuous. Here the parameter of interest is the difference in proportions in the population, RD = p1 - p2, and the null value for the risk difference is zero. In a test of hypothesis for the risk difference, the null hypothesis is always H0: RD = 0. This is equivalent to H0: RR = 1 and H0: OR = 1. In the research hypothesis, an investigator can hypothesize that the first proportion is larger than the second (H1: p1 > p2, which is equivalent to H1: RD > 0, H1: RR > 1 and H1: OR > 1), that the first proportion is smaller than the second (H1: p1 < p2, which is equivalent to H1: RD < 0, H1: RR < 1 and H1: OR < 1), or that the proportions are different (H1: p1 ≠ p2, which is equivalent to H1: RD ≠ 0, H1: RR ≠ 1 and H1: OR ≠ 1). The three different alternatives represent upper-, lower- and two-tailed tests, respectively.

The formula for the test of hypothesis for the difference in proportions is given below.

Test Statistic for Testing H0: p1 = p2

 Z = (p̂1 - p̂2) / √( p̂(1 - p̂)(1/n1 + 1/n2) )

where p̂1 and p̂2 are the sample proportions in groups 1 and 2, and p̂ is the overall proportion of successes in the two samples combined.

The formula above is appropriate for large samples, defined as at least 5 successes (np̂ ≥ 5) and at least 5 failures (n(1 - p̂) ≥ 5) in each of the two samples. If there are fewer than 5 successes or failures in either comparison group, then alternative procedures, called exact methods, must be used to estimate the difference in population proportions.

The following table summarizes data from n = 3,799 participants who attended the fifth examination of the Offspring in the Framingham Heart Study.
The outcome of interest is prevalent CVD and we want to test whether the prevalence of CVD is significantly higher in smokers as compared to non-smokers.

| | Free of CVD | History of CVD | Total |
|---|---|---|---|
| Non-Smoker | 2,757 | 298 | 3,055 |
| Current Smoker | 663 | 81 | 744 |
| Total | 3,420 | 379 | 3,799 |

The prevalence of CVD (or proportion of participants with prevalent CVD) among non-smokers is 298/3,055 = 0.0975 and the prevalence of CVD among current smokers is 81/744 = 0.1089. Here smoking status defines the comparison groups, and we will call the current smokers group 1 (exposed) and the non-smokers group 2 (unexposed). The test of hypothesis is conducted below using the five-step approach.

Step 1. Set up hypotheses and determine level of significance.

H0: p1 = p2
H1: p1 ≠ p2
α = 0.05

Step 2. Select the appropriate test statistic.
We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group. In this example, we have more than enough successes (cases of prevalent CVD) and failures (persons free of CVD) in each comparison group. The sample size is more than adequate, so the following formula can be used:

Z = (p̂1 - p̂2) / √( p̂(1 - p̂)(1/n1 + 1/n2) )

Step 3. Set up decision rule.

Reject H0 if Z < -1.960 or if Z > 1.960.

Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2. We first compute the overall proportion of successes:

p̂ = (81 + 298) / (744 + 3,055) = 379 / 3,799 = 0.0998

We now substitute to compute the test statistic:

Z = (0.1089 - 0.0975) / √( 0.0998(1 - 0.0998)(1/744 + 1/3,055) ) = 0.0114 / 0.0123 = 0.927

Step 5. Conclusion.

We do not reject H0 because -1.960 < 0.927 < 1.960. We do not have statistically significant evidence at α = 0.05 to show that there is a difference in prevalent CVD between smokers and non-smokers.

A 95% confidence interval for the difference in prevalent CVD (or risk difference) between smokers and non-smokers is 0.0114 ± 0.0247, or between -0.0133 and 0.0361. Because the 95% confidence interval for the risk difference includes zero, we again conclude that there is no statistically significant difference in prevalent CVD between smokers and non-smokers.

Smoking has been shown over and over to be a risk factor for cardiovascular disease. What might explain the fact that we did not observe a statistically significant difference using data from the Framingham Heart Study? HINT: Here we consider prevalent CVD; would the results have been different if we considered incident CVD?

A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever designed to reduce pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever currently in use (called the standard of care). A total of 100 patients undergoing joint replacement surgery agreed to participate in the trial.
Patients were randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery and were blind to the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain on a scale of 0-10, with higher scores indicative of more pain. Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction). The following data were observed in the trial.

| Treatment Group | n | Number with Reduction of 3+ Points | Proportion with Reduction of 3+ Points |
|---|---|---|---|
| New Pain Reliever | 50 | 23 | 0.46 |
| Standard Pain Reliever | 50 | 11 | 0.22 |

We now test whether there is a statistically significant difference in the proportions of patients reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) using the five-step approach.

Step 1. Set up hypotheses and determine level of significance.

H0: p1 = p2
H1: p1 ≠ p2
α = 0.05

Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2.

Step 2. Select the appropriate test statistic.

We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group, i.e., min(n1p̂1, n1(1 - p̂1), n2p̂2, n2(1 - p̂2)) ≥ 5. In this example, we have min(50(0.46), 50(1 - 0.46), 50(0.22), 50(1 - 0.22)) = min(23, 27, 11, 39) = 11. The sample size is adequate, so the following formula can be used:

Z = (p̂1 - p̂2) / √( p̂(1 - p̂)(1/n1 + 1/n2) )

Step 3. Set up decision rule.

Reject H0 if Z < -1.960 or if Z > 1.960.

Step 4. Compute the test statistic.

The overall proportion of successes is p̂ = (23 + 11) / (50 + 50) = 0.34, so

Z = (0.46 - 0.22) / √( 0.34(1 - 0.34)(1/50 + 1/50) ) = 0.24 / 0.095 = 2.526

Step 5. Conclusion.

We reject H0 because 2.526 > 1.960. We have statistically significant evidence at α = 0.05 to show that there is a difference in the proportions of patients on the new pain reliever reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) as compared to patients on the standard pain reliever.

A 95% confidence interval for the difference in proportions of patients on the new pain reliever reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) as compared to patients on the standard pain reliever is 0.24 ± 0.18, or between 0.06 and 0.42.
Because the 95% confidence interval does not include zero, we conclude that there is a statistically significant difference in proportions, which is consistent with the test of hypothesis result.

Again, the procedures discussed here apply to applications where there are two independent comparison groups and a dichotomous outcome. There are other applications in which it is of interest to compare a dichotomous outcome in matched or paired samples. For example, in a clinical trial we might wish to test the effectiveness of a new antibiotic eye drop for the treatment of bacterial conjunctivitis. Participants use the new antibiotic eye drop in one eye and a comparator (placebo or active control treatment) in the other. The success of the treatment (yes/no) is recorded for each participant for each eye. Because the two assessments (success or failure) are paired, we cannot use the procedures discussed here. The appropriate test is called McNemar's test (sometimes called McNemar's test for dependent proportions).

Here we presented hypothesis testing techniques for means and proportions in one- and two-sample situations. Tests of hypothesis involve several steps, including specifying the null and alternative or research hypothesis, selecting and computing an appropriate test statistic, setting up a decision rule and drawing a conclusion. There are many details to consider in hypothesis testing. The first is to determine the appropriate test. We discussed Z and t tests here for different applications. The appropriate test depends on the distribution of the outcome variable (continuous or dichotomous), the number of comparison groups (one, two) and whether the comparison groups are independent or dependent. The following table summarizes the different tests of hypothesis discussed here.
 Continuous Outcome, One Sample: H0: μ = μ0
 Continuous Outcome, Two Independent Samples: H0: μ1 = μ2
 Continuous Outcome, Two Matched Samples: H0: μd = 0
 Dichotomous Outcome, One Sample: H0: p = p0
 Dichotomous Outcome, Two Independent Samples: H0: p1 = p2, RD=0, RR=1, OR=1
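As a worked illustration of the last row in the summary above (two independent samples, dichotomous outcome), the sketch below reruns both examples from this module: the Framingham smoking and CVD data, and the pain reliever trial. It uses only the Python standard library. The function name and the returned keys are hypothetical. The confidence interval is computed with the unpooled standard error, which is an assumption on our part, although it does reproduce the intervals quoted in the examples.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2, conf=0.95):
    """Two-sided Z test of H0: p1 = p2, plus a CI for the risk difference."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # overall proportion of successes
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Assumption: the interval uses the unpooled standard error.
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_crit * se_unpooled
    return {"rd": p1 - p2, "z": z, "p_value": p_value,
            "ci": (p1 - p2 - margin, p1 - p2 + margin)}

# Group 1 = current smokers, group 2 = non-smokers (counts with prevalent CVD).
fram = two_proportion_z(81, 744, 298, 3055)
# Group 1 = new pain reliever, group 2 = standard pain reliever.
pain = two_proportion_z(23, 50, 11, 50)

for name, res in (("Framingham", fram), ("Pain trial", pain)):
    lo, hi = res["ci"]
    print(f"{name}: Z = {res['z']:.3f}, p = {res['p_value']:.4f}, "
          f"95% CI for RD = ({lo:.4f}, {hi:.4f})")
```

Because the code carries full precision, the Z statistics differ slightly in the third decimal place from the hand calculations, which round intermediate values. The decisions are unchanged: the Framingham interval includes zero and the pain trial interval does not.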
Once the type of test is determined, the details of the test must be specified. Specifically, the null and alternative hypotheses must be clearly stated. The null hypothesis always reflects the "no change" or "no difference" situation. The alternative or research hypothesis reflects the investigator's belief. The investigator might hypothesize that a parameter (e.g., a mean, proportion, difference in means or proportions) will increase, will decrease or will be different under specific conditions (sometimes the conditions are different experimental conditions and other times the conditions are simply different groups of participants). Once the hypotheses are specified, data are collected and summarized. The appropriate test is then conducted according to the five-step approach. If the test leads to rejection of the null hypothesis, an approximate p-value is computed to summarize the significance of the findings. When tests of hypothesis are conducted using statistical computing packages, exact p-values are computed. Because the statistical tables in this textbook are limited, we can only approximate p-values. If the test fails to reject the null hypothesis, then a weaker concluding statement is made for the following reason.

In hypothesis testing, there are two types of errors that can be committed. A Type I error occurs when a test incorrectly rejects the null hypothesis. This is referred to as a false positive result, and the probability that this occurs is equal to the level of significance, α. The investigator chooses the level of significance in Step 1, and purposely chooses a small value such as α = 0.05 to control the probability of committing a Type I error. A Type II error occurs when a test fails to reject the null hypothesis when in fact it is false. The probability that this occurs is equal to β.
Unfortunately, the investigator cannot specify β at the outset because it depends on several factors, including the sample size (smaller samples have higher β), the level of significance (β decreases as α increases), and the difference in the parameter under the null and alternative hypothesis.

We noted in several examples in this chapter the relationship between confidence intervals and tests of hypothesis. The approaches are different, yet related. It is possible to draw a conclusion about statistical significance by examining a confidence interval. For example, if a 95% confidence interval does not contain the null value (e.g., zero when analyzing a mean difference or risk difference, one when analyzing relative risks or odds ratios), then one can conclude that a two-sided test of hypothesis would reject the null at α = 0.05. It is important to note that the correspondence between a confidence interval and test of hypothesis relates to a two-sided test and that the confidence level corresponds to a specific level of significance (e.g., 95% to α = 0.05, 90% to α = 0.10 and so on). The exact significance of the test, the p-value, can only be determined using the hypothesis testing approach, and the p-value provides an assessment of the strength of the evidence and not an estimate of the effect.

Answers to Selected Problems

Dental services problem (bottom of page 5).

Step 1: Set up hypotheses and determine the level of significance.
α = 0.05

Step 2: Select the appropriate test statistic.
First, determine whether the sample size is adequate (at least 5 expected successes and 5 expected failures). The sample size is adequate, so we can use the one-sample test statistic for a proportion: Z = (p̂ - p0) / √( p0(1 - p0) / n ).

Step 3: Set up the decision rule.
Reject H0 if Z is less than or equal to -1.96 or if Z is greater than or equal to 1.96.

Step 4: Compute the test statistic.

Step 5: Conclusion.
We reject the null hypothesis because -6.15 < -1.96. Therefore there is a statistically significant difference in the proportion of children in Boston using dental services compared to the national proportion.
5.2 Writing Hypotheses

The first step in conducting a hypothesis test is to write the hypothesis statements that are going to be tested. For each test you will have a null hypothesis (\(H_0\)) and an alternative hypothesis (\(H_a\)). When writing hypotheses there are three things that we need to know: (1) the parameter that we are testing, (2) the direction of the test (non-directional, right-tailed or left-tailed), and (3) the value of the hypothesized parameter.
 At this point we can write hypotheses for a single mean (\(\mu\)), paired means (\(\mu_d\)), a single proportion (\(p\)), the difference between two independent means (\(\mu_1-\mu_2\)), the difference between two proportions (\(p_1-p_2\)), a simple linear regression slope (\(\beta\)), and a correlation (\(\rho\)).
 The research question will give us the information necessary to determine if the test is two-tailed (e.g., "different from," "not equal to"), right-tailed (e.g., "greater than," "more than"), or left-tailed (e.g., "less than," "fewer than").
 The research question will also give us the hypothesized parameter value. This is the number that goes in the hypothesis statements (i.e., \(\mu_0\) and \(p_0\)). For the difference between two groups, regression, and correlation, this value is typically 0.
Hypotheses are always written in terms of population parameters (e.g., \(p\) and \(\mu\)). The tables below display all of the possible hypotheses for the parameters that we have learned thus far. Note that the null hypothesis always includes the equality (i.e., =).

One Group Mean

| Research Question | Is the population mean different from \(\mu_0\)? | Is the population mean greater than \(\mu_0\)? | Is the population mean less than \(\mu_0\)? |
|---|---|---|---|
| Null Hypothesis, \(H_0\) | \(\mu=\mu_0\) | \(\mu=\mu_0\) | \(\mu=\mu_0\) |
| Alternative Hypothesis, \(H_a\) | \(\mu\neq \mu_0\) | \(\mu> \mu_0\) | \(\mu<\mu_0\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

Paired Means

| Research Question | Is there a difference in the population? | Is there a mean increase in the population? | Is there a mean decrease in the population? |
|---|---|---|---|
| Null Hypothesis, \(H_0\) | \(\mu_d=0\) | \(\mu_d=0\) | \(\mu_d=0\) |
| Alternative Hypothesis, \(H_a\) | \(\mu_d \neq 0\) | \(\mu_d> 0\) | \(\mu_d<0\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

One Group Proportion

| Research Question | Is the population proportion different from \(p_0\)? | Is the population proportion greater than \(p_0\)? | Is the population proportion less than \(p_0\)? |
|---|---|---|---|
| Null Hypothesis, \(H_0\) | \(p=p_0\) | \(p=p_0\) | \(p=p_0\) |
| Alternative Hypothesis, \(H_a\) | \(p\neq p_0\) | \(p> p_0\) | \(p< p_0\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

Difference between Two Independent Means

| Research Question | Are the population means different? | Is the population mean in group 1 greater than the population mean in group 2? | Is the population mean in group 1 less than the population mean in group 2? |
|---|---|---|---|
| Null Hypothesis, \(H_0\) | \(\mu_1=\mu_2\) | \(\mu_1=\mu_2\) | \(\mu_1=\mu_2\) |
| Alternative Hypothesis, \(H_a\) | \(\mu_1 \ne \mu_2\) | \(\mu_1 \gt \mu_2\) | \(\mu_1 \lt \mu_2\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

Difference between Two Proportions

| Research Question | Are the population proportions different? | Is the population proportion in group 1 greater than the population proportion in group 2? | Is the population proportion in group 1 less than the population proportion in group 2? |
|---|---|---|---|
| Null Hypothesis, \(H_0\) | \(p_1 = p_2\) | \(p_1 = p_2\) | \(p_1 = p_2\) |
| Alternative Hypothesis, \(H_a\) | \(p_1 \ne p_2\) | \(p_1 \gt p_2\) | \(p_1 \lt p_2\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

Simple Linear Regression: Slope

| Research Question | Is the slope in the population different from 0? | Is the slope in the population positive? | Is the slope in the population negative? |
|---|---|---|---|
| Null Hypothesis, \(H_0\) | \(\beta=0\) | \(\beta=0\) | \(\beta=0\) |
| Alternative Hypothesis, \(H_a\) | \(\beta\neq 0\) | \(\beta> 0\) | \(\beta< 0\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

Correlation (Pearson's r)

| Research Question | Is the correlation in the population different from 0? | Is the correlation in the population positive? | Is the correlation in the population negative? |
|---|---|---|---|
| Null Hypothesis, \(H_0\) | \(\rho=0\) | \(\rho=0\) | \(\rho=0\) |
| Alternative Hypothesis, \(H_a\) | \(\rho \neq 0\) | \(\rho > 0\) | \(\rho< 0\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

Hypothesis Testing Calculator

The first step in hypothesis testing is to calculate the test statistic. The formula for the test statistic depends on whether the population standard deviation (σ) is known or unknown. If σ is known, our hypothesis test is known as a z test and we use the z distribution. If σ is unknown, our hypothesis test is known as a t test and we use the t distribution. Use of the t distribution relies on the degrees of freedom, which is equal to the sample size minus one. Furthermore, if the population standard deviation σ is unknown, the sample standard deviation s is used instead. To switch from σ known to σ unknown, click on $\boxed{\sigma}$ and select $\boxed{s}$ in the Hypothesis Testing Calculator.

| | $\sigma$ Known | $\sigma$ Unknown |
|---|---|---|
| Test Statistic | $ z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} $ | $ t = \dfrac{\bar{x}-\mu_0}{s/\sqrt{n}} $ |

Next, the test statistic is used to conduct the test using either the p-value approach or critical value approach. The particular steps taken in each approach largely depend on the form of the hypothesis test: lower tail, upper tail or two-tailed. The form can easily be identified by looking at the alternative hypothesis ($H_a$). A less-than sign in the alternative hypothesis indicates a lower tail test, a greater-than sign indicates an upper tail test, and a not-equal sign indicates a two-tailed test.
To switch from a lower tail test to an upper tail or two-tailed test, click on $\boxed{\geq}$ and select $\boxed{\leq}$ or $\boxed{=}$, respectively.

| Lower Tail Test | Upper Tail Test | Two-Tailed Test |
|---|---|---|
| $H_0 \colon \mu \geq \mu_0$ | $H_0 \colon \mu \leq \mu_0$ | $H_0 \colon \mu = \mu_0$ |
| $H_a \colon \mu < \mu_0$ | $H_a \colon \mu > \mu_0$ | $H_a \colon \mu \neq \mu_0$ |

In the p-value approach, the test statistic is used to calculate a p-value. If the test is a lower tail test, the p-value is the probability of getting a value for the test statistic at least as small as the value from the sample. If the test is an upper tail test, the p-value is the probability of getting a value for the test statistic at least as large as the value from the sample. In a two-tailed test, the p-value is the probability of getting a value for the test statistic at least as unlikely as the value from the sample.

To test the hypothesis in the p-value approach, compare the p-value to the level of significance. If the p-value is less than or equal to the level of significance, reject the null hypothesis. If the p-value is greater than the level of significance, do not reject the null hypothesis. This method remains unchanged regardless of whether it's a lower tail, upper tail or two-tailed test. To change the level of significance, click on $\boxed{.05}$. Note that if the test statistic is given, you can calculate the p-value from the test statistic by clicking on the switch symbol twice.

In the critical value approach, the level of significance ($\alpha$) is used to calculate the critical value. In a lower tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the lower tail of the sampling distribution of the test statistic. In an upper tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the upper tail of the sampling distribution of the test statistic.
In a two-tailed test, the critical values are the values of the test statistic providing areas of $\alpha / 2$ in the lower and upper tail of the sampling distribution of the test statistic.

To test the hypothesis in the critical value approach, compare the critical value to the test statistic. Unlike the p-value approach, the method we use to decide whether to reject the null hypothesis depends on the form of the hypothesis test. In a lower tail test, if the test statistic is less than or equal to the critical value, reject the null hypothesis. In an upper tail test, if the test statistic is greater than or equal to the critical value, reject the null hypothesis. In a two-tailed test, if the test statistic is less than or equal to the lower critical value or greater than or equal to the upper critical value, reject the null hypothesis. In the table below, $z_\alpha$ and $t_\alpha$ denote the upper-tail critical values, so the lower-tail critical values are $-z_\alpha$ and $-t_\alpha$.

| Lower Tail Test | Upper Tail Test | Two-Tailed Test |
|---|---|---|
| If $z \leq -z_\alpha$, reject $H_0$. | If $z \geq z_\alpha$, reject $H_0$. | If $z \leq -z_{\alpha/2}$ or $z \geq z_{\alpha/2}$, reject $H_0$. |
| If $t \leq -t_\alpha$, reject $H_0$. | If $t \geq t_\alpha$, reject $H_0$. | If $t \leq -t_{\alpha/2}$ or $t \geq t_{\alpha/2}$, reject $H_0$. |

When conducting a hypothesis test, there is always a chance that you come to the wrong conclusion. There are two types of errors you can make: Type I Error and Type II Error. A Type I Error is committed if you reject the null hypothesis when the null hypothesis is true. Ideally, we'd like to accept the null hypothesis when the null hypothesis is true. A Type II Error is committed if you accept the null hypothesis when the alternative hypothesis is true. Ideally, we'd like to reject the null hypothesis when the alternative hypothesis is true.

| Conclusion | Condition: $H_0$ True | Condition: $H_a$ True |
|---|---|---|
| Accept $H_0$ | Correct | Type II Error |
| Reject $H_0$ | Type I Error | Correct |

Hypothesis testing is closely related to the statistical area of confidence intervals.
If the hypothesized value of the population mean is outside of the confidence interval, we can reject the null hypothesis. Confidence intervals can be found using the Confidence Interval Calculator. The calculator on this page does hypothesis tests for one population mean. Sometimes we're interested in hypothesis tests about two population means. These can be solved using the Two Population Calculator. The probability of a Type II Error can be calculated by clicking on the link at the bottom of the page.

Module 10: Inference for Means

Hypothesis Test for a Population Mean (1 of 5)

Learning outcomes
 Recognize when to use a hypothesis test or a confidence interval to draw a conclusion about a population mean.
 Under appropriate conditions, conduct a hypothesis test about a population mean. State a conclusion in context.
Introduction

In Inference for Means, our focus is on inference when the variable is quantitative, so the parameters and statistics are means. In “Estimating a Population Mean,” we learned how to use a sample mean to calculate a confidence interval. The confidence interval estimates a population mean. In “Hypothesis Test for a Population Mean,” we learn to use a sample mean to test a hypothesis about a population mean.

We did hypothesis tests in earlier modules. In Inference for One Proportion, each claim involved a single population proportion. In Inference for Two Proportions, the claim was a statement about a treatment effect or a difference in population proportions. In “Hypothesis Test for a Population Mean,” the claims are statements about a population mean. But we will see that the steps and the logic of the hypothesis test are the same. Before we get into the details, let’s practice identifying research questions and studies that involve a population mean.

Cell Phone Data

Cell phones and cell phone plans can be very expensive, so consumers must think carefully when choosing a cell phone and service. This decision is as much about choosing the right cellular company as it is about choosing the right phone. Many people use the data/Internet capabilities of a phone as much as, if not more than, they use voice capability. The data service of a cell company is therefore an important factor in this decision. In the following example, a student named Melanie from Los Angeles applies what she learned in her statistics class to help her make a decision about buying a data plan for her smartphone.

Melanie read an advertisement from the Cell Phone Giants (CPG, for short, and yes, we’re using a fictitious company name) that she thinks is too good to be true. The CPG ad states that customers in Los Angeles get average data download speeds of 4 Mbps. With this speed, the ad claims, it takes, on average, only 12 seconds to download a typical 3-minute song from iTunes.
Only 12 seconds on average to download a 3-minute song from iTunes! Melanie has her doubts about this claim, so she gathers data to test it. She asks a friend who uses the CPG plan to download a song, and it takes 13 seconds to download a 3-minute song using the CPG network. Melanie decides to gather more evidence. She uses her friend’s phone and times the download of the same 3-minute song from various locations in Los Angeles. She gets a mean download time of 13.5 seconds for her sample of downloads.

What can Melanie conclude? Her sample has a mean download time that is greater than 12 seconds. Isn’t this evidence that the CPG claim is wrong? Why is a hypothesis test necessary? Isn’t the conclusion clear? Let’s review the reason Melanie needs to do a hypothesis test before she can reach a conclusion.

Why should Melanie do a hypothesis test?

Melanie’s data (with a mean of 13.5 seconds) suggest that the average download time overall is greater than the 12 seconds claimed by the manufacturer. But wait. We know that samples will vary. If the CPG claim is correct, we don’t expect all samples to have a mean download time exactly equal to 12 seconds. There will be variability in the sample means. But if the overall average download time is 12 seconds, how much variability in sample means do we expect to see? We need to determine if the difference Melanie observed can be explained by chance.

We have to judge Melanie’s data against random samples that come from a population with a mean of 12. For this reason, we must do a simulation or use a mathematical model to examine the sampling distribution of sample means. Based on the sampling distribution, we ask: Is it likely that the samples will have mean download times that are greater than 13.5 seconds if the overall mean is 12 seconds? This probability (the P-value) determines whether Melanie’s data provides convincing evidence against the CPG claim.

Now let’s do the hypothesis test.

Step 1: Determine the hypotheses.
As always, hypotheses come from the research question. The null hypothesis is a hypothesis that the population mean equals a specific value. The alternative hypothesis reflects our claim. The alternative hypothesis says the population mean is “greater than” or “less than” or “not equal to” the value we assume is true in the null hypothesis. Melanie’s hypotheses:
 H0: It takes 12 seconds on average to download Melanie’s song from iTunes with the CPG network in Los Angeles.
 Ha: It takes more than 12 seconds on average to download Melanie’s song from iTunes using the CPG network in Los Angeles.
We can write the hypotheses in terms of µ. When we do so, we should always define µ. Here μ = the average number of seconds it takes to download Melanie’s song on the CPG network in Los Angeles.
Step 2: Collect the data.
To conduct a hypothesis test, Melanie knows she has to use a t-model of the sampling distribution. She thinks ahead to the conditions required, which helps her collect a useful sample. Recall the conditions for use of a t-model.
 There is no reason to think the download times are normally distributed (they might be, but this isn’t something Melanie could know for sure). So the sample has to be large (more than 30).
 The sample has to be random. Melanie decides to use one phone but randomly selects days, times, and locations in Los Angeles.
Melanie collects a random sample of 45 downloads by using her friend’s phone to download her song from iTunes according to the randomly selected days, times, and locations. Melanie’s sample of 45 downloads has an average download time of 13.5 seconds. The standard deviation for the sample is 3.2 seconds. Now Melanie needs to determine how unlikely this data is if CPG’s claim is actually true.
Step 3: Assess the evidence.
Assuming the average download time for Melanie’s song is really 12 seconds, what is the probability that 45 random downloads of this song will have a mean of 13.5 seconds or more? This is a question about sampling variability. Melanie must determine the standard error. She knows the standard error of random sample means is [latex]\sigma/\sqrt{n}[/latex]. Since she has no way of knowing the population standard deviation, σ, Melanie uses the sample standard deviation, s = 3.2, as an approximation. Therefore, Melanie approximates the standard error of all sample means (n = 45) to be
[latex]s/\sqrt{n} = 3.2/\sqrt{45} \approx 0.48[/latex]
Now she can assess how far away her sample is from the claimed mean in terms of standard errors. That is, she can compute the t-score of her sample mean.
[latex]T = \frac{\text{statistic} - \text{parameter}}{\text{standard error}} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{13.5 - 12}{0.48} \approx 3.14[/latex]
(The value 3.14 comes from the unrounded standard error, about 0.477. Dividing by the rounded value 0.48 gives about 3.13.)
The sample mean for Melanie’s random sample is approximately 3.14 standard errors above the hypothesized mean of 12. We know from previous experience that a sample mean this far above µ is very unlikely. With a t-score this large, the P-value is very small. We use a simulation of the t-model for 44 degrees of freedom (df = n – 1 = 45 – 1) to verify this. We want the probability that the sample mean is greater than 13.5. This corresponds to the probability that T is greater than 3.14. The P-value is 0.0015.
Step 4: State a conclusion.
Here the logic is the same as for other hypothesis tests. We use the P-value to make a decision.
The P-value helps us determine if the difference we see between the data and the hypothesized value of µ is statistically significant or due to chance. One of two outcomes can occur:
 One possibility is that results similar to the actual sample are extremely unlikely. This means the data does not fit with results from random samples selected from the population described by the null hypothesis. In this case, it is unlikely that the data came from this population. The probability as measured by the P-value is small, so we view this as strong evidence against the null hypothesis. We reject the null hypothesis in favor of the alternative hypothesis.
 The other possibility is that results similar to the actual sample are fairly likely (not unusual). This means the data fits with typical results from random samples selected from the population described by the null hypothesis. The probability as measured by the P-value is large. In this case, we do not have evidence against the null hypothesis, so we cannot reject it in favor of the alternative hypothesis.
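The first outcome can be illustrated with a small simulation. The sketch below is not the course's simulation tool; it assumes, purely for illustration, that download times are normally distributed with mean 12 and that the sample standard deviation of 3.2 seconds can stand in for the population standard deviation. The function name `fraction_at_least` and the number of repetitions are hypothetical choices.

```python
import random

def fraction_at_least(observed_mean, mu0, sigma, n, reps, seed=0):
    """Fraction of simulated sample means that are >= observed_mean.

    Assumption (not from the source): individual values are drawn from a
    normal population with mean mu0 and standard deviation sigma.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        if sample_mean >= observed_mean:
            hits += 1
    return hits / reps

# Melanie's setting: claimed mean 12, observed mean 13.5, n = 45.
# sigma = 3.2 is her sample standard deviation used as a stand-in.
frac = fraction_at_least(observed_mean=13.5, mu0=12.0, sigma=3.2, n=45, reps=20000)
print(f"Fraction of simulated sample means >= 13.5: {frac:.4f}")
```

Because this sketch treats 3.2 as a known population value, the simulated fraction will be close to, but not identical to, a P-value from the t-model. Either way, sample means of 13.5 or more should turn up only rarely when the true mean is 12.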
Melanie’s data is very unlikely if µ = 12. The probability is very small (P-value = 0.0015). This means we will rarely see sample means greater than 13.5 if µ = 12. So we reject the null hypothesis and accept the alternative hypothesis. In other words, this sample provides strong evidence that CPG has overstated the speed of its data download capability.
The following activities give you an opportunity to practice parts of the hypothesis testing process for a population mean. Later you will have the opportunity to practice the hypothesis test from start to finish. For the following scenarios, give the null and alternative hypotheses and state in words what µ represents in your hypotheses. A good definition of µ describes both the variable and the population.
In the previous example, Melanie did not state a significance level for her test. If she had, the logic would be the same as the logic we used for hypothesis tests in Modules 8 and 9. To come to a conclusion about H 0 , we compare the P-value to the significance level α.
 If P ≤ α, we reject H 0 . We conclude there is significant evidence in favor of H a .
 If P > α, we fail to reject H 0 . We conclude the sample does not provide significant evidence in favor of H a .
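Melanie's test and the decision rule above can be sketched in a few lines of code. This is a minimal illustration, not the course's simulation: the helper `t_upper_tail` is a hypothetical function that approximates the area to the right of T under the t-model by numerical integration (Simpson's rule), and the significance level α = 0.05 is an assumed value, since Melanie did not state one.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for a t-model with df degrees of freedom."""
    # Log of the normalizing constant of the t density.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule for the area between 0 and |t| (steps must be even).
    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

# Melanie's sample: n = 45, mean 13.5, standard deviation 3.2; claimed mean 12.
n, xbar, s, mu0 = 45, 13.5, 3.2, 12.0
se = s / math.sqrt(n)                      # estimated standard error
t_stat = (xbar - mu0) / se                 # t-score of the sample mean
p_value = t_upper_tail(t_stat, df=n - 1)   # H a is "greater than": right tail

alpha = 0.05  # assumed significance level, not stated in the example
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"SE = {se:.3f}, T = {t_stat:.2f}, P-value = {p_value:.4f}: {decision}")
```

Any α commonly used in practice (0.10, 0.05, 0.01) leads to the same decision here, because the P-value is far below all of them.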
Concepts in Statistics. Provided by : Open Learning Initiative. Located at : http://oli.cmu.edu . License : CC BY: Attribution
Concepts in Statistics Copyright © 2023 by CUNY School of Professional Studies is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
Module 10: Inference for Means
Hypothesis Test for a Difference in Two Population Means (1 of 2)
Learning Outcomes
 Under appropriate conditions, conduct a hypothesis test about a difference between two population means. State a conclusion in context.
Using the Hypothesis Test for a Difference in Two Population Means
The general steps of this hypothesis test are the same as always. As expected, the details of the conditions for use of the test and the test statistic are unique to this test (but similar in many ways to what we have seen before).
Step 1: Determine the hypotheses.
The hypotheses for a difference in two population means are similar to those for a difference in two population proportions. The null hypothesis, H 0 , is again a statement of “no effect” or “no difference.”
 H 0 : μ 1 – μ 2 = 0, which is the same as H 0 : μ 1 = μ 2
The alternative hypothesis, H a , can be any one of the following.  H a : μ 1 – μ 2 < 0, which is the same as H a : μ 1 < μ 2
 H a : μ 1 – μ 2 > 0, which is the same as H a : μ 1 > μ 2
 H a : μ 1 – μ 2 ≠ 0, which is the same as H a : μ 1 ≠ μ 2
Step 2: Collect the data.
As usual, how we collect the data determines whether we can use it in the inference procedure. We have our usual two requirements for data collection.
 Samples must be random to remove or minimize bias.
 Samples must be representative of the populations in question.
We use this hypothesis test when the data meets the following conditions.
 The two random samples are independent.
 The variable is normally distributed in both populations. If this is not known, samples of more than 30 will have a difference in sample means that can be modeled adequately by the t-distribution. As we discussed in “Hypothesis Test for a Population Mean,” t-procedures are robust even when the variable is not normally distributed in the population. If checking normality in the populations is impossible, then we look at the distribution in the samples. If a histogram or dotplot of the data does not show extreme skew or outliers, we take it as a sign that the variable is not heavily skewed in the populations, and we use the inference procedure. (Note: This is the same condition we used for the one-sample t-test in “Hypothesis Test for a Population Mean.”)
Step 3: Assess the evidence.
If the conditions are met, then we calculate the t-test statistic. The t-test statistic has a familiar form.
[latex]T = \frac{(\text{Observed difference in sample means}) - (\text{Hypothesized difference in population means})}{\text{Standard error}}[/latex]
[latex]T = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}[/latex]
Since the null hypothesis assumes there is no difference in the population means, the expression (μ 1 – μ 2 ) is always zero. As we learned in “Estimating a Population Mean,” the t-distribution depends on the degrees of freedom (df). In the one-sample and matched-pair cases df = n – 1. For the two-sample t-test, determining the correct df is based on a complicated formula that we do not cover in this course. We will either give the df or use technology to find the df. With the t-test statistic and the degrees of freedom, we can use the appropriate t-model to find the P-value, just as we did in “Hypothesis Test for a Population Mean.” We can even use the same simulation.
Step 4: State a conclusion.
To state a conclusion, we follow what we have done with other hypothesis tests. We compare our P-value to a stated level of significance.
 If the P-value ≤ α, we reject the null hypothesis in favor of the alternative hypothesis.
 If the P-value > α, we fail to reject the null hypothesis. We do not have enough evidence to support the alternative hypothesis.
As always, we state our conclusion in context, usually by referring to the alternative hypothesis.
“Context and Calories”
Does the company you keep impact what you eat? This example comes from an article titled “Impact of Group Settings and Gender on Meals Purchased by College Students” (Allen-O’Donnell, M., T. C. Nowak, K. A. Snyder, and M. D. Cottingham, Journal of Applied Social Psychology 49(9), 2011, onlinelibrary.wiley.com/doi/10.1111/j.1559-1816.2011.00804.x/full). In this study, researchers examined this issue in the context of gender-related theories in their field. For our purposes, we look at this research more narrowly.
Step 1: Stating the hypotheses.
In the article, the authors make the following hypothesis. “The attempt to appear feminine will be empirically demonstrated by the purchase of fewer calories by women in mixed-gender groups than by women in same-gender groups.” We translate this into a simpler and narrower research question: Do women purchase fewer calories when they eat with men compared to when they eat with women? Here the two populations are “women eating with women” (population 1) and “women eating with men” (population 2). The variable is the calories in the meal. We test the following hypotheses at the 5% level of significance. The null hypothesis is always H 0 : μ 1 – μ 2 = 0, which is the same as H 0 : μ 1 = μ 2 . The alternative hypothesis is H a : μ 1 – μ 2 > 0, which is the same as H a : μ 1 > μ 2 . Here μ 1 represents the mean number of calories ordered by women when they were eating with other women, and μ 2 represents the mean number of calories ordered by women when they were eating with men.
Note: It does not matter which population we label as 1 or 2, but once we decide, we have to stay consistent throughout the hypothesis test. Since we expect the number of calories to be greater for the women eating with other women, the difference is positive if “women eating with women” is population 1.
If you prefer to work with positive numbers, choose the group with the larger expected mean as population 1. This is a good general tip. Step 2: Collect Data. As usual, there are two major things to keep in mind when considering the collection of data.  Samples need to be representative of the population in question.
 Samples need to be random in order to remove or minimize bias.
Representative Samples?
The researchers state their hypothesis in terms of “women.” We did the same. But the researchers gathered data by watching people eat at the HUB Rock Café II on the campus of Indiana University of Pennsylvania during the Spring semester of 2006. Almost all of the women in the data set were white undergraduates between the ages of 18 and 24, so there are some definite limitations on the scope of this study. These limitations will affect our conclusion (and the specific definition of the population means in our hypotheses).
Random Samples?
The observations were collected on February 13, 2006, through February 22, 2006, between 11 a.m. and 7 p.m. We can see that the researchers included both lunch and dinner. They also made observations on all days of the week to ensure that weekly customer patterns did not confound their findings. The authors state that “since the time period for observations and the place where [they] observed students were limited, the sample was a convenience sample.” Despite these limitations, the researchers conducted inference procedures with the data, and the results were published in a reputable journal. We will also conduct inference with this data, and, as the authors did, we include a discussion of the limitations of the study with our conclusion.
Do the data meet the conditions for use of a t-test?
The researchers reported the following sample statistics.
 In a sample of 45 women dining with other women, the average number of calories ordered was 850, and the standard deviation was 252.
 In a sample of 27 women dining with men, the average number of calories ordered was 719, and the standard deviation was 322.
One of the samples has fewer than 30 women. We need to make sure the distribution of calories in this sample is not heavily skewed and has no outliers, but we do not have access to a spreadsheet of the actual data. Since the researchers conducted a t-test with this data, we will assume that the conditions are met. This includes the assumption that the samples are independent.
To compute the t-test statistic from the sample statistics reported above, make sure sample 1 corresponds to population 1. Here our population 1 is “women eating with other women.” So [latex]\bar{x}_1[/latex] = 850, s 1 = 252, n 1 = 45, and so on.
[latex]T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{850 - 719}{\sqrt{\frac{252^2}{45} + \frac{322^2}{27}}} \approx \frac{131}{72.47} \approx 1.81[/latex]
Using technology, we determined that the degrees of freedom are about 45 for this data. To find the P-value, we use our familiar simulation of the t-distribution. Since the alternative hypothesis is a “greater than” statement, we look for the area to the right of T = 1.81. The P-value is 0.0385.
Generic Conclusion
The hypotheses for this test are H 0 : μ 1 – μ 2 = 0 and H a : μ 1 – μ 2 > 0. Since the P-value is less than the significance level (0.0385 < 0.05), we reject H 0 and accept H a .
Conclusion in context
At Indiana University of Pennsylvania, the mean number of calories ordered by undergraduate women eating with other women is greater than the mean number of calories ordered by undergraduate women eating with men (P-value = 0.0385).
Comment about Conclusions
In the conclusion above, we did not generalize the findings to all women. Since the samples included only undergraduate women at one university, we included this information in our conclusion. But our conclusion is a cautious statement of the findings.
The authors see the results more broadly in the context of theories in the field of social psychology. In the context of these theories, they write, “Our findings support the assertion that meal size is a tool for influencing the impressions of others. For traditional-age, predominantly White college women, diminished meal size appears to be an attempt to assert femininity in groups that include men.” This viewpoint is echoed in the following summary of the study for the general public on National Public Radio (npr.org).
 Both men and women appear to choose larger portions when they eat with women, and both men and women choose smaller portions when they eat in the company of men, according to new research published in the Journal of Applied Social Psychology. The study, conducted among a sample of 127 college students, suggests that both men and women are influenced by unconscious scripts about how to behave in each other’s company. And these scripts change the way men and women eat when they eat together and when they eat apart.
Should we be concerned that the findings of this study are generalized in this way? Perhaps. But the authors of the article address this concern by including the following disclaimer with their findings: “While the results of our research are suggestive, they should be replicated with larger, representative samples. Studies should be done not only with primarily White, middle-class college students, but also with students who differ in terms of race/ethnicity, social class, age, sexual orientation, and so forth.” This is an example of good statistical practice. It is often very difficult to select truly random samples from the populations of interest. Researchers therefore discuss the limitations of their sampling design when they discuss their conclusions. In the following activities, you will have the opportunity to practice parts of the hypothesis test for a difference in two population means. On the next page, the activities focus on the entire process and also incorporate technology.
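As one illustration of what "technology" might do, the sketch below reworks the “Context and Calories” calculation from the reported summary statistics. The source does not say which df formula its technology uses; this sketch assumes the Welch-Satterthwaite approximation, which is a common choice for the two-sample t-test without pooled variances. The helper `t_upper_tail` is a hypothetical function that approximates the right-tail area of the t-model by Simpson's rule.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for a t-model with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule for the area between 0 and |t| (steps must be even).
    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

# Reported summary statistics (population 1 = women eating with women).
x1, s1, n1 = 850, 252, 45
x2, s2, n2 = 719, 322, 27

v1, v2 = s1**2 / n1, s2**2 / n2          # estimated variance of each sample mean
se = math.sqrt(v1 + v2)                  # standard error of the difference
t_stat = (x1 - x2) / se                  # hypothesized difference is 0 under H 0

# Assumed df formula (Welch-Satterthwaite); the course does not specify one.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

p_value = t_upper_tail(t_stat, df)       # H a is "greater than": right tail
print(f"SE = {se:.2f}, T = {t_stat:.2f}, df = {df:.1f}, P-value = {p_value:.4f}")
```

Under these assumptions the standard error, test statistic, and df agree with the values in the worked example (about 72.47, 1.81, and 45). The P-value may differ slightly in the later decimal places from the reported 0.0385, since the text rounds T and df before looking up the area.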
