## Mean Difference / Difference in Means (MD)


The mean difference, or difference in means, measures the absolute difference between the mean value in two different groups. In clinical trials, it gives you an idea of how much difference there is between the averages of the experimental group and control groups.

Note: Although many authors use the term mean difference, it makes more intuitive sense to say difference between means. That’s because you aren’t actually calculating any new means; you already have two or more means, and all you need to do is find the difference between them. In other words, you’re finding a difference between means, not a mean of differences.
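As a minimal sketch of the definition above, the snippet below computes a difference between two group means. The scores are invented for illustration, and `mean_difference` is a hypothetical helper name, not something from the original article.

```python
from statistics import mean

# Hypothetical outcome scores, invented for illustration only.
experimental = [12.0, 15.0, 11.0, 14.0, 13.0]
control = [10.0, 9.0, 12.0, 11.0, 8.0]


def mean_difference(group_a, group_b):
    """Return mean(group_a) - mean(group_b): a difference between two means."""
    return mean(group_a) - mean(group_b)


md = mean_difference(experimental, control)
print(f"Mean difference: {md:.2f}")
```

Note that each group's mean is computed first and the two means are then subtracted, which matches the "difference between means" reading in the note above.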

## Standardized Mean Difference

Sometimes you’ll want to compare means between groups but you can’t, because the groups were measured in different units. For example, studies measuring depression might use different depression rating scales. The standardized mean difference (SMD) is a measure of effect size; it puts results on a common scale so that they can be compared. For example, an SMD of 0.60 based on outcome A in one study is directly comparable to an SMD of 0.60 calculated on the same outcome A in a separate study (SMDs are typically rounded to two decimal places).

The general formula is:

SMD = Difference in mean outcome between groups / Standard deviation of outcome among participants

However, the formula differs slightly according to which SMD statistic you use. For example, Cohen’s d divides by a pooled standard deviation, while Hedges’ g divides by a pooled standard deviation weighted by sample size and applies a small-sample correction.
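The general formula can be sketched in code as follows. This is an illustration under stated assumptions: the data are invented, the pooled standard deviation uses the usual (n - 1) weighting, and the Hedges' g correction factor 1 - 3 / (4·df - 1) is a commonly used approximation rather than something specified in the text above.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores, invented for illustration only.
treatment = [12.0, 15.0, 11.0, 14.0, 13.0]
control = [10.0, 9.0, 12.0, 11.0, 8.0]


def pooled_sd(a, b):
    """Pooled standard deviation, weighting each sample variance by n - 1."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    return sqrt(pooled_var)


def cohens_d(a, b):
    """Difference in means divided by the pooled standard deviation."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def hedges_g(a, b):
    """Cohen's d multiplied by an approximate small-sample correction (assumed form)."""
    df = len(a) + len(b) - 2
    return cohens_d(a, b) * (1 - 3 / (4 * df - 1))


d = cohens_d(treatment, control)
g = hedges_g(treatment, control)
print(f"Cohen's d = {d:.2f}, Hedges' g = {g:.2f}")
```

With small samples the correction shrinks g noticeably below d; with large samples the two values are nearly identical.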

## Hypothesized Mean Difference

The hypothesized mean difference comes up in programs like Excel when you run certain tests (like a t-test). You’re telling the program what’s in your hypothesis statements, so you must know your null hypothesis. For example, let’s say you had the following hypothesis statements:

- Null Hypothesis: M1 – M2 = 10
- Alternative Hypothesis: M1 – M2 ≠ 10

You’ll put 10 in the hypothesized mean difference box, because that’s what your null hypothesis states. If you hypothesize there’s no difference, enter 0. Excel doesn’t allow negative values here, so if you suspect there’s a negative difference, switch your variables around (so you’ll actually be testing for a positive difference).

## Sampling distribution of the difference between means

The sampling distribution of the difference between means is the distribution of all possible differences between the means of two samples. The formula for the mean of the sampling distribution of the difference between means is:

\(\mu_{M_1 - M_2} = \mu_1 - \mu_2\)

For example, let’s say the mean score on a depression test for a group of 100 middle-aged men is 35 and for 100 middle-aged women it is 25. If you took a large number of samples from both these groups and calculated the mean differences, the mean of all of the differences between all sample means would be 35 – 25 = 10.
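A small simulation can make this concrete. The sketch below treats 35 and 25 as population means and assumes, purely for illustration, that scores are normally distributed with a standard deviation of 8 in both groups; the sample size, number of repetitions, and random seed are also arbitrary choices.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

MU_MEN, MU_WOMEN = 35, 25  # population means from the example
SIGMA = 8                  # assumed standard deviation (not given in the text)
N = 100                    # size of each sample
REPS = 2000                # number of repeated sample pairs


def sample_mean(mu):
    """Mean of one random sample of size N drawn from Normal(mu, SIGMA)."""
    return mean(random.gauss(mu, SIGMA) for _ in range(N))


# One difference between sample means per repetition.
differences = [sample_mean(MU_MEN) - sample_mean(MU_WOMEN) for _ in range(REPS)]
avg_diff = mean(differences)
print(f"Mean of the simulated differences: {avg_diff:.2f}")
```

Individual differences vary from one pair of samples to the next, but their average settles close to 35 – 25 = 10.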

## Testing for Differences Between Means

On its own, the mean difference doesn’t tell you a lot (other than giving you a number for the difference). The difference may be statistically significant, or it could just be due to random variation. To find out which, run a hypothesis test for differences between means.

To compare two independent means, run a two-sample t-test. The standard version of this test assumes that the variances of both populations are equal. If they are not, run Welch’s test for unequal variances instead.

For dependent samples (i.e. samples that are connected in some way) run a paired samples t-test.
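For dependent samples, the test works on the per-pair differences rather than on the two group means separately. The sketch below shows that calculation for hypothetical before/after measurements; the data and the helper name `paired_t` are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements on the same six subjects, invented for illustration.
before = [72.0, 80.0, 65.0, 90.0, 77.0, 84.0]
after = [75.0, 83.0, 70.0, 91.0, 80.0, 90.0]


def paired_t(first, second):
    """Return (t, df) for a paired t-test on the differences second - first."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

Because each subject serves as their own control, the degrees of freedom are the number of pairs minus one, not the total number of observations.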




## Hypothesis Test for a Difference in Two Population Means (1 of 2)

Learning outcomes.

- Under appropriate conditions, conduct a hypothesis test about a difference between two population means. State a conclusion in context.

## Using the Hypothesis Test for a Difference in Two Population Means

The general steps of this hypothesis test are the same as always. As expected, the details of the conditions for use of the test and the test statistic are unique to this test (but similar in many ways to what we have seen before.)

## Step 1: Determine the hypotheses.

The hypotheses for a difference in two population means are similar to those for a difference in two population proportions. The null hypothesis, H₀, is again a statement of “no effect” or “no difference.”

- H₀: μ₁ – μ₂ = 0, which is the same as H₀: μ₁ = μ₂

The alternative hypothesis, Hₐ, can be any one of the following.

- Hₐ: μ₁ – μ₂ < 0, which is the same as Hₐ: μ₁ < μ₂
- Hₐ: μ₁ – μ₂ > 0, which is the same as Hₐ: μ₁ > μ₂
- Hₐ: μ₁ – μ₂ ≠ 0, which is the same as Hₐ: μ₁ ≠ μ₂

## Step 2: Collect the data.

As usual, how we collect the data determines whether we can use it in the inference procedure. We have our usual two requirements for data collection.

- Samples must be random to remove or minimize bias.
- Samples must be representative of the populations in question.

We use this hypothesis test when the data meets the following conditions.

- The two random samples are independent.
- The variable is normally distributed in both populations. If this is not known, samples of more than 30 will have a difference in sample means that can be modeled adequately by the t-distribution. As we discussed in “Hypothesis Test for a Population Mean,” t-procedures are robust even when the variable is not normally distributed in the population. If checking normality in the populations is impossible, then we look at the distribution in the samples. If a histogram or dotplot of the data does not show extreme skew or outliers, we take it as a sign that the variable is not heavily skewed in the populations, and we use the inference procedure. (Note: This is the same condition we used for the one-sample t-test in “Hypothesis Test for a Population Mean.”)

## Step 3: Assess the evidence.

If the conditions are met, then we calculate the t-test statistic. The t-test statistic has a familiar form.

[latex]T=\frac{\text{Observed difference in sample means}-\text{Hypothesized difference in population means}}{\text{Standard error}}[/latex]

[latex]T=\frac{(\bar{x}_{1}-\bar{x}_{2})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}[/latex]

Since the null hypothesis assumes there is no difference in the population means, the expression (μ₁ – μ₂) is always zero.

As we learned in “Estimating a Population Mean,” the t-distribution depends on the degrees of freedom (df). In the one-sample and matched-pair cases df = n – 1. For the two-sample t-test, determining the correct df is based on a complicated formula that we do not cover in this course. We will either give the df or use technology to find the df. With the t-test statistic and the degrees of freedom, we can use the appropriate t-model to find the P-value, just as we did in “Hypothesis Test for a Population Mean.” We can even use the same simulation.

## Step 4: State a conclusion.

To state a conclusion, we follow what we have done with other hypothesis tests. We compare our P-value to a stated level of significance.

- If the P-value ≤ α, we reject the null hypothesis in favor of the alternative hypothesis.
- If the P-value > α, we fail to reject the null hypothesis. We do not have enough evidence to support the alternative hypothesis.
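The decision rule above is simple enough to express as a short function. This is only a sketch; the function name `decide` and the returned wording are hypothetical choices.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the P-value is at most alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(0.03))        # P-value below the 5% level
print(decide(0.20))        # P-value above the 5% level
print(decide(0.03, 0.01))  # same P-value judged at the stricter 1% level
```

The same P-value can lead to different decisions at different significance levels, which is why α should be stated before looking at the data.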

As always, we state our conclusion in context, usually by referring to the alternative hypothesis.

## “Context and Calories”

Does the company you keep impact what you eat? This example comes from an article titled “Impact of Group Settings and Gender on Meals Purchased by College Students” (Allen-O’Donnell, M., T. C. Nowak, K. A. Snyder, and M. D. Cottingham, Journal of Applied Social Psychology 49(9), 2011, onlinelibrary.wiley.com/doi/10.1111/j.1559-1816.2011.00804.x/full). In this study, researchers examined this issue in the context of gender-related theories in their field. For our purposes, we look at this research more narrowly.

Step 1: Stating the hypotheses.

In the article, the authors make the following hypothesis. “The attempt to appear feminine will be empirically demonstrated by the purchase of fewer calories by women in mixed-gender groups than by women in same-gender groups.” We translate this into a simpler and narrower research question: Do women purchase fewer calories when they eat with men compared to when they eat with women?

Here the two populations are “women eating with women” (population 1) and “women eating with men” (population 2). The variable is the calories in the meal. We test the following hypotheses at the 5% level of significance.

The null hypothesis is always H₀: μ₁ – μ₂ = 0, which is the same as H₀: μ₁ = μ₂.

The alternative hypothesis is Hₐ: μ₁ – μ₂ > 0, which is the same as Hₐ: μ₁ > μ₂.

Here μ₁ represents the mean number of calories ordered by women when they were eating with other women, and μ₂ represents the mean number of calories ordered by women when they were eating with men.

Note: It does not matter which population we label as 1 or 2, but once we decide, we have to stay consistent throughout the hypothesis test. Since we expect the number of calories to be greater for the women eating with other women, the difference is positive if “women eating with women” is population 1. If you prefer to work with positive numbers, choose the group with the larger expected mean as population 1. This is a good general tip.

Step 2: Collect Data.

As usual, there are two major things to keep in mind when considering the collection of data.

- Samples need to be representative of the population in question.
- Samples need to be random in order to remove or minimize bias.

Representative Samples?

The researchers state their hypothesis in terms of “women.” We did the same. But the researchers gathered data by watching people eat at the HUB Rock Café II on the campus of Indiana University of Pennsylvania during the Spring semester of 2006. Almost all of the women in the data set were white undergraduates between the ages of 18 and 24, so there are some definite limitations on the scope of this study. These limitations will affect our conclusion (and the specific definition of the population means in our hypotheses.)

Random Samples?

The observations were collected on February 13, 2006, through February 22, 2006, between 11 a.m. and 7 p.m. We can see that the researchers included both lunch and dinner. They also made observations on all days of the week to ensure that weekly customer patterns did not confound their findings. The authors state that “since the time period for observations and the place where [they] observed students were limited, the sample was a convenience sample.” Despite these limitations, the researchers conducted inference procedures with the data, and the results were published in a reputable journal. We will also conduct inference with this data, but we also include a discussion of the limitations of the study with our conclusion. The authors did this, also.

Do the data meet the conditions for use of a t-test?

The researchers reported the following sample statistics.

- In a sample of 45 women dining with other women, the average number of calories ordered was 850, and the standard deviation was 252.
- In a sample of 27 women dining with men, the average number of calories ordered was 719, and the standard deviation was 322.

One of the samples has fewer than 30 women. We need to make sure the distribution of calories in this sample is not heavily skewed and has no outliers, but we do not have access to a spreadsheet of the actual data. Since the researchers conducted a t-test with this data, we will assume that the conditions are met. This includes the assumption that the samples are independent.

As noted previously, the researchers reported the sample statistics listed above for the two groups.

To compute the t-test statistic, make sure sample 1 corresponds to population 1. Here our population 1 is “women eating with other women.” So x̄₁ = 850, s₁ = 252, n₁ = 45, and x̄₂ = 719, s₂ = 322, n₂ = 27.

[latex]T=\frac{\bar{x}_{1}-\bar{x}_{2}}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}= \frac{850-719}{\sqrt{\frac{252^{2}}{45}+\frac{322^{2}}{27}}}\approx \frac{131}{72.47}\approx 1.81[/latex]

Using technology, we determined that the degrees of freedom are about 45 for this data. To find the P-value, we use our familiar simulation of the t-distribution. Since the alternative hypothesis is a “greater than” statement, we look for the area to the right of T = 1.81. The P-value is 0.0385.
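The test statistic and the degrees of freedom can be reproduced from the reported summary statistics. The sketch below assumes that the "technology" uses the Welch-Satterthwaite approximation for the degrees of freedom, which is the usual choice for an unequal-variance two-sample t-test; `welch_t` is a hypothetical helper name.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Return (t, df, se) for a two-sample t-test from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error contributed by sample 1
    v2 = sd2 ** 2 / n2  # squared standard error contributed by sample 2
    se = sqrt(v1 + v2)
    t = (mean1 - mean2 - hypothesized_diff) / se
    # Welch-Satterthwaite approximation for the degrees of freedom (assumed).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Sample 1: women dining with women; sample 2: women dining with men.
t_stat, df, se = welch_t(850, 252, 45, 719, 322, 27)
print(f"SE = {se:.2f}, T = {t_stat:.2f}, df = {df:.1f}")
```

The results agree with the values in the text: a standard error near 72.47, T near 1.81, and about 45 degrees of freedom.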

Generic Conclusion

The hypotheses for this test are H₀: μ₁ – μ₂ = 0 and Hₐ: μ₁ – μ₂ > 0. Since the P-value is less than the significance level (0.0385 < 0.05), we reject H₀ and accept Hₐ.

Conclusion in context

At Indiana University of Pennsylvania, the mean number of calories ordered by undergraduate women eating with other women is greater than the mean number of calories ordered by undergraduate women eating with men (P-value = 0.0385).

## Comment about Conclusions

In the conclusion above, we did not generalize the findings to all women. Since the samples included only undergraduate women at one university, we included this information in our conclusion. But our conclusion is a cautious statement of the findings. The authors see the results more broadly in the context of theories in the field of social psychology. In the context of these theories, they write, “Our findings support the assertion that meal size is a tool for influencing the impressions of others. For traditional-age, predominantly White college women, diminished meal size appears to be an attempt to assert femininity in groups that include men.” This viewpoint is echoed in the following summary of the study for the general public on National Public Radio (npr.org).

- Both men and women appear to choose larger portions when they eat with women, and both men and women choose smaller portions when they eat in the company of men, according to new research published in the Journal of Applied Social Psychology. The study, conducted among a sample of 127 college students, suggests that both men and women are influenced by unconscious scripts about how to behave in each other’s company. And these scripts change the way men and women eat when they eat together and when they eat apart.

Should we be concerned that the findings of this study are generalized in this way? Perhaps. But the authors of the article address this concern by including the following disclaimer with their findings: “While the results of our research are suggestive, they should be replicated with larger, representative samples. Studies should be done not only with primarily White, middle-class college students, but also with students who differ in terms of race/ethnicity, social class, age, sexual orientation, and so forth.” This is an example of good statistical practice. It is often very difficult to select truly random samples from the populations of interest. Researchers therefore discuss the limitations of their sampling design when they discuss their conclusions.

In the following activities, you will have the opportunity to practice parts of the hypothesis test for a difference in two population means. On the next page, the activities focus on the entire process and also incorporate technology.


- Concepts in Statistics. Provided by: Open Learning Initiative. Located at: http://oli.cmu.edu. License: CC BY: Attribution

Concepts in Statistics Copyright © 2023 by CUNY School of Professional Studies is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.


## Difference in Means Hypothesis Test Calculator

Use the calculator below to analyze the results of a difference in sample means hypothesis test. Enter your sample means, sample standard deviations, sample sizes, hypothesized difference in means, test type, and significance level to calculate your results.

You will find a description of how to conduct a two sample t-test below the calculator.


## Conducting a Hypothesis Test for the Difference in Means

When two populations are related, you can compare them by analyzing the difference between their means.

A hypothesis test for the difference in sample means can help you make inferences about the relationship between two population means.

## Testing for a Difference in Means

For the results of a hypothesis test to be valid, you should follow these steps:

- Check Your Conditions
- State Your Hypothesis
- Determine Your Analysis Plan
- Analyze Your Sample
- Interpret Your Results

## Check Your Conditions

To use the testing procedure described below, you should check the following conditions:

- Independence of Samples - Your samples should be collected independently of one another.
- Simple Random Sampling - You should collect your samples with simple random sampling. This type of sampling requires that every occurrence of a value in a population has an equal chance of being selected when taking a sample.
- Normality of Sample Distributions - The sampling distributions for both samples should follow the Normal or a nearly Normal distribution. A sampling distribution will be nearly Normal when the samples are collected independently and when the population distribution is nearly Normal. Generally, the larger the sample size, the more normally distributed the sampling distribution. Additionally, outlier data points can make a distribution less Normal, so if your data contains many outliers, exercise caution when verifying this condition.

## State Your Hypothesis

You must state a null hypothesis and an alternative hypothesis to conduct a hypothesis test of the difference in means.

The null hypothesis is a skeptical claim that you would like to test.

The alternative hypothesis represents the alternative claim to the null hypothesis.

Your null hypothesis and alternative hypothesis should be stated in one of three mutually exclusive ways listed in the table below.

Null Hypothesis | Alternative Hypothesis | Number of Tails | Description |
---|---|---|---|
μ₁ - μ₂ = D | μ₁ - μ₂ ≠ D | Two | Tests whether the sample means come from populations with a difference in means equal to D. If D = 0, then tests if the samples come from populations with means that are different from each other. |
μ₁ - μ₂ ≤ D | μ₁ - μ₂ > D | One | Tests whether sample one comes from a population with a mean that is greater than sample two's population mean by more than D. If D = 0, then tests if sample one comes from a population with a mean greater than sample two's population mean. |
μ₁ - μ₂ ≥ D | μ₁ - μ₂ < D | One | Tests whether the difference between sample one's population mean and sample two's population mean is less than D. If D = 0, then tests if sample one comes from a population with a mean less than sample two's population mean. |

D is the hypothesized difference between the populations' means that you would like to test.

## Determine Your Analysis Plan

Before conducting a hypothesis test, you must determine a reasonable significance level, α, which is the probability of rejecting the null hypothesis when it is actually true. The lower your significance level, the stronger the evidence you require before rejecting the null hypothesis. Common significance levels are 10%, 5%, and 1%.

To evaluate your hypothesis test at the significance level that you set, consider if you are conducting a one or two tail test:

- Two-tail tests divide the rejection region, or critical region, evenly between the upper and lower tails of the null sampling distribution. For example, in a two-tail test with a 5% significance level, your rejection region would be the upper and lower 2.5% of the null distribution. An alternative hypothesis of μ₁ - μ₂ ≠ D requires a two-tail test.
- One-tail tests place the rejection region entirely in one tail of the null distribution, either the right or the left. For example, in a one-tail test evaluating whether the actual difference in means is greater than the hypothesized difference, D, with a 5% significance level, your rejection region would be the upper 5% of the null distribution. Alternative hypotheses of μ₁ - μ₂ > D and μ₁ - μ₂ < D require one-tail tests.


## Analyze Your Sample

After checking your conditions, stating your hypothesis, determining your significance level, and collecting your sample, you are ready to analyze your sample.

Under the null hypothesis, the sampling distribution of the difference in sample means is modeled with a t-distribution using the following parameters:

- The Difference in the Population Means, D - The true difference in the population means is unknown, but we use the hypothesized difference in the means, D, from the null hypothesis in the calculations.
- The Standard Error, SE - The standard error of the difference in the sample means can be computed as follows: SE = √(s₁²/n₁ + s₂²/n₂), with s₁ being the standard deviation of sample one, n₁ being the sample size of sample one, s₂ being the standard deviation of sample two, and n₂ being the sample size of sample two. The standard error describes how much differences in sample means are expected to vary around the null difference in means, given the sample sizes and under the assumption that the null hypothesis is true.
- The Degrees of Freedom, DF - The degrees of freedom can be estimated conservatively as the smaller of n₁ - 1 and n₂ - 1. For more accurate results, use the following formula for the degrees of freedom (DF): DF = (s₁²/n₁ + s₂²/n₂)² / ((s₁²/n₁)² / (n₁ - 1) + (s₂²/n₂)² / (n₂ - 1))

In a difference in means hypothesis test, we calculate the probability of observing a difference in sample means (x̄₁ - x̄₂) at least as extreme as the one in our samples, assuming the null hypothesis is true. This probability is known as the p-value. If the p-value is less than the significance level, then we can reject the null hypothesis.

You can determine a precise p-value using the calculator above, but you can also estimate the p-value manually by calculating the t-score, or t-statistic, as follows: t = (x̄₁ - x̄₂ - D) / SE

The t-score is a test statistic that tells you how far, in standard errors, your observed difference is from the null hypothesis's difference in means. Using any t-score table, you can look up the probability of observing the results under the null distribution. You will need to look up the t-score for the type of test you are conducting, i.e. one or two tail. A hypothesis test for the difference in means is sometimes known as a two sample mean t-test because of the use of a t-score in analyzing results.
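When no t-score table is at hand, the lookup can be approximated numerically. The sketch below is one possible approach, not the calculator's actual method: it integrates the density of the t-distribution with Simpson's rule using only the Python standard library. The function names and the `alternative` labels are hypothetical.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def upper_tail(t, df, steps=2000):
    """P(T >= t), using Simpson's rule for the area between 0 and |t|."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    tail = 0.5 - area     # P(T >= |t|), by symmetry of the distribution
    return tail if t >= 0 else 1 - tail


def p_value(t, df, alternative="two-sided"):
    """p-value for a t-score under a one-tail or two-tail alternative."""
    if alternative == "greater":
        return upper_tail(t, df)
    if alternative == "less":
        return 1 - upper_tail(t, df)
    if alternative == "two-sided":
        return 2 * upper_tail(abs(t), df)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# 2.228 and 1.812 are the familiar table values for 10 degrees of freedom.
print(round(p_value(2.228, 10), 3))             # two-tail test
print(round(p_value(1.812, 10, "greater"), 3))  # one-tail test
```

Both printed probabilities should land at about the 5% level, matching what a printed t-table gives for those critical values.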

## Interpret Your Results

The conclusion of a hypothesis test for the difference in means is always either:

- Reject the null hypothesis
- Do not reject the null hypothesis

If you reject the null hypothesis, you cannot say that your sample difference in means is the true difference between the means. If you do not reject the null hypothesis, you cannot say that the hypothesized difference in means is true.

A hypothesis test is simply a way to look at evidence and conclude if it provides sufficient evidence to reject the null hypothesis.

## Example: Hypothesis Test for the Difference in Two Means

Let’s say you are a manager at a company that designs batteries for smartphones. One of your engineers believes that she has developed a battery that will last more than two hours longer than your standard battery.

Before you can consider if you should replace your standard battery with the new one, you need to test the engineer’s claim. So, you decide to run a difference in means hypothesis test to see if her claim that the new battery will last more than two hours longer than the standard one is reasonable.

You direct your team to run a study. They will take a sample of 100 of the new batteries and compare their performance to 1,000 of the old standard batteries.

- Check the conditions - Your test consists of independent samples. Your team collects your samples using simple random sampling, and you have reason to believe that your batteries' performances are close to normally distributed. So, the conditions are met to conduct a two sample t-test.
- State Your Hypothesis - Your null hypothesis is that the charge of the new battery lasts at most two hours longer than your standard battery (i.e. μ₁ - μ₂ ≤ 2). Your alternative hypothesis is that the new battery lasts more than two hours longer than the standard battery (i.e. μ₁ - μ₂ > 2).
- Determine Your Analysis Plan - You believe that a 1% significance level is reasonable. As your test is a one-tail test, you will evaluate whether the difference in mean charge between the samples falls in the upper 1% of the null distribution.
- Analyze Your Sample - After collecting your samples (which you do after steps 1-3), you find the new battery sample had a mean charge of 10.4 hours, x̄₁, with a 0.8 hour standard deviation, s₁. Your standard battery sample had a mean charge of 8.2 hours, x̄₂, with a standard deviation of 0.2 hours, s₂. Using the calculator above, you find that a difference in sample means of 2.2 hours [2.2 = 10.4 – 8.2] would result in a t-score of 2.49 under the null distribution, which translates to a p-value of 0.72%.
- Interpret Your Results - Since your p-value of 0.72% is less than the significance level of 1%, you have sufficient evidence to reject the null hypothesis.

In this example, you rejected the null hypothesis that the new battery design results in at most 2 hours of extra battery life. The test does not guarantee that your engineer’s new battery lasts more than two hours longer than your standard battery, but it does give you strong reason to believe her claim.
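For readers who want to check the t-score by hand, the arithmetic can be written out as follows. It uses the sample sizes of 100 new and 1,000 standard batteries stated in the example and the standard error and degrees of freedom formulas given earlier; intermediate values are rounded.

```latex
SE = \sqrt{\frac{0.8^2}{100} + \frac{0.2^2}{1000}}
   = \sqrt{0.0064 + 0.00004} \approx 0.0802

t = \frac{(10.4 - 8.2) - 2}{0.0802} \approx 2.49

DF = \frac{(0.0064 + 0.00004)^2}{\frac{0.0064^2}{99} + \frac{0.00004^2}{999}} \approx 100.2
```

Almost all of the standard error comes from the smaller, more variable sample of new batteries, which is why the degrees of freedom end up close to that sample's size.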

## Hypothesis test for a difference between means

This document is prepared automatically using the following R command.

library(interpretCI)

## Given Problem: One-Tailed Test

The local baseball team conducts a study to find the amount spent on refreshments at the ball park. Over the course of the season they gather simple random samples of 100 men and 100 women. For men, the average expenditure was $200, with a standard deviation of $40. For women, it was $190, with a standard deviation of $20. The test below asks whether men spend, on average, more than $7 more than women.

## Hypothesis test

This lesson explains how to conduct a hypothesis test for the difference between two means. The test procedure, called the two-sample t-test, is appropriate when the following conditions are met:

- The sampling method for each sample is simple random sampling.
- The samples are independent.
- Each population is at least 20 times larger than its respective sample.
- The sampling distribution is approximately normal, which is generally the case if any of the following conditions apply:
  - The population distribution is normal.
  - The population data are symmetric, unimodal, without outliers, and the sample size is 15 or less.
  - The population data are slightly skewed, unimodal, without outliers, and the sample size is 16 to 40.
  - The sample size is greater than 40, without outliers.

This approach consists of four steps:

- State the hypotheses.
- Formulate an analysis plan.
- Analyze sample data.
- Interpret results.

## 1. State the hypotheses

The first step is to state the null hypothesis and an alternative hypothesis.

\[\text{Null hypothesis } (H_0): \mu_1-\mu_2 \le 7\] \[\text{Alternative hypothesis } (H_1): \mu_1-\mu_2 > 7\]

Note that these hypotheses constitute a one-tailed test. The null hypothesis will be rejected if the difference between sample means is too big.

## 2. Formulate an analysis plan.

For this analysis, the significance level is 0.05 (a 95% confidence level). Using sample data, we will conduct a two-sample t-test of the null hypothesis.

## 3. Analyze sample data

Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t test statistic (t).

\[SE=\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}\] \[SE=\sqrt{\frac{40^2}{100}+\frac{20^2}{100}}\] \[SE=4.472\]

\[DF=\frac{(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2})^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}\]

\[DF=\frac{(\frac{40^2}{100}+\frac{20^2}{100})^2}{\frac{(40^2/100)^2}{100-1}+\frac{(20^2/100)^2}{100-1}}\] \[DF=145.59\]

\[t=\frac{(\bar{x_1}-\bar{x_2})-d}{SE} = \frac{(200 -190)-7}{4.472}=0.671\]

where \(s_1\) is the standard deviation of sample 1, \(s_2\) is the standard deviation of sample 2, \(n_1\) is the size of sample 1, \(n_2\) is the size of sample 2, \(\bar{x_1}\) is the mean of sample 1, \(\bar{x_2}\) is the mean of sample 2, d is the hypothesized difference between population means, and SE is the standard error.

Since we have a one-tailed test, the P-value is the probability that a t statistic with 145.59 degrees of freedom is greater than or equal to 0.67. We use the t distribution to find this P-value.

## 4. Interpret results.

Since the P-value (0.252) is greater than the significance level (0.05), we cannot reject the null hypothesis.


The contents of this document are modified from StatTrek.com. Berman H.B., “AP Statistics Tutorial”, [online] Available at: https://stattrek.com/hypothesis-test/difference-in-means.aspx?tutorial=AP [Accessed Date: 1/23/2022].


## Hypothesis Test for a Mean

This lesson explains how to conduct a hypothesis test of a mean, when the following conditions are met:

- The sampling method is simple random sampling.
- The sampling distribution is normal or nearly normal.

Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.

- The population distribution is normal.
- The population distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
- The population distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
- The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

## State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis . The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.

The table below shows three sets of hypotheses. Each makes a statement about how the population mean μ is related to a specified value M . (In the table, the symbol ≠ means " not equal to ".)

Set | Null hypothesis | Alternative hypothesis | Number of tails |
---|---|---|---|
1 | μ = M | μ ≠ M | 2 |
2 | μ ≥ M | μ < M | 1 |
3 | μ ≤ M | μ > M | 1 |

The first set of hypotheses (Set 1) is an example of a two-tailed test , since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests , since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

## Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.

- Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
- Test method. Use the one-sample t-test to determine whether the hypothesized mean differs significantly from the observed sample mean.

## Analyze Sample Data

Using sample data, conduct a one-sample t-test. This involves finding the standard error, degrees of freedom, test statistic, and the P-value associated with the test statistic.

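- Standard error. Compute the standard error (SE) of the sampling distribution of the mean:

SE = s * sqrt{ ( 1/n ) * [ ( N - n ) / ( N - 1 ) ] }

where s is the standard deviation of the sample, N is the population size, and n is the sample size. When the population size is much larger than the sample size, the factor ( N - n ) / ( N - 1 ) is close to 1 and the standard error can be approximated by:

SE = s / sqrt( n )

- Degrees of freedom. The degrees of freedom (DF) is equal to the sample size (n) minus one. Thus, DF = n - 1.

- Test statistic. The test statistic is a t statistic (t) defined by the following equation:

t = ( x - μ) / SE

where x is the sample mean, μ is the hypothesized population mean in the null hypothesis, and SE is the standard error.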

- P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the t statistic, given the degrees of freedom computed above. (See sample problems at the end of this lesson for examples of how this is done.)
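The standard error, degrees of freedom, and test statistic described in this section can be sketched in a few lines. This is an illustrative sketch using only the standard library; the function name `one_sample_t` and the optional `population_size` argument are assumptions made for the example. The numbers passed in are those of the lawn mower problem in the "Test Your Understanding" section of this lesson.

```python
import math

def one_sample_t(sample_mean, sample_sd, n, mu0, population_size=None):
    """Return (SE, DF, t) for a one-sample t-test.

    If population_size is given, the finite population correction
    sqrt((N - n) / (N - 1)) is applied to the standard error.
    """
    se = sample_sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    df = n - 1
    t = (sample_mean - mu0) / se
    return se, df, t

# Sample mean 295, sample SD 20, n = 50, hypothesized mean 300.
se, df, t = one_sample_t(295, 20, 50, 300)
print(round(se, 2), df, round(t, 2))  # 2.83 49 -1.77

# The same data with the correction for a population of 2000 engines.
se_fpc, _, t_fpc = one_sample_t(295, 20, 50, 300, population_size=2000)
```

Converting the t statistic to a P-value needs the t distribution, which is not in the standard library; a calculator or a statistics package would be used for that step.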


## Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level , and rejecting the null hypothesis when the P-value is less than the significance level.

## Test Your Understanding

In this section, two sample problems illustrate how to conduct a hypothesis test of a mean score. The first problem involves a two-tailed test; the second problem, a one-tailed test.

Problem 1: Two-Tailed Test

An inventor has developed a new, energy-efficient lawn mower engine. He claims that the engine will run continuously for 5 hours (300 minutes) on a single gallon of regular gasoline. From his stock of 2000 engines, the inventor selects a simple random sample of 50 engines for testing. The engines run for an average of 295 minutes, with a standard deviation of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of significance. (Assume that run times for the population of engines are normally distributed.)

Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

- State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: μ = 300

Alternative hypothesis: μ ≠ 300

- Formulate an analysis plan. For this analysis, the significance level is 0.05. The test method is a one-sample t-test.
- Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t statistic (t).

SE = s / sqrt(n) = 20 / sqrt(50) = 20/7.07 = 2.83

DF = n - 1 = 50 - 1 = 49

t = ( x - μ) / SE = (295 - 300)/2.83 = -1.77

where s is the standard deviation of the sample, x is the sample mean, μ is the hypothesized population mean, and n is the sample size.

Since we have a two-tailed test, the P-value is the probability that the t statistic having 49 degrees of freedom is less than -1.77 or greater than 1.77. We use the t Distribution Calculator to find that P(t < -1.77) is about 0.04. If you enter 1.77 as the t statistic in the t Distribution Calculator, you will find that P(t < 1.77) is about 0.96. Therefore, P(t > 1.77) is 1 minus 0.96, or 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.

- Interpret results. Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the population was normally distributed, and the sample size was small relative to the population size (less than 5%).

Problem 2: One-Tailed Test

Bon Air Elementary School has 1000 students. The principal of the school thinks that the average IQ of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly selected students. Among the sampled students, the average IQ is 108 with a standard deviation of 10. Based on these results, should the principal accept or reject her original hypothesis? Assume a significance level of 0.01. (Assume that test scores in the population of students are normally distributed.)

- State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: μ >= 110

Alternative hypothesis: μ < 110

- Formulate an analysis plan. For this analysis, the significance level is 0.01. The test method is a one-sample t-test.
- Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t statistic (t).

SE = s / sqrt(n) = 10 / sqrt(20) = 10/4.472 = 2.236

DF = n - 1 = 20 - 1 = 19

t = ( x - μ) / SE = (108 - 110)/2.236 = -0.894

Here is the logic of the analysis: Given the alternative hypothesis (μ < 110), we want to know whether the observed sample mean is small enough to cause us to reject the null hypothesis.

The observed sample mean produced a t statistic of -0.894. We use the t Distribution Calculator to find that P(t < -0.894) is about 0.19. This means we would expect to find a sample mean of 108 or smaller in 19 percent of our samples, if the true population IQ were 110. Thus the P-value in this analysis is 0.19.

- Interpret results. Since the P-value (0.19) is greater than the significance level (0.01), we cannot reject the null hypothesis.


## 10.3 - Tests for Differences

Significance tests are often used to examine the difference between groups in comparative experiments and observational studies. We still use the same four basic steps to carry out the test. Here are two examples.

## Example 10.11: Biases in Academic Hiring

In a September 2014 paper in the Proceedings of the National Academy of Science, researchers from Cornell University examined how the lifestyles of job candidates might affect how they are evaluated by hiring committees for academic jobs in the sciences. In this experiment 144 professors on hiring committees (80 men and 64 women) were shown two applicant files with equivalent qualifications in terms of published research, teaching abilities, and professional service. However, one file was for a divorced female candidate with two children while the other file was for a married male candidate with two children and a non-working spouse. The results of the hiring preferences exhibited are given in Table 10.1 (note - the actual research report examined a wide variety of gender and lifestyle cases - here we are only showing one comparison studied) .

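Table 10.1. Hiring preferences by gender of evaluator

Evaluator | Preferred female candidate | Preferred male candidate | Total |
---|---|---|---|
Female | 45 (70.3%) | 19 (29.7%) | 64 |
Male | 34 (42.5%) | 46 (57.5%) | 80 |
Total | 79 (54.9%) | 65 (45.1%) | 144 |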

Research Question : Does the gender of the evaluator affect the way they would view the fictional divorced female versus the fictional married male candidates?

- Null Hypothesis : The gender of the evaluator does not affect the population proportion of evaluators who prefer the female candidate \(( p_{\text{females}} = p_{\text{males}})\) or \(( p_{\text{females}} - p_{\text{males}}=0)\)
- Alternative Hypothesis : The gender of the evaluator does affect the population proportion of evaluators who prefer the female candidate \(( p_{\text{females}} \neq p_{\text{males}})\) or \(( p_{\text{females}} - p_{\text{males}}\neq 0)\) This is a two sided alternative.

The sample proportion for the female evaluators was 0.703 while the sample proportion for the male evaluators was 0.425; a difference of 0.703 - 0.425 = 0.278. If the null hypothesis is true then \(p_{\text{females}} = p_{\text{males}}\) and the best estimate of this common overall probability of preferring the divorced female candidate would be 79/144 = 0.549. Thus, the standard error for the difference in proportions under the null hypothesis would be

\[\sqrt{0.549(1-0.549)\left(\frac{1}{64}+\frac{1}{80}\right)} = 0.0834\]

Thus the standardized test statistic would be (0.278 - 0) / 0.0834 = 3.33.

Since the sample size is fairly large for each group the difference between the two sample proportions would follow the normal curve. Since this is a two-sided alternative, we calculate the p -value by considering both the area above 3.33 and the area below -3.33 on the normal curve (this comes out to about 0.00045 + 0.00045 = 0.0009).

Interpretation of the p -value. The likelihood of getting our test statistic of 3.33 or any more extreme value (like those above it or below -3.33), if in fact, the null hypothesis is true, is about 0.0009; a bit less than one-tenth of one percent.

Since the p -value is so small the results are highly significant; the null hypothesis provides a poor explanation of the data. We have good evidence that there is an association between the gender of the evaluator and the preferences they would hold between candidates with these gender and lifestyle combinations.
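The calculation in Example 10.11 can be reproduced from the counts in Table 10.1. The following is a sketch using only the Python standard library; the function name `two_proportion_z` is an illustrative choice, and the normal approximation is the one the example itself relies on.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions with a pooled SE."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # common proportion under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Area in both tails of the standard normal curve beyond |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, se, z, p_value

# 45 of 64 female evaluators and 34 of 80 male evaluators
# preferred the female candidate.
pooled, se, z, p_value = two_proportion_z(45, 64, 34, 80)
print(f"pooled={pooled:.3f} se={se:.4f} z={z:.2f} p={p_value:.4f}")
```

Because the script works from exact counts rather than rounded proportions, its standard error can differ from the 0.0834 in the text in the last decimal place.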

## Example 10.12: Don't Drink from the Blue Mug

In a November 2014 article in the journal Flavour , researchers from the University of Oxford in England and the Federation University of Australia investigated whether the aroma of a cup of coffee might be affected by the color of the mug they drink it from. In the experiment, 12 people were randomly selected to drink their coffee from a mug with a blue sleeve and 12 were randomly selected to drink from a mug with a white sleeve (Figure 10.1 shows the mugs used). The subjects were asked to subjectively rate the coffee's aroma on a hundred point scale. The coffee in the white sleeve mugs received an average rating of 57.33 with a standard deviation of 16.27 while the coffee in the blue sleeved mugs received an average rating of 35.57 with a standard deviation of 25.34.

Research Question : Does the color of a mug affect the perceived aroma of the coffee inside the mug?

Explanatory variable: the color of the mug (blue or white)

Response variable: subjective rating of aroma on 100 point scale

- Null Hypothesis : The color of the mug does not affect the population average aroma rating \(( \mu_{\text{blue}} = \mu_{\text{white}})\) or \(( \mu_{\text{blue}} - \mu_{\text{white}}=0)\)
- Alternative Hypothesis : The color of the mug does affect the population average aroma rating \(( \mu_{\text{blue}} \neq \mu_{\text{white}})\) or \(( \mu_{\text{blue}} - \mu_{\text{white}}\neq 0)\) This is a two-sided alternative.

The sample mean for the white cups was 57.33 while the sample mean for the blue cups was 35.57; a difference of 57.33 - 35.57 = 21.76. The standard error of the mean for the white cups was \(\frac{16.27}{\sqrt{12}}= 4.70\) while the standard error for the blue cups was \(\frac{25.34}{\sqrt{12}}= 7.32\). Thus, the standard error for the difference in means would be \(\sqrt{(4.70)^{2} + (7.32)^{2}}= 8.70\).

Finally, the standardized test statistic would be (21.76 - 0) / 8.70 = 2.5

While the difference in the sample means might nearly follow the normal curve, the estimate of the standard error of the differences might be a bit off from the actual standard error so the use of the t- curve, rather than the normal curve, would be the appropriate reference distribution to calculate the p-value. Since this is a two-sided alternative, we calculate the p -value by considering both the area above 2.5 and the area below -2.5 on the t curve with sample sizes of 12 in each group (this comes out to about 0.01 + 0.01 = 0.02).

Interpretation of the p -value. The likelihood of getting our test statistic of 2.5 or any more extreme value (like those above it or below -2.5), if, in fact, the null hypothesis is true, is about 2%.

Since the p -value is less than 5%, the results might be considered significant; the null hypothesis provides a fairly poor explanation of the data. We have some evidence that there is an association between the color of the mug and the perceived aroma of the coffee.
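The example does not state which degrees of freedom were used for the t curve. As one possibility, and only as an assumption, applying the unequal-variance DF formula given earlier in this document to the mug data (standard deviations 16.27 and 25.34, 12 subjects per group) gives:

\[DF=\frac{\left(\frac{16.27^2}{12}+\frac{25.34^2}{12}\right)^2}{\frac{(16.27^2/12)^2}{12-1}+\frac{(25.34^2/12)^2}{12-1}}=\frac{(22.06+53.51)^2}{\frac{22.06^2}{11}+\frac{53.51^2}{11}}\approx 18.75\]

A t statistic of 2.5 on roughly 19 degrees of freedom is consistent with the two-sided p-value of about 0.02 quoted above.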

## 8.4 Statistical Test for Difference of Population Means

Our last (!) test applies to differences of means. Such tests are very common when you conduct a study involving two groups. In many medical trials, for example, subjects are randomly divided into two groups. One group receives a new drug, the second receives a placebo (sugar pill). Then the researcher measures any differences between the two groups. Fortunately, we know how to do hypothesis testing, and in this case we will exclusively use Excel to perform the calculations for us. Here is the setup for this test:

- Null Hypothesis : two means M 1 and M 2 differ by a fixed amount c, i.e. M 1 - M 2 = c
- Alternative Hypothesis : the two means M 1 and M 2 do not differ by the amount c, i.e. M 1 - M 2 not equal to c (2-tail)
- Test Statistics : as computed by Excel
- Rejection Region : probability as computed by Excel

Example 1: Two procedures to determine the amylase in human body fluids were studied. The "original" method is considered to be an acceptable standard method, while the "new" method uses a smaller volume of water, making it more convenient as well as more economical. It is claimed that the amylase values obtained by the new method average at least 10 units greater than the corresponding values from the original method. A test using the original method was conducted on 14 subjects, the test with the new method on 15 subjects, giving the data displayed in the table below. Test the claim at the 1% level.

| Original | New |
|---|---|
| 38 | 46 |
| 48 | 57 |
| 58 | 73 |
| 53 | 60 |
| 75 | 86 |
| 58 | 67 |
| 59 | 65 |
| 46 | 58 |
| 69 | 85 |
| 59 | 74 |
| 81 | 96 |
| 44 | 55 |
| 56 | 71 |
| 50 | 63 |
| | 74 |

- Null Hypothesis : M 1 - M 2 = 10
- Alternative Hypothesis : M 1 - M 2 not equal to 10
- In the Variable 1 Range : enter the range for the data from the "New" method (column B)
- In the Variable 2 Range : enter the range for the data from the "Original" method (column A)
- In the Hypothesized Mean Difference : enter the number 10
- For the Alpha value: enter the number 0.01
- Make sure to check the Labels box and click on Okay.
- Test Statistics : as computed by Excel, t = 0.4169
- Rejection Region : probability as computed by Excel: p = 0.68 (2-tail)
- Excel requires that the hypothesized difference is not negative. If you want to test for a negative difference, switch the variables around and the difference will be positive.
- The actual difference, for this data, is 68.66 - 56.71 = 11.95. That difference is different from 10, but not significantly different, according to our test.
- Null Hypothesis : M 1 - M 2 = 0
- Alternative Hypothesis : M 1 - M 2 not equal to 0
- Test Statistics : as computed by Excel, t = 2.55242
- Rejection Region : probability as computed by Excel: p = 0.016668 (2-tail)
- if women are variable 1 and men are variable 2, then women making $10,000 less than men means M 1 - M 2 = -10000
- if men are variable 1 and women are variable 2, then women making $10,000 less than men means M 1 - M 2 = 10000
- Null Hypothesis : M 1 - M 2 = 10000
- Alternative Hypothesis : M 1 - M 2 not equal to 10000
- Test Statistics : as computed by Excel, t = 4.10335
- Rejection Region : probability as computed by Excel: p = 5.089E-05 (2-tail)
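For Example 1 (the amylase data), the Excel result can be cross-checked by hand. The sketch below uses only the Python standard library. The source does not say which Excel t-test variant was run; the unequal-variance form is assumed here because it reproduces the reported t = 0.4169. The function name is an illustrative choice.

```python
import math
import statistics

# Amylase data from Example 1 (14 "original" and 15 "new" measurements)
original = [38, 48, 58, 53, 75, 58, 59, 46, 69, 59, 81, 44, 56, 50]
new = [46, 57, 73, 60, 86, 67, 65, 58, 85, 74, 96, 55, 71, 63, 74]

def t_for_hypothesized_difference(sample1, sample2, hypothesized_diff):
    """Return (observed mean difference, t) using unpooled sample variances."""
    mean_diff = statistics.mean(sample1) - statistics.mean(sample2)
    se = math.sqrt(
        statistics.variance(sample1) / len(sample1)
        + statistics.variance(sample2) / len(sample2)
    )
    return mean_diff, (mean_diff - hypothesized_diff) / se

# Variable 1 is the new method, variable 2 the original method,
# and the hypothesized mean difference is 10, as in the Excel dialog.
mean_diff, t = t_for_hypothesized_difference(new, original, 10)
print(round(mean_diff, 2), round(t, 4))  # 11.95 0.4169
```

Putting the "new" sample first keeps the hypothesized difference positive, which matches the ordering Excel is said to require.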

## A two-sample t-test for a difference in means will be conducted to investigate whether the average amount of money spent per customer at Department Store M is different from that at Department Store V. From a random sample of 35 customers at Store M, the average amount spent was $300 with standard deviation $40. From a random sample of 40 customers at Store V, the average amount spent was $290 with standard deviation $35. Assuming a null hypothesis of no difference in population means, what is the test statistic for the appropriate test to investigate whether there is a difference in population means?

The test statistic is 1.145 for the appropriate test to investigate whether there is a difference in population means.

A test statistic is a numerical value calculated from sample data in hypothesis testing.

Sample 1: Store M

Sample size, \(n_1\) = 35,

Sample mean, \(\bar{X}_1\) = 300,

Standard deviation, \(s_1\) = 40.

Sample 2: Store V

Sample size, \(n_2\) = 40,

Sample mean, \(\bar{X}_2\) = 290,

Standard deviation, \(s_2\) = 35.

The null hypothesis, \(H_0\): there is no difference in population means, i.e., \(\mu_1 = \mu_2\).

The alternative hypothesis, \(H_1\): there is a significant difference in population means, i.e., \(\mu_1 \neq \mu_2\).

Under the null hypothesis, the test statistic is

\[t = \frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\]

\[t = \frac{300-290}{\sqrt{\frac{40^2}{35}+\frac{35^2}{40}}} = \frac{10}{\sqrt{45.714+30.625}} = \frac{10}{\sqrt{76.339}} = \frac{10}{8.737} = 1.145\]

Hence the test statistic is 1.145.



## How to test for differences between two group means when the data is not normally distributed?

I'll eliminate all the biological details and experiments and quote just the problem at hand and what I have done statistically. I would like to know if its right, and if not, how to proceed. If the data (or my explanation) isn't clear enough, I'll try to explain better by editing.

Suppose I have two groups/observations, X and Y, with size $N_x=215$ and $N_y=40$. I would like to know if the means of these two observations are equal. My first question is:

1. If the assumptions are satisfied, is it relevant to use a parametric two-sample t-test here? I ask this because from my understanding its usually applied when the size is small?

2. I plotted histograms of both X and Y and they were not normally distributed, one of the assumptions of a two-sample t-test. My confusion is that, I consider them to be two populations and that's why I checked for normal distribution. But then I am about to perform a two-SAMPLE t-test... Is this right?

3. From central limit theorem, I understand that if you perform sampling (with/without repetition depending on your population size) multiple times and compute the average of the samples each time, then it will be approximately normally distributed. And, the mean of this random variables will be a good estimate of the population mean. So, I decided to do this on both X and Y, 1000 times, and obtained samples, and I assigned a random variable to the mean of each sample. The plot was very much normally distributed. The mean of X and Y were 4.2 and 15.8 (which were the same as population +- 0.15) and the variance was 0.95 and 12.11. I performed a t-test on these two observations (1000 data points each) with unequal variances, because they are very different (0.95 and 12.11). And the null hypothesis was rejected. Does this make sense at all? Is this correct / meaningful approach or a two-sample z-test is sufficient or its totally wrong?

4. I also performed a non-parametric Wilcoxon test just to be sure (on original X and Y) and the null hypothesis was convincingly rejected there as well. In the event that my previous method was utterly wrong, I suppose doing a non-parametric test is good, except for statistical power maybe?

In both cases, the means were significantly different. However, I would like to know if either or both the approaches are faulty/totally wrong and if so, what is the alternative?


## 2 Answers

The idea that the t-test is only for small samples is a historical holdover. Yes it was originally developed for small samples, but there is nothing in the theory that distinguishes small from large. In the days before computers were common for doing statistics the t-tables often only went up to around 30 degrees of freedom and the normal was used beyond that as a close approximation of the t distribution. This was for convenience to keep the t-table's size reasonable. Now with computers we can do t-tests for any sample size (though for very large samples the difference between the results of a z-test and a t-test is very small). The main idea is to use a t-test when using the sample to estimate the standard deviations and the z-test if the population standard deviations are known (very rare).

The Central Limit Theorem lets us use the normal theory inference (t-tests in this case) even if the population is not normally distributed as long as the sample sizes are large enough. This does mean that your test is approximate (but with your sample sizes, the approximation should be very good).

The Wilcoxon test is not a test of means (unless you know that the populations are perfectly symmetric and other unlikely assumptions hold). If the means are the main point of interest then the t-test is probably the better one to quote.

Given that your standard deviations are so different, and the shapes are non-normal and possibly different from each other, the difference in the means may not be the most interesting thing going on here. Think about the science and what you want to do with your results. Are decisions being made at the population level or the individual level? Think of this example: you are comparing 2 drugs for a given disease, on drug A half the sample died immediately and the other half recovered in about a week; on drug B all survived and recovered, but the time to recovery was longer than a week. In this case would you really care about which mean recovery time was shorter? Or replace the half dying in A with just taking a really long time to recover (longer than anyone in the B group). When deciding which drug I would want to take I would want the full information, not just which was quicker on average.

- Thank you Greg. I assume there's nothing wrong with the procedure per-se? I understand that I might not be asking the right question, but my concern is equally about the statistical test/procedure and understanding itself given two samples. I'll check if I am asking the right question and come back with questions, if any. Maybe if I explain the biological problem, it would help with more suggestions. Thanks again. – Arun Commented Sep 16, 2011 at 20:08

One addition to Greg's already very comprehensive answer.

If I understand you the right way, your point 3 states the following procedure:

- Observe $n$ samples of a distribution $X$.
- Then, draw $m$ of those $n$ values and compute their mean.
- Repeat this 1000 times, save the corresponding means
- Finally, compute the mean of those means and assume that the mean of $X$ equals the mean computed that way.

Now your assumption is, that for this mean the central limit theorem holds and the corresponding random variable will be normally distributed.

Maybe let's have a look at the math behind your computation to identify the error:

We will call your samples of $X$ $X_1,\ldots,X_n$, or, in statistical terminology, you have $X_1,\ldots, X_n\sim X$. Now, we draw samples of size $m$ and compute their mean. The $k$-th of those means looks somehow like this:

$$ Y_k=\frac{1}{m}\sum_{i=1}^m X_{\mu^k_{i}} $$

where $\mu^k_i$ denotes the value between 1 and $n$ that has been drawn at draw $i$. Computing the mean of all those means thus results in

$$ \frac{1}{1000}\sum_{k=1}^{1000} \frac{1}{m}\sum_{i=1}^m X_{\mu^k_{i}} $$

To spare you the exact mathematical terminology just take a look at this sum. What happens is that the $X_i$ are just added multiple times to the sum. All in all, you add up $1000m$ numbers and divide them by $1000m$. In fact, you are computing a weighted mean of the $X_i$ with random weights.

Now, however, the Central Limit Theorem states that the sum of a lot of independent random variables is approximately normal (which in turn makes the mean approximately normal).

Your sum above does not produce independent samples. You perhaps have random weights, but that does not make your samples independent at all. Thus, the procedure written in 3 is not legal.

However, as Greg already stated, using a $t$-test on your original data may be approximately correct - if you are really interested at the mean.

- Thank you. It seems t-test already takes care of the problem using CLT (from greg's reply which I overlooked). Thanks for pointing that out and for the clear explanation of 3) which is what I actually wanted to know. I'll have to invest more time to grasp these concepts. – Arun Commented Sep 17, 2011 at 12:37
- Keep in mind that the CLT performs differently well depending on the distribution at hand (or, even worse, the expected value or the variance of the distribution do not exist - then CLT is not even valid). If in doubt it is always a good idea to generate a distribution that looks similar to the one you observed and then simulate your test using this distribution a few hundred times. You will get a feeling on the quality of the approximation CLT supplies. – Thilo Commented Sep 17, 2011 at 18:12
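The simulation check suggested in the last comment could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: an exponential distribution stands in for the asker's skewed data, both groups are drawn from the same distribution so the null hypothesis is true, and the normal critical value 1.96 is used in place of the exact t critical value. Only the group sizes (215 and 40) come from the question.

```python
import math
import random

def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var

def welch_t(x, y):
    """Two-sample t statistic with unpooled variances."""
    mx, vx = mean_and_variance(x)
    my, vy = mean_and_variance(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))

def simulated_rejection_rate(n_x=215, n_y=40, reps=2000, critical=1.96, seed=0):
    """Fraction of simulated data sets in which the null is rejected,
    even though both groups share the same (skewed) distribution."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n_x)]
        y = [rng.expovariate(1.0) for _ in range(n_y)]
        if abs(welch_t(x, y)) > critical:
            rejections += 1
    return rejections / reps

rate = simulated_rejection_rate()
print(f"estimated type I error rate: {rate:.3f}")
```

If the estimated rate stays close to the nominal 5%, the normal-theory approximation is behaving well for that distribution and those sample sizes; a rate far from 5% would suggest the approximation is poor.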


## COMMENTS

How to conduct a hypothesis test to determine whether the difference between two mean scores is significant. Includes examples for one- and two-tailed tests.

Learning Objectives Under appropriate conditions, conduct a hypothesis test about a difference between two population means. State a conclusion in context.

Say you test your sample the way Sal does it, and realize that the probability of you getting that sample was 1%. Normally, you would reject the null hypothesis. But say the null hypothesis was indeed correct. This means you just happened to choose a lot samples from the far left or far right of the population mean.

Given data from two samples, we can do a signficance test to compare the sample means with a test statistic and p-value, and determine if there is enough evidence to suggest a difference between the two population means.

What is a mean difference / difference between means? Simple definition in plain English. How to run hypothesis tests for differences between means.

The same process for the hypothesis test for one mean can be applied. The test for the mean difference may be referred to as the paired t-test or the test for paired means.

Finally, the null value is the difference in sample means under the null hypothesis. Just as in Chapter 4, the test statistic Z is used to identify the p-value.

Writing hypotheses to test the difference of means. An exercise scientist wanted to test the effectiveness of a new program designed to increase the flexibility of senior citizens. They recruited participants and rated their flexibility according to a standard scale before starting the program.

Figure 12.3.1 shows that the probability value for a two-tailed test is 0.0164. The two-tailed test is used when the null hypothesis can be rejected regardless of the direction of the effect. As shown in Figure 12.3.1, it is the probability of a t < − 2.533 or a t > 2.533. Figure 12.3.1: The two-tailed probability.

Typical cases in Hypothesis testing For both dependent and independent samples, the formulation of the hypothesis test for difference of means may present the following cases: σ2 equal and known, n >30 σ2 equal and unknown, n >30 σ2 equal and unknown, n <30 Notice that in all cases, we are assuming that variances are equal. This is a very strong assumption in Statistics.

Difference between Two Means (Independent Groups). Author(s): David M. Lane. Prerequisites: Sampling Distribution of Difference between Means, Confidence Intervals, Confidence Interval on the Difference between Means, Logic of Hypothesis Testing, Testing a Single Mean.

Learn how to compare two independent population means using hypothesis testing, assumptions, and examples in this LibreTexts course.

Step 1: Determine the hypotheses. The hypotheses for a difference in two population means are similar to those for a difference in two population proportions. The null hypothesis, H₀, is again a statement of "no effect" or "no difference." The alternative hypothesis, Hₐ, can be any one of the following.


How to conduct a hypothesis test for the difference between paired means. Includes step-by-step example of the test procedure, a matched-pairs t-test.

Hypothesis test This lesson explains how to conduct a hypothesis test for the difference between two means. The test procedure, called the two-sample t-test, is appropriate when the following conditions are met:

How to conduct a hypothesis test for a mean value, using a one-sample t-test. The test procedure is illustrated with examples for one- and two-tailed tests.

Hypothesis Test for the Difference of Paired Means, μ d In this section, we will develop the hypothesis test for the mean difference for paired samples. As we learned in the previous section, if we consider the difference rather than the two samples, then we are back in the one-sample mean scenario.
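A minimal sketch of that reduction: compute the per-pair differences, then treat them as a single sample and form a one-sample t statistic. The function name `paired_t_statistic` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after, null_diff=0.0):
    """Return (t, df) for a paired test of the mean difference.

    The pairs are collapsed to differences, so this is the usual
    one-sample t statistic applied to those differences.
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = mean(diffs)   # sample mean of the differences
    s_d = stdev(diffs)    # sample standard deviation (n - 1 denominator)
    t = (d_bar - null_diff) / (s_d / sqrt(n))
    return t, n - 1

# Hypothetical before/after scores, for illustration only
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]
t_stat, df = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The resulting t is then compared against a t distribution with n − 1 degrees of freedom, where n is the number of pairs.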

10.3 - Tests for Differences

Significance tests are often used to examine the difference between groups in comparative experiments and observational studies. We still use the same four basic steps to carry out the test. Here are two examples.

Our last (!) test applies to differences of means. Such tests are very common when you conduct a study involving two groups. In many medical trials, for example, subjects are randomly divided into two groups. One group receives a new drug, the second receives a placebo (sugar pill). Then the researcher measures any differences between the two groups. Fortunately, we know how to do Hypothesis ...

The test statistic is 1.145 for the appropriate test to investigate whether there is a difference in population means. A test statistic is a numerical value calculated from sample data in hypothesis testing.

- hypothesis testing: statistical test using a statement of a possible explanation for some conclusions
- mean: a statistical measurement also known as the average
- null hypothesis: in a statistical test, the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental ...

10.26: Hypothesis Test for a Population Mean (5 of 5) Interpret the P-value as a conditional probability. We finish our discussion of the hypothesis test for a population mean with a review of the meaning of the P-value, along with a review of type I and type II errors.
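One way to see the link between the significance level and type I errors is a small simulation: draw many samples from a population where the null hypothesis is true and count how often the test rejects anyway. This is only a sketch under assumed settings (normal population, known sigma, two-sided z-test); the function name and all parameters are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def type_i_error_rate(trials=2000, n=30, mu0=0.0, sigma=1.0,
                      alpha=0.05, seed=0):
    """Fraction of true-null samples that a two-sided z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # The null hypothesis is true: the population mean really is mu0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"observed type I error rate: {rate:.3f}")
```

The observed rate should hover around `alpha`, which is what it means for the p-value to be a probability computed on the condition that the null hypothesis is true.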

The plot was very much normally distributed. The means of X and Y were 4.2 and 15.8 (within ±0.15 of the population values) and the variances were 0.95 and 12.11. I performed a t-test on these two samples (1000 data points each) with unequal variances, because they are very different (0.95 and 12.11). And the null hypothesis was rejected.
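The unequal-variance t-test described here is usually called Welch's t-test. A sketch of it from summary statistics alone, using the means, variances, and sample sizes quoted above, might look like this; the function name `welch_t` is hypothetical.

```python
from math import sqrt

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) for Welch's unequal-variance t-test.

    df uses the Welch-Satterthwaite approximation.
    """
    v1, v2 = var1 / n1, var2 / n2
    se = sqrt(v1 + v2)  # standard error of the difference in means
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics quoted in the text (1000 observations per sample)
t_stat, df = welch_t(4.2, 0.95, 1000, 15.8, 12.11, 1000)
print(f"t = {t_stat:.1f}, df = {df:.1f}")
```

A t statistic this far from zero corresponds to a vanishingly small p-value, consistent with the rejection reported. If SciPy is available, `scipy.stats.ttest_ind_from_stats` with `equal_var=False` performs the same test but takes standard deviations rather than variances.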

Paired Differences Test: For a random sample of 20 data pairs, the sample mean of the differences was 2. The sample standard deviation of the differences was...

The more interesting question to me was how it was that someone who'd thought about decision analysis and hypothesis testing could have thought that "any decision [we] make ends up being a dichotomous claim based on a critical value, and I [Lakens] will be able to recompute that critical value into a p-value."