In conducting this experiment, the experimenter had two research questions:

- Does the dosage level of the medication have a statistically significant effect on the cholesterol level of a subject?
- How strong is the effect of dosage level on cholesterol level?

To answer these questions, the experimenter intends to use one-way analysis of variance.
Before you crunch the first number in one-way analysis of variance, you must be sure that one-way analysis of variance is the correct technique. That means you need to ask two questions:

- Is the experimental design compatible with one-way analysis of variance?
- Is the data set compatible with one-way analysis of variance?

Let's address both of those questions.
As we discussed in the previous lesson (see One-Way Analysis of Variance: Fixed Effects ), one-way analysis of variance is only appropriate with one experimental design - a completely randomized design. That is exactly the design used in our cholesterol study, so we can check the experimental design box.
We also learned in the previous lesson that one-way analysis of variance makes three critical assumptions:

- Independence. The value of the dependent variable for one observation is independent of the value for any other observation.
- Normality. In each treatment group, the dependent variable is normally distributed.
- Equality of variance. The variance of the dependent variable is the same in each treatment group.

Therefore, for the cholesterol study, we need to make sure our data set is consistent with the critical assumptions.
The assumption of independence is the most important assumption. When that assumption is violated, the resulting statistical tests can be misleading.
The independence assumption is satisfied by the design of the study, which features random selection of subjects and random assignment to treatment groups. Randomization tends to distribute effects of extraneous variables evenly across groups.
Violations of normality can be a problem when sample size is small, as it is in this cholesterol study. Therefore, it is important to be on the lookout for any indication of non-normality.
There are many different ways to check for normality. On this website, we describe three at: How to Test for Normality: Three Simple Tests . Given the small sample size, our best option for testing normality is to look at the following descriptive statistics:
The table below shows the mean, median, skewness, and kurtosis for each group from our study.
| | Group 1, 0 mg | Group 2, 50 mg | Group 3, 100 mg |
|---|---|---|---|
| Mean | 258 | 246 | 210 |
| Median | 270 | 240 | 210 |
| Range | 90 | 60 | 60 |
| Skewness | -0.40 | -0.51 | 0.00 |
| Kurtosis | -0.18 | -0.61 | 2.00 |
In all three groups, the difference between the mean and median looks small (relative to the range ). And skewness and kurtosis measures are consistent with a normal distribution (i.e., between -2 and +2). These are crude tests, but they provide some confidence for the assumption of normality in each group.
Note: With Excel, you can easily compute the descriptive statistics in Table 1. To see how, go to: How to Test for Normality: Example 1 .
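If you prefer a scripted workflow, the same descriptive statistics are easy to compute in R. The sketch below is illustrative only: the cholesterol values are hypothetical placeholders (the lesson does not reproduce the raw scores), so substitute the actual readings from the study. Skewness and excess kurtosis are computed directly from their definitions in base R.

```r
# Descriptive checks for normality, by dosage group (base R only).
# NOTE: the level values below are hypothetical placeholders, not the study data.
chol <- data.frame(
  dose  = rep(c("0 mg", "50 mg", "100 mg"), each = 5),
  level = c(270, 240, 210, 300, 270,
            240, 270, 210, 280, 230,
            210, 180, 210, 240, 210)
)

describe <- function(x) {
  z <- (x - mean(x)) / sd(x)
  c(mean     = mean(x),
    median   = median(x),
    range    = diff(range(x)),
    skewness = mean(z^3),        # sample skewness
    kurtosis = mean(z^4) - 3)    # excess kurtosis (about 0 for a normal curve)
}

round(t(sapply(split(chol$level, chol$dose), describe)), 2)
```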
When the normality assumption is satisfied, you can use Hartley's Fmax test to test for homogeneity of variance. Here's how to implement the test: compute the sample variance in each group, form the F ratio from the largest and smallest variances, and compare that ratio to a critical Fmax value. The variance ( s 2 j ) in each group is computed as:
s 2 j = Σ i ( X i, j - X j ) 2 / ( n j - 1 )
where X i, j is the score for observation i in Group j , X j is the mean of Group j , and n j is the number of observations in Group j .
Here is the variance ( s 2 j ) for each group in the cholesterol study.
| Group 1, 0 mg | Group 2, 50 mg | Group 3, 100 mg |
|---|---|---|
| 1170 | 630 | 450 |
F RATIO = s 2 MAX / s 2 MIN
F RATIO = 1170 / 450
F RATIO = 2.6
where s 2 MAX is the largest group variance, and s 2 MIN is the smallest group variance.
To find the critical Fmax value, you need two numbers: k, the number of groups, and df = n - 1, where n is the largest sample size in any group. For the cholesterol study, k = 3 and df = 5 - 1 = 4, which yields a critical Fmax value of 15.5.
Note: The critical F values in the table are based on a significance level of 0.05.
Here, the F ratio (2.6) is smaller than the Fmax value (15.5), so we conclude that the variances are homogeneous.
Note: Other tests, such as Bartlett's test , can also test for homogeneity of variance. For the record, Bartlett's test yields the same conclusion for the cholesterol study; namely, the variances are homogeneous.
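Both checks are easy to script in R. A minimal sketch, reusing the hypothetical `chol` data frame from the earlier snippet; `bartlett.test()` ships with base R's stats package.

```r
# Hartley's Fmax: ratio of the largest to the smallest group variance.
vars  <- tapply(chol$level, chol$dose, var)
f_max <- max(vars) / min(vars)   # compare against the critical Fmax value
f_max

# Bartlett's test of homogeneity of variance, as a cross-check.
bartlett.test(level ~ dose, data = chol)
```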
Having confirmed that the critical assumptions are tenable, we can proceed with a one-way analysis of variance. That means taking the following steps:

- Specify a mathematical model to describe how the independent variable and extraneous variables affect the dependent variable.
- Write statistical hypotheses to be tested by experimental data.
- Specify a significance level for the hypothesis test.
- Compute sums of squares, degrees of freedom, mean squares, and the F ratio.
- Find the P-value for the F ratio, and test the null hypothesis.
- Assess the magnitude of the treatment effect, using a measure of effect size.

Now, let's execute each step, one-by-one, with our cholesterol medication experiment.
For every experimental design, there is a mathematical model that accounts for all of the independent and extraneous variables that affect the dependent variable. In our experiment, the dependent variable ( X ) is the cholesterol level of a subject, and the independent variable is the dosage level administered to a subject; the effect of dosage level is represented by β.
For example, here is the fixed-effects model for a completely randomized design:
X i j = μ + β j + ε i ( j )
where X i j is the cholesterol level for subject i in treatment group j , μ is the population mean, β j is the effect of the dosage level administered to subjects in group j ; and ε i ( j ) is the effect of all other extraneous variables on subject i in treatment j .
For fixed-effects models, it is common practice to write statistical hypotheses in terms of the treatment effect β j . With that in mind, here is the null hypothesis and the alternative hypothesis for a one-way analysis of variance:
H 0 : β j = 0 for all j
H 1 : β j ≠ 0 for some j
If the null hypothesis is true, the mean score (i.e., mean cholesterol level) in each treatment group should equal the population mean. Thus, if the null hypothesis is true, mean scores in the k treatment groups should be equal. If the null hypothesis is false, at least one pair of mean scores should be unequal.
The significance level (also known as alpha or α) is the probability of rejecting the null hypothesis when it is actually true. The significance level for an experiment is specified by the experimenter, before data collection begins.
Experimenters often choose significance levels of 0.05 or 0.01. For this experiment, let's use a significance level of 0.05.
Analysis of variance begins by computing a grand mean and group means:

- Grand mean. The grand mean ( X ) is the mean of all n observations:

X = ( 1 / n ) Σ X i j

X = ( 1 / 15 ) * ( 210 + 210 + ... + 270 + 240 ) = 238

- Group means. The mean of Group j ( X j ) is the mean of the n j observations in Group j . For the cholesterol study, the group means are:

X 1 = 258

X 2 = 246

X 3 = 210
In the equations above, n is the total sample size across all groups; and n j is the sample size in Group j .
A sum of squares is the sum of squared deviations from a mean score. One-way analysis of variance makes use of three sums of squares:

- Between-groups sum of squares. The between-groups sum of squares (SSB) measures variation of the group means around the grand mean:

SSB = Σ n j ( X j - X ) 2

SSB = 5 * [ ( 238-258 ) 2 + ( 238-246 ) 2 + ( 238-210 ) 2 ] = 6240

- Within-groups sum of squares. The within-groups sum of squares (SSW) measures variation of each score around its own group mean:

SSW = Σ Σ ( X i j - X j ) 2

SSW = 2304 + ... + 900 = 9000

- Total sum of squares. The total sum of squares (SST) measures variation of all scores around the grand mean:

SST = Σ Σ ( X i j - X ) 2

SST = 784 + 4 + 1024 + ... + 784 + 784 + 4 = 15,240
It turns out that the total sum of squares is equal to the between-groups sum of squares plus the within-groups sum of squares, as shown below:
SST = SSB + SSW
15,240 = 6240 + 9000
The term degrees of freedom (df) refers to the number of independent sample points used to compute a statistic minus the number of parameters estimated from the sample points.
To illustrate what is going on, let's find the degrees of freedom associated with the various sum of squares computations:
Here, the formula uses k independent sample points, the sample means X j . And it uses one parameter estimate, the grand mean X , which was estimated from the sample points. So, the between-groups sum of squares has k - 1 degrees of freedom ( df BG ).
df BG = k - 1 = 3 - 1 = 2
Here, the formula uses n independent sample points, the individual subject scores X i j . And it uses k parameter estimates, the group means X j , which were estimated from the sample points. So, the within-groups sum of squares has n - k degrees of freedom ( df WG ).
n = Σ n i = 5 + 5 + 5 = 15
df WG = n - k = 15 - 3 = 12
Here, the formula uses n independent sample points, the individual subject scores X i j . And it uses one parameter estimate, the grand mean X , which was estimated from the sample points. So, the total sum of squares has n - 1 degrees of freedom ( df TOT ).
df TOT = n - 1 = 15 - 1 = 14
The degrees of freedom for each sum of squares are summarized in the table below:
| Sum of squares | Degrees of freedom |
|---|---|
| Between-groups | k - 1 = 2 |
| Within-groups | n - k = 12 |
| Total | n - 1 = 14 |
A mean square is an estimate of population variance. It is computed by dividing a sum of squares (SS) by its corresponding degrees of freedom (df), as shown below:
MS = SS / df
To conduct a one-way analysis of variance, we are interested in two mean squares:
MS WG = SSW / df WG
MS WG = 9000 / 12 = 750
MS BG = SSB / df BG
MS BG = 6240 / 2 = 3120
The expected value of a mean square is the average value of the mean square over a large number of experiments.
Statisticians have derived formulas for the expected value of the within-groups mean square ( MS WG ) and for the expected value of the between-groups mean square ( MS BG ). For one-way analysis of variance, the expected value formulas are:
E( MS WG ) = σ ε 2

E( MS BG ) = σ ε 2 + n Σ β j 2 / ( k - 1 )     (fixed-effects model)

E( MS BG ) = σ ε 2 + n σ β 2     (random-effects model)
In the equations above, E( MS WG ) is the expected value of the within-groups mean square; E( MS BG ) is the expected value of the between-groups mean square; n is the number of observations in each group (our design is balanced, with n = 5 per group); k is the number of treatment groups; β j is the treatment effect in Group j ; σ ε 2 is the variance attributable to everything except the treatment effect (i.e., all the extraneous variables); and σ β 2 is the variance due to random selection of treatment levels.
Notice that MS BG should equal MS WG when the variation due to treatment effects ( β j for fixed effects and σ β 2 for random effects) is zero (i.e., when the independent variable does not affect the dependent variable). And MS BG should be bigger than MS WG when the variation due to treatment effects is not zero (i.e., when the independent variable does affect the dependent variable).
Conclusion: By examining the relative size of the mean squares, we can make a judgment about whether an independent variable affects a dependent variable.
Suppose we use the mean squares to define a test statistic F as follows:
F(v 1 , v 2 ) = MS BG / MS WG
F(2, 12) = 3120 / 750 = 4.16
where MS BG is the between-groups mean square, MS WG is the within-groups mean square, v 1 is the degrees of freedom for MS BG , and v 2 is the degrees of freedom for MS WG .
Defined in this way, the F ratio measures the size of MS BG relative to MS WG . The F ratio is a convenient measure that we can use to test the null hypothesis. Here's how: if the null hypothesis is true, the expected values of MS BG and MS WG are equal, so the F ratio should be close to one. If the null hypothesis is false, the F ratio should be greater than one. Thus, when the F ratio is significantly greater than one, we can reject the null hypothesis.
What does it mean for the F ratio to be significantly greater than one? To answer that question, we need to talk about the P-value.
In an experiment, a P-value is the probability of obtaining a result more extreme than the observed experimental outcome, assuming the null hypothesis is true.
With analysis of variance, the F ratio is the observed experimental outcome that we are interested in. So, the P-value would be the probability that an F statistic would be more extreme (i.e., bigger) than the actual F ratio computed from experimental data.
We can use Stat Trek's F Distribution Calculator to find the probability that an F statistic will be bigger than the actual F ratio observed in the experiment. Enter the between-groups degrees of freedom (2), the within-groups degrees of freedom (12), and the observed F ratio (4.16) into the calculator; then, click the Calculate button.
From the calculator, we see that the P ( F > 4.16 ) equals about 0.04. Therefore, the P-Value is 0.04.
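You can reproduce this P-value without the calculator; R's F distribution function gives the same answer.

```r
# P(F > 4.16) with 2 and 12 degrees of freedom:
pf(4.16, df1 = 2, df2 = 12, lower.tail = FALSE)   # ~0.04
```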
Recall that we specified a significance level of 0.05 for this experiment. Once you know the significance level and the P-value, the hypothesis test is routine. Here's the decision rule for accepting or rejecting the null hypothesis: accept the null hypothesis if the P-value is greater than the significance level; reject the null hypothesis if the P-value is equal to or smaller than the significance level.
Since the P-value (0.04) in our experiment is smaller than the significance level (0.05), we reject the null hypothesis that drug dosage had no effect on cholesterol level. And we conclude that the mean cholesterol level in at least one treatment group differed significantly from the mean cholesterol level in another group.
The hypothesis test tells us whether the independent variable in our experiment has a statistically significant effect on the dependent variable, but it does not address the magnitude of the effect. Here's the issue: with a large enough sample, even a trivially small effect can be statistically significant, so statistical significance alone does not tell us whether the effect is big enough to matter in practice.
With this in mind, it is customary to supplement analysis of variance with an appropriate measure of effect size. Eta squared (η 2 ) is one such measure. Eta squared is the proportion of variance in the dependent variable that is explained by a treatment effect. The eta squared formula for one-way analysis of variance is:
η 2 = SSB / SST
where SSB is the between-groups sum of squares and SST is the total sum of squares.
Given this formula, we can compute eta squared for this drug dosage experiment, as shown below:
η 2 = SSB / SST = 6240 / 15240 = 0.41
Thus, 41 percent of the variance in our dependent variable (cholesterol level) can be explained by variation in our independent variable (dosage level). It appears that the relationship between dosage level and cholesterol level is significant not only in a statistical sense; it is significant in a practical sense as well.
It is traditional to summarize ANOVA results in an analysis of variance table. The analysis that we just conducted provides all of the information that we need to produce the following ANOVA summary table:
Analysis of Variance Table
| Source | SS | df | MS | F | P |
|---|---|---|---|---|---|
| BG | 6,240 | 2 | 3,120 | 4.16 | 0.04 |
| WG | 9,000 | 12 | 750 | | |
| Total | 15,240 | 14 | | | |
This ANOVA table allows any researcher to interpret the results of the experiment, at a glance.
The P-value (shown in the last column of the ANOVA table) is the probability that an F statistic would be more extreme (bigger) than the F ratio shown in the table, assuming the null hypothesis is true. When the P-value is bigger than the significance level, we accept the null hypothesis; when it is smaller, we reject it. Here, the P-value (0.04) is smaller than the significance level (0.05), so we reject the null hypothesis.
To assess the strength of the treatment effect, an experimenter might compute eta squared (η 2 ). The computation is easy, using sum of squares entries from the ANOVA table, as shown below:
η 2 = SSB / SST = 6,240 / 15,240 = 0.41
For this experiment, an eta squared of 0.41 means that 41% of the variance in the dependent variable can be explained by the effect of the independent variable.
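If you fit the model in software, eta squared falls straight out of the ANOVA table. A minimal sketch in R, assuming a model fit to the hypothetical `chol` data frame from the earlier snippet:

```r
# Eta squared = SSB / SST, extracted from an aov() fit.
fit <- aov(level ~ dose, data = chol)
ss  <- summary(fit)[[1]][["Sum Sq"]]   # c(between-groups SS, within-groups SS)
eta_sq <- ss[1] / sum(ss)
eta_sq
```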
In this lesson, we showed all of the hand calculations for a one-way analysis of variance. In the real world, researchers seldom conduct analysis of variance by hand. They use statistical software. In the next lesson, we'll analyze data from this problem with Excel. Hopefully, we'll get the same result.
Published on March 20, 2020 by Rebecca Bevans . Revised on June 22, 2023.
ANOVA (Analysis of Variance) is a statistical test used to analyze the difference between the means of more than two groups.
A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according to the levels of two categorical variables. Use a two-way ANOVA when you want to know how two independent variables, in combination, affect a dependent variable.
You can use a two-way ANOVA when you have collected data on a quantitative dependent variable at multiple levels of two categorical independent variables.
A quantitative variable represents amounts or counts of things. It can be divided to find a group mean.
A categorical variable represents types or categories of things. A level is an individual category within the categorical variable.
You should have enough observations in your data set to be able to find the mean of the quantitative dependent variable at each combination of levels of the independent variables.
Both of your independent variables should be categorical. If one of your independent variables is categorical and one is quantitative, use an ANCOVA instead.
ANOVA uses the F test to assess statistical significance. The F test is a groupwise comparison test, which means it compares the variance in each group mean to the overall variance in the dependent variable.
If the variance within groups is smaller than the variance between groups, the F test will find a higher F value, and therefore a higher likelihood that the difference observed is real and not due to chance.
A two-way ANOVA with interaction tests three null hypotheses at the same time:
A two-way ANOVA without interaction (a.k.a. an additive two-way ANOVA) only tests the first two of these hypotheses.
| Null hypothesis (H 0 ) | Alternate hypothesis (H a ) |
|---|---|
| There is no difference in average yield for any fertilizer type. | There is a difference in average yield by fertilizer type. |
| There is no difference in average yield at either planting density. | There is a difference in average yield by planting density. |
| The effect of one independent variable on average yield does not depend on the effect of the other independent variable (a.k.a. no interaction effect). | There is an interaction effect between planting density and fertilizer type on average yield. |
To use a two-way ANOVA your data should meet certain assumptions. Two-way ANOVA makes all of the normal assumptions of a parametric test of difference:
Homogeneity of variance (homoscedasticity). The variation around the mean for each group being compared should be similar among all groups. If your data don't meet this assumption, you may be able to use a non-parametric alternative , like the Kruskal-Wallis test.

Independence of observations. Your independent variables should not be dependent on one another (i.e. one should not cause the other). This is impossible to test with categorical variables – it can only be ensured by good experimental design .

In addition, your dependent variable should represent unique observations – that is, your observations should not be grouped within locations or individuals.

If your data don't meet this assumption (i.e. if you set up experimental treatments within blocks), you can include a blocking variable and/or use a repeated-measures ANOVA.

Normally-distributed dependent variable. The values of the dependent variable should follow a bell curve (they should be normally distributed ). If your data don't meet this assumption, you can try a data transformation.
The dataset from our imaginary crop yield experiment includes observations of:

- fertilizer type (three types)
- planting density (two levels)
- planting block (the block in the field where the treatment was applied)
- final crop yield
The two-way ANOVA will test whether the independent variables (fertilizer type and planting density) have an effect on the dependent variable (average crop yield). But there are some other possible sources of variation in the data that we want to take into account.
We applied our experimental treatment in blocks, so we want to know if planting block makes a difference to average crop yield. We also want to check if there is an interaction effect between two independent variables – for example, it’s possible that planting density affects the plants’ ability to take up fertilizer.
Because we have a few different possible relationships between our variables, we will compare three models:
Model 1 assumes there is no interaction between the two independent variables. Model 2 assumes that there is an interaction between the two independent variables. Model 3 assumes there is an interaction between the variables, and that the blocking variable is an important source of variation in the data.
By running all three versions of the two-way ANOVA with our data and then comparing the models, we can efficiently test which variables, and in which combinations, are important for describing the data, and see whether the planting block matters for average crop yield.
This is not the only way to do your analysis, but it is a good method for efficiently comparing models based on what you think are reasonable combinations of variables.
We will run our analysis in R. To try it yourself, download the sample dataset.
Sample dataset for a two-way ANOVA
After loading the data into the R environment, we will create each of the three models using the aov() command, and then compare them using the aictab() command. For a full walkthrough, see our guide to ANOVA in R .
This first model does not predict any interaction between the independent variables, so we put them together with a ‘+’.
In the second model, to test whether the interaction of fertilizer type and planting density influences the final yield, use a ‘ * ‘ to specify that you also want to know the interaction effect.
Because our crop treatments were randomized within blocks, we add this variable as a blocking factor in the third model. We can then compare our two-way ANOVAs with and without the blocking variable to see whether the planting location matters.
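The code blocks themselves are not reproduced on this page; the sketch below gathers the three candidate models, assuming the sample dataset has been read in as `crop.data` with factor columns `fertilizer`, `density`, and `block` and a numeric `yield` column (the column names are our assumption).

```r
# Model 1: additive (no interaction); independent variables joined with '+'.
two.way <- aov(yield ~ fertilizer + density, data = crop.data)

# Model 2: with interaction; '*' crosses the two independent variables.
interaction <- aov(yield ~ fertilizer * density, data = crop.data)

# Model 3: interaction plus the blocking variable.
blocking <- aov(yield ~ fertilizer * density + block, data = crop.data)
```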
Now we can find out which model is the best fit for our data using AIC ( Akaike information criterion ) model selection.
AIC calculates the best-fit model by finding the model that explains the largest amount of variation in the response variable while using the fewest parameters. We can perform a model comparison in R using the aictab() function.
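A sketch of that comparison, assuming the three fitted model objects from the snippet above; `aictab()` comes from the AICcmodavg package.

```r
library(AICcmodavg)

model.set   <- list(two.way, interaction, blocking)
model.names <- c("two.way", "interaction", "blocking")

# Ranks the candidate models by AICc, best fit listed first.
aictab(model.set, modnames = model.names)
```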
In the aictab() output, the model with the best fit is listed first, with the second-best listed next, and so on. This comparison reveals that the two-way ANOVA without any interaction or blocking effects is the best fit for the data.
You can view the summary of the two-way model in R using the summary() command. We will take a look at the results of the first model, which we found was the best fit for our data.
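For example, assuming the additive model object from the sketch above:

```r
# ANOVA table for the best-fitting (additive) model.
summary(two.way)
```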
The model summary first lists the independent variables being tested (‘fertilizer’ and ‘density’). Next is the residual variance (‘Residuals’), which is the variation in the dependent variable that isn’t explained by the independent variables.
The following columns provide all of the information needed to interpret the model:

- Df: the degrees of freedom for each variable
- Sum Sq: the sum of squares (the variation between the group means and the overall mean) explained by each variable
- Mean Sq: the mean sum of squares (Sum Sq divided by Df)
- F value: the test statistic (each variable's mean square divided by the residual mean square)
- Pr(>F): the p value of the F statistic (the probability of an F value at least this large if the null hypothesis is true)
From this output we can see that both fertilizer type and planting density explain a significant amount of variation in average crop yield ( p values < 0.001).
ANOVA will tell you which parameters are significant, but not which levels are actually different from one another. To test this we can use a post-hoc test. The Tukey’s Honestly-Significant-Difference (TukeyHSD) test lets us see which groups are different from one another.
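TukeyHSD() is built into base R and takes the fitted aov object. A one-line sketch, again assuming the additive model from the snippet above:

```r
# Pairwise comparisons for each factor, with 95% confidence intervals
# and p values adjusted for multiple comparisons.
TukeyHSD(two.way)
```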
This output shows the pairwise differences between the three types of fertilizer ($fertilizer) and between the two levels of planting density ($density), with the average difference (‘diff’), the lower and upper bounds of the 95% confidence interval (‘lwr’ and ‘upr’) and the p value of the difference (‘p-adj’).
From the post-hoc test results, we see that there are significant differences ( p < 0.05) between:

- fertilizer groups 3 and 1,
- fertilizer groups 3 and 2, and
- the two levels of planting density,

but no difference between fertilizer groups 2 and 1.
Once you have your model output, you can report the results in the results section of your thesis , dissertation or research paper .
When reporting the results you should include the F statistic, degrees of freedom, and p value from your model output.
You can discuss what these findings mean in the discussion section of your paper.
You may also want to make a graph of your results to illustrate your findings.
Your graph should include the groupwise comparisons tested in the ANOVA, with the raw data points, summary statistics (represented here as means and standard error bars), and letters or significance values above the groups to show which groups are significantly different from the others.
The only difference between one-way and two-way ANOVA is the number of independent variables . A one-way ANOVA has one independent variable, while a two-way ANOVA has two.
All ANOVAs are designed to test for differences among three or more groups. If you are only testing for a difference between two groups, use a t-test instead.
In ANOVA, the null hypothesis is that there is no difference among group means. If any group differs significantly from the overall group mean, then the ANOVA will report a statistically significant result.
Significant differences among group means are calculated using the F statistic, which is the ratio of the mean sum of squares (the variance explained by the independent variable) to the mean square error (the variance left over).
If the F statistic is higher than the critical value (the value of F that corresponds with your alpha value, usually 0.05), then the difference among groups is deemed statistically significant.
A factorial ANOVA is any ANOVA that uses more than one categorical independent variable . A two-way ANOVA is a type of factorial ANOVA.
Some examples of factorial ANOVAs include the two-way ANOVA (two categorical independent variables) and the three-way ANOVA (three categorical independent variables).
Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).
Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).
You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .
Statistics By Jim
Use one way ANOVA to compare the means of three or more groups. This analysis is an inferential hypothesis test that uses samples to draw conclusions about populations. Specifically, it tells you whether your sample provides sufficient evidence to conclude that the groups’ population means are different. ANOVA stands for analysis of variance.
To perform one-way ANOVA, you’ll need a continuous dependent (outcome) variable and a categorical independent variable to form the groups.
For example, one-way ANOVA can determine whether parts made from four materials have different mean strengths.
In this post, learn about the hypotheses, assumptions, and interpreting the results for one-way ANOVA.
Related post : Descriptive vs. Inferential Statistics and Independent and Dependent Variables .
One-way ANOVA has the following hypotheses:

- Null hypothesis: All group means are equal.
- Alternative hypothesis: At least one group mean is different from the others.
Reject the null when your p-value is less than your significance level (e.g., 0.05). The differences between the means are statistically significant. Your sample provides sufficiently strong evidence to conclude that the population means are not all equal.
Note that one-way ANOVA is an omnibus test, providing overall results for your data. It tells you whether any group means are different—Yes or No. However, it doesn’t specify which pairs of means are different. To make that determination, follow up a statistically significant one-way ANOVA with a post hoc test that can identify specific group differences that are significant.
Related posts : Interpreting P Values and Null Hypothesis Definition .
For reliable one-way ANOVA results, your data should satisfy the following assumptions:

- You have a random sample with independent observations and independent groups.
- Your dependent variable is continuous.
- Your group data follow the normal distribution, or the groups are large enough for the central limit theorem to apply.
- The groups have equal variances, or you use Welch's ANOVA.
Use random sampling to help ensure your sample represents your target population. If your data do not reflect the population, your one-way ANOVA results will not be valid.
Additionally, the method assumes your sampling method obtains independent observations. Selecting one subject does not affect the chances of choosing any others.
Finally, the procedure uses independent samples. Each group contains a unique set of items.
Related posts : Representative Samples: Definition, Uses & Examples and Independent and Dependent Samples
One-way ANOVA requires continuous data . Typically, you quantify continuous variables using a scale that can be meaningfully divided into smaller fractions. For example, temperature, mass, length, and duration are continuous data.
Learn more about Hypothesis Tests for Continuous, Binary, and Count Data .
One-way ANOVA assumes your group data follow the normal distribution . However, your groups can be skewed if your sample size is large enough because of the central limit theorem.
Sample size guidelines depend on the number of groups. For one-way ANOVA, unimodal data can be mildly skewed and the results will still be valid when all groups exceed the guideline sizes. Read here for more information about the simulation studies that support these sample size guidelines.
However, if your sample size is smaller, graph your data and determine whether the groups are skewed. If they are, you might need to use a nonparametric test . The Kruskal-Wallis test is the nonparametric test corresponding to one-way ANOVA.
Be sure to look for outliers because they can produce misleading results.
Related posts : Central Limit Theorem & Skewed Distributions
One-way ANOVA has two methods for handling group variances. The traditional F-test ANOVA assumes that all groups have equal variances. On the other hand, Welch’s ANOVA does not assume they are equal. If in doubt, just use Welch’s ANOVA because it works well for either case.
Related posts : Variances and Standard Deviations
Suppose we are a manufacturer testing four materials to make a part. We collect a random sample of parts made using the four materials and measure their strengths. Download the CSV dataset for this example: PostHocTests .
First, I’ll graph the data to see what we’re working with.
The bar chart shows differences between the group means. However, a graph doesn’t indicate whether those differences are due to chance during random sampling or reflect underlying population differences. One-way ANOVA can help us out with that!
Let’s use one-way ANOVA to determine whether the mean differences between these groups are statistically significant. Below are the statistical results.
The p-value of 0.004 is less than our significance level of 0.05. We reject the null and conclude that the four population means are not all equal. While the Means table shows the group means at the bottom, we don't know which differences between pairs of groups are statistically significant.
To perform pairwise comparisons between these four groups, we need to use a post hoc test, also known as multiple comparisons. To continue with this example and find the significant group differences, read my post Using Post Hoc Tests with ANOVA .
Related posts : How to do One-Way ANOVA in Excel
Lesson 10: Introduction to ANOVA

Overview
In the previous lessons, we learned how to perform inference for a population mean from one sample and also how to compare population means from two samples (independent and paired). In this Lesson, we introduce Analysis of Variance or ANOVA. ANOVA is a statistical method that analyzes variances to determine if the means from more than two populations are the same. In other words, we have a quantitative response variable and a categorical explanatory variable with more than two levels. In ANOVA, the categorical explanatory variable is typically referred to as the factor.
An ANOVA test is a statistical test used to determine if there is a statistically significant difference between two or more categorical groups by testing for differences of means using variance.
Another key part of ANOVA is that it splits the independent variable into two or more groups.
For example, one or more groups might be expected to influence the dependent variable, while the other group is used as a control group and is not expected to influence the dependent variable.
The assumptions of the ANOVA test are the same as the general assumptions for any parametric test:

- The dependent variable should be normally distributed in each group.
- Homogeneity of variance: the variance among the groups should be approximately equal.
- Independence: the observations should be independent of one another.
There are different types of ANOVA tests. The two most common are a “One-Way” and a “Two-Way.”
The difference between these two types depends on the number of independent variables in your test.
A one-way ANOVA (analysis of variance) has one categorical independent variable (also known as a factor) and a normally distributed continuous (i.e., interval or ratio level) dependent variable.
The independent variable divides cases into two or more mutually exclusive levels, categories, or groups.
The one-way ANOVA test for differences in the means of the dependent variable is broken down by the levels of the independent variable.
An example of a one-way ANOVA includes testing a therapeutic intervention (CBT, medication, placebo) on the incidence of depression in a clinical sample.
Note : Both the One-Way ANOVA and the Independent Samples t-Test can compare the means for two groups. However, only the One-Way ANOVA can compare the means across three or more groups.
A two-way ANOVA (analysis of variance) has two or more categorical independent variables (also known as a factor) and a normally distributed continuous (i.e., interval or ratio level) dependent variable.
The independent variables divide cases into two or more mutually exclusive levels, categories, or groups. A two-way ANOVA is also called a factorial ANOVA.
An example of a factorial ANOVA is testing the effects of social contact (high, medium, low), job status (employed, self-employed, unemployed, retired), and family history (no family history, some family history) on the incidence of depression in a population.
In ANOVA, “groups” or “levels” refer to the different categories of the independent variable being compared.
For example, if the independent variable is “eggs,” the levels might be Non-Organic, Organic, and Free Range Organic. The dependent variable could then be the price per dozen eggs.
The test statistic for an ANOVA is denoted as F . The formula for ANOVA is F = variance caused by treatment/variance due to random chance.
The ANOVA F value can tell you if there is a significant difference between the levels of the independent variable, when p < .05. So, a higher F value indicates that the treatment variables are significant.
Note that the ANOVA alone does not tell us specifically which means were different from one another. To determine that, we would need to follow up with multiple comparisons (or post-hoc) tests.
When the initial F test indicates that significant differences exist between group means, post hoc tests are useful for determining which specific means are significantly different when you do not have specific hypotheses that you wish to test.
Post hoc tests compare each pair of means (like t-tests), but unlike t-tests, they correct the significance estimate to account for the multiple comparisons.
Replication requires a study to be repeated with different subjects and experimenters. This would enable a statistical analyzer to confirm a prior study by testing the same hypothesis with a new sample.
For large datasets, it is best to run an ANOVA in statistical software such as R or Stata. Let’s refer to our Egg example above.
Non-Organic, Organic, and Free-Range Organic Eggs would be assigned quantitative values (1,2,3). They would serve as our independent treatment variable, while the price per dozen eggs would serve as the dependent variable. Other extraneous variables may include "Brand Name" or "Laid Egg Date."
Using data and the aov() command in R, we could then determine the impact Egg Type has on the price per dozen eggs.
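As a sketch of what that might look like: the data frame, column names, and simulated prices below are all hypothetical, invented purely for illustration.

```r
set.seed(1)  # reproducible simulated prices (hypothetical data)
eggs <- data.frame(
  egg_type = factor(rep(c("Non-Organic", "Organic", "Free-Range"), each = 10)),
  price    = c(rnorm(10, mean = 2.5, sd = 0.3),   # Non-Organic
               rnorm(10, mean = 4.0, sd = 0.4),   # Organic
               rnorm(10, mean = 5.5, sd = 0.5)))  # Free-Range

fit <- aov(price ~ egg_type, data = eggs)
summary(fit)  # F statistic and p value for the egg-type effect
```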
T-tests and ANOVA tests are both statistical techniques used to compare differences in means and spreads of the distributions across populations.
The t-test determines whether two populations are statistically different from each other, whereas ANOVA tests are used when an individual wants to test more than two levels within an independent variable.
Referring back to our egg example, testing Non-Organic vs. Organic would require a t-test while adding in Free Range as a third option demands ANOVA.
Rather than generate a t-statistic, ANOVA results in an f-statistic to determine statistical significance.
ANOVA stands for Analysis of Variance. It’s a statistical method to analyze differences among group means in a sample. ANOVA tests the hypothesis that the means of two or more populations are equal, generalizing the t-test to more than two groups.
It’s commonly used in experiments where various factors’ effects are compared. It can also handle complex experiments with factors that have different numbers of levels.
ANOVA should be used when one independent variable has three or more levels (categories or groups). It’s designed to compare the means of these multiple groups.
An ANOVA test tells you if there are significant differences between the means of three or more groups. If the test result is significant, it suggests that at least one group’s mean differs from the others. It does not, however, specify which groups are different from each other.
You use the chi-square test instead of ANOVA when dealing with categorical data to test associations or independence between two categorical variables. In contrast, ANOVA is used for continuous data to compare the means of three or more groups.
Hypothesis Testing - Analysis of Variance (ANOVA)
Lisa Sullivan, PhD
Professor of Biostatistics
Boston University School of Public Health
This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific test considered here is called analysis of variance (ANOVA) and is a test of hypothesis that is appropriate to compare means of a continuous variable in two or more independent comparison groups. For example, in some clinical trials there are more than two comparison groups. In a clinical trial to evaluate a new medication for asthma, investigators might compare an experimental medication to a placebo and to a standard treatment (i.e., a medication currently being used). In an observational study such as the Framingham Heart Study, it might be of interest to compare mean blood pressure or mean cholesterol levels in persons who are underweight, normal weight, overweight and obese.
The technique to test for a difference in more than two independent means is an extension of the two independent samples procedure discussed previously which applies when there are exactly two independent comparison groups. The ANOVA technique applies when there are two or more than two independent groups. The ANOVA procedure is used to compare the means of the comparison groups and is conducted using the same five step approach used in the scenarios discussed in previous sections. Because there are more than two groups, however, the computation of the test statistic is more involved. The test statistic must take into account the sample sizes, sample means and sample standard deviations in each of the comparison groups.
If one is examining the means observed among, say three groups, it might be tempting to perform three separate group to group comparisons, but this approach is incorrect because each of these comparisons fails to take into account the total data, and it increases the likelihood of incorrectly concluding that there are statistically significant differences, since each comparison adds to the probability of a type I error. Analysis of variance avoids these problems by asking a more global question, i.e., whether there are significant differences among the groups, without addressing differences between any two groups in particular (although there are additional tests that can do this if the analysis of variance indicates that there are differences among the groups).
The fundamental strategy of ANOVA is to systematically examine variability within groups being compared and also examine variability among the groups being compared.
After completing this module, the student will be able to:
Consider an example with four independent groups and a continuous outcome measure. The independent groups might be defined by a particular characteristic of the participants such as BMI (e.g., underweight, normal weight, overweight, obese) or by the investigator (e.g., randomizing participants to one of four competing treatments, call them A, B, C and D). Suppose that the outcome is systolic blood pressure, and we wish to test whether there is a statistically significant difference in mean systolic blood pressures among the four groups. The sample data are organized as follows:
| | Group 1 | Group 2 | Group 3 | Group 4 |
|---|---|---|---|---|
| Sample size | n 1 | n 2 | n 3 | n 4 |
| Sample mean | X 1 | X 2 | X 3 | X 4 |
| Sample standard deviation | s 1 | s 2 | s 3 | s 4 |
The hypotheses of interest in an ANOVA are as follows:

H 0 : μ 1 = μ 2 = μ 3 ... = μ k

H 1 : Means are not all equal

where k = the number of independent comparison groups.

In this example, the hypotheses are:

H 0 : μ 1 = μ 2 = μ 3 = μ 4

H 1 : The means are not all equal
The null hypothesis in ANOVA is always that there is no difference in means. The research or alternative hypothesis is always that the means are not all equal and is usually written in words rather than in mathematical symbols. The research hypothesis captures any difference in means and includes, for example, the situation where all four means are unequal, where one is different from the other three, where two are different, and so on. The alternative hypothesis, as shown above, capture all possible situations other than equality of all means specified in the null hypothesis.
The test statistic for testing H 0 : μ 1 = μ 2 = ... = μ k is F = MSB / MSE (the ratio of the between treatment mean square to the error mean square),
and the critical value is found in a table of probability values for the F distribution with (degrees of freedom) df 1 = k-1, df 2 =N-k. The table can be found in "Other Resources" on the left side of the pages.
NOTE: The test statistic F assumes equal variability in the k populations (i.e., the population variances are equal, or σ 1 2 = σ 2 2 = ... = σ k 2 ). This means that the outcome is equally variable in each of the comparison populations. This assumption is the same as that assumed for appropriate use of the test statistic to test equality of two independent means. It is possible to assess the likelihood that the assumption of equal variances is true and the test can be conducted in most statistical computing packages. If the variability in the k comparison groups is not similar, then alternative techniques must be used.
The F statistic is computed by taking the ratio of what is called the "between treatment" variability to the "residual or error" variability. This is where the name of the procedure originates. In analysis of variance we are testing for a difference in means (H 0 : means are all equal versus H 1 : means are not all equal) by evaluating variability in the data. The numerator captures between treatment variability (i.e., differences among the sample means) and the denominator contains an estimate of the variability in the outcome. The test statistic is a measure that allows us to assess whether the differences among the sample means (numerator) are more than would be expected by chance if the null hypothesis is true. Recall in the two independent sample test, the test statistic was computed by taking the ratio of the difference in sample means (numerator) to the variability in the outcome (estimated by Sp).
The decision rule for the F test in ANOVA is set up in a similar way to decision rules we established for t tests. The decision rule again depends on the level of significance and the degrees of freedom. The F statistic has two degrees of freedom. These are denoted df 1 and df 2 , and called the numerator and denominator degrees of freedom, respectively. The degrees of freedom are defined as follows:
df 1 = k-1 and df 2 =N-k,
where k is the number of comparison groups and N is the total number of observations in the analysis. If the null hypothesis is true, the between treatment variation (numerator) will not exceed the residual or error variation (denominator) and the F statistic will be small. If the null hypothesis is false, then the F statistic will be large. The rejection region for the F test is always in the upper (right-hand) tail of the distribution as shown below.
Rejection Region for F Test with a =0.05, df 1 =3 and df 2 =36 (k=4, N=40)
For the scenario depicted here, the decision rule is: Reject H 0 if F > 2.87.
We will next illustrate the ANOVA procedure using the five step approach. Because the computation of the test statistic is involved, the computations are often organized in an ANOVA table. The ANOVA table breaks down the components of variation in the data into variation between treatments and error or residual variation. Statistical computing packages also produce ANOVA tables as part of their standard output for ANOVA, and the ANOVA table is set up as follows:
| Source of Variation | Sums of Squares (SS) | Degrees of Freedom (df) | Mean Squares (MS) | F |
|---|---|---|---|---|
| Between Treatments | SSB | k - 1 | MSB = SSB / ( k - 1 ) | F = MSB / MSE |
| Error (or Residual) | SSE | N - k | MSE = SSE / ( N - k ) | |
| Total | SST | N - 1 | | |
where SSB = the between treatment sums of squares, SSE = the error (or residual) sums of squares, SST = the total sums of squares, k = the number of comparison groups, and N = the total number of observations.

The ANOVA table above is organized as follows. The between treatment sums of squares is

SSB = Σ n j ( X j - X ) 2

and is computed by summing the squared differences between each treatment (or group) mean and the overall mean. The squared differences are weighted by the sample sizes per group (n j ). The error sums of squares is

SSE = Σ j Σ i ( X i j - X j ) 2

and is computed by summing the squared differences between each observation and its group mean (i.e., the squared differences between each observation in group 1 and the group 1 mean, the squared differences between each observation in group 2 and the group 2 mean, and so on). The double summation ( Σ Σ ) indicates summation of the squared differences within each treatment and then summation of these totals across treatments to produce a single value. (This will be illustrated in the following examples). The total sums of squares is

SST = Σ j Σ i ( X i j - X ) 2

and is computed by summing the squared differences between each observation and the overall sample mean. In an ANOVA, data are organized by comparison or treatment groups. If all of the data were pooled into a single sample, SST would reflect the numerator of the sample variance computed on the pooled or total sample. SST does not figure into the F statistic directly. However, SST = SSB + SSE, thus if two sums of squares are known, the third can be computed from the other two.
A clinical trial is run to compare weight loss programs and participants are randomly assigned to one of the comparison programs and are counseled on the details of the assigned program. Participants follow the assigned program for 8 weeks. The outcome of interest is weight loss, defined as the difference in weight measured at the start of the study (baseline) and weight measured at the end of the study (8 weeks), measured in pounds.
Three popular weight loss programs are considered. The first is a low calorie diet. The second is a low fat diet and the third is a low carbohydrate diet. For comparison purposes, a fourth group is considered as a control group. Participants in the fourth group are told that they are participating in a study of healthy behaviors with weight loss only one component of interest. The control group is included here to assess the placebo effect (i.e., weight loss due to simply participating in the study). A total of twenty patients agree to participate in the study and are randomly assigned to one of the four diet groups. Weights are measured at baseline and patients are counseled on the proper implementation of the assigned diet (with the exception of the control group). After 8 weeks, each patient's weight is again measured and the difference in weights is computed by subtracting the 8 week weight from the baseline weight. Positive differences indicate weight losses and negative differences indicate weight gains. For interpretation purposes, we refer to the differences in weights as weight losses and the observed weight losses are shown below.
Low Calorie | Low Fat | Low Carbohydrate | Control |
---|---|---|---|
8 | 2 | 3 | 2 |
9 | 4 | 5 | 2 |
6 | 3 | 4 | -1 |
7 | 5 | 2 | 0 |
3 | 1 | 3 | 3 |
Is there a statistically significant difference in the mean weight loss among the four diets? We will run the ANOVA using the five-step approach.
H 0 : μ 1 = μ 2 = μ 3 = μ 4

H 1 : Means are not all equal

α = 0.05
The test statistic is the F statistic for ANOVA, F=MSB/MSE.
The appropriate critical value can be found in a table of probabilities for the F distribution(see "Other Resources"). In order to determine the critical value of F we need degrees of freedom, df 1 =k-1 and df 2 =N-k. In this example, df 1 =k-1=4-1=3 and df 2 =N-k=20-4=16. The critical value is 3.24 and the decision rule is as follows: Reject H 0 if F > 3.24.
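If you don't have an F table handy, the same critical value comes from R's quantile function for the F distribution.

```r
# Critical value for alpha = 0.05 with df1 = 3 and df2 = 16:
qf(0.95, df1 = 3, df2 = 16)   # ~3.24
```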
To organize our computations we complete the ANOVA table. In order to compute the sums of squares we must first compute the sample means for each group and the overall mean based on the total sample.
| | Low Calorie | Low Fat | Low Carbohydrate | Control |
|---|---|---|---|---|
| n | 5 | 5 | 5 | 5 |
| Group mean | 6.6 | 3.0 | 3.4 | 1.2 |
If we pool all N = 20 observations, the overall mean is 3.55. We can now compute

SSB = Σ n j ( X j - X ) 2

So, in this case:

SSB = 5 ( 6.6 - 3.55 ) 2 + 5 ( 3.0 - 3.55 ) 2 + 5 ( 3.4 - 3.55 ) 2 + 5 ( 1.2 - 3.55 ) 2 = 75.8

Next we compute

SSE = Σ j Σ i ( X i j - X j ) 2
SSE requires computing the squared differences between each observation and its group mean. We will compute SSE in parts. For the participants in the low calorie diet:
| X | X - 6.6 | ( X - 6.6 ) 2 |
|---|---|---|
| 8 | 1.4 | 2.0 |
| 9 | 2.4 | 5.8 |
| 6 | -0.6 | 0.4 |
| 7 | 0.4 | 0.2 |
| 3 | -3.6 | 13.0 |
| Totals | 0 | 21.4 |
For the participants in the low fat diet:
| X | X - 3.0 | ( X - 3.0 ) 2 |
|---|---|---|
| 2 | -1.0 | 1.0 |
| 4 | 1.0 | 1.0 |
| 3 | 0.0 | 0.0 |
| 5 | 2.0 | 4.0 |
| 1 | -2.0 | 4.0 |
| Totals | 0 | 10.0 |
For the participants in the low carbohydrate diet:
| X | X - 3.4 | ( X - 3.4 ) 2 |
|---|---|---|
| 3 | -0.4 | 0.2 |
| 5 | 1.6 | 2.6 |
| 4 | 0.6 | 0.4 |
| 2 | -1.4 | 2.0 |
| 3 | -0.4 | 0.2 |
| Totals | 0 | 5.4 |
For the participants in the control group:
| X | X - 1.2 | ( X - 1.2 ) 2 |
|---|---|---|
| 2 | 0.8 | 0.6 |
| 2 | 0.8 | 0.6 |
| -1 | -2.2 | 4.8 |
| 0 | -1.2 | 1.4 |
| 3 | 1.8 | 3.2 |
| Totals | 0 | 10.6 |
Therefore, SSE = 21.4 + 10.0 + 5.4 + 10.6 = 47.4. We can now construct the ANOVA table.
| Source of Variation | Sums of Squares (SS) | Degrees of Freedom (df) | Mean Squares (MS) | F |
|---|---|---|---|---|
| Between Treatments | 75.8 | 4-1=3 | 75.8/3=25.3 | 25.3/3.0=8.43 |
| Error (or Residual) | 47.4 | 20-4=16 | 47.4/16=3.0 | |
| Total | 123.2 | 20-1=19 | | |
We reject H 0 because 8.43 > 3.24. We have statistically significant evidence at α=0.05 to show that there is a difference in mean weight loss among the four diets.
ANOVA is a test that provides a global assessment of a statistical difference in more than two independent means. In this example, we find that there is a statistically significant difference in mean weight loss among the four diets considered. In addition to reporting the results of the statistical test of hypothesis (i.e., that there is a statistically significant difference in mean weight losses at α=0.05), investigators should also report the observed sample means to facilitate interpretation of the results. In this example, participants in the low calorie diet lost an average of 6.6 pounds over 8 weeks, as compared to 3.0 and 3.4 pounds in the low fat and low carbohydrate groups, respectively. Participants in the control group lost an average of 1.2 pounds which could be called the placebo effect because these participants were not participating in an active arm of the trial specifically targeted for weight loss. Are the observed weight losses clinically meaningful?
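Because the full data set appears in the table above, this example is easy to verify in statistical software. A sketch in R (the variable names are our own); note that the software works with unrounded intermediate values, so its F statistic differs slightly from the hand computation.

```r
# Weight losses from the table above, by diet group.
loss <- c(8, 9, 6, 7, 3,     # low calorie
          2, 4, 3, 5, 1,     # low fat
          3, 5, 4, 2, 3,     # low carbohydrate
          2, 2, -1, 0, 3)    # control
diet <- factor(rep(c("LowCal", "LowFat", "LowCarb", "Control"), each = 5))

# F ~ 8.6 on 3 and 16 df; the hand-computed 8.43 reflects intermediate rounding.
summary(aov(loss ~ diet))
```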
Calcium is an essential mineral that regulates the heart, is important for blood clotting and for building healthy bones. The National Osteoporosis Foundation recommends a daily calcium intake of 1000-1200 mg/day for adult men and women. While calcium is contained in some foods, most adults do not get enough calcium in their diets and take supplements. Unfortunately some of the supplements have side effects such as gastric distress, making them difficult for some patients to take on a regular basis.
A study is designed to test whether there is a difference in mean daily calcium intake in adults with normal bone density, adults with osteopenia (a low bone density which may lead to osteoporosis) and adults with osteoporosis. Adults 60 years of age with normal bone density, osteopenia and osteoporosis are selected at random from hospital records and invited to participate in the study. Each participant's daily calcium intake is measured based on reported food intake and supplements. The data are shown below.
| Normal Bone Density | Osteopenia | Osteoporosis |
|---|---|---|
| 1200 | 1000 | 890 |
| 1000 | 1100 | 650 |
| 980 | 700 | 1100 |
| 900 | 800 | 900 |
| 750 | 500 | 400 |
| 800 | 700 | 350 |
Is there a statistically significant difference in mean calcium intake in patients with normal bone density as compared to patients with osteopenia and osteoporosis? We will run the ANOVA using the five-step approach.
H 0 : μ 1 = μ 2 = μ 3

H 1 : Means are not all equal

α = 0.05
In order to determine the critical value of F we need degrees of freedom, df 1 =k-1 and df 2 =N-k. In this example, df 1 =k-1=3-1=2 and df 2 =N-k=18-3=15. The critical value is 3.68 and the decision rule is as follows: Reject H 0 if F > 3.68.
To organize our computations we will complete the ANOVA table. In order to compute the sums of squares we must first compute the sample means for each group and the overall mean.
| | Normal Bone Density | Osteopenia | Osteoporosis |
|---|---|---|---|
| n | 6 | 6 | 6 |
| Group mean | 938.3 | 800.0 | 715.0 |
If we pool all N=18 observations, the overall mean is 817.8.
We can now compute:

SSB = Σ n j ( X j - X ) 2

Substituting:

SSB = 6 ( 938.3 - 817.8 ) 2 + 6 ( 800.0 - 817.8 ) 2 + 6 ( 715.0 - 817.8 ) 2 = 152,477.7
SSE requires computing the squared differences between each observation and its group mean. We will compute SSE in parts. For the participants with normal bone density:
| X | X - 938.3 | ( X - 938.3 ) 2 |
|---|---|---|
| 1200 | 261.6667 | 68,486.9 |
| 1000 | 61.6667 | 3,806.9 |
| 980 | 41.6667 | 1,738.9 |
| 900 | -38.3333 | 1,466.9 |
| 750 | -188.333 | 35,456.9 |
| 800 | -138.333 | 19,126.9 |
| Total | 0 | 130,083.3 |
For participants with osteopenia:
| X | X - 800 | ( X - 800 ) 2 |
|---|---|---|
| 1000 | 200 | 40,000 |
| 1100 | 300 | 90,000 |
| 700 | -100 | 10,000 |
| 800 | 0 | 0 |
| 500 | -300 | 90,000 |
| 700 | -100 | 10,000 |
| Total | 0 | 240,000 |
For participants with osteoporosis:
| X | X - 715 | ( X - 715 ) 2 |
|---|---|---|
| 890 | 175 | 30,625 |
| 650 | -65 | 4,225 |
| 1100 | 385 | 148,225 |
| 900 | 185 | 34,225 |
| 400 | -315 | 99,225 |
| 350 | -365 | 133,225 |
| Total | 0 | 449,750 |
Therefore, SSE = 130,083.3 + 240,000 + 449,750 = 819,833.3. We can now construct the ANOVA table.

| Source of Variation | Sums of Squares (SS) | Degrees of Freedom (df) | Mean Squares (MS) | F |
|---|---|---|---|---|
| Between Treatments | 152,477.7 | 2 | 76,238.6 | 1.395 |
| Error or Residual | 819,833.3 | 15 | 54,655.5 | |
| Total | 972,311.0 | 17 | | |
We do not reject H 0 because 1.395 < 3.68. We do not have statistically significant evidence at α=0.05 to show that there is a difference in mean calcium intake in patients with normal bone density as compared to osteopenia and osteoporosis. Are the differences in mean calcium intake clinically meaningful? If so, what might account for the lack of statistical significance?
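Here too, the raw data are all given above, so the result can be checked in software. A sketch in R (the variable names are our own):

```r
# Daily calcium intake from the table above, by bone density group.
intake <- c(1200, 1000, 980, 900, 750, 800,    # normal bone density
            1000, 1100, 700, 800, 500, 700,    # osteopenia
            890, 650, 1100, 900, 400, 350)     # osteoporosis
group <- factor(rep(c("Normal", "Osteopenia", "Osteoporosis"), each = 6))

summary(aov(intake ~ group))   # F ~ 1.39 on 2 and 15 df, p > 0.05
```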
The video below by Mike Marin demonstrates how to perform analysis of variance in R. It also covers some other statistical issues, but the initial part of the video will be useful to you.
The ANOVA tests described above are called one-factor ANOVAs. There is one treatment or grouping factor with k > 2 levels and we wish to compare the means across the different categories of this factor. The factor might represent different diets, different classifications of risk for disease (e.g., osteoporosis), different medical treatments, different age groups, or different racial/ethnic groups. There are situations where it may be of interest to compare means of a continuous outcome across two or more factors. For example, suppose a clinical trial is designed to compare five different treatments for joint pain in patients with osteoarthritis. Investigators might also hypothesize that there are differences in the outcome by sex. This is an example of a two-factor ANOVA where the factors are treatment (with 5 levels) and sex (with 2 levels). In the two-factor ANOVA, investigators can assess whether there are differences in means due to the treatment, by sex or whether there is a difference in outcomes by the combination or interaction of treatment and sex. Higher order ANOVAs are conducted in the same way as one-factor ANOVAs presented here and the computations are again organized in ANOVA tables with more rows to distinguish the different sources of variation (e.g., between treatments, between men and women). The following example illustrates the approach.
Consider the clinical trial outlined above in which three competing treatments for joint pain are compared in terms of their mean time to pain relief in patients with osteoarthritis. Because investigators hypothesize that there may be a difference in time to pain relief in men versus women, they randomly assign 15 participating men to one of the three competing treatments and randomly assign 15 participating women to one of the three competing treatments (i.e., stratified randomization). Participating men and women do not know to which treatment they are assigned. They are instructed to take the assigned medication when they experience joint pain and to record the time, in minutes, until the pain subsides. The data (times to pain relief) are shown below and are organized by the assigned treatment and sex of the participant.
Table of Time to Pain Relief by Treatment and Sex
Treatment | Male | Female
---|---|---
Treatment A | 12 | 21
| 15 | 19
| 16 | 18
| 17 | 24
| 14 | 25
Treatment B | 14 | 21
| 17 | 20
| 19 | 23
| 20 | 27
| 17 | 25
Treatment C | 25 | 37
| 27 | 34
| 29 | 36
| 24 | 26
| 22 | 29
The analysis in two-factor ANOVA is similar to that illustrated above for one-factor ANOVA. The computations are again organized in an ANOVA table, but the total variation is partitioned into that due to the main effect of treatment, the main effect of sex and the interaction effect. The results of the analysis are shown below (and were generated with a statistical computing package - here we focus on interpretation).
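Although the results below were generated with a statistical computing package, the model is straightforward to fit yourself. A minimal R sketch (variable names are ours) using the data from the table above:

```r
# Time to pain relief by treatment (A, B, C) and sex, from the table above
relief <- data.frame(
  time = c(12, 15, 16, 17, 14,  21, 19, 18, 24, 25,   # Treatment A: men, then women
           14, 17, 19, 20, 17,  21, 20, 23, 27, 25,   # Treatment B: men, then women
           25, 27, 29, 24, 22,  37, 34, 36, 26, 29),  # Treatment C: men, then women
  treatment = factor(rep(c("A", "B", "C"), each = 10)),
  sex       = factor(rep(rep(c("Male", "Female"), each = 5), times = 3))
)

# treatment * sex fits both main effects and their interaction
summary(aov(time ~ treatment * sex, data = relief))
```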
ANOVA Table for Two-Factor ANOVA
Source of Variation | Sums of Squares (SS) | Degrees of freedom (df) | Mean Squares (MS) | F | P-Value
---|---|---|---|---|---
Model | 967.0 | 5 | 193.4 | 20.7 | 0.0001 |
Treatment | 651.5 | 2 | 325.7 | 34.8 | 0.0001 |
Sex | 313.6 | 1 | 313.6 | 33.5 | 0.0001 |
Treatment * Sex | 1.9 | 2 | 0.9 | 0.1 | 0.9054 |
Error or Residual | 224.4 | 24 | 9.4 | ||
Total | 1191.4 | 29 |
There are 4 statistical tests in the ANOVA table above. The first test is an overall test to assess whether there is a difference among the 6 cell means (cells are defined by treatment and sex). The F statistic is 20.7 and is highly statistically significant with p=0.0001. When the overall test is significant, focus then turns to the factors that may be driving the significance (in this example, treatment, sex or the interaction between the two). The next three statistical tests assess the significance of the main effect of treatment, the main effect of sex and the interaction effect. In this example, there is a highly significant main effect of treatment (p=0.0001) and a highly significant main effect of sex (p=0.0001). The interaction between the two does not reach statistical significance (p=0.91). The table below contains the mean times to pain relief in each of the treatments for men and women (Note that each sample mean is computed on the 5 observations measured under that experimental condition).
Mean Time to Pain Relief by Treatment and Gender
Treatment | Male | Female
---|---|---
A | 14.8 | 21.4 |
B | 17.4 | 23.2 |
C | 25.4 | 32.4 |
Treatment A appears to be the most efficacious treatment for both men and women. The mean times to relief are lower in Treatment A for both men and women and highest in Treatment C for both men and women. Across all treatments, women report longer times to pain relief (See below).
Notice that there is the same pattern of time to pain relief across treatments in both men and women (treatment effect). There is also a sex effect - specifically, time to pain relief is longer in women in every treatment.
Suppose that the same clinical trial is replicated in a second clinical site and the following data are observed.
Table - Time to Pain Relief by Treatment and Sex - Clinical Site 2
Treatment | Male | Female
---|---|---
Treatment A | 22 | 21
| 25 | 19
| 26 | 18
| 27 | 24
| 24 | 25
Treatment B | 14 | 21
| 17 | 20
| 19 | 23
| 20 | 27
| 17 | 25
Treatment C | 15 | 37
| 17 | 34
| 19 | 36
| 14 | 26
| 12 | 29
The ANOVA table for the data measured in clinical site 2 is shown below.
Table - Summary of Two-Factor ANOVA - Clinical Site 2
Source of Variation | Sums of Squares (SS) | Degrees of freedom (df) | Mean Squares (MS) | F | P-Value |
---|---|---|---|---|---|
Model | 907.0 | 5 | 181.4 | 19.4 | 0.0001 |
Treatment | 71.5 | 2 | 35.7 | 3.8 | 0.0362 |
Sex | 313.6 | 1 | 313.6 | 33.5 | 0.0001 |
Treatment * Sex | 521.9 | 2 | 260.9 | 27.9 | 0.0001 |
Error or Residual | 224.4 | 24 | 9.4 | ||
Total | 1131.4 | 29 |
Notice that the overall test is significant (F=19.4, p=0.0001), there is a significant treatment effect, sex effect and a highly significant interaction effect. The table below contains the mean times to relief in each of the treatments for men and women.
Table - Mean Time to Pain Relief by Treatment and Gender - Clinical Site 2
Treatment | Male | Female
---|---|---
A | 24.8 | 21.4
B | 17.4 | 23.2
C | 15.4 | 32.4
Notice that now the differences in mean time to pain relief among the treatments depend on sex. Among men, the mean time to pain relief is highest in Treatment A and lowest in Treatment C. Among women, the reverse is true. This is an interaction effect (see below).
Notice above that the treatment effect varies depending on sex. Thus, we cannot summarize an overall treatment effect (in men, treatment C is best, in women, treatment A is best).
When interaction effects are present, some investigators do not examine main effects (i.e., do not test for treatment effect because the effect of treatment depends on sex). This issue is complex and is discussed in more detail in a later module.
What is the One-Way ANOVA?
One-way ANOVA (ANOVA stands for Analysis Of Variance) is a statistical method used to determine whether there are significant differences between the means of two or more unrelated groups. This technique is particularly useful when you want to compare the effect of a single factor (independent variable) across different groups on a specific outcome (dependent variable).
While ANOVA is primarily used to compare differences, it often goes a step further by exploring cause-and-effect relationships. It suggests that the differences observed among groups are due to one or more controlled factors. Essentially, these factors categorize the data points into groups, leading to variations in the average outcomes of these groups.
Imagine we're curious about whether there's a difference in hair length between genders. We gather a group of twenty undergraduate students, half identified as female and half as male, and measure their hair length. Here, gender is the grouping factor (independent variable) and hair length is the outcome (dependent variable).
Most statisticians lean towards an assertive view of ANOVA as a tool for analyzing dependencies. This perspective sees ANOVA as not just comparing averages but testing the influence of one or more factors on an outcome. In statistical language, it's about examining how independent variables (like gender in our example) affect dependent variables (such as hair length), assuming a functional relationship (Y = f(x1, x2, x3, … xn)).
In essence, One-Way ANOVA is a powerful method for not only identifying significant differences between groups but also for hinting at potential underlying causes for these differences. It’s a foundational tool in the statistical analysis of data, enabling researchers to draw meaningful conclusions about the effects of various factors on specific outcomes.
The ANOVA is a popular test for experimental data because it requires only a nominal (categorical) scale for the independent variable – other multivariate tests (e.g., regression analysis) require a continuous-level independent variable. The following table shows the required scales for some selected tests.
 | Independent Variable (metric) | Independent Variable (non-metric)
---|---|---
Dependent Variable (metric) | Regression | ANOVA
Dependent Variable (non-metric) | Discriminant Analysis | χ² (Chi-Square)
The F-test, the t-test, and the MANOVA are all related to the ANOVA. The F-test is another name for an ANOVA that compares the means of only two groups; this happens when the independent variable has only two factor levels, for example male or female as gender.

The t-test likewise compares the means of two (and only two) groups, and Welch's version of it can be used when the variances are not equal. The equality of variances (also called homoscedasticity or homogeneity) is one of the main assumptions of the ANOVA (see assumptions; Levene test, Bartlett test). MANOVA stands for Multivariate Analysis of Variance. Whereas the ANOVA can have one or more independent variables, it always has only one dependent variable. The MANOVA, on the other hand, can have two or more dependent variables.
Typical questions the ANOVA answers have this form: do the mean values of a continuous outcome differ across the levels of a categorical factor – for example, do standardized math test scores differ between students who passed and students who failed the final exam?
The One-Way ANOVA in SPSS
Let’s consider our research question from the Education studies example. Do the standardized math test scores differ between students that passed the exam and students that failed the final exam? This question indicates that our independent variable is the exam result (fail vs. pass) and our dependent variable is the score from the math test. We must now check the assumptions.
First we examine the normality of the dependent variable. We can check graphically either with a histogram ( Analyze/Descriptive Statistics/Frequencies…, then the Charts… menu) or with a Q-Q plot ( Analyze/Descriptive Statistics/Q-Q Plot… ). Both plots suggest an approximately normal distribution with some skew around the mean.
Secondly, we can test for normality with the Kolmogorov-Smirnov goodness-of-fit test ( Analyze/Nonparametric Tests/Legacy Dialogs/1-Sample K-S… ). An alternative to the K-S test is the Chi-Square goodness-of-fit test, but the K-S test is more robust for continuous-level variables.

The K-S test is not significant (p = 0.075), thus we cannot reject the null hypothesis that the sample comes from a normal distribution. The K-S test is one of the few tests where a non-significant result (p > 0.05) is the desired outcome.
If normality is not present, we could exclude outliers, center the variable by subtracting the mean, or apply a non-linear transformation to the variable.
The ANOVA can be found in SPSS in Analyze/Compare Means/One Way ANOVA .
In the ANOVA dialog we need to specify our model. As described in the research question we want to test, the math test score is our dependent variable and the exam result is our independent variable. This would be enough for a basic analysis. But the dialog box has a couple more options around Contrasts, post hoc tests (also called multiple comparisons), and Options.
In the dialog box options we can specify additional statistics. If you find it useful you might include standard descriptive statistics. Generally you should select the Homogeneity of variance test (which is the Levene test of homoscedasticity), because as we find in our decision tree the outcome of this test is the criterion that decides between the t-test and the ANOVA.
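The same check can be run outside SPSS. As an illustration, here is a minimal R sketch using simulated stand-in data (the scores below are made up for demonstration; the car package provides one implementation of Levene's test):

```r
# Hypothetical stand-in data: math test scores for two exam-result groups
set.seed(1)
scores <- data.frame(
  math = c(rnorm(40, mean = 60, sd = 10), rnorm(40, mean = 55, sd = 10)),
  exam = factor(rep(c("pass", "fail"), each = 40))
)

# Levene's test of homogeneity of variance (requires the car package)
# install.packages("car")
car::leveneTest(math ~ exam, data = scores)
```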
Post Hoc Tests
Post hoc tests are useful if your independent variable includes more than two groups. In our example the independent variable just specifies the outcome of the final exam on two factor levels – pass or fail. If more than two factor levels are given, it can be useful to run pairwise tests to determine which differences between groups are significant. Because executing several pairwise tests in one analysis inflates the family-wise error rate (the chance of at least one false positive), the Bonferroni adjustment, which corrects for multiple pairwise comparisons, should be selected. Another commonly employed method is the Student-Newman-Keuls test (S-N-K for short), which pools the groups that do not differ significantly from each other, thereby increasing the sample size used in each comparison and improving the reliability of the post hoc comparison.
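For comparison, here is a minimal R sketch of post hoc comparisons on simulated stand-in data. S-N-K is not available in base R, so the sketch shows Bonferroni-adjusted pairwise t-tests and Tukey's HSD instead:

```r
# Hypothetical three-group data for illustration
set.seed(2)
dat <- data.frame(
  score = c(rnorm(20, 60, 10), rnorm(20, 55, 10), rnorm(20, 50, 10)),
  group = factor(rep(c("A", "B", "C"), each = 20))
)

# Bonferroni-adjusted pairwise t-tests
pairwise.t.test(dat$score, dat$group, p.adjust.method = "bonferroni")

# Tukey's HSD on a fitted ANOVA is another common choice
TukeyHSD(aov(score ~ group, data = dat))
```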
The last dialog box is Contrasts. A contrast is a weighted difference of mean scores; it allows you, for example, to pool two groups into one and test their combined mean against a third group. Please note that the contrast, (mean of first group + mean of second group)/2, is not always the pooled mean of the two groups! It equals the pooled mean only if the groups are of equal size. It is also possible to specify other weights for the contrasts, e.g., 0.7 for group 1 and 0.3 for group 2. We do not specify contrasts for this demonstration.
Matthew J. C. Crump
A fun bit of stats history (Salsburg 2001). Sir Ronald Fisher invented the ANOVA, which we learn about in this section. He wanted to publish his new test in the journal Biometrika. The editor at the time was Karl Pearson (remember Pearson's \(r\) for correlation?). Pearson and Fisher were apparently not on good terms; they didn't like each other. Pearson refused to publish Fisher's new test. So, Fisher eventually published his work in the Journal of Agricultural Science. Funnily enough, the feud continued onto the next generation. Years after Fisher published his ANOVA, Karl Pearson's son Egon Pearson and Jerzy Neyman revamped Fisher's ideas, and re-cast them into what is commonly known as null vs. alternative hypothesis testing. Fisher didn't like this very much.
We present the ANOVA in the Fisherian sense, and at the end describe the Neyman-Pearson approach that invokes the concept of null vs. alternative hypotheses.
ANOVA stands for Analysis Of Variance. It is a widely used technique for assessing the likelihood that differences found between means in sample data could be produced by chance. You might be thinking, well don’t we have \(t\) -tests for that? Why do we need the ANOVA, what do we get that’s new that we didn’t have before?
What’s new with the ANOVA, is the ability to test a wider range of means beyond just two. In all of the \(t\) -test examples we were always comparing two things. For example, we might ask whether the difference between two sample means could have been produced by chance. What if our experiment had more than two conditions or groups? We would have more than 2 means. We would have one mean for each group or condition. That could be a lot depending on the experiment. How would we compare all of those means? What should we do, run a lot of \(t\) -tests, comparing every possible combination of means? Actually, you could do that. Or, you could do an ANOVA.
In practice, we will combine both the ANOVA test and \(t\) -tests when analyzing data with many sample means (from more than two groups or conditions). Just like the \(t\) -test, there are different kinds of ANOVAs for different research designs. There is one for between-subjects designs, and a slightly different one for repeated measures designs. We talk about both, beginning with the ANOVA for between-subjects designs.
The one-factor ANOVA is sometimes also called a between-subjects ANOVA, an independent factor ANOVA, or a one-way ANOVA (which is a bit of a misnomer as we discuss later). The critical ingredient for a one-factor, between-subjects ANOVA, is that you have one independent variable, with at least two levels. When you have one IV with two levels, you can run a \(t\)-test. You can also run an ANOVA. Interestingly, they give you almost the exact same results. You will get a \(p\)-value from both tests that is identical (they are really doing the same thing under the hood). The \(t\)-test gives a \(t\)-value as the important sample statistic. The ANOVA gives you the \(F\)-value (for Fisher, the inventor of the test) as the important sample statistic. It turns out that \(t^2\) equals \(F\), when there are only two groups in the design. They are the same test. Side-note, it turns out they are all related to Pearson's r too (but we haven't written about this relationship yet in this textbook).
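You can check the \(t^2 = F\) relationship yourself. A minimal R sketch with made-up numbers:

```r
# Two groups only: the squared t equals the ANOVA F
set.seed(3)
a <- rnorm(10, mean = 100, sd = 10)
b <- rnorm(10, mean = 105, sd = 10)

t_val <- t.test(a, b, var.equal = TRUE)$statistic
f_val <- summary(aov(score ~ group,
                     data = data.frame(score = c(a, b),
                                       group = factor(rep(c("a", "b"), each = 10)))))[[1]]$`F value`[1]

t_val^2  # same number...
f_val    # ...as this one
```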
Remember that \(t\) is computed directly from the data. It’s like a mean and standard error that we measure from the sample. In fact it’s the mean difference divided by the standard error of the sample. It’s just another descriptive statistic isn’t it.
The same thing is true about \(F\) . \(F\) is computed directly from the data. In fact, the idea behind \(F\) is the same basic idea that goes into making \(t\) . Here is the general idea behind the formula, it is again a ratio of the effect we are measuring (in the numerator), and the variation associated with the effect (in the denominator).
\(\text{name of statistic} = \frac{\text{measure of effect}}{\text{measure of error}}\)
\(\text{F} = \frac{\text{measure of effect}}{\text{measure of error}}\)
The difference with \(F\) , is that we use variances to describe both the measure of the effect and the measure of error. So, \(F\) is a ratio of two variances.
Remember what we said about how these ratios work. When the variance associated with the effect is the same size as the variance associated with sampling error, we will get two of the same numbers, this will result in an \(F\) -value of 1. When the variance due to the effect is larger than the variance associated with sampling error, then \(F\) will be greater than 1. When the variance associated with the effect is smaller than the variance associated with sampling error, \(F\) will be less than one.
Let’s rewrite in plainer English. We are talking about two concepts that we would like to measure from our data. 1) A measure of what we can explain, and 2) a measure of error, or stuff about our data we can’t explain. So, the \(F\) formula looks like this:
\(\text{F} = \frac{\text{Can Explain}}{\text{Can't Explain}}\)
When we can explain as much as we can’t explain, \(F\) = 1. This isn’t that great of a situation for us to be in. It means we have a lot of uncertainty. When we can explain much more than we can’t we are doing a good job, \(F\) will be greater than 1. When we can explain less than what we can’t, we really can’t explain very much, \(F\) will be less than 1. That’s the concept behind making \(F\) .
If you saw an \(F\) in the wild, and it was .6, then you would automatically know the researchers couldn't explain much of their data. If you saw an \(F\) of 5, then you would know the researchers could explain 5 times more than they couldn't; that's pretty good. And the point of this is to give you an intuition about the meaning of an \(F\)-value, even before you know how to compute it.
Fisher's ANOVA is very elegant in my opinion. It starts us off with a big problem we always have with data. We have a lot of numbers, and there is a lot of variation in the numbers, what to do? Wouldn't it be nice to split up the variation into two kinds, or sources. If we could know what parts of the variation were being caused by our experimental manipulation, and what parts were being caused by sampling error, we would be making really good progress. We would be able to know if our experimental manipulation was causing more change in the data than sampling error, or chance alone. If we could measure those two parts of the total variation, we could make a ratio, and then we would have an \(F\) value. This is what the ANOVA does. It splits the total variation in the data into two parts. The formula is:
Total Variation = Variation due to Manipulation + Variation due to sampling error
This is a nice idea, but it is also vague. We haven’t specified our measure of variation. What should we use?
Remember the sums of squares that we used to make the variance and the standard deviation? That’s what we’ll use. Let’s take another look at the formula, using sums of squares for the measure of variation:
\(SS_\text{total} = SS_\text{Effect} + SS_\text{Error}\)
The total sums of squares, or \(SS_\text{Total}\), is a way of thinking about all of the variation in a set of data. It's pretty straightforward to measure. No tricky business. All we do is find the difference between each score and the grand mean, then we square the differences and add them all up.
Let’s imagine we had some data in three groups, A, B, and C. For example, we might have 3 scores in each group. The data could look like this:
groups | scores | diff | diff_squared |
---|---|---|---|
A | 20 | 13 | 169 |
A | 11 | 4 | 16 |
A | 2 | -5 | 25 |
B | 6 | -1 | 1 |
B | 2 | -5 | 25 |
B | 7 | 0 | 0 |
C | 2 | -5 | 25 |
C | 11 | 4 | 16 |
C | 2 | -5 | 25 |
Sums | 63 | 0 | 302 |
Means | 7 | 0 | 33.5555555555556 |
The data is organized in long format, so that each row is a single score. There are three scores for the A, B, and C groups. The mean of all of the scores is called the Grand Mean . It’s calculated in the table, the Grand Mean = 7.
We also calculated all of the difference scores from the Grand Mean . The difference scores are in the column titled diff . Next, we squared the difference scores, and those are in the next column called diff_squared .
Remember, the difference scores are a way of measuring variation. They represent how far each number is from the Grand Mean. If the Grand Mean represents our best guess at summarizing the data, the difference scores represent the error between the guess and each actual data point. The only problem with the difference scores is that they sum to zero (because the mean is the balancing point in the data). So, it is convenient to square the difference scores, this turns all of them into positive numbers. The size of the squared difference scores still represents error between the mean and each score. And, the squaring operation exacerbates the differences as the error grows larger (squaring a big number makes a really big number, squaring a small number still makes a smallish number).
OK fine! We have the squared deviations from the grand mean, we know that they represent the error between the grand mean and each score. What next? SUM THEM UP!
When you add up all of the individual squared deviations (difference scores) you get the sums of squares. That’s why it’s called the sums of squares (SS).
Now, we have the first part of our answer:
\(SS_\text{total} = 302\) and
\(302 = SS_\text{Effect} + SS_\text{Error}\)
What next? If you think back to what you learned about algebra, and solving for X, you might notice that we don’t really need to find the answers to both missing parts of the equation. We only need one, and we can solve for the other. For example, if we found \(SS_\text{Effect}\) , then we could solve for \(SS_\text{Error}\) .
\(SS_\text{Total}\) gave us a number representing all of the change in our data, how all the scores are different from the grand mean.
What we want to do next is estimate how much of the total change in the data might be due to the experimental manipulation. For example, if we ran an experiment that causes change in the measurement, then the means for each group will be different from each other. As a result, the manipulation forces change onto the numbers, and this will naturally mean that some part of the total variation in the numbers is caused by the manipulation.
The way to isolate the variation due to the manipulation (also called effect) is to look at the means in each group, and calculate the difference scores between each group mean and the grand mean, and then sum the squared deviations to find \(SS_\text{Effect}\) .
Consider this table, showing the calculations for \(SS_\text{Effect}\) .
groups | scores | means | diff | diff_squared |
---|---|---|---|---|
A | 20 | 11 | 4 | 16 |
A | 11 | 11 | 4 | 16 |
A | 2 | 11 | 4 | 16 |
B | 6 | 5 | -2 | 4 |
B | 2 | 5 | -2 | 4 |
B | 7 | 5 | -2 | 4 |
C | 2 | 5 | -2 | 4 |
C | 11 | 5 | -2 | 4 |
C | 2 | 5 | -2 | 4 |
Sums | 63 | 63 | 0 | 72 |
Means | 7 | 7 | 0 | 8 |
Notice we created a new column called means . For example, the mean for group A was 11. You can see there are three 11s, one for each observation in row A. The means for group B and C happen to both be 5. So, the rest of the numbers in the means column are 5s.
What we are doing here is thinking of each score in the data from the viewpoint of the group means. The group means are our best attempt to summarize the data in those groups. From the point of view of the mean, all of the numbers are treated as the same. The mean doesn’t know how far off it is from each score, it just knows that all of the scores are centered on the mean.
Let’s pretend you are the mean for group A. That means you are an 11. Someone asks you “hey, what’s the score for the first data point in group A?”. Because you are the mean, you say, I know that, it’s 11. “What about the second score?”…it’s 11… they’re all 11, so far as I can tell…“Am I missing something…”, asked the mean.
Now that we have converted each score to its mean value we can find the differences between each mean score and the grand mean, then square them, then sum them up. We did that, and found that the \(SS_\text{Effect} = 72\).
\(SS_\text{Effect}\) represents the amount of variation that is caused by differences between the means. I also refer to this as the amount of variation that the researcher can explain (by the means, which represent differences between groups or conditions that were manipulated by the researcher).
Notice also that \(SS_\text{Effect} = 72\) , and that 72 is smaller than \(SS_\text{total} = 302\) . That is very important. \(SS_\text{Effect}\) by definition can never be larger than \(SS_\text{total}\) .
Great, we made it to SS Error. We already found SS Total, and SS Effect, so now we can solve for SS Error just like this:
switching around:

\(SS_\text{Error} = SS_\text{total} - SS_\text{Effect}\)

\(SS_\text{Error} = 302 - 72 = 230\)
We could stop here and show you the rest of the ANOVA, we’re almost there. But, the next step might not make sense unless we show you how to calculate \(SS_\text{Error}\) directly from the data, rather than just solving for it. We should do this just to double-check our work anyway.
groups | scores | means | diff | diff_squared |
---|---|---|---|---|
A | 20 | 11 | -9 | 81 |
A | 11 | 11 | 0 | 0 |
A | 2 | 11 | 9 | 81 |
B | 6 | 5 | -1 | 1 |
B | 2 | 5 | 3 | 9 |
B | 7 | 5 | -2 | 4 |
C | 2 | 5 | 3 | 9 |
C | 11 | 5 | -6 | 36 |
C | 2 | 5 | 3 | 9 |
Sums | 63 | 63 | 0 | 230 |
Means | 7 | 7 | 0 | 25.5555555555556 |
Alright, we did almost the same thing as we did to find \(SS_\text{Effect}\). Can you spot the difference? This time for each score we first found the group mean, then we found the error in the group mean estimate for each score. In other words, the values in the \(diff\) column are the differences between each score and its group mean. The values in the diff_squared column are the squared deviations. When we sum up the squared deviations, we get another Sums of Squares, this time it's the \(SS_\text{Error}\). This is an appropriate name, because these deviations are the ones that the group means can't explain!
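Here is a minimal R sketch that computes all three sums of squares from the example scores and confirms that they add up:

```r
# The nine scores from the running example
scores <- c(20, 11, 2, 6, 2, 7, 2, 11, 2)
groups <- factor(rep(c("A", "B", "C"), each = 3))

grand_mean  <- mean(scores)        # 7
group_means <- ave(scores, groups) # each score replaced by its group mean

ss_total  <- sum((scores - grand_mean)^2)       # 302
ss_effect <- sum((group_means - grand_mean)^2)  # 72
ss_error  <- sum((scores - group_means)^2)      # 230

ss_effect + ss_error  # 302, matching ss_total
```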
Degrees of freedom come into play again with ANOVA. This time, their purpose is a little bit more clear. \(Df\) s can be fairly simple when we are doing a relatively simple ANOVA like this one, but they can become complicated when designs get more complicated.
Let’s talk about the degrees of freedom for the \(SS_\text{Effect}\) and \(SS_\text{Error}\) .
The formula for the degrees of freedom for \(SS_\text{Effect}\) is
\(df_\text{Effect} = \text{Groups} -1\) , where Groups is the number of groups in the design.
In our example, there are 3 groups, so the df is 3-1 = 2. You can think of the df for the effect this way. When we estimate the grand mean (the overall mean), we are taking away a degree of freedom for the group means. Two of the group means can be anything they want (they have complete freedom), but in order for all three to be consistent with the Grand Mean, the last group mean has to be fixed.
The formula for the degrees of freedom for \(SS_\text{Error}\) is
\(df_\text{Error} = \text{scores} - \text{groups}\) , or the number of scores minus the number of groups. We have 9 scores and 3 groups, so our \(df\) for the error term is 9-3 = 6. Remember, when we computed the difference score between each score and its group mean, we had to compute three means (one for each group) to do that. So, that reduces the degrees of freedom by 3. 6 of the difference scores could be anything they want, but the last 3 have to be fixed to match the means from the groups.
OK, so we have the degrees of freedom. What’s next? There are two steps left. First we divide the \(SS\) es by their respective degrees of freedom to create something new called Mean Squared Error. Let’s talk about why we do this.
First of all, remember we are trying to accomplish this goal:
We want to build a ratio that divides a measure of an effect by a measure of error. Perhaps you noticed that we already have a measure of an effect and error! How about the \(SS_\text{Effect}\) and \(SS_\text{Error}\) . They both represent the variation due to the effect, and the leftover variation that is unexplained. Why don’t we just do this?
\(\frac{SS_\text{Effect}}{SS_\text{Error}}\)
Well, of course you could do that. What would happen is you can get some really big and small numbers for your inferential statistic. And, the kind of number you would get wouldn’t be readily interpretable like a \(t\) value or a \(z\) score.
The solution is to normalize the \(SS\) terms. Don’t worry, normalize is just a fancy word for taking the average, or finding the mean. Remember, the SS terms are all sums. And, each sum represents a different number of underlying properties.
For example, the \(SS_\text{Effect}\) represents the sum of variation for the three means in our study. We might ask the question, well, what is the average amount of variation for each mean…You might think to divide \(SS_\text{Effect}\) by 3, because there are three means, but because we are estimating this property, we divide by the degrees of freedom instead (# groups − 1 = 3 − 1 = 2). Now we have created something new, it's called the \(MSE_\text{Effect}\).
\(MSE_\text{Effect} = \frac{SS_\text{Effect}}{df_\text{Effect}}\)
\(MSE_\text{Effect} = \frac{72}{2} = 36\)
This might look alien and seem a bit complicated. But, it's just another mean. It's the mean of the sums of squares for the effect. If this reminds you of the formula for the variance, good memory. The \(MSE_\text{Effect}\) is a measure of the variance for the change in the data due to changes in the means (which are tied to the experimental conditions).
The \(SS_\text{Error}\) represents the sum of variation for nine scores in our study. That's a lot more scores, so the \(SS_\text{Error}\) is often way bigger than \(SS_\text{Effect}\). If we left our SSes this way and divided them, we would almost always get numbers less than one, because the \(SS_\text{Error}\) is so big. What we need to do is bring it down to the average size. So, we might want to divide our \(SS_\text{Error}\) by 9, after all there were nine scores. However, because we are estimating this property, we divide by the degrees of freedom instead (scores − groups = 9 − 3 = 6). Now we have created something new, it's called the \(MSE_\text{Error}\).
\(MSE_\text{Error} = \frac{SS_\text{Error}}{df_\text{Error}}\)
\(MSE_\text{Error} = \frac{230}{6} = 38.33\)
Now that we have done all of the hard work, calculating \(F\) is easy:
\(\text{F} = \frac{MSE_\text{Effect}}{MSE_\text{Error}}\)
\(\text{F} = \frac{36}{38.33} = .939\)
You might suspect we aren’t totally done here. We’ve walked through the steps of computing \(F\) . Remember, \(F\) is a sample statistic, we computed \(F\) directly from the data. There were a whole bunch of pieces we needed, the dfs, the SSes, the MSEs, and then finally the F.
All of these little pieces are conveniently organized by ANOVA tables. ANOVA tables look like this:
Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
---|---|---|---|---|---|
groups | 2 | 72 | 36.00000 | 0.9391304 | 0.4417359 |
Residuals | 6 | 230 | 38.33333 | NA | NA |
You are looking at the print-out of an ANOVA summary table from R. Notice, it has columns for \(Df\), \(SS\) (Sum Sq), \(MSE\) (Mean Sq), \(F\), and a \(p\)-value. There are two rows. The groups row is for the Effect (what our means can explain). The Residuals row is for the Error (what our means can't explain). Different programs give slightly different labels, but they are all attempting to present the same information in the ANOVA table. There isn't anything special about the ANOVA table, it's just a way of organizing all the pieces. Notice, the MSE for the effect (36) is placed above the MSE for the error (38.333), and this seems natural because we divide 36/38.33 in order to get the \(F\)-value!
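For the record, a table like this is produced by a single function call. Continuing with the scores and groups vectors from the sketch above:

```r
# One call produces the full ANOVA table
summary(aov(scores ~ groups))
#             Df Sum Sq Mean Sq F value Pr(>F)
# groups       2     72   36.00   0.939  0.442
# Residuals    6    230   38.33
```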
We've just noted that the ANOVA has a bunch of numbers that we calculated straight from the data. All except one, the \(p\)-value. We did not calculate the \(p\)-value from the data. Where did it come from, and what does it mean? How do we use it for statistical inference? Just so you don't get too worried, the \(p\)-value for the ANOVA has the very same general meaning as the \(p\)-value for the \(t\)-test, or the \(p\)-value for any sample statistic. It tells us the probability that we would observe our test statistic or larger, under the distribution of no differences (the null).
As we keep saying, \(F\) is a sample statistic. Can you guess what we do with sample statistics in this textbook? We did it for the Crump Test, the Randomization Test, and the \(t\) -test… We make fake data, we simulate it, we compute the sample statistic we are interested in, then we see how it behaves over many replications or simulations.
Let's do that for \(F\). This will help you understand what \(F\) really is, and how it behaves. We are going to create the sampling distribution of \(F\). Once we have that you will be able to see where the \(p\)-values come from. It's the same basic process that we followed for the \(t\) tests, except we are measuring \(F\) instead of \(t\).
Here is the set-up, we are going to run an experiment with three levels. In our imaginary experiment we are going to test whether a new magic pill can make you smarter. The independent variable is the number of magic pills you take: 1, 2, or 3. We will measure your smartness using a smartness test. We will assume the smartness test has some known properties, the mean score on the test is 100, with a standard deviation of 10 (and the distribution is normal).
The only catch is that our magic pill does NOTHING AT ALL. The fake people in our fake experiment will all take sugar pills that do absolutely nothing to their smartness. Why would we want to simulate such a bunch of nonsense? The answer is that this kind of simulation is critical for making inferences about chance if you were to conduct a real experiment.
Here are some more details for the experiment. Each group will have 10 different subjects, so there will be a total of 30 subjects. We are going to run this experiment 10,000 times. Each time drawing numbers randomly from the very same normal distribution. We are going to calculate \(F\) from our sample data every time, and then we are going to draw the histogram of \(F\) -values. Figure 7.1 shows the sampling distribution of \(F\) for our situation.
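If you want to reproduce this kind of simulation, here is a minimal R sketch of the procedure just described (the seed is arbitrary):

```r
# Simulate 10,000 null experiments: 3 groups of 10, all from Normal(100, 10)
set.seed(4)
f_vals <- replicate(10000, {
  smartness <- rnorm(30, mean = 100, sd = 10)
  pills     <- factor(rep(c("one", "two", "three"), each = 10))
  summary(aov(smartness ~ pills))[[1]]$`F value`[1]
})

hist(f_vals, breaks = 100, main = "Sampling distribution of F under the null")
quantile(f_vals, 0.95)       # simulated 95th percentile
qf(0.95, df1 = 2, df2 = 27)  # exact critical value, about 3.35
```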
Let's note a couple things about the \(F\) distribution. 1) The smallest value is 0, and there are no negative values. Does this make sense? \(F\) can never be negative because it is the ratio of two variances, and variances are always positive because of the squaring operation. So, yes, it makes sense that the sampling distribution of \(F\) is always 0 or greater. 2) It does not look normal. No it does not. \(F\) can have many different looking shapes, depending on the degrees of freedom in the numerator and denominator. However, these aspects are not too important for now.
Remember, before we talked about some intuitive ideas for understanding \(F\), based on the idea that \(F\) is a ratio of what we can explain (variance due to mean differences), divided by what we can't explain (the error variance). When the error variance is larger than the effect variance, we will always get an \(F\)-value less than one. You can see that we often got \(F\)-values less than one in the simulation. This is sensible; after all, we were simulating samples coming from the very same distribution. On average there should be no differences between the means. So, on average the variance explained by the means should be roughly the same as the error variance, and the \(F\)-values should be around one or smaller.
At the same time, we do see that some \(F\)-values are larger than 1. There are little bars that we can see going all the way up to about 5. If you were to get an \(F\)-value of 5, you might automatically think, that's a pretty big \(F\)-value. Indeed it kind of is, it means that you can explain 5 times more of the variance than you can't explain. That seems like a lot. You can also see that larger \(F\)-values don't occur very often. As a final reminder, what you are looking at is how the \(F\)-statistic (measured from each of 10,000 simulated experiments) behaves when the only thing that can cause differences in the means is random sampling error. Just by chance sometimes the means will be different. You are looking at another chance window. These are the \(F\)s that chance can produce.
We can use the sampling distribution of \(F\) (for the null) to make decisions about the role of chance in a real experiment. For example, we could find the critical value of \(F\) that cuts off the top 5% of the distribution, and then treat any \(F\) larger than that critical value as unlikely to have been produced by chance.
Let’s do that. I’ve drawn the line for the critical value onto the histogram in Figure 7.2 :
Alright, now we can see that only 5% of all \(F\)-values from this sampling distribution will be 3.35 or larger. We can use this information.
How would we use it? Imagine we ran a real version of this experiment. And, we really used some pills that just might change smartness. If we ran the exact same design, with 30 people in total (10 in each group), we could set an \(F\) criterion of 3.35 for determining whether any of our results reflected a causal change in smartness due to the pills, and not due to random chance. For example, if we found an \(F\)-value larger than 3.35, which happens less than 5% of the time, we might conclude that random sampling error did not produce the differences between our means. Instead, we might be more confident that the pills actually did something; after all, an \(F\)-value that large doesn't happen very often, it is unlikely (only 5 times out of 100) to occur by chance.
Up to here we have been building your intuition for understanding \(F\). We went through the calculation of \(F\) from sample data. We went through the process of simulating thousands of \(F\)s to show you the null distribution. We have not talked so much about what researchers really care about…The MEANS! The actual results from the experiment. Were the means different? That's often what people want to know. So, now we will talk about the means, and \(F\), together.
Notice, if I told you I ran an experiment with three groups, testing whether some manipulation changes the behavior of the groups, and I told you that I found a big \(F\), say an \(F\) of 6! And that the \(F\) of 6 had a \(p\)-value of .001. What would you know based on that information alone? You would only know that \(F\)s of 6 don't happen very often by chance. In fact they only happen 0.1% of the time, that's hardly at all. If someone told me those values, I would believe that the results they found in their experiment were not likely due to chance. However, I still would not know what the results of the experiment were! Nobody told us what the means were in the different groups, we don't know what happened!
IMPORTANT: even though we don't know what the means were, we do know something about them, whenever we get \(F\)-values and \(p\)-values like that (big \(F\)s, and very small associated \(p\)s)… Can you guess what we know? I'll tell you. We automatically know that there must have been some differences between the means. If there were no differences between the means, then the variance explained by the means (the numerator for \(F\)) would not be very large. So, we know that there must be some differences; we just don't know what they are. Of course, if we had the data, all we would need to do is look at the means for the groups (the ANOVA table doesn't report this, we need to do it as a separate step).
This property of the ANOVA is why the ANOVA is sometimes called the omnibus test . Omnibus is a fun word, it sounds like a bus I’d like to ride. The meaning of omnibus, according to the dictionary, is “comprising several items”. The ANOVA is, in a way, one omnibus test, comprising several little tests.
For example, if you had three groups, A, B, and C, you could get differences between: A and B, A and C, and B and C.
That’s three possible differences you could get. You could run separate \(t\) -tests, to test whether each of those differences you might have found could have been produced by chance. Or, you could run an ANOVA, like what we have been doing, to ask one more general question about the differences. Here is one way to think about what the omnibus test is testing:
Hypothesis of no differences anywhere: \(A = B = C\)

Any differences anywhere: at least one mean differs from another (e.g., \(A \neq B\), or \(A \neq C\), or \(B \neq C\))
The \(\neq\) symbol means “does not equal”, it’s an equal sign with a cross through it (no equals allowed!).
How do we put all of this together? Generally, when we get a small \(F\)-value, with a large \(p\)-value, we will not reject the hypothesis of no differences. We will say that we do not have evidence that the means of the three groups are in any way different, and the differences that are there could easily have been produced by chance. When we get a large \(F\) with a small \(p\)-value (one that is below our alpha criterion), we will generally reject the hypothesis of no differences. We would then assume that at least one group mean is not equal to one of the others. That is the omnibus test. Rejecting the null in this way is rejecting the idea there are no differences. But, the \(F\) test still does not tell you which of the possible group differences are the ones that are different.
We just ran 10,000 experiments and we didn’t even once look at the group means for any of the experiments. Different patterns of group means under the null are shown in Figure 7.3 for a subset of 10 random simulations.
Whoa, that's a lot to look at. What is going on here? Each little box represents the outcome of a simulated experiment. The dots are the means for each group (whether subjects took 1, 2, or 3 magic pills). The y-axis shows the mean smartness for each group. The error bars are standard errors of the mean.
You can see that each of the 10 experiments turn out different. Remember, we sampled 10 numbers for each group from the same normal distribution with mean = 100, and sd = 10. So, we know that the correct means for each sample should actually be 100 every single time. However, they are not 100 every single time because of?… sampling error (Our good friend that we talk about all the time).
For most of the simulations the error bars are all overlapping, this suggests visually that the means are not different. However, some of them look like they are not overlapping so much, and this would suggest that they are different. This is the siren song of chance (sirens lured sailors to their deaths at sea…beware of the siren call of chance). If we concluded that any of these sets of means had a true difference, we would be committing a type I error. Because we made the simulation, we know that none of these means are actually different. But, when you are running a real experiment, you don’t get to know this for sure.
Let’s look at the exact same graph as above, but this time use bars to visually illustrate the means, instead of dots. We’ll re-do our simulation of 10 experiments, so the pattern will be a little bit different:
In Figure 7.4 the heights of the bars display the means for each pill group. The pattern across simulations is generally the same. Some of the fake experiments look like there might be differences, and some of them don’t.
We are now giving you some visual experience looking at what means look like from a particular experiment. This is for your stats intuition. We’re trying to improve your data senses.
What we are going to do now is similar to what we did before. Except this time we are going to look at 10 simulated experiments, where all of the \(F\) -values were less than 1. All of these \(F\) -values would also be associated with fairly large \(p\) -values. When F is less than one, we would not reject the hypothesis of no differences. So, when we look at patterns of means when F is less than 1, we should see mostly the same means, and no big differences.
In Figure 7.5 the numbers in the panels now tell us which simulations actually produced \(F\) s of less than 1.
We see here that all the bars aren’t perfectly flat, that’s OK. What’s more important is that for each panel, the error bars for each mean are totally overlapping with all the other error bars. We can see visually that our estimate of the mean for each sample is about the same for all of the bars. That’s good, we wouldn’t make any type I errors here.
Earlier we found that the critical value for \(F\) in our situation was 3.35, this was the location on the \(F\) distribution where only 5% of \(F\) s were 3.35 or greater. We would reject the hypothesis of no differences whenever \(F\) was greater than 3.35. In this case, whenever we did that, we would be making a type I error. That is because we are simulating the distribution of no differences (remember all of our sample means are coming from the exact same distribution). So, now we can take a look at what type I errors look like. In other words, we can run some simulations and look at the pattern in the means, only when \(F\) happens to be 3.35 or greater (this only happens 5% of the time, so we might have to let the computer simulate for a while). Let’s see what that looks like:
The numbers in the panels now tell us which simulations actually produced \(F\)s that were greater than 3.35.
What do you notice about the pattern of means inside each panel of Figure 7.6? Now, every panel shows at least one mean that is different from the others. Specifically, the error bars for one mean do not overlap with the error bars for one or another mean. This is what mistakes look like. These are all type I errors. They are insidious. When they happen to you by chance, the data really does appear to show a strong pattern, your \(F\)-value is large, and your \(p\)-value is small! It is easy to be convinced by a type I error (it's the siren song of chance).
We've covered many fundamentals about the ANOVA, how to calculate the necessary values to obtain an \(F\)-statistic, and how to interpret the \(F\)-statistic along with its associated \(p\)-value once we have one. In general, you will be conducting ANOVAs and playing with \(F\)s and \(p\)s using software that will automatically spit out the numbers for you. It's important that you understand what the numbers mean; that's why we've spent time on the concepts. We also recommend that you try to compute an ANOVA by hand at least once. It builds character, and lets you know that you know what you are doing with the numbers.
But, we've probably also lost the real thread of all this. The core thread is that when we run an experiment we use our inferential statistics, like ANOVA, to help us determine whether the differences we found are likely due to chance or not. In general, we like to find out that the differences that we find are not due to chance, but instead due to our manipulation.
So, we return to the application of the ANOVA to a real data set with a real question. This is the same one that you will be learning about in the lab. We give you a brief overview here so you know what to expect.
Yup, you read that right. The research you will learn about tests whether playing Tetris after watching a scary movie can help prevent you from having bad memories from the movie (James et al. 2015). Sometimes in life people have intrusive memories, and they think about things they'd rather not have to think about. This research looks at one method that could reduce the frequency of intrusive memories.
Here's what they did. Subjects watched a scary movie, then at the end of the week they reported how many intrusive memories about the movie they had. The mean number of intrusive memories was the measurement (the dependent variable). This was a between-subjects experiment with four groups. Each group of subjects received a different treatment following the scary movie. The question was whether any of these treatments would reduce the number of intrusive memories. All of these treatments occurred after watching the scary movie: a no-task control, memory reactivation plus playing Tetris, playing Tetris only, and memory reactivation only.
For reasons we elaborate on in the lab, the researchers hypothesized that the Reactivation+Tetris group would have fewer intrusive memories over the week than the other groups.
Let’s look at the findings. Note you will learn how to do all of these steps in the lab. For now, we just show the findings and the ANOVA table. Then we walk through how to interpret it.
OOooh, look at that. We did something fancy. Figure 7.7 shows the data from the four groups. The height of each bar shows the mean intrusive memories for the week. The dots show the individual scores for each subject in each group (useful to see the spread of the data). The error bars show the standard errors of the mean.
What can we see here? Right away it looks like there is some support for the research hypothesis. The green bar, for the Reactivation + Tetris group had the lowest mean number of intrusive memories. Also, the error bar is not overlapping with any of the other error bars. This implies that the mean for the Reactivation + Tetris group is different from the means for the other groups. And, this difference is probably not very likely by chance.
We can now conduct the ANOVA on the data to ask the omnibus question. If we get an \(F\)-value with an associated \(p\)-value of less than .05 (the alpha criterion set by the authors), then we can reject the hypothesis of no differences. Let's see what happens:
Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
---|---|---|---|---|---|
Condition | 3 | 114.8194 | 38.27315 | 3.794762 | 0.0140858 |
Residuals | 68 | 685.8333 | 10.08578 | NA | NA |
We see the ANOVA table, it’s up there. We could report the results from the ANOVA table like this:
There was a significant main effect of treatment condition, F(3, 68) = 3.79, MSE = 10.08, p=0.014.
We called this a significant effect because the \(p\) -value was less than 0.05. In other words, the \(F\) -value of 3.79 only happens 1.4% of the time when the null is true. Or, the differences we observed in the means only occur by random chance (sampling error) 1.4% of the time. Because chance rarely produces this kind of result, the researchers made the inference that chance DID NOT produce their differences, instead, they were inclined to conclude that the Reactivation + Tetris treatment really did cause a reduction in intrusive memories. That’s pretty neat.
Remember that the ANOVA is an omnibus test, it just tells us whether we can reject the idea that all of the means are the same. The F-test (synonym for ANOVA) that we just conducted suggested we could reject the hypothesis of no differences. As we discussed before, that must mean that there are some differences in the pattern of means.
Generally after conducting an ANOVA, researchers will conduct follow-up tests to compare differences between specific means. We will talk more about this practice throughout the textbook. There are many recommended practices for follow-up tests, and there is a lot of debate about what you should do. We are not going to wade into this debate right now. Instead we are going to point out that you need to do something to compare the means of interest after you conduct the ANOVA, because the ANOVA is just the beginning…It usually doesn't tell you what you want to know. You might wonder why bother conducting the ANOVA in the first place…Not a terrible question at all. A good question. You will see as we talk about more complicated designs why ANOVAs are so useful. In the present example, they are just a common first step. There are required next steps, such as what we do next.
How can you compare the difference between two means, from a between-subjects design, to determine whether or not the difference you observed is likely or unlikely to be produced by chance? We covered this one already, it’s the independent \(t\) -test. We’ll do a couple \(t\) -tests, showing the process.
What we really want to know is if Reactivation+Tetris caused fewer intrusive memories…but compared to what? Well, if it did something, the Reactivation+Tetris group should have a smaller mean than the Control group. So, let’s do that comparison:
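Here is the shape of that comparison in R, as a sketch. The published scores are not reprinted here, so the vectors below are simulated stand-ins with the reported group means, and the output will not exactly match the reported values:

```r
# Simulated stand-ins for the real scores (18 per group), not the published data
set.seed(5)
control      <- rnorm(18, mean = 5.11, sd = 3.5)
react_tetris <- rnorm(18, mean = 1.89, sd = 3.5)

# Independent-samples t-test with pooled variance, giving df = 34
t.test(control, react_tetris, var.equal = TRUE)
```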
We found that there was a significant difference between the control group (M=5.11) and Reactivation + Tetris group (M=1.89), t(34) = 2.99, p=0.005.
Above you just saw an example of reporting another \(t\)-test. This sentence does an OK job of telling the reader everything they want to know. It has the means for each group, and the important bits from the \(t\)-test.

More importantly, as we suspected, the difference between the control and Reactivation + Tetris group was likely not due to chance.
Now we can really start wondering what caused the difference. Was it just playing Tetris? Does just playing Tetris reduce the number of intrusive memories during the week? Let’s compare that to control:
Here we did not find a significant difference. There was no significant difference between the control group (M=5.11) and the Tetris Only group (M=3.89), p=0.318.
So, it seems that not all of the differences between our means are large enough to be called statistically significant. In particular, the difference here, or larger, happens by chance 31.8% of the time.
You could go on doing more comparisons, between all of the different pairs of means. Each time conducting a \(t\) -test, and each time saying something more specific about the patterns across the means than you get to say with the omnibus test provided by the ANOVA.
Usually, it is the pattern of differences across the means that you as a researcher are primarily interested in understanding. Your theories will make predictions about how the pattern turns out (e.g., which specific means should be higher or lower and by how much). So, the practice of doing comparisons after an ANOVA is really important for establishing the patterns in the means.
We have just finished a rather long introduction to the ANOVA, and the \(F\)-test. The next couple of chapters continue to explore properties of the ANOVA for different kinds of experimental designs. In general, the process to follow for all of the more complicated designs is very similar to what we did here, which boils down to two steps: first, conduct the omnibus ANOVA to ask whether there are any differences anywhere; second, conduct follow-up comparisons to identify which specific means differ.
So what’s next…the ANOVA for repeated measures designs. See you in the next chapter.
ANOVA (an abbreviation for Analysis Of Variance) is used to compare the means of different groups by comparing variance estimates. It is a statistical tool designed to evaluate whether the null hypothesis can be rejected during hypothesis testing, and it is used to determine whether or not the means of three or more groups are equal.

The ANOVA test is used whenever there are more than two independent groups. It examines variability within groups as well as variability across groups. The F-test returns the ANOVA test statistic.
The ANOVA formula is made up of several parts. The best way to tackle an ANOVA test problem is to organize the formulae in an ANOVA table. The ANOVA formulae are shown below.
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F Value
---|---|---|---|---
Between groups | SSB = Σnj(X̄j − X̄)² | df1 = k − 1 | MSB = SSB / (k − 1) | f = MSB / MSE (also written F = MST / MSE)
Within groups (error) | SSE = ΣΣ(Xij − X̄j)² | df2 = N − k | MSE = SSE / (N − k) |
Total | SST = SSB + SSE | df = N − 1 | |
An ANOVA (Analysis of Variance) table is used to summarize the results of an ANOVA test, which determines whether there are any statistically significant differences between the means of three or more independent groups. The table above shows the general structure.
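To make the table concrete, here is a minimal Python sketch that fills in each entry from raw group data using only the formulas above. The function name and the small example groups are our own, for illustration.

```python
# A minimal sketch that computes the ANOVA table entries for k groups,
# using only the formulas shown above (no statistics library needed).
def anova_table(groups):
    k = len(groups)                               # number of groups
    N = sum(len(g) for g in groups)               # total observations
    grand_mean = sum(sum(g) for g in groups) / N

    # Between-groups sum of squares: SSB = sum of nj * (group mean - grand mean)^2
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups (error) sum of squares: SSE = sum of (x - group mean)^2
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    msb = ssb / (k - 1)   # mean square between, df1 = k - 1
    mse = sse / (N - k)   # mean square error,   df2 = N - k
    return ssb, sse, msb, mse, msb / mse

# Three small hypothetical groups, purely for illustration
ssb, sse, msb, mse, f = anova_table([[4, 5, 6], [7, 9, 8], [5, 6, 4]])
print(f"SSB = {ssb:.2f}, SSE = {sse:.2f}, F = {f:.2f}")
```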
Example 1: Three different kinds of food are tested on three groups of rats for 5 weeks. The objective is to check for a difference in the mean weekly weight (in grams) of the rats across the three foods. Apply one-way ANOVA at the 0.05 significance level to the following data:
Food I | Food II | Food III |
---|---|---|
8 | 4 | 11 |
12 | 5 | 8 |
19 | 4 | 7 |
8 | 6 | 13 |
6 | 9 | 7 |
11 | 7 | 9 |
H0: μ1 = μ2 = μ3; H1: the means are not all equal.
Group means: X̄1 = 64/6 ≈ 10.67, X̄2 = 35/6 ≈ 5.83, X̄3 = 55/6 ≈ 9.17; grand mean X̄ = 154/18 ≈ 8.56.
SSB = 6(10.67 − 8.56)² + 6(5.83 − 8.56)² + 6(9.17 − 8.56)² ≈ 73.44
SST = Σx² − N X̄² = 1546 − 1317.56 ≈ 228.44, so SSE = SST − SSB ≈ 155.00
MSB = SSB/df1 = 73.44/2 ≈ 36.72
MSE = SSE/df2 = 155.00/15 ≈ 10.33
f = MSB/MSE ≈ 36.72/10.33 ≈ 3.55
Since f = 3.55 is less than the critical value F0.05(2, 15) = 3.68, we fail to reject the null hypothesis at the 0.05 level.
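As a check on the hand calculation, scipy's built-in one-way ANOVA should reproduce the same F value from the raw data:

```python
# Verifying Example 1 with scipy: expect F ≈ 3.55, with p just above 0.05,
# consistent with failing to reject the null hypothesis at the 0.05 level.
from scipy import stats

food1 = [8, 12, 19, 8, 6, 11]
food2 = [4, 5, 4, 6, 9, 7]
food3 = [11, 8, 7, 13, 7, 9]

f_stat, p_value = stats.f_oneway(food1, food2, food3)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```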
Example 2: Calculate the ANOVA coefficient (the F value) for the following data:
Plant | Number (n) | Average span (x̄) | Standard deviation (s)
---|---|---|---|
Hibiscus | 5 | 12 | 2 |
Marigold | 5 | 16 | 1 |
Rose | 5 | 20 | 4 |
Plant | n | x̄ | s | s²
---|---|---|---|---
Hibiscus | 5 | 12 | 2 | 4
Marigold | 5 | 16 | 1 | 1
Rose | 5 | 20 | 4 | 16

Here p = 3 groups, n = 5 per group, N = 15, and the grand mean is x̄ = 16.
SST (treatment) = Σn(x̄j − x̄)² = 5(12 − 16)² + 5(16 − 16)² + 5(20 − 16)² = 160
MST = SST/(p − 1) = 160/2 = 80
SSE = Σ(n − 1)s² = 4(4) + 4(1) + 4(16) = 84
MSE = SSE/(N − p) = 84/12 = 7
F = MST/MSE = 80/7 ≈ 11.43
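Because this example provides only summary statistics, a small helper like the following (our own code, for illustration) shows how F can be computed from n, mean, and s alone, mirroring the hand calculation:

```python
# A sketch of computing the ANOVA F value from summary statistics alone
# (n, mean, and standard deviation per group), as in Example 2.
def f_from_summary(ns, means, sds):
    p = len(ns)                         # number of groups
    N = sum(ns)                         # total observations
    grand_mean = sum(n * m for n, m in zip(ns, means)) / N
    sst = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))  # treatment SS
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))             # error SS
    mst = sst / (p - 1)
    mse = sse / (N - p)
    return mst / mse

# Example 2's summary data: expect F ≈ 11.43
print(round(f_from_summary([5, 5, 5], [12, 16, 20], [2, 1, 4]), 2))
```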
Example 3: The following data show the number of worms quarantined from the GI areas of four groups of muskrats in a carbon tetrachloride anthelmintic study. Conduct a one-way ANOVA test.
I | II | III | IV |
---|---|---|---|
338 | 412 | 124 | 389 |
324 | 387 | 353 | 432 |
268 | 400 | 469 | 255 |
147 | 233 | 222 | 133 |
309 | 212 | 111 | 265 |
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square
---|---|---|---
Between the groups | 14295.35 | 3 | 4765.12
Within the groups | 212865.20 | 16 | 13304.08
Total | 227160.55 | 19 |

F = MSB/MSE = 4765.12/13304.08 ≈ 0.36. Since 0.36 is less than F0.05(3, 16) = 3.24, we fail to reject the null hypothesis of equal group means.
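As a sanity check, scipy applied to the raw data should agree with the table above:

```python
# Verifying Example 3 with scipy: expect F ≈ 0.36, with p well above 0.05.
from scipy import stats

g1 = [338, 324, 268, 147, 309]
g2 = [412, 387, 400, 233, 212]
g3 = [124, 353, 469, 222, 111]
g4 = [389, 432, 255, 133, 265]

f_stat, p_value = stats.f_oneway(g1, g2, g3, g4)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```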
Example 4: Report the results in APA format after performing an ANOVA on the following data set:
[Tex]\begin{bmatrix} \textbf{n} & \textbf{mean} & \textbf{sd} \\ 30 & 50.26 & 10.45 \\ 30 & 45.32 & 12.76 \\ 30 & 53.67 & 11.47 \\ \end{bmatrix} [/Tex]
Variance of first set = (10.45)² = 109.20
Variance of second set = (12.76)² = 162.82
Variance of third set = (11.47)² = 131.56
MS error = (109.20 + 162.82 + 131.56) / 3 = 134.53
The variance of the three group means is 17.63, so MS between = (17.63)(30) = 528.77
F = MS between / MS error = 528.77 / 134.53 ≈ 3.93
This exceeds F0.05(2, 87) ≈ 3.10 but not F0.01(2, 87) ≈ 4.86, so the result is significant at the .05 level only.
APA write-up: F(2, 87) = 3.93, p < .05, η² = 0.08.
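The APA numbers above can be reconstructed from the summary table alone; here is a short sketch (our own code, for illustration) that derives F and eta squared from n, mean, and sd per group:

```python
# Reconstructing Example 4's APA values from summary statistics.
ns = [30, 30, 30]
means = [50.26, 45.32, 53.67]
sds = [10.45, 12.76, 11.47]

k, N = len(ns), sum(ns)
grand_mean = sum(n * m for n, m in zip(ns, means)) / N

# Mean squares from the between- and within-group sums of squares
ms_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means)) / (k - 1)
ms_error = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / (N - k)

f = ms_between / ms_error
ss_between = ms_between * (k - 1)
ss_total = ss_between + ms_error * (N - k)
eta_sq = ss_between / ss_total

# Expect F(2, 87) ≈ 3.93 and eta^2 ≈ 0.08
print(f"F({k - 1}, {N - k}) = {f:.2f}, eta^2 = {eta_sq:.2f}")
```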
An ANOVA test examines whether the means of distinct groups are equal, so the hypotheses are: H0 (null hypothesis): μ1 = μ2 = μ3 = … = μk; H1 (alternative hypothesis): the means are not all equal.
ANOVA compares differences among group means. Calculate the grand mean, the between-group variability (SSB), and the within-group variability (SSW), then assess significance by comparing the two resulting variance estimates with an F-test.
The ANOVA test statistic, denoted f, is the ratio of the between-group mean square to the within-group mean square; the sample mean of the jth treatment enters the calculation through the between-group sum of squares.
In ANOVA, a single overall P-value is obtained first. A significant P-value in the ANOVA test suggests that at least one pair of group means differs. Multiple-comparison procedures are then employed to identify the significant pair(s).
One-way ANOVA is the form of the ANOVA test used when just one independent variable is present. It compares the means of the various test groups, but it can only tell you that the group means are not all equal; it cannot establish which particular groups differ.
ANOVA is often preferred to running many separate t-tests when comparing three or more groups: a single ANOVA controls the overall Type I error rate, and it extends naturally to more sophisticated designs than the two-group comparisons a t-test can handle.
A repeated measures ANOVA is used to determine whether or not there is a statistically significant difference between the means of three or more groups in which the same subjects show up in each group.
A repeated measures ANOVA is typically used in two specific situations:
1. Measuring the mean scores of subjects during three or more time points. For example, you might want to measure the resting heart rate of subjects one month before they start a training program, during the middle of the training program, and one month after the training program to see if there is a significant difference in mean resting heart rate across these three time points.
2. Measuring the mean scores of subjects under three different conditions. For example, you might have subjects watch three different movies and rate each one based on how much they enjoyed it.
In a typical one-way ANOVA, different subjects are used in each group. For example, we might ask subjects to rate three movies, just like in the example above, but use different subjects to rate each movie.
In this case, we would conduct a typical one-way ANOVA to test for the difference between the mean ratings of the three movies.
In real life there are two benefits of using the same subjects across multiple treatment conditions:
1. It’s cheaper and faster for researchers to recruit and pay a smaller number of people to carry out an experiment since they can just obtain data from the same people multiple times.
2. We are able to attribute some of the variance in the data to the subjects themselves, which makes it easier to obtain a smaller p-value.
One potential drawback of this type of design is that subjects might get bored or tired if an experiment lasts too long, which could skew the results. For example, subjects might give lower movie ratings to the third movie they watch because they’re tired and ready to go home.
Suppose we recruit five subjects to participate in a training program. We measure their resting heart rate before participating in a training program, after participating for 4 months, and after participating for 8 months.
The results give each subject's resting heart rate at each of the three time points.
We want to know whether there is a difference in mean resting heart rate at these three time points so we conduct a repeated measures ANOVA at the .05 significance level using the following steps:
Step 1. State the hypotheses.
The null hypothesis (H 0 ): µ 1 = µ 2 = µ 3 (the population means are all equal)
The alternative hypothesis: (Ha): at least one population mean is different from the rest
Step 2. Perform the repeated measures ANOVA.
We will enter the data for the five subjects at the three time points into the Repeated Measures ANOVA Calculator and click "Calculate" to produce the output.
Step 3. Interpret the results.
From the output table we see that the F test statistic is 9.598 and the corresponding p-value is 0.00749.
Since this p-value is less than 0.05, we reject the null hypothesis. This means we have sufficient evidence to say that there is a statistically significant difference between the mean resting heart rate at the three different points in time.
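If you prefer code to an online calculator, a repeated measures ANOVA can be sketched in Python with statsmodels' AnovaRM. The heart-rate values below are hypothetical stand-ins, since the example's data table is not reproduced here, so the resulting F and p will not match the output described above.

```python
# A minimal sketch of a repeated measures ANOVA with one within-subjects
# factor (time). The heart-rate values are hypothetical stand-ins.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.DataFrame({
    "subject": list(range(1, 6)) * 3,
    "time": ["before"] * 5 + ["4 months"] * 5 + ["8 months"] * 5,
    "heart_rate": [72, 75, 71, 78, 74,    # before (hypothetical)
                   68, 72, 70, 75, 71,    # after 4 months (hypothetical)
                   65, 70, 68, 72, 69],   # after 8 months (hypothetical)
})

# Dependent variable: heart_rate; subject identifier: subject; within factor: time
result = AnovaRM(df, depvar="heart_rate", subject="subject", within=["time"]).fit()
print(result)  # F statistic and p-value for the effect of time
```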
The following articles explain how to perform a repeated measures ANOVA using different statistical softwares:
Repeated Measures ANOVA in Excel Repeated Measures ANOVA in R Repeated Measures ANOVA in Stata Repeated Measures ANOVA in Python Repeated Measures ANOVA in SPSS Repeated Measures ANOVA in Google Sheets Repeated Measures ANOVA By Hand Repeated Measures ANOVA Calculator