
Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans. Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H0) and alternate hypothesis (Ha or H1).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test.
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H0: Men are, on average, not taller than women.
  • Ha: Men are, on average, taller than women.


For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p-value. This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p-value. This means it is likely that any difference you measure between groups is due to chance.
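To make the within-group versus between-group distinction concrete, here is a minimal sketch in Python; the height samples are invented for illustration, and the calculation is a simplified version of the idea rather than a full ANOVA decomposition:

```python
import numpy as np

# Invented height samples (cm) for two hypothetical groups
group_a = np.array([178.0, 182.5, 175.3, 180.1, 177.8])
group_b = np.array([164.2, 168.9, 162.5, 166.7, 165.0])

grand_mean = np.concatenate([group_a, group_b]).mean()

# Within-group variance: how spread out each group is around its own mean
within = np.mean([group_a.var(ddof=1), group_b.var(ddof=1)])

# Between-group variance: how far each group mean sits from the grand mean
between = np.mean([(g.mean() - grand_mean) ** 2 for g in (group_a, group_b)])

print(f"within-group variance:  {within:.2f}")
print(f"between-group variance: {between:.2f}")
```

When the between-group figure dwarfs the within-group figure, the groups barely overlap and a statistical test will tend to return a low p-value.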

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data.

For example, a statistical test comparing the average heights of men and women will generate:

  • an estimate of the difference in average height between the two groups.
  • a p-value showing how likely you are to see this difference if the null hypothesis of no difference is true.
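As a rough illustration, such a test might look like this in Python with SciPy; the height arrays are invented for the example, and `alternative="greater"` encodes the directional hypothesis that men are, on average, taller:

```python
import numpy as np
from scipy import stats

# Invented sample heights (cm); in practice these come from your collected data
men = np.array([178.2, 181.4, 175.9, 183.0, 177.5, 180.3])
women = np.array([164.8, 169.1, 162.3, 167.4, 165.9, 163.7])

# Estimated difference in average height between the two groups
estimate = men.mean() - women.mean()

# One-sided two-sample t test: Ha says men are, on average, taller than women
result = stats.ttest_ind(men, women, alternative="greater")

print(f"estimated difference: {estimate:.1f} cm")
print(f"p-value: {result.pvalue:.4f}")
```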

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is reported as “supporting the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis. This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.



What is Hypothesis Testing? Types and Methods

  • Soumyaa Rawat
  • Jul 23, 2021


Hypothesis Testing  

Hypothesis testing is the act of testing a hypothesis or a supposition in relation to a statistical parameter. Analysts implement hypothesis testing in order to test if a hypothesis is plausible or not. 

In data science and statistics, hypothesis testing is an important step as it involves the verification of an assumption that could help develop a statistical parameter. For instance, a researcher establishes a hypothesis assuming that the average of all odd numbers is an even number.

In order to find the plausibility of this hypothesis, the researcher will have to test it using hypothesis testing methods. Whereas a hypothesis is merely ‘supposed’ to stand true on the basis of little or no evidence, hypothesis testing demands plausible evidence in order to establish that a statistical hypothesis is true.

This is where statistics plays an important role. A number of components are involved in this process. But before understanding the process involved in hypothesis testing in research methodology, we shall first understand the types of hypotheses that are involved in the process. Let us get started!

Types of Hypotheses

In data sampling, different types of hypotheses are involved in finding whether the tested samples test positive for a hypothesis or not. In this segment, we shall discover the different types of hypotheses and understand the role they play in hypothesis testing.

Alternative Hypothesis

Alternative Hypothesis (H1) or the research hypothesis states that there is a relationship between two variables (where one variable affects the other). The alternative hypothesis is the main driving force for hypothesis testing. 

It implies that the two variables are related to each other and the relationship that exists between them is not due to chance or coincidence. 

When the process of hypothesis testing is carried out, the alternative hypothesis is the main subject of the testing process. The analyst intends to test the alternative hypothesis and verifies its plausibility.

Null Hypothesis

The Null Hypothesis (H0) aims to nullify the alternative hypothesis by implying that there exists no relation between two variables in statistics. It states that the effect of one variable on the other is solely due to chance and no empirical cause lies behind it. 

The null hypothesis is established alongside the alternative hypothesis and is recognized as being just as important as the latter. In hypothesis testing, the null hypothesis has a major role to play as it influences the testing against the alternative hypothesis.


Non-Directional Hypothesis

The Non-directional hypothesis states that the relation between two variables has no direction. 

Simply put, it asserts that there exists a relation between two variables, but does not recognize the direction of effect, whether variable A affects variable B or vice versa. 

Directional Hypothesis

The Directional hypothesis, on the other hand, asserts the direction of effect of the relationship that exists between two variables. 

Herein, the hypothesis clearly states that variable A affects variable B, or vice versa. 

Statistical Hypothesis

A statistical hypothesis is a hypothesis that can be verified to be plausible on the basis of statistics. 

By using data sampling and statistical knowledge, one can determine the plausibility of a statistical hypothesis and find out if it stands true or not. 


Performing Hypothesis Testing  

Now that we have understood the types of hypotheses and the role they play in hypothesis testing, let us now move on to understand the process in a better manner. 

In hypothesis testing, a researcher is first required to establish two hypotheses - an alternative hypothesis and a null hypothesis - in order to begin the procedure.

To establish these two hypotheses, one is required to study data samples, find a plausible pattern among the samples, and pen down a statistical hypothesis that they wish to test.

To begin hypothesis testing, a random sample is drawn from the population. Of the two hypotheses, alternative and null, only one can be supported, but the presence of both is required to make the process work.

At the end of the hypothesis testing procedure, one of the hypotheses will be rejected and the other will be supported. Yet even when one of the two hypotheses is supported, no hypothesis can ever be verified with 100% certainty.


Therefore, a hypothesis can only be supported based on the statistical samples and verified data. Here is a step-by-step guide for hypothesis testing.

Establish the hypotheses

First things first, one is required to establish two hypotheses - alternative and null, that will set the foundation for hypothesis testing. 

These hypotheses initiate the testing process that involves the researcher working on data samples in order to either support the alternative hypothesis or the null hypothesis. 

Generate a testing plan

Once the hypotheses have been formulated, it is now time to generate a testing plan. A testing plan or an analysis plan involves the accumulation of data samples, determining which statistic is to be considered and laying out the sample size. 

All these factors are very important while one is working on hypothesis testing.

Analyze data samples

As soon as a testing plan is ready, it is time to move on to the analysis part. Analysis of data samples involves computing statistical values from the samples, bringing them together, and deriving a pattern out of them.

While analyzing the data samples, a researcher needs to determine a set of things -

Significance Level - The significance level is the threshold probability of rejecting the null hypothesis when it is actually true; it is commonly set at 0.05 before the test is run.

Testing Method - The testing method involves a type of sampling distribution and a test statistic that leads to hypothesis testing. There are a number of testing methods that can assist in the analysis of data samples.

Test Statistic - A test statistic is a numerical summary of a data set that can be used to perform hypothesis testing.

P-value - The p-value is the probability of obtaining a sample statistic at least as extreme as the one observed, assuming the null hypothesis is true; a small p-value casts doubt on the null hypothesis.

Infer the results

The analysis of data samples leads to the inference of results that establishes whether the alternative hypothesis stands true or not. When the P-value is less than the significance level, the null hypothesis is rejected and the alternative hypothesis turns out to be plausible. 
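In code form, the inference step reduces to comparing the p-value against the chosen significance level. A minimal sketch, assuming a significance level of 0.05 and a p-value already produced by some statistical test:

```python
alpha = 0.05      # significance level, fixed before running the test
p_value = 0.012   # hypothetical p-value returned by a statistical test

if p_value < alpha:
    print("Reject the null hypothesis: the alternative hypothesis is plausible.")
else:
    print("Fail to reject the null hypothesis: the result may be due to chance.")
```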

Methods of Hypothesis Testing

As we have already looked into different aspects of hypothesis testing, we shall now look into the different methods of hypothesis testing. All in all, there are two common types of hypothesis testing methods. They are as follows -

Frequentist Hypothesis Testing

The frequentist, or traditional, approach to hypothesis testing is a method that makes and tests assumptions on the basis of the current data alone.

The supposed truths and assumptions are based on the current data and a set of 2 hypotheses are formulated. A very popular subtype of the frequentist approach is the Null Hypothesis Significance Testing (NHST). 

The NHST approach (involving the null and alternative hypothesis) has been one of the most sought-after methods of hypothesis testing in the field of statistics ever since its inception in the mid-1950s. 

Bayesian Hypothesis Testing

A more unconventional and modern method of hypothesis testing, Bayesian hypothesis testing evaluates a particular hypothesis by combining past data samples, encoded as a prior probability, with the current data.

The result obtained indicates the posterior probability of the hypothesis. In this method, the researcher relies on the prior probability and the posterior probability to conduct hypothesis testing.

On the basis of this prior probability, the Bayesian approach tests whether a hypothesis is true or false. The Bayes factor, a major component of this method, indicates the likelihood ratio between the null hypothesis and the alternative hypothesis.

The Bayes factor is the indicator of the plausibility of either of the two hypotheses that are established for hypothesis testing.  
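As a toy illustration, consider two simple hypotheses about a coin: H0 says it is fair (p = 0.5) and H1 says it is biased towards heads (p = 0.7, an invented value). When both hypotheses are fully specified like this, the Bayes factor is simply the likelihood ratio of the observed data; the counts below are made up:

```python
from scipy.stats import binom

heads, flips = 62, 100  # hypothetical observed data

# Likelihood of the observed data under each simple hypothesis
likelihood_h0 = binom.pmf(heads, flips, 0.5)  # H0: fair coin
likelihood_h1 = binom.pmf(heads, flips, 0.7)  # H1: coin biased towards heads

bayes_factor = likelihood_h1 / likelihood_h0
print(f"Bayes factor (H1 over H0): {bayes_factor:.2f}")
```

A Bayes factor well above 1 shifts plausibility towards H1; well below 1, towards H0.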


To conclude, hypothesis testing - a way to verify the plausibility of a supposed assumption - can be done through different methods: the Bayesian approach or the frequentist approach.

Whereas the Bayesian approach relies on the prior probability of data samples, the frequentist approach works from the observed data alone, without priors. The elements involved in hypothesis testing include the significance level, the p-value, the test statistic, and the method of hypothesis testing.


A significant way to determine whether a hypothesis stands true or not is to verify the data samples and identify the plausible hypothesis among the null hypothesis and alternative hypothesis. 


The Craft of Writing a Strong Hypothesis

Deeptanshu D

Table of Contents

Writing a hypothesis is one of the essential elements of a scientific research paper. It needs to be to the point, clearly communicating what your research is trying to accomplish. A blurry, drawn-out, or complexly-structured hypothesis can confuse your readers. Or worse, the editor and peer reviewers.

A captivating hypothesis is not too intricate. This blog will take you through the process so that, by the end of it, you have a better idea of how to convey your research paper's intent in just one sentence.

What is a Hypothesis?

The first step in your scientific endeavor, a hypothesis, is a strong, concise statement that forms the basis of your research. It is not the same as a thesis statement, which is a brief summary of your research paper.

The sole purpose of a hypothesis is to predict your paper's findings, data, and conclusion. It comes from a place of curiosity and intuition. When you write a hypothesis, you're essentially making an educated guess based on prior scientific knowledge and evidence, which is further proven or disproven through the scientific method.

The reason for undertaking research is to observe a specific phenomenon. A hypothesis, therefore, lays out what the said phenomenon is. And it does so through two variables, an independent and dependent variable.

The independent variable is the cause behind the observation, while the dependent variable is the effect of the cause. A good example of this is “mixing red and blue forms purple.” In this hypothesis, mixing red and blue is the independent variable as you're combining the two colors at your own will. The formation of purple is the dependent variable as, in this case, it is conditional to the independent variable.

Different Types of Hypotheses

Some would stand by the notion that there are only two types of hypotheses: a null hypothesis and an alternative hypothesis. While that may have some truth to it, it is better to fully distinguish the most common forms, as these terms come up often and not knowing them can leave you out of context.

Apart from null and alternative, there are complex, simple, directional, non-directional, statistical, and associative and causal hypotheses. They don't necessarily have to be exclusive, as one hypothesis can tick many boxes, but knowing the distinctions between them will make it easier for you to construct your own.

1. Null hypothesis

A null hypothesis proposes no relationship between two variables. Denoted by H0, it is a negative statement like “Attending physiotherapy sessions does not affect athletes' on-field performance.” Here, the author claims physiotherapy sessions have no effect on on-field performance; even if there appears to be one, it is only a coincidence.

2. Alternative hypothesis

Considered to be the opposite of a null hypothesis, an alternative hypothesis is denoted as H1 or Ha. It explicitly states that the independent variable affects the dependent variable. A good alternative hypothesis example is “Attending physiotherapy sessions improves athletes' on-field performance” or “Water evaporates at 100 °C.” The alternative hypothesis further branches into directional and non-directional.

  • Directional hypothesis: A hypothesis that states the result would be either positive or negative is called a directional hypothesis. It accompanies H1 with either the ‘<’ or ‘>’ sign.
  • Non-directional hypothesis: A non-directional hypothesis only claims an effect on the dependent variable. It does not clarify whether the result would be positive or negative. The sign for a non-directional hypothesis is ‘≠’ (see the sketch below).
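In statistical testing, this distinction corresponds to one-tailed versus two-tailed tests. A hedged sketch with SciPy, using invented scores for two groups:

```python
from scipy import stats

# Hypothetical scores for two groups
group_1 = [72, 75, 78, 71, 74, 77]
group_2 = [68, 70, 66, 69, 71, 67]

# Non-directional (two-tailed): Ha is group_1 != group_2
two_tailed = stats.ttest_ind(group_1, group_2, alternative="two-sided")

# Directional (one-tailed): Ha is group_1 > group_2
one_tailed = stats.ttest_ind(group_1, group_2, alternative="greater")

print(f"two-tailed p = {two_tailed.pvalue:.4f}")
print(f"one-tailed p = {one_tailed.pvalue:.4f}")
```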

3. Simple hypothesis

A simple hypothesis is a statement made to reflect the relation between exactly two variables: one independent and one dependent. Consider the example, “Smoking is a prominent cause of lung cancer.” The dependent variable, lung cancer, is dependent on the independent variable, smoking.

4. Complex hypothesis

In contrast to a simple hypothesis, a complex hypothesis implies the relationship between multiple independent and dependent variables. For instance, “Individuals who eat more fruits tend to have higher immunity, lower cholesterol, and higher metabolism.” The independent variable is eating more fruits, while the dependent variables are higher immunity, lower cholesterol, and higher metabolism.

5. Associative and causal hypothesis

Associative and causal hypotheses don't specify how many variables there will be; they define the relationship between the variables. In an associative hypothesis, changing any one variable, dependent or independent, affects the others. In a causal hypothesis, the independent variable directly affects the dependent.

6. Empirical hypothesis

Also referred to as the working hypothesis, an empirical hypothesis claims a theory's validation via experiments and observation. This way, the statement appears justifiable and different from a wild guess.

Say the hypothesis is “Women who take iron tablets face a lesser risk of anemia than those who take vitamin B12.” This is an example of an empirical hypothesis, where the researcher arrives at the statement after assessing a group of women who take iron tablets and charting the findings.

7. Statistical hypothesis

The point of a statistical hypothesis is to test an already existing hypothesis by studying a population sample. Hypotheses like “44% of the Indian population belongs to the age group of 22-27” leverage evidence to prove or disprove a particular statement.

Characteristics of a Good Hypothesis

Writing a hypothesis is essential as it can make or break your research for you. That includes your chances of getting published in a journal. So when you're designing one, keep an eye out for these pointers:

  • A research hypothesis has to be simple yet clear enough to look justifiable.
  • It has to be testable - your research would be rendered pointless if it is too far-fetched or beyond the reach of current technology.
  • It has to be precise about the results - what you are trying to do and achieve through it should come out in your hypothesis.
  • A research hypothesis should be self-explanatory, leaving no doubt in the reader's mind.
  • If you are developing a relational hypothesis, you need to include the variables and establish an appropriate relationship among them.
  • A hypothesis must keep and reflect the scope for further investigations and experiments.

Separating a Hypothesis from a Prediction

Outside of academia, hypothesis and prediction are often used interchangeably. In research writing, this is not only confusing but also incorrect. And although a hypothesis and prediction are guesses at their core, there are many differences between them.

A hypothesis is an educated guess or even a testable prediction validated through research. It aims to analyze the gathered evidence and facts to define a relationship between variables and put forth a logical explanation behind the nature of events.

Predictions are assumptions or expected outcomes made without any backing evidence. They are more fictionally inclined regardless of where they originate from.

For this reason, a hypothesis holds much more weight than a prediction. It sticks to the scientific method rather than pure guesswork. “Planets revolve around the Sun” is an example of a hypothesis, as it is based on previous knowledge and observed trends. Additionally, we can test it through the scientific method.

Whereas "COVID-19 will be eradicated by 2030." is a prediction. Even though it results from past trends, we can't prove or disprove it. So, the only way this gets validated is to wait and watch if COVID-19 cases end by 2030.

Finally, How to Write a Hypothesis


1.  Be clear about your research question

A hypothesis should instantly address the research question or the problem statement. To do so, you need to ask a question. Understand the constraints of your undertaken research topic and then formulate a simple and topic-centric problem. Only after that can you develop a hypothesis and further test for evidence.

2. Carry out a recce

Once you have your research's foundation laid out, it would be best to conduct preliminary research. Go through previous theories, academic papers, data, and experiments before you start curating your research hypothesis. It will give you an idea of your hypothesis's viability or originality.

Making use of references from relevant research papers helps draft a good research hypothesis. SciSpace Discover offers a repository of over 270 million research papers to browse through and gain a deeper understanding of related studies on a particular topic. Additionally, you can use SciSpace Copilot, your AI research assistant, to read any lengthy research paper and get a more summarized context of it. A hypothesis can be formed after evaluating many such summarized research papers. Copilot also offers explanations for theories and equations, explains papers in a simplified way, allows you to highlight any text in the paper or clip math equations and tables, and provides a deeper, clearer understanding of what is being said. This can improve the hypothesis by helping you identify potential research gaps.

3. Create a 3-dimensional hypothesis

Variables are an essential part of any reasonable hypothesis. So, identify your independent and dependent variable(s) and form a correlation between them. The ideal way to do this is to write the hypothetical assumption in the ‘if-then' form. If you use this form, make sure that you state the predefined relationship between the variables.

In another way, you can choose to present your hypothesis as a comparison between two variables. Here, you must specify the difference you expect to observe in the results.

4. Write the first draft

Now that everything is in place, it's time to write your hypothesis. For starters, create the first draft. In this version, write what you expect to find from your research.

Clearly separate your independent and dependent variables and the link between them. Don't fixate on syntax at this stage. The goal is to ensure your hypothesis addresses the issue.

5. Proof your hypothesis

After preparing the first draft of your hypothesis, you need to inspect it thoroughly. It should tick all the boxes, like being concise, straightforward, relevant, and accurate. Your final hypothesis has to be well-structured as well.

Research projects are an exciting and crucial part of being a scholar. And once you have your research question, you need a great hypothesis to begin conducting research. Thus, knowing how to write a hypothesis is very important.

Now that you have a firmer grasp on what a good hypothesis constitutes, the different kinds there are, and what process to follow, you will find it much easier to write your hypothesis, which ultimately helps your research.

Now it's easier than ever to streamline your research workflow with SciSpace Discover . Its integrated, comprehensive end-to-end platform for research allows scholars to easily discover, write and publish their research and fosters collaboration.

It includes everything you need, including a repository of over 270 million research papers across disciplines, SEO-optimized summaries and public profiles to show your expertise and experience.

If you found these tips on writing a research hypothesis useful, head over to our blog on Statistical Hypothesis Testing to learn about the top researchers, papers, and institutions in this domain.

Frequently Asked Questions (FAQs)

1. What is the definition of a hypothesis?

According to the Oxford dictionary, a hypothesis is defined as “An idea or explanation of something that is based on a few known facts, but that has not yet been proved to be true or correct”.

2. What is an example of hypothesis?

A hypothesis is a statement that proposes a relationship between two or more variables. An example: "If we increase the number of new users who join our platform by 25%, then we will see an increase in revenue."

3. What is an example of null hypothesis?

A null hypothesis is a statement that there is no relationship between two variables. The null hypothesis is written as H0. The null hypothesis states that there is no effect. For example, if you're studying whether or not a particular type of exercise increases strength, your null hypothesis will be "there is no difference in strength between people who exercise and people who don't."

4. What are the types of research?

  • Fundamental research
  • Applied research
  • Qualitative research
  • Quantitative research
  • Mixed research
  • Exploratory research
  • Longitudinal research
  • Cross-sectional research
  • Field research
  • Laboratory research
  • Fixed research
  • Flexible research
  • Action research
  • Policy research
  • Classification research
  • Comparative research
  • Causal research
  • Inductive research
  • Deductive research

5. How to write a hypothesis?

  • Your hypothesis should be able to predict the relationship and outcome.
  • Avoid wordiness by keeping it simple and brief.
  • Your hypothesis should contain observable and testable outcomes.
  • Your hypothesis should be relevant to the research question.

6. What are the 2 types of hypothesis?

  • Null hypotheses are used to test the claim that "there is no difference between two groups of data".
  • Alternative hypotheses test the claim that "there is a difference between two data groups".

7. Difference between research question and research hypothesis?

A research question is a broad, open-ended question you will try to answer through your research. A hypothesis is a statement based on prior research or theory that you expect to be true due to your study. Example - Research question: What are the factors that influence the adoption of the new technology? Research hypothesis: There is a positive relationship between age, education and income level with the adoption of the new technology.

8. What is plural for hypothesis?

The plural of hypothesis is hypotheses. Here's an example of how it would be used in a statement, "Numerous well-considered hypotheses are presented in this part, and they are supported by tables and figures that are well-illustrated."

9. What is the red queen hypothesis?

The red queen hypothesis in evolutionary biology states that species must constantly evolve to avoid extinction because if they don't, they will be outcompeted by other species that are evolving. Leigh Van Valen first proposed it in 1973; since then, it has been tested and substantiated many times.

10. Who is known as the father of null hypothesis?

The father of the null hypothesis is Sir Ronald Fisher. He published a paper in 1925 that introduced the concept of null hypothesis testing, and he was also the first to use the term itself.

11. When to reject null hypothesis?

You need to find a significant difference between your two populations to reject the null hypothesis. You can determine that by running statistical tests such as an independent sample t-test or a dependent sample t-test. You should reject the null hypothesis if the p-value is less than 0.05.
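A minimal sketch of the two tests mentioned above, with invented measurements; in SciPy, `ttest_ind` handles independent samples and `ttest_rel` handles dependent (paired) samples:

```python
from scipy import stats

# Hypothetical data: the same subjects measured twice, plus a separate group
before = [12.1, 13.4, 11.8, 14.0, 12.7]
after = [13.0, 14.1, 12.5, 15.2, 13.6]
other_group = [10.9, 11.5, 12.0, 10.4, 11.2]

independent = stats.ttest_ind(before, other_group)  # two separate groups
dependent = stats.ttest_rel(before, after)          # same subjects, paired

for name, res in [("independent", independent), ("dependent", dependent)]:
    decision = "reject H0" if res.pvalue < 0.05 else "fail to reject H0"
    print(f"{name} samples: p = {res.pvalue:.4f} -> {decision}")
```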


Hypothesis

Hiroshi Ishikawa. In: Hypothesis Generation and Interpretation. Studies in Big Data, vol 139. Springer, 2024.

This chapter will explain the definition and properties of a hypothesis, the related concepts, and basic methods of hypothesis generation as follows.

  • Describe the definition, properties, and life cycle of a hypothesis.
  • Describe relationships between a hypothesis and a theory, a model, and data.
  • Categorize and explain research questions that provide hints for hypothesis generation.
  • Explain how to visualize data and analysis results.
  • Explain the philosophy of science and scientific methods in relation to hypothesis generation in science.
  • Explain deduction, induction, plausible reasoning, and analogy concretely as reasoning methods useful for hypothesis generation.
  • Explain problem solving as hypothesis generation methods by using familiar examples.


How to Write a Strong Hypothesis | Guide & Examples

Published on 6 May 2022 by Shona McCombes.

A hypothesis is a statement that can be tested by scientific research. If you want to test a relationship between two or more variables, you need to write hypotheses before you start your experiment or data collection.

Table of contents

  • What is a hypothesis
  • Developing a hypothesis (with example)
  • Hypothesis examples
  • Frequently asked questions about writing hypotheses

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess – it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations, and statistical analysis of data).

Variables in hypotheses

Hypotheses propose a relationship between two or more variables. An independent variable is something the researcher changes or controls. A dependent variable is something the researcher observes and measures.

For example, in the hypothesis "daily exposure to the sun leads to increased levels of happiness", the independent variable is exposure to the sun – the assumed cause – and the dependent variable is the level of happiness – the assumed effect.


Step 1: Ask a question

Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project.

Step 2: Do some preliminary research

Your initial answer to the question should be based on what is already known about the topic. Look for theories and previous studies to help you form educated assumptions about what your research will find.

At this stage, you might construct a conceptual framework to identify which variables you will study and what you think the relationships are between them. Sometimes, you’ll have to operationalise more complex constructs.

Step 3: Formulate your hypothesis

Now you should have some idea of what you expect to find. Write your initial answer to the question in a clear, concise sentence.

Step 4: Refine your hypothesis

You need to make sure your hypothesis is specific and testable. There are various ways of phrasing a hypothesis, but all the terms you use should have clear definitions, and the hypothesis should contain:

  • The relevant variables
  • The specific group being studied
  • The predicted outcome of the experiment or analysis

Step 5: Phrase your hypothesis in three ways

To identify the variables, you can write a simple prediction in if … then form. The first part of the sentence states the independent variable and the second part states the dependent variable.

In academic research, hypotheses are more commonly phrased in terms of correlations or effects, where you directly state the predicted relationship between variables.

If you are comparing two groups, the hypothesis can state what difference you expect to find between them.

Step 6. Write a null hypothesis

If your research involves statistical hypothesis testing, you will also have to write a null hypothesis. The null hypothesis is the default position that there is no association between the variables. The null hypothesis is written as H0, while the alternative hypothesis is H1 or Ha.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis is not just a guess. It should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations, and statistical analysis of data).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (‘ x affects y because …’).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses. In a well-designed study, the statistical hypotheses correspond logically to the research hypothesis.


Research Hypothesis In Psychology: Types, & Examples

Saul Mcleod, PhD


A research hypothesis, in its plural form “hypotheses,” is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method.

Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

Some key points about hypotheses:

  • A hypothesis expresses an expected pattern or relationship. It connects the variables under investigation.
  • It is stated in clear, precise terms before any data collection or analysis occurs. This makes the hypothesis testable.
  • A hypothesis must be falsifiable. It should be possible, even if unlikely in practice, to collect data that disconfirms rather than supports the hypothesis.
  • Hypotheses guide research. Scientists design studies to explicitly evaluate hypotheses about how nature works.
  • For a hypothesis to be valid, it must be testable against empirical evidence. The evidence can then confirm or disprove the testable predictions.
  • Hypotheses are informed by background knowledge and observation, but go beyond what is already known to propose an explanation of how or why something occurs.
Predictions typically arise from a thorough knowledge of the research literature, curiosity about real-world problems or implications, and integrating this to advance theory. They build on existing literature while providing new insight.

Types of Research Hypotheses

Alternative Hypothesis

The research hypothesis is often called the alternative or experimental hypothesis in experimental research.

It typically suggests a potential relationship between two key variables: the independent variable, which the researcher manipulates, and the dependent variable, which is measured based on those changes.

The alternative hypothesis states a relationship exists between the two variables being studied (one variable affects the other).


An experimental hypothesis predicts what change(s) will occur in the dependent variable when the independent variable is manipulated.

It states that the results are not due to chance and are significant in supporting the theory being investigated.

The alternative hypothesis can be directional, indicating a specific direction of the effect, or non-directional, suggesting a difference without specifying its nature. It’s what researchers aim to support or demonstrate through their study.

Null Hypothesis

The null hypothesis states no relationship exists between the two variables being studied (one variable does not affect the other). There will be no changes in the dependent variable due to manipulating the independent variable.

It states results are due to chance and are not significant in supporting the idea being investigated.

The null hypothesis, positing no effect or relationship, is a foundational contrast to the research hypothesis in scientific inquiry. It establishes a baseline for statistical testing, promoting objectivity by initiating research from a neutral stance.

Many statistical methods are tailored to test the null hypothesis, determining the likelihood of observed results if no true effect exists.

This dual-hypothesis approach provides clarity, ensuring that research intentions are explicit, and fosters consistency across scientific studies, enhancing the standardization and interpretability of research outcomes.

Nondirectional Hypothesis

A non-directional hypothesis, also known as a two-tailed hypothesis, predicts that there is a difference or relationship between two variables but does not specify the direction of this relationship.

It merely indicates that a change or effect will occur without predicting which group will have higher or lower values.

For example, “There is a difference in performance between Group A and Group B” is a non-directional hypothesis.

Directional Hypothesis

A directional (one-tailed) hypothesis predicts the nature of the effect of the independent variable on the dependent variable. It predicts in which direction the change will take place. (i.e., greater, smaller, less, more)

It specifies whether one variable is greater, lesser, or different from another, rather than just indicating that there’s a difference without specifying its nature.

For example, “Exercise increases weight loss” is a directional hypothesis.


Falsifiability

The Falsification Principle, proposed by Karl Popper, is a way of demarcating science from non-science. It suggests that for a theory or hypothesis to be considered scientific, it must be testable and refutable.

Falsifiability emphasizes that scientific claims shouldn’t just be confirmable but should also have the potential to be proven wrong.

It means that there should exist some potential evidence or experiment that could prove the proposition false.

However many confirming instances exist for a theory, it only takes one counter observation to falsify it. For example, the hypothesis that “all swans are white,” can be falsified by observing a black swan.

For Popper, science should attempt to disprove a theory rather than attempt to continually provide evidence to support a research hypothesis.

Can a Hypothesis be Proven?

Hypotheses make probabilistic predictions. They state the expected outcome if a particular relationship exists. However, a study result supporting a hypothesis does not definitively prove it is true.

All studies have limitations. There may be unknown confounding factors or issues that limit the certainty of conclusions. Additional studies may yield different results.

In science, hypotheses can realistically only be supported with some degree of confidence, not proven. The process of science is to incrementally accumulate evidence for and against hypothesized relationships in an ongoing pursuit of better models and explanations that best fit the empirical data. But hypotheses remain open to revision and rejection if that is where the evidence leads.
  • Disproving a hypothesis is definitive. Solid disconfirmatory evidence will falsify a hypothesis and require altering or discarding it based on the evidence.
  • However, confirming evidence is always open to revision. Other explanations may account for the same results, and additional or contradictory evidence may emerge over time.

We can never 100% prove the alternative hypothesis. Instead, we see if we can disprove, or reject the null hypothesis.

If we reject the null hypothesis, this doesn’t mean that our alternative hypothesis is correct but does support the alternative/experimental hypothesis.

Upon analysis of the results, an alternative hypothesis can be rejected or supported, but it can never be proven to be correct. We must avoid any reference to results proving a theory as this implies 100% certainty, and there is always a chance that evidence may exist which could refute a theory.

How to Write a Hypothesis

  • Identify variables. The researcher manipulates the independent variable and the dependent variable is the measured outcome.
  • Operationalize the variables being investigated. Operationalization of a hypothesis refers to the process of making the variables physically measurable or testable, e.g. if you are about to study aggression, you might count the number of punches given by participants.
  • Decide on a direction for your prediction. If there is evidence in the literature to support a specific effect of the independent variable on the dependent variable, write a directional (one-tailed) hypothesis. If there are limited or ambiguous findings in the literature regarding the effect of the independent variable on the dependent variable, write a non-directional (two-tailed) hypothesis.
  • Make it testable. Ensure your hypothesis can be tested through experimentation or observation. It should be possible to prove it false (principle of falsifiability).
  • Clear & concise language. A strong hypothesis is concise (typically one to two sentences long), and formulated using clear and straightforward language, ensuring it’s easily understood and testable.

Consider a hypothesis many teachers might subscribe to: students work better on Monday morning than on Friday afternoon (IV=Day, DV= Standard of work).

Now, if we decide to study this by giving the same group of students a lesson on a Monday morning and a Friday afternoon and then measuring their immediate recall of the material covered in each session, we would end up with the following:

  • The alternative hypothesis states that students will recall significantly more information on a Monday morning than on a Friday afternoon.
  • The null hypothesis states that there will be no significant difference in the amount recalled on a Monday morning compared to a Friday afternoon. Any difference will be due to chance or confounding factors.
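If each student's recall were scored in both sessions, the directional hypothesis could be checked with a paired t test; a minimal sketch with invented scores:

```python
from scipy import stats

# Hypothetical recall scores for the same students in each session
monday = [14, 16, 13, 17, 15, 16, 14, 18]
friday = [12, 15, 11, 16, 13, 14, 12, 15]

# Directional (one-tailed) test: Ha says Monday recall > Friday recall
result = stats.ttest_rel(monday, friday, alternative="greater")
print(f"p = {result.pvalue:.4f}")
```

A p-value below the chosen significance level would support the alternative hypothesis; otherwise we fail to reject the null.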

More Examples

  • Memory : Participants exposed to classical music during study sessions will recall more items from a list than those who studied in silence.
  • Social Psychology : Individuals who frequently engage in social media use will report higher levels of perceived social isolation compared to those who use it infrequently.
  • Developmental Psychology : Children who engage in regular imaginative play have better problem-solving skills than those who don’t.
  • Clinical Psychology : Cognitive-behavioral therapy will be more effective in reducing symptoms of anxiety over a 6-month period compared to traditional talk therapy.
  • Cognitive Psychology : Individuals who multitask between various electronic devices will have shorter attention spans on focused tasks than those who single-task.
  • Health Psychology : Patients who practice mindfulness meditation will experience lower levels of chronic pain compared to those who don’t meditate.
  • Organizational Psychology : Employees in open-plan offices will report higher levels of stress than those in private offices.
  • Behavioral Psychology : Rats rewarded with food after pressing a lever will press it more frequently than rats who receive no reward.



What Is A Research (Scientific) Hypothesis? A plain-language explainer + examples

By:  Derek Jansen (MBA)  | Reviewed By: Dr Eunice Rautenbach | June 2020

If you’re new to the world of research, or it’s your first time writing a dissertation or thesis, you’re probably noticing that the words “research hypothesis” and “scientific hypothesis” are used quite a bit, and you’re wondering what they mean in a research context.

“Hypothesis” is one of those words that people use loosely, thinking they understand what it means. However, it has a very specific meaning within academic research. So, it’s important to understand the exact meaning before you start hypothesizing. 

Research Hypothesis 101

  • What is a hypothesis ?
  • What is a research hypothesis (scientific hypothesis)?
  • Requirements for a research hypothesis
  • Definition of a research hypothesis
  • The null hypothesis

What is a hypothesis?

Let’s start with the general definition of a hypothesis (not a research hypothesis or scientific hypothesis), according to the Cambridge Dictionary:

Hypothesis: an idea or explanation for something that is based on known facts but has not yet been proved.

In other words, it’s a statement that provides an explanation for why or how something works, based on facts (or some reasonable assumptions), but that has not yet been specifically tested . For example, a hypothesis might look something like this:

Hypothesis: sleep impacts academic performance.

This statement predicts that academic performance will be influenced by the amount and/or quality of sleep a student gets – sounds reasonable, right? It’s based on reasonable assumptions , underpinned by what we currently know about sleep and health (from the existing literature). So, loosely speaking, we could call it a hypothesis, at least by the dictionary definition.

But that’s not good enough…

Unfortunately, that’s not quite sophisticated enough to describe a research hypothesis (also sometimes called a scientific hypothesis), and it wouldn’t be acceptable in a dissertation, thesis or research paper . In the world of academic research, a statement needs a few more criteria to constitute a true research hypothesis .

What is a research hypothesis?

A research hypothesis (also called a scientific hypothesis) is a statement about the expected outcome of a study (for example, a dissertation or thesis). To constitute a quality hypothesis, the statement needs to have three attributes – specificity , clarity and testability .

Let’s take a look at these more closely.


Hypothesis Essential #1: Specificity & Clarity

A good research hypothesis needs to be extremely clear and articulate about both what’s being assessed (who or what variables are involved) and the expected outcome (for example, a difference between groups, a relationship between variables, etc.).

Let’s stick with our sleepy students example and look at how this statement could be more specific and clear.

Hypothesis: Students who sleep at least 8 hours per night will, on average, achieve higher grades in standardised tests than students who sleep less than 8 hours a night.

As you can see, the statement is very specific as it identifies the variables involved (sleep hours and test grades), the parties involved (two groups of students), as well as the predicted relationship type (a positive relationship). There’s no ambiguity or uncertainty about who or what is involved in the statement, and the expected outcome is clear.

Contrast that to the original hypothesis we looked at – “Sleep impacts academic performance” – and you can see the difference. “Sleep” and “academic performance” are both comparatively vague , and there’s no indication of what the expected relationship direction is (more sleep or less sleep). As you can see, specificity and clarity are key.

A good research hypothesis needs to be very clear about what’s being assessed and very specific about the expected outcome.

Hypothesis Essential #2: Testability (Provability)

A statement must be testable to qualify as a research hypothesis. In other words, there needs to be a way to prove (or disprove) the statement. If it’s not testable, it’s not a hypothesis – simple as that.

For example, consider the hypothesis we mentioned earlier:

Hypothesis: Students who sleep at least 8 hours per night will, on average, achieve higher grades in standardised tests than students who sleep less than 8 hours a night.  

We could test this statement by undertaking a quantitative study involving two groups of students, one that gets 8 or more hours of sleep per night for a fixed period, and one that gets less. We could then compare the standardised test results for both groups to see if there’s a statistically significant difference. 
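To make this concrete, here’s a minimal sketch in R of how such a comparison could be run; the group sizes, score distributions, and seed are made up purely for illustration.

# Illustrative only: simulated standardised test scores for two groups.
set.seed(42)
sleep_8h   <- rnorm(50, mean = 75, sd = 10)  # students sleeping 8+ hours
sleep_less <- rnorm(50, mean = 70, sd = 10)  # students sleeping < 8 hours

# One-sided two-sample t-test: do the 8-hour sleepers score higher?
t.test(sleep_8h, sleep_less, alternative = "greater")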

Again, if you compare this to the original hypothesis we looked at – “Sleep impacts academic performance” – you can see that it would be quite difficult to test that statement, primarily because it isn’t specific enough. How much sleep? By who? What type of academic performance?

So, remember the mantra – if you can’t test it, it’s not a hypothesis 🙂

A good research hypothesis must be testable. In other words, you must be able to collect observable data in a scientifically rigorous fashion to test it.

Defining A Research Hypothesis

You’re still with us? Great! Let’s recap and pin down a clear definition of a hypothesis.

A research hypothesis (or scientific hypothesis) is a statement about an expected relationship between variables, or explanation of an occurrence, that is clear, specific and testable.

So, when you write up hypotheses for your dissertation or thesis, make sure that they meet all these criteria. If you do, you’ll not only have rock-solid hypotheses but you’ll also ensure a clear focus for your entire research project.

What about the null hypothesis?

You may have also heard the terms null hypothesis , alternative hypothesis, or H-zero thrown around. At a simple level, the null hypothesis is the counter-proposal to the original hypothesis.

For example, if the hypothesis predicts that there is a relationship between two variables (for example, sleep and academic performance), the null hypothesis would predict that there is no relationship between those variables.

At a more technical level, the null hypothesis proposes that no statistical significance exists in a set of given observations and that any differences are due to chance alone.

And there you have it – hypotheses in a nutshell. 







Overview of the Scientific Method

Learning Objectives

  • Distinguish between a theory and a hypothesis.
  • Discover how theories are used to generate hypotheses and how the results of studies can be used to further inform theories.
  • Understand the characteristics of a good hypothesis.

Theories and Hypotheses

Before describing how to develop a hypothesis, it is important to distinguish between a theory and a hypothesis. A  theory  is a coherent explanation or interpretation of one or more phenomena. Although theories can take a variety of forms, one thing they have in common is that they go beyond the phenomena they explain by including variables, structures, processes, functions, or organizing principles that have not been observed directly. Consider, for example, Zajonc’s theory of social facilitation and social inhibition (1965) [1] . He proposed that being watched by others while performing a task creates a general state of physiological arousal, which increases the likelihood of the dominant (most likely) response. So for highly practiced tasks, being watched increases the tendency to make correct responses, but for relatively unpracticed tasks, being watched increases the tendency to make incorrect responses. Notice that this theory—which has come to be called drive theory—provides an explanation of both social facilitation and social inhibition that goes beyond the phenomena themselves by including concepts such as “arousal” and “dominant response,” along with processes such as the effect of arousal on the dominant response.

Outside of science, referring to an idea as a theory often implies that it is untested—perhaps no more than a wild guess. In science, however, the term theory has no such implication. A theory is simply an explanation or interpretation of a set of phenomena. It can be untested, but it can also be extensively tested, well supported, and accepted as an accurate description of the world by the scientific community. The theory of evolution by natural selection, for example, is a theory because it is an explanation of the diversity of life on earth—not because it is untested or unsupported by scientific research. On the contrary, the evidence for this theory is overwhelmingly positive and nearly all scientists accept its basic assumptions as accurate. Similarly, the “germ theory” of disease is a theory because it is an explanation of the origin of various diseases, not because there is any doubt that many diseases are caused by microorganisms that infect the body.

A hypothesis , on the other hand, is a specific prediction about a new phenomenon that should be observed if a particular theory is accurate. It is an explanation that relies on just a few key concepts. Hypotheses are often specific predictions about what will happen in a particular study. They are developed by considering existing evidence and using reasoning to infer what will happen in the specific context of interest. Hypotheses are often but not always derived from theories. So a hypothesis is often a prediction based on a theory, but some hypotheses are a-theoretical: only after a set of observations has been made is a theory developed. This is because theories are broad in nature and they explain larger bodies of data. So if our research question is really original, then we may need to collect some data and make some observations before we can develop a broader theory.

Theories and hypotheses always have this  if-then  relationship. “ If   drive theory is correct,  then  cockroaches should run through a straight runway faster, and a branching runway more slowly, when other cockroaches are present.” Although hypotheses are usually expressed as statements, they can always be rephrased as questions. “Do cockroaches run through a straight runway faster when other cockroaches are present?” Thus deriving hypotheses from theories is an excellent way of generating interesting research questions.

But how do researchers derive hypotheses from theories? One way is to generate a research question using the techniques discussed in this chapter  and then ask whether any theory implies an answer to that question. For example, you might wonder whether expressive writing about positive experiences improves health as much as expressive writing about traumatic experiences. Although this  question  is an interesting one  on its own, you might then ask whether the habituation theory—the idea that expressive writing causes people to habituate to negative thoughts and feelings—implies an answer. In this case, it seems clear that if the habituation theory is correct, then expressive writing about positive experiences should not be effective because it would not cause people to habituate to negative thoughts and feelings. A second way to derive hypotheses from theories is to focus on some component of the theory that has not yet been directly observed. For example, a researcher could focus on the process of habituation—perhaps hypothesizing that people should show fewer signs of emotional distress with each new writing session.

Among the very best hypotheses are those that distinguish between competing theories. For example, Norbert Schwarz and his colleagues considered two theories of how people make judgments about themselves, such as how assertive they are (Schwarz et al., 1991) [2] . Both theories held that such judgments are based on relevant examples that people bring to mind. However, one theory was that people base their judgments on the  number  of examples they bring to mind and the other was that people base their judgments on how  easily  they bring those examples to mind. To test these theories, the researchers asked people to recall either six times when they were assertive (which is easy for most people) or 12 times (which is difficult for most people). Then they asked them to judge their own assertiveness. Note that the number-of-examples theory implies that people who recalled 12 examples should judge themselves to be more assertive because they recalled more examples, but the ease-of-examples theory implies that participants who recalled six examples should judge themselves as more assertive because recalling the examples was easier. Thus the two theories made opposite predictions so that only one of the predictions could be confirmed. The surprising result was that participants who recalled fewer examples judged themselves to be more assertive—providing particularly convincing evidence in favor of the ease-of-retrieval theory over the number-of-examples theory.

Theory Testing

The primary way that scientific researchers use theories is sometimes called the hypothetico-deductive method  (although this term is much more likely to be used by philosophers of science than by scientists themselves). Researchers begin with a set of phenomena and either construct a theory to explain or interpret them or choose an existing theory to work with. They then make a prediction about some new phenomenon that should be observed if the theory is correct. Again, this prediction is called a hypothesis. The researchers then conduct an empirical study to test the hypothesis. Finally, they reevaluate the theory in light of the new results and revise it if necessary. This process is usually conceptualized as a cycle because the researchers can then derive a new hypothesis from the revised theory, conduct a new empirical study to test the hypothesis, and so on. As  Figure 2.3  shows, this approach meshes nicely with the model of scientific research in psychology presented earlier in the textbook—creating a more detailed model of “theoretically motivated” or “theory-driven” research.


As an example, let us consider Zajonc’s research on social facilitation and inhibition. He started with a somewhat contradictory pattern of results from the research literature. He then constructed his drive theory, according to which being watched by others while performing a task causes physiological arousal, which increases an organism’s tendency to make the dominant response. This theory predicts social facilitation for well-learned tasks and social inhibition for poorly learned tasks. He now had a theory that organized previous results in a meaningful way—but he still needed to test it. He hypothesized that if his theory was correct, he should observe that the presence of others improves performance in a simple laboratory task but inhibits performance in a difficult version of the very same laboratory task. To test this hypothesis, one of the studies he conducted used cockroaches as subjects (Zajonc, Heingartner, & Herman, 1969) [3] . The cockroaches ran either down a straight runway (an easy task for a cockroach) or through a cross-shaped maze (a difficult task for a cockroach) to escape into a dark chamber when a light was shined on them. They did this either while alone or in the presence of other cockroaches in clear plastic “audience boxes.” Zajonc found that cockroaches in the straight runway reached their goal more quickly in the presence of other cockroaches, but cockroaches in the cross-shaped maze reached their goal more slowly when they were in the presence of other cockroaches. Thus he confirmed his hypothesis and provided support for his drive theory. (In many later studies, Zajonc also showed that drive theory holds for humans; e.g., Zajonc & Sales, 1966 [4] .)

Incorporating Theory into Your Research

When you write your research report or plan your presentation, be aware that there are two basic ways that researchers usually include theory. The first is to raise a research question, answer that question by conducting a new study, and then offer one or more theories (usually more) to explain or interpret the results. This format works well for applied research questions and for research questions that existing theories do not address. The second way is to describe one or more existing theories, derive a hypothesis from one of those theories, test the hypothesis in a new study, and finally reevaluate the theory. This format works well when there is an existing theory that addresses the research question—especially if the resulting hypothesis is surprising or conflicts with a hypothesis derived from a different theory.

Using theories in your research will not only give you guidance in coming up with experiment ideas and possible projects, but it also lends legitimacy to your work. Psychologists have been interested in a variety of human behaviors and have developed many theories along the way. Using established theories will help you break new ground as a researcher, rather than limit you in developing your own ideas.

Characteristics of a Good Hypothesis

There are three general characteristics of a good hypothesis. First, a good hypothesis must be testable and falsifiable . We must be able to test the hypothesis using the methods of science and if you’ll recall Popper’s falsifiability criterion, it must be possible to gather evidence that will disconfirm the hypothesis if it is indeed false. Second, a good hypothesis must be logical. As described above, hypotheses are more than just a random guess. Hypotheses should be informed by previous theories or observations and logical reasoning. Typically, we begin with a broad and general theory and use  deductive reasoning to generate a more specific hypothesis to test based on that theory. Occasionally, however, when there is no theory to inform our hypothesis, we use  inductive reasoning  which involves using specific observations or research findings to form a more general hypothesis. Finally, the hypothesis should be positive. That is, the hypothesis should make a positive statement about the existence of a relationship or effect, rather than a statement that a relationship or effect does not exist. As scientists, we don’t set out to show that relationships do not exist or that effects do not occur so our hypotheses should not be worded in a way to suggest that an effect or relationship does not exist. The nature of science is to assume that something does not exist and then seek to find evidence to prove this wrong, to show that it really does exist. That may seem backward to you but that is the nature of the scientific method. The underlying reason for this is beyond the scope of this chapter but it has to do with statistical theory.

  • Zajonc, R. B. (1965). Social facilitation. Science, 149, 269–274.
  • Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka, H., & Simons, A. (1991). Ease of retrieval as information: Another look at the availability heuristic. Journal of Personality and Social Psychology, 61, 195–202.
  • Zajonc, R. B., Heingartner, A., & Herman, E. M. (1969). Social enhancement and impairment of performance in the cockroach. Journal of Personality and Social Psychology, 13, 83–92.
  • Zajonc, R. B., & Sales, S. M. (1966). Social facilitation of dominant and subordinate responses. Journal of Experimental Social Psychology, 2, 160–168.

Key terms:

  • Theory : A coherent explanation or interpretation of one or more phenomena.
  • Hypothesis : A specific prediction about a new phenomenon that should be observed if a particular theory is accurate.
  • Hypothetico-deductive method : A cyclical process of theory development, starting with an observed phenomenon, then developing or using a theory to make a specific prediction of what should happen if that theory is correct, testing that prediction, refining the theory in light of the findings, and using that refined theory to develop new hypotheses, and so on.
  • Testable and falsifiable : The ability to test the hypothesis using the methods of science and the possibility to gather evidence that will disconfirm the hypothesis if it is indeed false.

Research Methods in Psychology Copyright © 2019 by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Hypothesis-Testing Demands Trustworthy Data—A Simulation Approach to Inferential Statistics Advocating the Research Program Strategy

Antonia Krefeld-Schwalb

1 Geneva School of Economics and Management, University of Geneva, Geneva, Switzerland

Erich H. Witte

2 Institute for Psychology, University of Hamburg, Hamburg, Germany

Frank Zenker

3 Department of Philosophy, Lund University, Lund, Sweden

In psychology as elsewhere, the main statistical inference strategy to establish empirical effects is null-hypothesis significance testing (NHST). The recent failure to replicate allegedly well-established NHST-results, however, implies that such results lack sufficient statistical power, and thus feature unacceptably high error-rates. Using data-simulation to estimate the error-rates of NHST-results, we advocate the research program strategy (RPS) as a superior methodology. RPS integrates Frequentist with Bayesian inference elements, and leads from a preliminary discovery against a (random) H 0 -hypothesis to a statistical H 1 -verification. Not only do RPS-results feature significantly lower error-rates than NHST-results, RPS also addresses key-deficits of a “pure” Frequentist and a standard Bayesian approach. In particular, RPS aggregates underpowered results safely. RPS therefore provides a tool to regain the trust the discipline had lost during the ongoing replicability-crisis.

Introduction

Like all sciences, psychology seeks to establish stable empirical hypotheses, and only “methodologically well-hardened” data provide such stability (Lakatos, 1978 ). In analogy, data we cannot replicate are “soft.” Recent attempts to replicate allegedly well-established results of null-hypothesis significance testing (NHST), however, have broadly failed, as did the five preregistered replications, conducted between 2014 and 2016, reported in Perspectives on Psychological Science (Alogna et al., 2014 ; Cheung et al., 2016 ; Eerland et al., 2016 ; Hagger et al., 2016 ; Wagenmakers et al., 2016 ). This implies that the error-proportions of NHST-results generally are too large, for many more replication attempts should otherwise have succeeded.

We can partially explain the replication failure of NHST-results by citing questionable research practices that inflate the Type-I error probability (false positives), as signaled by a large α-error (Nelson et al., 2018 ). If researchers collect undersized samples, moreover, then this raises the Type-II error probability (false negatives), as signaled by a large β-error. (The latter implies a lack of sufficient test-power i.e., 1 – β-error). Ceteris paribus , as these errors increase, the replication-probability of a true hypothesis decreases, thus lowering the chance that a replication attempt obtains a similar data-pattern as the original study. Since NHST remains the statistical inference strategy in empirical psychology, many today (rightly) view the field as undergoing a replicability-crisis (Erdfelder and Ulrich, 2018 ).

It adds severity that this crisis extends beyond psychology—to medicine and health care (Ioannidis, 2014 , 2016 ), genetics (Alfaro and Holder, 2006 ), sociology (Freese and Peterson, 2017 ), and political science (Clinton, 2012 ), among other fields (Fanelli, 2009 )—and affects each field as a whole. A 50% replication-rate in cognitive psychology vs. a 25% replication-rate in social psychology (Open Science Collaboration, 2015 ), for instance, merely makes the first subarea appear more crisis-struck. Since all this keeps from having too much trust in our published empirical results, the term “confidence-crisis” is rather apt (Baker, 2015 ; Etz and Vandekerckhove, 2016 ).

The details of how researchers standardly employ NHST coming under doubt has sparked renewed interest in statistical inference. Indeed, many researchers today self-identify as either Frequentists or Bayesians, and align with a “school” (Fisher, Neyman-Pearson, Jeffreys, or Wald). However, statistical inference as a whole offers no more (nor less) than a probabilistic logic to estimate the support that a hypothesis, H , receives from data, D (Fisher, 1956 ; Hacking, 1965 ; Edwards, 1972 ; Stigler, 1986 ). This estimate is technically an inverse probability, known as the likelihood, L ( H | D ), and (rightly) remains central to Bayesians.

An important precondition for calculating L ( H | D ) is the probability of D given H, p ( D , H ). Unlike L ( H | D ), we cannot determine p ( D , H ) other than by induction over data. This (rightly) makes p ( D , H ) central to Frequentists. Testing H against D —in the sense of estimating L ( H | D )—thus presupposes induction, but nevertheless remains distinct conceptually. Indeed, the term “test” in “NHST” misleads. For NHST tests only p ( D , H ), but not L ( H | D ). This may explain why publications regularly over-report an NHST-result as supporting a hypothesis. Indeed, many researchers appear to misinterpret NHST as the statistical hypothesis-testing method it emphatically is not.

To clarify why testing p ( D , H ) conceptually differs from testing L ( H | D ), this article compares NHST with the research program strategy (RPS), a hybrid-approach that integrates Frequentist with Bayesian statistical inference elements (Witte and Zenker, 2016a , b , 2017a , b ). As “stand-ins” for real empirical data, we here simulate the distribution of a (dependent) variable in hypothetical treatment- and control-groups to simulate that variable's arithmetic mean in both groups. Our simulated data are sufficiently similar to data that actual studies would collect for purposes of assessing whether an independent, categorical variable (e.g., an experimental manipulation) significantly influences a dependent variable. Therefore, simulating the parameter-range over which hypothetical data are sufficiently replicable does estimate whether actual data are stable, and hence trustworthy .

We outline RPS [section The Research Program Strategy (RPS)], detail three statistical measures (section Three Measures), explain purpose, method, and the key-result of our simulations (section Simulations), offer a critical discussion (section Discussion), then compare RPS to a “pure” Frequentist and a standard Bayesian approach (section Frequentism Vs. Bayesianism Vs. RPS), and finally conclude (section Conclusion). As supplementary material, we include the R-code, a technical appendix, and an online-app to verify quickly that a dataset is sufficiently stable 1 .

The research program strategy (RPS)

With the construction of empirical theories as its main aim, RPS distinguishes the discovery context from the justification context (Reichenbach, 1938 ). The discovery context readily lets us induce a data-subsuming hypothesis without requiring reference to a theoretical construct. Rather, discerning a non-random data-pattern, as per p ( D , H 0 ) < α ≤ 0.05, here sufficiently warrants accepting the H 1 -hypothesis that is a best fit to D as a data-abbreviation . Focusing on non-random effects alone, then, discovery context research is fully data-driven.

In the justification context, by contrast, data shall firmly test a theoretical H 1 -hypothesis, i.e., verify or falsify the H 1 probabilistically. A hypothesis-test must therefore pitch a theoretical H 1 -hypothesis either against the (random) H 0 -hypothesis, or against some substantial hypothesis besides the H 1 (i.e., H 2 , …, H n −1 , H n ). Were the H 1 -hypothesis we are testing indistinct from the data-abbreviating H 1 , however, then data once employed to induce the H 1 now would confirm it, too. As this would level the distinction between theoretical and inductive hypotheses, it made “hypothesis-testing” an empty term. Hence, justification context research must postulate a theoretical H 1 .

Having described and applied RPS elsewhere (Witte and Zenker, 2016a , b , 2017a , b ), we here merely list the six (individually necessary and jointly sufficient) RPS-steps to a probabilistic hypothesis-verification 2 .

The first step discriminates a random fluctuation ( H 0 ) from a systematic empirical observation ( H 1 ), measured either by the p -value (Fisher) or the α-error (Neyman-Pearson). Under accepted errors, we achieve a preliminary H 1 -discovery if the empirical effect sufficiently deviates from a random event.

Neyman-Pearson test-theory (NPTT) states the probability that a preliminary discovery is replicable as the (1–β-error), aka test-power. If we replicate a preliminary discovery while α- and β-error (hereafter: α, β) remain sufficiently small, a preliminary H 1 -discovery turns into a substantial H 1 -discovery .

A substantial H1-discovery may entail that we thereby preliminarily falsify the H0 (or another point-hypothesis). As the falsification criterion, we propose that the likelihood-ratio of the theoretical effect-size d > 0, postulated by the H1, and of a null-effect d = 0, postulated by the H0, i.e., L(d > 0|D) / L(d = 0|D), must exceed Wald's criterion (1 – β)/α (Wald, 1943 ).

A preliminary H0-falsification turns into a substantial H0-falsification if the likelihood-ratio of all theoretical effect-sizes that exceed the minimum theoretical effect-size d > δ = dH1 – dH0, and of the H0(d=0), i.e., L(d > δ|D) / L(d = 0|D), exceeds the same criterion, i.e., (1 – β)/α.

We achieve a preliminary H1-verification if the likelihood-ratio of the point-valued H1(d=δ) and the H0(d=0) exceeds, again, (1 – β)/α.

Having preliminarily verified the H1(d=δ) against the H0(d=0), we now test how similar δ is to the empirical (“observed”) effect-size's maximum-likelihood-estimate, MLE(d_emp). As our verification criterion, we propose the ratio of both likelihood-values (i.e., the maximal ordinate of the normal distribution divided by its ordinate at the 95% interval-point), which is approximately 4 (see next section). If δ's likelihood falls within the 95%-interval centered on MLE(d_emp), then we achieve a substantial H1-verification . This means we now accept “H1(d=δ)” as shorthand for the effect-size our data corroborate statistically.

RPS thus starts in the discovery context by using p-values (Fisher), proceeds to an optimal test of a non-zero effect-size against either a random model or an alternative model (Neyman-Pearson), and reaches—entering into the justification context—a statistical verification of a theoretically specified effect-size based on probably replicable data (see Figure 1). All along, of course, we must assume accepted α- and β-errors.

Figure 1. The six steps of the research program strategy (RPS).
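To illustrate the criterion at RPS-step 5 (preliminary verification), the following R sketch approximates both likelihoods with a normal sampling distribution of the standardized mean difference and checks them against Wald's criterion. The sample size, the observed effect, and the normal approximation itself are our simplifying assumptions for illustration, not part of RPS.

# Sketch: checking Wald's criterion (1 - beta)/alpha for a preliminary
# H1-verification, under a normal approximation to the sampling
# distribution of the observed standardized effect d_emp.
alpha <- 0.05; beta <- 0.05          # induction quality (alpha = beta)
delta <- 0.5                         # theoretical effect size (H1)
n     <- 105                         # per-group sample size (assumed)
d_emp <- 0.47                        # observed standardized effect (assumed)
se    <- sqrt(2 / n)                 # approximate standard error of d_emp

L_H1 <- dnorm(d_emp, mean = delta, sd = se)  # likelihood of H1: d = delta
L_H0 <- dnorm(d_emp, mean = 0,     sd = se)  # likelihood of H0: d = 0

(L_H1 / L_H0) >= (1 - beta) / alpha  # TRUE -> preliminary H1-verification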

In what we call the data space , RPS-steps 1 and 2 thus evaluate probabilities; RPS-steps 3–5 evaluate likelihoods in the hypotheses space ; and RPS-step 6 returns to the data space. For data alone determine if the point-hypothesis from RPS-step 5 is substantially verified at RPS-step 6, or not. As if in a circle, then, RPS balances three steps in the data space (1, 2, 6) with three steps in the hypotheses space (3, 4, 5).

Importantly, individual research groups rarely command sufficient resources to collect a sufficiently large sample that achieves the desirably low error-rates a well-powered study requires (see note 3). To complete all RPS-steps, therefore, groups must coordinate joint efforts, which requires a method to aggregate underpowered studies safely (We return to this toward the end of our next section).

Since RPS integrates Frequentist with Bayesian statistical inference-elements, the untrained eye might discern an arbitrary “hodgepodge” of methods. Of course, the Frequentist and Bayesian schools both harbor advantages and disadvantages (Witte, 1994 ; Witte and Kaufman, 1997 ; Witte and Zenker, 2017b ). For instance, Bayesian statistics allows us to infer hypotheses from data, but normally demands greater effort than using Frequentist methods. The simplicity and ubiquity of Frequentist methods, by contrast, facilitates the application and communication of research results. But it also risks neglecting assumptions that affect the research process, or falsely interpreting such statistical magnitudes as confidence intervals or p-values (Nelson et al., 2018 ). Decisively, however, narrowly sticking to any one school would simply avoid attempting to integrate each school's best statistical inference-elements into an all-things-considered best strategy. RPS does just this.

RPS motivates the selection of these elements by its main goal: to construct informative empirical theories featuring precise parameters and hypotheses. As RPS-step 1 exhausts the utility of α, or the p -value ( preliminary discovery ), for instance, β additionally serves at RPS-step 2 ( substantial discovery ). In general, RPS deploys inference elements at any subsequent step (e.g., the effect size at RPS-step 2–5; confidence intervals at RPS-step 6) to sequentially increase the information of a preceding step's focal result.

Unlike what RPS may suggest, of course, the actual research process is not linear. Researchers instead stipulate both the hypothesis-content and the theoretical effect-size freely. Nevertheless, a hypothesis-test deserving its name—one estimating L ( H | D ), that is—requires replicable rather than “soft” data, for such data alone can meaningfully induce a stable effect-size.

RPS therefore measures three qualities: induction quality of data, as well as falsification quality and verification quality of hypotheses, to which we now turn.

Three measures

This section defines three measures and their critical values in RPS. The first measure estimates how well data sustain an induced parameter; the second and third measure estimate how well replicable data undermine and, respectively, support a hypothesis 5 .

Def. induction quality : Based on NPTT, we measure induction quality as α and β, given a fixed sample size, N , and two point-valued hypotheses, H 0 and H 1 , yielding the effect-size difference dH 1 – dH 0 = δ.

The measure presupposes the effect-size difference dH 1 – dH 0 = δ, for otherwise we could not determine test-power (1–β).

Since induction quality pertains to the (experimental) conditions under which one collects data, the measure qualifies an empirical setting's sensitivity . Whether a setting is acceptable, or not, rests on convention, of course. RPS generally promotes α = β = 0.05, or even α = β = 0.01, as the right sensitivity (see section Frequentism Vs. Bayesianism Vs. RPS). By contrast, α = 0.05 and β = 0.20 are normal today. Since β/α = 0.20/0.05 = 4, this makes it four times more important to discover an effect than to replicate it—an imbalance that counts toward explaining the replicability-crisis.

A decisive reason to instead equate both errors (α = β) is that this avoids a bias pro detection (α) and contra replicability (1–β). Given acceptable induction quality, a substantial discovery thus arises if the probability of data passes the critical value (1 – β)/α. Under α = β = 0.05, for instance, we find that (1 – β)/α = 0.95/0.05 = 19. Hence, for the H1 to be statistically significantly more probable than the H0, we have it that p(H1, D) = 19 × p(H0, D).

Thus, we evidently can fully determine induction quality prior to data-collection for hypothetical data. Therefore, the measure says nothing about the focal outcome of a hypothesis-test. As we evaluate L ( H|D ) in the justification context, by contrast, the same measure nevertheless quantifies the trust that actual data deserve or—as the case may be—require.

Def. falsification quality : Based on Wald's theory, we measure falsification quality as the likelihood-ratio of all hypotheses the effect-size of which exceeds either the H0 (preliminary falsification) or δ (substantial falsification), and the point-valued H0, i.e., L(d > 0|D) / L(d = 0|D). Our proposed falsification-threshold (1 – β)/α thus depends on induction quality of data.

The falsification quality measure rests on both the H1 and a fixed amount of actual data. It comparatively tests the point-valued H0 against all point-alternative hypotheses that exceed dH1 – dH0 = δ. For instance, α = β = 0.05 obviously yields the threshold 19 (or log 19 = 2.94); α = β = 0.01 yields 99 (log 99 = 4.59), etc. Since it is normally unrealistic to set α = β = 0, “falsification” here demands a statistical sense, rather than one grounded in an observation that a deterministic law cannot subsume. Thus, a statistical falsification is fallible rather than final.

The same holds for verification:

Def. verification quality : Again based on Wald's theory, we measure verification quality as the likelihood-ratio of a point-valued H1 and a substantially falsified H0. The threshold for a preliminary verification is again (1 – β)/α (and thus, too, depends on induction quality of data). As the threshold for a substantial verification , we propose the value 4.

To explain this value, RPS views an H1-verification as preliminary if the maximum-likelihood-estimate (MLE) of data falls below the ratio of the maximum corroboration, itself determined via a normal curve's maximal ordinate, viz., 0.3989, and the ordinate at the 95%-interval centered on the maximum, viz., 0.10. As our confirmation threshold, this yields ≈4. Hence, a ratio <4 sees the theoretical parameter lie inside the 95%-interval. RPS would thus achieve a substantial verification .
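This threshold can be checked directly from the standard normal density, as the maximal ordinate divided by the ordinate at the one-sided 95% point. A one-line check in R:

# Maximal ordinate of the standard normal vs. its ordinate at the 95% point
dnorm(0) / dnorm(qnorm(0.95))  # 0.3989 / 0.1031 = 3.87, i.e., approximately 4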

Following Popper ( 1959 ), many take hypothesis-verification to be impossible in a deterministic sense. Understood probabilistically, by contrast, even a substantial verification of one point-valued hypothesis against another such hypothesis is error-prone (Zenker, 2017 ). The non-zero proportion of false negative decisions thus keeps us from verifying even the best-supported hypothesis absolutely. We can therefore achieve at most relative verification.

Assume we have managed to verify a parameter preliminarily. If the MLE now deviates sufficiently from that parameter's original theoretical value, then we must either modify the parameter accordingly, or may otherwise (deservedly) be admonished for ignoring experience. The MLE thus acts as a stopping-rule, signaling when we may (temporarily) accept a theoretical parameter as substantially verified.

The six RPS steps thus obtain a parameter we can trust to the extent that we accept the error probabilities. Unless strong reasons motivate doubt that our data are faithful, indeed, the certainty we invest into this parameter ought to mirror (1–β), i.e., the replication-probability of data closely matching a true hypothesis (Miller and Ulrich, 2016 ; Erdfelder and Ulrich, 2018 ).

Before sufficient amounts of probably replicable data arise in praxis, however, we must normally integrate various studies that each fail the above thresholds. RPS's way of integration is to add the log-likelihood-ratios of two point-hypotheses, each of which is “loaded” with the same prior probability, p(H1) = p(H0) = 0.50. Also known as log-likelihood-addition, RPS thus aggregates data of insufficient induction quality by relying on the well-known equation for independent datasets D1, …, Dk:

log [L(H1 | D1, …, Dk) / L(H0 | D1, …, Dk)] = Σi log [L(H1 | Di) / L(H0 | Di)]

We proceed to simulate select values from the full parameter-range of possible RPS-results. These values are diverse enough to extrapolate to implicit values safely. The subsequent sections offer a discussion and then compare RPS to alternative methodologies.

Simulations

Using R-software, we simulate data for hypothetical treatment- and control-groups, calculate the group-means, and then compare these means with a t-test. While varying both induction quality of data and the effect-size, we simulate the resulting error rates. Since the simulated error-proportions of a t-test approximate the error-probability of data, this determines the parameter-range over which empirical results (such as those that RPS's six steps obtain) are stable , and hence trustworthy.

In particular, we estimate:

  • the necessary sample size, N MIN , in order to register, under (1–β), the effect-size δ as a statistically significant deviation from random 7 ;
  • the p -value, as the most commonly used indicator in NHST;
  • the likelihood that the empirical effect-size d (emp) exceeds the postulated effect-size δ, i.e., L ( d > δ| D ), as a measure of substantial falsification;
  • the likelihood of the H 0 , i.e., L (δ = 0| D ), as a measure of type I and type II errors;
  • the likelihood of the H 1 , i.e., the true effect-size L (δ| D ), as a measure of preliminary verification;
  • the maximum-likelihood-estimate of data, MLE( x ), when compared to the likelihood of the H 1 , as a measure of substantial verification.
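The simulation primitive behind these estimates is simple. A minimal sketch in R (group size, effect size, and seed chosen purely for illustration; not the authors' published code):

# One simulated "study": a normally distributed DV in a treatment and a
# control group, whose means are compared with a one-sided t-test.
set.seed(1)
simulate_study <- function(n, delta) {
  treatment <- rnorm(n, mean = delta, sd = 1)  # group mean shifted by delta
  control   <- rnorm(n, mean = 0,     sd = 1)
  t.test(treatment, control, alternative = "greater")$p.value
}

# Error proportions over 100 samples approximate the error probability:
p_values <- replicate(100, simulate_study(n = 105, delta = 0.5))
mean(p_values < 0.05)  # proportion of significant results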

We conduct five simulations. Simulations 1 and 2 estimate the probability of true positive and false negative results as a function of the effect-size and test-power. Our significance level is set to α = 0.05, respectively to α = 0.01. Simulation 3 estimates the probability of false positive results. The remaining two simulations address engaging with data in post-hoc fashion. Simulation 4 evaluates shedding 10% of data that least support the focal hypothesis. To address research groups' individual inability to collect the large samples that RPS demands, Simulation 5 mimics collaborative research by adding the log-likelihood-ratios of underpowered studies.

Simulation 1

Simulation 1 manipulates the test-power and the true effect-size to estimate the false negative error-rates (respectively the true positive rate) throughout RPS's six steps.

We simulate 16 datasets that each contain 100 samples of identical size and variance. We represent a sample by the mean of a normally distributed variable in two independent groups (treatment and control), summarized with the test-statistic t. Between these 16 datasets, we vary the effect-size δ = [0.01, 0.2, 0.5, 0.8], and thus vary the difference between the group-means. We also vary test-power (1–β) = [0.4, 0.5, 0.8, 0.95], and thus let induction quality range from “very poor,” i.e., (1–β) = 0.4, to “medium,” i.e., (1–β) = 0.95. Under α = 0.05 (one-sided), we estimate N MIN to meet the respective test-power (Simulation 2 tightens the significance level to α = 0.01).
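N MIN for each cell of this design can be estimated with base R's power.t.test; the following loop is our reconstruction of that estimate, not necessarily the authors' exact procedure:

# Estimated minimum per-group N for a one-sided two-sample t-test at
# alpha = 0.05, across the simulated effect sizes and test-power values.
for (delta in c(0.01, 0.2, 0.5, 0.8)) {
  for (pow in c(0.4, 0.5, 0.8, 0.95)) {
    n <- power.t.test(delta = delta, power = pow, sig.level = 0.05,
                      alternative = "one.sided")$n
    cat(sprintf("delta = %.2f, power = %.2f: n per group = %d\n",
                delta, pow, as.integer(ceiling(n))))
  }
}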

Results and discussion

For both the experimental and the control group, Table 1 lists N MIN to register the effect-size δ as a statistically significant deviation from random ( substantial discovery ). Generally, given constant test-power (1–β), the smaller (respectively larger) δ is, the larger (smaller) is N MIN. This shows how N MIN depends on β.

The estimated minimum sample size for a two sample t -test as a function of test-power (1–β) and effect size δ, given α = 0.05.

For the sample sizes in Table 1, moreover, Table 2 states the proportion of p-values that fall below α = 0.05, given a test-power value. This estimates the probability of a substantial discovery . As the standard deviation of the p-value here indicates, we retain a large variance across samples, especially for data of low induction quality.

The proportion P of substantial discoveries, indicated by p -values below the significance level α = 0.05, as a function of the effect-size δ and test-power (1–β).

P(p < α) = proportion of significant results; σ(p) = standard deviation of p-value .

As with Table 1, Table 2 shows that the larger the test-power value is, the larger is the proportion of substantial discoveries, ceteris paribus . We obtain a similar result when estimating the probability of a substantial falsification or a preliminary verification , as per the likelihood-ratios L(d > 0|D) / L(d = 0|D) and L(d = δ|D) / L(d = 0|D) meeting the threshold (1 – β)/α.

In case of a preliminary verification , however, we obtain a larger proportion of false negative results than in case of a substantial falsification, for in verification we narrowly test a point-valued H0 against a point-valued H1, whereas in falsification we test a point-valued H0 against an interval H1. Therefore, the verification criterion is “less forgiving” than the falsification criterion.

Using bar plots to illustrate the distribution of likelihood-ratios (LRs) for a preliminary verification, Figure 2 shows that LRs often fall below the threshold (1 – β)/α. However, if data are only of medium induction quality (α = β = 0.05), we find a large proportion of LRs > 3. We should therefore not immediately reject the H1 if 3 < LR < (1 – β)/α, because LR > 3 indicates some evidence for the H1. Instead, we should supply additional data before evaluating the LR. If we increase the sample by 50% of its original size, N/2, for instance, but the LR still falls below the threshold, then we may add yet another N/2, and only then sum the log-LRs. If this too fails to yield a preliminary H1-verification (or a H0-verification), then we may still use this empirical result as a parameter-estimate which future studies might test.

Figure 2. Illustration of true positives. Bar plots indicate the frequencies of likelihood ratios (L(d > δ|D) / L(d = 0|D) set in light gray, and L(d = δ|D) / L(d = 0|D) in dark gray) that, respectively, fall above the criterion (1 – β)/α (two leftmost bars), between this criterion and three (two middle bars), and below three (two rightmost bars), as a function of induction quality of data, provided the H1 is true, under α = 0.05 [itself defined via d and (1–β), the latter here abbreviated as “pow”].

An important caveat is that the likelihood-ratio measures the distance between data and hypothesis only indirectly . Even though the likelihood steadily increases as the mean of data approaches the effect-size that the H1 postulates, we cannot infer this distance from the LR alone , but must study the distribution itself. For otherwise, even if LR ≥ (1 – β)/α, we would risk verifying the H1 although the observed mean of data does not originate with the H1-distribution, but with a distinct distribution featuring a different mean.

Moving beyond RPS-step 5, we can only address this caveat adequately by constraining the data-points that substantially verify the H1 to those lying in an acceptable area of variance around the H1. Table 4 reports the proportion of preliminarily H1-verifying samples that now fail the criterion for a substantial H1-verification , and thus amount to additional false negatives. We can reduce these errors by increasing the sample size, which generally reduces the error-probabilities.

The proportion of preliminary verifications as per LR ≥ (1 – β)/α, given the empirical effect-size d lies outside the interval comprising 95% of expected values placed around the H1, where L(d|D) / L(d = δ|D) > pdf(P50|d) / pdf(P95|d) > 4.

pdf, probability density function; P50/P95, 50th/95th percentile.

To account for the decrease in β after constraining the sample size in RPS-step 5, of course, the value of the threshold (1 – β)/α now is higher, too. Hence, meeting it becomes more demanding. RPS-step 6 nevertheless increases our certainty that the data-mean originates with the hypothesized H1-distribution, and so increases our certainty in the theoretical parameter.

Table 5 states the proportion of datasets that successfully complete RPS's six steps, i.e., preliminary and substantial discovery (steps 1, 2) as well as preliminary and substantial falsification and verification (steps 3–6). For data of low to medium induction quality, we retain a rather large proportion of false negatives.

The proportion of substantial verifications, after substantial discoveries and subsequent preliminary verifications were obtained, given the H 0 had been substantially falsified.

Simulation 2

To reduce the proportion of false negatives, as we saw, we must increase induction quality of data. Simulation 2 illustrates this by lowering the error-rates.

Repeating the procedure of Simulation 1, but having tightened the error-rates from α = β = 0.05 to α = β = 0.01, we consequently obtain test-power (1–β) = 0.99. This also tightens the threshold from LR > 19 to LR > 99. We drop the smallest effect-size of Simulation 1 (δ = 0.01), for (1–β) = 0.99, after all, makes N MIN = 432,952 unrealistically large (see note 3). Simulation 2 therefore comprises three datasets (each with 100 samples) and manipulates the effect-size as δ = [0.2, 0.5, 0.8].
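The dropped cell is easy to verify with an a priori power analysis; assuming base R's power.t.test, δ = 0.01 under α = 0.01 and (1–β) = 0.99 indeed demands roughly 433,000 participants per group:

# Per-group N for delta = 0.01 at alpha = 0.01 and power = 0.99 (one-sided)
power.t.test(delta = 0.01, sig.level = 0.01, power = 0.99,
             alternative = "one.sided")$n  # approximately 432,952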

For these three effect sizes, Table 6 states N MIN under α = β = 0.01. Again, the larger (smaller) the effect is, the smaller (larger) is N MIN. Simulated p-values continue to reflect the test-power value almost perfectly (see Table 7). Further, the proportion of preliminary verifications and substantial falsifications (see Table 8) approaches the proportion of substantial discoveries (see Table 7).

Sample size for a t -test as a function of δ, given α = β = 0.01.

The proportion of substantial discoveries (indicated by the p -value) as a function of δ, given α = β = 0.01.

The proportion of substantial falsifications and preliminary verifications, indicated by the respective LR , as a function of δ under α = β = 0.01.

Under high induction quality of data, the proportion of false negative verifications now is acceptable, too. When applying the corroboration criterion for a substantial verification, we thus retain only a very small number of additional false negative verifications (see Table 9).

The proportion of preliminary verifications as per LR ≥ (1 – β)/α, where the empirical effect size d, however, lies outside the area spanned by the 95%-interval of expected values centered on the H1, and where L(d|D) / L(d = δ|D) > pdf(P50|d) / pdf(P95|d) > 4.

pdf, probability density function; P50/P95, 50th/95th percentile.

Table 10 reports the proportion of simulated datasets that successfully complete RPS-steps 3–6 in the justification context (preliminary H0-falsification to substantial H1-verification). As before, increasing induction quality of data decreases the proportion of false negative results.

The proportion of substantial verifications (subsequent to achieving substantial discoveries and preliminary verifications), given that the H 0 was substantially falsified under α = β = 0.01.

Simulation 3

We have so far estimated the probability of true positive and false negative results as per the LR and the p -value. To estimate also the probability of false positive results, Simulation 3 assumes hypothetical effect-sizes and sufficiently large samples to accord with simulated test-power values.

Simulating four datasets (100 samples each), Simulation 3 matches the sample-size to the test-power values (1–β) = [0.4, 0.5, 0.8, 0.95] for a hypothetical effect-size δ = 0.2. In all datasets, the simulated true effect-size is δ = 0.
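A compact reconstruction of this false-positive check might look as follows (our sketch; the seed and the 100-sample grid are illustrative):

# False positives: the true effect is zero, but N is sized for a
# hypothetical delta = 0.2 at each target test-power value.
set.seed(3)
for (pow in c(0.4, 0.5, 0.8, 0.95)) {
  n <- ceiling(power.t.test(delta = 0.2, power = pow, sig.level = 0.05,
                            alternative = "one.sided")$n)
  p <- replicate(100, t.test(rnorm(n), rnorm(n),
                             alternative = "greater")$p.value)
  cat(sprintf("power = %.2f (n = %d): false-positive rate = %.2f\n",
              pow, as.integer(n), mean(p < 0.05)))
}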

Table 11 shows that simulated p-values reflect our predefined significance level α = 0.05. At this level, a substantial falsification leads to a similar proportion of false positive results as a substantial discovery . By contrast, a preliminary verification decreases the proportion of false positive results to almost zero (see Table 12). Applying the substantial verification criterion even further decreases the probability of false positive results (see Figure 3).

The proportion of false positives, where the sample size, N , is obtained by a priori power analysis, given δ = 0.2 and where (1–β) = [0.4, 0.5, 0.8, 0.95].

The proportion of false substantial falsifications and false preliminary verifications using LR ≥ (1 – β)/α.

Figure 3. Illustration of false positives. Bar plots indicate the frequency of likelihood ratios (L(d > δ|D) / L(d = 0|D) in light gray and L(d = δ|D) / L(d = 0|D) in dark gray) repeatedly falling above the criterion (1 – β)/α, between the criterion and three, and below three, as a function of the sample size, provided the H0 is true.

The preceding simulations suggest that, given the threshold LR ≥ (1 – β)/α, the proportion of false negative results remains too large. One might therefore lower the threshold to 3 < LR < (1 – β)/α, which still indicates some evidence for the H1 (see Figure 2). Whether this new threshold reduces the proportion of false negative results unproblematically depends directly on the proportion of false positives. Compared to the case of falsification, however, we now retain a larger proportion of false positives (see Table 13 and Figure 3).

The proportion of false preliminary verifications using LR = 3.

As we combine the threshold LR ≥ (1 – β)/α with the substantial verification criterion, the previous simulations retained a rather large proportion of false negative results. However, this increase occurs only if data are of low to medium induction quality. If induction quality approaches α = β = 0.01, by contrast, then the proportion of both false positive and false negative results decreases to an acceptable minimum. Hence, we may falsify the H0 and simultaneously verify the H1.

Simulation 4

Simulations 1–3 confirmed a simple relation: increasing induction quality of data decreases the proportion of false positive results. Where an actual experimental manipulation fails to produce its expected result, this relation may tempt researchers to manipulate induction quality of data post hoc, by shedding some of the "failing" data-points. Simulation 4 investigates the consequences of this move.

Using the samples from Simulation 3, we remove from each sample the 10% of data that score lowest on the dependent variable, and thus least support the H1; we then re-assess the proportion of false positive findings.

Rather than increase induction quality of data, this post-hoc manipulation produces the opposite result: it raises the proportion of false positive results. Indeed, on all of our criteria, shedding the 10% of data that least support the focal hypothesis profoundly increases the error-rates (see Table 14).

The proportion of false substantial falsifications and false preliminary verifications, given that one had obtained a preliminary discovery (as per the p-value and LR), after the 10% of data least supporting the hypothesis were removed.
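
The mechanism is easy to reproduce. In the R sketch below (a simplified stand-in for our simulation code, assuming the shed data-points are the treatment group's lowest scores, i.e., those least supporting the H1), removing the lowest 10% shifts the treatment mean upward by about 0.2 standard deviations, so a true null is rejected far more often than α = 0.05 permits:

```r
set.seed(2)
n   <- 100
res <- replicate(5000, {
  treat <- rnorm(n); ctrl <- rnorm(n)             # true effect size is zero
  shed  <- treat[treat > quantile(treat, 0.10)]   # drop the lowest 10%
  c(raw  = t.test(treat, ctrl, alternative = "greater")$p.value,
    shed = t.test(shed,  ctrl, alternative = "greater")$p.value)
})
rowMeans(res < 0.05)   # false-positive rate before vs. after shedding
```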

Published data, of course, do not reveal whether someone shed parts of them. Where this manipulation occurs but cannot be traced reliably, others risk drawing invalid inferences. For this reason alone, sound inferences should rely on the aggregate results of independent studies (this assumes that data shedding is not ubiquitous). As RPS's favored aggregation method, we therefore simulate a log-likelihood-addition of such results.

Simulation 5

We generally advocate high induction quality of data. Collecting the sizable N MIN (that particularly laboratory studies require) to meet test-power = 0.99 (or merely 0.95), however, can quickly exhaust an individual research group's resources (see Lakens et al., 2018 ) 8 . In fact, we often have no other choice but to aggregate comparatively “soft” (underpowered) data from multiple studies. Aggregate data, of course, must reflect the trust each dataset deserves individually. We therefore simulate the addition of logarithmic LR s (log- LR s) for data of low to medium test-power.
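To put a number on this: for the small hypothetical effect δ = 0.2 used throughout our simulations, a standard a priori power computation in R yields a per-group N MIN of well over 1,000 at test-power 0.99 (a sketch assuming a two-sample design, one-tailed testing, and α = 0.01):

```r
# n per group for alpha = beta = 0.01 (test-power 0.99), delta = 0.2
power.t.test(delta = 0.2, sd = 1, sig.level = 0.01, power = 0.99,
             alternative = "one.sided")$n   # roughly 1,083 per group
```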

We add the log-LRs as per the low-powered samples of Simulation 1, then assess the proportions of samples that meet the criteria of each RPS-step. Notice that this is the only safe way to conduct a global hypothesis-test that combines individual studies (it is nevertheless distinct from a viable meta-analytic approach; see Birnbaum, 1954).

Table 15 shows that the log-LRs of three low-powered studies under (1–β) = [0.4, 0.5, 0.8] aggregate to one medium-powered study under (1–β) = 0.95 (see Table 3), because the three samples sum to N MIN for a substantial discovery under (1–β) = 0.95. The probability of correctly rejecting the H0 thus approaches 1, whereas the proportion of preliminary verifications is not much larger than for each individual study (see Table 3, last row). This means individual research groups can collect fewer data points than N MIN. Thus, log-LR addition indeed optimizes a substantial H0-falsification.

The proportion of substantial falsifications and preliminary verifications, as indicated by the respective likelihood ratio (LR) meeting or exceeding the threshold LR ≥ (1–β)/α.

LR, likelihood ratio; D, data.

The proportions of LR ≥ (1–β)/α when adding the log(LR) of individually underpowered studies featuring (1–β) = [0.4, 0.5, 0.8].
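
The aggregation itself is straightforward. In the R sketch below (our reading, with the likelihood of an effect size d taken as the noncentral-t density of the observed t-value; the definitive code is linked in footnote 1), three individually underpowered studies jointly clear the threshold log((1–β)/α):

```r
set.seed(3)
delta <- 0.2; alpha <- 0.05; beta <- 0.05

# log-likelihood-ratio of H1 (d = delta) vs. H0 (d = 0) for one study
log_lr <- function(x, y, delta) {
  n   <- length(x)                          # equal group sizes assumed
  tob <- unname(t.test(x, y, var.equal = TRUE)$statistic)
  df  <- 2 * n - 2
  log(dt(tob, df, ncp = delta * sqrt(n / 2))) - log(dt(tob, df))
}

# per-group sizes for three underpowered studies: power 0.4, 0.5, 0.8
ns <- sapply(c(0.4, 0.5, 0.8), function(pw)
  ceiling(power.t.test(delta = delta, sig.level = alpha, power = pw,
                       alternative = "one.sided")$n))

llrs <- sapply(ns, function(n) log_lr(rnorm(n, delta), rnorm(n), delta))
sum(llrs) >= log((1 - beta) / alpha)   # aggregate Wald-type criterion
```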

Simulations 1–5 recommend RPS primarily for its desirably low error-rates; achieving these makes induction quality of data and likelihood-ratios central. Simulation 5 in particular shows why log-likelihood-ratio addition of individually under-powered studies can meet the rigorous test-power demands of the justification context, viz. (1–β) = 0.95, or better yet (1–β) = 0.99.

As an alternative to testing the H1 against H0 = 0, we may pitch it against H0 = random. Following a reviewer's suggestion, we therefore also simulated testing the mean-difference between the treatment- and the control-group against the randomly varying mean-difference between the control-group and zero. Compared to pitching the H1 against H0 = 0, this yields a reduced proportion of false negatives, but also generates a higher proportion of false positives.

Since our sampling procedure lets the mean-difference between control group and zero vary randomly around zero, the increase in false positives (negatives) arises from the control group's mean-difference falling below (above) zero in roughly 50% of all samples. This must increase the LR in favor of the H1 (H0). With respect to comparing group-means, however, testing the H1 against H0 = random does not prove superior to testing it against H0 = 0, as in RPS.

In view of RPS, if induction quality of data remains low (α = β > 0.05), then we cannot hope to either verify or falsify a hypothesis. This restricts us to two discovery context-activities: making a preliminary or a substantial discovery (RPS-steps 1, 2). After all, both discovery-variants arise from estimating p(D, H); this rules out hypothesis-testing research, which instead estimates L(H|D).

By contrast, achieving medium induction quality (α = β ≤ 0.05) meets a crucial precondition for justification context-research. RPS can now test hypotheses against "hard" data by estimating L(H|D). Specifically, RPS tests a preliminary, respectively a substantial, H0-falsification (RPS-steps 3, 4) by testing whether L(d > 0|D)/L(d = 0|D), respectively L(d = δ|D)/L(d = 0|D), exceeds (1–β)/α. If the latter holds true, then we can test a preliminary verification of the theoretical effect-size H1-hypothesis (RPS-step 5), as to whether L(d = δ|D)/L(d = 0|D) exceeds (1–β)/α. If so, then we finally test a substantial H1-verification (RPS-step 6), here using the ratio of the likelihood of the data's MLE to the likelihood of the H1(d = δ), as to whether δ falls within the 95%-interval centered on the MLE (if not, we may adapt H1(d = δ) accordingly, provided both theoretical and empirical considerations support this).
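
Schematically, and under simplifying assumptions of ours (equal group sizes; the likelihood of an effect size d taken as the noncentral-t density of the observed t-value; L(d > 0|D) read as the likelihood maximized over d > 0; a normal-approximation interval for step 6), the sequence can be sketched in R as follows; the exact definitions live in the code linked in footnote 1:

```r
rps_justification <- function(x, y, delta, alpha = 0.05, beta = 0.05) {
  n    <- length(x)                    # equal group sizes assumed
  df   <- 2 * n - 2
  tob  <- unname(t.test(x, y, var.equal = TRUE)$statistic)
  lik  <- function(d) dt(tob, df, ncp = d * sqrt(n / 2))
  thr  <- (1 - beta) / alpha           # Wald-type criterion
  dhat <- tob * sqrt(2 / n)            # effect-size estimate (approx. MLE)

  c(step3_prelim_falsification = lik(max(dhat, 0)) / lik(0) >= thr,
    step4_subst_falsification  = lik(delta) / lik(0) >= thr,
    step5_prelim_verification  = lik(delta) / lik(0) >= thr,  # as in the text
    step6_subst_verification   =                 # delta in the 95%-interval
      abs(delta - dhat) <= qnorm(0.975) * sqrt(2 / n))
}

set.seed(4)
rps_justification(rnorm(500, 0.2), rnorm(500), delta = 0.2)
```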

As we saw, RPS almost eliminates the probability of a false positive H1-verification. If data are of medium induction quality, moreover, then the probability of falsely rejecting the H1 lies in an acceptable range, too (this range is even slightly smaller than that for false positive verifications). However, lowering the threshold (1–β)/α to decrease the probability of false negatives will increase the probability of false positives. In balancing false positive with false negative H1-verifications, then, we face an inevitable trade-off.

Increasing the probability of false positives generally harms a study's global outcome more than decreasing the probability of false negatives benefits it. After all, since editors and reviewers typically prefer significant results (p < α = 0.05), non-significant results more often fail the review process, or are not written up (Franco et al., 2014). The community thus risks attending to more potentially false positive than potentially false negative results 9. That researchers should reduce this risk speaks decisively against lowering the threshold. To control the risk, moreover, it suffices to increase induction quality of data by adding samples until N = N MIN.

In psychology, as elsewhere today, the standard mode of empirical research clearly differs from what RPS recommends; induction quality (test-power) in particular appears underappreciated. Yet what, besides a substantial H1-verification, can provide a statistical warrant to accept an H1 that aptly pre- or retrodicts a phenomenon? Likewise, only a substantial H0-falsification can warrant rejecting the H0 (for reasons given earlier, p-values alone will not do).

With the discovery context as RPS's origin and the justification context as its end, RPS employs empirical knowledge to gain theoretical knowledge. A theory is generally more informative the more possible states-of-affairs it rules out. The most informative kind of theory, therefore, lets us deduce hypotheses predicting precise (point-specified) empirical effects, which we can falsify statistically 10. Obvious candidates for such point-values are those effects that "hard" data support sufficiently. RPS's use of statistical inference toward constructing improved theories thus reflects that the most appropriate inference-element depends primarily on the prior state of empirical knowledge we seek to develop, rather than on one's statistical school.

Such prior knowledge we typically gain via meta-analyses that aggregate the samples and effect-sizes of topically related object-level studies. These studies either estimate a parameter or test a hypothesis against aggregated data, but typically are individually underpowered. A meta-analysis tends to join an estimated combined effect-size of several studies, on the one hand, with the estimated sum of their confidence intervals deviating from the H0, on the other. This aggregate estimate thus rests on data of variable induction quality. Such an aggregation method, therefore, can facilitate only a parameter-estimation; it will not estimate L(H|D) safely.

A typical meta-analysis indeed ignores the replication-probability of object-level studies, instead considering only the probability of data, p(D, H) 11. This makes it an instance of discovery context-research. By contrast, log-likelihood-addition is by definition based on trustworthy data (of high induction quality), does estimate L(H|D) safely, and hence is an instance of justification context-research (see sections Three Measures and Discussion).

RPS furthermore aligns with the registered replication reports-initiative (RRR), which aims at more realistic empirical effect-size estimates by counteracting p-hacking and publication bias (Bowmeester et al., 2017). Indeed, RPS complements RRR. Witte and Zenker's (2017a) re-analysis of Hagger et al.'s (2016) RRR of the ego-depletion effect, for instance, strengthens the authors' own conclusions, showing that their data lend some 500 times more support to the H0(d = 0.00) than to the H1(d = 0.20).

Both RRR and RPS obviously advocate effortful research. Though we could coordinate such efforts across several research groups, current efforts are broadly individualistic and tend to go into making preliminary discoveries. This may yield a more complex view of a phenomenon. Explaining, predicting, and intervening, however, all require theories with substantially verified H1-hypotheses as their deductive consequences. Again, constructing a more precise version of such a theory is RPS's main aim. Indeed, we need something like RPS anyway: we can statistically test hypotheses by induction [see section The Research Program Strategy (RPS)], but we cannot outsource theory-construction to induction.

Frequentism vs. Bayesianism vs. RPS

A decisive evaluative criterion is whether an inference strategy leads to a rigorously validated, informative theory. Researchers can obviously support this end only if their individual actions relate to what the research community does as a whole. At the same time, each researcher must balance her own interests with those of others. Hence, we exercise “thrift” when collecting small samples, but also publish the underpowered results this generates to further our careers.

Reflecting the research community's need for informative theories, most journals require that a submitted manuscript report at least one statistically significant effect—that is, a preliminary discovery à la NHST (For an exception, see Trafimow, 2014 ). Given this constraint, the favored strategy to warrant our publication activities seemingly entails conducting “one-shot”-experiments, leading to many papers without integrating their results theoretically.

The probably best defense of that strategy offers three supporting reasons: (i) the strategy suffices to discover non-random effects; (ii) non-random effects matter in constructing informative theories; (iii) the more such discoveries the merrier. However, (i) is a necessary (rather than a sufficient) reason that the strategy is apt; (ii) is an insufficient supporting reason, for non-randomness matters but test-power counts (Witte and Zenker, 2018); and (iii) obviously falls with (ii). This defense therefore cannot sufficiently support that the strategy balances the interests of all concerned parties. Indeed, the status quo strongly favors the individual's career aspirations over the community's need for informative theories.

The arguably best statistical method for making a discovery remains a Fisher-test (for other methods, see, e.g., Woodward, 1989; Haig, 2005). It estimates the probability of an empirical effect given uncontrollable, but non-negligible, influences. As we saw, this probability meeting a significance-threshold such as p(H, D) < α = 0.05 is a necessary and sufficient condition for a preliminary discovery (RPS-step 1). Though this directs our attention to an empirical object, it also exhausts what NHST by itself can deliver. Subsequent RPS-steps therefore employ additional induction quality measures, namely the effect-size (steps 2–5), and offer a new way of using confidence intervals (step 6).

Recent critiques of NHST give particular prominence to Bayesian statistics. As an alternative to a classical t-test, for instance, many promote a Bayesian t-test. This states the probability-ratio of data given a hypotheses-pair, p(D|H1)/p(D|H0), a ratio known as the "Bayes factor" (Rouder et al., 2009; Wetzels et al., 2011). If the prior probabilities are identical, p(H1) = p(H0) = 0.50, then the Bayes factor is the likelihood-ratio of two point-hypotheses, L(H1|D)/L(H0|D). Indeed, RPS is largely coextensive with a Bayesian approach as concerns the hypothesis space.

But Bayesians must also operate in the data space, particularly when selecting data-distributions as priors for an unspecified H1. Such substantial assumptions obviously demand a warrant. For the systematic connection between the Bayes-factor and the p-value of a classical t-test is that "default Bayes factors and p-values largely covary with each other" (Wetzels et al., 2011, 295). The main difference is their calibration: "p-values accord more evidence against the null [hypothesis] than do Bayes factors" (ibid).

The keyword here is "default," for the default prior probabilities one assumes matter when testing hypotheses. In fact, not only do Bayesians tend to assign different default priors to the focal H0 and the H1; they also tend to distribute (rather than point-specify) these priors. As Rouder et al. (2009, 229) submit, for instance, "[…] we assumed that the alternative [hypothesis] was at a single point," an assumption which is allegedly "too restrictive to be practical" (ibid). Rather, it is supposedly "more realistic to consider an alternative [hypothesis] that is a distribution across a range of outcomes" (ibid), although "arbitrarily diffuse priors are not appropriate for hypothesis testing" (p. 230) either. This can easily suggest that modeling a focal parameter's prior probability distributively would be the innocent choice it is not.

After all, computing a Bayesian t-test necessarily incurs not only a specific prior data-distribution, but also a point-specified scaling factor. This factor is given by the prior distributions of the focal hypotheses, i.e., the ratio p(H1)/p(H0) [see our formula (1), section Three Measures]. Prior to collecting empirical data, therefore, p(H1)/p(H0) < 1 reflects a (subjective) bias pro the H0 (which lets data raise the ratio's denominator), while p(H1)/p(H0) > 1 reflects a preference contra the H0.

If the priors on the H0 and the H1 are unbiased, by contrast, then the scaling factor "drops out"; it thus qualifies as a hidden parameter. Alas, unbiased priors are the exception in Bayesian statistics. A default Bayesian t-test, for instance, normally assumes both a Cauchy distribution and a scaling factor of 0.707. Both assumptions are of the same strength as the assumptions that RPS incurs to point-specify the H1. The crucial difference, however, is that the two Bayesian assumptions concern the data space, whereas RPS's assumptions pertain to the hypotheses space.

Unlike RPS's assumptions, the two Bayesian assumptions thus substantially influence the shape of possible data. For the scaling factor's value is grounded in the type of the chosen prior-distribution, which lets the Bayes factor vary noticeably. Different default priors can thus lead to profound differences as to whether data corroborate the H0- or the H1-hypothesis.
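
This sensitivity is simple to demonstrate. The sketch below computes a default-style Bayes factor for a one-sample t-value by integrating the noncentral-t likelihood over a Cauchy(0, r) prior on the effect size, then varies the scale r (a minimal construction of ours, not the full default test):

```r
# Bayes factor H1/H0: Cauchy(0, r) prior on d under H1; d = 0 under H0
bf10 <- function(tob, n, r) {
  df   <- n - 1
  marg <- integrate(function(d)
            dt(tob, df, ncp = d * sqrt(n)) * dcauchy(d, 0, r),
            -Inf, Inf)$value
  marg / dt(tob, df)
}
# the same data (t = 2.2, n = 50) under different default scales
sapply(c(0.5, 0.707, 1, 2), function(r) bf10(tob = 2.2, n = 50, r = r))
```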

Moreover, a Bayesian t -test's result continues to depend on the sample size, and lacks information on the replication-probability of data given a true hypothesis.

The most decisive reason, finally, against considering a standard Bayesian approach an all-things-considered best inference strategy is that it remains unclear how to sufficiently justify this or that scaling factor, or distribution, not only "prior to analysis[, but also] without influence from [sic] data" (Rouder et al., 2009, 233; italics added). Indeed, the need to fix a Bayesian t-test's prior-distribution alone already shifts the decision, as to which elements an inference strategy should (not) specify, fully from the hypotheses space to the data space. This injects into the debate a form of subjectivity that point-specifying the H1 would instead make superfluous.

One should therefore treat a Bayesian t-test with utmost caution. Rather than render hypothesis testing simple and transparent, a Bayesian t-test demands additional efforts to bring its hidden parameters and default priors back into view. We would hence do well to separate our data exploration-strategy clearly from our hypothesis-testing machinery. The Bayesian approach, however, would either continue not to mark a clear boundary or soon come to resemble RPS's hybrid-approach 12.

To summarize the advantages RPS offers over both a pure Frequentist and a standard Bayesian approach:

  • RPS uses NPTT to determine the minimum sample size, N MIN, that suffices to conduct research under at least medium induction quality of data (α = β ≤ 0.05);
  • the RPS hypothesis corroboration-threshold is sensitive to both errors (α, β);
  • to facilitate an aggregate hypothesis-evaluation (balancing resource restrictions with career aspirations), RPS uses log-likelihood-addition to integrate individually underpowered studies.

RPS thus makes explicit why a statistical result depends on the sample-size, N. Using a point-alternative hypothesis particularly shows that the Bayes-factor varies with N, information that otherwise remains "hidden." Throughout RPS's six steps, the desirably transparent parameter guiding the acceptance or rejection of a hypothesis (as per Wald's criterion) is induction quality of data (test-power).
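
The N-dependence is easy to exhibit with a point-alternative: holding the observed effect fixed at exactly d = δ = 0.2, the likelihood ratio (and hence the point-prior Bayes factor) grows steeply with the sample size. A small R sketch (two-sample design with n per group, our notation):

```r
lr_at_n <- function(n, delta = 0.2) {
  tob <- delta * sqrt(n / 2)             # t-value implied by d-hat = delta
  dt(tob, 2 * n - 2, ncp = delta * sqrt(n / 2)) / dt(tob, 2 * n - 2)
}
sapply(c(50, 200, 800), lr_at_n)   # ~1.6, ~7, ~3000: same d-hat, growing LR
```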

Finally, notice that the "new statistics" of Cumming (2013) pertains only to the data space, as does Benjamin et al.'s (2018) proposal to lower α drastically; the latter narrowly concerns a preliminary discovery (RPS-step 1), but leaves hypothesis-testing unaddressed (also see Lakens et al., 2018). To our knowledge, no equally appropriate and comprehensive strategy currently matches the inferential capabilities that RPS offers (Wasserstein and Lazar, 2016).

RPS is a hybrid-statistical approach using tools from several statistical schools. Its six hierarchical steps lead from a preliminary H1-discovery to a substantial H1-verification. Not only does each step make an empirical result from an earlier step more precise; our simulations also show that completing RPS's six steps nearly eliminates the probability of false positive H1-verifications. If data are of medium induction quality, moreover, then the probability of falsely rejecting the H1 also lies in an acceptable range.

Having simulated a broad range of focal parameters (α, β, d, N), we may extrapolate to implicit ranges safely. This lets us infer the probable error-rates of studies conducted independently of RPS, and thus allows estimating how trustworthy a given result is. The online-tool we supply indeed makes this easy.

We advocate RPS primarily for the very low error-rates of its empirical results (those feeling uncertain about such RPS-results may further increase the sample, to obtain yet lower error-rates). Moreover, an integration of individually underpowered studies via log-likelihood-addition is not only meaningful; it can also meet the test-power demands of the justification context. Therefore, research groups may cooperate such that each group collects fewer than the minimum number of data points.

Null-hypothesis significance testing by itself can at most deliver a preliminary discovery (RPS-step 1). This may motivate new research questions, which for RPS is merely an intermediate goal; the aim is to facilitate theory development and testing. Since most current research in psychology, as elsewhere, stops at RPS-step 1, however, it cannot suffice to construct well-supported and informative theories. Indeed, the idea that an accumulation of preliminary discoveries could lead to a well-supported theory remains deeply flawed.

Author contributions

The idea for RPS originates with EW. FZ and EW jointly developed its presentation. AK-S programmed and ran the simulations. EW wrote the first draft of the manuscript; all authors edited it and approved the final version.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We thank Paul T. Barrett and Aristides Moustakas for constructive comments that helped improve an earlier version of this manuscript. We also thank Holmes Finch for overseeing the review process. AK-S acknowledges a research grant from the Swiss National Science Foundation, and open access funding from the University of Geneva. FZ acknowledges funding from the Ragnar Söderberg Foundation, the Volkswagen Foundation, the HANBAN institute, and a European Union MSC-COFUND fellowship, as well as open access funding from Lund University.

1 See https://osf.io/pwc26/ for the R-code; find the online-tool at https://antoniakrefeldschwalb.shinyapps.io/ResearchProgramStrategy/

2 Our focus here is on the quantitative evaluation of hypotheses by empirical data. The current presentation of RPS therefore excludes both qualitative research processes preceding data-collection, like conjecturing phenomena or constructing experimental designs (see Flick, 2014), and subsequent processes, like embedding data into an informative theory. Both kinds of processes employ observation and interpretation, but also rely on scholarly argument referencing more than statistical data alone. Nonetheless, insofar as empirical data are independent of a researcher's prior belief, such data are necessary to run a research program.

3 The smallest sample, N MIN, sufficing in NPTT to identify a point-specified effect as a statistically significant deviation from random is a function of α, β, and d. Under conventional errors (e.g., α = β ≤ 0.05), therefore, given any sample N, a significance-test is optimal if N = N MIN. With both hypothesis-verification and -falsification alike, however, if N > N MIN, then the utility of additional data decreases. Under α = 0.05 (one-tailed), for instance, already N = 500 lets the very small effect d = 0.10 become statistically significant, even though it "explains" but 0.002% of data-variance. Once N > 60,000, this utility vanishes. Almost any way of partitioning a very large sample now makes virtually the smallest effect statistically significant (Bakan, 1966).

This may seem paradoxical, because the law of large numbers states that, ceteris paribus, enlarging N increases the validity of a parameter-estimate. At N > 60,000, however, measuring virtually any variable "reveals" that it significantly deviates from some predicted value. In a statistical sense, all unknown influences can now sufficiently manifest themselves, which lets any parameter-value become equally admissible. But if every parameter could become statistically significant, then none would be particularly important. Ad absurdum, then, as concerns hypothesis-testing, the claim "more data is always better" is false in the hypotheses space. It nevertheless holds that increasing the sample yields an ever more precise parameter-estimate in the data space.

4 Fixing the H 0 -parameter as d = 0, as a random-model has it, is merely a convention, of course. In fact RPS can alternatively base the H 0 -parameter on a control-group (as is typical), or on a simpler model (that eliminates elements of a more complex model), or on a rivaling theoretical model. In any case, not to remain ignorant of α, β, N, d , we must specify the H 0 .

5 Witte and Zenker ( 2017b ) presented the second and third measure as if they were one.

6 We use "log" to abbreviate the logarithmus naturalis (ln), as per the command in R; in previous work, we used "log" to abbreviate the logarithmus decimalis (Witte and Zenker, 2017b). Results are independent of nomenclature, of course, except that the critical values then came to 1.28 and 2.00.
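
Both critical values are quick to reproduce:

```r
log10((1 - 0.05) / 0.05)   # 1.28 for alpha = beta = 0.05
log10((1 - 0.01) / 0.01)   # 2.00 for alpha = beta = 0.01
```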

7 The term “random” is shorthand for a normalized mean, irrespective of whether we assume random influences, work with a control group or a simpler model (featuring fewer parameters), or with a theoretical alternative model.

8 Governing the praxis of enlarging the sample until we reach sufficient test-power is the assumption that the focal theoretical parameter is the mean at the group level, whereas participant behavior at the individual level fluctuates randomly. If we instead focus on the potential non-random variation at the individual level, of course, then it is not the size of the sample (number of participants) that counts, but the number of repeated measurements we perform on a single participant. With a “small-N design,” indeed, the population is the single participant (see Smith and Little, 2018 ). Provided our indicators are valid and reliable, repeated measurements on a single participant may in fact detect idiosyncratic influences that averaging at the group level could distort. But rather than offer an alternative to large sample research, a small-N design serves a distinct purpose, and so complements large sample research.

9 Things might look different if, next to a truth-criterion (based on error probabilities), we employ external utilities, too (Miller and Ulrich, 2016). Even where we can motivate such utilities unproblematically, we must always compare the empirical proportions of simulated false positive results vs. false negative substantial verifications. Under medium induction quality (α = β = 0.05), the odds-ratio is roughly 1:5; under high induction quality (α = β = 0.01), it is 1:10. Compared with the proportion of substantial discoveries and substantial falsifications, under medium induction quality the odds-ratio decreases to about 1:2; under high induction quality, nearly to 1:1. As we saw in the previous section, the asymmetry itself arises from comparing a point-parameter in case of false negatives with a distributed parameter (an interval) in case of false positives.

10 If we are uncertain which point-hypothesis best specifies a theoretical parameter, then we may generalize the parameter from a point- to an interval-hypothesis. The interval's end-points thus state distinct (hypothetical) effect-sizes; the middle point qualifies as a theoretical assumption. To achieve constant induction quality, of course, we must confront each end-point with its appropriate sample size. To this end, log-likelihood-addition lets us increase the sample associated to the larger effect-size until we reach the appropriate sample-size for the smaller effect-size.

11 Here, we can neither discuss meta-analysis as a method, nor adequately address the replication of empirical studies. We show elsewhere how to statistically establish hypotheses by integrated efforts, particularly addressing Bem's psi-hypothesis (Witte and Zenker, 2017b ) and the ego-depletion effect (Witte and Zenker, 2017a ).

12 Schönbrodt and Wagenmakers's ( 2018 ) recent Bayes factor design analysis (BFDA), for instance, clearly recognizes the need to first plan an empirical setting, to only then evaluate the degree to which actual data falsify or verify a hypothesis statistically. This same need lets RPS characterize the setting via induction quality of data. While the planning stage is independent of the analysis stage, RPS's Wald-criterion not only provides a bridge between them, it also functions as a threshold with known consequences. Unlike BFDA and similar Bayesian approaches, however, RPS avoids setting subjective priors and relies solely on the likelihood-function.

References

  • Alfaro M. E., Holder M. T. (2006). The posterior and the prior in Bayesian phylogenetics. Annu. Rev. Ecol. Evol. Syst. 37, 19–42. 10.1146/annurev.ecolsys.37.091305.110021
  • Alogna V. K., Attaya M. K., Aucoin P., Bahnik S., Birch S., Birt A. R., et al. (2014). Registered replication report: Schooler & Engstler-Schooler (1990). Perspect. Psychol. Sci. 9, 556–578. 10.1177/1745691614545653
  • Bakan D. (1966). The test of significance in psychological research. Psychol. Bull. 66, 423–437. 10.1037/h0020412
  • Baker M. (2015). First results from psychology's largest reproducibility test. Nature. 10.1038/nature.2015.17433
  • Benjamin D. J., Berger J., Johannesson M., Nosek B. A., Wagenmakers E.-J., Berk R., et al. (2018). Redefine statistical significance. Nat. Hum. Behav. 2, 6–10. 10.1038/s41562-017-0189-z
  • Birnbaum A. (1954). Combining independent tests of significance. J. Am. Statist. Assoc. 49, 559–574.
  • Bowmeester S., Verkoeijen P. P. J. L., Aczel B., Barbosa F., Bègue L., Brañas-Garza P., et al. (2017). Registered replication report: Rand, Greene, and Nowak (2012). Perspect. Psychol. Sci. 12, 527–542. 10.1177/1745691617693624
  • Cheung I., Campbell L., LeBel E., Yong J. C. (2016). Registered replication report: study 1 from Finkel, Rusbult, Kumashiro, & Hannon (2002). Perspect. Psychol. Sci. 11, 750–764. 10.1177/1745691616664694
  • Clinton J. D. (2012). Using roll call estimates to test models of politics. Ann. Rev. Pol. Sci. 15, 79–99. 10.1146/annurev-polisci-043010-095836
  • Cumming G. (2013). The new statistics: why and how. Psychol. Sci. 20, 1–23. 10.1177/0956797613504966
  • Edwards A. W. F. (1972). Likelihood. Cambridge: Cambridge University Press (expanded edition, 1992, Baltimore: Johns Hopkins University Press).
  • Eerland A., Sherrill A. M., Magliano J. P., Zwaan R. A., Arnal J. D., Aucoin P., et al. (2016). Registered replication report: Hart & Albarracín (2011). Perspect. Psychol. Sci. 11, 158–171. 10.1177/1745691615605826
  • Erdfelder E., Ulrich R. (2018). Zur Methodologie von Replikationsstudien [On a methodology of replication studies]. Psychol. Rundsch. 69, 3–21. 10.1026/0033-3042/a000387
  • Etz A., Vandekerckhove J. (2016). A Bayesian perspective on the reproducibility project: psychology. PLoS ONE 11:e0149794. 10.1371/journal.pone.0149794
  • Fanelli D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS ONE 4:e5738. 10.1371/journal.pone.0005738
  • Fisher R. A. (1956). Statistical Methods and Scientific Inference. New York, NY: Hafner.
  • Flick U. (ed.). (2014). The SAGE Handbook of Qualitative Data Analysis. London: Sage.
  • Franco A., Malhotra N., Simonovits G. (2014). Publication bias in the social sciences: unlocking the file drawer. Science 345, 1502–1505. 10.1126/science.1255484
  • Freese J., Peterson D. (2017). Replication in social science. Annu. Rev. Sociol. 43, 147–165. 10.1146/annurev-soc-060116-053450
  • Hacking I. (1965). Logic of Statistical Inference. Cambridge: Cambridge University Press.
  • Hagger M. S., Chatzisarantis N. L. D., Alberts H., Anggono C. O., Batailler C., Birt A. R., et al. (2016). A multilab preregistered replication of the ego-depletion effect. Perspect. Psychol. Sci. 11, 546–573. 10.1177/1745691616652873
  • Haig B. (2005). An abductive theory of scientific method. Psychol. Methods 10, 371–388. 10.1037/1082-989X.10.4.371
  • Ioannidis J. P. A. (2014). How to make more published research true. PLoS Med. 11:e1001747. 10.1371/journal.pmed.1001747
  • Ioannidis J. P. A. (2016). The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q. 94, 485–514. 10.1111/1468-0009.12210
  • Lakatos I. (1978). The Methodology of Scientific Research Programmes, Vol. I, eds Worrall J., Currie G. Cambridge, UK: Cambridge University Press.
  • Lakens D., Adolfi F. G., Albers C., Anvari F., Apps M. A. J., Argamon S. E., et al. (2018). Justify your alpha. Nat. Hum. Behav. 2, 168–171. 10.1038/s41562-018-0311-x
  • Miller J., Ulrich R. (2016). Optimizing research payoff. Perspect. Psychol. Sci. 11, 661–691. 10.1177/1745691616649170
  • Nelson L. D., Simmons J., Simonsohn U. (2018). Psychology's Renaissance. Annu. Rev. Psychol. 69, 511–534. 10.1146/annurev-psych-122216-011836
  • Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science 349:aac4716. 10.1126/science.aac4716
  • Popper K. R. (1959). Logic of Scientific Discovery. London: Basic Books.
  • Reichenbach H. (1938). Experience and Prediction. Chicago, IL: University of Chicago Press.
  • Rouder J. N., Speckman P. L., Sun D., Morey R. D., Iverson G. (2009). Bayesian t-tests for accepting and rejecting the null hypothesis. Psychon. Bull. Rev. 16, 225–237. 10.3758/PBR.16.2.225
  • Schönbrodt F., Wagenmakers E.-J. (2018). Bayes factor design analysis: planning for compelling evidence. Psychon. Bull. Rev. 25, 128–142. 10.3758/s13423-017-1230-y
  • Smith P. L., Little D. R. (2018). Small is beautiful: in defense of the small-N design. Psychon. Bull. Rev. [Epub ahead of print]. 10.3758/s13423-018-1451-8
  • Stigler S. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, MA: Harvard University Press.
  • Trafimow D. (2014). Editorial. Basic Appl. Soc. Psychol. 36, 1–2. 10.1080/01973533.2014.865505
  • Wagenmakers E.-J., Beek T., Dijkhoff L., Gronau Q. F., Acosta A., Adams R. B., et al. (2016). Registered replication report: Strack, Martin, and Stepper (1988). Perspect. Psychol. Sci. 11, 917–928. 10.1177/1745691616674458
  • Wald A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Am. Math. Soc. 54, 426–482. 10.1090/S0002-9947-1943-0012401-3
  • Wasserstein R. L., Lazar N. A. (2016). The ASA's statement on p-values: context, process, and purpose. Am. Stat. 70, 129–133. 10.1080/00031305.2016.1154108
  • Wetzels R., Matzke D., Lee M. D., Rouder J. N., Iverson G. J., Wagenmakers E.-J. (2011). Statistical evidence in experimental psychology: an empirical comparison using 855 t-tests. Perspect. Psychol. Sci. 6, 291–298. 10.1177/1745691611406923
  • Witte E. H. (1994). A Statistical Inference Strategy (FOSTIS): A Non-Confounded Hybrid Theory. HAFOS, 9. Available online at: http://hdl.handle.net/20.500.11780/491 (Accessed August 8, 2016).
  • Witte E. H., Kaufman J. (1997). The Stepwise Hybrid Statistical Inference Strategy: FOSTIS. HAFOS, 18. Available online at: http://hdl.handle.net/20.500.11780/502 (Accessed April 6, 2018).
  • Witte E. H., Zenker F. (2016a). Reconstructing recent work on macro-social stress as a research program. Basic Appl. Soc. Psychol. 38, 301–307. 10.1080/01973533.2016.1207077
  • Witte E. H., Zenker F. (2016b). Beyond schools—reply to Marsman, Ly & Wagenmakers. Basic Appl. Soc. Psychol. 38, 313–317. 10.1080/01973533.2016.1227710
  • Witte E. H., Zenker F. (2017a). Extending a multilab preregistered replication of the ego-depletion effect to a research program. Basic Appl. Soc. Psychol. 39, 74–80. 10.1080/01973533.2016.1269286
  • Witte E. H., Zenker F. (2017b). From discovery to justification: outline of an ideal research program in empirical psychology. Front. Psychol. 8:1847. 10.3389/fpsyg.2017.01847
  • Witte E. H., Zenker F. (2018). Data replication matters, replicated hypothesis-corroboration counts (Commentary on "Making Replication Mainstream" by Rolf A. Zwaan, Alexander Etz, Richard E. Lucas, and M. Brent Donnellan). Behav. Brain Sci. (forthcoming).
  • Woodward J. (1989). Data and phenomena. Synthese 79, 393–472. 10.1007/BF00869282
  • Zenker F. (2017). Falsification, in The Wiley Encyclopedia of Social Theory, ed Turner B. (Chichester: Wiley Blackwell), 1–3.
  • Share full article

Advertisement

Supported by

Guest Essay

The Troubling Trend in Teenage Sex

A pile of bed linens on a night stand next to a bed.

By Peggy Orenstein

Ms. Orenstein is the author of “Boys & Sex: Young Men on Hookups, Love, Porn, Consent and Navigating the New Masculinity” and “Girls & Sex: Navigating the Complicated New Landscape.”

Debby Herbenick is one of the foremost researchers on American sexual behavior. The director of the Center for Sexual Health Promotion at Indiana University and the author of the pointedly titled book “Yes, Your Kid,” she usually shares her data, no matter how explicit, without judgment. So I was surprised by how concerned she seemed when we checked in on Zoom recently: “I haven’t often felt so strongly about getting research out there,” she told me. “But this is lifesaving.”

For the past four years, Dr. Herbenick has been tracking the rapid rise of “rough sex” among college students, particularly sexual strangulation, or what is colloquially referred to as choking. Nearly two-thirds of women in her most recent campus-representative survey of 5,000 students at an anonymized “major Midwestern university” said a partner had choked them during sex (one-third in their most recent encounter). The rate of those women who said they were between the ages 12 and 17 the first time that happened had shot up to 40 percent from one in four.

As someone who’s been writing for well over a decade about young people’s attitudes and early experience with sex in all its forms, I’d also begun clocking this phenomenon. I was initially startled in early 2020 when, during a post-talk Q. and A. at an independent high school, a 16-year-old girl asked, “How come boys all want to choke you?” In a different class, a 15-year-old boy wanted to know, “Why do girls all want to be choked?” They do? Not long after, a college sophomore (and longtime interview subject) contacted me after her roommate came home in tears because a hookup partner, without warning, had put both hands on her throat and squeezed.

I started to ask more, and the stories piled up. Another sophomore confided that she enjoyed being choked by her boyfriend, though it was important for a partner to be “properly educated” — pressing on the sides of the neck, for example, rather than the trachea. (Note: There is no safe way to strangle someone.) A male freshman said “girls expected” to be choked and, even though he didn’t want to do it, refusing would make him seem like a “simp.” And a senior in high school was angry that her friends called her “vanilla” when she complained that her boyfriend had choked her.

Sexual strangulation, nearly always of women in heterosexual pornography, has long been a staple on free sites, those default sources of sex ed for teens . As with anything else, repeat exposure can render the once appalling appealing. It’s not uncommon for behaviors to be normalized in porn, move within a few years to mainstream media, then, in what may become a feedback loop, be adopted in the bedroom or the dorm room.

Choking, Dr. Herbenick said, seems to have made that first leap in a 2008 episode of Showtime’s “Californication,” where it was still depicted as outré, then accelerated after the success of “Fifty Shades of Grey.” By 2019, when a high school girl was choked in the pilot of HBO’s “Euphoria,” it was standard fare. A young woman was choked in the opener of “The Idol” (again on HBO and also, like “Euphoria,” created by Sam Levinson; what’s with him ?). Ali Wong plays the proclivity for laughs in a Netflix special, and it’s a punchline in Tina Fey’s new “Mean Girls.” The chorus of Jack Harlow’s “Lovin On Me,” which topped Billboard’s Hot 100 chart for six nonconsecutive weeks this winter and has been viewed over 99 million times on YouTube, starts with, “I’m vanilla, baby, I’ll choke you, but I ain’t no killer, baby.” How-to articles abound on the internet, and social media algorithms feed young people (but typically not their unsuspecting parents) hundreds of #chokemedaddy memes along with memes that mock — even celebrate — the potential for hurting or killing female partners.

I’m not here to kink-shame (or anything-shame). And, anyway, many experienced BDSM practitioners discourage choking, believing it to be too dangerous. There are still relatively few studies on the subject, and most have been done by Dr. Herbenick and her colleagues. Reports among adolescents are now trickling out from the United Kingdom , Australia , Iceland , New Zealand and Italy .

Twenty years ago, sexual asphyxiation appears to have been unusual among any demographic, let alone young people who were new to sex and iffy at communication. That’s changed radically in a short time, with health consequences that parents, educators, medical professionals, sexual consent advocates and teens themselves urgently need to understand.

Sexual trends can spread quickly on campus and, to an extent, in every direction. But, at least among straight kids, I’ve sometimes noticed a pattern: Those that involve basic physical gratification — like receiving oral sex in hookups — tend to favor men. Those that might entail pain or submission, like choking, are generally more for women.

So, while undergrads of all genders and sexualities in Dr. Herbenick’s surveys report both choking and being choked, straight and bisexual young women are far more likely to have been the subjects of the behavior; the gap widens with greater occurrences. (In a separate study , Dr. Herbenick and her colleagues found the behavior repeated across the United States, particularly for adults under 40, and not just among college students.) Alcohol may well be involved, and while the act is often engaged in with a steady partner, a quarter of young women said partners they’d had sex with on the day they’d met also choked them.

Either way, most say that their partners never or only sometimes asked before grabbing their necks. For many, there had been moments when they couldn’t breathe or speak, compromising the ability to withdraw consent, if they’d given it. No wonder that, in a separate study by Dr. Herbenick, choking was among the most frequently listed sex acts young women said had scared them, reporting that it sometimes made them worry whether they’d survive.

Among girls and women I’ve spoken with, many did not want or like to be sexually strangled, though in an otherwise desired encounter they didn’t name it as assault . Still, a sizable number were enthusiastic; they requested it. It is exciting to feel so vulnerable, a college junior explained. The power dynamic turns her on; oxygen deprivation to the brain can trigger euphoria.

That same young woman, incidentally, had never climaxed with a partner: While the prevalence of choking has skyrocketed, rates of orgasm among young women have not increased, nor has the “orgasm gap” disappeared among heterosexual couples. “It indicates they’re not doing other things to enhance female arousal or pleasure,” Dr. Herbenick said.

When, for instance, she asked one male student who said he choked his partner whether he’d ever tried using a vibrator instead, he recoiled. “Why would I do that?” he asked.

Perhaps, she responded, because it would be more likely to produce orgasm without risking, you know, death.

In my interviews, college students have seen male orgasm as a given; women’s is nice if it happens, but certainly not expected or necessarily prioritized (by either partner). It makes sense, then, that fulfillment would be less the motivator for choking than appearing adventurous or kinky. Such performances don’t always feel good.

“Personally, my hypothesis is that this is one of the reasons young people are delaying or having less sex,” Dr. Herbenick said. “Because it’s uncomfortable and weird and scary. At times some of them literally think someone is assaulting them but they don’t know. Those are the only sexual experiences for some people. And it’s not just once they’ve gotten naked. They’ll say things like, ‘I’ve only tried to make out with someone once because he started choking and hitting me.’”

Keisuke Kawata, a neuroscientist at Indiana University’s School of Public Health, was one of the first researchers to sound the alarm on how the cumulative, seemingly inconsequential, sub-concussive hits football players sustain (as opposed to the occasional hard blow) were key to triggering C.T.E., the degenerative brain disease. He’s a good judge of serious threats to the brain. In response to Dr. Herbenick’s work, he’s turning his attention to sexual strangulation. “I see a similarity” to C.T.E., he told me, “though the mechanism of injury is very different.” In this case, it is oxygen-blocking pressure to the throat, frequently in light, repeated bursts of a few seconds each.

Strangulation — sexual or otherwise — often leaves few visible marks and can be easily overlooked as a cause of death. Those whose experiences are nonlethal rarely seek medical attention, because any injuries seem minor: Young women Dr. Herbenick studied mostly reported lightheadedness, headaches, neck pain, temporary loss of coordination and ear ringing. The symptoms resolve, and all seems well. But, as with those N.F.L. players, the true effects are silent, potentially not showing up for days, weeks, even years.

According to the American Academy of Neurology, restricting blood flow to the brain, even briefly, can cause permanent injury, including stroke and cognitive impairment. In M.R.I.s conducted by Dr. Kawata and his colleagues (including Dr. Herbenick, who is a co-author of his papers on strangulation), undergraduate women who have been repeatedly choked show a reduction in cortical folding in the brain compared with a never-choked control group. They also showed widespread cortical thickening, an inflammation response that is associated with elevated risk of later-onset mental illness. In completing simple memory tasks, their brains had to work far harder than the control group, recruiting from more regions to achieve the same level of accuracy.

The hemispheres in the choked group’s brains, too, were badly skewed, with the right side hyperactive and the left underperforming. A similar imbalance is associated with mood disorders — and indeed in Dr. Herbenick’s surveys girls and women who had been choked were more likely than others (or choked men) to have experienced overwhelming anxiety, as well as sadness and loneliness, with the effect more pronounced as the incidence rose: Women who had experienced more than five instances of choking were two and a half times as likely as those who had never been choked to say they had been so depressed within the previous 30 days they couldn’t function. Whether girls and women with mental health challenges are more likely to seek out (or be subjected to) choking, choking causes mood disorders, or some combination of the two is still unclear. But hypoxia, or oxygen deprivation — judging by what research has shown about other types of traumatic brain injury — could be a contributing factor. Given the soaring rates of depression and anxiety among young women, that warrants concern.

Now consider that every year Dr. Herbenick has done her survey, the number of females reporting extreme effects from strangulation (neck swelling, loss of consciousness, losing control of urinary function) has crept up. Among those who’ve been choked, the rate of becoming what students call “cloudy” — close to passing out, but not crossing the line — is now one in five, a huge proportion. All of this indicates partners are pressing on necks longer and harder.

The physical, cognitive and psychological impacts of sexual choking are disturbing. So is the idea that at a time when women’s social, economic, educational and political power are in ascent (even if some of those rights may be in jeopardy), when #MeToo has made progress against harassment and assault, there has been the popularization of a sex act that can damage our brains, impair intellectual functioning, undermine mental health, even kill us. Nonfatal strangulation, one of the most significant indicators that a man will murder his female partner (strangulation is also one of the most common methods used for doing so), has somehow been eroticized and made consensual, at least consensual enough. Yet, the outcomes are largely the same: Women’s brains and bodies don’t distinguish whether they are being harmed out of hate or out of love.

By now I’m guessing that parents are curled under their chairs in a fetal position. Or perhaps thinking, “No, not my kid!” (see: title of Dr. Herbenick’s book above, which, by the way, contains an entire chapter on how to talk to your teen about “rough sex”).

I get it. It’s scary stuff. Dr. Herbenick is worried; I am, too. And we are hardly some anti-sex, wait-till-marriage crusaders. But I don’t think our only option is to wring our hands over what young people are doing.

Parents should take a beat and consider how they might give their children relevant information in a way that they can hear it. Maybe reiterate that they want them to have a pleasurable sex life — you have already said that, right? — and also want them to be safe. Tell them that misinformation about certain practices, including choking, is rampant, that in reality it has grave health consequences. Plus, whether or not a partner initially requested it, if things go wrong, you’re generally criminally on the hook.

Dr. Herbenick suggests reminding them that there are other, lower-risk ways to be exploratory or adventurous if that is what they are after, but it would be wisest to delay any “rough sex” until they are older and more skilled at communicating. She offers language when negotiating with a new partner, such as, “By the way, I’m not comfortable with” — choking, or other escalating behaviors such as name-calling, spitting and genital slapping — “so please don’t do it/don’t ask me to do it to you.” They could also add what they are into and want to do together.

I’d like to point high school health teachers to evidence-based porn literacy curricula, but I realize that incorporating such lessons into their classrooms could cost them their jobs. Shafia Zaloom, a lecturer at the Harvard Graduate School of Education, recommends, if that’s the case, grounding discussions in mainstream and social media. There are plenty of opportunities. “You can use it to deconstruct gender norms, power dynamics in relationships, ‘performative’ trends that don’t represent most people’s healthy behaviors,” she said, “especially depictions of people putting pressure on someone’s neck or chest.”

I also know that pediatricians, like other adults, struggle when talking to adolescents about sex (the typical conversation, if it happens, lasts 40 seconds). Then again, they already caution younger children to use a helmet when they ride a bike (because heads and necks are delicate!); they can mention that teens might hear about things people do in sexual situations, including choking, then explain the impact on brain health and why such behavior is best avoided. They should emphasize that if, for any reason — a fall, a sports mishap or anything else — a young person develops symptoms of head trauma, they should come in immediately, no judgment, for help in healing.

The role and responsibility of the entertainment industry is a tangled knot: Media reflects behavior but also drives it, either expanding possibilities or increasing risks. There is precedent for accountability. The European Union now requires age verification on the world’s largest porn sites (in ways that preserve user privacy, whatever that means on the internet); that discussion, unsurprisingly, had been politicized here. Social media platforms have already been pushed to ban content promoting eating disorders, self-harm and suicide — they should likewise be pressured to ban content promoting choking. Traditional formats can stop glamorizing strangulation, making light of it, spreading false information, using it to signal female characters’ complexity or sexual awakening. Young people’s sexual scripts are shaped by what they watch, scroll by and listen to — unprecedentedly so. They deserve, and desperately need, models of interactions that are respectful, communicative, mutual and, at the very least, safe.

Peggy Orenstein is the author of “Boys & Sex: Young Men on Hookups, Love, Porn, Consent and Navigating the New Masculinity” and “Girls & Sex: Navigating the Complicated New Landscape.”

The Times is committed to publishing a diversity of letters to the editor. We’d like to hear what you think about this or any of our articles. Here are some tips . And here’s our email: [email protected] .

Follow the New York Times Opinion section on Facebook , Instagram , TikTok , WhatsApp , X and Threads .

An earlier version of this article misstated the network on which “Californication” first appeared. It is Showtime, not HBO. The article also misspelled a book and film title. It is “Fifty Shades of Grey,” not “Fifty Shades of Gray.”

How we handle corrections
