Qualitative vs. Quantitative Research | Differences, Examples & Methods

Published on April 12, 2019 by Raimo Streefkerk. Revised on June 22, 2023.

When collecting and analyzing data, quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings. Both are important for gaining different kinds of knowledge.

Common quantitative methods include experiments, observations recorded as numbers, and surveys with closed-ended questions.

Quantitative research is at risk for research biases including information bias, omitted variable bias, sampling bias, or selection bias.

Qualitative research is expressed in words. It is used to understand concepts, thoughts or experiences. This type of research enables you to gather in-depth insights on topics that are not well understood.

Common qualitative methods include interviews with open-ended questions, observations described in words, and literature reviews that explore concepts and theories.

Table of contents

  • The differences between quantitative and qualitative research
  • Data collection methods
  • When to use qualitative vs. quantitative research
  • How to analyze qualitative and quantitative data
  • Other interesting articles
  • Frequently asked questions about qualitative and quantitative research

Quantitative and qualitative research use different research methods to collect and analyze data, and they allow you to answer different kinds of research questions.

Qualitative vs. quantitative research

Quantitative and qualitative data can be collected using various methods. It is important to use a data collection method that will help answer your research question(s).

Many data collection methods can be either qualitative or quantitative. For example, in surveys, observational studies or case studies , your data can be represented as numbers (e.g., using rating scales or counting frequencies) or as words (e.g., with open-ended questions or descriptions of what you observe).

However, some methods are more commonly used in one type or the other.

Quantitative data collection methods

  • Surveys: A list of closed-ended or multiple-choice questions distributed to a sample (online, in person, or over the phone).
  • Experiments: A situation in which variables are controlled and manipulated to establish cause-and-effect relationships.
  • Observations: Observing subjects in a natural environment where variables can’t be controlled.

Qualitative data collection methods

  • Interviews: Asking open-ended questions verbally to respondents.
  • Focus groups: Discussion among a group of people about a topic to gather opinions that can be used for further research.
  • Ethnography: Participating in a community or organization for an extended period of time to closely observe culture and behavior.
  • Literature review: Survey of published works by other authors.

A rule of thumb for deciding whether to use qualitative or quantitative data is:

  • Use quantitative research if you want to confirm or test something (a theory or hypothesis )
  • Use qualitative research if you want to understand something (concepts, thoughts, experiences)

For most research topics you can choose a qualitative, quantitative or mixed methods approach . Which type you choose depends on, among other things, whether you’re taking an inductive vs. deductive research approach ; your research question(s) ; whether you’re doing experimental , correlational , or descriptive research ; and practical considerations such as time, money, availability of data, and access to respondents.

Quantitative research approach

You survey 300 students at your university and ask them questions such as: “On a scale from 1-5, how satisfied are you with your professors?”

You can perform statistical analysis on the data and draw conclusions such as: “on average students rated their professors 4.4”.

Qualitative research approach

You conduct in-depth interviews with 15 students and ask them open-ended questions such as: “How satisfied are you with your studies?”, “What is the most positive aspect of your study program?” and “What can be done to improve the study program?”

Based on the answers you get you can ask follow-up questions to clarify things. You transcribe all interviews using transcription software and try to find commonalities and patterns.

Mixed methods approach

You conduct interviews to find out how satisfied students are with their studies. Through open-ended questions you learn things you never thought about before and gain new insights. Later, you use a survey to test these insights on a larger scale.

It’s also possible to start with a survey to find out the overall trends, followed by interviews to better understand the reasons behind the trends.

Qualitative or quantitative data by itself can’t prove or demonstrate anything, but has to be analyzed to show its meaning in relation to the research questions. The method of analysis differs for each type of data.

Analyzing quantitative data

Quantitative data is based on numbers. Simple math or more advanced statistical analysis is used to discover commonalities or patterns in the data. The results are often reported in graphs and tables.

Applications such as Excel, SPSS, or R can be used to calculate things like:

  • Average scores (means)
  • The number of times a particular answer was given
  • The correlation or causation between two or more variables
  • The reliability and validity of the results
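As a concrete illustration, the first three of these calculations take only a few lines of Python. The survey data below is invented for the sketch; in practice you would run the same calculations in Excel, SPSS, or R:

```python
from collections import Counter
from statistics import mean

# Invented survey data: satisfaction ratings on a 1-5 scale,
# paired with hours studied per week for the same respondents.
ratings = [4, 5, 4, 3, 5, 4, 5, 4]
hours_studied = [2, 8, 4, 6, 10, 5, 7, 3]

# Average score (the mean)
avg = mean(ratings)          # 4.25

# The number of times a particular answer was given
counts = Counter(ratings)    # counts[5] == 3

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# The correlation between two variables (here, about 0.59)
r = pearson(ratings, hours_studied)
```

Note that a correlation alone does not establish causation; claims about cause and effect require an experimental design.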

Analyzing qualitative data

Qualitative data is more difficult to analyze than quantitative data. It consists of text, images or videos instead of numbers.

Some common approaches to analyzing qualitative data include:

  • Qualitative content analysis: Tracking the occurrence, position and meaning of words or phrases
  • Thematic analysis: Closely examining the data to identify the main themes and patterns
  • Discourse analysis: Studying how communication works in social contexts
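The first of these, tracking word occurrence, is easy to sketch in Python. The transcripts and keywords below are invented for illustration; a real content analysis would also track where and in what sense each word occurs:

```python
import re
from collections import Counter

# Invented interview transcripts (illustrative only).
transcripts = [
    "I feel supported by my professors, but the workload is stressful.",
    "The workload is heavy and I often feel stressed near exams.",
]

# Words of interest chosen by the researcher (assumed for this sketch).
keywords = {"workload", "stressed", "stressful", "supported"}

# Tokenize all transcripts, then count only the keywords.
tokens = [w for t in transcripts for w in re.findall(r"[a-z']+", t.lower())]
freq = Counter(w for w in tokens if w in keywords)
```

Frequencies like these can then be compared across respondents or documents, or fed into a more interpretive analysis of meaning and context.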

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Chi square goodness of fit test
  • Degrees of freedom
  • Null hypothesis
  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Inclusion and exclusion criteria

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses . Qualitative methods allow you to explore concepts and experiences in more detail.

In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts and meanings, use qualitative methods .
  • If you want to analyze a large amount of readily-available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

There are various approaches to qualitative data analysis , but they all share five steps in common:

  • Prepare and organize your data.
  • Review and explore your data.
  • Develop a data coding system.
  • Assign codes to the data.
  • Identify recurring themes.
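Steps 3-5 can be sketched in Python. The coding system and excerpts below are invented; in a real analysis, codes are developed iteratively from the data rather than from a fixed keyword list:

```python
from collections import Counter

# A hand-made coding system: code -> trigger keywords (assumed for this sketch).
coding_system = {
    "workload": ["workload", "deadlines"],
    "support": ["support", "help", "mentor"],
}

# Invented interview excerpts standing in for prepared, reviewed data.
excerpts = [
    "The deadlines pile up and the workload never stops.",
    "My mentor gave me real help when I struggled.",
    "Without support from friends I could not manage the workload.",
]

def assign_codes(text):
    """Assign every code whose keywords appear in the excerpt."""
    text = text.lower()
    return {code for code, kws in coding_system.items()
            if any(kw in text for kw in kws)}

# Tally codes across excerpts to surface recurring themes.
theme_counts = Counter(code for e in excerpts for code in assign_codes(e))
```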

The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis , thematic analysis , and discourse analysis .

A research project is an academic, scientific, or professional undertaking to answer a research question . Research projects can take many forms, such as qualitative or quantitative , descriptive , longitudinal , experimental , or correlational . What kind of research approach you choose will depend on your topic.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Streefkerk, R. (2023, June 22). Qualitative vs. Quantitative Research | Differences, Examples & Methods. Scribbr. Retrieved March 29, 2024, from https://www.scribbr.com/methodology/qualitative-quantitative-research/


Qualitative vs Quantitative Research Methods & Data Analysis

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


What is the difference between quantitative and qualitative?

The main difference between quantitative and qualitative research is the type of data they collect and analyze.

Quantitative research collects numerical data and analyzes it using statistical methods. The aim is to produce objective, empirical data that can be measured and expressed in numerical terms. Quantitative research is often used to test hypotheses, identify patterns, and make predictions.

Qualitative research, on the other hand, collects non-numerical data such as words, images, and sounds. The focus is on exploring subjective experiences, opinions, and attitudes, often through observation and interviews.

Qualitative research aims to produce rich and detailed descriptions of the phenomenon being studied, and to uncover new insights and meanings.

Quantitative data is information about quantities, and therefore numbers; qualitative data is descriptive and concerns phenomena that can be observed but not measured, such as language.

What Is Qualitative Research?

Qualitative research is the process of collecting, analyzing, and interpreting non-numerical data, such as language. Qualitative research can be used to understand how an individual subjectively perceives and gives meaning to their social reality.

Qualitative data is non-numerical data, such as text, video, photographs, or audio recordings. This type of data can be collected using diary accounts or in-depth interviews and analyzed using grounded theory or thematic analysis.

“Qualitative research is multimethod in focus, involving an interpretive, naturalistic approach to its subject matter. This means that qualitative researchers study things in their natural settings, attempting to make sense of, or interpret, phenomena in terms of the meanings people bring to them” (Denzin & Lincoln, 1994, p. 2).

Interest in qualitative data came about as the result of the dissatisfaction of some psychologists (e.g., Carl Rogers) with the scientific approach of psychologists such as the behaviorists (e.g., Skinner).

Since psychologists study people, the traditional scientific approach is not seen as an appropriate way of carrying out research, since it fails to capture the totality of human experience and the essence of being human. Exploring participants’ experiences is known as a phenomenological approach (see Humanism).

Qualitative research is primarily concerned with meaning, subjectivity, and lived experience. The goal is to understand the quality and texture of people’s experiences, how they make sense of them, and the implications for their lives.

Qualitative research aims to understand the social reality of individuals, groups, and cultures as nearly as possible as participants feel or live it. Thus, people and groups are studied in their natural setting.

Examples of qualitative research questions include: What does an experience feel like? How do people talk about something? How do they make sense of an experience? How do events unfold for people?

Research following a qualitative approach is exploratory and seeks to explain ‘how’ and ‘why’ a particular phenomenon, or behavior, operates as it does in a particular context. It can be used to generate hypotheses and theories from the data.

Qualitative Methods

There are different types of qualitative research methods, including diary accounts, in-depth interviews , documents, focus groups , case study research , and ethnography.

The results of qualitative methods provide a deep understanding of how people perceive their social realities and in consequence, how they act within the social world.

“The researcher has several methods for collecting empirical materials, ranging from the interview to direct observation, to the analysis of artifacts, documents, and cultural records, to the use of visual materials or personal experience” (Denzin & Lincoln, 1994, p. 14).

Here are some examples of qualitative data:

Interview transcripts : Verbatim records of what participants said during an interview or focus group. They allow researchers to identify common themes and patterns, and draw conclusions based on the data. Interview transcripts can also be useful in providing direct quotes and examples to support research findings.

Observations : The researcher typically takes detailed notes on what they observe, including any contextual information, nonverbal cues, or other relevant details. The resulting observational data can be analyzed to gain insights into social phenomena, such as human behavior, social interactions, and cultural practices.

Unstructured interviews : generate qualitative data through the use of open questions.  This allows the respondent to talk in some depth, choosing their own words.  This helps the researcher develop a real sense of a person’s understanding of a situation.

Diaries or journals : Written accounts of personal experiences or reflections.

Notice that qualitative data could be much more than just words or text. Photographs, videos, sound recordings, and so on, can be considered qualitative data. Visual data can be used to understand behaviors, environments, and social interactions.

Qualitative Data Analysis

Qualitative research is endlessly creative and interpretive. The researcher does not just leave the field with mountains of empirical data and then easily write up his or her findings.

Qualitative interpretations are constructed, and various techniques can be used to make sense of the data, such as content analysis, grounded theory (Glaser & Strauss, 1967), thematic analysis (Braun & Clarke, 2006), or discourse analysis.

For example, thematic analysis is a qualitative approach that involves identifying implicit or explicit ideas within the data. Themes will often emerge once the data has been coded.


Key Features

  • Events can be understood adequately only if they are seen in context. Therefore, a qualitative researcher immerses her/himself in the field, in natural surroundings. The contexts of inquiry are not contrived; they are natural. Nothing is predefined or taken for granted.
  • Qualitative researchers want those who are studied to speak for themselves, to provide their perspectives in words and other actions. Therefore, qualitative research is an interactive process in which the persons studied teach the researcher about their lives.
  • The qualitative researcher is an integral part of the data; without the active participation of the researcher, no data exists.
  • The study’s design evolves during the research and can be adjusted or changed as it progresses. For the qualitative researcher, there is no single reality. It is subjective and exists only in reference to the observer.
  • The theory is data-driven and emerges as part of the research process, evolving from the data as they are collected.

Limitations of Qualitative Research

  • Because of the time and costs involved, qualitative designs do not generally draw samples from large-scale data sets.
  • The problem of adequate validity or reliability is a major criticism. Because of the subjective nature of qualitative data and its origin in single contexts, it is difficult to apply conventional standards of reliability and validity. For example, because of the central role played by the researcher in the generation of data, it is not possible to replicate qualitative studies.
  • Also, contexts, situations, events, conditions, and interactions cannot be replicated to any extent, nor can generalizations be made to a wider context than the one studied with confidence.
  • The time required for data collection, analysis, and interpretation is lengthy. Analysis of qualitative data is difficult, and expert knowledge of an area is necessary to interpret it. Great care must be taken when doing so, for example, when looking for symptoms of mental illness.

Advantages of Qualitative Research

  • Because of close researcher involvement, the researcher gains an insider’s view of the field. This allows the researcher to find issues that are often missed (such as subtleties and complexities) by the scientific, more positivistic inquiries.
  • Qualitative descriptions can be important in suggesting possible relationships, causes, effects, and dynamic processes.
  • Qualitative analysis allows for ambiguities/contradictions in the data, which reflect social reality (Denscombe, 2010).
  • Qualitative research uses a descriptive, narrative style; this research might be of particular benefit to the practitioner as she or he could turn to qualitative reports to examine forms of knowledge that might otherwise be unavailable, thereby gaining new insight.

What Is Quantitative Research?

Quantitative research involves the process of objectively collecting and analyzing numerical data to describe, predict, or control variables of interest.

The goals of quantitative research are to test causal relationships between variables , make predictions, and generalize results to wider populations.

Quantitative researchers aim to establish general laws of behavior and phenomenon across different settings/contexts. Research is used to test a theory and ultimately support or reject it.

Quantitative Methods

Experiments typically yield quantitative data, as they are concerned with measuring things. However, other research methods, such as controlled observations and questionnaires, can produce both quantitative and qualitative information.

For example, a rating scale or closed questions on a questionnaire would generate quantitative data as these produce either numerical data or data that can be put into categories (e.g., “yes,” “no” answers).
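For instance, “yes”/“no” answers become quantitative data once they are coded numerically. A minimal Python sketch, using invented answers:

```python
# Invented closed-question answers from a questionnaire.
answers = ["yes", "no", "yes", "yes", "no", "yes"]

# Code the categories numerically: "yes" -> 1, "no" -> 0.
coded = [1 if a == "yes" else 0 for a in answers]

# The coded data can now be summarized statistically.
proportion_yes = sum(coded) / len(coded)
```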

Experimental methods limit the ways in which research participants can react to and express appropriate social behavior.

Findings are, therefore, likely to be context-bound and simply a reflection of the assumptions that the researcher brings to the investigation.

There are numerous examples of quantitative data in psychological research, including in mental health. Here are a few:

One example is the Experience in Close Relationships Scale (ECR), a self-report questionnaire widely used to assess adult attachment styles.

The ECR provides quantitative data that can be used to assess attachment styles and predict relationship outcomes.

Neuroimaging data : Neuroimaging techniques, such as MRI and fMRI, provide quantitative data on brain structure and function.

This data can be analyzed to identify brain regions involved in specific mental processes or disorders.

Another example is the Beck Depression Inventory (BDI), a self-report questionnaire widely used to assess the severity of depressive symptoms in individuals.

The BDI consists of 21 questions, each scored on a scale of 0 to 3, with higher scores indicating more severe depressive symptoms. 
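The scoring itself is simple arithmetic: item scores are summed into a total. The item responses below are invented to illustrate the calculation:

```python
# Invented responses to a BDI-style questionnaire: 21 items, each scored 0-3.
responses = [1, 0, 2, 1, 0, 3, 1, 2, 0, 1, 1,
             0, 2, 1, 0, 1, 2, 0, 1, 1, 0]
assert len(responses) == 21
assert all(0 <= r <= 3 for r in responses)

# The total score is the sum of the items: possible range 0 (21 x 0) to 63 (21 x 3).
total = sum(responses)
```

Higher totals indicate more severe depressive symptoms; interpreting a given total against clinical severity bands is left to the published scoring manual.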

Quantitative Data Analysis

Statistics help us turn quantitative data into useful information to help with decision-making. We can use statistics to summarize our data, describing patterns, relationships, and connections. Statistics can be descriptive or inferential.

Descriptive statistics help us to summarize our data. In contrast, inferential statistics are used to identify statistically significant differences between groups of data (such as intervention and control groups in a randomized control study).
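Using only Python’s standard library and invented scores for an intervention and a control group, the descriptive summaries, and the t statistic that an inferential test builds on, look like this (a Welch-style t statistic is sketched by hand here; turning it into a p-value requires the t distribution, which tools like SPSS or R’s t.test handle):

```python
from statistics import mean, stdev

# Invented scores from an intervention group and a control group.
intervention = [14, 16, 15, 17, 18, 16]
control = [11, 12, 13, 12, 11, 13]

# Descriptive statistics: summarize each group.
m1, m2 = mean(intervention), mean(control)   # 16 and 12
s1, s2 = stdev(intervention), stdev(control)

# Inferential starting point: Welch's t statistic for the difference in means.
n1, n2 = len(intervention), len(control)
t = (m1 - m2) / (s1 ** 2 / n1 + s2 ** 2 / n2) ** 0.5
```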

  • Quantitative researchers try to control extraneous variables by conducting their studies in the lab.
  • The research aims for objectivity (i.e., without bias) and is separated from the data.
  • The design of the study is determined before it begins.
  • For the quantitative researcher, the reality is objective, exists separately from the researcher, and can be seen by anyone.
  • Research is used to test a theory and ultimately support or reject it.

Limitations of Quantitative Research

  • Context: Quantitative experiments do not take place in natural settings. In addition, they do not allow participants to explain their choices or the meaning of the questions they may have for those participants (Carr, 1994).
  • Researcher expertise: Poor knowledge of the application of statistical analysis may negatively affect analysis and subsequent interpretation (Black, 1999).
  • Variability of data quantity: Large sample sizes are needed for more accurate analysis. Small-scale quantitative studies may be less reliable because of the low quantity of data (Denscombe, 2010). This also affects the ability to generalize study findings to wider populations.
  • Confirmation bias: The researcher might miss observing phenomena because of a focus on theory or hypothesis testing rather than on theory or hypothesis generation.

Advantages of Quantitative Research

  • Scientific objectivity: Quantitative data can be interpreted with statistical analysis, and since statistics are based on the principles of mathematics, the quantitative approach is viewed as scientifically objective and rational (Carr, 1994; Denscombe, 2010).
  • Useful for testing and validating already constructed theories.
  • Rapid analysis: Sophisticated software removes much of the need for prolonged data analysis, especially with large volumes of data involved (Antonius, 2003).
  • Replication: Quantitative data is based on measured values and can be checked by others because numerical data is less open to ambiguities of interpretation.
  • Hypotheses can also be tested because of statistical analysis (Antonius, 2003).

Antonius, R. (2003). Interpreting quantitative data with SPSS. Sage.

Black, T. R. (1999). Doing quantitative research in the social sciences: An integrated approach to research design, measurement and statistics. Sage.

Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3, 77–101.

Carr, L. T. (1994). The strengths and weaknesses of quantitative and qualitative research: What method for nursing? Journal of Advanced Nursing, 20(4), 716–721.

Denscombe, M. (2010). The good research guide: For small-scale social research. McGraw Hill.

Denzin, N., & Lincoln, Y. (1994). Handbook of qualitative research. Sage.

Glaser, B. G., Strauss, A. L., & Strutzel, E. (1968). The discovery of grounded theory: Strategies for qualitative research. Nursing Research, 17(4), 364.

Minichiello, V. (1990). In-depth interviewing: Researching people. Longman Cheshire.

Punch, K. (1998). Introduction to social research: Quantitative and qualitative approaches. Sage.

Further Information

  • Designing qualitative research
  • Methods of data collection and analysis
  • Introduction to quantitative and qualitative research
  • Checklists for improving rigour in qualitative research: a case of the tail wagging the dog?
  • Qualitative research in health care: Analysing qualitative data
  • Qualitative data analysis: the framework approach
  • Using the framework method for the analysis of qualitative data in multi-disciplinary health research
  • Content Analysis
  • Grounded Theory
  • Thematic Analysis


Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, automatically generate references for free.

  • Knowledge Base
  • Methodology
  • Qualitative vs Quantitative Research | Examples & Methods

Qualitative vs Quantitative Research | Examples & Methods

Published on 4 April 2022 by Raimo Streefkerk . Revised on 8 May 2023.

When collecting and analysing data, quantitative research deals with numbers and statistics, while qualitative research  deals with words and meanings. Both are important for gaining different kinds of knowledge.

Common quantitative methods include experiments, observations recorded as numbers, and surveys with closed-ended questions. Qualitative research Qualitative research is expressed in words . It is used to understand concepts, thoughts or experiences. This type of research enables you to gather in-depth insights on topics that are not well understood.

Table of contents

The differences between quantitative and qualitative research, data collection methods, when to use qualitative vs quantitative research, how to analyse qualitative and quantitative data, frequently asked questions about qualitative and quantitative research.

Quantitative and qualitative research use different research methods to collect and analyse data, and they allow you to answer different kinds of research questions.

Qualitative vs quantitative research

Prevent plagiarism, run a free check.

Quantitative and qualitative data can be collected using various methods. It is important to use a data collection method that will help answer your research question(s).

Many data collection methods can be either qualitative or quantitative. For example, in surveys, observations or case studies , your data can be represented as numbers (e.g. using rating scales or counting frequencies) or as words (e.g. with open-ended questions or descriptions of what you observe).

However, some methods are more commonly used in one type or the other.

Quantitative data collection methods

  • Surveys :  List of closed or multiple choice questions that is distributed to a sample (online, in person, or over the phone).
  • Experiments : Situation in which variables are controlled and manipulated to establish cause-and-effect relationships.
  • Observations: Observing subjects in a natural environment where variables can’t be controlled.

Qualitative data collection methods

  • Interviews : Asking open-ended questions verbally to respondents.
  • Focus groups: Discussion among a group of people about a topic to gather opinions that can be used for further research.
  • Ethnography : Participating in a community or organisation for an extended period of time to closely observe culture and behavior.
  • Literature review : Survey of published works by other authors.

A rule of thumb for deciding whether to use qualitative or quantitative data is:

  • Use quantitative research if you want to confirm or test something (a theory or hypothesis)
  • Use qualitative research if you want to understand something (concepts, thoughts, experiences)

For most research topics you can choose a qualitative, quantitative or mixed methods approach . Which type you choose depends on, among other things, whether you’re taking an inductive vs deductive research approach ; your research question(s) ; whether you’re doing experimental , correlational , or descriptive research ; and practical considerations such as time, money, availability of data, and access to respondents.

Quantitative research approach

You survey 300 students at your university and ask them questions such as: ‘on a scale from 1-5, how satisfied are your with your professors?’

You can perform statistical analysis on the data and draw conclusions such as: ‘on average students rated their professors 4.4’.

Qualitative research approach

You conduct in-depth interviews with 15 students and ask them open-ended questions such as: ‘How satisfied are you with your studies?’, ‘What is the most positive aspect of your study program?’ and ‘What can be done to improve the study program?’

Based on the answers you get you can ask follow-up questions to clarify things. You transcribe all interviews using transcription software and try to find commonalities and patterns.

Mixed methods approach

You conduct interviews to find out how satisfied students are with their studies. Through open-ended questions you learn things you never thought about before and gain new insights. Later, you use a survey to test these insights on a larger scale.

It’s also possible to start with a survey to find out the overall trends, followed by interviews to better understand the reasons behind the trends.

Qualitative or quantitative data by itself can’t prove or demonstrate anything, but has to be analysed to show its meaning in relation to the research questions. The method of analysis differs for each type of data.

Analysing quantitative data

Quantitative data is based on numbers. Simple maths or more advanced statistical analysis is used to discover commonalities or patterns in the data. The results are often reported in graphs and tables.

Applications such as Excel, SPSS, or R can be used to calculate things like:

  • Average scores
  • The number of times a particular answer was given
  • The correlation or causation between two or more variables
  • The reliability and validity of the results

Analysing qualitative data

Qualitative data is more difficult to analyse than quantitative data. It consists of text, images or videos instead of numbers.

Some common approaches to analysing qualitative data include:

  • Qualitative content analysis : Tracking the occurrence, position and meaning of words or phrases
  • Thematic analysis : Closely examining the data to identify the main themes and patterns
  • Discourse analysis : Studying how communication works in social contexts

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to test a hypothesis by systematically collecting and analysing data, while qualitative methods allow you to explore ideas and experiences in depth.

In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts, and meanings, use qualitative methods .
  • If you want to analyse a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how they are generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organisations.

There are various approaches to qualitative data analysis , but they all share five steps in common:

  • Prepare and organise your data.
  • Review and explore your data.
  • Develop a data coding system.
  • Assign codes to the data.
  • Identify recurring themes.

The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis , thematic analysis , and discourse analysis .
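To make the five steps concrete, here is a toy sketch in Python; the interview snippets, keywords, and code labels are all invented for illustration, and real coding is usually done by a human researcher (often with software support), not a keyword match:

```python
from collections import Counter

# Steps 1-2: prepared and reviewed interview snippets (invented examples)
snippets = [
    "I love the flexibility of online classes",
    "Deadlines stress me out",
    "Online classes let me study at my own pace",
    "I worry about falling behind on deadlines",
]

# Step 3: a simple coding system mapping keywords to codes
coding_system = {
    "flexib": "flexibility",
    "own pace": "flexibility",
    "deadline": "time_pressure",
    "stress": "time_pressure",
}

# Step 4: assign codes to each snippet
coded = []
for snippet in snippets:
    codes = {code for keyword, code in coding_system.items()
             if keyword in snippet.lower()}
    coded.append((snippet, codes))

# Step 5: identify recurring themes by counting how often each code appears
theme_counts = Counter(code for _, codes in coded for code in codes)
print(theme_counts)
```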

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the ‘Cite this Scribbr article’ button to automatically add the citation to our free Reference Generator.

Streefkerk, R. (2023, May 08). Qualitative vs Quantitative Research | Examples & Methods. Scribbr. Retrieved 25 March 2024, from https://www.scribbr.co.uk/research-methods/quantitative-qualitative-research/


Both qualitative and quantitative research are valid and effective approaches to study a particular subject. However, it is important to know that these research approaches serve different purposes and provide different results. This guide will help illustrate quantitative and qualitative research, what they are used for, and the difference between them.

Quantitative research focuses on collecting numerical data and using it to measure variables. As such, quantitative research and data are typically expressed in numbers and graphs. Moreover, this type of research is structured and statistical and the returned results are objective.

The simplest way to describe quantitative research is that it answers the questions " what " or " how much ".

To illustrate what quantitative research is used for, let’s look at a simple example. Let’s assume you want to research the reading habits of a specific part of a population.

With this research, you would like to establish what they read. In other words, do they read fiction, non-fiction, magazines, blogs, and so on? Also, you want to establish what they read about. For example, if they read fiction, is it thrillers, romance novels, or period dramas?

With quantitative research, you can gather concrete data about these reading habits. Your research will then, for example, show that 40% of the audience reads fiction and, of that 40%, 60% prefer romance novels.
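The arithmetic behind such nested percentages is simple: the share of the whole audience that prefers romance novels is the product of the two proportions.

```python
fiction_share = 0.40           # 40% of the audience reads fiction
romance_within_fiction = 0.60  # 60% of fiction readers prefer romance

romance_share_overall = fiction_share * romance_within_fiction
print(f"{romance_share_overall:.0%} of the whole audience prefers romance novels")
# → 24% of the whole audience prefers romance novels
```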

In other studies and research projects, quantitative research will work in much the same way. That is, you use it to quantify variables, opinions, behaviors, and more.

Now that we've seen what quantitative research is and what it's used for, let's look at how you'll collect data for it. Because quantitative research is structured and statistical, its data collection methods focus on collecting numerical data.

Some methods to collect this data include:

  • Surveys . Surveys are one of the most popular and easiest ways to collect quantitative data. These can include anything from online surveys to paper surveys. It’s important to remember that, to collect quantitative data, you won’t be able to ask open-ended questions.
  • Interviews . As with qualitative research, you can use interviews to collect quantitative data, provided you use structured, closed-ended questions rather than open-ended ones.
  • Observations . You’ll also be able to use observations to collect quantitative data, provided you record what you observe as counts or measurements. Typically, this means observing subjects in a natural environment where the variables aren’t controlled.
  • Website interceptors . With website interceptors, you’ll be able to get real-time insights into a specific product, service, or subject. In most cases, these interceptors take the form of surveys displayed on websites or invitations on the website to complete the survey.
  • Longitudinal studies . With these studies, you’ll gather data on the same variables over specified time periods. Longitudinal studies are often used in medical sciences and include, for instance, diet studies. It’s important to remember that, for the results to be reliable, you’ll have to collect data from the same subjects.
  • Online polls . Similar to website interceptors, online polls allow you to gather data from websites or social media platforms. These polls are short with only a few options and can give you valuable insights into a very specific question or topic.
  • Experiments . With experiments, you’ll manipulate some variables (your independent variables) and gather data on causal relationships between others (your dependent variables). You’ll then measure what effect the manipulation of the independent variables has on the dependent variables.
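Methods like surveys and structured interviews yield numbers because every closed-ended answer maps to a value. A minimal sketch of that encoding step (the Likert labels and responses are invented):

```python
from statistics import mean

# Hypothetical mapping from closed-ended answer options to numeric values
likert = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

responses = ["agree", "strongly agree", "neutral", "agree", "disagree"]
numeric = [likert[r] for r in responses]  # the data is now quantitative

print(numeric)        # [4, 5, 3, 4, 2]
print(mean(numeric))  # 3.6
```

This is why open-ended questions don’t produce quantitative data directly: free-text answers have no predefined mapping to numbers and must first be coded.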

Qualitative research focuses on collecting and analyzing non-numerical data. As such, it's typically unstructured and non-statistical. The main aim of qualitative research is to get a better understanding and insights into concepts, topics, and subjects.

The easiest way to describe qualitative research is that it answers the question " why ".

Considering that qualitative research aims to provide more profound insights and understanding into specific subjects, we’ll use our example mentioned earlier to explain what qualitative research is used for.

Based on this example, you’ve now established that 40% of the population reads fiction. You’ve probably also discovered in what proportion the population consumes other reading materials.

Qualitative research will now enable you to learn the reasons for these reading habits. For example, it will show you why 40% of the readers prefer fiction, while, for instance, only 10% prefer thrillers. It thus gives you an understanding of your participants’ behaviors and actions.

We've now recapped what qualitative research is and what it's used for. Let's now consider some methods to collect data for this type of research.

Some of these data collection methods include:

  • Interviews . These include one-on-one interviews with respondents where you ask open-ended questions. You’ll then record the answers from every respondent and analyze these answers later.
  • Open-ended survey questions . Open-ended survey questions give you insights into why respondents feel the way they do about a particular aspect.
  • Focus groups . Focus groups allow you to have conversations with small groups of people and record their opinions and views about a specific topic.
  • Observations . Observations like ethnography require that you participate in a specific organization or group in order to record their routines and interactions. This will, for instance, be the case where you want to establish how customers use a product in real-life scenarios.
  • Literature reviews . With literature reviews, you’ll analyze the published works of other authors to analyze the prevailing view regarding a specific subject.
  • Diary studies . Diary studies allow you to collect data about peoples’ habits, activities, and experiences over time. This will, for example, show you how customers use a product, when they use it, and what motivates them.

Now, the immediate question is: When should you use qualitative research, and when should you use quantitative research? As mentioned earlier, in its simplest form:

  • Quantitative research allows you to confirm or test a hypothesis or theory or quantify a specific problem or quality.
  • Qualitative research allows you to understand concepts or experiences.

Let's look at how you'll use these approaches in a research project a bit closer:

  • Formulating a hypothesis . As mentioned earlier, qualitative research gives you a deeper understanding of a topic. Apart from learning more profound insights about your research findings, you can also use it to formulate a hypothesis when you start your research.
  • Confirming a hypothesis . Once you’ve formulated a hypothesis, you can test it with quantitative research. As mentioned, you can also use it to quantify trends and behavior.
  • Finding general answers . Quantitative research can help you answer broad questions. This is because it uses a larger sample size and thus makes it easier to gather simple binary or numeric data on a specific subject.
  • Getting a deeper understanding . Once you have the broad answers mentioned above, qualitative research will help you find reasons for these answers. In other words, qualitative research shows you the motives behind actions or behaviors.

Considering the above, why not consider a mixed approach ? You certainly can because these approaches are not mutually exclusive. In other words, using one does not necessarily exclude the other. Moreover, both these approaches are useful for different reasons.

This means you could use both approaches in one project to achieve different goals. For example, you could use qualitative research to formulate a hypothesis. Once formulated, quantitative research will allow you to test it.

So, to answer the initial question, the approach you use is up to you.  However, when deciding on the right approach, you should consider the specific research project, the data you'll gather, and what you want to achieve.

No matter what approach you choose, you should design your research in such a way that it delivers results that are objective, reliable, and valid.

Both these research approaches are based on data. Once you have this data, however, you need to analyze it to answer your research questions. The method to do this depends on the research approach you use.

To analyze quantitative data, you'll need to use mathematical or statistical analysis. This can involve anything from calculating simple averages to applying complex and advanced methods to calculate the statistical significance of the results. No matter what analysis methods you use, it will enable you to spot trends and patterns in your data.

Considering the above, you can use tools, applications, and programming languages like R to calculate:

  • The average of a set of numbers . This could, for instance, be the case where you calculate the average scores students obtained in a test or the average time people spend on a website.
  • The frequency of a specific response . For example, after coding the answers to open-ended survey questions, you could calculate how often a specific response occurs for deeper insights.
  • Any correlation between different variables . Through mathematical analysis, you can calculate whether two or more variables are directly or indirectly correlated. In turn, this could help you identify trends in the data.
  • The statistical significance of your results . By analyzing the data and calculating the statistical significance of the results, you'll be able to see whether certain occurrences happen randomly or because of specific factors.
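A hedged sketch of the correlation and significance calculations in Python (real analyses would typically use R, SPSS, or a statistics library; the study-hours data is invented):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up data: hours studied vs. test score for 8 students
hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 74, 77, 85]

r = pearson_r(hours, scores)
n = len(hours)
# t statistic for testing "no correlation"; compare against a t table with
# n - 2 degrees of freedom to judge statistical significance
t = r * sqrt((n - 2) / (1 - r ** 2))
print(round(r, 3), round(t, 2))
```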

Analyzing qualitative data is more complex than quantitative data. This is simply because it's not based on numerical values but rather text, images, video, and the like. As such, you won't be able to use mathematical analysis to analyze and interpret your results.

Because of this, it relies on a more interpretive analysis style and a strict analytical framework to analyze data and extract insights from it.

Some of the most common ways to analyze qualitative data include:

  • Qualitative content analysis . In a content analysis, you'll analyze the language used in a specific piece of text. This allows you to understand the intentions of the author, who the audience is, and find patterns and correlations in how different concepts are communicated. A major benefit of this approach is that it follows a systematic and transparent process that other researchers will be able to replicate. As such, your research will produce highly reliable results. Keep in mind, however, that content analysis can be time-intensive and difficult to automate. ➡️  Learn how to do a content analysis in the guide.
  • Thematic analysis . In a thematic analysis, you'll analyze data with a view of extracting themes, topics, and patterns in the data. Although thematic analysis can encompass a range of diverse approaches, it's usually used to analyze a collection of texts like survey responses, focus group discussions, or transcriptions of interviews. One of the main benefits of thematic analysis is that it's flexible in its approach. However, in some cases, thematic analysis can be highly subjective, which, in turn, impacts the reliability of the results. ➡️  Learn how to do a thematic analysis in this guide.
  • Discourse analysis . In a discourse analysis, you'll analyze written or spoken language to understand how language is used in real-life social situations. As such, you'll be able to determine how meaning is given to language in different contexts. This is an especially effective approach if you want to gain a deeper understanding of different social groups and how they communicate with each other. As such, it's commonly used in humanities and social science disciplines.
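As a minimal, hypothetical sketch of the counting side of content analysis in Python (the mini-corpus and term list are invented; real content analysis also examines the position and meaning of terms, not just their frequency):

```python
import re
from collections import Counter

# Invented mini-corpus of open-ended survey answers
answers = [
    "The support team was helpful and quick.",
    "Helpful documentation, but slow support response.",
    "Quick setup; the documentation could be clearer.",
]

# Track how often each concept-word occurs across the corpus
terms = ["helpful", "quick", "slow", "support", "documentation"]
tokens = [w for a in answers for w in re.findall(r"[a-z]+", a.lower())]
token_counts = Counter(tokens)
occurrences = {t: token_counts[t] for t in terms}
print(occurrences)
```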

We’ve now given a broad overview of both qualitative and quantitative research. Based on this, we can summarize the differences between these two approaches as follows:

Qualitative research focuses on collecting and analyzing non-numerical data. As such, it's typically unstructured and non-statistical. The main aim of qualitative research is to get a better understanding and insights into concepts, topics, and subjects.

Quantitative research focuses on collecting numerical data and using it to measure variables. As such, quantitative research and data are typically expressed in numbers and graphs. Moreover, this type of research is structured and statistical and the returned results are objective.

Examples of qualitative research include:

  • Interviews . These include one-on-one interviews with respondents with open-ended questions. You’ll then record the answers and analyze them later.
  • Observations . Observations require that you participate in a specific organization or group in order to record their routines and interactions.

Examples of quantitative research include:

  • Surveys . Surveys are one of the most popular and easiest ways to collect quantitative data. To collect quantitative data, you won’t be able to ask open-ended questions.
  • Longitudinal studies . With these studies, you’ll gather data on the same variables over specified time periods. Longitudinal studies are often used in medical sciences.

The main purpose of qualitative research is to get a better understanding and insights into concepts, topics, and subjects. The easiest way to describe qualitative research is that it answers the question " why ".

The purpose of quantitative research is to collect numerical data and use it to measure variables. As such, quantitative research and data are typically expressed in numbers and graphs. The simplest way to describe quantitative research is that it answers the questions " what " or " how much ".

Quantitative vs. Qualitative Research in Psychology

Anabelle Bernard Fournier is a researcher of sexual and reproductive health at the University of Victoria as well as a freelance writer on various health topics.

Emily is a board-certified science editor who has worked with top digital publishing brands like Voices for Biodiversity, Study.com, GoodTherapy, Vox, and Verywell.


In psychology and other social sciences, researchers are faced with an unresolved question: Can we measure concepts like love or racism the same way we can measure temperature or the weight of a star? Social phenomena⁠—things that happen because of and through human behavior⁠—are especially difficult to grasp with typical scientific models.

At a Glance

Psychologists rely on qualitative and quantitative research to better understand human thought and behavior.

  • Qualitative research involves collecting and evaluating non-numerical data in order to understand concepts or subjective opinions.
  • Quantitative research involves collecting and evaluating numerical data. 

This article discusses what qualitative and quantitative research are, how they are different, and how they are used in psychology research.

Qualitative Research vs. Quantitative Research

In order to understand qualitative and quantitative psychology research, it can be helpful to look at the methods that are used and when each type is most appropriate.

Psychologists rely on a few methods to measure behavior, attitudes, and feelings. These include:

  • Self-reports , like surveys or questionnaires
  • Observation (often used in experiments or fieldwork)
  • Implicit attitude tests that measure timing in responding to prompts

Most of these are quantitative methods. The result is a number that can be used to assess differences between groups.

However, most of these methods are static, inflexible (you can't change a question because a participant doesn't understand it), and provide a "what" answer rather than a "why" answer.

Sometimes, researchers are more interested in the "why" and the "how." That's where qualitative methods come in.

Qualitative research is about speaking to people directly and hearing their words. It is grounded in the philosophy that the social world is ultimately unmeasurable, that no measure is truly ever "objective," and that how humans make meaning is just as important as how much they score on a standardized test.

Qualitative research:

  • Used to develop theories
  • Takes a broad, complex approach
  • Answers "why" and "how" questions
  • Explores patterns and themes

Quantitative research:

  • Used to test theories
  • Takes a narrow, specific approach
  • Answers "what" questions
  • Explores statistical relationships

Quantitative methods have existed ever since people have been able to count things. But it is only with the positivist philosophy of Auguste Comte (which maintains that factual knowledge obtained by observation is trustworthy) that it became a "scientific method."

The scientific method follows this general process. A researcher must:

  • Generate a theory or hypothesis (i.e., predict what might happen in an experiment) and determine the variables needed to answer their question
  • Develop instruments to measure the phenomenon (such as a survey, a thermometer, etc.)
  • Develop experiments to manipulate the variables
  • Collect empirical (measured) data
  • Analyze data
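As a toy illustration of the last two steps (with invented data), a researcher might compare a dependent variable across a control group and a treatment group:

```python
from statistics import mean

# Hypothetical experiment: does a study-skills workshop (independent variable)
# change test scores (dependent variable)?
control   = [61, 64, 58, 70, 66]   # no workshop
treatment = [68, 72, 65, 75, 71]   # attended workshop

# Analyze the collected empirical data: compare the group means
effect = mean(treatment) - mean(control)
print(f"Mean difference: {effect:.1f} points")
# → Mean difference: 6.4 points
```

A real analysis would also test whether a difference this size could plausibly arise by chance, rather than stopping at the raw means.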

Quantitative methods are about measuring phenomena, not explaining them.

Quantitative research often compares two or more groups of people. There are all sorts of variables you could measure, and many kinds of experiments to run using quantitative methods.

These comparisons are generally explained using graphs, pie charts, and other visual representations that give the researcher a sense of how the various data points relate to one another.

Basic Assumptions

Quantitative methods assume:

  • That the world is measurable
  • That humans can observe objectively
  • That we can know things for certain about the world from observation

In some fields, these assumptions hold true. Whether you measure the size of the sun 2000 years ago or now, it will always be the same. But when it comes to human behavior, it is not so simple.

As decades of cultural and social research have shown, people behave differently (and even think differently) based on historical context, cultural context, social context, and even identity-based contexts like gender , social class, or sexual orientation .

Therefore, quantitative methods applied to human behavior (as used in psychology and some areas of sociology) should always be rooted in their particular context. In other words: there are no, or very few, human universals.

Statistical information is the primary form of quantitative data used in human and social quantitative research. Statistics provide lots of information about tendencies across large groups of people, but they can never describe every case or every experience. In other words, there are always outliers.

Correlation and Causation

A basic principle of statistics is that correlation is not causation. Researchers can only claim a cause-and-effect relationship under certain conditions:

  • The study was a true experiment.
  • The independent variable can be manipulated (for example, researchers cannot manipulate gender, but they can change the prime a study subject sees, such as a picture of nature or of a building).
  • The dependent variable can be measured through a ratio or a scale.
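To see why a correlation alone can't establish cause and effect, here is a small synthetic demonstration in Python (all data invented): a hidden confounder z drives both x and y, so x and y end up strongly correlated even though neither causes the other.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cov / sqrt(sum((a - mx) ** 2 for a in xs)
                      * sum((b - my) ** 2 for b in ys))

random.seed(42)

# A hidden confounder z drives both x and y; x does NOT cause y
z = [random.gauss(0, 1) for _ in range(500)]
x = [zi + random.gauss(0, 0.3) for zi in z]
y = [zi + random.gauss(0, 0.3) for zi in z]

r = pearson_r(x, y)
print(round(r, 2))  # strong correlation, yet no causal link between x and y
```

Only a true experiment, where the independent variable is actively manipulated, rules out hidden third variables like z.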

So when you read a report that "gender was linked to" something (like a behavior or an attitude), remember that gender is NOT a cause of the behavior or attitude. There is an apparent relationship, but the true cause of the difference is hidden.

Pitfalls of Quantitative Research

Quantitative methods are one way to approach the measurement and understanding of human and social phenomena. But what's missing from this picture?

As noted above, statistics do not tell us about personal, individual experiences and meanings. While surveys can give a general idea, respondents have to choose between only a few responses. This can make it difficult to understand the subtleties of different experiences.

Quantitative methods can be helpful when making objective comparisons between groups or when looking for relationships between variables. They can be analyzed statistically, which can be helpful when looking for patterns and relationships.

Qualitative data are not made out of numbers but rather of descriptions, metaphors, symbols, quotes, analysis, concepts, and characteristics. This approach uses interviews, written texts, art, photos, and other materials to make sense of human experiences and to understand what these experiences mean to people.

While quantitative methods ask "what" and "how much," qualitative methods ask "why" and "how."

Qualitative methods are about describing and analyzing phenomena from a human perspective. There are many different philosophical views on qualitative methods, but in general, they agree that some questions are too complex or impossible to answer with standardized instruments.

These methods also accept that it is impossible to be completely objective in observing phenomena. Researchers have their own thoughts, attitudes, experiences, and beliefs, and these always color how people interpret results.

Qualitative Approaches

There are many different approaches to qualitative research, with their own philosophical bases. Different approaches are best for different kinds of projects. For example:

  • Case studies and narrative studies are best for single individuals. These involve studying every aspect of a person's life in great depth.
  • Phenomenology aims to explain experiences. This type of work aims to describe and explore different events as they are consciously and subjectively experienced.
  • Grounded theory develops models and describes processes. This approach allows researchers to construct a theory based on data that is collected, analyzed, and compared to reach new discoveries.
  • Ethnography describes cultural groups. In this approach, researchers immerse themselves in a community or group in order to observe behavior.

Qualitative researchers must be aware of several different methods and know each thoroughly enough to produce valuable research.

Some researchers specialize in a single method, but others specialize in a topic or content area and use many different methods to explore the topic, providing different information and a variety of points of view.

There is not a single model or method that can be used for every qualitative project. Depending on the research question, the people participating, and the kind of information they want to produce, researchers will choose the appropriate approach.

Interpretation

Qualitative research does not look into causal relationships between variables, but rather into themes, values, interpretations, and meanings. As a rule, then, qualitative research is not generalizable (cannot be applied to people outside the research participants).

However, with proper attention to specific historical and social contexts, the insights gained from qualitative research can still extend to other groups.

Relationship Between Qualitative and Quantitative Research

It might sound like quantitative and qualitative research do not play well together. They have different philosophies, different data, and different outputs. However, this could not be further from the truth.

These two general methods complement each other. By using both, researchers can gain a fuller, more comprehensive understanding of a phenomenon.

For example, a psychologist wanting to develop a new survey instrument about sexuality might ask a few dozen people questions about their sexual experiences (this is qualitative research). This gives the researcher some information to begin developing questions for their survey (which is a quantitative method).

After the survey, the same or other researchers might want to dig deeper into issues brought up by its data. Follow-up questions like "how does it feel when...?" or "what does this mean to you?" or "how did you experience this?" can only be answered by qualitative research.

By using both quantitative and qualitative data, researchers have a more holistic, well-rounded understanding of a particular topic or phenomenon.

Qualitative and quantitative methods both play an important role in psychology. Where quantitative methods can help answer questions about what is happening in a group and to what degree, qualitative methods can dig deeper into the reasons behind why it is happening. By using both strategies, psychology researchers can learn more about human thought and behavior.



Qualitative and Quantitative Research: Differences and Similarities

ScienceEditor

Qualitative research and quantitative research are two complementary approaches for understanding the world around us.

Qualitative research collects non-numerical data , and the results are typically presented as written descriptions, photographs, videos, and/or sound recordings.

The goal of qualitative research is to learn about situations that aren't well understood.

In contrast, quantitative research collects numerical data , and the results are typically presented in tables, graphs, and charts.


Debates about whether to use qualitative or quantitative research methods are common in the social sciences (i.e. anthropology, archaeology, economics, geography, history, law, linguistics, politics, psychology, sociology), which aim to understand a broad range of human conditions. Qualitative observations may be used to gain an understanding of unique situations, which may lead to quantitative research that aims to find commonalities.

Understanding Qualitative vs. Quantitative Research

Within the natural and physical sciences (e.g., physics, chemistry, geology, biology), qualitative observations often lead to a plethora of quantitative studies. For example, unusual observations through a microscope or telescope can immediately lead to counting and measuring. In other situations, meaningful numbers cannot immediately be obtained, and the qualitative research must stand on its own (e.g., "The patient presented with an abnormally enlarged spleen (Figure 1), and complained of pain in the left shoulder.").

For both qualitative and quantitative research, the researcher's assumptions shape the direction of the study and thereby influence the results that can be obtained. Let's consider some prominent examples of qualitative and quantitative research, and how these two methods can complement each other.


Qualitative research example

In 1960, Jane Goodall started her decades-long study of chimpanzees in the wild at Gombe Stream National Park in Tanzania. Her work is an example of qualitative research that has fundamentally changed our understanding of non-human primates, and has influenced our understanding of other animals, their abilities, and their social interactions.

Dr. Goodall was by no means the first person to study non-human primates, but she took a highly unusual approach in her research. For example, she named individual chimpanzees instead of numbering them, and used terms such as "childhood", "adolescence", "motivation", "excitement", and "mood". She also described the distinct "personalities" of individual chimpanzees. Dr. Goodall was heavily criticized for describing chimpanzees in ways that are regularly used to describe humans, which perfectly illustrates how the assumptions of the researcher can heavily influence their work.

The quality of qualitative research is largely determined by the researcher's ability, knowledge, creativity, and interpretation of the results. One of the hallmarks of good qualitative research is that nothing is predefined or taken for granted, and that the study subjects teach the researcher about their lives. As a result, qualitative research studies evolve over time, and the focus or techniques used can shift as the study progresses.

Qualitative research methods

Dr. Goodall immersed herself in the chimpanzees' natural surroundings, and used direct observation to learn about their daily life. She used photographs, videos, sound recordings, and written descriptions to present her data. These are all well-established methods of qualitative research, with direct observation within the natural setting considered a gold standard. These methods are time-intensive for the researcher (and therefore monetarily expensive) and limit the number of individuals that can be studied at one time.

When studying humans, a wider variety of research methods are available to understand how people perceive and navigate their world—past or present. These techniques include: in-depth interviews (e.g. Can you discuss your experience of growing up in the Deep South in the 1950s?), open-ended survey questions (e.g. What do you enjoy most about being part of the Church of Latter Day Saints?), focus group discussions, researcher participation (e.g. in military training), review of written documents (e.g. social media accounts, diaries, school records, etc), and analysis of cultural records (e.g. anything left behind including trash, clothing, buildings, etc).

Qualitative research can lead to quantitative research

Qualitative research is largely exploratory. The goal is to gain a better understanding of an unknown situation. Qualitative research in humans may lead to a better understanding of underlying reasons, opinions, motivations, experiences, etc. The information generated through qualitative research can provide new hypotheses to test through quantitative research. Quantitative research studies are typically more focused and less exploratory, involve a larger sample size, and by definition produce numerical data.

Dr. Goodall's qualitative research clearly established periods of childhood and adolescence in chimpanzees. Quantitative studies could better characterize these time periods, for example by recording the amount of time individual chimpanzees spend with their mothers, with peers, or alone each day during childhood compared to adolescence.

For studies involving humans, quantitative data might be collected through a questionnaire with a limited number of answers (e.g. If you were being bullied, what is the likelihood that you would tell at least one parent? A) Very likely, B) Somewhat likely, C) Somewhat unlikely, D) Unlikely).

Quantitative research example

One of the most influential examples of quantitative research began with a simple qualitative observation: Some peas are round, and other peas are wrinkled. Gregor Mendel was not the first to make this observation, but he was the first to carry out rigorous quantitative experiments to better understand this characteristic of garden peas.

As described in his 1865 research paper, Mendel carried out carefully controlled genetic crosses and counted thousands of resulting peas. He discovered that the ratio of round peas to wrinkled peas matched the ratio expected if pea shape were determined by two copies of a gene for pea shape, one inherited from each parent. These experiments and calculations became the foundation of modern genetics, and Mendel's ratios became the default hypothesis for experiments involving thousands of different genes in hundreds of different organisms.
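Mendel's reasoning can be illustrated with a short chi-square goodness-of-fit calculation against the expected 3:1 ratio. The sketch below (in Python; the helper name is ours) uses Mendel's published F2 counts for pea shape, 5474 round and 1850 wrinkled:

```python
def chi_square_3to1(observed_dominant, observed_recessive):
    """Chi-square goodness-of-fit statistic against an expected 3:1 ratio."""
    total = observed_dominant + observed_recessive
    expected = (0.75 * total, 0.25 * total)
    observed = (observed_dominant, observed_recessive)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Mendel's F2 counts for pea shape: 5474 round, 1850 wrinkled.
stat = chi_square_3to1(5474, 1850)

# With 1 degree of freedom, the 5% critical value is 3.841;
# a statistic below it means the counts are consistent with 3:1.
consistent = stat < 3.841
```

Here the statistic comes out to roughly 0.26, far below the critical value, so Mendel's counts fit the 3:1 hypothesis well.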

The quality of quantitative research is largely determined by the researcher's ability to design a feasible experiment that will provide clear evidence to support or refute the working hypothesis. The hallmarks of good quantitative research include: a study that can be replicated by an independent group with similar results, a sample population that is representative of the population under study, and a sample size that is large enough to reveal any expected statistical significance.

Quantitative research methods

The basic methods of quantitative research involve measuring or counting things (size, weight, distance, offspring, light intensity, participants, number of times a specific phrase is used, etc.). In the social sciences especially, responses are often split into somewhat arbitrary categories (e.g. How much time do you spend on social media during a typical weekday? A) 0-15 min, B) 15-30 min, C) 30-60 min, D) 1-2 hrs, E) more than 2 hrs).
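Binning raw measurements into such categories is mechanical once the boundaries are fixed. A minimal sketch (hypothetical helper, mirroring the answer choices above):

```python
def categorize_minutes(minutes):
    """Map raw social-media time (in minutes) onto the survey's categories."""
    if minutes <= 15:
        return "A) 0-15 min"
    if minutes <= 30:
        return "B) 15-30 min"
    if minutes <= 60:
        return "C) 30-60 min"
    if minutes <= 120:
        return "D) 1-2 hrs"
    return "E) more than 2 hrs"
```

Note the arbitrariness: a respondent at 61 minutes lands in a different category than one at 60, which is one reason category boundaries should be chosen and reported carefully.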

These quantitative data can be displayed in a table, graph, or chart, and grouped in ways that highlight patterns and relationships. The quantitative data should also be subjected to mathematical and statistical analysis. To reveal overall trends, the average (or most common survey answer) and standard deviation can be determined for different groups (e.g., with and without a given treatment).
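Computing a per-group mean and standard deviation can be sketched as follows (the group names and scores are invented purely for illustration):

```python
from statistics import mean, stdev

# Hypothetical scores from two groups (illustrative numbers only).
groups = {
    "treated":   [7.1, 6.8, 7.4, 7.9, 6.5],
    "untreated": [5.2, 5.9, 4.8, 6.1, 5.5],
}

# Mean and sample standard deviation per group, rounded for display.
summary = {name: (round(mean(xs), 2), round(stdev(xs), 2))
           for name, xs in groups.items()}
```

A summary like this shows the overall trend (the treated group scores higher on average), but whether the difference is meaningful is a question for a statistical test.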

Typically, the most important result from a quantitative experiment is the test of statistical significance. There are many different methods for determining statistical significance (e.g. t-test, chi square test, ANOVA, etc.), and the appropriate method will depend on the specific experiment.

Statistical significance provides an answer to the question: What is the probability that the difference observed between two groups is due to chance alone, and the two groups are actually the same? For example, your initial results might show that 32% of Friday grocery shoppers buy alcohol, while only 16% of Monday grocery shoppers buy alcohol. If this result reflects a true difference between Friday shoppers and Monday shoppers, grocery store managers might want to offer Friday specials to increase sales.

After the appropriate statistical test is conducted (which incorporates sample size and other variables), the probability that the observed difference is due to chance alone might be more than 5%, or less than 5%. If the probability is less than 5%, the convention is that the result is considered statistically significant. (The researcher is also likely to cheer and have at least a small celebration.) Otherwise, the result is considered statistically insignificant. (If the value is close to 5%, the researcher may try to group the data in different ways to achieve statistical significance. For example, by comparing alcohol sales after 5pm on Friday and Monday.) While it is important to reveal differences that may not be immediately obvious, the desire to manipulate information until it becomes statistically significant can also contribute to bias in research.

So how often do results from two groups that are actually the same give a probability of less than 5%? A bit less than 5% of the time (by definition). This is one of the reasons why it is so important that quantitative research can be replicated by different groups.
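For the grocery-store example, a two-proportion z-test is one common choice. The sketch below assumes hypothetical sample sizes (200 shoppers per day, chosen only to match the percentages above); it needs nothing beyond `math` from the standard library:

```python
import math

def two_proportion_z_test(successes1, n1, successes2, n2):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical counts matching the article's percentages:
# 64 of 200 Friday shoppers (32%) vs. 32 of 200 Monday shoppers (16%).
z, p = two_proportion_z_test(64, 200, 32, 200)
significant = p < 0.05
```

With these assumed sample sizes the difference is comfortably significant; with much smaller samples the same percentages could easily fail to reach the 5% threshold, which is exactly why sample size matters.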

Which research method should I choose?

Choose the research methods that will allow you to produce the best results for a meaningful question, while acknowledging any unknowns and controlling for any bias. In many situations, this will involve a mixed methods approach. Qualitative research may allow you to learn about a poorly understood topic, and then quantitative research may allow you to obtain results that can be subjected to rigorous statistical tests to find true and meaningful patterns. Many different approaches are required to understand the complex world around us.


When Does a Researcher Choose a Quantitative, Qualitative, or Mixed Research Approach?

  • Published: 26 November 2021
  • Volume 53, pages 113–131 (2022)


  • Feyisa Mulisa, ORCID: orcid.org/0000-0002-0738-6554


In educational studies, the paradigm war over quantitative and qualitative research approaches has raged for more than half a century. The focus in the late twentieth century was on the distinction between the two approaches, and the motivation was to establish the supremacy of one approach over the other. Since the early twenty-first century, there has been a growing interest in taking a middle position and combining both approaches into a single study or a series of studies. Despite these signs of progress, when it comes to using the appropriate research approach at the right time, beginner educational researchers remain perplexed. This paper, therefore, provides useful guidelines that facilitate the choice of quantitative, qualitative, or mixed research approaches in educational inquiry. To achieve this objective, this article comprises three distinct and underlying areas of interest, which have been structured into three sections. The first section highlights the distinctions between quantitative and qualitative research approaches. The second section discusses the paradigm views that underpin the choice of a particular research approach. Finally, an effort has been made to determine the appropriate time to opt for any of the research approaches that facilitate successful educational investigations. Since truth and the means used to discover it are both dynamic, it is also essential to anticipate innovative research approaches with distinguishing features of application to educational research.



No funding was received for this article.

Author information

Authors and Affiliations

Institute of Education and Behavioral Sciences, Ambo University, Ambo, Ethiopia

Feyisa Mulisa


Corresponding author

Correspondence to Feyisa Mulisa .

Ethics declarations

Conflict of Interest

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Mulisa, F. When Does a Researcher Choose a Quantitative, Qualitative, or Mixed Research Approach?. Interchange 53 , 113–131 (2022). https://doi.org/10.1007/s10780-021-09447-z


Received : 29 March 2021

Accepted : 18 November 2021

Published : 26 November 2021

Issue Date : March 2022

DOI : https://doi.org/10.1007/s10780-021-09447-z


  • Quantitative research
  • Qualitative research
  • Mixed research
  • Research approaches
  • Research paradigm

Qualitative Study

Affiliations

  • 1 University of Nebraska Medical Center
  • 2 GDB Research and Statistical Consulting
  • 3 GDB Research and Statistical Consulting/McLaren Macomb Hospital
  • PMID: 29262162
  • Bookshelf ID: NBK470395

Qualitative research is a type of research that explores and provides deeper insights into real-world problems. Instead of collecting numerical data points or intervening to introduce treatments as in quantitative research, qualitative research helps generate hypotheses and further investigate and understand quantitative data. Qualitative research gathers participants' experiences, perceptions, and behavior. It answers the hows and whys instead of how many or how much. It can be structured as a stand-alone study that relies purely on qualitative data, or as part of mixed-methods research that combines qualitative and quantitative data. This review introduces the reader to some basic concepts, definitions, terminology, and applications of qualitative research.

Qualitative research, at its core, asks open-ended questions whose answers are not easily put into numbers, such as 'how' and 'why'. Due to the open-ended nature of the research questions at hand, qualitative research design is often not linear in the way quantitative design is. One of the strengths of qualitative research is its ability to explain processes and patterns of human behavior that can be difficult to quantify. Phenomena such as experiences, attitudes, and behaviors can be difficult to capture accurately with quantitative measures, whereas a qualitative approach allows participants themselves to explain how, why, or what they were thinking, feeling, and experiencing at a certain time or during an event of interest. Quantifying qualitative data is certainly possible, but at its core qualitative analysis looks for themes and patterns that can be difficult to quantify, and it is important to ensure that the context and narrative of qualitative work are not lost by trying to quantify something that is not meant to be quantified.

Qualitative research is sometimes placed in opposition to quantitative research, as if the two approaches, and the philosophical paradigms associated with each, necessarily 'compete' against each other. While qualitative and quantitative approaches are different, they are not opposites, and they are certainly not mutually exclusive. For instance, qualitative research can help expand and deepen understanding of data or results obtained from quantitative analysis. For example, suppose a quantitative analysis has determined that there is a correlation between length of stay and level of patient satisfaction; qualitative work could then explore why this correlation exists. This dual-focus scenario shows one way in which qualitative and quantitative research can be integrated.

Examples of Qualitative Research Approaches

Ethnography

Ethnography as a research design has its origins in social and cultural anthropology, and involves the researcher being directly immersed in the participant’s environment. Through this immersion, the ethnographer can use a variety of data collection techniques with the aim of being able to produce a comprehensive account of the social phenomena that occurred during the research period. That is to say, the researcher’s aim with ethnography is to immerse themselves into the research population and come out of it with accounts of actions, behaviors, events, etc. through the eyes of someone involved in the population. Direct involvement of the researcher with the target population is one benefit of ethnographic research because it can then be possible to find data that is otherwise very difficult to extract and record.

Grounded Theory

Grounded Theory is the “generation of a theoretical model through the experience of observing a study population and developing a comparative analysis of their speech and behavior.” As opposed to quantitative research which is deductive and tests or verifies an existing theory, grounded theory research is inductive and therefore lends itself to research that is aiming to study social interactions or experiences. In essence, Grounded Theory’s goal is to explain for example how and why an event occurs or how and why people might behave a certain way. Through observing the population, a researcher using the Grounded Theory approach can then develop a theory to explain the phenomena of interest.

Phenomenology

Phenomenology is defined as the “study of the meaning of phenomena or the study of the particular.” At first glance, it might seem that Grounded Theory and Phenomenology are quite similar, but upon careful examination the differences can be seen. At its core, phenomenology looks to investigate experiences from the perspective of the individual. Phenomenology is essentially looking into the 'lived experiences' of the participants and aims to examine how and why participants behaved a certain way, from their perspective. Herein lies one of the main differences between Grounded Theory and Phenomenology. Grounded Theory aims to develop a theory for social phenomena through an examination of various data sources, whereas Phenomenology focuses on describing and explaining an event or phenomenon from the perspective of those who have experienced it.

Narrative Research

One of qualitative research’s strengths lies in its ability to tell a story, often from the perspective of those directly involved in it. Reporting on qualitative research involves including details and descriptions of the setting involved and quotes from participants. This detail is called ‘thick’ or ‘rich’ description and is a strength of qualitative research. Narrative research is rife with the possibilities of ‘thick’ description as this approach weaves together a sequence of events, usually from just one or two individuals, in the hopes of creating a cohesive story, or narrative. While it might seem like a waste of time to focus on such a specific, individual level, understanding one or two people’s narratives for an event or phenomenon can help to inform researchers about the influences that helped shape that narrative. The tension or conflict of differing narratives can be “opportunities for innovation”.

Research Paradigm

Research paradigms are the assumptions, norms, and standards that underpin different approaches to research. Essentially, research paradigms are the 'worldview' that informs research. It is valuable for researchers, both qualitative and quantitative, to understand what paradigm they are working within, because understanding the theoretical basis of research paradigms allows researchers to understand the strengths and weaknesses of the approach being used and adjust accordingly. Different paradigms have different ontologies and epistemologies. Ontology is defined as the “assumptions about the nature of reality,” whereas epistemology is defined as the “assumptions about the nature of knowledge” that inform the work researchers do. It is important to understand the ontological and epistemological foundations of the research paradigm researchers are working within to allow for a full understanding of the approach being used and the assumptions that underpin the approach as a whole. Further, it is crucial that researchers understand their own ontological and epistemological assumptions about the world in general, because those assumptions will necessarily affect how they interact with research. A discussion of research paradigms is not complete without describing positivist, postpositivist, and constructivist philosophies.

Positivist vs Postpositivist

To further understand qualitative research, we need to discuss positivist and postpositivist frameworks. Positivism is the philosophy that the scientific method can and should be applied to the social as well as the natural sciences. Essentially, positivist thinking insists that the social sciences should use natural science methods in their research, which stems from the positivist ontology that there is an objective reality that exists fully independent of our perception of the world as individuals. Quantitative research is rooted in positivist philosophy, which can be seen in the value it places on concepts such as causality, generalizability, and replicability.

Conversely, postpositivists argue that social reality can never be one hundred percent explained, but it can be approximated. Indeed, qualitative researchers have long insisted that there are “fundamental limits to the extent to which the methods and procedures of the natural sciences could be applied to the social world,” and therefore postpositivist philosophy is often associated with qualitative research. An example of positivist versus postpositivist values in research might be that positivist philosophies value hypothesis-testing, whereas postpositivist philosophies value the ability to formulate a substantive theory.

Constructivist

Constructivism is a subcategory of postpositivism. Most researchers invested in postpositivist research are constructivists as well, meaning they hold that there is no objective external reality but rather that reality is constructed. Constructivism is a theoretical lens that emphasizes the dynamic nature of our world. “Constructivism contends that individuals’ views are directly influenced by their experiences, and it is these individual experiences and views that shape their perspective of reality.” Essentially, constructivist thought holds that ‘reality’ is not a fixed certainty; experiences, interactions, and backgrounds give people unique views of the world. Unlike positivist views, constructivism contends that there is not necessarily an ‘objective’ reality we all experience. This is the ‘relativist’ ontological view that reality and the world we live in are dynamic and socially constructed. Therefore, qualitative scientific knowledge can be inductive as well as deductive.

So why is it important to understand the differences in assumptions that different philosophies and approaches to research have? Fundamentally, the assumptions underpinning the research tools a researcher selects provide an overall base for the assumptions the rest of the research will have and can even change the role of the researcher themselves. For example, is the researcher an ‘objective’ observer such as in positivist quantitative work? Or is the researcher an active participant in the research itself, as in postpositivist qualitative work? Understanding the philosophical base of the research undertaken allows researchers to fully understand the implications of their work and their role within the research, as well as reflect on their own positionality and bias as it pertains to the research they are conducting.

Data Sampling

The better the sample represents the intended study population, the more likely the researcher is to capture the varying factors at play. The following are examples of participant sampling and selection:

Purposive sampling: selection based on the researcher’s rationale as to which participants will be most informative.

Criterion sampling: selection based on pre-identified factors.

Convenience sampling: selection based on availability.

Snowball sampling: selection by referral from other participants or from people who know potential participants.

Extreme case sampling: targeted selection of rare cases.

Typical case sampling: selection of regular or average participants.

Data Collection and Analysis

Qualitative research uses several techniques, including interviews, focus groups, and observation. [1] [2] [3] Interviews may be unstructured, with open-ended questions on a topic to which the interviewer adapts based on the responses, or structured, with a predetermined set of questions that every participant is asked. Interviews are usually conducted one-on-one and are appropriate for sensitive topics or topics needing in-depth exploration. Focus groups are often held with 8-12 target participants and are used when group dynamics and collective views on a topic are desired. Researchers can be participant-observers, sharing the experiences of the subjects, or detached non-participant observers.

While quantitative research design prescribes a controlled environment for data collection, qualitative data collection may take place in a central location or in the participants' own environment, depending on the study goals and design. Qualitative research can generate large amounts of data. Data are transcribed and may then be coded manually or with Computer Assisted Qualitative Data Analysis Software (CAQDAS) such as ATLAS.ti or NVivo.

After the coding process, qualitative research results can take various formats: a synthesis and interpretation presented with excerpts from the data, or themes, a theory, or a model developed from the data.

Dissemination

To standardize and facilitate the dissemination of qualitative research outcomes, the healthcare team can use two reporting standards. The Consolidated Criteria for Reporting Qualitative Research (COREQ) is a 32-item checklist for interviews and focus groups. The Standards for Reporting Qualitative Research (SRQR) is a checklist covering a wider range of qualitative research.

Examples of Application

Many times, a research question will start with qualitative research. The qualitative research helps generate a hypothesis that can then be tested with quantitative methods. After the data are collected and analyzed quantitatively, qualitative methods can be used to dive deeper into the data for a better understanding of what the numbers truly mean and their implications. The qualitative methods can then help clarify the quantitative data and refine the hypothesis for future research. Furthermore, qualitative research lets researchers explore subjects that are poorly captured by quantitative methods, including opinions, individuals' actions, and social science questions.

A good qualitative study design starts with a clearly defined goal or objective and a specified target population. A method for obtaining information from the study population must be carefully detailed to ensure that no part of the target population is omitted. A collection method should be selected that obtains the desired information without overly limiting the collected data, because the information sought is often not well compartmentalized. Finally, the design should ensure adequate methods for analyzing the data. An example may help clarify these aspects of qualitative research.

A researcher wants to decrease the number of teenagers who smoke in their community. The researcher could begin by asking current teen smokers why they started smoking through structured or unstructured interviews (qualitative research). The researcher can also get together a group of current teenage smokers and conduct a focus group to help brainstorm factors that may have prevented them from starting to smoke (qualitative research).

In this example, the researcher has used qualitative research methods (interviews and focus groups) to generate a list of ideas about both why teens start to smoke and factors that may have prevented them from starting. Next, the researcher compiles this data. The researcher found that, hypothetically, peer pressure, health issues, cost, being considered “cool,” and rebellious behavior all might increase or decrease the likelihood of teens starting to smoke.

The researcher creates a survey asking teen participants to rank how important each of the above factors is in either starting smoking (for current smokers) or not smoking (for current non-smokers). This survey provides specific numbers (ranked importance of each factor) and is thus a quantitative research tool.

The researcher can use the results of the survey to focus efforts on the one or two highest-ranked factors. Let us say the researcher found that health was the major factor that keeps teens from starting to smoke, and peer pressure was the major factor contributing to teens starting to smoke. The researcher can go back to qualitative research methods to dive deeper into each of these for more information. The researcher wants to focus on how to keep teens from starting to smoke, so they focus on the peer pressure aspect.

The researcher can conduct interviews and/or focus groups (qualitative research) about what types and forms of peer pressure are commonly encountered, where the peer pressure comes from, and where smoking first starts. The researcher hypothetically finds that peer pressure often occurs after school at the local teen hangouts, mostly the local park. The researcher also hypothetically finds that peer pressure comes from older, current smokers who provide the cigarettes.

The researcher could further explore these dynamics through observation at the local teen hangouts (qualitative research), taking notes on who is smoking, who is not, and what observable peer-pressure factors are at play. The researcher finds a local park where many teenagers hang out and sees that a shady, overgrown area of the park is where the smokers tend to gather. The researcher notes that the smoking teenagers buy their cigarettes from a convenience store adjacent to the park, where the clerk does not check identification before selling cigarettes. These observations fall under qualitative research.

If the researcher returns to the park and counts how many individuals smoke in each region of the park, this numerical data would be quantitative research. Based on the researcher's efforts thus far, they conclude that local teen smoking and teenagers who start to smoke may decrease if there are fewer overgrown areas of the park and the local convenience store does not sell cigarettes to underage individuals.

The researcher could work with the parks department to reassess the shady areas and make them less conducive to smoking, or identify how to limit the convenience store's sales of cigarettes to underage individuals. The researcher would then cycle back to qualitative methods, asking the at-risk population about their perceptions of the changes and what factors are still at play, as well as quantitative research tracking teen smoking rates in the community, the incidence of new teen smokers, and other measures.

Copyright © 2024, StatPearls Publishing LLC.

Published on 29.3.2024 in Vol 8 (2024)

Factors Explaining the Use of Web-Based Consultations With Physicians by Young and Middle-Aged Individuals in China: Qualitative Comparative Analysis

Authors of this article:

Original Paper

  • Chunyu Zhang 1 , PhD   ; 
  • Ning Hu 2 , MSc   ; 
  • Rui Li 3 , MA   ; 
  • Aiping Zhu 4 , BMed   ; 
  • Zhongguang Yu 3 , PhD  

1 Department of Human Resources, China-Japan Friendship Hospital, Beijing, China

2 School of Management, Beijing University of Chinese Medicine, Beijing, China

3 Respiratory Centre, China-Japan Friendship Hospital, Beijing, China

4 Hospital Office, China-Japan Friendship Hospital, Beijing, China

Corresponding Author:

Zhongguang Yu, PhD

Respiratory Centre

China-Japan Friendship Hospital

Yinghua Road 2#

Beijing, 100013

Phone: 86 84206468

Email: [email protected]

Background: The COVID-19 pandemic drove demand for web-based consultations with physicians to grow at unprecedented rates. To meet this demand, the service environment developed rapidly during the pandemic.

Objective: This study aimed to identify the current status of the use of web-based consultations with physicians among young and middle-aged Chinese individuals and explore users’ perspectives on key factors that influence its use in terms of optimizing benefits and compensating for disadvantages.

Methods: We conducted semistructured interviews with 65 individuals (aged 18 to 60 years) across China between September and October 2022. The interviewees were selected through snowball sampling. They described their experiences of using web-based physician consultations and the reasons for using or not using the service. Based on the Andersen Behavioral Model, a qualitative comparative analysis was used to analyze the factors associated with the use of web-based physician consultations and explore the combinations of these factors.

Results: In all, 31 (48%) of the 65 interviewees used web-based consultation services. The singular necessary condition analysis revealed that the complementary role of the service and perceived convenience are necessary conditions for the use of web-based consultation services, and user’s confidence in the service was a sufficient condition. Based on the Andersen Behavioral Model, the configuration analysis uncovered 2 interpretation models: an enabling-oriented model and a need-oriented model. The basic combination of the enabling-oriented model included income and perceived convenience. The basic combination of the need-oriented model included complementary role and user’s confidence.

Conclusions: Among the factors associated with the use of web-based consultations, perceived convenience, complementary role, and user’s confidence were essential factors. Clear instructions on the conduct of the service, cost regulations, provider qualifications guarantee, privacy and safety supervision, the consultations’ application in chronic disease management settings, and subsequent visits can promote the positive development of web-based consultations.

Introduction

The term “internet health care service” refers to a closed-loop service that includes health education, medical information inquiry, electronic health files, disease risk assessment, web-based consultation with physicians, electronic prescription, remote consultation, and remote treatment and rehabilitation via the internet and other technological means [ 1 ].

The COVID-19 pandemic created a demand for internet health care services at an unprecedented rate [ 2 - 4 ], as patients became reluctant to go to hospitals because of the fear of infection [ 5 , 6 ]. Accordingly, an increasing number of hospitals and internet companies started to, and continue to, venture into the internet health care industry. Reports show that 52% of outpatient departments in Germany have already adopted internet health care services [ 7 ]. By June 2022, more than 1700 hospitals in China were providing services using the internet, an increase from 100 in December 2018 [ 8 ].

These rapid changes and the quick adoption of internet health care services during the pandemic, however, have left little room for sufficient analysis of users' experiences of accessing these services and of how providers can complement the functions of these services to make them more accessible and attractive to users, as well as promote patients' intention to use them.

In this study, we only focus on web-based consultations with physicians, which is a core and controversial segment of internet health care services. Some researchers have studied the barriers to and facilitators of web-based consultations and found that perceived convenience, emotional preference, perceived risks, etc, influence behavioral intention [ 9 , 10 ].

However, factors associated with the use of web-based consultations are mixed. Understanding which factors are essential is conducive to optimizing benefits and compensating for disadvantages. Given that young (18-35 years) and middle-aged (35-60 years) individuals are the groups that use web-based consultations the most frequently, we conducted interviews among them to explore the reasons why web-based consultations are used or not used. Then, based on the Andersen Behavioral Model, we applied a qualitative comparative analysis (QCA) approach to analyze evidence from the interviews, to identify how combinations of these interdependent factors lead to the use of web-based consultations with physicians.

Theoretical Background

The Andersen Behavioral Model, developed by Andersen [ 11 ] in 1968, has been widely used to analyze the factors associated with health service use along 3 dimensions: predisposing, enabling, and need factors [ 12 - 14 ]. Based on this model, this study discusses the factors affecting web-based consultations with physicians in China.

QCA Methodology

The use of web-based consultation is shaped by a complex combination of influences rather than a single effect. QCA has been applied to explore different combinations of health care interventions because it bridges qualitative and quantitative methodologies [ 15 ]. Based on set theory, QCA compares characteristics of cases in relation to outcomes through a scoring system. Moreover, QCA has an advantage in analyzing small samples, usually requiring 10 to 80 cases [ 16 , 17 ]. Crisp-set QCA (csQCA) assigns binary scores of 0 and 1, indicating “full out” or “full in” membership in a given condition [ 18 ].
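As a rough illustration of this scoring system: crisp-set calibration reduces each case to binary memberships on every condition before any set-theoretic analysis. Below is a minimal sketch with an invented rating scale and threshold, not the study's actual coding scheme:

```python
# Crisp-set QCA reduces each case to 0 ("full out") or 1 ("full in")
# on every condition and on the outcome. The rating scale and threshold
# here are illustrative placeholders, not the study's actual coding.
def calibrate(raw_value, threshold):
    """Dichotomize a raw measurement into crisp-set membership (0 or 1)."""
    return 1 if raw_value >= threshold else 0

# Example: dichotomizing hypothetical convenience ratings on a 1-5 scale
ratings = [4, 2, 5, 3]
memberships = [calibrate(r, threshold=4) for r in ratings]
print(memberships)  # -> [1, 0, 1, 0]
```

The analysis then operates on these 0/1 vectors rather than on the raw interview data.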

Sample Selection and Data Collection

The semistructured interviews were centered around three broad questions: (1) Do you have experience using internet health care services? (2) If yes, which functions do you use and why? Which functions do you never use, and why are you reluctant to use them? and (3) If no, why have you never used internet health care services? When describing their experiences, the participants were asked to share examples, not only feelings about internet health care services.

We conducted interviews with residents of provinces in Eastern, Western, and Central China between September and October 2022.

The initial 5 participants were selected by convenience sampling; they were patients visiting the China-Japan Friendship Hospital. Subsequently, we asked each of them to randomly recommend 1 or 2 further interviewees, such as friends, colleagues, or relatives. We repeated this process until the information on why internet health care services were or were not used was saturated. To obtain representative samples, we analyzed the characteristics of earlier samples and provided detailed requirements regarding age, location, income, education, and sex for subsequent samples.

In total, 70 participants were interviewed, and 5 interviews were excluded owing to a lack of information regarding web-based consultations with physicians. The sample size for QCA should be at least 2^k, where k is the number of conditions [ 17 , 19 ]. This study includes 6 conditions; hence, the sample size should be at least 64. Ultimately, the study included 65 interviews.
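The 2^k lower bound simply counts the rows of the truth table: with k binary conditions there are 2^k possible configurations, so the sample should be at least that large. A one-line sketch (the function name is ours, not part of any QCA software):

```python
# With k binary conditions, the truth table has 2**k rows (configurations),
# which motivates the minimum-sample-size rule of thumb cited in the text.
def min_qca_sample_size(k: int) -> int:
    """Minimum cases so that every truth-table row could, in principle, be observed."""
    return 2 ** k

print(min_qca_sample_size(6))  # 6 conditions -> at least 64 cases; this study used 65
```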

Variable Measurement and Calibration

We analyzed the transcripts using a team-based inductive approach. First, the audio data were transcribed verbatim by a third-party company specialized in transcriptions in the Chinese language; once transcribed, the audio recordings were discarded to protect the participants’ confidentiality. Second, the first round of open coding was conducted using NVivo 12 (QSR International), and we coded the transcripts independently. We discussed and resolved discrepancies and then recoded the data to compile the major themes. Finally, based on the Andersen Behavioral Model, both the conditions and the results were identified by the lead author, reviewed by coauthors, and finalized by the corresponding author (see Table 1 ). In this study, csQCA was conducted using the fsQCA 3.0 software for crisp and fuzzy sets (Charles C Ragin and Sean Davey).

Ethical Considerations

This study was approved by the China-Japan Friendship Hospital (approval 202-ky-032). We asked the participants whether they would be willing to be interviewed over the phone. Once the participants confirmed that they were interested in participating in the study, we made an appointment with them before the interview. At the beginning of the interview, we reviewed a consent form with the participants and obtained their verbal consent to proceed. Interviews were conducted primarily through phone calls because we aimed to reach more residents from different regions across China. All interviews were recorded and transcribed verbatim for data analysis. No compensation was provided for participation.

Participants’ Characteristics

Participants were recruited from Beijing, Shanghai, Guangdong, and Zhejiang in Eastern China (31/65, 48%); Jilin, Henan, and Jiangxi in Central China (14/65, 22%); and Sichuan, Yunnan, and Qinghai in Western China (20/65, 31%). In total, 38% (25/65) of the participants were male and 62% (40/65) were female, and the participants’ average age was 35.4 (range 18-51) years ( Table 2 ).

Participants’ Experiences of Web-Based Consultations

In total, 31 (48%) out of 65 participants had experience consulting with physicians over the web. During the COVID-19 pandemic, web-based consultations allowed people to avoid going out and minimized the risk of infection. Although web-based consultations were not always feasible with regard to curing diseases, the participants used them as a prediagnosis tool, which helped them make appropriate decisions regarding what to do next about their potential condition. Participant 1 shared his web-based consultation experience with us:

My wife was suffering from gallstones. We paid for an appointment with a famous physician to receive advice on the need for surgery. After uploading the results of an exam and consulting with the physician through the internet, we accepted his suggestion and she underwent an operation.

Factors Explaining the Use of Web-Based Consultations With Physicians

Necessity Analysis of Individual Conditions

The first step of QCA is to examine whether any single condition (or its negation) is necessary for the outcome. When the consistency level is greater than 0.8, the condition is considered sufficient for the use of web-based consultations with physicians; when it is greater than 0.9, the condition is regarded as necessary for the use [ 17 , 20 , 21 ].

Table 3 shows the test results of the necessary conditions for the use of web-based consultations with physicians using the fsQCA 3.0 package. The consistency of “complementary role” and “perceived convenience” exceeded 0.9. Thus, the complementary role of web-based consultations and their perceived convenience are necessary conditions for use (consistency of 0.968 and 0.935, respectively), followed by user’s confidence (consistency of 0.806), which is a sufficient condition for use.

a “~” means that a factor does not appear or is “not.”

b Italics denote that the consistency exceeded 0.8.
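In crisp-set terms, the necessity consistency reported in Table 3 is the share of outcome cases in which the condition is also present. A minimal sketch with toy data (not the study's cases), assuming parallel 0/1 membership lists:

```python
def necessity_consistency(condition, outcome):
    """Share of cases exhibiting the outcome in which the condition is also present.
    condition, outcome: parallel lists of 0/1 crisp-set memberships."""
    outcome_cases = sum(outcome)
    overlap = sum(1 for c, y in zip(condition, outcome) if c == 1 and y == 1)
    return overlap / outcome_cases

# Toy data: the condition is present in 9 of 10 outcome cases -> consistency 0.9,
# which would meet the paper's threshold for a necessary condition.
condition = [1] * 9 + [0] * 6
outcome = [1] * 10 + [0] * 5
print(round(necessity_consistency(condition, outcome), 2))  # -> 0.9
```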

Adequacy Analysis of Conditional Configuration

Using the truth table, configuration analysis was applied to assess the sufficiency of the different configurations of multiple conditions that lead to use. We set the consistency threshold to 0.8 and the case frequency threshold to 1 and calculated the complex, parsimonious, and intermediate solutions.

As indicated in Table 4 , there are 4 paths that promote the “use of web-based consultations with physicians.” Among the 4 combined paths, the unique coverage of S2 and S4 was 0.177 and 0.274, respectively, and the unique coverage of S1 and S3 was 0.032. In total, these 4 paths showed strong explanatory power, with good consistency (0.953) and relatively high coverage (0.661).

a General conditions.

b General conditions do not appear.

c Core condition.

d Corresponding conditions with path do not matter.

e Core conditions do not appear.
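The coverage figures reported for the paths can be illustrated in crisp-set terms: raw coverage is the share of outcome cases a path covers, and unique coverage is the share covered by that path and by no other path. A sketch with invented toy data, not the study's case set:

```python
def raw_coverage(path, outcome):
    """Share of outcome cases that a solution path covers (0/1 crisp-set lists)."""
    return sum(p & y for p, y in zip(path, outcome)) / sum(outcome)

def unique_coverage(path, other_paths, outcome):
    """Share of outcome cases covered by this path but by no other path."""
    only_here = [
        p & y & (0 if any(o[i] for o in other_paths) else 1)
        for i, (p, y) in enumerate(zip(path, outcome))
    ]
    return sum(only_here) / sum(outcome)

# Toy data: path_a covers 2 of 4 outcome cases; it overlaps with path_b on one case.
outcome = [1, 1, 1, 1, 0]
path_a = [1, 1, 0, 0, 0]
path_b = [0, 1, 1, 1, 0]
print(raw_coverage(path_a, outcome))               # -> 0.5
print(unique_coverage(path_a, [path_b], outcome))  # -> 0.25
```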

Based on the Andersen Behavioral Model [ 11 ], we merged the 4 paths into 2 to build a more explanatory model. The first interpretation model is an enabling-oriented model (M1), which includes paths S1 and S2. The basic expression is M1 = age × income × perceived convenience × complementary role (× education + × user’s confidence). The basic combination is the enabling dimension, including income and perceived convenience. That is, when web-based consultation brings perceived convenience, the relatively high-income group will opt for it.

The participants regarded time saving and avoiding infection during the COVID-19 pandemic as the main conveniences brought by web-based consultations, whereas they regarded complex conduct procedures and late responses as inconveniences.

Participant 2 used web-based consultations because of its time saving characteristic. He said the following:

I have always made appointments with physicians through Haodaifu [an internet health care platform]. I am satisfied with their services because this website informs me about an upcoming appointment beforehand. Meanwhile, the physicians come [for the consultation] on time. The waiting time is not much.

Participant 3 used web-based consultations to avoid COVID-19 infection. She said the following:

I get nervous when my little kid feels any discomfort. On the one hand, I am afraid to go to the hospital because of the risk of infection owing to the COVID-19 pandemic. On the other hand, I also get worried about the adverse consequences of delaying [the child’s treatment]. As a result, I usually opt for a web-based consultation immediately, and use it to determine the necessity of an in-person visit.

Participant 4 complained about the complex procedures that lack instructions. He said the following:

The registration process is complex. A lot of personal information must be entered before beginning the web-based consultation. Due to a lack of clear instructions, it is difficult to figure out how to begin the service. I attempted to register with the system, but was unable to use the service.

The second interpretation model is a need-oriented model (M2), which includes paths S3 and S4. The basic expression is M2 = complementary role × user’s confidence × ~income (× age × ~education × ~perceived convenience + × ~age × education × perceived convenience). The basic combination is the need dimension, including complementary role and user’s confidence. That is, regardless of age and education, when web-based consultations are needed, the relatively lower-income group will opt for them even if they are inconvenient.
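In these solution expressions, × denotes Boolean AND, + denotes OR, and ~ denotes NOT; a case satisfies a model if it matches at least one of the model's paths. A minimal sketch evaluating M2 membership for a single case (the attribute names are illustrative stand-ins for the study's calibrated conditions):

```python
def in_M2(case):
    """Crisp-set membership of a case in the need-oriented model M2:
    complementary_role AND users_confidence AND NOT income AND
    (age AND NOT education AND NOT convenience  [path S3]
     OR NOT age AND education AND convenience   [path S4])."""
    core = case["complementary_role"] and case["users_confidence"] and not case["income"]
    path_s3 = case["age"] and not case["education"] and not case["perceived_convenience"]
    path_s4 = not case["age"] and case["education"] and case["perceived_convenience"]
    return int(bool(core and (path_s3 or path_s4)))

case = {"complementary_role": 1, "users_confidence": 1, "income": 0,
        "age": 0, "education": 1, "perceived_convenience": 1}
print(in_M2(case))  # -> 1 (matches path S4)
```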

In terms of minor problems or primary suggestions, web-based consultations were regarded as complementary to conventional consultations. Participants 3 and 5 said the following, respectively:

I also get worried about the adverse consequences of delaying [the child’s treatment]. As a result, I usually opt for a web-based consultation immediately, and use it to determine the necessity of an in-person visit.
Some specialties, such as dentistry and ophthalmology, require careful examination through the use of instruments before making the diagnosis. Regarding urgent cases, it would still be better for patients to visit the hospital.

Compared with in-person consultations, web-based communication is less smooth because physicians are unable to observe the patient’s body language and emotions. Some participants mentioned that web-based consultation services cannot include laboratory tests and physical exams when it comes to diagnosis. Moreover, the costs of web-based consultations are not yet covered by the social health care insurance system, which means that patients have to bear the cost of internet health care services themselves. For these reasons, some participants did not regard the service as a substitute for in-person consultations. Regarding this, Participants 6 and 7 stated the following, respectively:

If it is not face-to-face consultation, I am afraid I could not describe [the symptom] clearly and the doctors would misunderstand me.
For senior physicians of P Hospital, the cost of a web-based consultation is three times that of a conventional consultation. Meanwhile, the expenditure on web-based consultations cannot be reimbursed by social healthcare insurance.

Some participants do not have confidence in web-based consultation services owing to privacy, safety, and qualification concerns, as well as problems surrounding web-based diagnosis. Below is an interview excerpt of Participant 8, who has experience in using text web-based consultations but not video consultations:

Although I never used video consultations, I am afraid that the system records the whole process automatically. I am worried that the video will be misused without my permission. Meanwhile, it is difficult to confirm the qualification of the doctors providing the service. After visiting the professor in C Hospital for a lung infection, I uploaded the results of a chest CT for further suggestions. I doubted the suggestion made by the professor’s students primarily. Given the busy schedule of the professor, his students had made the initial suggestions, which were later checked by the professor himself. So, I only trust the platforms run by public hospitals.

Robustness Test

To test the robustness, we increased the consistency level from 0.8 to 0.85 and we also decreased it to 0.72. The result showed that the configuration paths after the adjustment were consistent with those before the adjustment, and the coverage and consistency did not change substantially. Therefore, the results were robust.

Principal Findings

We examined the current status of the use of web-based consultations with physicians and the factors associated with the service among young and middle-aged Chinese individuals. About half (31/65, 48%) of the 18- to 60-year-old participants had experience with web-based consultations. Among the associated factors, perceived convenience, complementary role, and user’s confidence were found to be the most essential.

Optimizing Web-Based Consultations

We found that perceived convenience is a necessary condition enabling participants to use web-based consultations. In this study, time saving and avoiding COVID-19 infection, conveniences provided by web-based consultations, promoted the participants’ use of the service. Participants in our study, similar to patients in other countries, strongly wish to spend less time accessing services, both when making appointments and while waiting at the location; they prefer web-based appointment scheduling, want SMS text messaging reminders, and prefer physicians to be available during evenings and weekends [ 22 ]. The web-based consultation system provided patients with time-saving and convenient solutions for their health care needs across all treatment processes. Patients could make appointments according to their own schedules and did not have to spend time traveling to appointments. These findings concur with research done elsewhere; for example, Almathami et al [ 23 ] conducted a survey in Saudi Arabia and found that saving time would increase the motivation toward the use of web-based consultations.

The COVID-19 pandemic positively influenced web-based consultation use, in line with findings of past studies. Studies show that internet health care services enable patients to avoid going out, decrease the time spent at hospitals when visits are necessary, and minimize infection risks [ 3 , 22 , 24 ]. Thus, it is not surprising that the pandemic catalyzed the development and use of the service. Although the service cannot fully substitute for traditional in-person appointments, many patients were willing to keep using web-based consultations in the post–COVID-19 era.

In our research, participants remarked that unclear instructions hampered their use of the service. Prior studies similarly found that patients who lack basic internet-related knowledge are excluded from internet health care services [ 25 , 26 ]. The web-based consultation environment requires patients to be well versed in using web-based platforms and electronic gadgets, and skill levels vary by patient. Those with low literacy or limited internet-related knowledge are reluctant to use the service, a situation highlighted in prior studies [ 27 , 28 ]. In the future, web-based consultation providers could assist people who find the platforms less accessible by creating intuitive instructions or even providing staff to walk them through the service step-by-step. Providers could also train patients on the available technologies before they make an appointment; for example, a video on how to book a web-based consultation could be placed on the front page.

Focusing on the Needs of Residents

We found that participants opted to use web-based consultations with physicians once the service met their needs.

In the study, although web-based consultations cannot provide laboratory tests or physical examinations, they serve as a supplement for minor issues and preliminary advice.

Meanwhile, some studies reported that web-based consultations improved outcomes in the management of chronic conditions such as diabetes and hyperactivity disorder [ 29 , 30 ]. Considering patients’ preferences and needs, applying the service to chronic disease management and follow-up visits may expand its complementary role and benefit patients to a greater extent.

We also found that some participants did not regard web-based consultations as complementary because of their cost. This is in line with previous studies identifying cost as a barrier to use of the service, even in high-income countries [ 26 ]. Moreover, as in Germany and the United States, clear regulations governing web-based consultations are lacking in China; accordingly, not only do the costs of web-based consultations vary widely, but expenditures on the service are not yet covered by the social health insurance system [ 5 , 31 ]. This economic burden may impede patients’ use of web-based consultations.

User confidence was a sufficient condition for the use of web-based consultations in this study: participants used the service when they felt it was safe. Several participants expressed concerns about the safety and privacy of web-based platforms, as well as the qualifications of physicians. This corroborates prior studies in which participants raised the same safety and privacy concerns and believed that users should be protected by clear regulations for such services [ 32 , 33 ]. Such regulations should ensure that patient data cannot be misused for purposes other than health care or shared without patients’ informed consent. Best practices and standards should also be established to ensure that providers have the qualifications and service quality needed to offer web-based consultations.
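The necessity and sufficiency terminology above comes from the csQCA method used in this study. As a rough illustration only (toy cases, not the study’s data, and the study itself was conducted in R-style QCA software), crisp-set consistency scores can be computed as the share of relevant cases in which a condition and the outcome co-occur:

```python
# Illustrative crisp-set QCA consistency scores (invented toy cases).
# Each case records presence (1) or absence (0) of a condition
# ("confidence") and of the outcome ("use" of the service).
cases = [
    {"confidence": 1, "use": 1},
    {"confidence": 1, "use": 1},
    {"confidence": 1, "use": 0},
    {"confidence": 0, "use": 0},
    {"confidence": 0, "use": 1},
    {"confidence": 1, "use": 1},
]

def sufficiency_consistency(cases, condition, outcome="use"):
    """Share of cases showing the condition that also show the outcome."""
    with_condition = [c for c in cases if c[condition] == 1]
    if not with_condition:
        return 0.0
    return sum(c[outcome] for c in with_condition) / len(with_condition)

def necessity_consistency(cases, condition, outcome="use"):
    """Share of cases showing the outcome that also show the condition."""
    with_outcome = [c for c in cases if c[outcome] == 1]
    if not with_outcome:
        return 0.0
    return sum(c[condition] for c in with_outcome) / len(with_outcome)

print(sufficiency_consistency(cases, "confidence"))  # 0.75
print(necessity_consistency(cases, "confidence"))    # 0.75
```

In csQCA, a condition with sufficiency consistency near 1 is read as (nearly) sufficient for the outcome, and one with necessity consistency near 1 as (nearly) necessary; the case data and thresholds here are purely hypothetical.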

Limitations

Because of the COVID-19 pandemic, our semistructured interviews were mostly conducted by telephone, which prevented us from observing participants’ body language and nonverbal cues. Nonetheless, we contacted participants beforehand to explain the topic and purpose of the interview and shared the questions 1-7 days in advance, which participants deemed adequate and which enabled them to provide more comprehensive information.

Conclusions

In conclusion, the Andersen Behavioral Model provided a useful lens for exploring the factors associated with web-based consultation use from the user’s perspective, and the csQCA offered guidance for optimizing the benefits of the service. Perceived convenience, complementary role, and user confidence were the essential factors associated with use of the service. Clear instructions, comprehensive regulations, and appropriate applications can promote the positive development of web-based consultations.

Acknowledgments

We would like to thank the interviewees who participated in the study. This work was supported by National Natural Science Fund for Young Scholars of China (72104255), Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Sciences (CIFMS; 2021-I2M-1-046), and National Health Commission Human Resources Development Center for Public Hospital Human Resource Research Project (RCLX2215018).

Data Availability

The data that support the findings presented in this study are available from the corresponding author on reasonable request.

Authors' Contributions

CZ and ZY played a significant role in study design, recruitment, data coding, and paper writing. NH was responsible for conducting the statistical analysis and drafting the Methods and Results sections. RL performed all interviews and data coding. AZ contributed to data coding. All authors thoroughly reviewed the paper before submission and granted their approval for publication.

Conflicts of Interest

None declared.

  • Zhang X, Qing Q, P DZ, T LK. Understanding and analysis of "Internet+Medicine" [Article in Chinese]. China Health Industry. 2017;10:67-69. [ CrossRef ]
  • Del Prete E, Francesconi A, Palermo G, Mazzucchi S, Frosini D, Morganti R, et al. Prevalence and impact of COVID-19 in Parkinson's disease: evidence from a multi-center survey in Tuscany region. J Neurol. Apr 2021;268(4):1179-1187. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Hassan A, Mari Z, Gatto EM, Cardozo A, Youn J, Okubadejo N, et al. Global survey on telemedicine utilization for movement disorders during the COVID-19 pandemic. Mov Disord. Oct 2020;35(10):1701-1711. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Seiler N, Chaudhry HJ, Lovitch K, Heyison C, Karacuschansky A, Organick-Lee P, et al. Telehealth services and the law: the rapidly evolving regulatory landscape and considerations for sexually transmitted infection and HIV services. Sex Transm Dis. Nov 01, 2022;49(11S Suppl 2):S18-S21. [ CrossRef ] [ Medline ]
  • Byambasuren O, Greenwood H, Bakhit M, Atkins T, Clark J, Scott AM, et al. Comparison of telephone and video telehealth consultations: systematic review. J Med Internet Res. Nov 17, 2023;25:e49942. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wang W, Sun L, Liu T, Lai T. The use of e-health during the COVID-19 pandemic: a case study in China's Hubei province. Health Sociol Rev. Nov 2022;31(3):215-231. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Zhou C, Hao Y, Lan Y, Li W. To introduce or not? strategic analysis of hospital operations with telemedicine. Eur J Oper Res. Jan 01, 2023;304(1):292-307. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Zhang J. China's internet medical industry summary 2022. Analysys. Jul 11, 2022. URL: https://boyue.analysys.cn/sail/view/portal/index.html#/detail/20020606 [accessed 2024-01-08]
  • Almathami HKY, Win KT, Vlahu-Gjorgievska E. Barriers and facilitators that influence telemedicine-based, real-time, online consultation at patients' homes: systematic literature review. J Med Internet Res. Feb 20, 2020;22(2):e16407. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Li D, Hu Y, Pfaff H, Wang L, Deng L, Lu C, et al. Determinants of patients' intention to use the online inquiry services provided by internet hospitals: empirical evidence from China. J Med Internet Res. Oct 29, 2020;22(10):e22716. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Andersen R. A Behavioral Model of Families' Use of Health Services. Chicago, IL. University of Chicago; 1968.
  • Lemming MR, Calsyn RJ. Utility of the behavioral model in predicting service utilization by individuals suffering from severe mental illness and homelessness. Community Ment Health J. Aug 2004;40(4):347-364. [ CrossRef ] [ Medline ]
  • Andersen RM. National health surveys and the behavioral model of health services use. Med Care. Jul 2008;46(7):647-653. [ CrossRef ] [ Medline ]
  • Teng L, Li Y. Analysis on the willingness and influencing factors of choosing primary healthcare institutions among patients with chronic conditions in China: a cross-sectional study. BMJ Open. Mar 30, 2022;12(3):e054783. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Zahroh RI, Sutcliffe K, Kneale D, Vazquez Corona M, Betrán AP, Opiyo N, et al. Educational interventions targeting pregnant women to optimise the use of caesarean section: what are the essential elements? a qualitative comparative analysis. BMC Public Health. Sep 23, 2023;23(1):1851. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ragin CC. Qualitative comparative analysis using fuzzy sets (fsQCA). In: Rihous B, Ragin CC, editors. Configurational Comparative Methods: Qualitative Comparative Analysis (QCA) and Related Techniques. Thousand Oaks, CA. SAGE Publications; 2009;87-122.
  • Ragin CC. Redesigning Social Inquiry: Fuzzy Sets and Beyond. Chicago, IL. University of Chicago Press; 2008.
  • Harris K, Kneale D, Lasserson TJ, McDonald VM, Grigg J, Thomas J. School-based self-management interventions for asthma in children and adolescents: a mixed methods systematic review. Cochrane Database Syst Rev. Jan 28, 2019;1(1):CD011651. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Farrugia B. WASP (write a scientific paper): an introduction to set-theoretic methods and qualitative comparative analysis. Early Hum Dev. Jun 2019;133:43-47. [ CrossRef ] [ Medline ]
  • Thiem A. Standards of good practice and the methodology of necessary conditions in qualitative comparative analysis. Polit Anal. Jan 4, 2017;24(4):478-484. [ CrossRef ]
  • Lu L, Shi S, Liu B, Liu C. Analysis of factors influencing the organizational capacity of institutional review boards in China: a crisp-set qualitative comparative analysis based on 107 cases. BMC Med Ethics. Sep 26, 2023;24(1):74. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Zhang C, Zhu K, Lin Z, Huang P, Pan Y, Sun B, et al. Utility of deep brain stimulation telemedicine for patients with movement disorders during the COVID outbreak in China. Neuromodulation. Feb 2021;24(2):337-342. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Almathami HKY, Win KT, Vlahu-Gjorgievska E. An empirical study on factors influencing consumers' motivation towards teleconsultation system use. a preliminary report about the Sehha application from Saudi Arabia. Int J Med Inform. Jul 2022;163:104775. [ CrossRef ] [ Medline ]
  • Wosik J, Fudim M, Cameron B, Gellad ZF, Cho A, Phinney D, et al. Telehealth transformation: COVID-19 and the rise of virtual care. J Am Med Inform Assoc. Jun 01, 2020;27(6):957-962. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Mascaro JS, Catic A, Srivastava M, Diller M, Rana S, Escoffery C, et al. Examination of provider and patient knowledge, beliefs, and preferences in integrative oncology at a National Cancer Institute-designated comprehensive cancer center. Integr Med Rep. 2022;1(1):66-75. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Neves AL, Burgers J. Digital technologies in primary care: implications for patient care and future research. Eur J Gen Pract. Dec 2022;28(1):203-208. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Neavel C, Watkins SC, Chavez M. Youth, social media, and telehealth: how COVID-19 changed our interactions. Pediatr Ann. Apr 2022;51(4):e161-e166. [ CrossRef ] [ Medline ]
  • Chen K, Zhang C, Gurley A, Akkem S, Jackson H. Appointment non-attendance for telehealth versus in-person primary care visits at a large public healthcare system. J Gen Intern Med. Mar 2023;38(4):922-928. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kubes JN, Jones L, Hassan S, Franks N, Wiley Z, Kulshreshtha A. Differences in diabetes control in telemedicine vs. in-person only visits in ambulatory care setting. Prev Med Rep. Dec 2022;30:102009. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Pritchard AE, Northrup RA, Peterson R, Lieb R, Wexler D, Ng R, et al. Can we expand the pool of youth who receive telehealth assessments for ADHD? covariates of service utilization. J Atten Disord. Jan 2023;27(2):159-168. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Dorsey ER, Okun MS, Bloem BR. Care, convenience, comfort, confidentiality, and contagion: the 5 C's that will shape the future of telemedicine. J Parkinsons Dis. 2020;10(3):893-897. [ CrossRef ] [ Medline ]
  • Oelmeier K, Schmitz R, Möllers M, Braun J, Deharde D, Sourouni M, et al. Satisfaction with and feasibility of prenatal counseling via telemedicine: a prospective cohort study. Telemed J E Health. Aug 2022;28(8):1193-1198. [ CrossRef ] [ Medline ]
  • Chen K, Davoodi NM, Strauss DH, Li M, Jiménez FN, Guthrie KM, et al. Strategies to ensure continuity of care using telemedicine with older adults during COVID-19: a qualitative study of physicians in primary care and geriatrics. J Appl Gerontol. Nov 2022;41(11):2282-2295. [ FREE Full text ] [ CrossRef ] [ Medline ]

Edited by A Mavragani; submitted 17.06.23; peer-reviewed by S Wu, C Juhra, P Codyre, P Huang, W LaMendola; comments to author 01.12.23; revised version received 19.01.24; accepted 07.03.24; published 29.03.24.

©Chunyu Zhang, Ning Hu, Rui Li, Aiping Zhu, Zhongguang Yu. Originally published in JMIR Formative Research (https://formative.jmir.org), 29.03.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.

Published on 28.3.2024 in Vol 26 (2024)

Augmenting K-Means Clustering With Qualitative Data to Discover the Engagement Patterns of Older Adults With Multimorbidity When Using Digital Health Technologies: Proof-of-Concept Trial

Authors of this article:

Original Paper

  • Yiyang Sheng 1 , MSc; 
  • Raymond Bond 2 , PhD; 
  • Rajesh Jaiswal 3 , PhD; 
  • John Dinsmore 4 , PhD; 
  • Julie Doyle 1 , PhD

1 NetwellCASALA, Dundalk Institute of Technology, Dundalk, Ireland

2 School of Computing, Ulster University, Jordanstown, United Kingdom

3 School of Enterprise Computing and Digital Transformation, Technological University Dublin, Dublin, Ireland

4 Trinity Centre for Practice and Healthcare Innovation, School of Nursing and Midwifery, Trinity College Dublin, Dublin, Ireland

Corresponding Author:

Yiyang Sheng, MSc

NetwellCASALA

Dundalk Institute of Technology

Dublin Road, PJ Carrolls Building, Dundalk Institute of Technology

Dundalk, Co. Louth, A91 K584

Ireland

Phone: 353 894308214

Email: [email protected]

Background: Multiple chronic conditions (multimorbidity) are becoming more prevalent among aging populations. Digital health technologies have the potential to assist in the self-management of multimorbidity, improving the awareness and monitoring of health and well-being, supporting a better understanding of the disease, and encouraging behavior change.

Objective: The aim of this study was to analyze how 60 older adults (mean age 74, SD 6.4; range 65-92 years) with multimorbidity engaged with digital symptom and well-being monitoring when using a digital health platform over a period of approximately 12 months.

Methods: Principal component analysis and clustering analysis were used to group participants based on their levels of engagement, and the data analysis focused on characteristics (eg, age, sex, and chronic health conditions), engagement outcomes, and symptom outcomes of the different clusters that were discovered.

Results: Three clusters were identified: the typical user group, the least engaged user group, and the highly engaged user group. Our findings show that age, sex, and the types of chronic health conditions do not influence engagement. The 3 primary factors influencing engagement were whether the same device was used to submit different health and well-being parameters, the number of manual operations required to take a reading, and the daily routine of the participants. The findings also indicate that higher levels of engagement may improve the participants’ outcomes (eg, reduce symptom exacerbation and increase physical activity).

Conclusions: The findings indicate potential factors that influence older adult engagement with digital health technologies for home-based multimorbidity self-management. The least engaged user groups showed decreased health and well-being outcomes related to multimorbidity self-management. Addressing the factors highlighted in this study in the design and implementation of home-based digital health technologies may improve symptom management and physical activity outcomes for older adults self-managing multimorbidity.

Introduction

According to the United Nations, the number of people aged ≥65 years is growing faster than all other age groups [ 1 ]. The worldwide population of people aged ≥65 years will increase from approximately 550 million in 2000 to 973 million in 2030 [ 2 ]. Furthermore, by 2050, approximately 16% of the world’s population will be aged >65 years, and 426 million people will be aged >80 years [ 1 ]. Living longer is a great benefit to today’s society. However, this comes with several challenges. Aging can be associated with many health problems, including multimorbidity (ie, the presence of ≥2 chronic conditions) [ 3 ]. The prevalence rate of multimorbidity among older adults is estimated to be between 55% and 98%, and the factors associated with multimorbidity are older age, female sex, and low socioeconomic status [ 4 ]. In the United States, almost 75% of older adults have multimorbidity [ 5 ], and it was estimated that 50 million people in the European Union were living with multimorbidity in 2015 [ 6 ]. Likewise, the prevalence rate of multimorbidity is 69.3% among older adults in China [ 5 ].

Home-based self-management for chronic health conditions involves actions and behaviors that protect and promote good health care practices comprising the management of physical, emotional, and social care [ 7 ]. Engaging in self-management can help older adults understand and manage their health conditions, prevent illness, and promote wellness [ 7 , 8 ]. However, self-management for older adults with multimorbidity is a long-term, complex, and challenging mission [ 9 , 10 ]. There are numerous self-care tasks to engage in, which can be very complicated, especially for people with multiple chronic health conditions. Furthermore, the severity of the disease can negatively impact a person’s ability to engage in self-management [ 10 ].

Digital home-based health technologies have the potential to support better engagement with self-management interventions, such as the monitoring of symptom and well-being parameters as well as medication adherence [ 10 , 11 ]. Such technologies can help older adults understand their disease or diseases, respond to changes, and communicate with health care providers [ 12 - 14 ]. Furthermore, digital health technologies can be tailored to individual motivations and personal needs [ 13 ], which can improve sustained use [ 15 ] and result in people feeling supported [ 16 ]. Digital self-management can also create better opportunities for adoption and adherence in the long term compared with paper booklet self-management [ 16 ]. Moreover, digital health technologies, such as small wearable monitoring devices, can increase the frequency of symptom monitoring for patients with minimal stress compared with symptom monitoring with manual notifications [ 17 ].

A large body of research implements data mining and machine learning algorithms using data acquired from home-based health care data sets. Data mining techniques, such as data visualization, clustering, classification, and prediction, to name a few, can help researchers understand users, behaviors, and health care phenomena by identifying novel, interesting patterns. These techniques can also be used to build predictive models [ 18 - 21 ]. In addition, data mining techniques can help in designing health care management systems and tracking the state of a person’s chronic disease, resulting in appropriate interventions and a reduction in hospital admissions [ 18 , 22 ]. Vast amounts of data can be generated when users interact with digital health technologies, which provides an opportunity to understand chronic illnesses as well as elucidate how users engage with digital health technologies in the real world. Armstrong et al [ 23 ] used the k-means algorithm to identify previously unknown patterns of clinical characteristics in home care rehabilitation services. The authors used k-means cluster analysis to analyze data from 150,253 clients and discovered new insights into the clients’ characteristics and their needs, which led to more appropriate rehabilitation services for home care clients. Madigan and Curet [ 22 ] used classification and regression trees to investigate a home-based health care data set that comprised 580 patients who had 3 specific conditions: chronic obstructive pulmonary disease (COPD), heart failure (HF), and hip replacement. They found that data mining methods identified the dependencies and interactions that influence the results, thereby improving the accuracy of risk adjustment methods and establishing practical benchmarks [ 22 ]. 
Other research [ 24 ] proposed a platform that uses machine learning methods to analyze multiple health care data sets, including medical images as well as diagnostic and voice records. The authors believe that such a system could help people in resource-limited areas, which have lower ratios of physicians and hospitals, to diagnose diseases such as breast cancer, heart disease (HD), diabetes, and liver disease at a lower cost and in less time than local hospitals. In that study, the accuracy of disease detection was >95% [ 24 ].

There are many different approaches to clustering analysis of health care data sets, such as k-means, density-based spatial clustering of applications with noise, agglomerative hierarchical clustering, self-organizing maps, partitioning around medoids algorithm, hybrid hierarchical clustering, and so on [ 25 - 28 ]. K-means clustering is 1 of the most commonly used clustering or unsupervised machine learning algorithms [ 19 , 29 ], and it is relatively easy to implement and relatively fast [ 30 - 32 ]. In addition, k-means has been used in research studies related to chronic health conditions such as diabetes [ 33 ], COPD [ 34 , 35 ], and HF [ 36 ]; for example, a cloud-based framework with k-means clustering technique has been used for the diagnosis of diabetes and was found to be more efficient and suitable for handling extensive data sets in cloud computing platforms than hierarchical clustering [ 32 ]. Violán et al [ 37 ] analyzed data from 408,994 patients aged 45 to 64 years with multimorbidity using k-means clustering to ascertain multimorbidity patterns. The authors stratified the k-means clustering analysis by sex, and 6 multimorbidity patterns were found for each sex. They also suggest that clusters identified by multimorbidity patterns obtained using nonhierarchical clustering analysis (eg, k-means and k-medoids) are more consistent with clinical practice [ 37 ].
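The authors implemented their clustering in R; purely to illustrate the algorithm itself (not the study’s implementation), a deterministic 1-D k-means pass over a single invented engagement feature, such as mean days per week with a submitted reading, might look like this:

```python
# Deterministic 1-D k-means sketch (pure Python). The engagement values
# are invented for illustration only; they are NOT the study's data.
engagement = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0, 6.0, 6.5, 7.0]

def kmeans_1d(values, k, iterations=100):
    s = sorted(values)
    # Deterministic initialization: k evenly spaced values from the
    # sorted list (assumes k >= 2 and len(values) >= k).
    centers = [s[i * (len(s) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each value joins its nearest center.
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to its cluster's mean.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, clusters

centers, clusters = kmeans_1d(engagement, k=3)
print(centers)  # [1.0, 3.5, 6.5] -> eg, low, typical, and high engagement
```

Production implementations (eg, R’s kmeans or scikit-learn’s KMeans) instead use random or k-means++ initialization with multiple restarts and operate on multidimensional feature vectors, which is closer to what the study describes.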

The majority of data mining studies on chronic health conditions focus on the diseases themselves and their symptoms; there is less exploration of how persons with multimorbidity engage with digital health technologies. However, data mining and machine learning are excellent ways to understand users’ engagement patterns with digital health technologies. A study by McCauley et al [ 38 ] applied clustering analysis to user interaction event log data from a reminiscence mobile app designed for people living with dementia. In addition to performing quantitative user interaction log analysis, the authors also gathered data on the qualitative experience of users. The study showed the benefits of using data mining to analyze user log data alongside complementary qualitative data analysis [ 38 ]. Fully understanding users requires combining quantitative and qualitative methods; for example, quantitative analysis of user event data can reveal use patterns, preferred times of day to use the app, feature use, and so on, but qualitative data (eg, user interviews) are necessary to understand why these use patterns exist.

The aim of this study was to analyze how older adults with multimorbidity engage with digital symptom and health monitoring over a period of approximately 12 months using a digital health platform. In this study, user log data of engagement with digital health technology and user interview qualitative data were examined to explore the patterns of engagement. K-means clustering was used to analyze the user log data. The study had four research questions: (1) How do clusters differ in terms of participant characteristics such as age, sex, and health conditions? (2) How do clusters differ in terms of patterns of engagement, such as the number of days a week participants take readings (eg, weight and blood pressure [BP])? (3) How do engagement rates with the different devices correlate with each other (determined by analyzing the weekly submissions of every parameter and the interviews of participants)? and (4) How do engagement rates affect participants’ health condition symptoms, such as BP, blood glucose (BG) level, weight, peripheral oxygen saturation (SpO 2 ) level, and physical activity (PA)?

The study was a proof-of-concept trial with an action research design and mixed methods approach. Action research is a period of investigation that “describes, interprets, and explains social situations while executing a change intervention aimed at improvement and involvement” [ 39 ]. An action research approach supports the generation of solutions to practical problems while using methods to understand the contexts of care as well as the needs and experiences of participants.

Recruitment and Sample

Although 120 participants consented to take part across Ireland and Belgium, this paper reports on data from 60 Irish older adults with multiple chronic health conditions (≥2 of the following: COPD, HF, HD, and diabetes). Participants were recruited through purposive sampling and from multiple sources, including through health care organizations (general practitioner clinics and specialist clinics), relevant older adult networks, chronic disease support groups, social media, and local newspaper advertising. Recruitment strategies included the use of study flyers and advertisements as well as giving talks and platform demonstrations.

Sources of Data

The data set was collected during the Integrated Technology Systems for Proactive Patient Centred Care (ProACT) project proof-of-concept trial. As the trial was a proof-of-concept of a novel digital health platform, the main goal was to understand how the platform worked or did not work, rather than whether it worked. Thus, to determine sample size, a pragmatic approach was taken in line with two important factors: (1) Is the sample size large enough to provide a reliable analysis of the ecosystem? and (2) Is the sample size small enough to be financially feasible? The literature suggests that overall sample size in proof-of-concept digital health trials is low. A review of 1030 studies on technical interventions for management of chronic disease that focused on HF (436 studies), stroke (422 studies), and COPD (172 studies) suggested that robust sample sizes were 17 for COPD, 19 for HF, and 21 for stroke [ 40 ]. Full details on the study protocol can be found in the study by Dinsmore et al [ 41 ].

Participants used a suite of sensor devices (ie, BP monitors, weight scales, glucometers, pulse oximeters, and activity watches) and a tablet app to monitor their health conditions and well-being. All participants received a smartwatch to measure PA levels and sleep, a BP monitor to measure BP and pulse rate, and a weight scale. A BG meter was provided to participants with diabetes, and a pulse oximeter was provided to those with COPD to measure SpO 2 levels. In addition, all participants received an iPad with a custom-designed app, the ProACT CareApp, that allowed users to view their data, provide self-report (SR) data on symptoms that could not be easily captured through a sensor (eg, breathlessness and edema) and well-being (eg, mood and satisfaction with social life), receive targeted education based on their current health status, set PA goals, and share their data with others. The ProACT platform was designed and developed following an extensive user-centered design process. This involved interviews, focus groups, co-design sessions (hands-on design activities with participants), and usability testing before the platform’s deployment in the trial. A total of 58 people with multimorbidity and 106 care network participants, including informal carers, formal carers, and health care professionals, took part in this process. Findings from the user-centered design process have been published elsewhere [ 42 , 43 ]. More detailed information about the full ProACT platform and the CareApp used by participants can be found in the study by Doyle et al [ 44 ].

The study took place between April 1, 2018, and June 30, 2019. Participants in the trial typically participated for 12 months, although some stayed on for 14 months and others for 9 months (in the case of those who entered the trial later). One of the trial objectives was to understand real-world engagement. Therefore, participants were asked to take readings with the devices and provide SR data in the ProACT CareApp whenever they wished (not necessarily daily). As part of the trial, participants were assisted by technical help desk staff who responded to questions about the technology, and home visits were conducted as needed to resolve issues. In addition, a clinical triage service monitored the participants’ readings and contacted them in instances of abnormal parameter values (eg, high BP and low SpO 2 levels) [ 45 ]. Participants also received a monthly check-in telephone call from 1 of the triage nurses.

Table 1 outlines the types of health and well-being metrics that were collected, the collection method, and the number of participants who collected each type of data. The health and well-being metrics were determined from the interviews and focus groups held with health care professionals during the design of the ProACT platform to identify the most important symptom and well-being parameters to monitor across the health conditions of interest [ 42 ]. Off-the-shelf digital devices manufactured by 2 providers, Withings and iHealth, were used during the trial. Data from these providers were extracted into a custom platform called Context-Aware Broker and Inference Engine–Subject Information Management System (CABIE-SIMS), which includes a data aggregator for storing health and well-being data. All devices required the user to interact with them in some way, although some needed more interaction than others (eg, taking a BG reading involved several steps, whereas PA and sleep only required participants to open the activity watch app to sync the relevant data). The activity watch was supposed to synchronize automatically without user interaction; however, inconsistencies with syncing meant that users were advised to open the Withings app to sync their data. The CABIE-SIMS platform displayed the readings in near real time, apart from PA data, which were collected at regular intervals throughout the day, whereas sleep data were gathered every morning. In addition, semistructured interviews were conducted with all participants at 4 time points throughout the trial to understand their experience of using the ProACT platform. 
Although a full qualitative thematic analysis was outside the scope of this study and was reported on elsewhere [ 44 ], interview transcripts for participants of interest to the analysis presented in this paper were reviewed as part of this study to provide an enhanced understanding of the results.

a SpO 2 : peripheral oxygen saturation.

b HF: heart failure.

c ProACT: Integrated Technology Systems for Proactive Patient Centred Care.

d CABIE-SIMS: Context-Aware Broker and Inference Engine–Subject Information Management System.

e COPD: chronic obstructive pulmonary disease.

Data Analysis Methods

The original data set in the CABIE-SIMS platform was stored in JSON format. As a first step, a JSON-to-CSV converter was used to make the data set more accessible for analysis. The data cleaning phase focused on duplicate data and missing data. Duplication could occur when, for example, a user uploaded their SpO 2 reading 3 times in 2 minutes as a result of mispressing the button; in such cases, only 1 record was added to the cleaned data file. As for missing data, the data set file comprised “N/A” (not available) values for all missing data.
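The deduplication rule described above can be sketched as follows. This is an illustrative Python sketch, not the actual cleaning pipeline (which was not published); the 2-minute window and the record fields are assumptions based on the SpO 2 example in the text:

```python
from datetime import datetime, timedelta

def deduplicate(readings, window_minutes=2):
    """Keep only the first reading when repeats of the same parameter
    arrive within the given time window (eg, a mispressed button)."""
    kept = []
    last_seen = {}  # parameter -> timestamp of the last kept reading
    for ts, param, value in sorted(readings, key=lambda r: r[0]):
        prev = last_seen.get(param)
        if prev is not None and ts - prev < timedelta(minutes=window_minutes):
            continue  # duplicate within the window: drop it
        kept.append((ts, param, value))
        last_seen[param] = ts
    return kept

t0 = datetime(2018, 4, 1, 9, 0)
raw = [
    (t0, "spo2", 96),
    (t0 + timedelta(seconds=40), "spo2", 95),  # mispress: dropped
    (t0 + timedelta(seconds=70), "spo2", 96),  # still inside window: dropped
    (t0 + timedelta(minutes=5), "spo2", 97),   # outside window: kept
]
print(len(deduplicate(raw)))  # 2
```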

The cleaned data set was preprocessed using Microsoft Excel, the R programming language (R Foundation for Statistical Computing), and RStudio (Posit Software, PBC). The preprocessed data set included participants’ details (ID, sex, age, and chronic health conditions) and the number of days of weekly submissions of every parameter (BP, pulse rate, SpO 2 level, BG level, weight, PA, SR data, and sleep). All analyses (including correlation analysis, principal component analysis [PCA], k-means clustering, 2-tailed t test, and 1-way ANOVA) were implemented in the R programming language and RStudio.

After performing Shapiro-Wilk normality tests on the data submitted each week, we found that the data were not normally distributed. Therefore, Spearman correlation was used to check the correlation among the parameters. Correlation analysis and PCA were used to determine which portions of the data would be included in the k-means clustering. Correlation analysis determined which characteristics or parameters should be selected, and PCA determined the number of dimensions that should be selected as features for clustering. In the clustering process, the weekly submission of each parameter was considered as an independent variable for the discovery of participant clusters, and the outcome of the clustering was a categorical taxonomy that was used to label the 3 discovered clusters. Similarly, the Shapiro-Wilk test was conducted to check the normality of the variables in each group. It was found that most of the variables in each group were normally distributed, and only the weight data submission records of cluster 3, the PA data submission records of cluster 2, the SR data submission records of cluster 3, and the sleep data submission records of cluster 1 were not normally distributed. Therefore, the 2-tailed t test and 1-way ANOVA were used to compare different groups of variables. The 2-tailed t test was used to compare 2 groups of variables, whereas 1-way ANOVA was used to compare ≥2 groups of variables. P values >.05 indicated that there were no statistically significant differences among the groups of variables [ 46 ].
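The rationale for Spearman correlation (it operates on ranks, so it tolerates non-normal but monotone data) can be shown with a minimal sketch. The paper's analysis was implemented in R; this NumPy version is purely illustrative and ignores tied ranks:

```python
import numpy as np

def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks, which
    # makes it suitable for the non-normally distributed weekly data.
    rx = np.argsort(np.argsort(x)).astype(float)  # ranks (ties ignored)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.exp(x)  # strongly non-normal, but monotone in x
print(round(spearman(x, y), 3))  # 1.0 despite the nonlinearity
```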

As for the qualitative data from the interviews, we performed keyword searches after a review of the entire interview; for example, when the data analysis was related to BP and weight monitoring, a search with the keywords “blood pressure,” “weight,” or “scale” was performed to identify relevant information. In addition, when the aim was to understand the impact of digital health care technology, we focused on specific questions in the second interview, such as “Has it had any impact on the management of your health?”
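A keyword search of this kind over transcript lines can be sketched as follows; the helper and the sample lines are hypothetical, standing in for the actual transcripts:

```python
def find_relevant(lines, keywords):
    # Return transcript lines mentioning any keyword, case-insensitively.
    kws = [k.lower() for k in keywords]
    return [ln for ln in lines if any(k in ln.lower() for k in kws)]

transcript = [
    "I check my blood pressure every morning.",
    "The watch is comfortable to wear.",
    "I step on the scale after breakfast.",
]
hits = find_relevant(transcript, ["blood pressure", "weight", "scale"])
print(len(hits))  # 2
```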

Ethical Considerations

Ethics approval was received from 3 ethics committees: the Health Service Executive North East Area Research Ethics Committee, the School of Health and Science Research Ethics Committee at Dundalk Institute of Technology, and the Faculty of Health Sciences Research Ethics Committee at Trinity College Dublin. All procedures were in line with the European Union’s General Data Protection Regulation for research projects, with the platform and trial methods and procedures undergoing data protection impact assessments. Written informed consent was obtained on an individual basis from participants in accordance with legal and ethics guidelines after a careful explanation of the study and the provision of patient information and informed consent forms in plain language. All participants were informed of their right to withdraw from the study at any time without having to provide a reason. Participants were not compensated for their time. Data stored within the CABIE-SIMS platform were identifiable because they were shared (with the participant’s consent) with the clinical triage teams and health care professionals. This was clearly outlined in the participant information leaflet and consent form. However, the data set that was extracted for the purpose of the analysis presented in this paper was pseudonymized.

Participants

A total of 60 older adults were enrolled in the study. The average age of participants was 74 (SD 6.4; range 65-92) years; 60% (36/60) were male individuals, and 40% (24/60) were female individuals. The most common combination of health conditions was diabetes and HD (30/60, 50%), followed by COPD and HD (16/60, 27%); HF and HD (7/60, 12%); diabetes and COPD (3/60, 5%); diabetes and HF (1/60, 2%); COPD and HF (1/60, 2%); HF, HD, and COPD (1/60, 2%); and COPD, HD, and diabetes (1/60, 2%). Of the 60 participants, 11 (18%) had HF, 55 (92%) had HD, 22 (37%) had COPD, and 31 (52%) had diabetes. Over the course of the trial, of the 60 participants, 8 (13%) withdrew, and 3 (5%) died. However, this study included data from all enrolled participants, provided that a participant had recorded at least 1 data point. Hence, of the 60 participants, we included 56 (93%) in our analysis, whereas 4 (7%) were excluded because no data were recorded.

Correlation of Submission Parameters

To help determine which distinct use characteristics or parameters (such as the weekly frequency of BP data submissions) should be selected as features for clustering, the correlations among the parameters were calculated. Figure 1 shows the correlation matrix for all parameter weekly submissions (days). In this study, a moderate correlation (absolute correlation coefficient between 0.3 and 0.7) [ 47 , 48 ] was chosen as the standard for selecting parameters. First, every participant received a BP monitor to measure BP, and pulse rate was collected as part of the BP measurement; moreover, the correlation coefficient between BP and pulse rate was 0.93, a strong correlation. Therefore, BP was selected for clustering rather than pulse rate. As for the other parameters, the correlations between BP and weight (0.51), PA (0.55), SR data (0.41), and sleep (0.55) were moderate, whereas the correlations between BP and SpO 2 level (0.05) and BG level (0.24) were weak. In addition, the correlations between SpO 2 level and weight (−0.25), PA (0.16), SR data (0.29), and sleep (−0.24) were weak. Therefore, the SpO 2 level was not selected for clustering. Likewise, the correlations between BG level and weight (0.19), PA (0.2), SR data (−0.06), and sleep (0.25) were weak. Therefore, the BG level was not selected for clustering. Thus, BP, weight, PA, SR data, and sleep were selected for clustering.
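The selection rule above amounts to keeping the parameters whose correlation with BP falls in the moderate band. A sketch, using the coefficients reported in the text (pulse rate is excluded separately because its 0.93 correlation with BP makes it redundant):

```python
# Spearman correlations of each parameter's weekly submissions with BP,
# as reported in the text.
corr_with_bp = {"pulse rate": 0.93, "weight": 0.51, "PA": 0.55,
                "SR": 0.41, "sleep": 0.55, "SpO2": 0.05, "BG": 0.24}

def moderately_correlated(corrs, lo=0.3, hi=0.7):
    # Keep parameters in the moderate band (lo <= |r| <= hi).
    return sorted(p for p, r in corrs.items() if lo <= abs(r) <= hi)

print(moderately_correlated(corr_with_bp))
# ['PA', 'SR', 'sleep', 'weight'] -- together with BP itself,
# the 5 features used for clustering
```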


PCA and Clustering

The fundamental question for k-means clustering is this: how many clusters (k) should be discovered? To determine the optimum number of clusters, we further investigated the data through visualization offered by PCA. As can be seen from Figure 2 , the first 2 principal components (PCs) explain 73.6% of the variation, which is an acceptably large percentage. However, after a check of individual contributions, we found that there were 3 participants—P038, P016, and P015—who contributed substantially to PC1 and PC2. After a check of the original data set, we found that P038 submitted symptom parameters only on 1 day, and P016 submitted symptom parameters only on 2 days. Conversely, P015 submitted parameters almost every day during the trial. Therefore, P038 and P016 were omitted from clustering.
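The check for individuals contributing disproportionately to the first 2 PCs can be sketched as follows (NumPy via SVD; illustrative only, with synthetic data standing in for the weekly submission matrix):

```python
import numpy as np

def pc_contributions(X):
    # Fraction of each principal component's variance contributed by
    # each observation: squared PC score / column sum of squared scores.
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :2] * s[:2]  # coordinates on PC1 and PC2
    return scores**2 / (scores**2).sum(axis=0)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (9, 2)),  # 9 typical participants
               [[10.0, 10.0]]])               # 1 extreme participant
contrib = pc_contributions(X)
print(int(np.argmax(contrib[:, 0])))  # 9: the outlier dominates PC1
```

An observation whose contribution dwarfs the others (like P038 and P016 above, with only 1-2 days of submissions) is a candidate for removal before clustering.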

After removing the outliers (P038 and P016), we found that the first 2 PCs explain 70.5% of the variation ( Figure 3 ), which is an acceptably large percentage.

The clusters were projected into 2 dimensions as shown in Figure 4 . Each subpart in Figure 4 shows a different number of clusters (k). When k=2, the data are clearly separated into 2 large clusters. When k=3, the clusters remain well separated. When k=4, the clusters are still well separated, but compared with the k=3 solution, 2 of the clusters are similar, and cluster 1, which has only 3 participants, is relatively small. When k=5, there is some overlap between cluster 1 and cluster 2. Likewise, Figure 5 shows the optimal number of clusters according to the elbow method. In view of this, we determined that 3 clusters of participants separate the data set best. The 3 clusters can be labeled as the least engaged user group (cluster 1), the highly engaged user group (cluster 2), and the typical user group (cluster 3).
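The elbow computation behind Figure 5 can be sketched with a minimal k-means. The trial's analysis was done in R; this NumPy version on synthetic 2D data is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans_wss(X, k, iters=30, restarts=5):
    # Lloyd's algorithm; returns the best (lowest) within-cluster
    # sum of squares (WSS) over several random restarts.
    best = np.inf
    for _ in range(restarts):
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
            centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        best = min(best, float(((X - centers[labels]) ** 2).sum()))
    return best

# Three well-separated synthetic "engagement" clusters
X = np.vstack([rng.normal(m, 0.3, (20, 2))
               for m in [(0, 0), (4, 0), (0, 4)]])
wss = [kmeans_wss(X, k) for k in range(1, 6)]
# WSS drops steeply up to k=3 and flattens afterward: the elbow
```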

In the remainder of this section, we report on the examination of the clusters with respect to participant characteristics and the weekly submissions (days) of different parameters in a visual manner to reveal potential correlations and insights. Finally, we report on the examination of the correlations among all parameters by PCA.


Participant Characteristics

As seen in Figure 6 , the distribution of age within the 3 clusters is similar, with the P value of the 1-way ANOVA being .93, because all participants in this trial were older adults. However, the median age in the cluster 3 box plot is slightly higher than the median ages in the box plots of the other 2 clusters, and the average age of cluster 2 participants (74.1 years) is lower than that of cluster 1 (74.6 years) and cluster 3 (74.8 years; Table 2 ) participants. As Table 2 shows, 6 (26%) of the 23 female participants are in cluster 1 compared with 7 (23%) of the 31 male participants. However, the male participants in cluster 2 (10/31, 32%) and cluster 3 (14/31, 45%) represent higher proportions of total male participants compared with female participants in cluster 2 (7/23, 30%) and cluster 3 (10/23, 43%). Figure 7 shows the proportion of the 4 chronic health conditions within the 3 clusters. Cluster 1 has the largest proportion of participants with COPD and the smallest proportion of participants with diabetes. Moreover, cluster 3 has the smallest proportion of participants with HF (3/24, 13%; Table 2 ).


a COPD: chronic obstructive pulmonary disease.


Participant Engagement Outcomes

Cluster 2 has the longest average enrollment time at 352 days compared with cluster 3 at 335 days and cluster 1 at 330 days. As seen in Figure 8 , the overall distribution of the BP data weekly submissions is different, with the P value of the 1-way ANOVA being 8.4 × 10 −9 . The frequency of BP data weekly submissions (days) of cluster 2 exceeds the frequencies of cluster 1 and cluster 3, which means that participants in cluster 2 have a higher frequency of BP data submissions than those in the other 2 clusters. The median and maximum of cluster 3 are higher than those of cluster 1, but the minimum of cluster 3 is lower than that of cluster 1. Likewise, as seen in Table 3 , the mean and SD of cluster 1 (mean 2.5, SD 1.4) are smaller than those of cluster 3 (mean 2.9, SD 2.9).

As Figure 9 shows, the overall distribution of the weekly submissions of weight data is different, with the P value of the 1-way ANOVA being 1.4 × 10 −13 , because the participants in cluster 2 submitted weight parameters more frequently than those in cluster 1 and cluster 3. In addition, similar to the BP data submissions, the median of cluster 3 is higher than that of cluster 1. As seen in Figure 9 , there are 3 outliers in cluster 2. The top outlier is P015, who submitted a weight reading almost every day. During the trial, this participant mentioned many times in the interviews that his goal was to lose weight and that he used the scale to check his progress:

I’ve set out to reduce my weight. The doctor has been saying to me you know there’s where you are and you should be over here. So, I’ve been using the weighing thing just to clock, to track reduction of weight. [P015]

The other 2 outliers are P051 and P053, both of whom mentioned taking their weight measurements as part of their daily routine:

Once I get up in the morning the first thing is I weigh myself. That is, the day starts off with the weight, right. [P053]

Although their frequency of weekly weight data submissions is lower than that of all other participants in cluster 2, it is still higher than that of most of the participants in the other 2 clusters.

In Table 3 , it can be observed that the average frequency of weekly submissions of PA and sleep data for every cluster is higher than the frequencies of other variables, and the SDs are relatively low. This is likely because participants only needed to open the Withings app once a day to ensure the syncing of data. However, the overall distributions of PA and sleep data submissions are different in Figure 10 and Figure 11 , with the P values of the 1-way ANOVA being 1.1 × 10 −9 and 3.7 × 10 −10 , respectively. Moreover, as Figure 10 and Figure 11 show, there are still some outliers who have a low frequency of submissions, and the box plot of cluster 1 is lower than the box plots of cluster 2 and cluster 3 in both figures. The reasons for the low frequency of submissions can mostly be explained by (1) technical issues, including internet connection issues, devices not syncing, and devices needing to be paired again; (2) participants forgetting to put the watch back on after taking it off; and (3) participants stopping using the devices (eg, some participants do not like wearing the watch while sleeping or when they go on holiday):

I was without my watch there for the last month or 3 or 4 weeks [owing to technical issues], and I missed it very badly because everything I look at the watch to tell the time, I was looking at my steps. [P042]
I don’t wear it, I told them I wouldn’t wear the watch at night, I don’t like it. [P030]

Unlike in the case of other variables, the submission of SR data through the ProACT CareApp required participants to reflect on each question and their status before selecting the appropriate answer. Participants had different questions to answer based on their health conditions; for example, participants with HF and COPD were asked to answer symptom-related questions, whereas those with diabetes were not. All participants were presented with general well-being and mood questions. Therefore, for some participants, self-reporting could possibly take more time than using the health monitoring devices. As shown in Table 3 , the frequency of average weekly submissions of SR data within the 3 clusters is relatively small and the SDs are large, which means that the frequency of SR data submissions is lower than that of other variables. Furthermore, there were approximately 5 questions asked daily about general well-being, and some participants would skip the questions if they thought the question was unnecessary or not relevant:

Researcher: And do you answer your daily questions?
P027: Yeah, once a week.
Researcher: Once a week, okay.
P027: But they’re the same.

As Figure 12 shows, the distribution of SR data submissions is different, with the P value of the 1-way ANOVA being .001. In Figure 12 , the median of cluster 2 is higher than the medians of the other 2 clusters; however, unlike for the other parameters, cluster 2 also includes some participants with very low SR data submission rates (close to 0). SR data is also the only parameter for which cluster 1 has a higher median than cluster 3.


a Lowest submission rate across the clusters.

b Highest submission rate across the clusters.


The Correlation Among the Weekly Submissions of Different Parameters

As seen in Figure 13 , the arrows of BP and weight point to the same side of the plot, which shows a strong correlation. Likewise, PA and sleep also have a strong correlation. As noted previously, the strong correlation between PA and sleep is because the same device collected these 2 measurements, and participants only needed to sync the data once a day. By contrast, BP and weight were collected by 2 different devices but are strongly correlated. During interviews, many participants mentioned that their daily routine with the ProACT platform involved taking both BP and weight readings:

Usually in the morning when I get out of the bed, first, I go into the bathroom, wash my hands and come back, then weigh myself, do my blood pressure, do my bloods. [P008]
I now have a routine that I let the system read my watch first thing, then I do my blood pressure thing and then I do the weight. [P015]
As I said, it’s keeping me in line with my, when I dip my finger, my weight, my blood pressure. [P040]
I use it in the morning and at night for putting in the details of blood pressure in the morning and then the blood glucose at night. Yes, there’s nothing else, is there? Oh, every morning the [weight] scales. [P058]

By contrast, as shown in Figure 13 , SR data have a weak correlation with other parameters, for reasons noted earlier.


Parameter Variation Over Time

Analysis was conducted to determine any differences among the clusters in terms of symptom and well-being parameter changes over the course of the trial. Table 4 provides a description of each cluster in this regard. As Figure 14 shows, the box plot of cluster 2 is comparatively short in every time period of the trial, and the medians of cluster 2 and cluster 3 are more stable than the median of cluster 1. In addition, the median of cluster 1 is increasing over time, whereas the medians of cluster 2 and cluster 3 are decreasing and within the normal systolic BP of older adults [ 49 ] ( Figure 14 ). As can be seen in Table 5 , cluster 2 has a P value of .51 for systolic BP and a P value of .52 for diastolic BP, which are higher than the P values of cluster 1 ( P =.19 and P =.16, respectively) and cluster 3 ( P =.27 and P =.35, respectively). Therefore, participants in cluster 2, as highly engaged users, have more stable BP values than those in the other 2 clusters. By contrast, participants in cluster 1, as the least engaged users, have the most unstable BP values.

As seen in Figure 15 , the median of cluster 2 is relatively higher than the medians of the other 2 clusters. The median of cluster 3 is increasing over time. In the second and third time periods of the trial, the box plot of cluster 1 is comparatively short. Normal SpO 2 levels are between 95% and 100%, but older adults may have SpO 2 levels closer to 95% [ 50 ]. In addition, for patients with COPD, SpO 2 levels range between 88% and 92% [ 51 ]. In this case, there is not much difference in terms of SpO 2 levels, and most of the SpO 2 levels are between 90% and 95% in this study. However, the SpO 2 levels of cluster 1 and cluster 2 were maintained at a relatively high level during the trial. As for cluster 3, the SpO 2 levels were comparatively low, although similar to those of the other 2 clusters in the later period of the trial. Therefore, the SpO 2 levels of cluster 3 ( P =.25) are relatively unstable compared with those of cluster 1 ( P =.66) and cluster 2 ( P =.59). As such, there is little correlation between SpO 2 levels and engagement with digital health monitoring.

In relation to BG, Figure 16 shows that the box plot of cluster 2 is relatively lower than the box plots of the other 2 clusters in the second and third time periods. Moreover, the medians of cluster 2 and cluster 3 are lower than those of cluster 1 in the second and third time periods. The BG levels in cluster 2 and cluster 3 decreased at later periods of the trial compared with the beginning of the trial, but those in cluster 1 increased. Cluster 3 ( P =.25), as the typical user group, showed greater change than cluster 1 ( P =.50) and cluster 2 ( P =.41). Overall, participants with a higher engagement rate had better BG control.

In relation to weight, Figure 17 shows that the box plot of cluster 2 is lower than the box plots of the other 2 clusters and comparatively short. As Table 5 shows, the P value of cluster 2 weight data is .72, which is higher than the P values of cluster 1 (.47) and cluster 3 (.61). Therefore, participants in cluster 2 had a relatively stable weight during the trial. In addition, as seen in Figure 17 , the median weight of cluster 1 participants is decreasing, whereas that of cluster 3 participants is increasing. It is well known that many factors can influence body weight, such as PA, diet, and environmental factors [ 52 ]. In this case, engagement with digital health and well-being monitoring may help control weight, but the impact is not significant.

As Table 5 shows, the P value of cluster 2 PA (.049) is lower than .05, which means that there are significant differences among the 3 time slots in cluster 2. However, the median of cluster 2 PA, as seen in Figure 18 , is still higher than the medians of the other 2 clusters. In cluster 2, approximately 50% of daily step counts exceeded 2500 steps. Overall, participants with a higher engagement rate also had a higher level of PA.

a BP: blood pressure.

b BG: blood glucose.

c SR: self-report.

d PA: physical activity.


b SpO 2 : peripheral oxygen saturation.

c BG: blood glucose.


Principal Findings

Digital health technologies hold great promise to help older adults with multimorbidity to improve health management and health outcomes. However, such benefits can only be realized if users engage with the technology. The aim of this study was to explore the engagement patterns of older adults with multimorbidity with digital self-management by using data mining to analyze users’ weekly submission data. Three clusters were identified: cluster 1 (the least engaged user group), cluster 2 (the highly engaged user group), and cluster 3 (the typical user group). The subsequent analysis focused on how the clusters differ in terms of participant characteristics, patterns of engagement, and stabilization of health condition symptoms and well-being parameters over time, as well as how engagement rates with the different devices correlate with each other.

The key findings from the study are as follows:

  • There is no significant difference in participants’ characteristics among the clusters in general. The highly engaged group had the lowest average age ( Table 4 ), and there was no significant difference with regard to sex and health conditions among the clusters. The least engaged user group had proportionally fewer male participants and fewer participants with diabetes.
  • There are 3 main factors influencing the correlations among the submission rates of different parameters. The first concerns whether the same device was used to submit the parameters, the second concerns the number of manual operations required to submit the parameter, and the third concerns the daily routine of the participants.
  • Increased engagement with devices may improve the participants’ health and well-being outcomes (eg, symptoms and PA levels). However, the difference between the highly engaged user group and the typical user group was relatively minimal compared with the difference between the highly engaged user group and the least engaged user group.

Each of these findings is discussed in further detail in the following subsections.

Although the findings presented in this paper focus on engagement based on the ProACT trial participants’ use data, the interviews that were carried out as part of the trial identified additional potential factors of engagement. As reported in the study by Doyle et al [ 44 ], participants spoke about how they used the data to support their self-management (eg, taking action based on their data) and experienced various benefits, including increased knowledge of their health conditions and well-being, symptom optimization, reductions in weight, increased PA, and increased confidence to participate in certain activities as a result of health improvements. The peace of mind and encouragement provided by the clinical triage service as well as the technical support available were also identified during the interviews as potential factors positively impacting engagement [ 44 ]. In addition, the platform was found to be usable, and it imposed minimal burden on participants ( Table 1 ). These findings supplement the quantitative findings presented in this paper.

Age, Sex, Health Condition Types, and Engagement

In this study, the difference in engagement with health care technologies between the sexes was not significant. Of the 23 female participants, 6 (26%) were part of the least engaged user group compared with 7 (23%) of the 31 male participants. Moreover, there were lower proportions of female participants in the highly engaged user group (7/23, 30%) and typical user group (10/23, 43%) compared with male participants (10/31, 32% and 14/31, 45%, respectively). Other research has found that engagement with mobile health technology for BP monitoring was independent of sex [ 53 ]. However, there are also some studies that show that female participants are more likely to engage with digital mental health care interventions [ 54 , 55 ]. Therefore, sex cannot be considered as a separate criterion when comparing engagement with health care technologies, and it was not found to have a significant impact on engagement in this study. Regarding age, many studies have shown that younger people are more likely to use health care technologies than older adults [ 56 , 57 ]. Although all participants in our study are older adults, the highly engaged user group is the youngest group. However, there was no significant difference in age among the clusters, with some of the oldest users being part of cluster 3, the typical user cluster. Similarly, the health conditions of a participant did not significantly impact their level of engagement. Other research [ 53 ] found that participants who were highly engaged with health monitoring had higher rates of hypertension, chronic kidney disease, and hypercholesterolemia than those with lower engagement levels. Our findings indicate that the highly engaged user group had a higher proportion of participants with diabetes, and the least engaged user group had a higher proportion of participants with COPD. Further research is needed to understand why there might be differences in engagement depending on health conditions.
In our study, participants with COPD also self-reported on certain symptoms, such as breathlessness, chest tightness, and sputum amount and color. Although engagement with specific questions was not explored, participants in cluster 1, the least engaged user group, self-reported more frequently than those in cluster 3, the typical user group. Our findings also indicate that participants monitoring BG level and BP experienced better symptom stabilization over time than those monitoring SpO 2 level. It has been noted that the expected benefits of technology (eg, increased safety and usefulness) and need for technology (eg, subjective health status and perception of need) are 2 important factors that can influence the acceptance and use of technology by older adults [ 58 ]. It is also well understood that engaging in monitoring BG level can help people with diabetes to better self-manage and make decisions about diet, exercise, and medication [ 59 ].

Factors Influencing Engagement

Many research studies use P values to show the level of similarity or difference among clusters [ 60 - 63 ]. For most of the engagement outcomes in this study, all clusters significantly differed, with 1-way ANOVA P <.001, with the exception being SR data ( P =.001). In addition, the 2-tailed t test P values showed that cluster 2 was significantly different from cluster 1 and cluster 3 in BP and weight data submission rates, whereas cluster 1 was significantly different from cluster 2 and cluster 3 in PA and sleep data submission rates. As for SR data submission rates, all 3 two-tailed t tests had P values >.001, meaning that there were no significant differences between any 2 of these clusters. Therefore, all 5 parameters used for clustering were separated into 3 groups based on the correlations of submission rates: 1 for BP and weight, 1 for PA and sleep, and 1 for SR data. PA and sleep data submission rates have a strong correlation because participants used the same device to record daily PA and sleeping conditions. SR data submission rates have a weak correlation with other parameters’ submission rates. Our previous research found that user retention in terms of submitting SR data was poorer than user retention in terms of using digital health devices, possibly because more manual operations are involved in the submission of SR data than other parameters or because the same questions were asked regularly, as noted by P027 in the Participant Engagement Outcomes subsection [ 64 ].

Other research that analyzed engagement with a diabetes support app found that user engagement was lower when more manual data entry was required [ 65 ]. In contrast to the other 2 groups of parameters, BP and weight data are collected using different devices. Whereas measuring BP requires using a BP monitor and manually synchronizing the data, measuring weight simply requires standing on the weight scale, and the data are automatically synchronized. Therefore, the manual operations involved in submitting BP and weight data are slightly different. However, the results showed a strong correlation between BP and weight because many participants preferred to measure both BP and weight together and incorporate taking these measurements into their daily routines. Research has indicated that if the use of a health care device becomes a regular routine, then participants will use it without consciously thinking about it [ 66 ]. Likewise, Yuan et al [ 67 ] note that integrating health apps into people’s daily activities and forming regular habits can increase people’s willingness to continue using the apps. However, participants using health care technology for long periods of time might become less receptive to exploring the system compared with using it based on the established methods to which they are accustomed [ 68 ]. In this study, many participants bundled their BP measurement with their weight measurement during their morning routine. Therefore, the engagement rates of interacting with these 2 devices were enhanced by each other. Future work could explore how to integrate additional measurements, such as monitoring SpO 2 level as well as self-reporting into this routine (eg, through prompting the user to submit these parameters while they are engaging with monitoring other parameters, such as BP and weight).

Relationship Between Engagement and Health and Well-Being Outcomes

Our third finding indicates that higher levels of engagement with digital health monitoring may result in better outcomes, such as symptom stabilization and increased PA levels. Milani et al [ 69 ] found that digital health care interventions can help people achieve BP control and improve hypertension control compared with usual care. In their study, users in the digital intervention group took an average of 4.2 readings a week. Compared with our study, this rate is lower than that of cluster 2 (5.7), the highly engaged user group, but higher than cluster 1 (2.5) and cluster 3 (2.9) rates. In our study, participants with a higher engagement rate experienced more stable BP, and for the majority of these participants (34/41, 83%), levels were maintained within the recommended thresholds of 140/90 mm Hg [ 70 ]. Many studies have shown that as engagement in digital diabetes interventions increases, patients will experience greater reductions in BG level compared with those with lower engagement [ 71 , 72 ]. However, in our study, BG levels in both the highly engaged user group (cluster 2) and the least engaged user group (cluster 1) increased in the later stages of the trial. Only the BG levels of the typical user group (cluster 3) decreased over time, which could be because the cluster 3 participants performed more PA in the later stages of the trial than during other time periods, as Figure 18 shows. Cluster 2, the highly engaged user group, maintained a relatively high level of PA during the trial period, although it continued to decline throughout the trial. Other research shows that more PA can also lead to better weight control and management [ 73 , 74 ], which could be 1 of the reasons why cluster 2 participants maintained their weight.

Limitations

There are some limitations to the research presented in this paper. First, although the sample size (n=60) was relatively large for a digital health study, the sample sizes for some parameters were small because not all participants monitored all parameters. Second, the participants were clustered based on weekly submissions of parameters only; if more features, such as submission intervals, were included in clustering, participants could be grouped differently. Finally, it should be noted that, when relating engagement rates to outcomes, correlation does not imply causation.
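
The clustering approach referenced here (grouping participants by their weekly submission rates) can be sketched with k-means. The feature matrix below is invented for illustration only, and the real analysis may have used different features, a different k, and a different initialization scheme:

```python
# Sketch: k-means grouping of participants by average weekly submissions
# per parameter (BP, weight, SpO2, BG). All values are invented for
# illustration; this is not trial data.

def kmeans(points, k, iters=100):
    # Deterministic init for reproducibility: evenly spaced points.
    centroids = list(points[::len(points) // k][:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Columns: BP, weight, SpO2, BG submissions per week (illustrative).
participants = [
    (5.7, 5.5, 4.0, 6.0), (5.9, 5.6, 4.2, 6.1),  # highly engaged
    (2.5, 2.4, 1.0, 2.0), (2.3, 2.2, 1.1, 1.8),  # least engaged
    (2.9, 3.0, 2.0, 3.5), (3.1, 2.8, 2.2, 3.3),  # typical
]
centroids, clusters = kmeans(participants, k=3)
print([len(c) for c in clusters])  # → [2, 2, 2]
```

Adding further features such as submission intervals would simply mean widening each participant's tuple, which is why the limitation above notes that richer feature sets could regroup participants.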

Conclusions

This study presents findings after the clustering of a data set that was generated from a longitudinal study of older adults using a digital health technology platform (ProACT) to self-manage multiple chronic health conditions. The highly engaged user group cluster (comprising 17/54, 31% of users) had the lowest average age and the highest frequency of submissions for every parameter. Engagement with digital health care technologies may also influence health and well-being outcomes (eg, symptoms and PA levels). The least engaged user group in our study had relatively poorer outcomes. However, the difference between the outcomes of the highly engaged user group and those of the typical user group is relatively small. There are 3 possible reasons for the correlations between the submission rates of parameters and devices. First, if 2 parameters are collected by the same device, they usually have a strong correlation, and users will engage with both equally. Second, devices that involve fewer steps and parameters with less manual data entry correlate only weakly with devices that require more manual operations and data entry. Finally, participants’ daily routines also influence the correlations among devices; for example, in this study, many participants had developed a daily routine to weigh themselves after measuring their BP, which led to a strong correlation between BP and weight data submission rates. Future work should explore how to integrate the monitoring of additional parameters into a user’s routine and whether additional characteristics, such as the severity of disease or technical proficiency, impact engagement.

Acknowledgments

This work was part funded by the Integrated Technology Systems for Proactive Patient Centred Care (ProACT) project and has received funding from the European Union (EU)–funded Horizon 2020 research and innovation program (689996). This work was part funded by the EU’s INTERREG VA program, managed by the Special EU Programs Body through the Eastern Corridor Medical Engineering Centre (ECME) project. This work was part funded by the Scaling European Citizen Driven Transferable and Transformative Digital Health (SEURO) project and has received funding from the EU-funded Horizon 2020 research and innovation program (945449). This work was part funded by the COVID-19 Relief for Researchers Scheme set up by Ireland’s Higher Education Authority. The authors would like to sincerely thank all the participants of this research for their valuable time.

Conflicts of Interest

None declared.

  • Ageing. United Nations. 2020. URL: https://www.un.org/en/global-issues/ageing [accessed 2022-01-13]
  • Centers for Disease Control and Prevention (CDC). Trends in aging--United States and worldwide. MMWR Morb Mortal Wkly Rep. Feb 14, 2003;52(6):101-104. [ FREE Full text ] [ Medline ]
  • Valderas JM, Starfield B, Sibbald B, Salisbury C, Roland M. Defining comorbidity: implications for understanding health and health services. Ann Fam Med. Jul 13, 2009;7(4):357-363. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Marengoni A, Angleman S, Melis R, Mangialasche F, Karp A, Garmen A, et al. Aging with multimorbidity: a systematic review of the literature. Ageing Res Rev. Sep 2011;10(4):430-439. [ CrossRef ] [ Medline ]
  • Zhang L, Ma L, Sun F, Tang Z, Chan P. A multicenter study of multimorbidity in older adult inpatients in China. J Nutr Health Aging. Mar 2020;24(3):269-276. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • van der Heide I, Snoeijs S, Melchiorre MG, Quattrini S, Boerma W, Schellevis F, et al. Innovating care for people with multiple chronic conditions in Europe. Innovating Care for people with Multiple Chronic Conditions in Europe (ICARE4EU). 2015. URL: http:/​/www.​icare4eu.org/​pdf/​Innovating-care-for-people-with-multiple-chronic-conditions-in-Europe.​pdf [accessed 2024-01-29]
  • Bartlett SJ, Lambert SD, McCusker J, Yaffe M, de Raad M, Belzile E, et al. Self-management across chronic diseases: targeting education and support needs. Patient Educ Couns. Feb 2020;103(2):398-404. [ CrossRef ] [ Medline ]
  • Anekwe TD, Rahkovsky I. Self-management: a comprehensive approach to management of chronic conditions. Am J Public Health. Dec 2018;108(S6):S430-S436. [ CrossRef ]
  • Barlow J, Wright C, Sheasby J, Turner A, Hainsworth J. Self-management approaches for people with chronic conditions: a review. Patient Educ Couns. 2002;48(2):177-187. [ CrossRef ] [ Medline ]
  • Setiawan IM, Zhou L, Alfikri Z, Saptono A, Fairman AD, Dicianno BE, et al. An adaptive mobile health system to support self-management for persons with chronic conditions and disabilities: usability and feasibility studies. JMIR Form Res. Apr 25, 2019;3(2):e12982. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Alanzi T. mHealth for diabetes self-management in the Kingdom of Saudi Arabia: barriers and solutions. J Multidiscip Healthc. 2018;11:535-546. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Nunes F, Verdezoto N, Fitzpatrick G, Kyng M, Grönvall E, Storni C. Self-care technologies in HCI. ACM Trans Comput Hum Interact. Dec 14, 2015;22(6):1-45. [ CrossRef ]
  • Klasnja P, Kendall L, Pratt W, Blondon K. Long-term engagement with health-management technology: a dynamic process in diabetes. AMIA Annu Symp Proc. 2015;2015:756-765. [ FREE Full text ] [ Medline ]
  • Talboom-Kamp EP, Verdijk NA, Harmans LM, Numans ME, Chavannes NH. An eHealth platform to manage chronic disease in primary care: an innovative approach. Interact J Med Res. Feb 09, 2016;5(1):e5. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Tighe SA, Ball K, Kensing F, Kayser L, Rawstorn JC, Maddison R. Toward a digital platform for the self-management of noncommunicable disease: systematic review of platform-like interventions. J Med Internet Res. Oct 28, 2020;22(10):e16774. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Pettersson B, Wiklund M, Janols R, Lindgren H, Lundin-Olsson L, Skelton DA, et al. 'Managing pieces of a personal puzzle' - older people's experiences of self-management falls prevention exercise guided by a digital program or a booklet. BMC Geriatr. Feb 18, 2019;19(1):43. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kario K. Management of hypertension in the digital era: small wearable monitoring devices for remote blood pressure monitoring. Hypertension. Sep 2020;76(3):640-650. [ CrossRef ]
  • Koh HC, Tan G. Data mining applications in healthcare. J Healthc Inf Manag. 2005;19(2):64-72. [ Medline ]
  • Alsayat A, El-Sayed H. Efficient genetic k-means clustering for health care knowledge discovery. In: Proceedings of the 14th International Conference on Software Engineering Research, Management and Applications. 2016. Presented at: SERA '16; June 8-10, 2016;45-52; Towson, MD. URL: https://ieeexplore.ieee.org/document/7516127 [ CrossRef ]
  • Katsis Y, Balac N, Chapman D, Kapoor M, Block J, Griswold WG, et al. Big data techniques for public health: a case study. In: Proceedings of the 2017 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies. 2017. Presented at: CHASE '17; July 17-19, 2017;222-231; Philadelphia, PA. URL: https://ieeexplore.ieee.org/document/8010636 [ CrossRef ]
  • Elbattah M, Molloy O. Data-driven patient segmentation using k-means clustering: the case of hip fracture care in Ireland. In: Proceedings of the 2017 Australasian Computer Science Week Multiconference. 2017. Presented at: ACSW '17; January 30- February 3, 2017;1-8; Geelong, Australia. URL: https://dl.acm.org/doi/10.1145/3014812.3014874 [ CrossRef ]
  • Madigan EA, Curet OL. A data mining approach in home healthcare: outcomes and service use. BMC Health Serv Res. Feb 24, 2006;6(1):18. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Armstrong JJ, Zhu M, Hirdes JP, Stolee P. K-means cluster analysis of rehabilitation service users in the home health care system of Ontario: examining the heterogeneity of a complex geriatric population. Arch Phys Med Rehabil. Dec 2012;93(12):2198-2205. [ CrossRef ] [ Medline ]
  • Islam MS, Liu D, Wang K, Zhou P, Yu L, Wu D. A case study of healthcare platform using big data analytics and machine learning. In: Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference. 2019. Presented at: HPCCT '19; June 22-24, 2019;139-146; Guangzhou, China. URL: https://dl.acm.org/doi/10.1145/3341069.3342980 [ CrossRef ]
  • Delias P, Doumpos M, Grigoroudis E, Manolitzas P, Matsatsinis N. Supporting healthcare management decisions via robust clustering of event logs. Knowl Based Syst. Aug 2015;84:203-213. [ CrossRef ]
  • Lefèvre T, Rondet C, Parizot I, Chauvin P. Applying multivariate clustering techniques to health data: the 4 types of healthcare utilization in the Paris metropolitan area. PLoS One. Dec 15, 2014;9(12):e115064. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ahmad P, Qamar S, Qasim Afser Rizvi S. Techniques of data mining in healthcare: a review. Int J Comput Appl. Jun 18, 2015;120(15):38-50. [ CrossRef ]
  • Mahoto NA, Shaikh FK, Ansari AQ. Exploitation of clustering techniques in transactional healthcare data. Mehran Univ Res J Eng Technol. 2014;33(1):77-92.
  • Zahi S, Achchab B. Clustering of the population benefiting from health insurance using k-means. In: Proceedings of the 4th International Conference on Smart City Applications. 2019. Presented at: SCA '19; October 2-4, 2019;1-6; Casablanca, Morocco. URL: https://dl.acm.org/doi/abs/10.1145/3368756.3369103 [ CrossRef ]
  • Jain AK. Data clustering: 50 years beyond k-means. Pattern Recognit Lett. Jun 2010;31(8):651-666. [ CrossRef ]
  • Silitonga P. Clustering of patient disease data by using k-means clustering. Int J Comput Sci Inf Sec. 2017;15(7):219-221. [ FREE Full text ]
  • Shakeel PM, Baskar S, Dhulipala VR, Jaber MM. Cloud based framework for diagnosis of diabetes mellitus using k-means clustering. Health Inf Sci Syst. Dec 24, 2018;6(1):16. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Berry E, Davies M, Dempster M. Illness perception clusters and relationship quality are associated with diabetes distress in adults with type 2 diabetes. Psychol Health Med. Oct 19, 2017;22(9):1118-1126. [ CrossRef ] [ Medline ]
  • Harrison S, Robertson N, Graham C, Williams J, Steiner M, Morgan M, et al. Can we identify patients with different illness schema following an acute exacerbation of COPD: a cluster analysis. Respir Med. Feb 2014;108(2):319-328. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Lopes AC, Xavier RF, Ac Pereira AC, Stelmach R, Fernandes FL, Harrison SL, et al. Identifying COPD patients at risk for worse symptoms, HRQoL, and self-efficacy: a cluster analysis. Chronic Illn. Jun 17, 2019;15(2):138-148. [ CrossRef ] [ Medline ]
  • Cikes M, Sanchez-Martinez S, Claggett B, Duchateau N, Piella G, Butakoff C, et al. Machine learning-based phenogrouping in heart failure to identify responders to cardiac resynchronization therapy. Eur J Heart Fail. Jan 17, 2019;21(1):74-85. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Violán C, Roso-Llorach A, Foguet-Boreu Q, Guisado-Clavero M, Pons-Vigués M, Pujol-Ribera E, et al. Multimorbidity patterns with K-means nonhierarchical cluster analysis. BMC Fam Pract. Jul 03, 2018;19(1):108. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • McCauley CO, Bond RB, Ryan A, Mulvenna MD, Laird L, Gibson A, et al. Evaluating user engagement with a reminiscence app using cross-comparative analysis of user event logs and qualitative data. Cyberpsychol Behav Soc Netw. Aug 2019;22(8):543-551. [ CrossRef ] [ Medline ]
  • Waterman H, Tillen D, Dickson R, de Koning K. Action research: a systematic review and guidance for assessment. Health Technol Assess. 2001;5(23):iii-157. [ FREE Full text ] [ Medline ]
  • Bashshur RL, Shannon GW, Smith BR, Alverson DC, Antoniotti N, Barsan WG, et al. The empirical foundations of telemedicine interventions for chronic disease management. Telemed J E Health. Sep 2014;20(9):769-800. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Dinsmore J, Hannigan C, Smith S, Murphy E, Kuiper JM, O'Byrne E, et al. A digital health platform for integrated and proactive patient-centered multimorbidity self-management and care (ProACT): protocol for an action research proof-of-concept trial. JMIR Res Protoc. Dec 15, 2021;10(12):e22125. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Doyle J, Murphy E, Kuiper J, Smith S, Hannigan C, Jacobs A, et al. Managing multimorbidity: identifying design requirements for a digital self-management tool to support older adults with multiple chronic conditions. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 2019. Presented at: CHI '19; May 4-9, 2019;1-14; Glasgow, Scotland. URL: https://dl.acm.org/doi/10.1145/3290605.3300629 [ CrossRef ]
  • Doyle J, Murphy E, Hannigan C, Smith S, Bettencourt-Silva J, Dinsmore J. Designing digital goal support systems for multimorbidity self-management: insights from older adults and their care network. In: Proceedings of the 12th EAI International Conference on Pervasive Computing Technologies for Healthcare. 2018. Presented at: PervasiveHealth '18; May 21-24, 2018;168-177; New York, NY. URL: https://dl.acm.org/doi/10.1145/3240925.3240982 [ CrossRef ]
  • Doyle J, Murphy E, Gavin S, Pascale A, Deparis S, Tommasi P, et al. A digital platform to support self-management of multiple chronic conditions (ProACT): findings in relation to engagement during a one-year proof-of-concept trial. J Med Internet Res. Dec 15, 2021;23(12):e22672. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Doyle J, McAleer P, van Leeuwen C, Smith S, Murphy E, Sillevis Smitt M, et al. The role of phone-based triage nurses in supporting older adults with multimorbidity to digitally self-manage - findings from the ProACT proof-of-concept study. Digit Health. Oct 09, 2022;8:20552076221131140. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ross A, Willson VL. One-way ANOVA. In: Ross A, Willson VL, editors. Basic and Advanced Statistical Tests: Writing Results Sections and Creating Tables and Figures. Cham, Switzerland. Springer; 2017;21-24.
  • Dancey CP, Reidy J. Statistics without Maths for Psychology. Upper Saddle River, NJ. Prentice Hall; 2007.
  • Akoglu H. User's guide to correlation coefficients. Turk J Emerg Med. Sep 2018;18(3):91-93. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Master AM, Dublin LI, Marks HH. The normal blood pressure range and its clinical implications. J Am Med Assoc. Aug 26, 1950;143(17):1464-1470. [ CrossRef ] [ Medline ]
  • Cunha JP. What is a good oxygen rate by age? eMedicineHealth. URL: https://www.emedicinehealth.com/what_is_a_good_oxygen_rate_by_age/article_em.htm [accessed 2024-01-29]
  • Echevarria C, Steer J, Wason J, Bourke S. Oxygen therapy and inpatient mortality in COPD exacerbation. Emerg Med J. Mar 26, 2021;38(3):170-177. [ CrossRef ] [ Medline ]
  • Atkinson Jr RL, Butterfield G, Dietz W, Fernstrom J, Frank A, Hansen B. Weight Management: State of the Science and Opportunities for Military Programs. Washington, DC. National Academies Press; 2003.
  • Kaplan AL, Cohen ER, Zimlichman E. Improving patient engagement in self-measured blood pressure monitoring using a mobile health technology. Health Inf Sci Syst. Dec 07, 2017;5(1):4. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Mikolasek M, Witt CM, Barth J. Adherence to a mindfulness and relaxation self-care app for cancer patients: mixed-methods feasibility study. JMIR Mhealth Uhealth. Dec 06, 2018;6(12):e11271. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Harjumaa M, Halttu K, Koistinen K, Oinas-Kukkonen H. User experience of mobile coaching for stress-management to tackle prevalent health complaints. In: Proceedings of the 6th Scandinavian Conference on Information Systems. 2015. Presented at: SCIS '15; August 9-12, 2015; Oulu, Finland. URL: https:/​/cris.​vtt.fi/​en/​publications/​user-experience-of-mobile-coaching-for-stress-management-to-tackl [ CrossRef ]
  • Kannisto KA, Korhonen J, Adams CE, Koivunen MH, Vahlberg T, Välimäki MA. Factors associated with dropout during recruitment and follow-up periods of a mHealth-based randomized controlled trial for mobile.net to encourage treatment adherence for people with serious mental health problems. J Med Internet Res. Feb 21, 2017;19(2):e46. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Abel EA, Shimada SL, Wang K, Ramsey C, Skanderson M, Erdos J, et al. Dual use of a patient portal and clinical video telehealth by veterans with mental health diagnoses: retrospective, cross-sectional analysis. J Med Internet Res. Nov 07, 2018;20(11):e11350. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Peek ST, Wouters EJ, van Hoof J, Luijkx KG, Boeije HR, Vrijhoef HJ. Factors influencing acceptance of technology for aging in place: a systematic review. Int J Med Inform. Apr 2014;83(4):235-248. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Weinstock RS, Aleppo G, Bailey TS, Bergenstal RM, Fisher WA, Greenwood DA, et al. The role of blood glucose monitoring in diabetes management. Compendia. Oct 2022;2020(3):1-32. [ CrossRef ] [ Medline ]
  • Rahman QA, Janmohamed T, Pirbaglou M, Ritvo P, Heffernan JM, Clarke H, et al. Patterns of user engagement with the mobile app, manage my pain: results of a data mining investigation. JMIR Mhealth Uhealth. Jul 12, 2017;5(7):e96. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Booth FG, Bond RR, Mulvenna MD, Cleland B, McGlade K, Rankin D, et al. Discovering and comparing types of general practitioner practices using geolocational features and prescribing behaviours by means of K-means clustering. Sci Rep. Sep 14, 2021;11(1):18289. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Sulistyono MT, Pane ES, Wibawa AD, Purnomo MH. Analysis of EEG-based stroke severity groups clustering using k-means. In: Proceedings of the 2021 International Seminar on Intelligent Technology and Its Applications. 2021. Presented at: ISITIA '21; July 21-22, 2021;67-74; Surabaya, Indonesia. URL: https://ieeexplore.ieee.org/document/9502250 [ CrossRef ]
  • Oskooei A, Chau SM, Weiss J, Sridhar A, Martínez MR, Michel B. DeStress: deep learning for unsupervised identification of mental stress in firefighters from heart-rate variability (HRV) data. In: Shaban-Nejad A, Michalowski M, Buckeridge DL, editors. Explainability and Interpretability: Keys to Deep Medicine. Cham, Switzerland. Springer; 2020;93-105.
  • Sheng Y, Doyle J, Bond R, Jaiswal R, Gavin S, Dinsmore J. Home-based digital health technologies for older adults to self-manage multiple chronic conditions: a data-informed analysis of user engagement from a longitudinal trial. Digit Health. Sep 22, 2022;8:20552076221125957. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Böhm AK, Jensen ML, Sørensen MR, Stargardt T. Real-world evidence of user engagement with mobile health for diabetes management: longitudinal observational study. JMIR Mhealth Uhealth. Nov 06, 2020;8(11):e22212. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kim SS, Malhotra NK. A longitudinal model of continued is use: an integrative view of four mechanisms underlying postadoption phenomena. Manag Sci. May 2005;51(5):741-755. [ CrossRef ]
  • Yuan S, Ma W, Kanthawala S, Peng W. Keep using my health apps: discover users' perception of health and fitness apps with the UTAUT2 model. Telemed J E Health. Sep 2015;21(9):735-741. [ CrossRef ] [ Medline ]
  • O'Connor Y, O'Reilly P, O'Donoghue J. M-health infusion by healthcare practitioners in the national health services (NHS). Health Policy Technol. Mar 2013;2(1):26-35. [ CrossRef ]
  • Milani RV, Lavie CJ, Bober RM, Milani AR, Ventura HO. Improving hypertension control and patient engagement using digital tools. Am J Med. Jan 2017;130(1):14-20. [ CrossRef ] [ Medline ]
  • Williams B, Mancia G, Spiering W, Agabiti Rosei E, Azizi M, Burnier M, et al. ESC Scientific Document Group. 2018 ESC/ESH guidelines for the management of arterial hypertension: the task force for the management of arterial hypertension of the European Society of Cardiology (ESC) and the European Society of Hypertension (ESH). Eur Heart J. Sep 01, 2018;39(33):3021-3104. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Quinn CC, Butler EC, Swasey KK, Shardell MD, Terrin MD, Barr EA, et al. Mobile diabetes intervention study of patient engagement and impact on blood glucose: mixed methods analysis. JMIR Mhealth Uhealth. Feb 02, 2018;6(2):e31. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Sepah SC, Jiang L, Ellis RJ, McDermott K, Peters AL. Engagement and outcomes in a digital diabetes prevention program: 3-year update. BMJ Open Diabetes Res Care. Sep 07, 2017;5(1):e000422. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Carroll JK, Moorhead A, Bond R, LeBlanc WG, Petrella RJ, Fiscella K. Who uses mobile phone health apps and does use matter? a secondary data analytics approach. J Med Internet Res. Apr 19, 2017;19(4):e125. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Demark-Wahnefried W, Schmitz KH, Alfano CM, Bail JR, Goodwin PJ, Thomson CA, et al. Weight management and physical activity throughout the cancer care continuum. CA Cancer J Clin. Jan 2018;68(1):64-89. [ FREE Full text ] [ CrossRef ] [ Medline ]

Abbreviations

Edited by T Leung, T de Azevedo Cardoso; submitted 05.02.23; peer-reviewed by B Chaudhry, M Peeples, A DeVito Dabbs; comments to author 12.09.23; revised version received 25.10.23; accepted 29.01.24; published 28.03.24.

©Yiyang Sheng, Raymond Bond, Rajesh Jaiswal, John Dinsmore, Julie Doyle. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 28.03.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

  • Open access
  • Published: 26 March 2024

Predicting and improving complex beer flavor through machine learning

  • Michiel Schreurs   ORCID: orcid.org/0000-0002-9449-5619 1 , 2 , 3   na1 ,
  • Supinya Piampongsant 1 , 2 , 3   na1 ,
  • Miguel Roncoroni   ORCID: orcid.org/0000-0001-7461-1427 1 , 2 , 3   na1 ,
  • Lloyd Cool   ORCID: orcid.org/0000-0001-9936-3124 1 , 2 , 3 , 4 ,
  • Beatriz Herrera-Malaver   ORCID: orcid.org/0000-0002-5096-9974 1 , 2 , 3 ,
  • Christophe Vanderaa   ORCID: orcid.org/0000-0001-7443-5427 4 ,
  • Florian A. Theßeling 1 , 2 , 3 ,
  • Łukasz Kreft   ORCID: orcid.org/0000-0001-7620-4657 5 ,
  • Alexander Botzki   ORCID: orcid.org/0000-0001-6691-4233 5 ,
  • Philippe Malcorps 6 ,
  • Luk Daenen 6 ,
  • Tom Wenseleers   ORCID: orcid.org/0000-0002-1434-861X 4 &
  • Kevin J. Verstrepen   ORCID: orcid.org/0000-0002-3077-6219 1 , 2 , 3  

Nature Communications volume 15, Article number: 2368 (2024)


  • Chemical engineering
  • Gas chromatography
  • Machine learning
  • Metabolomics
  • Taste receptors

The perception and appreciation of food flavor depends on many interacting chemical compounds and external factors, and therefore proves challenging to understand and predict. Here, we combine extensive chemical and sensory analyses of 250 different beers to train machine learning models that allow predicting flavor and consumer appreciation. For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel and map data from over 180,000 consumer reviews to train 10 different machine learning models. The best-performing algorithm, Gradient Boosting, yields models that significantly outperform predictions based on conventional statistics and accurately predict complex food features and consumer appreciation from chemical profiles. Model dissection allows identifying specific and unexpected compounds as drivers of beer flavor and appreciation. Adding these compounds results in variants of commercial alcoholic and non-alcoholic beers with improved consumer appreciation. Together, our study reveals how big data and machine learning uncover complex links between food chemistry, flavor and consumer perception, and lays the foundation to develop novel, tailored foods with superior flavors.


Introduction

Predicting and understanding food perception and appreciation is one of the major challenges in food science. Accurate modeling of food flavor and appreciation could yield important opportunities for both producers and consumers, including quality control, product fingerprinting, counterfeit detection, spoilage detection, and the development of new products and product combinations (food pairing) 1 , 2 , 3 , 4 , 5 , 6 . Accurate models for flavor and consumer appreciation would contribute greatly to our scientific understanding of how humans perceive and appreciate flavor. Moreover, accurate predictive models would also facilitate and standardize existing food assessment methods and could supplement or replace assessments by trained and consumer tasting panels, which are variable, expensive and time-consuming 7 , 8 , 9 . Lastly, apart from providing objective, quantitative, accurate and contextual information that can help producers, models can also guide consumers in understanding their personal preferences 10 .

Despite the myriad of applications, predicting food flavor and appreciation from its chemical properties remains a largely elusive goal in sensory science, especially for complex food and beverages 11 , 12 . A key obstacle is the immense number of flavor-active chemicals underlying food flavor. Flavor compounds can vary widely in chemical structure and concentration, making them technically challenging and labor-intensive to quantify, even in the face of innovations in metabolomics, such as non-targeted metabolic fingerprinting 13 , 14 . Moreover, sensory analysis is perhaps even more complicated. Flavor perception is highly complex, resulting from hundreds of different molecules interacting at the physiochemical and sensorial level. Sensory perception is often non-linear, characterized by complex and concentration-dependent synergistic and antagonistic effects 15 , 16 , 17 , 18 , 19 , 20 , 21 that are further convoluted by the genetics, environment, culture and psychology of consumers 22 , 23 , 24 . Perceived flavor is therefore difficult to measure, with problems of sensitivity, accuracy, and reproducibility that can only be resolved by gathering sufficiently large datasets 25 . Trained tasting panels are considered the prime source of quality sensory data, but require meticulous training, are low throughput and high cost. Public databases containing consumer reviews of food products could provide a valuable alternative, especially for studying appreciation scores, which do not require formal training 25 . Public databases offer the advantage of amassing large amounts of data, increasing the statistical power to identify potential drivers of appreciation. However, public datasets suffer from biases, including a bias in the volunteers that contribute to the database, as well as confounding factors such as price, cult status and psychological conformity towards previous ratings of the product.

Classical multivariate statistics and machine learning methods have been used to predict flavor of specific compounds by, for example, linking structural properties of a compound to its potential biological activities or linking concentrations of specific compounds to sensory profiles 1 , 26 . Importantly, most previous studies focused on predicting organoleptic properties of single compounds (often based on their chemical structure) 27 , 28 , 29 , 30 , 31 , 32 , 33 , thus ignoring the fact that these compounds are present in a complex matrix in food or beverages and excluding complex interactions between compounds. Moreover, the classical statistics commonly used in sensory science 34 , 35 , 36 , 37 , 38 , 39 require a large sample size and sufficient variance amongst predictors to create accurate models. They are not fit for studying an extensive set of hundreds of interacting flavor compounds, since they are sensitive to outliers, have a high tendency to overfit and are less suited for non-linear and discontinuous relationships 40 .

In this study, we combine extensive chemical analyses and sensory data of a set of different commercial beers with machine learning approaches to develop models that predict taste, smell, mouthfeel and appreciation from compound concentrations. Beer is particularly suited to model the relationship between chemistry, flavor and appreciation. First, beer is a complex product, consisting of thousands of flavor compounds that partake in complex sensory interactions 41 , 42 , 43 . This chemical diversity arises from the raw materials (malt, yeast, hops, water and spices) and biochemical conversions during the brewing process (kilning, mashing, boiling, fermentation, maturation and aging) 44 , 45 . Second, the advent of the internet saw beer consumers embrace online review platforms, such as RateBeer (ZX Ventures, Anheuser-Busch InBev SA/NV) and BeerAdvocate (Next Glass, inc.). In this way, the beer community provides massive data sets of beer flavor and appreciation scores, creating extraordinarily large sensory databases to complement the analyses of our professional sensory panel. Specifically, we characterize over 200 chemical properties of 250 commercial beers, spread across 22 beer styles, and link these to the descriptive sensory profiling data of a 16-person in-house trained tasting panel and data acquired from over 180,000 public consumer reviews. These unique and extensive datasets enable us to train a suite of machine learning models to predict flavor and appreciation from a beer’s chemical profile. Dissection of the best-performing models allows us to pinpoint specific compounds as potential drivers of beer flavor and appreciation. Follow-up experiments confirm the importance of these compounds and ultimately allow us to significantly improve the flavor and appreciation of selected commercial beers. 
Together, our study represents a significant step towards understanding complex flavors and reinforces the value of machine learning to develop and refine complex foods. In this way, it represents a stepping stone for further computer-aided food engineering applications 46 .

To generate a comprehensive dataset on beer flavor, we selected 250 commercial Belgian beers across 22 different beer styles (Supplementary Fig.  S1 ). Beers with ≤ 4.2% alcohol by volume (ABV) were classified as non-alcoholic and low-alcoholic. Blonds and Tripels constitute a significant portion of the dataset (12.4% and 11.2%, respectively) reflecting their presence on the Belgian beer market and the heterogeneity of beers within these styles. By contrast, lager beers are less diverse and dominated by a handful of brands. Rare styles such as Brut or Faro make up only a small fraction of the dataset (2% and 1%, respectively) because fewer of these beers are produced and because they are dominated by distinct characteristics in terms of flavor and chemical composition.

Extensive analysis identifies relationships between chemical compounds in beer

For each beer, we measured 226 different chemical properties, including common brewing parameters such as alcohol content, iso-alpha acids, pH, sugar concentration 47 , and over 200 flavor compounds (Methods, Supplementary Table  S1 ). A large portion (37.2%) are terpenoids arising from hopping, responsible for herbal and fruity flavors 16 , 48 . A second major category comprises yeast metabolites, such as esters and alcohols, that result in fruity and solvent notes 48 , 49 , 50 . Other measured compounds are primarily derived from malt or from other microbes such as non- Saccharomyces yeasts and bacteria (‘wild flora’). Compounds that arise from spices or staling are labeled under ‘Others’. Five attributes (caloric value, total acids, total esters, hop aroma and sulfur compounds) are calculated from multiple individually measured compounds.

As a first step in identifying relationships between chemical properties, we determined correlations between the concentrations of the compounds (Fig.  1 , upper panel, Supplementary Data  1 and 2 , and Supplementary Fig.  S2 . For the sake of clarity, only a subset of the measured compounds is shown in Fig.  1 ). Compounds of the same origin typically show a positive correlation, while absence of correlation hints at parameters varying independently. For example, the hop aroma compounds citronellol and alpha-terpineol show moderate correlations with each other (Spearman’s rho=0.39 and 0.57), but not with the bittering hop component iso-alpha acids (Spearman’s rho=0.16 and −0.07). This illustrates how brewers can independently modify hop aroma and bitterness by selecting hop varieties and dosage time. If hops are added early in the boiling phase, chemical conversions increase bitterness while aromas evaporate; conversely, late addition of hops preserves aroma but limits bitterness 51 . Similarly, hop-derived iso-alpha acids show a strong anti-correlation with lactic acid and acetic acid, likely reflecting growth inhibition of lactic acid and acetic acid bacteria, or the consequent use of fewer hops in sour beer styles, such as West Flanders ales and Fruit beers, that rely on these bacteria for their distinct flavors 52 . Finally, yeast-derived esters (ethyl acetate, ethyl decanoate, ethyl hexanoate, ethyl octanoate) and alcohols (ethanol, isoamyl alcohol, isobutanol, and glycerol) correlate with Spearman coefficients above 0.5, suggesting that these secondary metabolites are correlated with the yeast genetic background and/or fermentation parameters and may be difficult to influence individually, although the choice of yeast strain may offer some control 53 .
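The compound–compound relationships above use Spearman’s rank correlation, which captures monotonic rather than strictly linear associations. A minimal pure-Python sketch, with invented concentration values for three compounds (real analyses would run `scipy.stats.spearmanr` over the full 226-property table):

```python
def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (no ties assumed,
    so both rank vectors are permutations with equal variance)."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    mean = (len(x) + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var

# Invented concentrations across five hypothetical beers:
citronellol = [1.2, 3.4, 0.8, 2.9, 1.5]
alpha_terpineol = [0.9, 2.8, 0.7, 2.5, 1.1]   # co-varies with citronellol
iso_alpha_acids = [25, 18, 40, 22, 35]        # varies independently of aroma hops

print(spearman(citronellol, alpha_terpineol))  # → 1.0 (perfect rank agreement)
print(spearman(citronellol, iso_alpha_acids))  # → -0.9
```

The toy data mirror the pattern reported in the text: aroma terpenoids track each other while iso-alpha acids vary on their own.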

figure 1

Spearman rank correlations are shown. Descriptors are grouped according to their origin (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)), and sensory aspect (aroma, taste, palate, and overall appreciation). Please note that for the chemical compounds, for the sake of clarity, only a subset of the total number of measured compounds is shown, with an emphasis on the key compounds for each source. For more details, see the main text and Methods section. Chemical data can be found in Supplementary Data  1 , correlations between all chemical compounds are depicted in Supplementary Fig.  S2 and correlation values can be found in Supplementary Data  2 . See Supplementary Data  4 for sensory panel assessments and Supplementary Data  5 for correlation values between all sensory descriptors.

Interestingly, different beer styles show distinct patterns for some flavor compounds (Supplementary Fig.  S3 ). These observations agree with expectations for key beer styles, and serve as a control for our measurements. For instance, Stouts generally show high values for color (darker), while hoppy beers contain elevated levels of iso-alpha acids, compounds associated with bitter hop taste. Acetic and lactic acid are not prevalent in most beers, with notable exceptions such as Kriek, Lambic, Faro, West Flanders ales and Flanders Old Brown, which use acid-producing bacteria ( Lactobacillus and Pediococcus ) or unconventional yeast ( Brettanomyces ) 54 , 55 . Glycerol, ethanol and esters show similar distributions across all beer styles, reflecting their common origin as products of yeast metabolism during fermentation 45 , 53 . Finally, low/no-alcohol beers contain low concentrations of glycerol and esters. This is in line with the production process for most of the low/no-alcohol beers in our dataset, which are produced through limiting fermentation or by stripping away alcohol via evaporation or dialysis, with both methods having the unintended side-effect of reducing the amount of flavor compounds in the final beer 56 , 57 .

Besides expected associations, our data also reveals less trivial associations between beer styles and specific parameters. For example, geraniol and citronellol, two monoterpenoids responsible for citrus, floral and rose flavors and characteristic of Citra hops, are found in relatively high amounts in Christmas, Saison, and Brett/co-fermented beers, where they may originate from terpenoid-rich spices such as coriander seeds instead of hops 58 .

Tasting panel assessments reveal sensorial relationships in beer

To assess the sensory profile of each beer, a trained tasting panel evaluated each of the 250 beers for 50 sensory attributes, including different hop, malt and yeast flavors, off-flavors and spices. Panelists used a tasting sheet (Supplementary Data  3 ) to score the different attributes. Panel consistency was evaluated by repeating 12 samples across different sessions and performing ANOVA. In 95% of cases no significant difference was found across sessions ( p  > 0.05), indicating good panel consistency (Supplementary Table  S2 ).

Aroma and taste perception reported by the trained panel are often linked (Fig.  1 , bottom left panel and Supplementary Data  4 and 5 ), with high correlations between hop aroma and taste (Spearman’s rho=0.83). Bitter taste was found to correlate with hop aroma and taste in general (Spearman’s rho=0.80 and 0.69), and particularly with “grassy” noble hops (Spearman’s rho=0.75). Barnyard flavor, most often associated with sour beers, co-occurs with the stale hops (Spearman’s rho=0.97) typically used in these beers. Lactic and acetic acid, which often co-occur, are correlated (Spearman’s rho=0.66). Interestingly, sweetness and bitterness are anti-correlated (Spearman’s rho = −0.48), confirming the hypothesis that they mask each other 59 , 60 . Beer body is highly correlated with alcohol (Spearman’s rho = 0.79), and overall appreciation is found to correlate with multiple aspects that describe beer mouthfeel (alcohol, carbonation; Spearman’s rho= 0.32, 0.39), as well as with hop and ester aroma intensity (Spearman’s rho=0.39 and 0.35).

Similar to the chemical analyses, sensorial analyses confirmed typical features of specific beer styles (Supplementary Fig.  S4 ). For example, sour beers (Faro, Flanders Old Brown, Fruit beer, Kriek, Lambic, West Flanders ale) were rated acidic, with flavors of both acetic and lactic acid. Hoppy beers were found to be bitter and showed hop-associated aromas like citrus and tropical fruit. Malt taste is most detected among scotch, stout/porters, and strong ales, while low/no-alcohol beers, which often have a reputation for being ‘worty’ (reminiscent of unfermented, sweet malt extract) appear in the middle. Unsurprisingly, hop aromas are most strongly detected among hoppy beers. Like its chemical counterpart (Supplementary Fig.  S3 ), acidity shows a right-skewed distribution, with the most acidic beers being Krieks, Lambics, and West Flanders ales.

Tasting panel assessments of specific flavors correlate with chemical composition

We find that the concentrations of several chemical compounds strongly correlate with specific aromas or tastes, as evaluated by the tasting panel (Fig.  2 , Supplementary Fig.  S5 , Supplementary Data  6 ). In some cases, these correlations confirm expectations and serve as a useful control for data quality. For example, iso-alpha acids, the bittering compounds in hops, strongly correlate with bitterness (Spearman’s rho=0.68), while ethanol and glycerol correlate with tasters’ perceptions of alcohol and body, the mouthfeel sensation of fullness (Spearman’s rho=0.82/0.62 and 0.72/0.57, respectively). Similarly, darker color from roasted malts is a good indicator of malt perception (Spearman’s rho=0.54).

figure 2

Heatmap colors indicate Spearman’s Rho. Axes are organized according to sensory categories (aroma, taste, mouthfeel, overall), chemical categories and chemical sources in beer (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)). See Supplementary Data  6 for all correlation values.

Interestingly, for some relationships between chemical compounds and perceived flavor, correlations are weaker than expected. For example, the rose-smelling phenethyl acetate only weakly correlates with floral aroma. This hints at more complex relationships and interactions between compounds and suggests a need for a more complex model than simple correlations. Lastly, we uncovered unexpected correlations. For instance, the esters ethyl decanoate and ethyl octanoate appear to correlate slightly with hop perception and bitterness, possibly due to their fruity flavor. Iron is anti-correlated with hop aromas and bitterness, most likely because it is also anti-correlated with iso-alpha acids. This could be a sign of metal chelation of hop acids 61 , given that our analyses measure unbound hop acids and total iron content, or could result from the higher iron content in dark and Fruit beers, which typically have less hoppy and bitter flavors 62 .

Public consumer reviews complement expert panel data

To complement and expand the sensory data of our trained tasting panel, we collected over 180,000 reviews of our 250 beers from the online consumer review platform RateBeer. This provided numerical scores for beer appearance, aroma, taste, palate and overall quality, as well as the average overall score.

Public datasets are known to suffer from biases, such as price, cult status and psychological conformity towards previous ratings of a product. For example, prices correlate with appreciation scores for these online consumer reviews (rho=0.49, Supplementary Fig.  S6 ), but not for our trained tasting panel (rho=0.19). This suggests that prices affect consumer appreciation, an effect previously reported for wine 63 , while blind tastings remain unaffected. Moreover, we observe that some beer styles, like lagers and non-alcoholic beers, generally receive lower scores, reflecting that online reviewers are mostly beer aficionados with a preference for specialty beers over lager beers. In general, we find a modest correlation between our trained panel’s overall appreciation score and the online consumer appreciation scores (Fig.  3 , rho=0.29). Apart from the aforementioned biases in the online datasets, serving temperature, sample freshness and surroundings, which are all tightly controlled during the tasting panel sessions, can vary tremendously across online consumers and can further contribute to differences (in, among others, appreciation) between the two categories of tasters. Importantly, in contrast to the overall appreciation scores, the professional panel’s results for many sensory aspects correlated well with those obtained from RateBeer reviews. Correlations were highest for features that are relatively easy to recognize even for untrained tasters, like bitterness, sweetness, alcohol and malt aroma (Fig.  3 and below).

figure 3

RateBeer text mining results can be found in Supplementary Data  7 . Rho values shown are Spearman correlation values, with asterisks indicating significant correlations ( p  < 0.05, two-sided). All p values were smaller than 0.001, except for Esters aroma (0.0553), Esters taste (0.3275), Esters aroma—banana (0.0019), Coriander (0.0508) and Diacetyl (0.0134).

Besides collecting consumer appreciation from these online reviews, we developed automated text analysis tools to gather additional data from review texts (Supplementary Data  7 ). Processing review texts on the RateBeer database yielded comparable results to the scores given by the trained panel for many common sensory aspects, including acidity, bitterness, sweetness, alcohol, malt, and hop tastes (Fig.  3 ). This is in line with what would be expected, since these attributes require less training for accurate assessment and are less influenced by environmental factors such as temperature, serving glass and odors in the environment. Consumer reviews also correlate well with our trained panel for 4-vinyl guaiacol, a compound associated with a very characteristic aroma. By contrast, correlations for more specific aromas like ester, coriander or diacetyl are much weaker in the online reviews, underscoring the importance of using a trained tasting panel and standardized tasting sheets with explicit factors to be scored for evaluating specific aspects of a beer. Taken together, our results suggest that public reviews are trustworthy for some, but not all, flavor features and can complement or substitute taste panel data for these sensory aspects.
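The paper does not detail its text-analysis pipeline here, but the core idea of extracting sensory attributes from review text can be sketched with simple keyword matching. All review snippets and attribute patterns below are invented, and naive matching has obvious limits (it cannot handle negations like "no bitterness"):

```python
import re

# Invented RateBeer-style review snippets; the real pipeline is more involved.
reviews = [
    "Pours golden. Very bitter, resiny hops and a sweet malty backbone.",
    "Sour and acidic, with lots of lactic funk.",
    "Sweet, bready malt; mild hop aroma.",
]

# Hypothetical regex patterns mapping vocabulary to sensory attributes.
attributes = {
    "bitterness": r"\bbitter\w*",
    "sweetness": r"\bsweet\w*",
    "acidity": r"\b(sour|acidic)\w*",
}

# Count how many reviews mention each attribute at least once.
mentions = {
    attr: sum(bool(re.search(pat, text.lower())) for text in reviews)
    for attr, pat in attributes.items()
}
print(mentions)
```

Aggregated over thousands of reviews per beer, such counts yield per-attribute scores that can be correlated with trained-panel ratings, as in Fig. 3.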

Models can predict beer sensory profiles from chemical data

The rich datasets of chemical analyses, tasting panel assessments and public reviews gathered in the first part of this study provided us with a unique opportunity to develop predictive models that link chemical data to sensorial features. Given the complexity of beer flavor, basic statistical tools such as correlations or linear regression may not always be the most suitable for making accurate predictions. Instead, we applied different machine learning models that can capture both simple linear and complex interactive relationships. Specifically, we constructed a set of regression models to predict (a) trained panel scores for beer flavor and quality and (b) public reviews’ appreciation scores from beer chemical profiles. We trained and tested 10 different models (Methods): 3 linear regression-based models (simple linear regression with first-order interactions (LR), lasso regression with first-order interactions (Lasso), and partial least squares regressor (PLSR)), 5 decision tree models (AdaBoost regressor (ABR), extra trees (ET), gradient boosting regressor (GBR), random forest (RF) and XGBoost regressor (XGBR)), 1 support vector regressor (SVR), and 1 artificial neural network (ANN) model.

To compare the performance of our machine learning models, the dataset was randomly split into a training and test set, stratified by beer style. After a model was trained on data in the training set, its performance was evaluated on its ability to predict the test set, quantified as the coefficient of determination of multi-output models (see Methods). Additionally, individual-attribute models were ranked per descriptor and the average rank was calculated, as proposed by Korneva et al. 64 . Importantly, both ways of evaluating the models’ performance agreed in general. Performance of the different models varied (Table  1 ). It should be noted that all models perform better at predicting RateBeer results than results from our trained tasting panel. One reason could be that sensory data is inherently variable, and this variability is averaged out with the large number of public reviews from RateBeer. Additionally, all tree-based models perform better at predicting taste than aroma. Linear models (LR) performed particularly poorly, with negative R 2 values, due to severe overfitting (training set R 2  = 1). Overfitting is a common issue in linear models with many parameters and limited samples, especially with interaction terms further amplifying the number of parameters. L1 regularization (Lasso) successfully overcomes this overfitting, out-competing multiple tree-based models on the RateBeer dataset. Similarly, the dimensionality reduction of PLSR avoids overfitting and improves performance, to some extent. Still, tree-based models (ABR, ET, GBR, RF and XGBR) show the best performance, out-competing the linear models (LR, Lasso, PLSR) commonly used in sensory science 65 .
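The evaluation setup (style-stratified train/test split, fit each model, score by R 2 on held-out beers) can be sketched with scikit-learn on synthetic data. Feature counts, style labels and hyperparameters below are placeholders, not the paper's; the non-linear term in the toy target illustrates why tree models can beat a linear baseline:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-ins: 250 "beers", 20 "chemical" features, one sensory target
# containing a non-linear term a linear model cannot capture.
X = rng.normal(size=(250, 20))
y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.3, size=250)
styles = rng.integers(0, 5, size=250)  # hypothetical beer-style labels

# Train/test split stratified by beer style, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=styles, random_state=42)

models = {
    "Lasso": Lasso(alpha=0.01),
    "RF": RandomForestRegressor(n_estimators=100, random_state=42),
    "GBR": GradientBoostingRegressor(random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, model.predict(X_te))
print({k: round(v, 2) for k, v in scores.items()})
```

On this toy target the tree-based models should out-score the lasso baseline, mirroring the pattern reported in Table 1.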

GBR models showed the best overall performance in predicting sensory responses from chemical information, with R 2 values up to 0.75 depending on the predicted sensory feature (Supplementary Table  S4 ). The GBR models predict consumer appreciation (RateBeer) better than our trained panel’s appreciation (R 2 value of 0.67 compared to R 2 value of 0.09) (Supplementary Table  S3 and Supplementary Table  S4 ). ANN models showed intermediate performance, likely because neural networks typically perform best with larger datasets 66 . The SVR shows intermediate performance, mostly due to the weak predictions of specific attributes that lower the overall performance (Supplementary Table  S4 ).

Model dissection identifies specific, unexpected compounds as drivers of consumer appreciation

Next, we leveraged our models to infer important contributors to sensory perception and consumer appreciation. Consumer preference is a crucial sensory aspect, because a product that shows low consumer appreciation scores often does not succeed commercially 25 . Additionally, the requirement for a large number of representative evaluators makes consumer trials one of the more costly and time-consuming aspects of product development. Hence, a model for predicting chemical drivers of overall appreciation would be a welcome addition to the available toolbox for food development and optimization.

Since GBR models on our RateBeer dataset showed the best overall performance, we focused on these models. Specifically, we used two approaches to identify important contributors. First, rankings of the most important predictors for each sensorial trait in the GBR models were obtained based on impurity-based feature importance (mean decrease in impurity). High-ranked parameters were hypothesized to be either the true causal chemical properties underlying the trait, to correlate with the actual causal properties, or to take part in sensory interactions affecting the trait 67 (Fig.  4A ). In a second approach, we used SHAP 68 to determine which parameters contributed most to the model for making predictions of consumer appreciation (Fig.  4B ). SHAP calculates parameter contributions to model predictions on a per-sample basis, which can be aggregated into an importance score.
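The first of the two approaches, ranking predictors by mean decrease in impurity, is available directly from a fitted scikit-learn GBR model. The sketch below uses synthetic data in which one "compound" dominates the target, so it should surface as the top predictor; the SHAP step is only indicated in a comment, since it requires the separate `shap` package:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
# Synthetic data: "appreciation" driven mostly by feature 0.
X = rng.normal(size=(250, 10))
y = 3 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=250)

gbr = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based (MDI) importance, as used for the Fig. 4A ranking:
ranking = np.argsort(gbr.feature_importances_)[::-1]
print("top predictors:", ranking[:3])
# SHAP contributions (Fig. 4B) would come from, e.g., shap.TreeExplainer(gbr),
# which attributes each individual prediction to features per sample.
```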

figure 4

A The impurity-based feature importance (mean decrease in impurity, MDI) calculated from the Gradient Boosting Regression (GBR) model predicting RateBeer appreciation scores. The top 15 highest ranked chemical properties are shown. B SHAP summary plot for the top 15 parameters contributing to our GBR model. Each point on the graph represents a sample from our dataset. The color represents the concentration of that parameter, with bluer colors representing low values and redder colors representing higher values. Greater absolute values on the horizontal axis indicate a higher impact of the parameter on the prediction of the model. C Spearman correlations between the 15 most important chemical properties and consumer overall appreciation. Numbers indicate the Spearman Rho correlation coefficient, and the rank of this correlation compared to all other correlations. The top 15 important compounds were determined using SHAP (panel B).

Both approaches identified ethyl acetate as the most predictive parameter for beer appreciation (Fig.  4 ). Ethyl acetate is the most abundant ester in beer with a typical ‘fruity’, ‘solvent’ and ‘alcoholic’ flavor, but is often considered less important than other esters like isoamyl acetate. The second most important parameter identified by SHAP is ethanol, the most abundant beer compound after water. Apart from directly contributing to beer flavor and mouthfeel, ethanol drastically influences the physical properties of beer, dictating how easily volatile compounds escape the beer matrix to contribute to beer aroma 69 . Importantly, it should also be noted that the importance of ethanol for appreciation is likely inflated by the very low appreciation scores of non-alcoholic beers (Supplementary Fig.  S4 ). Despite not often being considered a driver of beer appreciation, protein level also ranks highly in both approaches, possibly due to its effect on mouthfeel and body 70 . Lactic acid, which contributes to the tart taste of sour beers, is the fourth most important parameter identified by SHAP, possibly due to the generally high appreciation of sour beers in our dataset.

Interestingly, some of the most important predictive parameters for our model are not well-established as beer flavors or are even commonly regarded as being negative for beer quality. For example, our models identify methanethiol and ethyl phenyl acetate, the latter an ester commonly linked to beer staling 71 , as key factors contributing to beer appreciation. Although there is no doubt that high concentrations of these compounds are considered unpleasant, the positive effects of modest concentrations are not yet known 72 , 73 .

To compare our approach to conventional statistics, we evaluated how well the 15 most important SHAP-derived parameters correlate with consumer appreciation (Fig.  4C ). Interestingly, only 6 of the properties derived by SHAP rank amongst the top 15 most correlated parameters. For some chemical compounds, the correlations are so low that they would likely have been considered unimportant. For example, lactic acid, the fourth most important parameter, shows a bimodal distribution for appreciation, with sour beers forming a separate cluster that is missed entirely by the Spearman correlation. Additionally, the correlation plots reveal outliers, emphasizing the need for robust analysis tools. Together, this highlights the need for alternative models, like the Gradient Boosting model, that better grasp the complexity of (beer) flavor.

Finally, to observe the relationships between these chemical properties and their predicted targets, partial dependence plots were constructed for the six most important predictors of consumer appreciation 74 , 75 , 76 (Supplementary Fig.  S7 ). One-way partial dependence plots show how a change in concentration affects the predicted appreciation. These plots reveal an important limitation of our models: appreciation predictions remain constant at ever-increasing concentrations. This implies that once a threshold concentration is reached, further increasing the concentration does not affect appreciation. This is false, as it is well-documented that certain compounds become unpleasant at high concentrations, including ethyl acetate (‘nail polish’) 77 and methanethiol (‘sulfury’ and ‘rotten cabbage’) 78 . The inability of our models to grasp that flavor compounds have optimal levels, above which they become negative, is a consequence of working with commercial beer brands where (off-)flavors are rarely too high to negatively impact the product. The two-way partial dependence plots show how changing the concentration of two compounds influences predicted appreciation, visualizing their interactions (Supplementary Fig.  S7 ). In our case, the top 5 parameters are dominated by additive or synergistic interactions, with high concentrations for both compounds resulting in the highest predicted appreciation.

To assess the robustness of our best-performing models and model predictions, we performed 100 iterations of the GBR, RF and ET models. In general, all iterations of the models yielded similar performance (Supplementary Fig.  S8 ). Moreover, the main predictors (including the top predictors ethanol and ethyl acetate) remained virtually the same, especially for GBR and RF. For the iterations of the ET model, we did observe more variation in the top predictors, which is likely a consequence of the model’s inherent random architecture in combination with co-correlations between certain predictors. However, even in this case, several of the top predictors (ethanol and ethyl acetate) remain unchanged, although their rank in importance changes (Supplementary Fig.  S8 ).
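The robustness check, re-training with different random seeds and comparing the top predictors, can be sketched as follows on synthetic data (the paper ran 100 iterations each of GBR, RF and ET; 10 RF runs suffice for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))
y = 2 * X[:, 0] + X[:, 3] + rng.normal(scale=0.3, size=200)

# Re-train with different seeds and record the top predictor each time.
top = [
    int(np.argmax(
        RandomForestRegressor(n_estimators=50, random_state=seed)
        .fit(X, y)
        .feature_importances_))
    for seed in range(10)
]
print(top)  # a stable top predictor across seeds indicates robustness
```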

Next, we investigated if combining RateBeer and trained panel data into one consolidated dataset would lead to stronger models, under the hypothesis that such a model would suffer less from bias in the datasets. A GBR model was trained to predict appreciation on the combined dataset. This model underperformed compared to the RateBeer model, both in the native case and when including a dataset identifier (R 2  = 0.67, 0.26 and 0.42 for the RateBeer-only, combined, and combined-with-identifier models, respectively). For the latter, the dataset identifier is the most important feature (Supplementary Fig.  S9 ), while most of the feature importance remains unchanged, with ethyl acetate and ethanol ranking highest, like in the original model trained only on RateBeer data. It seems that the large variation in the panel dataset introduces noise, weakening the models’ performance and reliability. In addition, it seems reasonable to assume that both datasets are fundamentally different, with the panel dataset obtained through blind tastings by a trained professional panel.

Lastly, we evaluated whether beer style identifiers would further enhance the model’s performance. A GBR model was trained with parameters that explicitly encoded the styles of the samples. This did not improve model performance (R 2  = 0.66 with style information vs. R 2  = 0.67 without). The most important chemical features are consistent with the model trained without style information (e.g., ethanol and ethyl acetate), and with the exception of the most preferred (strong ale) and least preferred (low/no-alcohol) styles, none of the styles were among the most important features (Supplementary Fig.  S9 , Supplementary Tables  S5 and S6 ). This is likely due to a combination of style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original models, as well as the low number of samples belonging to some styles, making it difficult for the model to learn style-specific patterns. Moreover, beer styles are not rigorously defined, with some styles overlapping in features and some beers being misattributed to a specific style, all of which leads to more noise in models that use style parameters.

Model validation

To test if our predictive models give insight into beer appreciation, we set up experiments aimed at improving existing commercial beers. We specifically selected overall appreciation as the trait to be examined because of its complexity and commercial relevance. Beer flavor comprises a complex bouquet rather than single aromas and tastes 53 . Hence, adding a single compound to the extent that a difference is noticeable may lead to an unbalanced, artificial flavor. Therefore, we evaluated the effect of combinations of compounds. Because Blond beers represent the most extensive style in our dataset, we selected a beer from this style as the starting material for these experiments (Beer 64 in Supplementary Data  1 ).

In the first set of experiments, we adjusted the concentrations of compounds that made up the most important predictors of overall appreciation (ethyl acetate, ethanol, lactic acid, ethyl phenyl acetate) together with correlated compounds (ethyl hexanoate, isoamyl acetate, glycerol), bringing them up to 95th percentile ethanol-normalized concentrations (Methods) within the Blond group (‘Spiked’ concentration in Fig.  5A ). Compared to controls, the spiked beers were found to have significantly improved overall appreciation among trained panelists, with panelists noting increased intensity of ester flavors, sweetness, alcohol, and body fullness (Fig.  5B ). To disentangle the contribution of ethanol to these results, a second experiment was performed without the addition of ethanol. This resulted in a similar outcome, including increased perception of alcohol and overall appreciation.

figure 5

Adding the top chemical compounds, identified as best predictors of appreciation by our model, into poorly appreciated beers results in increased appreciation from our trained panel. Results of sensory tests between base beers and those spiked with compounds identified as the best predictors by the model. A Blond and Non/Low-alcohol (0.0% ABV) base beers were brought up to 95th-percentile ethanol-normalized concentrations within each style. B For each sensory attribute, tasters indicated the more intense sample and selected the sample they preferred. The numbers above the bars correspond to the p values that indicate significant changes in perceived flavor (two-sided binomial test: alpha 0.05, n  = 20 or 13).
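The significance test in the caption is a two-sided binomial test of taster preferences against chance (p = 0.5). A sketch with a hypothetical preference count, using `scipy.stats.binomtest`:

```python
from scipy.stats import binomtest

# Hypothetical forced-choice outcome: 16 of 20 tasters prefer the spiked beer.
n_tasters, n_prefer_spiked = 20, 16

# Two-sided binomial test against the chance probability of 0.5.
result = binomtest(n_prefer_spiked, n_tasters, p=0.5, alternative="two-sided")
significant = result.pvalue < 0.05  # alpha = 0.05, as in the paper
print(f"p = {result.pvalue:.4f}, significant: {significant}")
```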

In a last experiment, we tested whether using the model’s predictions can boost the appreciation of a non-alcoholic beer (beer 223 in Supplementary Data  1 ). Again, the addition of a mixture of predicted compounds (omitting ethanol, in this case) resulted in a significant increase in appreciation, body, ester flavor and sweetness.

Predicting flavor and consumer appreciation from chemical composition is one of the ultimate goals of sensory science. A reliable, systematic and unbiased way to link chemical profiles to flavor and food appreciation would be a significant asset to the food and beverage industry. Such tools would substantially aid in quality control and recipe development, offer an efficient and cost-effective alternative to pilot studies and consumer trials, and ultimately allow food manufacturers to produce superior, tailor-made products that better meet the demands of specific consumer groups.

A limited set of studies have previously tried, to varying degrees of success, to predict beer flavor and beer popularity based on (a limited set of) chemical compounds and flavors 79 , 80 . Current sensitive, high-throughput technologies allow measuring an unprecedented number of chemical compounds and properties in a large set of samples, yielding a dataset that can train models that help close the gaps between chemistry and flavor, even for a complex natural product like beer. To our knowledge, no previous research gathered data at this scale (250 samples, 226 chemical parameters, 50 sensory attributes and 5 consumer scores) to disentangle and validate the chemical aspects driving beer preference using various machine-learning techniques. We find that modern machine learning models outperform conventional statistical tools, such as correlations and linear models, and can successfully predict flavor appreciation from chemical composition. This could be attributed to the natural incorporation of interactions and non-linear or discontinuous effects in machine learning models, which are not easily grasped by the linear model architecture. While linear models and partial least squares regression represent the most widespread statistical approaches in sensory science, in part because they allow interpretation 65 , 81 , 82 , modern machine learning methods allow for building better predictive models while preserving the possibility to dissect and exploit the underlying patterns. Of the 10 different models we trained, tree-based models, such as our best performing GBR, showed the best overall performance in predicting sensory responses from chemical information, outcompeting artificial neural networks. This agrees with previous reports for models trained on tabular data 83 . Our results are in line with the findings of Colantonio et al., who also identified the gradient boosting architecture as performing best at predicting appreciation and flavor (of tomatoes and blueberries, in their specific study) 26 . Importantly, besides our larger experimental scale, we were able to directly confirm our models’ predictions in vivo.

Our study confirms that flavor compound concentration does not always correlate with perception, suggesting complex interactions that are often missed by more conventional statistics and simple models. Specifically, we find that tree-based algorithms may perform best in developing models that link complex food chemistry with aroma. Furthermore, we show that massive datasets of untrained consumer reviews provide a valuable source of data that can complement or even replace trained tasting panels, especially for appreciation and basic flavors, such as sweetness and bitterness. This holds despite biases that are known to occur in such datasets, such as price or conformity bias. Moreover, GBR models predict taste better than aroma. This is likely because taste (e.g., bitterness) often directly relates to the corresponding chemical measurements (e.g., iso-alpha acids), whereas such a link is less clear for aromas, which often result from the interplay between multiple volatile compounds. We also find that our models are best at predicting acidity and alcohol, likely because there is a direct relation between the measured chemical compounds (acids and ethanol) and the corresponding perceived sensory attributes (acidity and alcohol), and because even untrained consumers are generally able to recognize these flavors and aromas.

The predictions of our final models, trained on review data, hold even for blind tastings with small groups of trained tasters, as demonstrated by our ability to validate specific compounds as drivers of beer flavor and appreciation. Since adding a single compound at a noticeably different concentration may result in an unbalanced flavor profile, we specifically tested our identified key drivers as a combination of compounds. While this approach does not allow us to establish whether any single compound alone would affect flavor and/or appreciation, our experiments do show that this combination of compounds increases consumer appreciation.

It is important to stress that, while it represents an important step forward, our approach still has several major limitations. A key weakness of the GBR model architecture is that, amongst co-correlating variables, the largest main effect is consistently preferred for model building. As a result, co-correlating variables often receive artificially low importance scores, both for impurity and SHAP-based methods, as we observed in the comparison with the more randomized Extra Trees models. This implies that chemicals identified as key drivers of a specific sensory feature by GBR might not be the true causative compounds, but rather co-correlate with the actual causative chemical. For example, the high importance of ethyl acetate could be (partially) attributed to the total ester content, ethanol or ethyl hexanoate (rho=0.77, rho=0.72 and rho=0.68), while ethyl phenylacetate could hide the importance of prenyl isobutyrate and ethyl benzoate (rho=0.77 and rho=0.76). Expanding our GBR model to include beer style as a parameter did not yield additional power or insight. This is likely due to style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original model, as well as the smaller sample size per style, which limits the power to uncover style-specific patterns. This can be partly attributed to the curse of dimensionality, where the high number of parameters results in the models mainly incorporating single-parameter effects, rather than complex interactions such as style-dependent effects 67 . A larger number of samples may overcome some of these limitations and offer more insight into style-specific effects. On the other hand, beer style is not a rigid scientific classification, and beers within one style often differ considerably, which further complicates the analysis of style as a model factor.

Our study is limited to beers from Belgian breweries. Although these beers cover a large portion of the beer styles available globally, some beer styles and consumer patterns may be missing, while other features might be overrepresented. For example, many Belgian ales exhibit yeast-driven flavor profiles, which is reflected in the chemical drivers of appreciation discovered by this study. In future work, expanding the scope to include diverse markets and beer styles could lead to the identification of even more drivers of appreciation and better models for special niche products that were not present in our beer set.

In addition to the inherent limitations of GBR models, there are also some limitations associated with studying food aroma. Even though our chemical analyses measured most of the known aroma compounds, the total number of flavor compounds in complex foods like beer is still larger than the subset we were able to measure in this study. For example, hop-derived thiols, which influence flavor at very low concentrations, are notoriously difficult to measure in a high-throughput experiment. Moreover, consumer perception remains subjective and prone to biases that are difficult to avoid. It is also important to stress that the models are still immature and that more extensive datasets will be crucial for developing more complete models in the future. Besides the need for more samples and parameters, our dataset does not include any demographic information about the tasters. Including such data could lead to better models that capture external factors like age and culture. Another limitation is that our set of beers consists of high-quality end-products and lacks beers that are unfit for sale, which limits the current models in accurately predicting products that are poorly appreciated. Finally, while the models could be readily applied in quality control, their use in sensory science and product development is restrained by their inability to discern causal relationships. Given that the models cannot distinguish compounds that genuinely drive consumer perception from those that merely correlate, validation experiments are essential to identify true causative compounds.

Despite the inherent limitations, dissection of our models enabled us to pinpoint specific molecules as potential drivers of beer aroma and consumer appreciation, including compounds that were unexpected and would not have been identified using standard approaches. Important drivers of beer appreciation uncovered by our models include protein levels, ethyl acetate, ethyl phenylacetate and lactic acid. Currently, many brewers already use lactic acid to acidify their brewing water and ensure optimal pH for enzymatic activity during the mashing process. Our results suggest that adding lactic acid can also improve beer appreciation, although its individual effect remains to be tested. Interestingly, ethanol appears to be unnecessary to improve beer appreciation, both for blond beer and alcohol-free beer. Given the growing consumer interest in alcohol-free beer, with a predicted annual market growth of >7% 84 , it is relevant for brewers to know what compounds can further increase consumer appreciation of these beers. Hence, our model may readily provide avenues to further improve the flavor and consumer appreciation of both alcoholic and non-alcoholic beers, which is generally considered one of the key challenges for future beer production.

Whereas we see a direct implementation of our results for the development of superior alcohol-free beverages and other food products, our study can also serve as a stepping stone for the development of novel alcohol-containing beverages. We want to echo the growing body of scientific evidence for the negative effects of alcohol consumption, both on the individual level by the mutagenic, teratogenic and carcinogenic effects of ethanol 85 , 86 , as well as the burden on society caused by alcohol abuse and addiction. We encourage the use of our results for the production of healthier, tastier products, including novel and improved beverages with lower alcohol contents. Furthermore, we strongly discourage the use of these technologies to improve the appreciation or addictive properties of harmful substances.

The present work demonstrates that despite some important remaining hurdles, combining the latest developments in chemical analyses, sensory analysis and modern machine learning methods offers exciting avenues for food chemistry and engineering. Soon, these tools may provide solutions in quality control and recipe development, as well as new approaches to sensory science and flavor research.

Beer selection

A set of 250 commercial Belgian beers was selected to cover the broad diversity of beer styles and the corresponding diversity in chemical composition and aroma (see Supplementary Fig.  S1 ).

Chemical dataset

Sample preparation.

Beers within their expiration date were purchased from commercial retailers. Samples were prepared in biological duplicates at room temperature, unless explicitly stated otherwise. Bottle pressure was measured with a manual pressure device (Steinfurth Mess-Systeme GmbH) and used to calculate CO 2 concentration. The beer was poured through two filter papers (Macherey-Nagel, 500713032 MN 713 ¼) to remove carbon dioxide and prevent spontaneous foaming. Samples were then prepared for measurements by targeted Headspace-Gas Chromatography-Flame Ionization Detector/Flame Photometric Detector (HS-GC-FID/FPD), Headspace-Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry (HS-SPME-GC-MS), colorimetric analysis, enzymatic analysis, Near-Infrared (NIR) analysis, as described in the sections below. The mean values of biological duplicates are reported for each compound.

HS-GC-FID/FPD

HS-GC-FID/FPD (Shimadzu GC 2010 Plus) was used to measure higher alcohols, acetaldehyde, esters, 4-vinyl guaiacol, and sulfur compounds. Each measurement comprised 5 ml of sample pipetted into a 20 ml glass vial containing 1.75 g NaCl (VWR, 27810.295). 100 µl of 2-heptanol (Sigma-Aldrich, H3003) (internal standard) solution in ethanol (Fisher Chemical, E/0650DF/C17) was added for a final concentration of 2.44 mg/L. Samples were flushed with nitrogen for 10 s, sealed with a silicone septum, stored at −80 °C and analyzed in batches of 20.

The GC was equipped with a DB-WAXetr column (length, 30 m; internal diameter, 0.32 mm; layer thickness, 0.50 µm; Agilent Technologies, Santa Clara, CA, USA) to the FID and an HP-5 column (length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.25 µm; Agilent Technologies, Santa Clara, CA, USA) to the FPD. N 2 was used as the carrier gas. Samples were incubated for 20 min at 70 °C in the headspace autosampler (Flow rate, 35 cm/s; Injection volume, 1000 µL; Injection mode, split; Combi PAL autosampler, CTC analytics, Switzerland). The injector, FID and FPD temperatures were kept at 250 °C. The GC oven temperature was first held at 50 °C for 5 min and then allowed to rise to 80 °C at a rate of 5 °C/min, followed by a second ramp of 4 °C/min until 200 °C (held for 3 min) and a final ramp of 4 °C/min until 230 °C (held for 1 min). Results were analyzed with the GCSolution software version 2.4 (Shimadzu, Kyoto, Japan). The GC was calibrated with a 5% EtOH solution (VWR International) containing the volatiles under study (Supplementary Table  S7 ).

HS-SPME-GC-MS

HS-SPME-GC-MS (Shimadzu GCMS-QP-2010 Ultra) was used to measure additional volatile compounds, mainly comprising terpenoids and esters. Samples were analyzed by HS-SPME using a triphase DVB/Carboxen/PDMS 50/30 μm SPME fiber (Supelco Co., Bellefonte, PA, USA) followed by gas chromatography (Thermo Fisher Scientific Trace 1300 series, USA) coupled to a mass spectrometer (Thermo Fisher Scientific ISQ series MS) equipped with a TriPlus RSH autosampler. 5 ml of degassed beer sample was placed in 20 ml vials containing 1.75 g NaCl (VWR, 27810.295). 5 µl internal standard mix was added, containing 2-heptanol (1 g/L) (Sigma-Aldrich, H3003), 4-fluorobenzaldehyde (1 g/L) (Sigma-Aldrich, 128376), 2,3-hexanedione (1 g/L) (Sigma-Aldrich, 144169) and guaiacol (1 g/L) (Sigma-Aldrich, W253200) in ethanol (Fisher Chemical, E/0650DF/C17). Each sample was incubated at 60 °C in the autosampler oven with constant agitation. After 5 min equilibration, the SPME fiber was exposed to the sample headspace for 30 min. The compounds trapped on the fiber were thermally desorbed in the injection port of the chromatograph by heating the fiber for 15 min at 270 °C.

The GC-MS was equipped with a low polarity RXi-5Sil MS column (length, 20 m; internal diameter, 0.18 mm; layer thickness, 0.18 µm; Restek, Bellefonte, PA, USA). Injection was performed in splitless mode at 320 °C, a split flow of 9 ml/min, a purge flow of 5 ml/min and an open valve time of 3 min. To obtain a pulsed injection, a programmed gas flow was used whereby the helium gas flow was set at 2.7 mL/min for 0.1 min, followed by a decrease in flow of 20 ml/min to the normal 0.9 mL/min. The temperature was first held at 30 °C for 3 min and then allowed to rise to 80 °C at a rate of 7 °C/min, followed by a second ramp of 2 °C/min until 125 °C and a final ramp of 8 °C/min to a final temperature of 270 °C.

Mass acquisition range was 33 to 550 amu at a scan rate of 5 scans/s. Electron impact ionization energy was 70 eV. The interface and ion source were kept at 275 °C and 250 °C, respectively. A mix of linear n-alkanes (from C7 to C40, Supelco Co.) was injected into the GC-MS under identical conditions to serve as external retention index markers. Identification and quantification of the compounds were performed using an in-house developed R script as described in Goelen et al. and Reher et al. 87 , 88 (for package information, see Supplementary Table  S8 ). Briefly, chromatograms were analyzed using AMDIS (v2.71) 89 to separate overlapping peaks and obtain pure compound spectra. The NIST MS Search software (v2.0 g) in combination with the NIST2017, FFNSC3 and Adams4 libraries was used to manually identify the empirical spectra, taking into account the expected retention time. After background subtraction and correcting for retention time shifts between samples run on different days based on alkane ladders, compound elution profiles were extracted and integrated using a file with 284 target compounds of interest, which were either recovered in our identified AMDIS list of spectra or were known to occur in beer. Compound elution profiles were estimated for every peak in every chromatogram over a time-restricted window using weighted non-negative least square analysis after which peak areas were integrated 87 , 88 . Batch effect correction was performed by normalizing against the most stable internal standard compound, 4-fluorobenzaldehyde. Out of all 284 target compounds that were analyzed, 167 were visually judged to have reliable elution profiles and were used for final analysis.
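
The non-negative least squares step for resolving overlapping peaks can be illustrated with SciPy's `nnls`. This is a minimal sketch under invented conditions: the Gaussian peak shapes, positions and mixture amplitudes below are placeholders, not the study's actual elution profiles.

```python
import numpy as np
from scipy.optimize import nnls

# Toy deconvolution of two overlapping Gaussian elution profiles, in the
# spirit of the non-negative least squares fitting described above
# (peak positions, widths and amplitudes are invented for illustration).
t = np.linspace(0.0, 10.0, 200)

def gauss(mu, sigma=0.5):
    return np.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2))

# Observed chromatogram segment: a known mixture of two component profiles.
A = np.column_stack([gauss(4.0), gauss(5.0)])
signal = A @ np.array([3.0, 1.5])

# Solve for non-negative component amplitudes; peak areas then follow by
# integrating each fitted profile.
amplitudes, residual = nnls(A, signal)
print(amplitudes.round(2))
```

Because the constraint forces amplitudes to stay non-negative, the fit cannot compensate one peak with a physically meaningless negative contribution from its neighbor.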

Discrete photometric and enzymatic analysis

Discrete photometric and enzymatic analysis (Thermo Scientific TM Gallery TM Plus Beermaster Discrete Analyzer) was used to measure acetic acid, ammonia, beta-glucan, iso-alpha acids, color, sugars, glycerol, iron, pH, protein, and sulfite. 2 ml of sample volume was used for the analyses. Information regarding the reagents and standard solutions used for analyses and calibrations is included in Supplementary Table  S7 and Supplementary Table  S9 .

NIR analyses

NIR analysis (Anton Paar Alcolyzer Beer ME System) was used to measure ethanol. Measurements comprised 50 ml of sample, and a 10% EtOH solution was used for calibration.

Correlation calculations

Pairwise Spearman Rank correlations were calculated between all chemical properties.
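The correlation step can be sketched with pandas. The column names and values below are illustrative placeholders, not the study's 226 chemical parameters.

```python
import numpy as np
import pandas as pd

# Toy chemical dataset: rows are beers, columns are measured parameters
# (names and values are invented for illustration).
rng = np.random.default_rng(0)
chem = pd.DataFrame({
    "ethanol": rng.normal(5.0, 1.0, 50),
    "iso_alpha_acids": rng.normal(20.0, 5.0, 50),
    "lactic_acid": rng.normal(1.0, 0.3, 50),
})

# Pairwise Spearman rank correlations between all chemical properties;
# rank-based, so robust to monotone non-linear relationships.
rho = chem.corr(method="spearman")
print(rho.round(2))
```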

Sensory dataset

Trained panel.

Our trained tasting panel consisted of volunteers who gave prior verbal informed consent. All compounds used for the validation experiment were of food-grade quality. The tasting sessions were approved by the Social and Societal Ethics Committee of the KU Leuven (G-2022-5677-R2(MAR)). All online reviewers agreed to the Terms and Conditions of the RateBeer website.

Sensory analysis was performed according to the American Society of Brewing Chemists (ASBC) Sensory Analysis Methods 90 . Thirty volunteers were screened through a series of triangle tests. The sixteen most sensitive and consistent tasters were retained as taste panel members. The resulting panel was diverse in age [22–42, mean: 29], sex [56% male] and nationality [7 different countries]. The panel developed a consensus vocabulary to describe beer aroma, taste and mouthfeel. Panelists were trained to identify and score 50 different attributes, using a 7-point scale to rate attribute intensity. The scoring sheet is included as Supplementary Data  3 . Sensory assessments took place between 10 a.m. and 12 p.m. The beers were served in black-colored glasses. Per session, between 5 and 12 beers of the same style were tasted at 12 °C to 16 °C. Two reference beers were added to each set and indicated as ‘Reference 1 & 2’, allowing panel members to calibrate their ratings. Not all panelists were present at every tasting. Scores were scaled by standard deviation and mean-centered per taster. Values are represented as z-scores and clustered by Euclidean distance. Pairwise Spearman correlations were calculated between taste and aroma sensory attributes. Panel consistency was evaluated by repeating samples in different sessions and performing ANOVA to identify differences, using the ‘stats’ package (v4.2.2) in R (for package information, see Supplementary Table  S8 ).
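The per-taster standardization described above can be sketched in a few lines of pandas (the study used R; this Python sketch uses invented scores, not the actual panel data).

```python
import pandas as pd

# Toy long-format panel data: one row per (taster, beer) score for one
# attribute; tasters and values are invented for illustration.
scores = pd.DataFrame({
    "taster": ["A", "A", "B", "B", "C", "C"],
    "beer":   ["X", "Y", "X", "Y", "X", "Y"],
    "bitter": [2.0, 6.0, 3.0, 7.0, 1.0, 5.0],
})

# Mean-center and scale by standard deviation per taster (z-scores),
# so tasters who use the 7-point scale differently become comparable.
scores["bitter_z"] = scores.groupby("taster")["bitter"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=0)
)
print(scores)
```

After this transform, each taster's scores have mean zero and unit variance, which removes individual scale-use habits before averaging across the panel.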

Online reviews from a public database

The ‘scrapy’ package in Python (v3.6) (for package information, see Supplementary Table  S8 ) was used to collect 232,288 online reviews (mean=922, min=6, max=5343) from RateBeer, an online beer review database. Each review entry comprised 5 numerical scores (appearance, aroma, taste, palate and overall quality) and an optional review text. The total number of reviews per reviewer was collected separately. Numerical scores were scaled and centered per rater, and mean scores were calculated per beer.

For the review texts, the language was estimated using the packages ‘langdetect’ and ‘langid’ in Python. Reviews that were classified as English by both packages were kept. Reviewers with fewer than 100 entries overall were discarded, leaving 181,025 reviews from >6000 reviewers from >40 countries. Text processing was done using the ‘nltk’ package in Python. Texts were corrected for slang and misspellings; proper nouns and rare words that are relevant to the beer context were specified and kept as-is (‘Chimay’, ‘Lambic’, etc.). A dictionary of semantically similar sensory terms (for example, ‘floral’ and ‘flower’) was created, and such terms were collapsed into a single term. Words were stemmed and lemmatized to avoid identifying words such as ‘acid’ and ‘acidity’ as separate terms. Numbers and punctuation were removed.

Sentences from up to 50 randomly chosen reviews per beer were manually categorized according to the aspect of beer they describe (appearance, aroma, taste, palate, overall quality—not to be confused with the 5 numerical scores described above) or flagged as irrelevant if they contained no useful information. If a beer contained fewer than 50 reviews, all reviews were manually classified. This labeled data set was used to train a model that classified the rest of the sentences for all beers 91 . Sentences describing taste and aroma were extracted, and term frequency–inverse document frequency (TFIDF) was implemented to calculate enrichment scores for sensorial words per beer.

The sex of the tasting subject was not considered when building our sensory database. Instead, results from different panelists were averaged, both for our trained panel (56% male, 44% female) and the RateBeer reviews (70% male, 30% female for RateBeer as a whole).

Beer price collection and processing

Beer prices were collected from the following stores: Colruyt, Delhaize, Total Wine, BeerHawk, The Belgian Beer Shop, The Belgian Shop, and Beer of Belgium. Where applicable, prices were converted to Euros and normalized per liter. Spearman correlations were calculated between these prices and mean overall appreciation scores from RateBeer and the taste panel, respectively.

Pairwise Spearman Rank correlations were calculated between all sensory properties.

Machine learning models

Predictive modeling of sensory profiles from chemical data.

Regression models were constructed to predict (a) trained panel scores for beer flavors and quality and (b) public reviews’ appreciation scores, both from beer chemical profiles. Z-scores were used to represent sensory attributes in both data sets. Chemical properties with log-normal distributions (Shapiro-Wilk test, p  <  0.05 ) were log-transformed. Missing chemical measurements (0.1% of all data) were replaced with mean values per attribute. Observations from 250 beers were randomly separated into a training set (70%, 175 beers) and a test set (30%, 75 beers), stratified per beer style. Chemical measurements (p = 231) were normalized based on the training set average and standard deviation. In total, ten models were trained: three linear regression-based models, namely linear regression with first-order interaction terms (LR), lasso regression with first-order interaction terms (Lasso) and partial least squares regression (PLSR); five decision tree models, namely the AdaBoost regressor (ABR), Extra Trees (ET), the Gradient Boosting regressor (GBR), Random Forest (RF) and the XGBoost regressor (XGBR); one support vector machine model (SVR); and one artificial neural network model (ANN). The models were implemented using the ‘scikit-learn’ package (v1.2.2) and the ‘xgboost’ package (v1.7.3) in Python (v3.9.16). Models were trained, and hyperparameters optimized, using five-fold cross-validated grid search with the coefficient of determination (R 2 ) as the evaluation metric. The ANN (scikit-learn’s MLPRegressor) was optimized using Bayesian Tree-structured Parzen Estimator optimization with the ‘Optuna’ Python package (v3.2.0). Individual models were trained per attribute, and a multi-output model was trained on all attributes simultaneously.
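The core pipeline (train/test split, training-set-only normalization, cross-validated grid search for a GBR) can be sketched as follows. The data are synthetic and the tiny hyperparameter grid is illustrative, not the study's actual search space; the study additionally stratified the split by beer style.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data: 250 "beers" by 20 "chemical parameters"
# (the real study used 231 normalized chemical measurements).
rng = np.random.default_rng(1)
X = rng.normal(size=(250, 20))
y = 0.5 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=250)

# 70/30 train/test split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Normalize using training-set mean and standard deviation only,
# so no information leaks from the test set.
mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
X_tr, X_te = (X_tr - mu) / sd, (X_te - mu) / sd

# Five-fold cross-validated grid search with R^2 as the metric.
grid = {"n_estimators": [100, 300], "max_depth": [2, 3]}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      grid, cv=5, scoring="r2")
search.fit(X_tr, y_tr)
print("held-out R2:", round(search.score(X_te, y_te), 2))
```

Normalizing with training-set statistics before the search, rather than on the pooled data, keeps the held-out R² an honest estimate of generalization.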

Model dissection

GBR was found to outperform other methods, resulting in models with the highest average R 2 values in both trained panel and public review data sets. Impurity-based rankings of the most important predictors for each predicted sensorial trait were obtained using the ‘scikit-learn’ package. To observe the relationships between these chemical properties and their predicted targets, partial dependence plots (PDP) were constructed for the six most important predictors of consumer appreciation 74 , 75 .

The ‘SHAP’ package in Python (v0.41.0) was implemented to provide an alternative ranking of predictor importance and to visualize the predictors’ effects as a function of their concentration 68 .

Validation of causal chemical properties

To validate the effects of the most important model features on predicted sensory attributes, beers were spiked with the chemical compounds identified by the models and descriptive sensory analyses were carried out according to the American Society of Brewing Chemists (ASBC) protocol 90 .

Compound spiking was done 30 min before tasting. Compounds were spiked into fresh beer bottles, which were immediately resealed and inverted three times. Fresh bottles of beer were opened for the same duration, resealed, and inverted three times to serve as controls. Pairs of spiked samples and controls were served simultaneously, chilled and in dark glasses as outlined in the Trained panel section above. Tasters were instructed to select the glass with the higher flavor intensity for each attribute (directional difference test 92 ) and to select the glass they prefer.

The final concentration after spiking was equal to the within-style average, after normalizing by ethanol concentration. This was done to ensure balanced flavor profiles in the final spiked beer. The same methods were applied to improve a non-alcoholic beer. Compounds were the following: ethyl acetate (Merck KGaA, W241415), ethyl hexanoate (Merck KGaA, W243906), isoamyl acetate (Merck KGaA, W205508), phenethyl acetate (Merck KGaA, W285706), ethanol (96%, Colruyt), glycerol (Merck KGaA, W252506), lactic acid (Merck KGaA, 261106).

Significant differences in preference or perceived intensity were determined by performing the two-sided binomial test on each attribute.
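The significance test can be sketched with SciPy; the counts below are invented for illustration, not the study's actual tasting results.

```python
from scipy.stats import binomtest

# Example: suppose 14 of 16 tasters pick the spiked beer as preferred in a
# paired test. Under the null hypothesis of no preference, choices are
# a fair coin flip (p = 0.5).
result = binomtest(14, n=16, p=0.5, alternative="two-sided")
print(f"two-sided binomial test p-value: {result.pvalue:.4f}")
```

The same test applies per attribute to the "which glass is more intense" responses of the directional difference test.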

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this work are available in the Supplementary Data files and have been deposited to Zenodo under accession code 10653704 93 . The RateBeer scores data are under restricted access; they are not publicly available as they are the property of RateBeer (ZX Ventures, USA). Access can be obtained from the authors upon reasonable request and with permission of RateBeer (ZX Ventures, USA). Source data are provided with this paper.

Code availability

The code for training the machine learning models, analyzing the models, and generating the figures has been deposited to Zenodo under accession code 10653704 93 .

Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355 , 391–394 (2017).

Plutowska, B. & Wardencki, W. Application of gas chromatography–olfactometry (GC–O) in analysis and quality assessment of alcoholic beverages – A review. Food Chem. 107 , 449–463 (2008).

Legin, A., Rudnitskaya, A., Seleznev, B. & Vlasov, Y. Electronic tongue for quality assessment of ethanol, vodka and eau-de-vie. Anal. Chim. Acta 534 , 129–135 (2005).

Loutfi, A., Coradeschi, S., Mani, G. K., Shankar, P. & Rayappan, J. B. B. Electronic noses for food quality: A review. J. Food Eng. 144 , 103–111 (2015).

Ahn, Y.-Y., Ahnert, S. E., Bagrow, J. P. & Barabási, A.-L. Flavor network and the principles of food pairing. Sci. Rep. 1 , 196 (2011).

Bartoshuk, L. M. & Klee, H. J. Better fruits and vegetables through sensory analysis. Curr. Biol. 23 , R374–R378 (2013).

Piggott, J. R. Design questions in sensory and consumer science. Food Qual. Prefer. 3293 , 217–220 (1995).

Kermit, M. & Lengard, V. Assessing the performance of a sensory panel-panellist monitoring and tracking. J. Chemom. 19 , 154–161 (2005).

Cook, D. J., Hollowood, T. A., Linforth, R. S. T. & Taylor, A. J. Correlating instrumental measurements of texture and flavour release with human perception. Int. J. Food Sci. Technol. 40 , 631–641 (2005).

Chinchanachokchai, S., Thontirawong, P. & Chinchanachokchai, P. A tale of two recommender systems: The moderating role of consumer expertise on artificial intelligence based product recommendations. J. Retail. Consum. Serv. 61 , 1–12 (2021).

Ross, C. F. Sensory science at the human-machine interface. Trends Food Sci. Technol. 20 , 63–72 (2009).

Chambers, E. IV & Koppel, K. Associations of volatile compounds with sensory aroma and flavor: The complex nature of flavor. Molecules 18 , 4887–4905 (2013).

Pinu, F. R. Metabolomics—The new frontier in food safety and quality research. Food Res. Int. 72 , 80–81 (2015).

Danezis, G. P., Tsagkaris, A. S., Brusic, V. & Georgiou, C. A. Food authentication: state of the art and prospects. Curr. Opin. Food Sci. 10 , 22–31 (2016).

Shepherd, G. M. Smell images and the flavour system in the human brain. Nature 444 , 316–321 (2006).

Meilgaard, M. C. Prediction of flavor differences between beers from their chemical composition. J. Agric. Food Chem. 30 , 1009–1017 (1982).

Xu, L. et al. Widespread receptor-driven modulation in peripheral olfactory coding. Science 368 , eaaz5390 (2020).

Kupferschmidt, K. Following the flavor. Science 340 , 808–809 (2013).

Billesbølle, C. B. et al. Structural basis of odorant recognition by a human odorant receptor. Nature 615 , 742–749 (2023).

Smith, B. Perspective: Complexities of flavour. Nature 486 , S6–S6 (2012).

Pfister, P. et al. Odorant receptor inhibition is fundamental to odor encoding. Curr. Biol. 30 , 2574–2587 (2020).

Moskowitz, H. W., Kumaraiah, V., Sharma, K. N., Jacobs, H. L. & Sharma, S. D. Cross-cultural differences in simple taste preferences. Science 190 , 1217–1218 (1975).

Eriksson, N. et al. A genetic variant near olfactory receptor genes influences cilantro preference. Flavour 1 , 22 (2012).

Ferdenzi, C. et al. Variability of affective responses to odors: Culture, gender, and olfactory knowledge. Chem. Senses 38 , 175–186 (2013).

Lawless, H. T. & Heymann, H. Sensory evaluation of food: Principles and practices. (Springer, New York, NY). https://doi.org/10.1007/978-1-4419-6488-5 (2010).

Colantonio, V. et al. Metabolomic selection for enhanced fruit flavor. Proc. Natl. Acad. Sci. 119 , e2115865119 (2022).

Fritz, F., Preissner, R. & Banerjee, P. VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res 49 , W679–W684 (2021).

Tuwani, R., Wadhwa, S. & Bagler, G. BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules. Sci. Rep. 9 , 1–13 (2019).

Dagan-Wiener, A. et al. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure. Sci. Rep. 7 , 1–13 (2017).

Pallante, L. et al. Toward a general and interpretable umami taste predictor using a multi-objective machine learning approach. Sci. Rep. 12 , 1–11 (2022).

Malavolta, M. et al. A survey on computational taste predictors. Eur. Food Res. Technol. 248 , 2215–2235 (2022).

Lee, B. K. et al. A principal odor map unifies diverse tasks in olfactory perception. Science 381 , 999–1006 (2023).

Mayhew, E. J. et al. Transport features predict if a molecule is odorous. Proc. Natl. Acad. Sci. 119 , e2116576119 (2022).

Niu, Y. et al. Sensory evaluation of the synergism among ester odorants in light aroma-type liquor by odor threshold, aroma intensity and flash GC electronic nose. Food Res. Int. 113 , 102–114 (2018).

Yu, P., Low, M. Y. & Zhou, W. Design of experiments and regression modelling in food flavour and sensory analysis: A review. Trends Food Sci. Technol. 71 , 202–215 (2018).

Oladokun, O. et al. The impact of hop bitter acid and polyphenol profiles on the perceived bitterness of beer. Food Chem. 205 , 212–220 (2016).

Linforth, R., Cabannes, M., Hewson, L., Yang, N. & Taylor, A. Effect of fat content on flavor delivery during consumption: An in vivo model. J. Agric. Food Chem. 58 , 6905–6911 (2010).

Guo, S., Na Jom, K. & Ge, Y. Influence of roasting condition on flavor profile of sunflower seeds: A flavoromics approach. Sci. Rep. 9 , 11295 (2019).

Ren, Q. et al. The changes of microbial community and flavor compound in the fermentation process of Chinese rice wine using Fagopyrum tataricum grain as feedstock. Sci. Rep. 9 , 3365 (2019).

Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning. (Springer, New York, NY). https://doi.org/10.1007/978-0-387-21606-5 (2001).

Dietz, C., Cook, D., Huismann, M., Wilson, C. & Ford, R. The multisensory perception of hop essential oil: a review. J. Inst. Brew. 126 , 320–342 (2020).

Roncoroni, Miguel & Verstrepen, Kevin Joan. Belgian Beer: Tested and Tasted. (Lannoo, 2018).

Meilgaard, M. Flavor chemistry of beer: Part II: Flavor and threshold of 239 aroma volatiles. (1975).

Bokulich, N. A. & Bamforth, C. W. The microbiology of malting and brewing. Microbiol. Mol. Biol. Rev. MMBR 77 , 157–172 (2013).

Dzialo, M. C., Park, R., Steensels, J., Lievens, B. & Verstrepen, K. J. Physiology, ecology and industrial applications of aroma formation in yeast. FEMS Microbiol. Rev. 41 , S95–S128 (2017).

Article   PubMed   PubMed Central   Google Scholar  

Datta, A. et al. Computer-aided food engineering. Nat. Food 3 , 894–904 (2022).

American Society of Brewing Chemists. Beer Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A.).

Olaniran, A. O., Hiralal, L., Mokoena, M. P. & Pillay, B. Flavour-active volatile compounds in beer: production, regulation and control. J. Inst. Brew. 123 , 13–23 (2017).

Verstrepen, K. J. et al. Flavor-active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Meilgaard, M. C. Flavour chemistry of beer. part I: flavour interaction between principal volatiles. Master Brew. Assoc. Am. Tech. Q 12 , 107–117 (1975).

Briggs, D. E., Boulton, C. A., Brookes, P. A. & Stevens, R. Brewing 227–254. (Woodhead Publishing). https://doi.org/10.1533/9781855739062.227 (2004).

Bossaert, S., Crauwels, S., De Rouck, G. & Lievens, B. The power of sour - A review: Old traditions, new opportunities. BrewingScience 72 , 78–88 (2019).

Google Scholar  

Verstrepen, K. J. et al. Flavor active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Snauwaert, I. et al. Microbial diversity and metabolite composition of Belgian red-brown acidic ales. Int. J. Food Microbiol. 221 , 1–11 (2016).

Spitaels, F. et al. The microbial diversity of traditional spontaneously fermented lambic beer. PLoS ONE 9 , e95384 (2014).

Blanco, C. A., Andrés-Iglesias, C. & Montero, O. Low-alcohol Beers: Flavor Compounds, Defects, and Improvement Strategies. Crit. Rev. Food Sci. Nutr. 56 , 1379–1388 (2016).

Jackowski, M. & Trusek, A. Non-Alcohol. beer Prod. – Overv. 20 , 32–38 (2018).

Takoi, K. et al. The contribution of geraniol metabolism to the citrus flavour of beer: Synergy of geraniol and β-citronellol under coexistence with excess linalool. J. Inst. Brew. 116 , 251–260 (2010).

Kroeze, J. H. & Bartoshuk, L. M. Bitterness suppression as revealed by split-tongue taste stimulation in humans. Physiol. Behav. 35 , 779–783 (1985).

Mennella, J. A. et al. A spoonful of sugar helps the medicine go down”: Bitter masking bysucrose among children and adults. Chem. Senses 40 , 17–25 (2015).

Wietstock, P., Kunz, T., Perreira, F. & Methner, F.-J. Metal chelation behavior of hop acids in buffered model systems. BrewingScience 69 , 56–63 (2016).

Sancho, D., Blanco, C. A., Caballero, I. & Pascual, A. Free iron in pale, dark and alcohol-free commercial lager beers. J. Sci. Food Agric. 91 , 1142–1147 (2011).

Rodrigues, H. & Parr, W. V. Contribution of cross-cultural studies to understanding wine appreciation: A review. Food Res. Int. 115 , 251–258 (2019).

Korneva, E. & Blockeel, H. Towards better evaluation of multi-target regression models. in ECML PKDD 2020 Workshops (eds. Koprinska, I. et al.) 353–362 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-65965-3_23 .

Gastón Ares. Mathematical and Statistical Methods in Food Science and Technology. (Wiley, 2013).

Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? Preprint at http://arxiv.org/abs/2207.08815 (2022).

Gries, S. T. Statistics for Linguistics with R: A Practical Introduction. in Statistics for Linguistics with R (De Gruyter Mouton, 2021). https://doi.org/10.1515/9783110718256 .

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 , 56–67 (2020).

Ickes, C. M. & Cadwallader, K. R. Effects of ethanol on flavor perception in alcoholic beverages. Chemosens. Percept. 10 , 119–134 (2017).

Kato, M. et al. Influence of high molecular weight polypeptides on the mouthfeel of commercial beer. J. Inst. Brew. 127 , 27–40 (2021).

Wauters, R. et al. Novel Saccharomyces cerevisiae variants slow down the accumulation of staling aldehydes and improve beer shelf-life. Food Chem. 398 , 1–11 (2023).

Li, H., Jia, S. & Zhang, W. Rapid determination of low-level sulfur compounds in beer by headspace gas chromatography with a pulsed flame photometric detector. J. Am. Soc. Brew. Chem. 66 , 188–191 (2008).

Dercksen, A., Laurens, J., Torline, P., Axcell, B. C. & Rohwer, E. Quantitative analysis of volatile sulfur compounds in beer using a membrane extraction interface. J. Am. Soc. Brew. Chem. 54 , 228–233 (1996).

Molnar, C. Interpretable Machine Learning: A Guide for Making Black-Box Models Interpretable. (2020).

Zhao, Q. & Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. Publ. Am. Stat. Assoc. 39 , 272–281 (2019).

Article   MathSciNet   Google Scholar  

Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. (Springer, 2019).

Labrado, D. et al. Identification by NMR of key compounds present in beer distillates and residual phases after dealcoholization by vacuum distillation. J. Sci. Food Agric. 100 , 3971–3978 (2020).

Lusk, L. T., Kay, S. B., Porubcan, A. & Ryder, D. S. Key olfactory cues for beer oxidation. J. Am. Soc. Brew. Chem. 70 , 257–261 (2012).

Gonzalez Viejo, C., Torrico, D. D., Dunshea, F. R. & Fuentes, S. Development of artificial neural network models to assess beer acceptability based on sensory properties using a robotic pourer: A comparative model approach to achieve an artificial intelligence system. Beverages 5 , 33 (2019).

Gonzalez Viejo, C., Fuentes, S., Torrico, D. D., Godbole, A. & Dunshea, F. R. Chemical characterization of aromas in beer and their effect on consumers liking. Food Chem. 293 , 479–485 (2019).

Gilbert, J. L. et al. Identifying breeding priorities for blueberry flavor using biochemical, sensory, and genotype by environment analyses. PLOS ONE 10 , 1–21 (2015).

Goulet, C. et al. Role of an esterase in flavor volatile variation within the tomato clade. Proc. Natl. Acad. Sci. 109 , 19009–19014 (2012).

Article   ADS   CAS   PubMed   PubMed Central   Google Scholar  

Borisov, V. et al. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 1–21 https://doi.org/10.1109/TNNLS.2022.3229161 (2022).

Statista. Statista Consumer Market Outlook: Beer - Worldwide.

Seitz, H. K. & Stickel, F. Molecular mechanisms of alcoholmediated carcinogenesis. Nat. Rev. Cancer 7 , 599–612 (2007).

Voordeckers, K. et al. Ethanol exposure increases mutation rate through error-prone polymerases. Nat. Commun. 11 , 3664 (2020).

Goelen, T. et al. Bacterial phylogeny predicts volatile organic compound composition and olfactory response of an aphid parasitoid. Oikos 129 , 1415–1428 (2020).

Article   ADS   Google Scholar  

Reher, T. et al. Evaluation of hop (Humulus lupulus) as a repellent for the management of Drosophila suzukii. Crop Prot. 124 , 104839 (2019).

Stein, S. E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass Spectrom. 10 , 770–781 (1999).

American Society of Brewing Chemists. Sensory Analysis Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A., 1992).

McAuley, J., Leskovec, J. & Jurafsky, D. Learning Attitudes and Attributes from Multi-Aspect Reviews. Preprint at https://doi.org/10.48550/arXiv.1210.3926 (2012).

Meilgaard, M. C., Carr, B. T. & Carr, B. T. Sensory Evaluation Techniques. (CRC Press, Boca Raton). https://doi.org/10.1201/b16452 (2014).

Schreurs, M. et al. Data from: Predicting and improving complex beer flavor through machine learning. Zenodo https://doi.org/10.5281/zenodo.10653704 (2024).

Download references

Acknowledgements

We thank all lab members for their discussions and thank all tasting panel members for their contributions. Special thanks go out to Dr. Karin Voordeckers for her tremendous help in proofreading and improving the manuscript. M.S. was supported by a Baillet-Latour fellowship, L.C. acknowledges financial support from KU Leuven (C16/17/006), F.A.T. was supported by a PhD fellowship from FWO (1S08821N). Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16/17/006).

Author information

These authors contributed equally: Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni.

Authors and Affiliations

VIB—KU Leuven Center for Microbiology, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Florian A. Theßeling & Kevin J. Verstrepen

CMPG Laboratory of Genetics and Genomics, KU Leuven, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Leuven Institute for Beer Research (LIBR), Gaston Geenslaan 1, B-3001, Leuven, Belgium

Laboratory of Socioecology and Social Evolution, KU Leuven, Naamsestraat 59, B-3000, Leuven, Belgium

Lloyd Cool, Christophe Vanderaa & Tom Wenseleers

VIB Bioinformatics Core, VIB, Rijvisschestraat 120, B-9052, Ghent, Belgium

Łukasz Kreft & Alexander Botzki

AB InBev SA/NV, Brouwerijplein 1, B-3000, Leuven, Belgium

Philippe Malcorps & Luk Daenen


Contributions

S.P., M.S. and K.J.V. conceived the experiments. S.P., M.S. and K.J.V. designed the experiments. S.P., M.S., M.R., B.H. and F.A.T. performed the experiments. S.P., M.S., L.C., C.V., L.K., A.B., P.M., L.D., T.W. and K.J.V. contributed analysis ideas. S.P., M.S., L.C., C.V., T.W. and K.J.V. analyzed the data. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Kevin J. Verstrepen.

Ethics declarations

Competing interests.

K.J.V. is affiliated with bar.on. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Florian Bauer, Andrew John Macintosh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

  • Peer review file
  • Description of additional supplementary files
  • Supplementary Data 1–7
  • Reporting summary
  • Source data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Schreurs, M., Piampongsant, S., Roncoroni, M. et al. Predicting and improving complex beer flavor through machine learning. Nat Commun 15 , 2368 (2024). https://doi.org/10.1038/s41467-024-46346-0

Download citation

Received : 30 October 2023

Accepted : 21 February 2024

Published : 26 March 2024

DOI : https://doi.org/10.1038/s41467-024-46346-0





NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.


Qualitative Study

Steven Tenny; Janelle M. Brannan; Grace D. Brannan.

Affiliations

Last Update: September 18, 2022.

  • Introduction

Qualitative research is a type of research that explores and provides deeper insights into real-world problems. [1] Instead of collecting numerical data points or intervening and introducing treatments as in quantitative research, qualitative research helps generate hypotheses and further investigates and explains quantitative data. Qualitative research gathers participants' experiences, perceptions, and behavior. It answers the hows and whys instead of how many or how much. It can be structured as a stand-alone study relying purely on qualitative data, or as part of mixed-methods research that combines qualitative and quantitative data. This review introduces readers to some basic concepts, definitions, terminology, and applications of qualitative research.

Qualitative research, at its core, asks open-ended questions whose answers are not easily put into numbers, such as ‘how’ and ‘why’. [2] Because of the open-ended nature of the research questions at hand, qualitative research design is often not linear in the way quantitative design is. [2] One of the strengths of qualitative research is its ability to explain processes and patterns of human behavior that can be difficult to quantify. [3] Phenomena such as experiences, attitudes, and behaviors are difficult to capture accurately with numbers, whereas a qualitative approach allows participants themselves to explain how, why, or what they were thinking, feeling, and experiencing at a certain time or during an event of interest. Quantifying qualitative data is certainly possible, but qualitative analysis at its core looks for themes and patterns that can be difficult to quantify, and it is important to ensure that the context and narrative of qualitative work are not lost by trying to quantify something that is not meant to be quantified.

Qualitative research is sometimes placed in opposition to quantitative research, as if the two approaches and the philosophical paradigms associated with each necessarily ‘compete’ against one another. In fact, qualitative and quantitative work are neither opposites nor incompatible. [4] For instance, qualitative research can help expand and deepen understanding of data or results obtained from quantitative analysis. Say a quantitative analysis has determined that there is a correlation between length of stay and level of patient satisfaction; qualitative work can then explore why this correlation exists. This scenario shows one way in which qualitative and quantitative research can be integrated.

Examples of Qualitative Research Approaches

Ethnography

Ethnography as a research design has its origins in social and cultural anthropology and involves the researcher being directly immersed in the participants’ environment. [2] Through this immersion, the ethnographer can use a variety of data collection techniques with the aim of producing a comprehensive account of the social phenomena that occurred during the research period. [2] That is to say, the researcher’s aim with ethnography is to immerse themselves in the research population and come out of it with accounts of actions, behaviors, and events through the eyes of someone involved in that population. Direct involvement of the researcher with the target population is one benefit of ethnographic research because it makes it possible to gather data that would otherwise be very difficult to extract and record.

Grounded Theory

Grounded Theory is the “generation of a theoretical model through the experience of observing a study population and developing a comparative analysis of their speech and behavior.” [5] As opposed to quantitative research, which is deductive and tests or verifies an existing theory, grounded theory research is inductive and therefore lends itself to research aiming to study social interactions or experiences. [3] [2] In essence, Grounded Theory’s goal is to explain, for example, how and why an event occurs or how and why people might behave a certain way. By observing the population, a researcher using the Grounded Theory approach can then develop a theory to explain the phenomena of interest.

Phenomenology

Phenomenology is defined as the “study of the meaning of phenomena or the study of the particular”. [5] At first glance, it might seem that Grounded Theory and Phenomenology are quite similar, but upon careful examination, the differences can be seen. At its core, phenomenology looks to investigate experiences from the perspective of the individual. [2] Phenomenology is essentially looking into the ‘lived experiences’ of the participants and aims to examine how and why participants behaved a certain way, from their perspective . Herein lies one of the main differences between Grounded Theory and Phenomenology. Grounded Theory aims to develop a theory for social phenomena through an examination of various data sources whereas Phenomenology focuses on describing and explaining an event or phenomena from the perspective of those who have experienced it.

Narrative Research

One of qualitative research’s strengths lies in its ability to tell a story, often from the perspective of those directly involved in it. Reporting on qualitative research involves including details and descriptions of the setting involved and quotes from participants. This detail is called ‘thick’ or ‘rich’ description and is a strength of qualitative research. Narrative research is rife with the possibilities of ‘thick’ description as this approach weaves together a sequence of events, usually from just one or two individuals, in the hopes of creating a cohesive story, or narrative. [2] While it might seem like a waste of time to focus on such a specific, individual level, understanding one or two people’s narratives for an event or phenomenon can help to inform researchers about the influences that helped shape that narrative. The tension or conflict of differing narratives can be “opportunities for innovation”. [2]

Research Paradigm

Research paradigms are the assumptions, norms, and standards that underpin different approaches to research. Essentially, research paradigms are the ‘worldview’ that informs research. [4] It is valuable for researchers, both qualitative and quantitative, to understand what paradigm they are working within, because understanding the theoretical basis of research paradigms allows researchers to understand the strengths and weaknesses of the approach being used and adjust accordingly. Different paradigms have different ontologies and epistemologies. Ontology is defined as the “assumptions about the nature of reality,” whereas epistemology is defined as the “assumptions about the nature of knowledge” that inform the work researchers do. [2] It is important to understand the ontological and epistemological foundations of the research paradigm researchers are working within to allow for a full understanding of the approach being used and the assumptions that underpin the approach as a whole. Further, it is crucial that researchers understand their own ontological and epistemological assumptions about the world in general, because those assumptions will necessarily impact how they interact with research. A discussion of the research paradigm is not complete without describing positivist, postpositivist, and constructivist philosophies.

Positivist vs Postpositivist

To further understand qualitative research, we need to discuss positivist and postpositivist frameworks. Positivism is a philosophy that the scientific method can and should be applied to social as well as natural sciences. [4] Essentially, positivist thinking insists that the social sciences should use natural science methods in their research, which stems from the positivist ontology that there is an objective reality that exists fully independent of our perception of the world as individuals. Quantitative research is rooted in positivist philosophy, which can be seen in the value it places on concepts such as causality, generalizability, and replicability.

Conversely, postpositivists argue that social reality can never be one hundred percent explained but it could be approximated. [4] Indeed, qualitative researchers have been insisting that there are “fundamental limits to the extent to which the methods and procedures of the natural sciences could be applied to the social world” and therefore postpositivist philosophy is often associated with qualitative research. [4] An example of positivist versus postpositivist values in research might be that positivist philosophies value hypothesis-testing, whereas postpositivist philosophies value the ability to formulate a substantive theory.

Constructivist

Constructivism is a subcategory of postpositivism. Most researchers invested in postpositivist research are constructivist as well, meaning they think there is no objective external reality; rather, reality is constructed. Constructivism is a theoretical lens that emphasizes the dynamic nature of our world. “Constructivism contends that individuals’ views are directly influenced by their experiences, and it is these individual experiences and views that shape their perspective of reality”. [6] Essentially, constructivist thought focuses on how ‘reality’ is not a fixed certainty; experiences, interactions, and backgrounds give people a unique view of the world. Constructivism contends, unlike positivist views, that there is not necessarily an ‘objective’ reality we all experience. This is the ‘relativist’ ontological view that reality and the world we live in are dynamic and socially constructed; therefore, qualitative scientific knowledge can be inductive as well as deductive. [4]

So why is it important to understand the differences in assumptions that different philosophies and approaches to research have? Fundamentally, the assumptions underpinning the research tools a researcher selects provide an overall base for the assumptions the rest of the research will have and can even change the role of the researcher themselves. [2] For example, is the researcher an ‘objective’ observer such as in positivist quantitative work? Or is the researcher an active participant in the research itself, as in postpositivist qualitative work? Understanding the philosophical base of the research undertaken allows researchers to fully understand the implications of their work and their role within the research, as well as reflect on their own positionality and bias as it pertains to the research they are conducting.

Data Sampling 

The better the sample represents the intended study population, the more likely the researcher is to encompass the varying factors at play. The following are examples of participant sampling and selection: [7]

  • Purposive sampling: selection based on the researcher’s rationale about which participants will be most informative.
  • Criterion sampling: selection based on pre-identified factors.
  • Convenience sampling: selection based on availability.
  • Snowball sampling: selection by referral from other participants or people who know potential participants.
  • Extreme case sampling: targeted selection of rare cases.
  • Typical case sampling: selection based on regular or average participants.
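These selection strategies can be sketched in a few lines of Python; the participant pool, field names, and cut-offs below are entirely hypothetical and purely illustrative:

```python
from statistics import median

# Hypothetical participant pool for a smoking study (all names and values invented).
pool = [
    {"name": "A", "years_smoking": 0},
    {"name": "B", "years_smoking": 2},
    {"name": "C", "years_smoking": 15},
    {"name": "D", "years_smoking": 40},
    {"name": "E", "years_smoking": 3},
]

# Criterion sampling: select on a pre-identified factor (here: any smoking history).
criterion = [p["name"] for p in pool if p["years_smoking"] > 0]

# Convenience sampling: simply take whoever is available first.
convenience = [p["name"] for p in pool[:2]]

# Extreme case sampling: target rare cases (here: an unusually long smoking history).
extreme = [p["name"] for p in pool if p["years_smoking"] >= 40]

# Typical case sampling: participants close to the middle of the distribution.
mid = median(p["years_smoking"] for p in pool)
typical = [p["name"] for p in pool if abs(p["years_smoking"] - mid) <= 2]

print(criterion)  # ['B', 'C', 'D', 'E']
print(extreme)    # ['D']
print(typical)    # ['B', 'E']
```

Purposive and snowball sampling are harder to reduce to a rule, since they depend on the researcher's judgment and on participant referrals rather than on a computable criterion.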

Data Collection and Analysis

Qualitative research uses several techniques, including interviews, focus groups, and observation. [1] [2] [3] Interviews may be unstructured, with open-ended questions on a topic to which the interviewer adapts based on the responses, or structured, with a predetermined set of questions that every participant is asked. Interviews are usually one-on-one and are appropriate for sensitive topics or topics needing in-depth exploration. Focus groups are often held with 8–12 target participants and are used when group dynamics and collective views on a topic are desired. Researchers can be participant-observers, sharing the experiences of the subject, or non-participant (detached) observers.

While quantitative research design prescribes a controlled environment for data collection, qualitative data collection may take place in a central location or in the environment of the participants, depending on the study goals and design. Qualitative research can generate a large amount of data. Data are transcribed and may then be coded manually or with Computer Assisted Qualitative Data Analysis Software (CAQDAS) such as ATLAS.ti or NVivo. [8] [9] [10]

After the coding process, qualitative research results can take various formats: a synthesis and interpretation presented with excerpts from the data, [11] or themes and the development of a theory or model.
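As a minimal sketch of what coding looks like in practice (the excerpts and code labels below are hypothetical; a real project would typically use CAQDAS such as ATLAS.ti or NVivo):

```python
from collections import Counter

# Hypothetical interview excerpts, each manually assigned one or more codes.
coded_excerpts = [
    ("My friends all smoked, so I started too.", ["peer pressure"]),
    ("I was worried about my lungs.", ["health"]),
    ("Cigarettes just got too expensive.", ["cost"]),
    ("Everyone at the park offered me one.", ["peer pressure", "location"]),
]

# Tally how often each code appears, to surface candidate themes.
code_counts = Counter(code for _, codes in coded_excerpts for code in codes)
print(code_counts.most_common(1))  # [('peer pressure', 2)]
```

Counting codes like this is only a starting point; the interpretive work of grouping codes into themes and preserving their context remains a manual, researcher-driven step.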

Dissemination

To standardize and facilitate the dissemination of qualitative research outcomes, the healthcare team can use two reporting standards. The Consolidated Criteria for Reporting Qualitative Research or COREQ is a 32-item checklist for interviews and focus groups. [12] The Standards for Reporting Qualitative Research (SRQR) is a checklist covering a wider range of qualitative research. [13]

Examples of Application

Many times, a research question will start with qualitative research. The qualitative research helps generate the research hypothesis, which can be tested with quantitative methods. After the data are collected and analyzed with quantitative methods, qualitative methods can be used to dive deeper into the data for a better understanding of what the numbers truly mean and their implications. The qualitative methods can then help clarify the quantitative data and refine the hypothesis for future research. Furthermore, with qualitative research, researchers can explore subjects that are poorly studied with quantitative methods, including opinions, individuals' actions, and social science research.

A good qualitative study design starts with a clearly defined goal or objective. The target population needs to be specified. A method for obtaining information from the study population must be carefully detailed to ensure no part of the target population is omitted. A proper collection method should be selected that will help obtain the desired information without overly limiting the collected data, because the information sought is often not well compartmentalized or easily obtained. Finally, the design should ensure adequate methods for analyzing the data. An example may help clarify some of the various aspects of qualitative research.

A researcher wants to decrease the number of teenagers who smoke in their community. The researcher could begin by asking current teen smokers why they started smoking through structured or unstructured interviews (qualitative research). The researcher can also get together a group of current teenage smokers and conduct a focus group to help brainstorm factors that may have prevented them from starting to smoke (qualitative research).

In this example, the researcher has used qualitative research methods (interviews and focus groups) to generate a list of ideas about both why teens start to smoke and factors that may have prevented them from starting to smoke. Next, the researcher compiles this data. The researcher found that, hypothetically, peer pressure, health issues, cost, being considered “cool,” and rebellious behavior all might increase or decrease the likelihood of teens starting to smoke.

The researcher creates a survey asking teen participants to rank how important each of the above factors is in either starting smoking (for current smokers) or not smoking (for current non-smokers). This survey provides specific numbers (ranked importance of each factor) and is thus a quantitative research tool.
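The aggregation step for such a ranking survey could look like the following sketch, with entirely invented responses (1 = most important; a lower mean rank means a more important factor):

```python
from statistics import mean

# Invented survey responses: each teen ranks the five factors from the example.
factors = ["peer pressure", "health issues", "cost", "being cool", "rebellion"]
responses = [
    {"peer pressure": 1, "health issues": 2, "being cool": 3, "cost": 4, "rebellion": 5},
    {"peer pressure": 2, "health issues": 1, "being cool": 3, "rebellion": 4, "cost": 5},
    {"peer pressure": 1, "being cool": 2, "health issues": 3, "cost": 4, "rebellion": 5},
]

# Average each factor's rank across participants, then order by importance.
mean_rank = {f: mean(r[f] for r in responses) for f in factors}
ranked = sorted(factors, key=mean_rank.get)
print(ranked[0])  # 'peer pressure'
```

This kind of summary is exactly what makes the survey a quantitative tool: it reduces the ranked responses to numbers that can be compared directly.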

The researcher can use the results of the survey to focus efforts on the one or two highest-ranked factors. Let us say the researcher found that health was the major factor that keeps teens from starting to smoke, and peer pressure was the major factor that contributed to teens starting to smoke. The researcher can go back to qualitative research methods to dive deeper into each of these for more information. The researcher wants to focus on how to keep teens from starting to smoke, so they focus on the peer pressure aspect.

The researcher can conduct interviews and/or focus groups (qualitative research) about what types and forms of peer pressure are commonly encountered, where the peer pressure comes from, and where smoking first starts. The researcher hypothetically finds that peer pressure often occurs after school at the local teen hangouts, mostly the local park. The researcher also hypothetically finds that peer pressure comes from older, current smokers who provide the cigarettes.

The researcher could further explore these observations at the local teen hangouts (qualitative research), taking notes regarding who is smoking, who is not, and what observable factors are at play in the peer pressure around smoking. The researcher finds a local park where many local teenagers hang out and sees that a shady, overgrown area of the park is where the smokers tend to gather. The researcher notes that the smoking teenagers buy their cigarettes from a local convenience store adjacent to the park, where the clerk does not check identification before selling cigarettes. These observations fall under qualitative research.

If the researcher returns to the park and counts how many individuals smoke in each region of the park, this numerical data would be quantitative research. Based on the researcher's efforts thus far, they conclude that local teen smoking and teenagers who start to smoke may decrease if there are fewer overgrown areas of the park and the local convenience store does not sell cigarettes to underage individuals.

The researcher could try to have the parks department reassess the shady areas to make them less conducive to smoking, or identify how to limit the sale of cigarettes to underage individuals by the convenience store. The researcher would then cycle back to qualitative methods, asking the at-risk population about their perceptions of the changes and what factors are still at play, as well as quantitative research tracking teen smoking rates in the community and the incidence of new teen smokers, among others. [14] [15]

Qualitative research functions as a standalone research design or in combination with quantitative research to enhance our understanding of the world. Using techniques such as structured and unstructured interviews, focus groups, and participant observation, qualitative research not only helps generate hypotheses that can be tested more rigorously with quantitative research but also helps researchers delve deeper into the quantitative numbers, understand what they mean, and understand their implications. Qualitative research gives researchers a way to understand what is going on, especially when things are not easily categorized. [16]

  • Issues of Concern

As discussed in the sections above, quantitative and qualitative work differ in many ways, including the criteria used to evaluate them. There are four well-established criteria for evaluating quantitative data: internal validity, external validity, reliability, and objectivity. The corresponding concepts in qualitative research are credibility, transferability, dependability, and confirmability. [4] [11] The pairs of concepts can be seen below, with the quantitative concept on the left and the qualitative concept on the right:

  • Internal validity---Credibility
  • External validity---Transferability
  • Reliability---Dependability
  • Objectivity---Confirmability

In conducting qualitative research, ensuring that these concepts are satisfied and well thought out can keep potential issues from arising. For example, just as a researcher ensures that a quantitative study is internally valid, a qualitative researcher should ensure that their work has credibility.

Indicators such as triangulation and peer examination can help evaluate the credibility of qualitative work.

  • Triangulation: Triangulation involves using multiple methods of data collection to increase the likelihood of obtaining a reliable and accurate result. In the magic example above, the result would be more reliable if the researcher also interviewed the magician, the backstage hand, and the person who "vanished." In qualitative research, triangulation can include using telephone surveys, in-person surveys, focus groups, and interviews, as well as surveying an adequate cross-section of the target demographic.
  • Peer examination: Results can be reviewed by a peer to ensure the data is consistent with the findings.

"Thick" or "rich" description can be used to evaluate the transferability of qualitative research, whereas an indicator such as an audit trail can help evaluate its dependability and confirmability.

  • Thick or rich description: A detailed and thorough account of the setting, the study procedures, and quotes from participants in the research. [5] Thick descriptions include a detailed explanation of how the study was carried out and are detailed enough to allow readers to draw conclusions and interpret the data themselves, which helps with transferability and replicability.
  • Audit trail: An audit trail provides a documented set of steps of how the participants were selected and the data was collected. The original records of information should also be kept (e.g., surveys, notes, recordings).
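As a minimal sketch of the idea, an electronic audit trail entry could be represented as a simple structured record. The field names and entries below are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditEntry:
    """One documented step in the research process."""
    step: str         # e.g., "participant selection" or "data collection"
    description: str  # what was done and how
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# A running log of entries documents how participants were selected and
# how data were collected, so the work can be audited later.
trail: list[AuditEntry] = []
trail.append(AuditEntry("participant selection",
                        "Recruited teens via flyers at the community center"))
trail.append(AuditEntry("data collection",
                        "Semi-structured interview, audio recorded"))

print(len(trail))  # 2
```

The original records themselves (surveys, notes, recordings) would still be retained alongside any such log.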

One issue of concern that qualitative researchers should take into consideration is observation bias. Here are a few examples:

  • Hawthorne effect: The Hawthorne effect is the change in participant behavior when participants know they are being observed. If a researcher wanted to identify factors that contribute to employee theft and told the employees they would be watched to see what those factors are, one would expect employee behavior to change under observation.
  • Observer-expectancy effect: Some participants change their behavior or responses to produce the effect they believe the researcher wants. This often happens unconsciously, so it is important to eliminate or limit transmitting the researcher's views to participants.
  • Artificial scenario effect: Some qualitative research occurs in artificial scenarios or with preset goals. In such situations, the information gathered may not be accurate because of the artificial nature of the scenario, and the preset goals may limit the qualitative information obtained.

  • Clinical Significance

Qualitative research by itself or combined with quantitative research helps healthcare providers understand patients and the impact and challenges of the care they deliver. Qualitative research provides an opportunity to generate and refine hypotheses and delve deeper into the data generated by quantitative research. Qualitative research does not exist as an island apart from quantitative research, but as an integral part of research methods to be used for the understanding of the world around us. [17]

  • Enhancing Healthcare Team Outcomes

Qualitative research is important for all members of the health care team, as all are affected by it. Qualitative research may help develop a theory or model for health research that can be further explored through quantitative research. Much of qualitative data acquisition is completed by numerous team members, including social workers, scientists, and nurses. Within each area of the medical field, there is copious ongoing qualitative research, including studies of physician-patient interactions, nurse-patient interactions, patient-environment interactions, health care team function, and patient information delivery.


Disclosure: Steven Tenny declares no relevant financial relationships with ineligible companies.

Disclosure: Janelle Brannan declares no relevant financial relationships with ineligible companies.

Disclosure: Grace Brannan declares no relevant financial relationships with ineligible companies.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) ( http://creativecommons.org/licenses/by-nc-nd/4.0/ ), which permits others to distribute the work, provided that the article is not altered or used commercially. You are not required to obtain permission to distribute this article, provided that you credit the author and journal.

  • Cite this Page Tenny S, Brannan JM, Brannan GD. Qualitative Study. [Updated 2022 Sep 18]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.
