U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • J Korean Med Sci
  • v.37(16); 2022 Apr 25

Logo of jkms

A Practical Guide to Writing Quantitative and Qualitative Research Questions and Hypotheses in Scholarly Articles

Edward barroga.

1 Department of General Education, Graduate School of Nursing Science, St. Luke’s International University, Tokyo, Japan.

Glafera Janet Matanguihan

2 Department of Biological Sciences, Messiah University, Mechanicsburg, PA, USA.

The development of research questions and the subsequent hypotheses are prerequisites to defining the main research purpose and specific objectives of a study. Consequently, these objectives determine the study design and research outcome. The development of research questions is a process based on knowledge of current trends, cutting-edge studies, and technological advances in the research field. Excellent research questions are focused and require a comprehensive literature search and in-depth understanding of the problem being investigated. Initially, research questions may be written as descriptive questions which could be developed into inferential questions. These questions must be specific and concise to provide a clear foundation for developing hypotheses. Hypotheses are more formal predictions about the research outcomes. These specify the possible results that may or may not be expected regarding the relationship between groups. Thus, research questions and hypotheses clarify the main purpose and specific objectives of the study, which in turn dictate the design of the study, its direction, and outcome. Studies developed from good research questions and hypotheses will have trustworthy outcomes with wide-ranging social and health implications.

INTRODUCTION

Scientific research is usually initiated by posing evidenced-based research questions which are then explicitly restated as hypotheses. 1 , 2 The hypotheses provide directions to guide the study, solutions, explanations, and expected results. 3 , 4 Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the inception of novel studies and the ethical testing of ideas. 5 , 6

It is crucial to have knowledge of both quantitative and qualitative research 2 as both types of research involve writing research questions and hypotheses. 7 However, these crucial elements of research are sometimes overlooked; if not overlooked, then framed without the forethought and meticulous attention it needs. Planning and careful consideration are needed when developing quantitative or qualitative research, particularly when conceptualizing research questions and hypotheses. 4

There is a continuing need to support researchers in the creation of innovative research questions and hypotheses, as well as for journal articles that carefully review these elements. 1 When research questions and hypotheses are not carefully thought of, unethical studies and poor outcomes usually ensue. Carefully formulated research questions and hypotheses define well-founded objectives, which in turn determine the appropriate design, course, and outcome of the study. This article then aims to discuss in detail the various aspects of crafting research questions and hypotheses, with the goal of guiding researchers as they develop their own. Examples from the authors and peer-reviewed scientific articles in the healthcare field are provided to illustrate key points.

DEFINITIONS AND RELATIONSHIP OF RESEARCH QUESTIONS AND HYPOTHESES

A research question is what a study aims to answer after data analysis and interpretation. The answer is written in length in the discussion section of the paper. Thus, the research question gives a preview of the different parts and variables of the study meant to address the problem posed in the research question. 1 An excellent research question clarifies the research writing while facilitating understanding of the research topic, objective, scope, and limitations of the study. 5

On the other hand, a research hypothesis is an educated statement of an expected outcome. This statement is based on background research and current knowledge. 8 , 9 The research hypothesis makes a specific prediction about a new phenomenon 10 or a formal statement on the expected relationship between an independent variable and a dependent variable. 3 , 11 It provides a tentative answer to the research question to be tested or explored. 4

Hypotheses employ reasoning to predict a theory-based outcome. 10 These can also be developed from theories by focusing on components of theories that have not yet been observed. 10 The validity of hypotheses is often based on the testability of the prediction made in a reproducible experiment. 8

Conversely, hypotheses can also be rephrased as research questions. Several hypotheses based on existing theories and knowledge may be needed to answer a research question. Developing ethical research questions and hypotheses creates a research design that has logical relationships among variables. These relationships serve as a solid foundation for the conduct of the study. 4 , 11 Haphazardly constructed research questions can result in poorly formulated hypotheses and improper study designs, leading to unreliable results. Thus, the formulations of relevant research questions and verifiable hypotheses are crucial when beginning research. 12

CHARACTERISTICS OF GOOD RESEARCH QUESTIONS AND HYPOTHESES

Excellent research questions are specific and focused. These integrate collective data and observations to confirm or refute the subsequent hypotheses. Well-constructed hypotheses are based on previous reports and verify the research context. These are realistic, in-depth, sufficiently complex, and reproducible. More importantly, these hypotheses can be addressed and tested. 13

There are several characteristics of well-developed hypotheses. Good hypotheses are 1) empirically testable 7 , 10 , 11 , 13 ; 2) backed by preliminary evidence 9 ; 3) testable by ethical research 7 , 9 ; 4) based on original ideas 9 ; 5) have evidenced-based logical reasoning 10 ; and 6) can be predicted. 11 Good hypotheses can infer ethical and positive implications, indicating the presence of a relationship or effect relevant to the research theme. 7 , 11 These are initially developed from a general theory and branch into specific hypotheses by deductive reasoning. In the absence of a theory to base the hypotheses, inductive reasoning based on specific observations or findings form more general hypotheses. 10

TYPES OF RESEARCH QUESTIONS AND HYPOTHESES

Research questions and hypotheses are developed according to the type of research, which can be broadly classified into quantitative and qualitative research. We provide a summary of the types of research questions and hypotheses under quantitative and qualitative research categories in Table 1 .

Research questions in quantitative research

In quantitative research, research questions inquire about the relationships among variables being investigated and are usually framed at the start of the study. These are precise and typically linked to the subject population, dependent and independent variables, and research design. 1 Research questions may also attempt to describe the behavior of a population in relation to one or more variables, or describe the characteristics of variables to be measured ( descriptive research questions ). 1 , 5 , 14 These questions may also aim to discover differences between groups within the context of an outcome variable ( comparative research questions ), 1 , 5 , 14 or elucidate trends and interactions among variables ( relationship research questions ). 1 , 5 We provide examples of descriptive, comparative, and relationship research questions in quantitative research in Table 2 .

Hypotheses in quantitative research

In quantitative research, hypotheses predict the expected relationships among variables. 15 Relationships among variables that can be predicted include 1) between a single dependent variable and a single independent variable ( simple hypothesis ) or 2) between two or more independent and dependent variables ( complex hypothesis ). 4 , 11 Hypotheses may also specify the expected direction to be followed and imply an intellectual commitment to a particular outcome ( directional hypothesis ) 4 . On the other hand, hypotheses may not predict the exact direction and are used in the absence of a theory, or when findings contradict previous studies ( non-directional hypothesis ). 4 In addition, hypotheses can 1) define interdependency between variables ( associative hypothesis ), 4 2) propose an effect on the dependent variable from manipulation of the independent variable ( causal hypothesis ), 4 3) state a negative relationship between two variables ( null hypothesis ), 4 , 11 , 15 4) replace the working hypothesis if rejected ( alternative hypothesis ), 15 explain the relationship of phenomena to possibly generate a theory ( working hypothesis ), 11 5) involve quantifiable variables that can be tested statistically ( statistical hypothesis ), 11 6) or express a relationship whose interlinks can be verified logically ( logical hypothesis ). 11 We provide examples of simple, complex, directional, non-directional, associative, causal, null, alternative, working, statistical, and logical hypotheses in quantitative research, as well as the definition of quantitative hypothesis-testing research in Table 3 .

Research questions in qualitative research

Unlike research questions in quantitative research, research questions in qualitative research are usually continuously reviewed and reformulated. The central question and associated subquestions are stated more than the hypotheses. 15 The central question broadly explores a complex set of factors surrounding the central phenomenon, aiming to present the varied perspectives of participants. 15

There are varied goals for which qualitative research questions are developed. These questions can function in several ways, such as to 1) identify and describe existing conditions ( contextual research question s); 2) describe a phenomenon ( descriptive research questions ); 3) assess the effectiveness of existing methods, protocols, theories, or procedures ( evaluation research questions ); 4) examine a phenomenon or analyze the reasons or relationships between subjects or phenomena ( explanatory research questions ); or 5) focus on unknown aspects of a particular topic ( exploratory research questions ). 5 In addition, some qualitative research questions provide new ideas for the development of theories and actions ( generative research questions ) or advance specific ideologies of a position ( ideological research questions ). 1 Other qualitative research questions may build on a body of existing literature and become working guidelines ( ethnographic research questions ). Research questions may also be broadly stated without specific reference to the existing literature or a typology of questions ( phenomenological research questions ), may be directed towards generating a theory of some process ( grounded theory questions ), or may address a description of the case and the emerging themes ( qualitative case study questions ). 15 We provide examples of contextual, descriptive, evaluation, explanatory, exploratory, generative, ideological, ethnographic, phenomenological, grounded theory, and qualitative case study research questions in qualitative research in Table 4 , and the definition of qualitative hypothesis-generating research in Table 5 .

Qualitative studies usually pose at least one central research question and several subquestions starting with How or What . These research questions use exploratory verbs such as explore or describe . These also focus on one central phenomenon of interest, and may mention the participants and research site. 15

Hypotheses in qualitative research

Hypotheses in qualitative research are stated in the form of a clear statement concerning the problem to be investigated. Unlike in quantitative research where hypotheses are usually developed to be tested, qualitative research can lead to both hypothesis-testing and hypothesis-generating outcomes. 2 When studies require both quantitative and qualitative research questions, this suggests an integrative process between both research methods wherein a single mixed-methods research question can be developed. 1

FRAMEWORKS FOR DEVELOPING RESEARCH QUESTIONS AND HYPOTHESES

Research questions followed by hypotheses should be developed before the start of the study. 1 , 12 , 14 It is crucial to develop feasible research questions on a topic that is interesting to both the researcher and the scientific community. This can be achieved by a meticulous review of previous and current studies to establish a novel topic. Specific areas are subsequently focused on to generate ethical research questions. The relevance of the research questions is evaluated in terms of clarity of the resulting data, specificity of the methodology, objectivity of the outcome, depth of the research, and impact of the study. 1 , 5 These aspects constitute the FINER criteria (i.e., Feasible, Interesting, Novel, Ethical, and Relevant). 1 Clarity and effectiveness are achieved if research questions meet the FINER criteria. In addition to the FINER criteria, Ratan et al. described focus, complexity, novelty, feasibility, and measurability for evaluating the effectiveness of research questions. 14

The PICOT and PEO frameworks are also used when developing research questions. 1 The following elements are addressed in these frameworks, PICOT: P-population/patients/problem, I-intervention or indicator being studied, C-comparison group, O-outcome of interest, and T-timeframe of the study; PEO: P-population being studied, E-exposure to preexisting conditions, and O-outcome of interest. 1 Research questions are also considered good if these meet the “FINERMAPS” framework: Feasible, Interesting, Novel, Ethical, Relevant, Manageable, Appropriate, Potential value/publishable, and Systematic. 14

As we indicated earlier, research questions and hypotheses that are not carefully formulated result in unethical studies or poor outcomes. To illustrate this, we provide some examples of ambiguous research question and hypotheses that result in unclear and weak research objectives in quantitative research ( Table 6 ) 16 and qualitative research ( Table 7 ) 17 , and how to transform these ambiguous research question(s) and hypothesis(es) into clear and good statements.

a These statements were composed for comparison and illustrative purposes only.

b These statements are direct quotes from Higashihara and Horiuchi. 16

a This statement is a direct quote from Shimoda et al. 17

The other statements were composed for comparison and illustrative purposes only.

CONSTRUCTING RESEARCH QUESTIONS AND HYPOTHESES

To construct effective research questions and hypotheses, it is very important to 1) clarify the background and 2) identify the research problem at the outset of the research, within a specific timeframe. 9 Then, 3) review or conduct preliminary research to collect all available knowledge about the possible research questions by studying theories and previous studies. 18 Afterwards, 4) construct research questions to investigate the research problem. Identify variables to be accessed from the research questions 4 and make operational definitions of constructs from the research problem and questions. Thereafter, 5) construct specific deductive or inductive predictions in the form of hypotheses. 4 Finally, 6) state the study aims . This general flow for constructing effective research questions and hypotheses prior to conducting research is shown in Fig. 1 .

An external file that holds a picture, illustration, etc.
Object name is jkms-37-e121-g001.jpg

Research questions are used more frequently in qualitative research than objectives or hypotheses. 3 These questions seek to discover, understand, explore or describe experiences by asking “What” or “How.” The questions are open-ended to elicit a description rather than to relate variables or compare groups. The questions are continually reviewed, reformulated, and changed during the qualitative study. 3 Research questions are also used more frequently in survey projects than hypotheses in experiments in quantitative research to compare variables and their relationships.

Hypotheses are constructed based on the variables identified and as an if-then statement, following the template, ‘If a specific action is taken, then a certain outcome is expected.’ At this stage, some ideas regarding expectations from the research to be conducted must be drawn. 18 Then, the variables to be manipulated (independent) and influenced (dependent) are defined. 4 Thereafter, the hypothesis is stated and refined, and reproducible data tailored to the hypothesis are identified, collected, and analyzed. 4 The hypotheses must be testable and specific, 18 and should describe the variables and their relationships, the specific group being studied, and the predicted research outcome. 18 Hypotheses construction involves a testable proposition to be deduced from theory, and independent and dependent variables to be separated and measured separately. 3 Therefore, good hypotheses must be based on good research questions constructed at the start of a study or trial. 12

In summary, research questions are constructed after establishing the background of the study. Hypotheses are then developed based on the research questions. Thus, it is crucial to have excellent research questions to generate superior hypotheses. In turn, these would determine the research objectives and the design of the study, and ultimately, the outcome of the research. 12 Algorithms for building research questions and hypotheses are shown in Fig. 2 for quantitative research and in Fig. 3 for qualitative research.

An external file that holds a picture, illustration, etc.
Object name is jkms-37-e121-g002.jpg

EXAMPLES OF RESEARCH QUESTIONS FROM PUBLISHED ARTICLES

  • EXAMPLE 1. Descriptive research question (quantitative research)
  • - Presents research variables to be assessed (distinct phenotypes and subphenotypes)
  • “BACKGROUND: Since COVID-19 was identified, its clinical and biological heterogeneity has been recognized. Identifying COVID-19 phenotypes might help guide basic, clinical, and translational research efforts.
  • RESEARCH QUESTION: Does the clinical spectrum of patients with COVID-19 contain distinct phenotypes and subphenotypes? ” 19
  • EXAMPLE 2. Relationship research question (quantitative research)
  • - Shows interactions between dependent variable (static postural control) and independent variable (peripheral visual field loss)
  • “Background: Integration of visual, vestibular, and proprioceptive sensations contributes to postural control. People with peripheral visual field loss have serious postural instability. However, the directional specificity of postural stability and sensory reweighting caused by gradual peripheral visual field loss remain unclear.
  • Research question: What are the effects of peripheral visual field loss on static postural control ?” 20
  • EXAMPLE 3. Comparative research question (quantitative research)
  • - Clarifies the difference among groups with an outcome variable (patients enrolled in COMPERA with moderate PH or severe PH in COPD) and another group without the outcome variable (patients with idiopathic pulmonary arterial hypertension (IPAH))
  • “BACKGROUND: Pulmonary hypertension (PH) in COPD is a poorly investigated clinical condition.
  • RESEARCH QUESTION: Which factors determine the outcome of PH in COPD?
  • STUDY DESIGN AND METHODS: We analyzed the characteristics and outcome of patients enrolled in the Comparative, Prospective Registry of Newly Initiated Therapies for Pulmonary Hypertension (COMPERA) with moderate or severe PH in COPD as defined during the 6th PH World Symposium who received medical therapy for PH and compared them with patients with idiopathic pulmonary arterial hypertension (IPAH) .” 21
  • EXAMPLE 4. Exploratory research question (qualitative research)
  • - Explores areas that have not been fully investigated (perspectives of families and children who receive care in clinic-based child obesity treatment) to have a deeper understanding of the research problem
  • “Problem: Interventions for children with obesity lead to only modest improvements in BMI and long-term outcomes, and data are limited on the perspectives of families of children with obesity in clinic-based treatment. This scoping review seeks to answer the question: What is known about the perspectives of families and children who receive care in clinic-based child obesity treatment? This review aims to explore the scope of perspectives reported by families of children with obesity who have received individualized outpatient clinic-based obesity treatment.” 22
  • EXAMPLE 5. Relationship research question (quantitative research)
  • - Defines interactions between dependent variable (use of ankle strategies) and independent variable (changes in muscle tone)
  • “Background: To maintain an upright standing posture against external disturbances, the human body mainly employs two types of postural control strategies: “ankle strategy” and “hip strategy.” While it has been reported that the magnitude of the disturbance alters the use of postural control strategies, it has not been elucidated how the level of muscle tone, one of the crucial parameters of bodily function, determines the use of each strategy. We have previously confirmed using forward dynamics simulations of human musculoskeletal models that an increased muscle tone promotes the use of ankle strategies. The objective of the present study was to experimentally evaluate a hypothesis: an increased muscle tone promotes the use of ankle strategies. Research question: Do changes in the muscle tone affect the use of ankle strategies ?” 23

EXAMPLES OF HYPOTHESES IN PUBLISHED ARTICLES

  • EXAMPLE 1. Working hypothesis (quantitative research)
  • - A hypothesis that is initially accepted for further research to produce a feasible theory
  • “As fever may have benefit in shortening the duration of viral illness, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response when taken during the early stages of COVID-19 illness .” 24
  • “In conclusion, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response . The difference in perceived safety of these agents in COVID-19 illness could be related to the more potent efficacy to reduce fever with ibuprofen compared to acetaminophen. Compelling data on the benefit of fever warrant further research and review to determine when to treat or withhold ibuprofen for early stage fever for COVID-19 and other related viral illnesses .” 24
  • EXAMPLE 2. Exploratory hypothesis (qualitative research)
  • - Explores particular areas deeper to clarify subjective experience and develop a formal hypothesis potentially testable in a future quantitative approach
  • “We hypothesized that when thinking about a past experience of help-seeking, a self distancing prompt would cause increased help-seeking intentions and more favorable help-seeking outcome expectations .” 25
  • “Conclusion
  • Although a priori hypotheses were not supported, further research is warranted as results indicate the potential for using self-distancing approaches to increasing help-seeking among some people with depressive symptomatology.” 25
  • EXAMPLE 3. Hypothesis-generating research to establish a framework for hypothesis testing (qualitative research)
  • “We hypothesize that compassionate care is beneficial for patients (better outcomes), healthcare systems and payers (lower costs), and healthcare providers (lower burnout). ” 26
  • Compassionomics is the branch of knowledge and scientific study of the effects of compassionate healthcare. Our main hypotheses are that compassionate healthcare is beneficial for (1) patients, by improving clinical outcomes, (2) healthcare systems and payers, by supporting financial sustainability, and (3) HCPs, by lowering burnout and promoting resilience and well-being. The purpose of this paper is to establish a scientific framework for testing the hypotheses above . If these hypotheses are confirmed through rigorous research, compassionomics will belong in the science of evidence-based medicine, with major implications for all healthcare domains.” 26
  • EXAMPLE 4. Statistical hypothesis (quantitative research)
  • - An assumption is made about the relationship among several population characteristics ( gender differences in sociodemographic and clinical characteristics of adults with ADHD ). Validity is tested by statistical experiment or analysis ( chi-square test, Students t-test, and logistic regression analysis)
  • “Our research investigated gender differences in sociodemographic and clinical characteristics of adults with ADHD in a Japanese clinical sample. Due to unique Japanese cultural ideals and expectations of women's behavior that are in opposition to ADHD symptoms, we hypothesized that women with ADHD experience more difficulties and present more dysfunctions than men . We tested the following hypotheses: first, women with ADHD have more comorbidities than men with ADHD; second, women with ADHD experience more social hardships than men, such as having less full-time employment and being more likely to be divorced.” 27
  • “Statistical Analysis
  • ( text omitted ) Between-gender comparisons were made using the chi-squared test for categorical variables and Students t-test for continuous variables…( text omitted ). A logistic regression analysis was performed for employment status, marital status, and comorbidity to evaluate the independent effects of gender on these dependent variables.” 27

EXAMPLES OF HYPOTHESIS AS WRITTEN IN PUBLISHED ARTICLES IN RELATION TO OTHER PARTS

  • EXAMPLE 1. Background, hypotheses, and aims are provided
  • “Pregnant women need skilled care during pregnancy and childbirth, but that skilled care is often delayed in some countries …( text omitted ). The focused antenatal care (FANC) model of WHO recommends that nurses provide information or counseling to all pregnant women …( text omitted ). Job aids are visual support materials that provide the right kind of information using graphics and words in a simple and yet effective manner. When nurses are not highly trained or have many work details to attend to, these job aids can serve as a content reminder for the nurses and can be used for educating their patients (Jennings, Yebadokpo, Affo, & Agbogbe, 2010) ( text omitted ). Importantly, additional evidence is needed to confirm how job aids can further improve the quality of ANC counseling by health workers in maternal care …( text omitted )” 28
  • “ This has led us to hypothesize that the quality of ANC counseling would be better if supported by job aids. Consequently, a better quality of ANC counseling is expected to produce higher levels of awareness concerning the danger signs of pregnancy and a more favorable impression of the caring behavior of nurses .” 28
  • “This study aimed to examine the differences in the responses of pregnant women to a job aid-supported intervention during ANC visit in terms of 1) their understanding of the danger signs of pregnancy and 2) their impression of the caring behaviors of nurses to pregnant women in rural Tanzania.” 28
  • EXAMPLE 2. Background, hypotheses, and aims are provided
  • “We conducted a two-arm randomized controlled trial (RCT) to evaluate and compare changes in salivary cortisol and oxytocin levels of first-time pregnant women between experimental and control groups. The women in the experimental group touched and held an infant for 30 min (experimental intervention protocol), whereas those in the control group watched a DVD movie of an infant (control intervention protocol). The primary outcome was salivary cortisol level and the secondary outcome was salivary oxytocin level.” 29
  • “ We hypothesize that at 30 min after touching and holding an infant, the salivary cortisol level will significantly decrease and the salivary oxytocin level will increase in the experimental group compared with the control group .” 29
  • EXAMPLE 3. Background, aim, and hypothesis are provided
  • “In countries where the maternal mortality ratio remains high, antenatal education to increase Birth Preparedness and Complication Readiness (BPCR) is considered one of the top priorities [1]. BPCR includes birth plans during the antenatal period, such as the birthplace, birth attendant, transportation, health facility for complications, expenses, and birth materials, as well as family coordination to achieve such birth plans. In Tanzania, although increasing, only about half of all pregnant women attend an antenatal clinic more than four times [4]. Moreover, the information provided during antenatal care (ANC) is insufficient. In the resource-poor settings, antenatal group education is a potential approach because of the limited time for individual counseling at antenatal clinics.” 30
  • “This study aimed to evaluate an antenatal group education program among pregnant women and their families with respect to birth-preparedness and maternal and infant outcomes in rural villages of Tanzania.” 30
  • “ The study hypothesis was if Tanzanian pregnant women and their families received a family-oriented antenatal group education, they would (1) have a higher level of BPCR, (2) attend antenatal clinic four or more times, (3) give birth in a health facility, (4) have less complications of women at birth, and (5) have less complications and deaths of infants than those who did not receive the education .” 30

Research questions and hypotheses are crucial components to any type of research, whether quantitative or qualitative. These questions should be developed at the very beginning of the study. Excellent research questions lead to superior hypotheses, which, like a compass, set the direction of research, and can often determine the successful conduct of the study. Many research studies have floundered because the development of research questions and subsequent hypotheses was not given the thought and meticulous attention needed. The development of research questions and hypotheses is an iterative process based on extensive knowledge of the literature and insightful grasp of the knowledge gap. Focused, concise, and specific research questions provide a strong foundation for constructing hypotheses which serve as formal predictions about the research outcomes. Research questions and hypotheses are crucial elements of research that should not be overlooked. They should be carefully thought of and constructed when planning research. This avoids unethical studies and poor outcomes by defining well-founded objectives that determine the design, course, and outcome of the study.

Disclosure: The authors have no potential conflicts of interest to disclose.

Author Contributions:

  • Conceptualization: Barroga E, Matanguihan GJ.
  • Methodology: Barroga E, Matanguihan GJ.
  • Writing - original draft: Barroga E, Matanguihan GJ.
  • Writing - review & editing: Barroga E, Matanguihan GJ.
  • USC Libraries
  • Research Guides

Organizing Your Social Sciences Research Paper

  • Quantitative Methods
  • Purpose of Guide
  • Design Flaws to Avoid
  • Independent and Dependent Variables
  • Glossary of Research Terms
  • Reading Research Effectively
  • Narrowing a Topic Idea
  • Broadening a Topic Idea
  • Extending the Timeliness of a Topic Idea
  • Academic Writing Style
  • Choosing a Title
  • Making an Outline
  • Paragraph Development
  • Research Process Video Series
  • Executive Summary
  • The C.A.R.S. Model
  • Background Information
  • The Research Problem/Question
  • Theoretical Framework
  • Citation Tracking
  • Content Alert Services
  • Evaluating Sources
  • Primary Sources
  • Secondary Sources
  • Tiertiary Sources
  • Scholarly vs. Popular Publications
  • Qualitative Methods
  • Insiderness
  • Using Non-Textual Elements
  • Limitations of the Study
  • Common Grammar Mistakes
  • Writing Concisely
  • Avoiding Plagiarism
  • Footnotes or Endnotes?
  • Further Readings
  • Generative AI and Writing
  • USC Libraries Tutorials and Other Guides
  • Bibliography

Quantitative methods emphasize objective measurements and the statistical, mathematical, or numerical analysis of data collected through polls, questionnaires, and surveys, or by manipulating pre-existing statistical data using computational techniques . Quantitative research focuses on gathering numerical data and generalizing it across groups of people or to explain a particular phenomenon.

Babbie, Earl R. The Practice of Social Research . 12th ed. Belmont, CA: Wadsworth Cengage, 2010; Muijs, Daniel. Doing Quantitative Research in Education with SPSS . 2nd edition. London: SAGE Publications, 2010.

Need Help Locating Statistics?

Resources for locating data and statistics can be found here:

Statistics & Data Research Guide

Characteristics of Quantitative Research

Your goal in conducting quantitative research study is to determine the relationship between one thing [an independent variable] and another [a dependent or outcome variable] within a population. Quantitative research designs are either descriptive [subjects usually measured once] or experimental [subjects measured before and after a treatment]. A descriptive study establishes only associations between variables; an experimental study establishes causality.

Quantitative research deals in numbers, logic, and an objective stance. Quantitative research focuses on numeric and unchanging data and detailed, convergent reasoning rather than divergent reasoning [i.e., the generation of a variety of ideas about a research problem in a spontaneous, free-flowing manner].

Its main characteristics are :

  • The data is usually gathered using structured research instruments.
  • The results are based on larger sample sizes that are representative of the population.
  • The research study can usually be replicated or repeated, given its high reliability.
  • Researcher has a clearly defined research question to which objective answers are sought.
  • All aspects of the study are carefully designed before data is collected.
  • Data are in the form of numbers and statistics, often arranged in tables, charts, figures, or other non-textual forms.
  • Project can be used to generalize concepts more widely, predict future results, or investigate causal relationships.
  • Researcher uses tools, such as questionnaires or computer software, to collect numerical data.

The overarching aim of a quantitative research study is to classify features, count them, and construct statistical models in an attempt to explain what is observed.

  Things to keep in mind when reporting the results of a study using quantitative methods :

  • Explain the data collected and their statistical treatment as well as all relevant results in relation to the research problem you are investigating. Interpretation of results is not appropriate in this section.
  • Report unanticipated events that occurred during your data collection. Explain how the actual analysis differs from the planned analysis. Explain your handling of missing data and why any missing data does not undermine the validity of your analysis.
  • Explain the techniques you used to "clean" your data set.
  • Choose a minimally sufficient statistical procedure ; provide a rationale for its use and a reference for it. Specify any computer programs used.
  • Describe the assumptions for each procedure and the steps you took to ensure that they were not violated.
  • When using inferential statistics , provide the descriptive statistics, confidence intervals, and sample sizes for each variable as well as the value of the test statistic, its direction, the degrees of freedom, and the significance level [report the actual p value].
  • Avoid inferring causality , particularly in nonrandomized designs or without further experimentation.
  • Use tables to provide exact values ; use figures to convey global effects. Keep figures small in size; include graphic representations of confidence intervals whenever possible.
  • Always tell the reader what to look for in tables and figures .

NOTE:   When using pre-existing statistical data gathered and made available by anyone other than yourself [e.g., government agency], you still must report on the methods that were used to gather the data and describe any missing data that exists and, if there is any, provide a clear explanation why the missing data does not undermine the validity of your final analysis.

Babbie, Earl R. The Practice of Social Research . 12th ed. Belmont, CA: Wadsworth Cengage, 2010; Brians, Craig Leonard et al. Empirical Political Analysis: Quantitative and Qualitative Research Methods . 8th ed. Boston, MA: Longman, 2011; McNabb, David E. Research Methods in Public Administration and Nonprofit Management: Quantitative and Qualitative Approaches . 2nd ed. Armonk, NY: M.E. Sharpe, 2008; Quantitative Research Methods. Writing@CSU. Colorado State University; Singh, Kultar. Quantitative Social Research Methods . Los Angeles, CA: Sage, 2007.

Basic Research Design for Quantitative Studies

Before designing a quantitative research study, you must decide whether it will be descriptive or experimental because this will dictate how you gather, analyze, and interpret the results. A descriptive study is governed by the following rules: subjects are generally measured once; the intention is to only establish associations between variables; and, the study may include a sample population of hundreds or thousands of subjects to ensure that a valid estimate of a generalized relationship between variables has been obtained. An experimental design includes subjects measured before and after a particular treatment, the sample population may be very small and purposefully chosen, and it is intended to establish causality between variables. Introduction The introduction to a quantitative study is usually written in the present tense and from the third person point of view. It covers the following information:

  • Identifies the research problem -- as with any academic study, you must state clearly and concisely the research problem being investigated.
  • Reviews the literature -- review scholarship on the topic, synthesizing key themes and, if necessary, noting studies that have used similar methods of inquiry and analysis. Note where key gaps exist and how your study helps to fill these gaps or clarifies existing knowledge.
  • Describes the theoretical framework -- provide an outline of the theory or hypothesis underpinning your study. If necessary, define unfamiliar or complex terms, concepts, or ideas and provide the appropriate background information to place the research problem in proper context [e.g., historical, cultural, economic, etc.].

Methodology The methods section of a quantitative study should describe how each objective of your study will be achieved. Be sure to provide enough detail to enable the reader can make an informed assessment of the methods being used to obtain results associated with the research problem. The methods section should be presented in the past tense.

  • Study population and sampling -- where did the data come from; how robust is it; note where gaps exist or what was excluded. Note the procedures used for their selection;
  • Data collection – describe the tools and methods used to collect information and identify the variables being measured; describe the methods used to obtain the data; and, note if the data was pre-existing [i.e., government data] or you gathered it yourself. If you gathered it yourself, describe what type of instrument you used and why. Note that no data set is perfect--describe any limitations in methods of gathering data.
  • Data analysis -- describe the procedures for processing and analyzing the data. If appropriate, describe the specific instruments of analysis used to study each research objective, including mathematical techniques and the type of computer software used to manipulate the data.

Results The finding of your study should be written objectively and in a succinct and precise format. In quantitative studies, it is common to use graphs, tables, charts, and other non-textual elements to help the reader understand the data. Make sure that non-textual elements do not stand in isolation from the text but are being used to supplement the overall description of the results and to help clarify key points being made. Further information about how to effectively present data using charts and graphs can be found here .

  • Statistical analysis -- how did you analyze the data? What were the key findings from the data? The findings should be present in a logical, sequential order. Describe but do not interpret these trends or negative results; save that for the discussion section. The results should be presented in the past tense.

Discussion Discussions should be analytic, logical, and comprehensive. The discussion should meld together your findings in relation to those identified in the literature review, and placed within the context of the theoretical framework underpinning the study. The discussion should be presented in the present tense.

  • Interpretation of results -- reiterate the research problem being investigated and compare and contrast the findings with the research questions underlying the study. Did they affirm predicted outcomes or did the data refute it?
  • Description of trends, comparison of groups, or relationships among variables -- describe any trends that emerged from your analysis and explain all unanticipated and statistical insignificant findings.
  • Discussion of implications – what is the meaning of your results? Highlight key findings based on the overall results and note findings that you believe are important. How have the results helped fill gaps in understanding the research problem?
  • Limitations -- describe any limitations or unavoidable bias in your study and, if necessary, note why these limitations did not inhibit effective interpretation of the results.

Conclusion End your study by to summarizing the topic and provide a final comment and assessment of the study.

  • Summary of findings – synthesize the answers to your research questions. Do not report any statistical data here; just provide a narrative summary of the key findings and describe what was learned that you did not know before conducting the study.
  • Recommendations – if appropriate to the aim of the assignment, tie key findings with policy recommendations or actions to be taken in practice.
  • Future research – note the need for future research linked to your study’s limitations or to any remaining gaps in the literature that were not addressed in your study.

Black, Thomas R. Doing Quantitative Research in the Social Sciences: An Integrated Approach to Research Design, Measurement and Statistics . London: Sage, 1999; Gay,L. R. and Peter Airasain. Educational Research: Competencies for Analysis and Applications . 7th edition. Upper Saddle River, NJ: Merril Prentice Hall, 2003; Hector, Anestine. An Overview of Quantitative Research in Composition and TESOL . Department of English, Indiana University of Pennsylvania; Hopkins, Will G. “Quantitative Research Design.” Sportscience 4, 1 (2000); "A Strategy for Writing Up Research Results. The Structure, Format, Content, and Style of a Journal-Style Scientific Paper." Department of Biology. Bates College; Nenty, H. Johnson. "Writing a Quantitative Research Thesis." International Journal of Educational Science 1 (2009): 19-32; Ouyang, Ronghua (John). Basic Inquiry of Quantitative Research . Kennesaw State University.

Strengths of Using Quantitative Methods

Quantitative researchers try to recognize and isolate specific variables contained within the study framework, seek correlation, relationships and causality, and attempt to control the environment in which the data is collected to avoid the risk of variables, other than the one being studied, accounting for the relationships identified.

Among the specific strengths of using quantitative methods to study social science research problems:

  • Allows for a broader study, involving a greater number of subjects, and enhancing the generalization of the results;
  • Allows for greater objectivity and accuracy of results. Generally, quantitative methods are designed to provide summaries of data that support generalizations about the phenomenon under study. In order to accomplish this, quantitative research usually involves few variables and many cases, and employs prescribed procedures to ensure validity and reliability;
  • Applying well established standards means that the research can be replicated, and then analyzed and compared with similar studies;
  • You can summarize vast sources of information and make comparisons across categories and over time; and,
  • Personal bias can be avoided by keeping a 'distance' from participating subjects and using accepted computational techniques .

Babbie, Earl R. The Practice of Social Research . 12th ed. Belmont, CA: Wadsworth Cengage, 2010; Brians, Craig Leonard et al. Empirical Political Analysis: Quantitative and Qualitative Research Methods . 8th ed. Boston, MA: Longman, 2011; McNabb, David E. Research Methods in Public Administration and Nonprofit Management: Quantitative and Qualitative Approaches . 2nd ed. Armonk, NY: M.E. Sharpe, 2008; Singh, Kultar. Quantitative Social Research Methods . Los Angeles, CA: Sage, 2007.

Limitations of Using Quantitative Methods

Quantitative methods presume to have an objective approach to studying research problems, where data is controlled and measured, to address the accumulation of facts, and to determine the causes of behavior. As a consequence, the results of quantitative research may be statistically significant but are often humanly insignificant.

Some specific limitations associated with using quantitative methods to study research problems in the social sciences include:

  • Quantitative data is more efficient and able to test hypotheses, but may miss contextual detail;
  • Uses a static and rigid approach and so employs an inflexible process of discovery;
  • The development of standard questions by researchers can lead to "structural bias" and false representation, where the data actually reflects the view of the researcher instead of the participating subject;
  • Results provide less detail on behavior, attitudes, and motivation;
  • Researcher may collect a much narrower and sometimes superficial dataset;
  • Results are limited as they provide numerical descriptions rather than detailed narrative and generally provide less elaborate accounts of human perception;
  • The research is often carried out in an unnatural, artificial environment so that a level of control can be applied to the exercise. This level of control might not normally be in place in the real world thus yielding "laboratory results" as opposed to "real world results"; and,
  • Preset answers will not necessarily reflect how people really feel about a subject and, in some cases, might just be the closest match to the preconceived hypothesis.

Research Tip

Finding Examples of How to Apply Different Types of Research Methods

SAGE publications is a major publisher of studies about how to design and conduct research in the social and behavioral sciences. Their SAGE Research Methods Online and Cases database includes contents from books, articles, encyclopedias, handbooks, and videos covering social science research design and methods including the complete Little Green Book Series of Quantitative Applications in the Social Sciences and the Little Blue Book Series of Qualitative Research techniques. The database also includes case studies outlining the research methods used in real research projects. This is an excellent source for finding definitions of key terms and descriptions of research design and practice, techniques of data gathering, analysis, and reporting, and information about theories of research [e.g., grounded theory]. The database covers both qualitative and quantitative research methods as well as mixed methods approaches to conducting research.

SAGE Research Methods Online and Cases

  • << Previous: Qualitative Methods
  • Next: Insiderness >>
  • Last Updated: Mar 26, 2024 10:40 AM
  • URL: https://libguides.usc.edu/writingguide

Logo for VIVA Open Publishing

Want to create or adapt books like this? Learn more about how Pressbooks supports open publishing practices.

20 16. Reporting quantitative results

Chapter outline.

  • Reporting quantitative results (8 minute read time)

Content warning: Brief discussion of violence against women.

16.1 Reporting quantitative results

Learning objectives.

Learners will be able to…

  • Execute a quantitative research report using key elements for accuracy and openness

So you’ve completed your quantitative analyses and are ready to report your results. We’re going to spend some time talking about what matters in quantitative research reports, but the very first thing to understand is this: openness with your data and analyses is key. You should never hide what you did to get to a particular conclusion and, if someone wanted to and could ethically access your data, they should be able to replicate more or less exactly what you did. While your quantitative report won’t have every single step you took to get to your conclusion, it should have plenty of detail so someone can get the picture.

Below, I’m going to take you through the key elements of a quantitative research report. This overview is pretty general and conceptual, and it will be helpful for you to look at existing scholarly articles that deal with quantitative research (like ones in your literature review) to see the structure applied. Also keep in mind that your instructor may want the sections broken out slightly differently; nonetheless, the content I outline below should be in your research report.

Introduction and literature review

These are what you’re working on building with your research proposal this semester. They should be included as part of your research report so that readers have enough information to evaluate your research for themselves. What’s here should be very similar to the introduction and literature review from your research proposal, where you described the literature relevant to the study you wanted to do. With your results in hand, though, you may find that you have to add information to the literature you wrote previously to help orient the reader of the report to important topics needed to understand the results of your study.

In this section, you should explicitly lay out your study design – for instance, if it was experimental, be specific about the type of experimental design. Discuss the type of sampling that you used, if that’s applicable to your project. You should also go into a general description of your data, including the time period, any exclusions you made from the original data set and the source – i.e., did you collect it yourself or was it secondary data?  Next, talk about the specific statistical methods you used, like t- tests, Chi-square tests, or regression analyses. For descriptive statistics, you can be relatively general – you don’t need to say “I looked at means and medians,” for instance. You need to provide enough information here that someone could replicate what you did.

In this section, you should also discuss how you operationalized your variables. What did you mean when you asked about educational attainment – did you ask for a grade number, or did you ask them to pick a range that you turned into a category? This is key information for readers to understand your research. Remember when you were looking for ways to operationalize your variables? Be the kind of author who provides enough information on operationalization so people can actually understand what they did.

You’re going to run lots of different analyses to settle on what finally makes sense to get a result – positive or negative – for your study. For this section, you’re going to provide tables with descriptions of your sample, including, but not limited to, sample size, frequencies of sample characteristics like race and gender, levels of measurement, appropriate measures of central tendency, standard deviations and variances. Here you will also want to focus on the analyses you used to actually draw whatever conclusion you settled on, both descriptive and inferential (i.e., bivariate or multivariate).

The actual statistics you report depend entirely on the kind of statistical analysis you do. For instance, if you’re reporting on a logistic regression, it’s going to look a little different than reporting on an ANOVA. In the previous chapter, we provided links to open textbooks that detail how to conduct quantitative data analysis. You should look at these resources and consult with your research professor to help you determine what is expected in a report about the particular statistical method you used.

The important thing to remember here – as we mentioned above – is that you need to be totally transparent about your results, even and especially if they don’t support your hypothesis. There is value in a disproved hypothesis, too – you now know something about how the state of the world is not .

In this section, you’re going to connect your statistical results back to your hypothesis and discuss whether your results support your hypothesis or not. You are also going to talk about what the results mean for the larger field of study of which your research is a part, the implications of your findings if you’re evaluating some kind of intervention, and how your research relates to what is already out there in this field. When your research doesn’t pan out the way you expect, if you’re able to make some educated guesses as to why this might be (supported by literature if possible, but practice wisdom works too), share those as well.

Let’s take a minute to talk about what happens when your findings disprove your hypothesis or actually indicate something negative about the group you are studying. The discussion section is where you can contextualize “negative” findings. For example, say you conducted a study that indicated that a certain group is more likely to commit violent crime. Here, you have an opportunity to talk about why this might be the case outside of their membership in that group, and how membership in that group does not automatically mean someone will commit a violent crime. You can present mitigating factors, like a history of personal and community trauma. It’s extremely important to provide this relevant context so that your results are more difficult to use against a group you are studying in a way that doesn’t reflect your actual findings.

Limitations

In this section, you’re going to critique your own study. What are the advantages, disadvantages, and trade-offs of what you did to define and analyze your variables? Some questions you might consider include:  What limits the study’s applicability to the population at large? Were there trade-offs you had to make between rigor and available data? Did the statistical analyses you used mean that you could only get certain types of results? What would have made the study more widely applicable or more useful for a certain group? You should be thinking about this throughout the analysis process so you can properly contextualize your results.

In this section, you may also consider discussing any threats to internal validity that you identified and whether you think you can generalize your research. Finally, if you used any measurement tools that haven’t been validated yet, discuss how this could have affected your results.

Significance and conclusions

Finally, you want to use the conclusions section to bring it full circle for your reader – why did this research matter? Talk about how it contributed to knowledge around the topic and how might it be used to further practice. Identify and discuss ethical implications of your findings for social workers and social work research. Finally, make sure to talk about the next steps for you, other researchers, or policy-makers based on your research findings.

Key Takeaways

  • Your quantitative research report should provide the reader with transparent, replicable methods and put your research into the context of existing literature, real-world practice and social work ethics.
  • Think about the research project you are building now. What could a negative finding be, and how might you provide your reader with context to ensure that you are not harming your study population?

The process of determining how to measure a construct that cannot be directly observed

Ability to say that one variable "causes" something to happen to another variable. Very important to assess when thinking about studies that examine causation such as experimental or quasi-experimental designs.

Graduate research methods in social work Copyright © 2020 by Matthew DeCarlo, Cory Cummings, Kate Agnelli is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here .

Loading metrics

Open Access

Peer-reviewed

Research Article

Improving quantitative writing one sentence at a time

Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing

* E-mail: [email protected]

Affiliation Biology Department, Santa Clara University, Santa Clara, California, United States of America

ORCID logo

Roles Formal analysis, Writing – original draft

Roles Data curation, Funding acquisition, Validation, Writing – review & editing

  • Tracy Ruscetti, 
  • Katherine Krueger, 
  • Christelle Sabatier

PLOS

  • Published: September 12, 2018
  • https://doi.org/10.1371/journal.pone.0203109
  • Reader Comments

Fig 1

Scientific writing, particularly quantitative writing, is difficult to master. To help undergraduate students write more clearly about data, we sought to deconstruct writing into discrete, specific elements. We focused on statements typically used to describe data found in the results sections of research articles (quantitative comparative statements, QC). In this paper, we define the essential components of a QC statement and the rules that govern those components. Clearly defined rules allowed us to quantify writing quality of QC statements (4C scoring). Using 4C scoring, we measured student writing gains in a post-test at the end of the term compared to a pre-test (37% improvement). In addition to overall score, 4C scoring provided insight into common writing mistakes by measuring presence/absence of each essential component. Student writing quality in lab reports improved when they practiced writing isolated QC statements. Although we observed a significant increase in writing quality in lab reports describing a simple experiment, we noted a decrease in writing quality when the complexity of the experimental system increased. Our data suggest a negative correlation of writing quality with complexity. We discuss how our data aligns with existing cognitive theories of writing and how science instructors might improve the scientific writing of their students.

Citation: Ruscetti T, Krueger K, Sabatier C (2018) Improving quantitative writing one sentence at a time. PLoS ONE 13(9): e0203109. https://doi.org/10.1371/journal.pone.0203109

Editor: Mitchell Rabinowitz, Fordham University, UNITED STATES

Received: August 26, 2017; Accepted: August 15, 2018; Published: September 12, 2018

Copyright: © 2018 Ruscetti et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: The authors received financial support from Santa Clara University through the Faculty Development Office (T.R.) and the Office of Assessment (T.R. and C.S.).

Competing interests: The authors have declared that no competing interests exist.

Introduction

Written communication of data is at the core of scholarly discourse among scientists and is an important learning goal for science students in undergraduate education [ 1 ]. For scientists, the currency of scientific dialogue is the research article, which presents essential information required to convince an audience that data are compelling, findings are relevant, and interpretations are valid [ 2 , 3 ]. Writing lab reports that contain the elements of a research article is a widely used method to help students develop critical thinking and quantitative reasoning skills. In our introductory, lab-intensive Cell and Molecular Biology course, we focus on helping students develop the “results” section of their lab report. Students integrate tables, graphs, and text to present and interpret data they have generated in the laboratory. In the text portion, students cannot simply restate previously learned information (“knowledge telling;” [ 4 , 5 ]) or narrate through the data presented visually. Rather, students must mimic the actions of professional researchers by transforming data into knowledge and structuring their arguments to support specific claims/conclusions. This type of inquiry-based writing encourages active participation in the scientific process, enhancing engagement and learning [ 6 , 7 ].

While science instructors recognize the importance of writing in their courses, many do not provide explicit writing instruction [ 8 ]. Instructors may fear that teaching writing skills diverts time from teaching required science concepts, expect that writing is covered in composition courses, or lack the tools and resources to teach writing [ 8 , 9 , 10 ]. We wanted to support writing in our course without diverting focus from the conceptual and discipline-specific content of the course. We examined available writing resources (e.g., books, websites) and found substantial resources regarding the macro structure of the report (e.g., describing the sections and broad organization of the lab reports, [ 11 , 12 ]. We also found resources for sentence level support related to emphasis and voice [ 13 ]. However, these resources do not give students explicit guidance as to how to write about quantitative information. Thus, it is not surprising that many students struggle to both construct appropriate quantitative evidence statements and express them in writing [ 14 ].

There are, however, a few important resources that explore the structure of writing about quantitative information. Each describe comparisons as a primary mode of providing quantitative evidence, (e.g., The lifespan of cells grown in the presence of drug is 25% shorter than the lifespan of control cells .). In her book about writing about numbers, Miller discusses “quantitative comparisons” as a fundamental skill in quantitative writing [ 15 ]. Jessica Polito states that many disciplines use comparisons as the basis of quantitative evidence statements that support conclusions [ 14 ], and Grawe uses the presence of a comparison as a measure of sophisticated quantitative writing [ 16 ]. We focused on these types of comparative evidence statements and called them Quantitative Comparative statements (QC). We found this type of statement was commonly used to describe data in the scientific literature, and we decided to emphasize the correct construction of these statements in student writing.

We analyzed over a thousand QC statements from student and professional scientific writing to discover the critical elements of a QC statement and the rules that govern those elements. We found that a QC statement needs to have a comparison, a quantitative relational phrase, and at least one contextual element. These essential elements of the QC statement can be thought of as sentence-level syntax. We then developed a metric to measure writing syntax of the QC statement and by proxy, quantitative writing quality. We examined the effectiveness of different approaches to support writing in a course setting and show that practice writing QC statements with feedback can improve student writing. We also investigated how the circumstances of the writing assignment can change the quality of quantitative writing. Together, these data provide insight into how we might improve undergraduate science writing instruction and the clarity of scientific writing.

Methods and materials

Student population and course structure.

We collected data at Santa Clara University (SCU), a private liberal arts university that is a primarily undergraduate institution. Participants were recruited from BIOL25 –Investigations in Cell and Molecular Biology, a lower-division biology course. Prerequisites include a quarter of introductory physiology, a year (3 quarters) of general chemistry and one quarter of organic chemistry. BIOL25 consists of three interactive lecture periods (65 minutes) and one laboratory period (165 minutes) per week. The lecture periods focus on preparing for the laboratory experience, analysis, interpretation, and presentation of data. Laboratory sessions focus on data collection, data analysis and peer feedback activities. During the 10-week quarter, two experimental modules (Enzyme Kinetics and Transcription Regulation) culminate in a lab report. Students organize and communicate their analyzed data in tables and graphs and communicate their conclusions and reasoning in written form. We provide a detailed rubric for the lab reports and a set of explicit instructions for each lab report ( S2 Fig ). In addition, students participate in peer feedback activities with an opportunity to revise prior to submission.

The basic structure of the course was unchanged between 2014 and 2016. The students were distributed among two lecture sections taught by the same instructors and 13 laboratory sections led by 5 different instructors. All students included in this study signed an informed consent form (213 of 214). This study was reviewed and approved by the Santa Clara University Institutional Review Board (project #15-09-700).

Instructional support

General writing feedback (2014–2016)..

In all iterations of the course discussed in this article, students received general writing feedback after each lab report. In each lab report, students wrote paragraphs in response to prompting questions regarding the data. Writing feedback was holistic and included phrases such as “not quantitative”, or “inappropriate comparison,” but was not specific to any type of sentence.

Calculation support (2015–2016).

In 2015 and 2016, students were explicitly introduced to strategies for quantifying relational differences between data points such as percent difference and fold change. Students were given opportunities to practice calculating these values during in class activities prior to writing their lab reports. We stressed that phrases such as more than, drastically higher, and vanishingly small were not quantitative.

Explicit QC statement writing support (2016).

In 2016, we introduced and practiced using quantitative comparative statement as the means to communicate quantitative results. In class, we discussed including an explicit comparison of two conditions and the quantitative relationship between them. Before each lab report, we asked students to write quantitative comparative statements related to the data. We provided formative feedback on the accuracy of the statement and general feedback such as, “not quantitative”, or “inappropriate comparison”. Students in this study were never exposed to the concept of 4C annotation or scoring. We used the scoring strategy exclusively to measure their writing progress.

Identification of quantitative comparative statements (QC)

Quantitative comparative statements are a subset of evidence statements. In native writing (scientific articles or student lab reports), we identified QC statements by the presence of 1) a relational preposition (between, among, etc.), or 2) prepositional phrase ("compared to", "faster/slower than", etc.), 3) a statistical reference (p value), or 4) the presence of quantified change (3 fold, 10% different).

Syntactic elements of QC statements

We examined a corpus of over 1000 QC statements to identify and characterize the essential elements of a QC statement and the rules that govern those elements. Quantitative comparative statements generally take the form of “ The activity of the enzyme is 30% higher in condition X compared to condition Y ”. We identified three critical elements of the quantitative comparative statement: the things being compared (Comparison, condition X and condition Y ), the quantitative relationship between those conditions (Calculation, 30% higher ), and the measurement that gave rise to the compared values (Context, enzyme activity ). Finally, all three elements must be in the same sentence with no redundancy or contradiction (Clarity). These rules are collectively called “4C”.

Syntactic rules for quantitative comparative statements

The Calculation must quantify the relationship between the two compared elements and include both magnitude and direction. Fold change or percent difference are common methods of describing quantitative relationships [ 15 ]. Using absolute or raw values are not sufficient to describe the relationship between the compared elements and are not sufficient. If there is no significant difference between the compared elements, then statistical data must be cited. Context provides additional information about the measurement from which the quantitative comparison was derived, such as growth rate, enzyme activity, etc., or the time at which the comparison was made. The context should be the same for both of the compared elements. Comparisons are usually between like elements (e.g. time vs. time, condition vs. condition) and there should be two and only two in a single sentence. Both compared elements must be explicitly stated so that the reader is not guessing the intended comparison of the writer. A QC statement has Clarity when all three elements are present and in the same sentence. We consider a statement to be “unclear” if it contains inconsistencies or redundancies.

Annotation and scoring of QC statements

We use “annotation” to describe the visual marking of the critical elements of the quantitative comparative statement. We use “scoring” to mean the assignment of a score to a quantitative comparative statement. 4C annotation and 4C scoring do not reflect whether the statement or any of its components are correct, but rather they highlight the syntactic structure of the quantitative comparative statement ( Fig 1 ).

thumbnail

  • PPT PowerPoint slide
  • PNG larger image
  • TIFF original image

(A) Original quantitative comparative statement. (B) Identify and box the relational phrase with both magnitude and direction. (C) Circle what the relational phrase refers to (context). (D) Underline the comparison. (E) Fully 4C annotated quantitative comparative statement.

https://doi.org/10.1371/journal.pone.0203109.g001

Annotation process.

We scanned the results sections of published primary journal articles or student lab reports for relational phrases such as faster than, increased, more than, lower than, etc., and drew a box around the relational phrase , or calculation ( Fig 1B ). If the calculation is an absolute value, a raw value, refers to no particular value, or is missing the magnitude or direction, we would strike through the box. Context . Once the relational phrase, or calculation, was identified, we drew a circle around the information, or context , referred to by the relational phrase ( Fig 1C ). Comparison . The relational phrase and the context helped us identify the comparison and we underlined the compared elements ( Fig 1D ).

4C scoring strategy.

To score an annotated statement, a “1” or a “0” is given to each of the three critical components of the quantitative comparative statement. If all the elements are present in a single sentence, there are no redundancies or inconsistencies, a fourth “1” is awarded for clarity. We call this annotation and scoring strategy “4C” to reflect each of the three critical components and the overall clarity of the statement ( Table 1 ).

thumbnail

https://doi.org/10.1371/journal.pone.0203109.t001

Student writing samples

Pre-test/post-test..

In 2016, student writing was assessed using identical pre- and post-tests. The pre-test was administered on the first day of class prior to any writing support. The post-test was administered as part of the final exam. The pre/post assessment consisted of a graph and data table ( S1 Fig ). The prompts asked the students to analyze the data to answer a specific question related to the data and to use quantitative comparative statements.

Student sampling for lab report analysis.

For the lab reports in 2016, we sampled 40 students from a stratified student population (based on overall grade in the course) and 4C scored all of their quantitative comparative statements in each lab report. On average, students wrote 5–6 quantitative comparative statements per results section for a total of over 200 4C scored statements for each lab report. We scored over 100 statements from 17–20 lab reports in 2014 and 2015.

Complexity index

We based complexity on the number of values (data points) students would have to parse to develop a QC statement. The complexity of a given experiment is in part determined by number of conditions tested in an experiment and the different types of measurements used. For example, in lab report #1 (Enzyme Kinetics) students consider 3 experimental conditions (control and two separate variables) and 2 measurements (K m and V max ). Thus we calculated a complexity index of 6 (3 conditions x 2 measurements) for lab report #1. In this measure of complexity index, we assumed that all parameters contributed equally to the complexity of the experiment, and that all parameters were equally likely to be considered by students as they developed their written conclusions. However, by designing specific writing prompts, we could guide students to examine a smaller subset of data points and reduce complexity of the situation. In lab report #1 for example, we can prompt students to consider only the effect of the treatment on a single variable such that they only consider 2 conditions (the control and the single experimental variable described in the prompt) and 2 measurements. Now, students are focused on a subset of data and the complexity of the situation could be considered “4”.

Quantitative comparative statements are universally used to describe data

Having decided to focus on QC statements in student writing, we first wanted to quantify their occurrence in professional writing. We examined the results sections in all the research articles from three issues of pan-scientific journals: Science, Nature, PLOS-One, and PNAS. We identified an average of 7–15 QC statements in each research article, with no significant difference in the mean number of QC statements among the different journals ( Fig 2 , ANOVA, p = 0.194). There was also no difference of the number of QC statements among the different disciplines (Kruskal-Wallis, p = 0.302). Out of the 60 articles examined, we found only one article that did not have a single QC statement to describe the data ( Fig 2 , Nature). These data suggest that QC statements are used in professional forms of quantitative writing to describe data in many different disciplines.

thumbnail

The mean (middle vertical line) ± SD are shown. Physical science papers are denoted in red, Biological sciences are in blue, and Social sciences are in green.

https://doi.org/10.1371/journal.pone.0203109.g002

4C scoring used to measure quantitative writing

In 2016, students practiced writing QC statements related to their data and we provided feedback (see Methods ). We measured the effectiveness of the focused writing practice using 4C scoring of QC statements from a pre- and post-test (see Methods and Table 1 ). We observed a 37% increase in student 4C scores on the post-test assessment compared to the pre-test (p < 0.001, Fig 3A ). In addition, we used 4C scoring to interrogate the impact of the writing intervention on each of the required components of the QC statement ( Fig 3B ). We observed improvements in each of the components of QC statements ( Fig 3C ). In the post-test, over 80% of students included a calculation (magnitude and direction), referred explicitly to both items being compared, and referenced the measurement context for their comparison. Only 25% of students produced completely clear statements, meaning that they were not missing any elements, and did not contain redundant or contradictory phrases. Despite the low post-test clarity score, we observed a 40% improvement in students writing completely clear statements in the post-test compared to the pre-test score ( Fig 3C ).

thumbnail

( A) Mean 4C scores of quantitative comparative statements on an identical pre- and post- test. (B) Percent of statements that contain each of the essential components of a QC statement. (C) Percent difference between the pre-test and post-test broken down by essential components of QC statements. (***t-test, p < 0.001) Error bars in A represent Standard Error of the Mean (SEM).

https://doi.org/10.1371/journal.pone.0203109.g003

We next asked if we could measure student learning gains in quantitative writing within the context of a lab report. Students write 2 lab reports per term and we provided varying forms of writing feedback over several iterations of the course (see Methods ). We scored QC statements in two lab reports from 2014 (general writing feedback only), 2015 (general writing feedback and calculation support) and 2016 (general writing feedback, calculation support, and sentence-level writing practice) ( Fig 4A ). There was no appreciable impact on writing quality when we added calculation support to general feedback in 2015 compared to feedback alone in 2014 (t test, p = 0.55, Fig 4A ). However, the addition of sentence-level QC writing support in 2016 resulted in a 22% increase in student mean 4C scores on lab report #1 compared to the same report in 2015 ( Fig 4A , t test, p < 0.05). We noticed the same trends in lab report #2 ( Fig 4B ): general writing feedback and calculation support did not improve scores as compared to general feedback alone (t test, p = 0.88). However, we observed an 80% increase in 4C scores on lab report #2 when we provided sentence-level writing practice compared to feedback alone ( Fig 4B , t test, p < 0.001). The mean 4C scores in each year for each assessment, as well as the forms of writing support employed, are summarized in Table 2 . Overall, these data suggest that sentence-level writing practice with feedback is important in helping students improve the syntax of quantitative writing.

thumbnail

(A) Mean 4C scores of QC statements from lab reports (enzyme kinetics). (B) Mean 4C scores of QC statements from second lab reports (transcriptional regulation). (C) Percent difference between the two lab reports within a given year, broken down by essential components (*p < 0.05, ***p < 0.001) Error bars in A and B represent SEM.

https://doi.org/10.1371/journal.pone.0203109.g004

thumbnail

https://doi.org/10.1371/journal.pone.0203109.t002

We were surprised to find that although the trends in the data were similar between the two lab reports, the mean 4C scores of QC statements in lab report #2 were 40% lower than in lab report #1 in both 2014 and 2015 (t test, p < 0.0001, Fig 4A versus 4B ). We predicted that writing skills would either improve with focused practice, or not change over the course of the quarter. To understand which components of the quantitative comparative statement were differentially impacted in the two lab reports, we calculated the relative frequency with which each component was included in a QC statement. Then, we calculated the difference of those frequencies between the first and second lab report for each year ( Fig 4C ). A column below the x-axis indicates that students made particular mistakes more often in lab report #2 ( Fig 4C ). In 2014, students were able to make comparisons equally well between both lab reports, but students struggled to include a quantitative difference or provide context in their evidence statements ( Fig 4C ). In 2015, in addition to general writing feedback, we also provided instructional support to calculate relative differences. We noted that students were able to incorporate both comparisons and calculations into their QC statements in both reports. However, they often omitted the context ( Fig 4C ). The frequency of mistakes made by students is significantly different between lab report #1 and lab report #2 (Chi squared, p < 0.001). These data suggest that feedback alone is not sufficient to improve quantitative writing. In 2016, we provided targeted practice at the sentence level and observed no significant difference in mean 4C scores between the two lab reports ( Fig 4B , t test, p = 0.0596), suggesting that the writing skills of students did not decrease from one lab report to the next. Additionally, students included the four elements of the QC statement equally well between the two lab reports (Chi squared, p = 0.6530, Fig 4C , 2016). Thus, when students receive targeted, sentence-level writing practice, their ability to write QC statements improves.

Quantitative writing quality is negatively impacted by complexity

We were perplexed as to why quantitative writing syntax (as measured by mean 4C scores) declined in lab report #2 compared to lab report #1 in both 2014 and 2015 ( Fig 4A and 4B ). Because we view the essential components of QC statements as analogous to syntactic rules that govern writing of QC statements, we can apply principles and theories that govern writing skills writ large. Research from writing in English Composition shows that writing ability, as measured by sentence level syntax, deteriorates when the writer is struggling with basic comprehension [ 17 , 18 ]. We hypothesized that students’ ability to write about data also might be negatively impacted when students struggled to comprehend the conceptual system they were asked to interrogate. However, we found no correlation between mean 4C scores and any assessment of conceptual material (data not shown). Nor was there an association between mean 4C scores on the lab reports and the related sections of the final (data not shown). Together, these data suggest that conceptual comprehension does not impact writing of a QC statement.

In addition to conceptual understanding, QC statements require that the writer parse through the data set to select the relevant data points to interrogate. We hypothesized that the number of data points (values) in the data set may negatively impact QC statement syntax. We calculated the complexity of different assignments (see methods ) and plotted mean 4C scores as a function of complexity index. We performed linear regression analysis on those mean 4C scores from writing samples occurring prior to formal writing intervention (2014 and 2015 lab reports, and the 2016 pre-test, Fig 5A , closed circles) and those that occur after specific writing intervention (2016 lab reports and 2016 post-test, Fig 5A , open circles). There is a strong inverse correlation between writing as measured by mean 4C scores and complexity (r 2 = 0.9471 for supported and r 2 = 0.9644 for unsupported writing, Fig 5A ). Moreover, the slopes of the lines generated from the regression analysis of mean 4C scores do not vary significantly despite writing interventions (p = 0.3449). Although the task complexity in 2016 was reduced relative to 2015, the negative impact of complexity on writing persisted. Thus, as the complexity of experimental data sets increases, the ability to write clearly decreases regardless of the writing intervention.

thumbnail

(A) Writing syntax as a function of complexity measured by 4C scoring and reported as either unsupported (closed circles) or supported (open circles) by instructional intervention. Linear regression lines are shown (unsupported, R 2 = 0.9644, supported R 2 = 0.9471). (B) Students were stratified based on overall performance in the course. Statements from students within the group were averaged and reported. Error bars represent SEM.

https://doi.org/10.1371/journal.pone.0203109.g005

Complexity differentially impacts specific populations of students

Part of the developmental process of analytical reasoning is parsing relevant from irrelevant data [ 1 ]. We asked if subpopulations of our students were more capable of parsing information from larger data sets than others. We stratified 2016 students into quartiles based on overall performance in the course. We measured the mean 4C scores from the post-test and both lab reports, and plotted mean 4C score as a function of “constrained” complexity ( Fig 5B ). At lower complexity levels, there is no significant difference between the highest performing students and the lowest performing students (t test, p >0.05). Increasing complexity also had a negative impact on most of our students. However, students in the top quartile were less affected by increased complexity than the lower 75% of the class (t test, p <0.05, Fig 5B ). These data suggest there are students who are developmentally capable of controlling the complexity of the task to focus on the skill of writing.

We set out to help STEM students write more clearly and we focused on writing a specific but universal form of evidence statement, the quantitative comparative statement ([ 14 , 15 ], Fig 2 ). By analyzing text from student lab reports and professional scientific articles, we defined the syntax of quantitative comparative statements ( Fig 1 , Table 1 ). Based on the syntactic rules we established, we scored individual quantitative comparative statements and measured writing quality (Figs 3 – 5 ). Our data show that writing quality (measured by 4C scoring) can be improved with focused practice and feedback (Figs 3 and 4 ). Finally, our data show that the circumstance, i.e., the complexity of the writing task, influences writing quality. For example, writing quality decreased when students interrogated larger data sets (Figs 4 and 5 ), but was improved when students were directed by the writing prompt to focus on a subset of the data ( Fig 5 and data not shown).

Our findings are consistent with previous research in Writing Studies and English Composition showing that syntax suffers when writers are confronted with complex and unfamiliar conceptual material [ 17 , 18 , 19 ]. The Cognitive Process Theory of Writing states that writing is a cognitive endeavor and that three main cognitive activities impact writing, the process of writing (syntax, grammar, spelling, organization, etc.), the task environment (the purpose of the writing task), and knowledge of the writing topic [ 17 , 18 , 19 ]. The theory posits that cognitive overload in any of these areas will negatively impact writing quality [ 17 , 18 ]. Consistent with the theory, our data show that writing quality is a function of explicit writing practice ( Fig 3 ), the size of the data set ( Fig 4A compared to 4B ) and scope of the writing prompts ( Fig 4B 2015 compared to 2016).

Explicit sentence level practice improves writing quality

Our data suggest that practicing isolated sentence construction improves writing quality (Figs 3 and 4 ). In every year of this study, we provided students with generalized feedback about their quantitative comparative statements (e.g., “needs quantitation” or “needs a comparison”) within the context of their lab report. In 2016, students practiced writing a QC statement related to their data but separate from the lab report. Although our feedback was the same, we observed improvement only when the feedback was given to QC statements practiced out of the lab report context ( Fig 4A compared to 4B ). Consistent with our data, the Cognitive Process Theory of Writing predicts that practicing specific syntax will increase fluency, lower the cognitive load on the writer’s working memory, and improve writing [ 17 , 18 ]. Our data are also consistent with research in English Composition demonstrating that when instructors support sentence-level syntax, they observe improved sentence level construction, improved overall composition, and higher level critical thinking [ 20 ]. In addition to improved sentence level syntax, we also observed overall quality of lab reports improved 12% in 2016 compared to the same lab report in 2015 (based on rubric scores, data not shown). If students develop a greater facility with the process of writing by practicing sentence level syntax, they have more cognitive resources available to develop and communicate their reasoning (our data, [ 20 , 21 ]).

Complexity of the writing task affects writing quality

We defined the complexity of the writing assignment as the landscape of information students must sample to interpret and communicate their data. In the case of lab reports, that information is the collected and analyzed data set ( Table 2 ). Students interrogating a larger data set produced lower quality QC statements than when they interrogated a smaller data set (compare lab report #2 to lab report #1 in both 2014 and 2015 cohorts, Fig 4 ). In lab report #2, students not only contended with a larger number of values in the dataset compared to lab report #1, but also with two different measurements. These data are consistent with the Cognitive Process Theory of Writing that suggests that when demands on the writer’s knowledge of the topic increase, the writer cannot devote as many cognitive resources to the task environment or process of writing [ 17 , 18 ]. However, we observed that the negative effect of experimental complexity on writing quality can be mitigated by writing prompts that focus students on a smaller, specific subset of the data ( Fig 5A ). More focused writing prompts and smaller data sets reduce the task environment of the assignment and allow more cognitive load to be devoted to the process of writing.

Model for writing quality as a function of complexity

Interestingly, the writing quality of students who finished the course with higher final grades (top quartile) was more resistant to increases in complexity compared to their classmates ( Fig 5B ). These data are consistent with the ideas of McCutchen who posits that as writers become more expert in their field, they have more cognitive resources to devote to clear communication. McCutchen suggests that expert writers have 1) more knowledge of their discipline, 2) more familiarity with the genres of science writing (task environment), and 3) more practice with the process of writing [ 19 ]. Based on research in Writing Studies, the Cognitive Process Theory of Writing, and the data presented here, we developed a predictive model of the impact of complexity (cognitive load) on writing quality ( Fig 6 ). We have hypothesized a linear model in which any increase in complexity negatively impacts writing quality ( Fig 6A ) and a “breakpoint” model in which writers maintain a constant level of writing quality at lower complexity levels writing quality but decline at higher levels of complexity ( Fig 6B ). We hypothesize that our top performing students have moved into a more expert space in the model by developing strategies to parse a complex task environment and ignore irrelevant information. Effectively, these skills allow them to minimize the impact of complexity on their cognitive load and maintain their writing quality even in the face of complex data sets ( Fig 5B ).

thumbnail

(A) Simple linear model of the relationship between writing quality and complexity (cognitive load). (B) Model of the relationship between writing quality and complexity in which low complexity has minimal impact on writing quality but higher complexity negatively impacts writing quality.

https://doi.org/10.1371/journal.pone.0203109.g006

4C instruction as a writing intervention

In addition to altering the writing assignment to decrease cognitive load on the students, we also think it will be important to provide students with syntactic structures at the sentence level. In this study, we did not use 4C annotation as an instructional intervention so that 4C scoring would be a more objective measure of writing quality. But, subsequent to this study, we and others have used 4C annotation as an instructional tool and found that student writing improves dramatically (data not shown). Although some argue that using overly structured or templated sentences can stifle creativity, providing basic structure does not necessarily lead to pedantic writing [ 22 ]. A commonly used text in college writing, “They say, I say,” determined that providing templates for constructing opinions and arguments gives students a greater ability to express their thoughts [ 23 ]. Specifically, weaker writers who lack intuitive understanding of how to employ these writing structures benefit from the use of explicit templates, while more advanced writers already employ these writing structures in a fluid and nuanced manner [ 23 ].

4C template as a foundation of quantitative writing

As students become more expert writers and write more complex and sophisticated sentences, they may choose to deviate from the proscribed sentence structure and make editorial decisions about the elements of the quantitative comparison in the context of their argument [ 23 ]. In fact, when we examined the 4C scores of quantitative comparative statements in published literature, we found that, on average, professional scientists write comparisons that are missing one of the three elements (4C score = 1.89 +/- 0.05, n = 281). The expert writer may eliminate an element of the evidence statement because he/she presumes a more sophisticated audience is capable of inferring the missing element from prior knowledge or within the context of the argument. Or, the author may provide all elements of quantitative comparison in their argument but not within a single sentence.

Helping students become expert writers

Based on our research, we think novice writers should write for novice readers and include all of the syntactic elements of a QC statement. As students develop their professional voice, the 4C template will serve as a touchstone to frame their quantitative arguments, and the editorial choices they make will depend on the sophistication of their audience. Students will write clear arguments even if those elements no longer reside within the rigid structure of a single QC statement with a perfect 4C score. We are confident that by supporting student writing at the level of syntax, we are building a solid foundation that will give students greater capacity for reasoning in the face of increasing experimental complexity.

Supporting information

S1 fig. pre test / post test..

Example of the pre- and post-test used to assess the ability to interpret graphical and tabular data and write a quantitative comparative statement.

https://doi.org/10.1371/journal.pone.0203109.s001

S2 Fig. Lab Report Rubric.

A detailed rubric provides students with explicit guidance for each lab report. This rubric corresponds with the experiment exploring enzyme kinetics of β-galactosidase.

https://doi.org/10.1371/journal.pone.0203109.s002

Acknowledgments

The authors thank Dr. Jessica Santangelo for critical feedback on the manuscript and unwavering support for this project. This study was initially developed as part of the Biology Scholars Program (Research Residency) through the American Society for Microbiology and the National Science Foundation (T.R.)

  • 1. American Association for the Advancement of Science. Vision and change in undergraduate biology education: a call to action. Brewer Cand Smith D., Eds. American Association for the Advancement of Science. 2011. 1–100. http://visionandchange.org/files/2013/11/aaas-VISchange-web1113.pdf
  • View Article
  • Google Scholar
  • 3. Bazerman C. Shaping Written Knowledge: The Genre and Activity of the Experimental Article in Science. University of Wisconsin Press; 1988.
  • PubMed/NCBI
  • 15. Miller JE. The Chicago Guide to Writing about Numbers, Second Edition. 2nd ed. Chicago: Chicago University Press; 2015
  • 20. Languis ML, Buffer JJ Martin D, Naour PJ. Cognitive Science: Contributions to Educational Practice Routledge; 2012. 304 p.
  • 23. Graff G, Cathy Birkenstein. They say / I say: the moves that matter in academic writing. New York: W.W. Norton & Co.; 2010.
  • Research Report: Definition, Types + [Writing Guide]

busayo.longe

One of the reasons for carrying out research is to add to the existing body of knowledge. Therefore, when conducting research, you need to document your processes and findings in a research report. 

With a research report, it is easy to outline the findings of your systematic investigation and any gaps needing further inquiry. Knowing how to create a detailed research report will prove useful when you need to conduct research.  

What is a Research Report?

A research report is a well-crafted document that outlines the processes, data, and findings of a systematic investigation. It is an important document that serves as a first-hand account of the research process, and it is typically considered an objective and accurate source of information.

In many ways, a research report can be considered as a summary of the research process that clearly highlights findings, recommendations, and other important details. Reading a well-written research report should provide you with all the information you need about the core areas of the research process.

Features of a Research Report 

So how do you recognize a research report when you see one? Here are some of the basic features that define a research report. 

  • It is a detailed presentation of research processes and findings, and it usually includes tables and graphs. 
  • It is written in a formal language.
  • A research report is usually written in the third person.
  • It is informative and based on first-hand verifiable information.
  • It is formally structured with headings, sections, and bullet points.
  • It always includes recommendations for future actions. 

Types of Research Report 

The research report is classified based on two things; nature of research and target audience.

Nature of Research

  • Qualitative Research Report

This is the type of report written for qualitative research . It outlines the methods, processes, and findings of a qualitative method of systematic investigation. In educational research, a qualitative research report provides an opportunity for one to apply his or her knowledge and develop skills in planning and executing qualitative research projects.

A qualitative research report is usually descriptive in nature. Hence, in addition to presenting details of the research process, you must also create a descriptive narrative of the information.

  • Quantitative Research Report

A quantitative research report is a type of research report that is written for quantitative research. Quantitative research is a type of systematic investigation that pays attention to numerical or statistical values in a bid to find answers to research questions. 

In this type of research report, the researcher presents quantitative data to support the research process and findings. Unlike a qualitative research report that is mainly descriptive, a quantitative research report works with numbers; that is, it is numerical in nature. 

Target Audience

Also, a research report can be said to be technical or popular based on the target audience. If you’re dealing with a general audience, you would need to present a popular research report, and if you’re dealing with a specialized audience, you would submit a technical report. 

  • Technical Research Report

A technical research report is a detailed document that you present after carrying out industry-based research. This report is highly specialized because it provides information for a technical audience; that is, individuals with above-average knowledge in the field of study. 

In a technical research report, the researcher is expected to provide specific information about the research process, including statistical analyses and sampling methods. Also, the use of language is highly specialized and filled with jargon. 

Examples of technical research reports include legal and medical research reports. 

  • Popular Research Report

A popular research report is one for a general audience; that is, for individuals who do not necessarily have any knowledge in the field of study. A popular research report aims to make information accessible to everyone. 

It is written in very simple language, which makes it easy to understand the findings and recommendations. Examples of popular research reports are the information contained in newspapers and magazines. 

Importance of a Research Report 

  • Knowledge Transfer: As already stated above, one of the reasons for carrying out research is to contribute to the existing body of knowledge, and this is made possible with a research report. A research report serves as a means to effectively communicate the findings of a systematic investigation to all and sundry.  
  • Identification of Knowledge Gaps: With a research report, you’d be able to identify knowledge gaps for further inquiry. A research report shows what has been done while hinting at other areas needing systematic investigation. 
  • In market research, a research report would help you understand the market needs and peculiarities at a glance. 
  • A research report allows you to present information in a precise and concise manner. 
  • It is time-efficient and practical because, in a research report, you do not have to spend time detailing the findings of your research work in person. You can easily send out the report via email and have stakeholders look at it. 

Guide to Writing a Research Report

A lot of detail goes into writing a research report, and getting familiar with the different requirements would help you create the ideal research report. A research report is usually broken down into multiple sections, which allows for a concise presentation of information.

Structure and Example of a Research Report

This is the title of your systematic investigation. Your title should be concise and point to the aims, objectives, and findings of a research report. 

  • Table of Contents

This is like a compass that makes it easier for readers to navigate the research report.

An abstract is an overview that highlights all important aspects of the research including the research method, data collection process, and research findings. Think of an abstract as a summary of your research report that presents pertinent information in a concise manner. 

An abstract is always brief; typically 100-150 words and goes straight to the point. The focus of your research abstract should be the 5Ws and 1H format – What, Where, Why, When, Who and How. 

  • Introduction

Here, the researcher highlights the aims and objectives of the systematic investigation as well as the problem which the systematic investigation sets out to solve. When writing the report introduction, it is also essential to indicate whether the purposes of the research were achieved or would require more work.

In the introduction section, the researcher specifies the research problem and also outlines the significance of the systematic investigation. Also, the researcher is expected to outline any jargons and terminologies that are contained in the research.  

  • Literature Review

A literature review is a written survey of existing knowledge in the field of study. In other words, it is the section where you provide an overview and analysis of different research works that are relevant to your systematic investigation. 

It highlights existing research knowledge and areas needing further investigation, which your research has sought to fill. At this stage, you can also hint at your research hypothesis and its possible implications for the existing body of knowledge in your field of study. 

  • An Account of Investigation

This is a detailed account of the research process, including the methodology, sample, and research subjects. Here, you are expected to provide in-depth information on the research process including the data collection and analysis procedures. 

In a quantitative research report, you’d need to provide information surveys, questionnaires and other quantitative data collection methods used in your research. In a qualitative research report, you are expected to describe the qualitative data collection methods used in your research including interviews and focus groups. 

In this section, you are expected to present the results of the systematic investigation. 

This section further explains the findings of the research, earlier outlined. Here, you are expected to present a justification for each outcome and show whether the results are in line with your hypotheses or if other research studies have come up with similar results.

  • Conclusions

This is a summary of all the information in the report. It also outlines the significance of the entire study. 

  • References and Appendices

This section contains a list of all the primary and secondary research sources. 

Tips for Writing a Research Report

  • Define the Context for the Report

As is obtainable when writing an essay, defining the context for your research report would help you create a detailed yet concise document. This is why you need to create an outline before writing so that you do not miss out on anything. 

  • Define your Audience

Writing with your audience in mind is essential as it determines the tone of the report. If you’re writing for a general audience, you would want to present the information in a simple and relatable manner. For a specialized audience, you would need to make use of technical and field-specific terms. 

  • Include Significant Findings

The idea of a research report is to present some sort of abridged version of your systematic investigation. In your report, you should exclude irrelevant information while highlighting only important data and findings. 

  • Include Illustrations

Your research report should include illustrations and other visual representations of your data. Graphs, pie charts, and relevant images lend additional credibility to your systematic investigation.

  • Choose the Right Title

A good research report title is brief, precise, and contains keywords from your research. It should provide a clear idea of your systematic investigation so that readers can grasp the entire focus of your research from the title. 

  • Proofread the Report

Before publishing the document, ensure that you give it a second look to authenticate the information. If you can, get someone else to go through the report, too, and you can also run it through proofreading and editing software. 

How to Gather Research Data for Your Report  

  • Understand the Problem

Every research aims at solving a specific problem or set of problems, and this should be at the back of your mind when writing your research report. Understanding the problem would help you to filter the information you have and include only important data in your report. 

  • Know what your report seeks to achieve

This is somewhat similar to the point above because, in some way, the aim of your research report is intertwined with the objectives of your systematic investigation. Identifying the primary purpose of writing a research report would help you to identify and present the required information accordingly. 

  • Identify your audience

Knowing your target audience plays a crucial role in data collection for a research report. If your research report is specifically for an organization, you would want to present industry-specific information or show how the research findings are relevant to the work that the company does. 

  • Create Surveys/Questionnaires

A survey is a research method that is used to gather data from a specific group of people through a set of questions. It can be either quantitative or qualitative. 

A survey is usually made up of structured questions, and it can be administered online or offline. However, an online survey is a more effective method of research data collection because it helps you save time and gather data with ease. 

You can seamlessly create an online questionnaire for your research on Formplus . With the multiple sharing options available in the builder, you would be able to administer your survey to respondents in little or no time. 

Formplus also has a report summary too l that you can use to create custom visual reports for your research.

Step-by-step guide on how to create an online questionnaire using Formplus  

  • Sign into Formplus

In the Formplus builder, you can easily create different online questionnaires for your research by dragging and dropping preferred fields into your form. To access the Formplus builder, you will need to create an account on Formplus. 

Once you do this, sign in to your account and click on Create new form to begin. 

  • Edit Form Title : Click on the field provided to input your form title, for example, “Research Questionnaire.”
  • Edit Form : Click on the edit icon to edit the form.
  • Add Fields : Drag and drop preferred form fields into your form in the Formplus builder inputs column. There are several field input options for questionnaires in the Formplus builder. 
  • Edit fields
  • Click on “Save”
  • Form Customization: With the form customization options in the form builder, you can easily change the outlook of your form and make it more unique and personalized. Formplus allows you to change your form theme, add background images, and even change the font according to your needs. 
  • Multiple Sharing Options: Formplus offers various form-sharing options, which enables you to share your questionnaire with respondents easily. You can use the direct social media sharing buttons to share your form link to your organization’s social media pages.  You can also send out your survey form as email invitations to your research subjects too. If you wish, you can share your form’s QR code or embed it on your organization’s website for easy access. 

Conclusion  

Always remember that a research report is just as important as the actual systematic investigation because it plays a vital role in communicating research findings to everyone else. This is why you must take care to create a concise document summarizing the process of conducting any research. 

In this article, we’ve outlined essential tips to help you create a research report. When writing your report, you should always have the audience at the back of your mind, as this would set the tone for the document. 

Logo

Connect to Formplus, Get Started Now - It's Free!

  • ethnographic research survey
  • research report
  • research report survey
  • busayo.longe

Formplus

You may also like:

How to Write a Problem Statement for your Research

Learn how to write problem statements before commencing any research effort. Learn about its structure and explore examples

quantitative research report writing

Assessment Tools: Types, Examples & Importance

In this article, you’ll learn about different assessment tools to help you evaluate performance in various contexts

Ethnographic Research: Types, Methods + [Question Examples]

Simple guide on ethnographic research, it types, methods, examples and advantages. Also highlights how to conduct an ethnographic...

21 Chrome Extensions for Academic Researchers in 2022

In this article, we will discuss a number of chrome extensions you can use to make your research process even seamless

Formplus - For Seamless Data Collection

Collect data the right way with a versatile data collection tool. try formplus and transform your work productivity today..

  • How to Cite
  • Language & Lit
  • Rhyme & Rhythm
  • The Rewrite
  • Search Glass

How to Write a Quantitative Analysis Report

A quantitative analysis can give people the necessary information to make decisions about policy and planning for a program or organization. A good quantitative analysis leaves no questions about the quality of data and the authority of the conclusions. Whether in school completing a project or at the highest levels of government evaluating programs, knowing how to write a quality quantitative analysis is helpful. A quantitative analysis uses hard data, such as survey results, and generally requires the use of computer spreadsheet applications and statistical know-how.

Explain why the report is being written in the introduction. Point out the need that is being filled and describe any prior research that has been conducted in the same field. The introduction should also say what future research should be done to thoroughly answer the questions you set out to research. You should also state for whom the report is being prepared.

Describe the methods used in collecting data for the report. Discuss how the data was collected. If a survey was used to collect data, tell the reader how it was designed. You should let the reader know if a survey pilot test was distributed first. Detail the target population, or the group of people being studied. Provide the sample size, or the number of people surveyed. Tell the reader if the sample was representative of the target population, and explain whether you collected enough surveys. Break down the data by gender, race, age and any other pertinent subcategory. Tell the reader about any problems with data collection, including any biases in the survey, missing results or odd responses from people surveyed.

Create graphs showing visual representations of the results. You can use bar graphs, line graphs or pie charts depending to convey the data. Only write about the pertinent findings, or the ones you think matter most, in the body of the report. Any other results can be attached in the appendices at the end of the report. The raw data, along with copies of a blank survey should be in the appendices as well. The reader can refer to all the data to inform his own opinions about the findings.

Write conclusions after evaluating all the data. The conclusion can include an action item for the reader to accomplish. It can also advise that more research needs to be done before any solid conclusions can be made. Only conclusions that can be made based on the findings should be included in the report.

Write an executive summary to attach at the beginning of the report. Executive summaries are quick one to two page recaps of what is in the report. They include shorter versions of the introductions, methods, findings and conclusions. Executive summaries serve to allow readers to quickly understand what is said in the report.

Things You'll Need

  • Syracuse University: Practicum in Public Policy
  • Georgia Tech: Questionnaire Design

This article was written by the CareerTrend team, copy edited and fact checked through a multi-point auditing system, in efforts to ensure our readers only receive the best information. To submit your questions or ideas, or to simply learn more about CareerTrend, contact us [here](http://careertrend.com/about-us).

quantitative research report writing

Research Rockstar Training Portal

Writing quantitative research reports.

What makes for a great quantitative research report? In this class taught by Kathryn Korostoff, you learn how to write a great quantitative market research report—even if you are new to report writing—in a fun practical way.

As a student, you get lots of helpful videos, readings and reference material.

quantitative research report writing

Take a look inside the course

Course Curriculum

Getting Ready to Rock!

Access Your Live Sessions Here

Pre-assignment

Optional Pre-assignment [Slide File Download]

View the pre-assignment slides online

Know Your Audience, Planning for Impact

Lesson Plan

Know Your Audience, Planning for Impact [Lecture]

Know Your Audience, Planning for Impact [Slide Viewer]

Know Your Audience, Planning for Impact [Slide File Download]

Rockstar Practice: Comparing Two Report Styles

Research Report Example A

Research Report Example B

Know Your Audience [Downloadable Job Aid]

Part 1 Quiz: What Makes a PowerPoint Report “Modern” versus "Dated"?

Writing Tips: Style, Voice, Powerful Words

Bonus: Free Rockstar Report Template (PPT)

Market Research Report Outline and Tips [Downloadable Job Aid]

Reading & Summarizing Data

Reading & Summarizing Data [Lecture]

Reading & Summarizing Data [Slide Viewer]

Reading & Summarizing Data [Slide File Download]

Scope & Methodology Checklist [Downloadable Job Aid]

Rockstar Practice: Summarizing Survey Data

Management Summaries That Ignite Insights

Management Summaries That Ignite Insights [Lecture]

Management Summaries That Ignite Insights [Slide Viewer]

Management Summaries That Ignite Insights [Slide File Download]

Final Steps That Boost Credibility & Impact

Final Steps That Boost Credibility & Impact [Lecture]

Final Steps That Boost Credibility & Impact [Slide Viewer]

Final Steps That Boost Credibility & Impact [Slide File Download]

Planning Your Post-Mortem Process [Downloadable Job Aid]

Bonus Resources

Recommended Reading & Viewing

Quantitative Research Report Examples

4 Survey Report Samples [Mini-Lesson]

Scope & Methodology [Mini-Lesson]

Your Rockstar Status Awaits: Final Assessment

Our Short Feedback Survey

Your candid feedback helps us improve

quantitative research report writing

About this course

  • 6 hours of video content

Meet Your Instructor

quantitative research report writing

Kathryn Korostoff

President, lead instructor research rockstar training and staffing, ready to rock.

Get access to this course and 25+ market research courses with a Backstage Pass membership.

  • ⋮⋮⋮ ×

Writing w/Calculator

Characteristics of Quantitative Writing Assignments:

  • Unlike conventional (non-quantitative) writing assignments, QW assignments require students to analyze and interpret quantitative data . Writers must use numbers in a variety of ways to help them define a problem, to see alternative points of view, to speculate about causes and effects, and to create evidence-based arguments. Often they must learn to construct and reference their own tables or graphs.
  • Quantitative writing generally presents students with an ' ill structured problem, ' requiring the analysis of quantitative data in an ambiguous context without a clear right answer. Unlike a math "story problem," which is usually a 'well-structured problem' with a single right answer, a QW assignment requires students to formulate a claim for a best solution and support it with reasons and evidence. Well structured versus Ill structured problems How a story problem differs from a QW Assignment
  • Quantitative writing forces students to contemplate the meaning of numbers , to understand where the numbers come from and how they are presented. Students must consider, for example, the different effects of using ordinal numbers versus percentages, means versus medians, raw numbers versus adjusted numbers, exact numbers versus approximated or rounded numbers, and so forth. At more advanced levels, students must understand the interpretive meaning of a standard deviation, the function of a chi square, or the purpose of specific kinds of algorithms in their disciplines. In all cases, they must consider their communicative goals and their audience's interests, needs, and background and to use numbers effectively within that rhetorical context.

Types of Quantitative Writing Assignments

Quantitative Writing doesn't have to mean writing a research paper. In fact, the majority of QW assignments are less ambitious than that. QW assignments can be designed in a variety of forms as indicated below.

  • Genre, audience and purpose - Good writing assignments include a rhetorical context for authors: What form should the writing take, to whom is it addressed and for what rhetorical purpose?
  • Length, stakes and complexity - QW assignments can range from very short to very long; they can be weighted little or much towards a student's grade; and they can employ simple or complex quantitative reasoning.
  • Informal writing - Quantitative writing need not be formal writing.
  • QW in formats other than essays - QW assignments need not be papers, per se. learn more about different types of QW assignments

Example of a Quantitative Writing Assignment

The following contains the core sentences from a representative QW assignment.

"Over the last century, the number of salmon that return to California rivers has been decreasing. Is this a serious problem? Should anything be done in response to this situation? You will investigate questions like this in your essay. The table below gives data for the number of Chinook salmon (in thousands) from 1986 to 2000."

This challenging assignment asks students to create an argument about salmon based on tabular data that students must analyze and interpret. To do the assignment, students must make inferences from the table, do calculations, convert tabular data to bar or line graphs, and then use the data meaningfully in their own arguments. The quantitative methods required are only moderately complex, but the questions posed "Is this a serious problem? Should anything be done?" make clear that this is an ill-structured problem. In the complete assignment , note how the instructors (Michael Burke and Jean Mach of the College of San Mateo) include intermediate steps that help guide students through their analysis of the data.

The salmon problem is just one example of the dozens of ways that instructors can create engaging quantitative writing assignments.

« Previous Page       Next Page »

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, generate accurate citations for free.

  • Knowledge Base

Methodology

  • Qualitative vs. Quantitative Research | Differences, Examples & Methods

Qualitative vs. Quantitative Research | Differences, Examples & Methods

Published on April 12, 2019 by Raimo Streefkerk . Revised on June 22, 2023.

When collecting and analyzing data, quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings. Both are important for gaining different kinds of knowledge.

Common quantitative methods include experiments, observations recorded as numbers, and surveys with closed-ended questions.

Quantitative research is at risk for research biases including information bias , omitted variable bias , sampling bias , or selection bias . Qualitative research Qualitative research is expressed in words . It is used to understand concepts, thoughts or experiences. This type of research enables you to gather in-depth insights on topics that are not well understood.

Common qualitative methods include interviews with open-ended questions, observations described in words, and literature reviews that explore concepts and theories.

Table of contents

The differences between quantitative and qualitative research, data collection methods, when to use qualitative vs. quantitative research, how to analyze qualitative and quantitative data, other interesting articles, frequently asked questions about qualitative and quantitative research.

Quantitative and qualitative research use different research methods to collect and analyze data, and they allow you to answer different kinds of research questions.

Qualitative vs. quantitative research

Quantitative and qualitative data can be collected using various methods. It is important to use a data collection method that will help answer your research question(s).

Many data collection methods can be either qualitative or quantitative. For example, in surveys, observational studies or case studies , your data can be represented as numbers (e.g., using rating scales or counting frequencies) or as words (e.g., with open-ended questions or descriptions of what you observe).

However, some methods are more commonly used in one type or the other.

Quantitative data collection methods

  • Surveys :  List of closed or multiple choice questions that is distributed to a sample (online, in person, or over the phone).
  • Experiments : Situation in which different types of variables are controlled and manipulated to establish cause-and-effect relationships.
  • Observations : Observing subjects in a natural environment where variables can’t be controlled.

Qualitative data collection methods

  • Interviews : Asking open-ended questions verbally to respondents.
  • Focus groups : Discussion among a group of people about a topic to gather opinions that can be used for further research.
  • Ethnography : Participating in a community or organization for an extended period of time to closely observe culture and behavior.
  • Literature review : Survey of published works by other authors.

A rule of thumb for deciding whether to use qualitative or quantitative data is:

  • Use quantitative research if you want to confirm or test something (a theory or hypothesis )
  • Use qualitative research if you want to understand something (concepts, thoughts, experiences)

For most research topics you can choose a qualitative, quantitative or mixed methods approach . Which type you choose depends on, among other things, whether you’re taking an inductive vs. deductive research approach ; your research question(s) ; whether you’re doing experimental , correlational , or descriptive research ; and practical considerations such as time, money, availability of data, and access to respondents.

Quantitative research approach

You survey 300 students at your university and ask them questions such as: “on a scale from 1-5, how satisfied are your with your professors?”

You can perform statistical analysis on the data and draw conclusions such as: “on average students rated their professors 4.4”.

Qualitative research approach

You conduct in-depth interviews with 15 students and ask them open-ended questions such as: “How satisfied are you with your studies?”, “What is the most positive aspect of your study program?” and “What can be done to improve the study program?”

Based on the answers you get you can ask follow-up questions to clarify things. You transcribe all interviews using transcription software and try to find commonalities and patterns.

Mixed methods approach

You conduct interviews to find out how satisfied students are with their studies. Through open-ended questions you learn things you never thought about before and gain new insights. Later, you use a survey to test these insights on a larger scale.

It’s also possible to start with a survey to find out the overall trends, followed by interviews to better understand the reasons behind the trends.

Qualitative or quantitative data by itself can’t prove or demonstrate anything, but has to be analyzed to show its meaning in relation to the research questions. The method of analysis differs for each type of data.

Analyzing quantitative data

Quantitative data is based on numbers. Simple math or more advanced statistical analysis is used to discover commonalities or patterns in the data. The results are often reported in graphs and tables.

Applications such as Excel, SPSS, or R can be used to calculate things like:

  • Average scores ( means )
  • The number of times a particular answer was given
  • The correlation or causation between two or more variables
  • The reliability and validity of the results

Analyzing qualitative data

Qualitative data is more difficult to analyze than quantitative data. It consists of text, images or videos instead of numbers.

Some common approaches to analyzing qualitative data include:

  • Qualitative content analysis : Tracking the occurrence, position and meaning of words or phrases
  • Thematic analysis : Closely examining the data to identify the main themes and patterns
  • Discourse analysis : Studying how communication works in social contexts

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Chi square goodness of fit test
  • Degrees of freedom
  • Null hypothesis
  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Inclusion and exclusion criteria

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses . Qualitative methods allow you to explore concepts and experiences in more detail.

In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts and meanings, use qualitative methods .
  • If you want to analyze a large amount of readily-available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

There are various approaches to qualitative data analysis , but they all share five steps in common:

  • Prepare and organize your data.
  • Review and explore your data.
  • Develop a data coding system.
  • Assign codes to the data.
  • Identify recurring themes.

The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis , thematic analysis , and discourse analysis .

A research project is an academic, scientific, or professional undertaking to answer a research question . Research projects can take many forms, such as qualitative or quantitative , descriptive , longitudinal , experimental , or correlational . What kind of research approach you choose will depend on your topic.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Streefkerk, R. (2023, June 22). Qualitative vs. Quantitative Research | Differences, Examples & Methods. Scribbr. Retrieved March 31, 2024, from https://www.scribbr.com/methodology/qualitative-quantitative-research/

Is this article helpful?

Raimo Streefkerk

Raimo Streefkerk

Other students also liked, what is quantitative research | definition, uses & methods, what is qualitative research | methods & examples, mixed methods research | definition, guide & examples, "i thought ai proofreading was useless but..".

I've been using Scribbr for years now and I know it's a service that won't disappoint. It does a good job spotting mistakes”

Book cover

Writing about Quantitative Research in Applied Linguistics

  • © 2014
  • Lindy Woodrow 0

University of Sydney, Australia

You can also search for this author in PubMed   Google Scholar

22k Accesses

24 Citations

2 Altmetric

  • Table of contents

About this book

Authors and affiliations, about the author, bibliographic information.

  • Publish with us

This is a preview of subscription content, log in via an institution to check access.

Access this book

  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
  • Durable hardcover edition

Tax calculation will be finalised at checkout

Other ways to access

Licence this eBook for your library

Institutional subscriptions

Table of contents(15 chapters)

Front matter, introduction.

Lindy Woodrow

General Considerations in Writing about Quantitative Research

Writing about research design, reliability, validity and ethics, writing about participants, presenting descriptive statistics, writing about specific statistical procedures, writing about t-tests, anova, ancova and manova, writing about regression, writing about correlation, writing about factor analysis, writing about structural equation modelling, writing about non-parametric tests, publishing quantitative research in applied linguistics, publishing research: journal articles, publishing research: book chapters and books, academic style, back matter.

  • applied linguistics
  • publishing research
  • quantitative methods

Book Title : Writing about Quantitative Research in Applied Linguistics

Authors : Lindy Woodrow

DOI : https://doi.org/10.1057/9780230369955

Publisher : Palgrave Macmillan London

eBook Packages : Palgrave Language & Linguistics Collection , Education (R0)

Copyright Information : Palgrave Macmillan, a division of Macmillan Publishers Limited 2014

Hardcover ISBN : 978-0-230-36996-2 Published: 25 September 2014

Softcover ISBN : 978-0-230-36997-9 Published: 25 September 2014

eBook ISBN : 978-0-230-36995-5 Published: 28 September 2014

Edition Number : 1

Number of Pages : XX, 199

Topics : Applied Linguistics , Language Teaching , Science, Humanities and Social Sciences, multidisciplinary , Printing and Publishing

Policies and ethics

  • Find a journal
  • Track your research
  • Sample Research

FREE 10+ Quantitative Research Report Samples & Templates in PDF | MS Word

quantitative research report image

According to an article from Chron, research is useful for businesses and organizations, especially in production, marketing, and financial practices. The research will help them predict trends in the marketplace, project sales, as well as identify potential problems and opportunities. Conducting research for business can be done in many ways, and one of the most appropriate ways is using quantitative research . This quantifies data into numbers that are easy to analyze, and it will be presented through a report, which is called a quantitative research report. In this article, you will be able to learn and understand the purpose of quantitative research, as well as the importance of using a quantitative research report. Scroll down below.

Quantitative Research Report

Free 10+ quantitative research report samples & templates, 1. business quantitative research report, 2. market testing report template, 3. value assessment quantitative research report, 4. mobile market quantitative research report, 5. quantitative and qualitative research report, 6. global market quantitative research report, 7. critiquing quantitative research report, 8. quantitative research report template, 9. quantitative research report sample, 10. formal quantitative research report, 11. standard quantitative research report, what is a quantitative research report, how to write a quantitative research report, 1. write the introduction, 2. describe the method used, 3. present the result, 4. state your conclusion, 5. add recommendations, faq’s, what are the four classifications of quantitative research, what are the characteristics of quantitative research, what is the difference between quantitative and qualitative research, what are the different methods of quantitative research.

Here are some professionally written quantitative research report samples and templates preformatted in PDF and MS Word file formats. These sample templates contain pre-made suggestive content that you can use as a reference. These templates are also available for instant download. Check them out below!

business quantitative research report

Size: 281 KB

market testing report template

Size: 551 KB

quantitative and qualitative research report

Size: 448 KB

global market quantitative research report

Size: 950 KB

critiquing quantitative research report

Size: 188 KB

quantitative research report template

Size: 179 KB

quantitative research report sample

Size: 787 KB

formal quantitative research report

Size: 398 KB

Business research serves as an essential role in business. It allows management to determine opportunities and competitions in every aspect of the business. This enables a business to operate and function effectively and efficiently. There are different ways to conduct business research, and one of the most commonly used ones is quantitative research. 

According to an article from Medium, quantitative research objectively tests or measures the behavior and attitude of the market that answers to a particular business market research objective. The data collected in this type of research are in numerical form, which is collected through surveys, questionnaires, etc. And the research data analysis and evaluation are presented through a quantitative research report.

A quantitative research report refers to a document that conveys and interprets the data collected during the quantitative research. In this, the quantitative research data are displayed and presented in diagrams, graphs, tables, etc. to make the information more accessible and understandable by the management. 

A quantitative research report is the end-result of quantitative research. It contains information regarding the research conducted. Writing a clear and accurate research analysis report for your quantitative research is necessary since it interprets important information. To help you with that, we have provided you some tips below. Here’s how.

Start making a quantitative research report by writing an introduction. The introduction must contain a summary of information about your research—an overview of the topic, the significance, objectives, and scope. The introduction must outline every important detail of your quantitative research.

In this section, you have to place the method used in the research, which is quantitative. Provide a brief description of the quantitative research, as well as the reason why you chose that method. Also, point out the common data collection methods that you used, which are surveys, interviews, whether paper, online or phone, etc.

After the methodology, the next thing you need to do is to present your quantitative research results and findings. Since quantitative research entails numerical data, you have to use graphs, tables, diagrams, etc. in doing so. Whatever tool you use, as long as it shows the figures clearly. Also, provide a brief explanation of each finding.

Start your conclusion with a brief statement of what the research is all about and its significance to your company. You may get some ideas from your introduction. However, refrain from repeating it word by word instead, paraphrase or summarize the main ideas of your research. Nevertheless, your conclusion must be a statement of your quantitative research and its findings.

Recommendations are present in every research, whether in academic research or business research. So, for your quantitative research report, you must also have recommendations. This area provides suggestions or assumptions that are based on the findings and conclusion of your research report. This is also the section where you give suggestions about some areas of the study that need further research.

There are four classifications of quantitative research that you can use in your business research. These classifications include descriptive, correlational, causal-comparative or quasi-experimental, and experimental research. Descriptive research describes the status of a currently identified variable, and correlational research determines the relationship between two or more variables. The causal-comparative or quasi-experimental research establishes cause and effect relationships among variables, and experimental research, which is called true experimentation that verifies the relationships of a group of variables using the scientific method.

Quantitative research has several characteristics, which include that data are collected using structured research instruments, results are based on larger sample sizes, the research study can be repeated since it is reliable. This also includes the research study that uses tools, such as surveys, to gather data, and the data are gathered in the form of numbers that are presented using charts, figures, and statistics.

The difference between quantitative and qualitative research is that quantitative research focuses on numerical data. It is used to quantify behaviors, opinions, etc. This research study specifies what is measured and how it is measured. On the other hand, qualitative research focuses on textual data. This research study is used to gain an in-depth understanding of the experiences, thoughts, opinions, and trends of an individual.

There are several methods that can be used in conducting quantitative research to gather data. These methods include interviews, probability sampling, observations, document reviews, surveys, and questionnaires. These quantitative research methods are commonly used by businesses and organizations and are proven effective in gathering accurate data.

A well-written quantitative business research report allows businesses to analyze business data and figures comprehensively. With the help of this, they will be able to generate information that will help them in decision-making, improve business operations, and form concrete marketing and business strategies for sustainability and success.

Related Posts

Free 10+ content validity samples & templates in pdf, free 10+ construct validity samples & templates in ms word | pdf, free 10+ code of human research ethics samples & templates in ms word | pdf, free 10+ biography research report samples and templates in pdf, free 10+ system documentation samples & templates in ms word | pdf, free 10+ process document samples & templates in ms word | pdf, free 10+ action research samples & templates in pdf, free 10+ longitudinal research samples & templates in pdf | ms word, free 10+ causal research samples & templates in ms word | pdf, free 10+ client discovery samples & templates in ms word | pdf, free 10+ null hypothesis samples & templates in ms word | pdf, free 9+ product knowledge samples & templates in pdf, free 10+ software documentation samples & templates in ms word | pdf, free 10+ exploratory research samples & templates in pdf | ms word, free 10+ experimental research samples & templates in ms word | pdf, free 6+ sample research analysis templates in ms word pdf, free 7+ quantitative chemical analysis samples in ms word pdf, free 9+ market research report samples in pdf ms word, free 10+ sample data analysis templates in excel.

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • My Account Login
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Open access
  • Published: 26 March 2024

Predicting and improving complex beer flavor through machine learning

  • Michiel Schreurs   ORCID: orcid.org/0000-0002-9449-5619 1 , 2 , 3   na1 ,
  • Supinya Piampongsant 1 , 2 , 3   na1 ,
  • Miguel Roncoroni   ORCID: orcid.org/0000-0001-7461-1427 1 , 2 , 3   na1 ,
  • Lloyd Cool   ORCID: orcid.org/0000-0001-9936-3124 1 , 2 , 3 , 4 ,
  • Beatriz Herrera-Malaver   ORCID: orcid.org/0000-0002-5096-9974 1 , 2 , 3 ,
  • Christophe Vanderaa   ORCID: orcid.org/0000-0001-7443-5427 4 ,
  • Florian A. Theßeling 1 , 2 , 3 ,
  • Łukasz Kreft   ORCID: orcid.org/0000-0001-7620-4657 5 ,
  • Alexander Botzki   ORCID: orcid.org/0000-0001-6691-4233 5 ,
  • Philippe Malcorps 6 ,
  • Luk Daenen 6 ,
  • Tom Wenseleers   ORCID: orcid.org/0000-0002-1434-861X 4 &
  • Kevin J. Verstrepen   ORCID: orcid.org/0000-0002-3077-6219 1 , 2 , 3  

Nature Communications volume  15 , Article number:  2368 ( 2024 ) Cite this article

39k Accesses

749 Altmetric

Metrics details

  • Chemical engineering
  • Gas chromatography
  • Machine learning
  • Metabolomics
  • Taste receptors

The perception and appreciation of food flavor depends on many interacting chemical compounds and external factors, and therefore proves challenging to understand and predict. Here, we combine extensive chemical and sensory analyses of 250 different beers to train machine learning models that allow predicting flavor and consumer appreciation. For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel and map data from over 180,000 consumer reviews to train 10 different machine learning models. The best-performing algorithm, Gradient Boosting, yields models that significantly outperform predictions based on conventional statistics and accurately predict complex food features and consumer appreciation from chemical profiles. Model dissection allows identifying specific and unexpected compounds as drivers of beer flavor and appreciation. Adding these compounds results in variants of commercial alcoholic and non-alcoholic beers with improved consumer appreciation. Together, our study reveals how big data and machine learning uncover complex links between food chemistry, flavor and consumer perception, and lays the foundation to develop novel, tailored foods with superior flavors.

Similar content being viewed by others

quantitative research report writing

BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules

Rudraksh Tuwani, Somin Wadhwa & Ganesh Bagler

quantitative research report writing

Sensory lexicon and aroma volatiles analysis of brewing malt

Xiaoxia Su, Miao Yu, … Tianyi Du

quantitative research report writing

Predicting odor from molecular structure: a multi-label classification approach

Kushagra Saini & Venkatnarayan Ramanathan

Introduction

Predicting and understanding food perception and appreciation is one of the major challenges in food science. Accurate modeling of food flavor and appreciation could yield important opportunities for both producers and consumers, including quality control, product fingerprinting, counterfeit detection, spoilage detection, and the development of new products and product combinations (food pairing) 1 , 2 , 3 , 4 , 5 , 6 . Accurate models for flavor and consumer appreciation would contribute greatly to our scientific understanding of how humans perceive and appreciate flavor. Moreover, accurate predictive models would also facilitate and standardize existing food assessment methods and could supplement or replace assessments by trained and consumer tasting panels, which are variable, expensive and time-consuming 7 , 8 , 9 . Lastly, apart from providing objective, quantitative, accurate and contextual information that can help producers, models can also guide consumers in understanding their personal preferences 10 .

Despite the myriad of applications, predicting food flavor and appreciation from its chemical properties remains a largely elusive goal in sensory science, especially for complex food and beverages 11 , 12 . A key obstacle is the immense number of flavor-active chemicals underlying food flavor. Flavor compounds can vary widely in chemical structure and concentration, making them technically challenging and labor-intensive to quantify, even in the face of innovations in metabolomics, such as non-targeted metabolic fingerprinting 13 , 14 . Moreover, sensory analysis is perhaps even more complicated. Flavor perception is highly complex, resulting from hundreds of different molecules interacting at the physiochemical and sensorial level. Sensory perception is often non-linear, characterized by complex and concentration-dependent synergistic and antagonistic effects 15 , 16 , 17 , 18 , 19 , 20 , 21 that are further convoluted by the genetics, environment, culture and psychology of consumers 22 , 23 , 24 . Perceived flavor is therefore difficult to measure, with problems of sensitivity, accuracy, and reproducibility that can only be resolved by gathering sufficiently large datasets 25 . Trained tasting panels are considered the prime source of quality sensory data, but require meticulous training, are low throughput and high cost. Public databases containing consumer reviews of food products could provide a valuable alternative, especially for studying appreciation scores, which do not require formal training 25 . Public databases offer the advantage of amassing large amounts of data, increasing the statistical power to identify potential drivers of appreciation. However, public datasets suffer from biases, including a bias in the volunteers that contribute to the database, as well as confounding factors such as price, cult status and psychological conformity towards previous ratings of the product.

Classical multivariate statistics and machine learning methods have been used to predict flavor of specific compounds by, for example, linking structural properties of a compound to its potential biological activities or linking concentrations of specific compounds to sensory profiles 1 , 26 . Importantly, most previous studies focused on predicting organoleptic properties of single compounds (often based on their chemical structure) 27 , 28 , 29 , 30 , 31 , 32 , 33 , thus ignoring the fact that these compounds are present in a complex matrix in food or beverages and excluding complex interactions between compounds. Moreover, the classical statistics commonly used in sensory science 34 , 35 , 36 , 37 , 38 , 39 require a large sample size and sufficient variance amongst predictors to create accurate models. They are not fit for studying an extensive set of hundreds of interacting flavor compounds, since they are sensitive to outliers, have a high tendency to overfit and are less suited for non-linear and discontinuous relationships 40 .

In this study, we combine extensive chemical analyses and sensory data of a set of different commercial beers with machine learning approaches to develop models that predict taste, smell, mouthfeel and appreciation from compound concentrations. Beer is particularly suited to model the relationship between chemistry, flavor and appreciation. First, beer is a complex product, consisting of thousands of flavor compounds that partake in complex sensory interactions 41 , 42 , 43 . This chemical diversity arises from the raw materials (malt, yeast, hops, water and spices) and biochemical conversions during the brewing process (kilning, mashing, boiling, fermentation, maturation and aging) 44 , 45 . Second, the advent of the internet saw beer consumers embrace online review platforms, such as RateBeer (ZX Ventures, Anheuser-Busch InBev SA/NV) and BeerAdvocate (Next Glass, inc.). In this way, the beer community provides massive data sets of beer flavor and appreciation scores, creating extraordinarily large sensory databases to complement the analyses of our professional sensory panel. Specifically, we characterize over 200 chemical properties of 250 commercial beers, spread across 22 beer styles, and link these to the descriptive sensory profiling data of a 16-person in-house trained tasting panel and data acquired from over 180,000 public consumer reviews. These unique and extensive datasets enable us to train a suite of machine learning models to predict flavor and appreciation from a beer’s chemical profile. Dissection of the best-performing models allows us to pinpoint specific compounds as potential drivers of beer flavor and appreciation. Follow-up experiments confirm the importance of these compounds and ultimately allow us to significantly improve the flavor and appreciation of selected commercial beers. Together, our study represents a significant step towards understanding complex flavors and reinforces the value of machine learning to develop and refine complex foods. In this way, it represents a stepping stone for further computer-aided food engineering applications 46 .

To generate a comprehensive dataset on beer flavor, we selected 250 commercial Belgian beers across 22 different beer styles (Supplementary Fig.  S1 ). Beers with ≤ 4.2% alcohol by volume (ABV) were classified as non-alcoholic and low-alcoholic. Blonds and Tripels constitute a significant portion of the dataset (12.4% and 11.2%, respectively) reflecting their presence on the Belgian beer market and the heterogeneity of beers within these styles. By contrast, lager beers are less diverse and dominated by a handful of brands. Rare styles such as Brut or Faro make up only a small fraction of the dataset (2% and 1%, respectively) because fewer of these beers are produced and because they are dominated by distinct characteristics in terms of flavor and chemical composition.

Extensive analysis identifies relationships between chemical compounds in beer

For each beer, we measured 226 different chemical properties, including common brewing parameters such as alcohol content, iso-alpha acids, pH, sugar concentration 47 , and over 200 flavor compounds (Methods, Supplementary Table  S1 ). A large portion (37.2%) are terpenoids arising from hopping, responsible for herbal and fruity flavors 16 , 48 . A second major category are yeast metabolites, such as esters and alcohols, that result in fruity and solvent notes 48 , 49 , 50 . Other measured compounds are primarily derived from malt, or other microbes such as non- Saccharomyces yeasts and bacteria (‘wild flora’). Compounds that arise from spices or staling are labeled under ‘Others’. Five attributes (caloric value, total acids and total ester, hop aroma and sulfur compounds) are calculated from multiple individually measured compounds.

As a first step in identifying relationships between chemical properties, we determined correlations between the concentrations of the compounds (Fig.  1 , upper panel, Supplementary Data  1 and 2 , and Supplementary Fig.  S2 . For the sake of clarity, only a subset of the measured compounds is shown in Fig.  1 ). Compounds of the same origin typically show a positive correlation, while absence of correlation hints at parameters varying independently. For example, the hop aroma compounds citronellol, and alpha-terpineol show moderate correlations with each other (Spearman’s rho=0.39 and 0.57), but not with the bittering hop component iso-alpha acids (Spearman’s rho=0.16 and −0.07). This illustrates how brewers can independently modify hop aroma and bitterness by selecting hop varieties and dosage time. If hops are added early in the boiling phase, chemical conversions increase bitterness while aromas evaporate, conversely, late addition of hops preserves aroma but limits bitterness 51 . Similarly, hop-derived iso-alpha acids show a strong anti-correlation with lactic acid and acetic acid, likely reflecting growth inhibition of lactic acid and acetic acid bacteria, or the consequent use of fewer hops in sour beer styles, such as West Flanders ales and Fruit beers, that rely on these bacteria for their distinct flavors 52 . Finally, yeast-derived esters (ethyl acetate, ethyl decanoate, ethyl hexanoate, ethyl octanoate) and alcohols (ethanol, isoamyl alcohol, isobutanol, and glycerol), correlate with Spearman coefficients above 0.5, suggesting that these secondary metabolites are correlated with the yeast genetic background and/or fermentation parameters and may be difficult to influence individually, although the choice of yeast strain may offer some control 53 .

figure 1

Spearman rank correlations are shown. Descriptors are grouped according to their origin (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)), and sensory aspect (aroma, taste, palate, and overall appreciation). Please note that for the chemical compounds, for the sake of clarity, only a subset of the total number of measured compounds is shown, with an emphasis on the key compounds for each source. For more details, see the main text and Methods section. Chemical data can be found in Supplementary Data  1 , correlations between all chemical compounds are depicted in Supplementary Fig.  S2 and correlation values can be found in Supplementary Data  2 . See Supplementary Data  4 for sensory panel assessments and Supplementary Data  5 for correlation values between all sensory descriptors.

Interestingly, different beer styles show distinct patterns for some flavor compounds (Supplementary Fig.  S3 ). These observations agree with expectations for key beer styles, and serve as a control for our measurements. For instance, Stouts generally show high values for color (darker), while hoppy beers contain elevated levels of iso-alpha acids, compounds associated with bitter hop taste. Acetic and lactic acid are not prevalent in most beers, with notable exceptions such as Kriek, Lambic, Faro, West Flanders ales and Flanders Old Brown, which use acid-producing bacteria ( Lactobacillus and Pediococcus ) or unconventional yeast ( Brettanomyces ) 54 , 55 . Glycerol, ethanol and esters show similar distributions across all beer styles, reflecting their common origin as products of yeast metabolism during fermentation 45 , 53 . Finally, low/no-alcohol beers contain low concentrations of glycerol and esters. This is in line with the production process for most of the low/no-alcohol beers in our dataset, which are produced through limiting fermentation or by stripping away alcohol via evaporation or dialysis, with both methods having the unintended side-effect of reducing the amount of flavor compounds in the final beer 56 , 57 .

Besides expected associations, our data also reveals less trivial associations between beer styles and specific parameters. For example, geraniol and citronellol, two monoterpenoids responsible for citrus, floral and rose flavors and characteristic of Citra hops, are found in relatively high amounts in Christmas, Saison, and Brett/co-fermented beers, where they may originate from terpenoid-rich spices such as coriander seeds instead of hops 58 .

Tasting panel assessments reveal sensorial relationships in beer

To assess the sensory profile of each beer, a trained tasting panel evaluated each of the 250 beers for 50 sensory attributes, including different hop, malt and yeast flavors, off-flavors and spices. Panelists used a tasting sheet (Supplementary Data  3 ) to score the different attributes. Panel consistency was evaluated by repeating 12 samples across different sessions and performing ANOVA. In 95% of cases no significant difference was found across sessions ( p  > 0.05), indicating good panel consistency (Supplementary Table  S2 ).

Aroma and taste perception reported by the trained panel are often linked (Fig.  1 , bottom left panel and Supplementary Data  4 and 5 ), with high correlations between hops aroma and taste (Spearman’s rho=0.83). Bitter taste was found to correlate with hop aroma and taste in general (Spearman’s rho=0.80 and 0.69), and particularly with “grassy” noble hops (Spearman’s rho=0.75). Barnyard flavor, most often associated with sour beers, is identified together with stale hops (Spearman’s rho=0.97) that are used in these beers. Lactic and acetic acid, which often co-occur, are correlated (Spearman’s rho=0.66). Interestingly, sweetness and bitterness are anti-correlated (Spearman’s rho = −0.48), confirming the hypothesis that they mask each other 59 , 60 . Beer body is highly correlated with alcohol (Spearman’s rho = 0.79), and overall appreciation is found to correlate with multiple aspects that describe beer mouthfeel (alcohol, carbonation; Spearman’s rho= 0.32, 0.39), as well as with hop and ester aroma intensity (Spearman’s rho=0.39 and 0.35).

Similar to the chemical analyses, sensorial analyses confirmed typical features of specific beer styles (Supplementary Fig.  S4 ). For example, sour beers (Faro, Flanders Old Brown, Fruit beer, Kriek, Lambic, West Flanders ale) were rated acidic, with flavors of both acetic and lactic acid. Hoppy beers were found to be bitter and showed hop-associated aromas like citrus and tropical fruit. Malt taste is most detected among scotch, stout/porters, and strong ales, while low/no-alcohol beers, which often have a reputation for being ‘worty’ (reminiscent of unfermented, sweet malt extract) appear in the middle. Unsurprisingly, hop aromas are most strongly detected among hoppy beers. Like its chemical counterpart (Supplementary Fig.  S3 ), acidity shows a right-skewed distribution, with the most acidic beers being Krieks, Lambics, and West Flanders ales.

Tasting panel assessments of specific flavors correlate with chemical composition

We find that the concentrations of several chemical compounds strongly correlate with specific aroma or taste, as evaluated by the tasting panel (Fig.  2 , Supplementary Fig.  S5 , Supplementary Data  6 ). In some cases, these correlations confirm expectations and serve as a useful control for data quality. For example, iso-alpha acids, the bittering compounds in hops, strongly correlate with bitterness (Spearman’s rho=0.68), while ethanol and glycerol correlate with tasters’ perceptions of alcohol and body, the mouthfeel sensation of fullness (Spearman’s rho=0.82/0.62 and 0.72/0.57 respectively) and darker color from roasted malts is a good indication of malt perception (Spearman’s rho=0.54).

figure 2

Heatmap colors indicate Spearman’s Rho. Axes are organized according to sensory categories (aroma, taste, mouthfeel, overall), chemical categories and chemical sources in beer (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)). See Supplementary Data  6 for all correlation values.

Interestingly, for some relationships between chemical compounds and perceived flavor, correlations are weaker than expected. For example, the rose-smelling phenethyl acetate only weakly correlates with floral aroma. This hints at more complex relationships and interactions between compounds and suggests a need for a more complex model than simple correlations. Lastly, we uncovered unexpected correlations. For instance, the esters ethyl decanoate and ethyl octanoate appear to correlate slightly with hop perception and bitterness, possibly due to their fruity flavor. Iron is anti-correlated with hop aromas and bitterness, most likely because it is also anti-correlated with iso-alpha acids. This could be a sign of metal chelation of hop acids 61 , given that our analyses measure unbound hop acids and total iron content, or could result from the higher iron content in dark and Fruit beers, which typically have less hoppy and bitter flavors 62 .

Public consumer reviews complement expert panel data

To complement and expand the sensory data of our trained tasting panel, we collected 180,000 reviews of our 250 beers from the online consumer review platform RateBeer. This provided numerical scores for beer appearance, aroma, taste, palate, overall quality as well as the average overall score.

Public datasets are known to suffer from biases, such as price, cult status and psychological conformity towards previous ratings of a product. For example, prices correlate with appreciation scores for these online consumer reviews (rho=0.49, Supplementary Fig.  S6 ), but not for our trained tasting panel (rho=0.19). This suggests that prices affect consumer appreciation, which has been reported in wine 63 , while blind tastings are unaffected. Moreover, we observe that some beer styles, like lagers and non-alcoholic beers, generally receive lower scores, reflecting that online reviewers are mostly beer aficionados with a preference for specialty beers over lager beers. In general, we find a modest correlation between our trained panel’s overall appreciation score and the online consumer appreciation scores (Fig.  3 , rho=0.29). Apart from the aforementioned biases in the online datasets, serving temperature, sample freshness and surroundings, which are all tightly controlled during the tasting panel sessions, can vary tremendously across online consumers and can further contribute to (among others, appreciation) differences between the two categories of tasters. Importantly, in contrast to the overall appreciation scores, for many sensory aspects the results from the professional panel correlated well with results obtained from RateBeer reviews. Correlations were highest for features that are relatively easy to recognize even for untrained tasters, like bitterness, sweetness, alcohol and malt aroma (Fig.  3 and below).

figure 3

RateBeer text mining results can be found in Supplementary Data  7 . Rho values shown are Spearman correlation values, with asterisks indicating significant correlations ( p  < 0.05, two-sided). All p values were smaller than 0.001, except for Esters aroma (0.0553), Esters taste (0.3275), Esters aroma—banana (0.0019), Coriander (0.0508) and Diacetyl (0.0134).

Besides collecting consumer appreciation from these online reviews, we developed automated text analysis tools to gather additional data from review texts (Supplementary Data  7 ). Processing review texts on the RateBeer database yielded comparable results to the scores given by the trained panel for many common sensory aspects, including acidity, bitterness, sweetness, alcohol, malt, and hop tastes (Fig.  3 ). This is in line with what would be expected, since these attributes require less training for accurate assessment and are less influenced by environmental factors such as temperature, serving glass and odors in the environment. Consumer reviews also correlate well with our trained panel for 4-vinyl guaiacol, a compound associated with a very characteristic aroma. By contrast, correlations for more specific aromas like ester, coriander or diacetyl are underrepresented in the online reviews, underscoring the importance of using a trained tasting panel and standardized tasting sheets with explicit factors to be scored for evaluating specific aspects of a beer. Taken together, our results suggest that public reviews are trustworthy for some, but not all, flavor features and can complement or substitute taste panel data for these sensory aspects.

Models can predict beer sensory profiles from chemical data

The rich datasets of chemical analyses, tasting panel assessments and public reviews gathered in the first part of this study provided us with a unique opportunity to develop predictive models that link chemical data to sensorial features. Given the complexity of beer flavor, basic statistical tools such as correlations or linear regression may not always be the most suitable for making accurate predictions. Instead, we applied different machine learning models that can model both simple linear and complex interactive relationships. Specifically, we constructed a set of regression models to predict (a) trained panel scores for beer flavor and quality and (b) public reviews’ appreciation scores from beer chemical profiles. We trained and tested 10 different models (Methods), 3 linear regression-based models (simple linear regression with first-order interactions (LR), lasso regression with first-order interactions (Lasso), partial least squares regressor (PLSR)), 5 decision tree models (AdaBoost regressor (ABR), extra trees (ET), gradient boosting regressor (GBR), random forest (RF) and XGBoost regressor (XGBR)), 1 support vector regression (SVR), and 1 artificial neural network (ANN) model.

To compare the performance of our machine learning models, the dataset was randomly split into a training and test set, stratified by beer style. After a model was trained on data in the training set, its performance was evaluated on its ability to predict the test dataset obtained from multi-output models (based on the coefficient of determination, see Methods). Additionally, individual-attribute models were ranked per descriptor and the average rank was calculated, as proposed by Korneva et al. 64 . Importantly, both ways of evaluating the models’ performance agreed in general. Performance of the different models varied (Table  1 ). It should be noted that all models perform better at predicting RateBeer results than results from our trained tasting panel. One reason could be that sensory data is inherently variable, and this variability is averaged out with the large number of public reviews from RateBeer. Additionally, all tree-based models perform better at predicting taste than aroma. Linear models (LR) performed particularly poorly, with negative R 2 values, due to severe overfitting (training set R 2  = 1). Overfitting is a common issue in linear models with many parameters and limited samples, especially with interaction terms further amplifying the number of parameters. L1 regularization (Lasso) successfully overcomes this overfitting, out-competing multiple tree-based models on the RateBeer dataset. Similarly, the dimensionality reduction of PLSR avoids overfitting and improves performance, to some extent. Still, tree-based models (ABR, ET, GBR, RF and XGBR) show the best performance, out-competing the linear models (LR, Lasso, PLSR) commonly used in sensory science 65 .

GBR models showed the best overall performance in predicting sensory responses from chemical information, with R 2 values up to 0.75 depending on the predicted sensory feature (Supplementary Table  S4 ). The GBR models predict consumer appreciation (RateBeer) better than our trained panel’s appreciation (R 2 value of 0.67 compared to R 2 value of 0.09) (Supplementary Table  S3 and Supplementary Table  S4 ). ANN models showed intermediate performance, likely because neural networks typically perform best with larger datasets 66 . The SVR shows intermediate performance, mostly due to the weak predictions of specific attributes that lower the overall performance (Supplementary Table  S4 ).

Model dissection identifies specific, unexpected compounds as drivers of consumer appreciation

Next, we leveraged our models to infer important contributors to sensory perception and consumer appreciation. Consumer preference is a crucial sensory aspects, because a product that shows low consumer appreciation scores often does not succeed commercially 25 . Additionally, the requirement for a large number of representative evaluators makes consumer trials one of the more costly and time-consuming aspects of product development. Hence, a model for predicting chemical drivers of overall appreciation would be a welcome addition to the available toolbox for food development and optimization.

Since GBR models on our RateBeer dataset showed the best overall performance, we focused on these models. Specifically, we used two approaches to identify important contributors. First, rankings of the most important predictors for each sensorial trait in the GBR models were obtained based on impurity-based feature importance (mean decrease in impurity). High-ranked parameters were hypothesized to be either the true causal chemical properties underlying the trait, to correlate with the actual causal properties, or to take part in sensory interactions affecting the trait 67 (Fig.  4A ). In a second approach, we used SHAP 68 to determine which parameters contributed most to the model for making predictions of consumer appreciation (Fig.  4B ). SHAP calculates parameter contributions to model predictions on a per-sample basis, which can be aggregated into an importance score.

figure 4

A The impurity-based feature importance (mean deviance in impurity, MDI) calculated from the Gradient Boosting Regression (GBR) model predicting RateBeer appreciation scores. The top 15 highest ranked chemical properties are shown. B SHAP summary plot for the top 15 parameters contributing to our GBR model. Each point on the graph represents a sample from our dataset. The color represents the concentration of that parameter, with bluer colors representing low values and redder colors representing higher values. Greater absolute values on the horizontal axis indicate a higher impact of the parameter on the prediction of the model. C Spearman correlations between the 15 most important chemical properties and consumer overall appreciation. Numbers indicate the Spearman Rho correlation coefficient, and the rank of this correlation compared to all other correlations. The top 15 important compounds were determined using SHAP (panel B).

Both approaches identified ethyl acetate as the most predictive parameter for beer appreciation (Fig.  4 ). Ethyl acetate is the most abundant ester in beer with a typical ‘fruity’, ‘solvent’ and ‘alcoholic’ flavor, but is often considered less important than other esters like isoamyl acetate. The second most important parameter identified by SHAP is ethanol, the most abundant beer compound after water. Apart from directly contributing to beer flavor and mouthfeel, ethanol drastically influences the physical properties of beer, dictating how easily volatile compounds escape the beer matrix to contribute to beer aroma 69 . Importantly, it should also be noted that the importance of ethanol for appreciation is likely inflated by the very low appreciation scores of non-alcoholic beers (Supplementary Fig.  S4 ). Despite not often being considered a driver of beer appreciation, protein level also ranks highly in both approaches, possibly due to its effect on mouthfeel and body 70 . Lactic acid, which contributes to the tart taste of sour beers, is the fourth most important parameter identified by SHAP, possibly due to the generally high appreciation of sour beers in our dataset.

Interestingly, some of the most important predictive parameters for our model are not well-established as beer flavors or are even commonly regarded as being negative for beer quality. For example, our models identify methanethiol and ethyl phenyl acetate, an ester commonly linked to beer staling 71 , as a key factor contributing to beer appreciation. Although there is no doubt that high concentrations of these compounds are considered unpleasant, the positive effects of modest concentrations are not yet known 72 , 73 .

To compare our approach to conventional statistics, we evaluated how well the 15 most important SHAP-derived parameters correlate with consumer appreciation (Fig.  4C ). Interestingly, only 6 of the properties derived by SHAP rank amongst the top 15 most correlated parameters. For some chemical compounds, the correlations are so low that they would have likely been considered unimportant. For example, lactic acid, the fourth most important parameter, shows a bimodal distribution for appreciation, with sour beers forming a separate cluster, that is missed entirely by the Spearman correlation. Additionally, the correlation plots reveal outliers, emphasizing the need for robust analysis tools. Together, this highlights the need for alternative models, like the Gradient Boosting model, that better grasp the complexity of (beer) flavor.

Finally, to observe the relationships between these chemical properties and their predicted targets, partial dependence plots were constructed for the six most important predictors of consumer appreciation 74 , 75 , 76 (Supplementary Fig.  S7 ). One-way partial dependence plots show how a change in concentration affects the predicted appreciation. These plots reveal an important limitation of our models: appreciation predictions remain constant at ever-increasing concentrations. This implies that once a threshold concentration is reached, further increasing the concentration does not affect appreciation. This is false, as it is well-documented that certain compounds become unpleasant at high concentrations, including ethyl acetate (‘nail polish’) 77 and methanethiol (‘sulfury’ and ‘rotten cabbage’) 78 . The inability of our models to grasp that flavor compounds have optimal levels, above which they become negative, is a consequence of working with commercial beer brands where (off-)flavors are rarely too high to negatively impact the product. The two-way partial dependence plots show how changing the concentration of two compounds influences predicted appreciation, visualizing their interactions (Supplementary Fig.  S7 ). In our case, the top 5 parameters are dominated by additive or synergistic interactions, with high concentrations for both compounds resulting in the highest predicted appreciation.

To assess the robustness of our best-performing models and model predictions, we performed 100 iterations of the GBR, RF and ET models. In general, all iterations of the models yielded similar performance (Supplementary Fig.  S8 ). Moreover, the main predictors (including the top predictors ethanol and ethyl acetate) remained virtually the same, especially for GBR and RF. For the iterations of the ET model, we did observe more variation in the top predictors, which is likely a consequence of the model’s inherent random architecture in combination with co-correlations between certain predictors. However, even in this case, several of the top predictors (ethanol and ethyl acetate) remain unchanged, although their rank in importance changes (Supplementary Fig.  S8 ).

Next, we investigated if a combination of RateBeer and trained panel data into one consolidated dataset would lead to stronger models, under the hypothesis that such a model would suffer less from bias in the datasets. A GBR model was trained to predict appreciation on the combined dataset. This model underperformed compared to the RateBeer model, both in the native case and when including a dataset identifier (R 2  = 0.67, 0.26 and 0.42 respectively). For the latter, the dataset identifier is the most important feature (Supplementary Fig.  S9 ), while most of the feature importance remains unchanged, with ethyl acetate and ethanol ranking highest, like in the original model trained only on RateBeer data. It seems that the large variation in the panel dataset introduces noise, weakening the models’ performances and reliability. In addition, it seems reasonable to assume that both datasets are fundamentally different, with the panel dataset obtained by blind tastings by a trained professional panel.

Lastly, we evaluated whether beer style identifiers would further enhance the model’s performance. A GBR model was trained with parameters that explicitly encoded the styles of the samples. This did not improve model performance (R2 = 0.66 with style information vs R2 = 0.67). The most important chemical features are consistent with the model trained without style information (eg. ethanol and ethyl acetate), and with the exception of the most preferred (strong ale) and least preferred (low/no-alcohol) styles, none of the styles were among the most important features (Supplementary Fig.  S9 , Supplementary Table  S5 and S6 ). This is likely due to a combination of style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original models, as well as the low number of samples belonging to some styles, making it difficult for the model to learn style-specific patterns. Moreover, beer styles are not rigorously defined, with some styles overlapping in features and some beers being misattributed to a specific style, all of which leads to more noise in models that use style parameters.

Model validation

To test if our predictive models give insight into beer appreciation, we set up experiments aimed at improving existing commercial beers. We specifically selected overall appreciation as the trait to be examined because of its complexity and commercial relevance. Beer flavor comprises a complex bouquet rather than single aromas and tastes 53 . Hence, adding a single compound to the extent that a difference is noticeable may lead to an unbalanced, artificial flavor. Therefore, we evaluated the effect of combinations of compounds. Because Blond beers represent the most extensive style in our dataset, we selected a beer from this style as the starting material for these experiments (Beer 64 in Supplementary Data  1 ).

In the first set of experiments, we adjusted the concentrations of compounds that made up the most important predictors of overall appreciation (ethyl acetate, ethanol, lactic acid, ethyl phenyl acetate) together with correlated compounds (ethyl hexanoate, isoamyl acetate, glycerol), bringing them up to 95 th percentile ethanol-normalized concentrations (Methods) within the Blond group (‘Spiked’ concentration in Fig.  5A ). Compared to controls, the spiked beers were found to have significantly improved overall appreciation among trained panelists, with panelist noting increased intensity of ester flavors, sweetness, alcohol, and body fullness (Fig.  5B ). To disentangle the contribution of ethanol to these results, a second experiment was performed without the addition of ethanol. This resulted in a similar outcome, including increased perception of alcohol and overall appreciation.

figure 5

Adding the top chemical compounds, identified as best predictors of appreciation by our model, into poorly appreciated beers results in increased appreciation from our trained panel. Results of sensory tests between base beers and those spiked with compounds identified as the best predictors by the model. A Blond and Non/Low-alcohol (0.0% ABV) base beers were brought up to 95th-percentile ethanol-normalized concentrations within each style. B For each sensory attribute, tasters indicated the more intense sample and selected the sample they preferred. The numbers above the bars correspond to the p values that indicate significant changes in perceived flavor (two-sided binomial test: alpha 0.05, n  = 20 or 13).

In a last experiment, we tested whether using the model’s predictions can boost the appreciation of a non-alcoholic beer (beer 223 in Supplementary Data  1 ). Again, the addition of a mixture of predicted compounds (omitting ethanol, in this case) resulted in a significant increase in appreciation, body, ester flavor and sweetness.

Predicting flavor and consumer appreciation from chemical composition is one of the ultimate goals of sensory science. A reliable, systematic and unbiased way to link chemical profiles to flavor and food appreciation would be a significant asset to the food and beverage industry. Such tools would substantially aid in quality control and recipe development, offer an efficient and cost-effective alternative to pilot studies and consumer trials and would ultimately allow food manufacturers to produce superior, tailor-made products that better meet the demands of specific consumer groups more efficiently.

A limited set of studies have previously tried, to varying degrees of success, to predict beer flavor and beer popularity based on (a limited set of) chemical compounds and flavors 79 , 80 . Current sensitive, high-throughput technologies allow measuring an unprecedented number of chemical compounds and properties in a large set of samples, yielding a dataset that can train models that help close the gaps between chemistry and flavor, even for a complex natural product like beer. To our knowledge, no previous research gathered data at this scale (250 samples, 226 chemical parameters, 50 sensory attributes and 5 consumer scores) to disentangle and validate the chemical aspects driving beer preference using various machine-learning techniques. We find that modern machine learning models outperform conventional statistical tools, such as correlations and linear models, and can successfully predict flavor appreciation from chemical composition. This could be attributed to the natural incorporation of interactions and non-linear or discontinuous effects in machine learning models, which are not easily grasped by the linear model architecture. While linear models and partial least squares regression represent the most widespread statistical approaches in sensory science, in part because they allow interpretation 65 , 81 , 82 , modern machine learning methods allow for building better predictive models while preserving the possibility to dissect and exploit the underlying patterns. Of the 10 different models we trained, tree-based models, such as our best performing GBR, showed the best overall performance in predicting sensory responses from chemical information, outcompeting artificial neural networks. This agrees with previous reports for models trained on tabular data 83 . Our results are in line with the findings of Colantonio et al. who also identified the gradient boosting architecture as performing best at predicting appreciation and flavor (of tomatoes and blueberries, in their specific study) 26 . Importantly, besides our larger experimental scale, we were able to directly confirm our models’ predictions in vivo.

Our study confirms that flavor compound concentration does not always correlate with perception, suggesting complex interactions that are often missed by more conventional statistics and simple models. Specifically, we find that tree-based algorithms may perform best in developing models that link complex food chemistry with aroma. Furthermore, we show that massive datasets of untrained consumer reviews provide a valuable source of data, that can complement or even replace trained tasting panels, especially for appreciation and basic flavors, such as sweetness and bitterness. This holds despite biases that are known to occur in such datasets, such as price or conformity bias. Moreover, GBR models predict taste better than aroma. This is likely because taste (e.g. bitterness) often directly relates to the corresponding chemical measurements (e.g., iso-alpha acids), whereas such a link is less clear for aromas, which often result from the interplay between multiple volatile compounds. We also find that our models are best at predicting acidity and alcohol, likely because there is a direct relation between the measured chemical compounds (acids and ethanol) and the corresponding perceived sensorial attribute (acidity and alcohol), and because even untrained consumers are generally able to recognize these flavors and aromas.

The predictions of our final models, trained on review data, hold even for blind tastings with small groups of trained tasters, as demonstrated by our ability to validate specific compounds as drivers of beer flavor and appreciation. Since adding a single compound to the extent of a noticeable difference may result in an unbalanced flavor profile, we specifically tested our identified key drivers as a combination of compounds. While this approach does not allow us to validate if a particular single compound would affect flavor and/or appreciation, our experiments do show that this combination of compounds increases consumer appreciation.

It is important to stress that, while it represents an important step forward, our approach still has several major limitations. A key weakness of the GBR model architecture is that amongst co-correlating variables, the largest main effect is consistently preferred for model building. As a result, co-correlating variables often have artificially low importance scores, both for impurity and SHAP-based methods, like we observed in the comparison to the more randomized Extra Trees models. This implies that chemicals identified as key drivers of a specific sensory feature by GBR might not be the true causative compounds, but rather co-correlate with the actual causative chemical. For example, the high importance of ethyl acetate could be (partially) attributed to the total ester content, ethanol or ethyl hexanoate (rho=0.77, rho=0.72 and rho=0.68), while ethyl phenylacetate could hide the importance of prenyl isobutyrate and ethyl benzoate (rho=0.77 and rho=0.76). Expanding our GBR model to include beer style as a parameter did not yield additional power or insight. This is likely due to style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original model, as well as the smaller sample size per style, limiting the power to uncover style-specific patterns. This can be partly attributed to the curse of dimensionality, where the high number of parameters results in the models mainly incorporating single parameter effects, rather than complex interactions such as style-dependent effects 67 . A larger number of samples may overcome some of these limitations and offer more insight into style-specific effects. On the other hand, beer style is not a rigid scientific classification, and beers within one style often differ a lot, which further complicates the analysis of style as a model factor.

Our study is limited to beers from Belgian breweries. Although these beers cover a large portion of the beer styles available globally, some beer styles and consumer patterns may be missing, while other features might be overrepresented. For example, many Belgian ales exhibit yeast-driven flavor profiles, which is reflected in the chemical drivers of appreciation discovered by this study. In future work, expanding the scope to include diverse markets and beer styles could lead to the identification of even more drivers of appreciation and better models for special niche products that were not present in our beer set.

In addition to inherent limitations of GBR models, there are also some limitations associated with studying food aroma. Even if our chemical analyses measured most of the known aroma compounds, the total number of flavor compounds in complex foods like beer is still larger than the subset we were able to measure in this study. For example, hop-derived thiols, that influence flavor at very low concentrations, are notoriously difficult to measure in a high-throughput experiment. Moreover, consumer perception remains subjective and prone to biases that are difficult to avoid. It is also important to stress that the models are still immature and that more extensive datasets will be crucial for developing more complete models in the future. Besides more samples and parameters, our dataset does not include any demographic information about the tasters. Including such data could lead to better models that grasp external factors like age and culture. Another limitation is that our set of beers consists of high-quality end-products and lacks beers that are unfit for sale, which limits the current model in accurately predicting products that are appreciated very badly. Finally, while models could be readily applied in quality control, their use in sensory science and product development is restrained by their inability to discern causal relationships. Given that the models cannot distinguish compounds that genuinely drive consumer perception from those that merely correlate, validation experiments are essential to identify true causative compounds.

Despite the inherent limitations, dissection of our models enabled us to pinpoint specific molecules as potential drivers of beer aroma and consumer appreciation, including compounds that were unexpected and would not have been identified using standard approaches. Important drivers of beer appreciation uncovered by our models include protein levels, ethyl acetate, ethyl phenyl acetate and lactic acid. Currently, many brewers already use lactic acid to acidify their brewing water and ensure optimal pH for enzymatic activity during the mashing process. Our results suggest that adding lactic acid can also improve beer appreciation, although its individual effect remains to be tested. Interestingly, ethanol appears to be unnecessary to improve beer appreciation, both for blond beer and alcohol-free beer. Given the growing consumer interest in alcohol-free beer, with a predicted annual market growth of >7% 84 , it is relevant for brewers to know what compounds can further increase consumer appreciation of these beers. Hence, our model may readily provide avenues to further improve the flavor and consumer appreciation of both alcoholic and non-alcoholic beers, which is generally considered one of the key challenges for future beer production.

Whereas we see a direct implementation of our results for the development of superior alcohol-free beverages and other food products, our study can also serve as a stepping stone for the development of novel alcohol-containing beverages. We want to echo the growing body of scientific evidence for the negative effects of alcohol consumption, both on the individual level by the mutagenic, teratogenic and carcinogenic effects of ethanol 85 , 86 , as well as the burden on society caused by alcohol abuse and addiction. We encourage the use of our results for the production of healthier, tastier products, including novel and improved beverages with lower alcohol contents. Furthermore, we strongly discourage the use of these technologies to improve the appreciation or addictive properties of harmful substances.

The present work demonstrates that despite some important remaining hurdles, combining the latest developments in chemical analyses, sensory analysis and modern machine learning methods offers exciting avenues for food chemistry and engineering. Soon, these tools may provide solutions in quality control and recipe development, as well as new approaches to sensory science and flavor research.

Beer selection

250 commercial Belgian beers were selected to cover the broad diversity of beer styles and corresponding diversity in chemical composition and aroma. See Supplementary Fig.  S1 .

Chemical dataset

Sample preparation.

Beers within their expiration date were purchased from commercial retailers. Samples were prepared in biological duplicates at room temperature, unless explicitly stated otherwise. Bottle pressure was measured with a manual pressure device (Steinfurth Mess-Systeme GmbH) and used to calculate CO 2 concentration. The beer was poured through two filter papers (Macherey-Nagel, 500713032 MN 713 ¼) to remove carbon dioxide and prevent spontaneous foaming. Samples were then prepared for measurements by targeted Headspace-Gas Chromatography-Flame Ionization Detector/Flame Photometric Detector (HS-GC-FID/FPD), Headspace-Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry (HS-SPME-GC-MS), colorimetric analysis, enzymatic analysis, Near-Infrared (NIR) analysis, as described in the sections below. The mean values of biological duplicates are reported for each compound.

HS-GC-FID/FPD

HS-GC-FID/FPD (Shimadzu GC 2010 Plus) was used to measure higher alcohols, acetaldehyde, esters, 4-vinyl guaicol, and sulfur compounds. Each measurement comprised 5 ml of sample pipetted into a 20 ml glass vial containing 1.75 g NaCl (VWR, 27810.295). 100 µl of 2-heptanol (Sigma-Aldrich, H3003) (internal standard) solution in ethanol (Fisher Chemical, E/0650DF/C17) was added for a final concentration of 2.44 mg/L. Samples were flushed with nitrogen for 10 s, sealed with a silicone septum, stored at −80 °C and analyzed in batches of 20.

The GC was equipped with a DB-WAXetr column (length, 30 m; internal diameter, 0.32 mm; layer thickness, 0.50 µm; Agilent Technologies, Santa Clara, CA, USA) to the FID and an HP-5 column (length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.25 µm; Agilent Technologies, Santa Clara, CA, USA) to the FPD. N 2 was used as the carrier gas. Samples were incubated for 20 min at 70 °C in the headspace autosampler (Flow rate, 35 cm/s; Injection volume, 1000 µL; Injection mode, split; Combi PAL autosampler, CTC analytics, Switzerland). The injector, FID and FPD temperatures were kept at 250 °C. The GC oven temperature was first held at 50 °C for 5 min and then allowed to rise to 80 °C at a rate of 5 °C/min, followed by a second ramp of 4 °C/min until 200 °C kept for 3 min and a final ramp of (4 °C/min) until 230 °C for 1 min. Results were analyzed with the GCSolution software version 2.4 (Shimadzu, Kyoto, Japan). The GC was calibrated with a 5% EtOH solution (VWR International) containing the volatiles under study (Supplementary Table  S7 ).

HS-SPME-GC-MS

HS-SPME-GC-MS (Shimadzu GCMS-QP-2010 Ultra) was used to measure additional volatile compounds, mainly comprising terpenoids and esters. Samples were analyzed by HS-SPME using a triphase DVB/Carboxen/PDMS 50/30 μm SPME fiber (Supelco Co., Bellefonte, PA, USA) followed by gas chromatography (Thermo Fisher Scientific Trace 1300 series, USA) coupled to a mass spectrometer (Thermo Fisher Scientific ISQ series MS) equipped with a TriPlus RSH autosampler. 5 ml of degassed beer sample was placed in 20 ml vials containing 1.75 g NaCl (VWR, 27810.295). 5 µl internal standard mix was added, containing 2-heptanol (1 g/L) (Sigma-Aldrich, H3003), 4-fluorobenzaldehyde (1 g/L) (Sigma-Aldrich, 128376), 2,3-hexanedione (1 g/L) (Sigma-Aldrich, 144169) and guaiacol (1 g/L) (Sigma-Aldrich, W253200) in ethanol (Fisher Chemical, E/0650DF/C17). Each sample was incubated at 60 °C in the autosampler oven with constant agitation. After 5 min equilibration, the SPME fiber was exposed to the sample headspace for 30 min. The compounds trapped on the fiber were thermally desorbed in the injection port of the chromatograph by heating the fiber for 15 min at 270 °C.

The GC-MS was equipped with a low polarity RXi-5Sil MS column (length, 20 m; internal diameter, 0.18 mm; layer thickness, 0.18 µm; Restek, Bellefonte, PA, USA). Injection was performed in splitless mode at 320 °C, a split flow of 9 ml/min, a purge flow of 5 ml/min and an open valve time of 3 min. To obtain a pulsed injection, a programmed gas flow was used whereby the helium gas flow was set at 2.7 mL/min for 0.1 min, followed by a decrease in flow of 20 ml/min to the normal 0.9 mL/min. The temperature was first held at 30 °C for 3 min and then allowed to rise to 80 °C at a rate of 7 °C/min, followed by a second ramp of 2 °C/min till 125 °C and a final ramp of 8 °C/min with a final temperature of 270 °C.

Mass acquisition range was 33 to 550 amu at a scan rate of 5 scans/s. Electron impact ionization energy was 70 eV. The interface and ion source were kept at 275 °C and 250 °C, respectively. A mix of linear n-alkanes (from C7 to C40, Supelco Co.) was injected into the GC-MS under identical conditions to serve as external retention index markers. Identification and quantification of the compounds were performed using an in-house developed R script as described in Goelen et al. and Reher et al. 87 , 88 (for package information, see Supplementary Table  S8 ). Briefly, chromatograms were analyzed using AMDIS (v2.71) 89 to separate overlapping peaks and obtain pure compound spectra. The NIST MS Search software (v2.0 g) in combination with the NIST2017, FFNSC3 and Adams4 libraries were used to manually identify the empirical spectra, taking into account the expected retention time. After background subtraction and correcting for retention time shifts between samples run on different days based on alkane ladders, compound elution profiles were extracted and integrated using a file with 284 target compounds of interest, which were either recovered in our identified AMDIS list of spectra or were known to occur in beer. Compound elution profiles were estimated for every peak in every chromatogram over a time-restricted window using weighted non-negative least square analysis after which peak areas were integrated 87 , 88 . Batch effect correction was performed by normalizing against the most stable internal standard compound, 4-fluorobenzaldehyde. Out of all 284 target compounds that were analyzed, 167 were visually judged to have reliable elution profiles and were used for final analysis.

Discrete photometric and enzymatic analysis

Discrete photometric and enzymatic analysis (Thermo Scientific TM Gallery TM Plus Beermaster Discrete Analyzer) was used to measure acetic acid, ammonia, beta-glucan, iso-alpha acids, color, sugars, glycerol, iron, pH, protein, and sulfite. 2 ml of sample volume was used for the analyses. Information regarding the reagents and standard solutions used for analyses and calibrations is included in Supplementary Table  S7 and Supplementary Table  S9 .

NIR analyses

NIR analysis (Anton Paar Alcolyzer Beer ME System) was used to measure ethanol. Measurements comprised 50 ml of sample, and a 10% EtOH solution was used for calibration.

Correlation calculations

Pairwise Spearman Rank correlations were calculated between all chemical properties.

Sensory dataset

Trained panel.

Our trained tasting panel consisted of volunteers who gave prior verbal informed consent. All compounds used for the validation experiment were of food-grade quality. The tasting sessions were approved by the Social and Societal Ethics Committee of the KU Leuven (G-2022-5677-R2(MAR)). All online reviewers agreed to the Terms and Conditions of the RateBeer website.

Sensory analysis was performed according to the American Society of Brewing Chemists (ASBC) Sensory Analysis Methods 90 . 30 volunteers were screened through a series of triangle tests. The sixteen most sensitive and consistent tasters were retained as taste panel members. The resulting panel was diverse in age [22–42, mean: 29], sex [56% male] and nationality [7 different countries]. The panel developed a consensus vocabulary to describe beer aroma, taste and mouthfeel. Panelists were trained to identify and score 50 different attributes, using a 7-point scale to rate attributes’ intensity. The scoring sheet is included as Supplementary Data  3 . Sensory assessments took place between 10–12 a.m. The beers were served in black-colored glasses. Per session, between 5 and 12 beers of the same style were tasted at 12 °C to 16 °C. Two reference beers were added to each set and indicated as ‘Reference 1 & 2’, allowing panel members to calibrate their ratings. Not all panelists were present at every tasting. Scores were scaled by standard deviation and mean-centered per taster. Values are represented as z-scores and clustered by Euclidean distance. Pairwise Spearman correlations were calculated between taste and aroma sensory attributes. Panel consistency was evaluated by repeating samples on different sessions and performing ANOVA to identify differences, using the ‘stats’ package (v4.2.2) in R (for package information, see Supplementary Table  S8 ).

Online reviews from a public database

The ‘scrapy’ package in Python (v3.6) (for package information, see Supplementary Table  S8 ). was used to collect 232,288 online reviews (mean=922, min=6, max=5343) from RateBeer, an online beer review database. Each review entry comprised 5 numerical scores (appearance, aroma, taste, palate and overall quality) and an optional review text. The total number of reviews per reviewer was collected separately. Numerical scores were scaled and centered per rater, and mean scores were calculated per beer.

For the review texts, the language was estimated using the packages ‘langdetect’ and ‘langid’ in Python. Reviews that were classified as English by both packages were kept. Reviewers with fewer than 100 entries overall were discarded. 181,025 reviews from >6000 reviewers from >40 countries remained. Text processing was done using the ‘nltk’ package in Python. Texts were corrected for slang and misspellings; proper nouns and rare words that are relevant to the beer context were specified and kept as-is (‘Chimay’,’Lambic’, etc.). A dictionary of semantically similar sensorial terms, for example ‘floral’ and ‘flower’, was created and collapsed together into one term. Words were stemmed and lemmatized to avoid identifying words such as ‘acid’ and ‘acidity’ as separate terms. Numbers and punctuation were removed.

Sentences from up to 50 randomly chosen reviews per beer were manually categorized according to the aspect of beer they describe (appearance, aroma, taste, palate, overall quality—not to be confused with the 5 numerical scores described above) or flagged as irrelevant if they contained no useful information. If a beer contained fewer than 50 reviews, all reviews were manually classified. This labeled data set was used to train a model that classified the rest of the sentences for all beers 91 . Sentences describing taste and aroma were extracted, and term frequency–inverse document frequency (TFIDF) was implemented to calculate enrichment scores for sensorial words per beer.

The sex of the tasting subject was not considered when building our sensory database. Instead, results from different panelists were averaged, both for our trained panel (56% male, 44% female) and the RateBeer reviews (70% male, 30% female for RateBeer as a whole).

Beer price collection and processing

Beer prices were collected from the following stores: Colruyt, Delhaize, Total Wine, BeerHawk, The Belgian Beer Shop, The Belgian Shop, and Beer of Belgium. Where applicable, prices were converted to Euros and normalized per liter. Spearman correlations were calculated between these prices and mean overall appreciation scores from RateBeer and the taste panel, respectively.

Pairwise Spearman Rank correlations were calculated between all sensory properties.

Machine learning models

Predictive modeling of sensory profiles from chemical data.

Regression models were constructed to predict (a) trained panel scores for beer flavors and quality from beer chemical profiles and (b) public reviews’ appreciation scores from beer chemical profiles. Z-scores were used to represent sensory attributes in both data sets. Chemical properties with log-normal distributions (Shapiro-Wilk test, p  <  0.05 ) were log-transformed. Missing chemical measurements (0.1% of all data) were replaced with mean values per attribute. Observations from 250 beers were randomly separated into a training set (70%, 175 beers) and a test set (30%, 75 beers), stratified per beer style. Chemical measurements (p = 231) were normalized based on the training set average and standard deviation. In total, three linear regression-based models: linear regression with first-order interaction terms (LR), lasso regression with first-order interaction terms (Lasso) and partial least squares regression (PLSR); five decision tree models, Adaboost regressor (ABR), Extra Trees (ET), Gradient Boosting regressor (GBR), Random Forest (RF) and XGBoost regressor (XGBR); one support vector machine model (SVR) and one artificial neural network model (ANN) were trained. The models were implemented using the ‘scikit-learn’ package (v1.2.2) and ‘xgboost’ package (v1.7.3) in Python (v3.9.16). Models were trained, and hyperparameters optimized, using five-fold cross-validated grid search with the coefficient of determination (R 2 ) as the evaluation metric. The ANN (scikit-learn’s MLPRegressor) was optimized using Bayesian Tree-Structured Parzen Estimator optimization with the ‘Optuna’ Python package (v3.2.0). Individual models were trained per attribute, and a multi-output model was trained on all attributes simultaneously.

Model dissection

GBR was found to outperform other methods, resulting in models with the highest average R 2 values in both trained panel and public review data sets. Impurity-based rankings of the most important predictors for each predicted sensorial trait were obtained using the ‘scikit-learn’ package. To observe the relationships between these chemical properties and their predicted targets, partial dependence plots (PDP) were constructed for the six most important predictors of consumer appreciation 74 , 75 .

The ‘SHAP’ package in Python (v0.41.0) was implemented to provide an alternative ranking of predictor importance and to visualize the predictors’ effects as a function of their concentration 68 .

Validation of causal chemical properties

To validate the effects of the most important model features on predicted sensory attributes, beers were spiked with the chemical compounds identified by the models and descriptive sensory analyses were carried out according to the American Society of Brewing Chemists (ASBC) protocol 90 .

Compound spiking was done 30 min before tasting. Compounds were spiked into fresh beer bottles, that were immediately resealed and inverted three times. Fresh bottles of beer were opened for the same duration, resealed, and inverted thrice, to serve as controls. Pairs of spiked samples and controls were served simultaneously, chilled and in dark glasses as outlined in the Trained panel section above. Tasters were instructed to select the glass with the higher flavor intensity for each attribute (directional difference test 92 ) and to select the glass they prefer.

The final concentration after spiking was equal to the within-style average, after normalizing by ethanol concentration. This was done to ensure balanced flavor profiles in the final spiked beer. The same methods were applied to improve a non-alcoholic beer. Compounds were the following: ethyl acetate (Merck KGaA, W241415), ethyl hexanoate (Merck KGaA, W243906), isoamyl acetate (Merck KGaA, W205508), phenethyl acetate (Merck KGaA, W285706), ethanol (96%, Colruyt), glycerol (Merck KGaA, W252506), lactic acid (Merck KGaA, 261106).

Significant differences in preference or perceived intensity were determined by performing the two-sided binomial test on each attribute.

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this work are available in the Supplementary Data files and have been deposited to Zenodo under accession code 10653704 93 . The RateBeer scores data are under restricted access, they are not publicly available as they are property of RateBeer (ZX Ventures, USA). Access can be obtained from the authors upon reasonable request and with permission of RateBeer (ZX Ventures, USA).  Source data are provided with this paper.

Code availability

The code for training the machine learning models, analyzing the models, and generating the figures has been deposited to Zenodo under accession code 10653704 93 .

Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355 , 391–394 (2017).

Article   ADS   CAS   PubMed   Google Scholar  

Plutowska, B. & Wardencki, W. Application of gas chromatography–olfactometry (GC–O) in analysis and quality assessment of alcoholic beverages – A review. Food Chem. 107 , 449–463 (2008).

Article   CAS   Google Scholar  

Legin, A., Rudnitskaya, A., Seleznev, B. & Vlasov, Y. Electronic tongue for quality assessment of ethanol, vodka and eau-de-vie. Anal. Chim. Acta 534 , 129–135 (2005).

Loutfi, A., Coradeschi, S., Mani, G. K., Shankar, P. & Rayappan, J. B. B. Electronic noses for food quality: A review. J. Food Eng. 144 , 103–111 (2015).

Ahn, Y.-Y., Ahnert, S. E., Bagrow, J. P. & Barabási, A.-L. Flavor network and the principles of food pairing. Sci. Rep. 1 , 196 (2011).

Article   CAS   PubMed   PubMed Central   Google Scholar  

Bartoshuk, L. M. & Klee, H. J. Better fruits and vegetables through sensory analysis. Curr. Biol. 23 , R374–R378 (2013).

Article   CAS   PubMed   Google Scholar  

Piggott, J. R. Design questions in sensory and consumer science. Food Qual. Prefer. 3293 , 217–220 (1995).

Article   Google Scholar  

Kermit, M. & Lengard, V. Assessing the performance of a sensory panel-panellist monitoring and tracking. J. Chemom. 19 , 154–161 (2005).

Cook, D. J., Hollowood, T. A., Linforth, R. S. T. & Taylor, A. J. Correlating instrumental measurements of texture and flavour release with human perception. Int. J. Food Sci. Technol. 40 , 631–641 (2005).

Chinchanachokchai, S., Thontirawong, P. & Chinchanachokchai, P. A tale of two recommender systems: The moderating role of consumer expertise on artificial intelligence based product recommendations. J. Retail. Consum. Serv. 61 , 1–12 (2021).

Ross, C. F. Sensory science at the human-machine interface. Trends Food Sci. Technol. 20 , 63–72 (2009).

Chambers, E. IV & Koppel, K. Associations of volatile compounds with sensory aroma and flavor: The complex nature of flavor. Molecules 18 , 4887–4905 (2013).

Pinu, F. R. Metabolomics—The new frontier in food safety and quality research. Food Res. Int. 72 , 80–81 (2015).

Danezis, G. P., Tsagkaris, A. S., Brusic, V. & Georgiou, C. A. Food authentication: state of the art and prospects. Curr. Opin. Food Sci. 10 , 22–31 (2016).

Shepherd, G. M. Smell images and the flavour system in the human brain. Nature 444 , 316–321 (2006).

Meilgaard, M. C. Prediction of flavor differences between beers from their chemical composition. J. Agric. Food Chem. 30 , 1009–1017 (1982).

Xu, L. et al. Widespread receptor-driven modulation in peripheral olfactory coding. Science 368 , eaaz5390 (2020).

Kupferschmidt, K. Following the flavor. Science 340 , 808–809 (2013).

Billesbølle, C. B. et al. Structural basis of odorant recognition by a human odorant receptor. Nature 615 , 742–749 (2023).

Article   ADS   PubMed   PubMed Central   Google Scholar  

Smith, B. Perspective: Complexities of flavour. Nature 486 , S6–S6 (2012).

Pfister, P. et al. Odorant receptor inhibition is fundamental to odor encoding. Curr. Biol. 30 , 2574–2587 (2020).

Moskowitz, H. W., Kumaraiah, V., Sharma, K. N., Jacobs, H. L. & Sharma, S. D. Cross-cultural differences in simple taste preferences. Science 190 , 1217–1218 (1975).

Eriksson, N. et al. A genetic variant near olfactory receptor genes influences cilantro preference. Flavour 1 , 22 (2012).

Ferdenzi, C. et al. Variability of affective responses to odors: Culture, gender, and olfactory knowledge. Chem. Senses 38 , 175–186 (2013).

Article   PubMed   Google Scholar  

Lawless, H. T. & Heymann, H. Sensory evaluation of food: Principles and practices. (Springer, New York, NY). https://doi.org/10.1007/978-1-4419-6488-5 (2010).

Colantonio, V. et al. Metabolomic selection for enhanced fruit flavor. Proc. Natl. Acad. Sci. 119 , e2115865119 (2022).

Fritz, F., Preissner, R. & Banerjee, P. VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res 49 , W679–W684 (2021).

Tuwani, R., Wadhwa, S. & Bagler, G. BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules. Sci. Rep. 9 , 1–13 (2019).

Dagan-Wiener, A. et al. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure. Sci. Rep. 7 , 1–13 (2017).

Pallante, L. et al. Toward a general and interpretable umami taste predictor using a multi-objective machine learning approach. Sci. Rep. 12 , 1–11 (2022).

Malavolta, M. et al. A survey on computational taste predictors. Eur. Food Res. Technol. 248 , 2215–2235 (2022).

Lee, B. K. et al. A principal odor map unifies diverse tasks in olfactory perception. Science 381 , 999–1006 (2023).

Mayhew, E. J. et al. Transport features predict if a molecule is odorous. Proc. Natl. Acad. Sci. 119 , e2116576119 (2022).

Niu, Y. et al. Sensory evaluation of the synergism among ester odorants in light aroma-type liquor by odor threshold, aroma intensity and flash GC electronic nose. Food Res. Int. 113 , 102–114 (2018).

Yu, P., Low, M. Y. & Zhou, W. Design of experiments and regression modelling in food flavour and sensory analysis: A review. Trends Food Sci. Technol. 71 , 202–215 (2018).

Oladokun, O. et al. The impact of hop bitter acid and polyphenol profiles on the perceived bitterness of beer. Food Chem. 205 , 212–220 (2016).

Linforth, R., Cabannes, M., Hewson, L., Yang, N. & Taylor, A. Effect of fat content on flavor delivery during consumption: An in vivo model. J. Agric. Food Chem. 58 , 6905–6911 (2010).

Guo, S., Na Jom, K. & Ge, Y. Influence of roasting condition on flavor profile of sunflower seeds: A flavoromics approach. Sci. Rep. 9 , 11295 (2019).

Ren, Q. et al. The changes of microbial community and flavor compound in the fermentation process of Chinese rice wine using Fagopyrum tataricum grain as feedstock. Sci. Rep. 9 , 3365 (2019).

Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning. (Springer, New York, NY). https://doi.org/10.1007/978-0-387-21606-5 (2001).

Dietz, C., Cook, D., Huismann, M., Wilson, C. & Ford, R. The multisensory perception of hop essential oil: a review. J. Inst. Brew. 126 , 320–342 (2020).

CAS   Google Scholar  

Roncoroni, Miguel & Verstrepen, Kevin Joan. Belgian Beer: Tested and Tasted. (Lannoo, 2018).

Meilgaard, M. Flavor chemistry of beer: Part II: Flavor and threshold of 239 aroma volatiles. in (1975).

Bokulich, N. A. & Bamforth, C. W. The microbiology of malting and brewing. Microbiol. Mol. Biol. Rev. MMBR 77 , 157–172 (2013).

Dzialo, M. C., Park, R., Steensels, J., Lievens, B. & Verstrepen, K. J. Physiology, ecology and industrial applications of aroma formation in yeast. FEMS Microbiol. Rev. 41 , S95–S128 (2017).

Article   PubMed   PubMed Central   Google Scholar  

Datta, A. et al. Computer-aided food engineering. Nat. Food 3 , 894–904 (2022).

American Society of Brewing Chemists. Beer Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A.).

Olaniran, A. O., Hiralal, L., Mokoena, M. P. & Pillay, B. Flavour-active volatile compounds in beer: production, regulation and control. J. Inst. Brew. 123 , 13–23 (2017).

Verstrepen, K. J. et al. Flavor-active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Meilgaard, M. C. Flavour chemistry of beer. part I: flavour interaction between principal volatiles. Master Brew. Assoc. Am. Tech. Q 12 , 107–117 (1975).

Briggs, D. E., Boulton, C. A., Brookes, P. A. & Stevens, R. Brewing 227–254. (Woodhead Publishing). https://doi.org/10.1533/9781855739062.227 (2004).

Bossaert, S., Crauwels, S., De Rouck, G. & Lievens, B. The power of sour - A review: Old traditions, new opportunities. BrewingScience 72 , 78–88 (2019).

Google Scholar  

Verstrepen, K. J. et al. Flavor active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Snauwaert, I. et al. Microbial diversity and metabolite composition of Belgian red-brown acidic ales. Int. J. Food Microbiol. 221 , 1–11 (2016).

Spitaels, F. et al. The microbial diversity of traditional spontaneously fermented lambic beer. PLoS ONE 9 , e95384 (2014).

Blanco, C. A., Andrés-Iglesias, C. & Montero, O. Low-alcohol Beers: Flavor Compounds, Defects, and Improvement Strategies. Crit. Rev. Food Sci. Nutr. 56 , 1379–1388 (2016).

Jackowski, M. & Trusek, A. Non-Alcohol. beer Prod. – Overv. 20 , 32–38 (2018).

Takoi, K. et al. The contribution of geraniol metabolism to the citrus flavour of beer: Synergy of geraniol and β-citronellol under coexistence with excess linalool. J. Inst. Brew. 116 , 251–260 (2010).

Kroeze, J. H. & Bartoshuk, L. M. Bitterness suppression as revealed by split-tongue taste stimulation in humans. Physiol. Behav. 35 , 779–783 (1985).

Mennella, J. A. et al. A spoonful of sugar helps the medicine go down”: Bitter masking bysucrose among children and adults. Chem. Senses 40 , 17–25 (2015).

Wietstock, P., Kunz, T., Perreira, F. & Methner, F.-J. Metal chelation behavior of hop acids in buffered model systems. BrewingScience 69 , 56–63 (2016).

Sancho, D., Blanco, C. A., Caballero, I. & Pascual, A. Free iron in pale, dark and alcohol-free commercial lager beers. J. Sci. Food Agric. 91 , 1142–1147 (2011).

Rodrigues, H. & Parr, W. V. Contribution of cross-cultural studies to understanding wine appreciation: A review. Food Res. Int. 115 , 251–258 (2019).

Korneva, E. & Blockeel, H. Towards better evaluation of multi-target regression models. in ECML PKDD 2020 Workshops (eds. Koprinska, I. et al.) 353–362 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-65965-3_23 .

Gastón Ares. Mathematical and Statistical Methods in Food Science and Technology. (Wiley, 2013).

Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? Preprint at http://arxiv.org/abs/2207.08815 (2022).

Gries, S. T. Statistics for Linguistics with R: A Practical Introduction. in Statistics for Linguistics with R (De Gruyter Mouton, 2021). https://doi.org/10.1515/9783110718256 .

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 , 56–67 (2020).

Ickes, C. M. & Cadwallader, K. R. Effects of ethanol on flavor perception in alcoholic beverages. Chemosens. Percept. 10 , 119–134 (2017).

Kato, M. et al. Influence of high molecular weight polypeptides on the mouthfeel of commercial beer. J. Inst. Brew. 127 , 27–40 (2021).

Wauters, R. et al. Novel Saccharomyces cerevisiae variants slow down the accumulation of staling aldehydes and improve beer shelf-life. Food Chem. 398 , 1–11 (2023).

Li, H., Jia, S. & Zhang, W. Rapid determination of low-level sulfur compounds in beer by headspace gas chromatography with a pulsed flame photometric detector. J. Am. Soc. Brew. Chem. 66 , 188–191 (2008).

Dercksen, A., Laurens, J., Torline, P., Axcell, B. C. & Rohwer, E. Quantitative analysis of volatile sulfur compounds in beer using a membrane extraction interface. J. Am. Soc. Brew. Chem. 54 , 228–233 (1996).

Molnar, C. Interpretable Machine Learning: A Guide for Making Black-Box Models Interpretable. (2020).

Zhao, Q. & Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. Publ. Am. Stat. Assoc. 39 , 272–281 (2019).

Article   MathSciNet   Google Scholar  

Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. (Springer, 2019).

Labrado, D. et al. Identification by NMR of key compounds present in beer distillates and residual phases after dealcoholization by vacuum distillation. J. Sci. Food Agric. 100 , 3971–3978 (2020).

Lusk, L. T., Kay, S. B., Porubcan, A. & Ryder, D. S. Key olfactory cues for beer oxidation. J. Am. Soc. Brew. Chem. 70 , 257–261 (2012).

Gonzalez Viejo, C., Torrico, D. D., Dunshea, F. R. & Fuentes, S. Development of artificial neural network models to assess beer acceptability based on sensory properties using a robotic pourer: A comparative model approach to achieve an artificial intelligence system. Beverages 5 , 33 (2019).

Gonzalez Viejo, C., Fuentes, S., Torrico, D. D., Godbole, A. & Dunshea, F. R. Chemical characterization of aromas in beer and their effect on consumers liking. Food Chem. 293 , 479–485 (2019).

Gilbert, J. L. et al. Identifying breeding priorities for blueberry flavor using biochemical, sensory, and genotype by environment analyses. PLOS ONE 10 , 1–21 (2015).

Goulet, C. et al. Role of an esterase in flavor volatile variation within the tomato clade. Proc. Natl. Acad. Sci. 109 , 19009–19014 (2012).

Article   ADS   CAS   PubMed   PubMed Central   Google Scholar  

Borisov, V. et al. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 1–21 https://doi.org/10.1109/TNNLS.2022.3229161 (2022).

Statista. Statista Consumer Market Outlook: Beer - Worldwide.

Seitz, H. K. & Stickel, F. Molecular mechanisms of alcoholmediated carcinogenesis. Nat. Rev. Cancer 7 , 599–612 (2007).

Voordeckers, K. et al. Ethanol exposure increases mutation rate through error-prone polymerases. Nat. Commun. 11 , 3664 (2020).

Goelen, T. et al. Bacterial phylogeny predicts volatile organic compound composition and olfactory response of an aphid parasitoid. Oikos 129 , 1415–1428 (2020).

Article   ADS   Google Scholar  

Reher, T. et al. Evaluation of hop (Humulus lupulus) as a repellent for the management of Drosophila suzukii. Crop Prot. 124 , 104839 (2019).

Stein, S. E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass Spectrom. 10 , 770–781 (1999).

American Society of Brewing Chemists. Sensory Analysis Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A., 1992).

McAuley, J., Leskovec, J. & Jurafsky, D. Learning Attitudes and Attributes from Multi-Aspect Reviews. Preprint at https://doi.org/10.48550/arXiv.1210.3926 (2012).

Meilgaard, M. C., Carr, B. T. & Carr, B. T. Sensory Evaluation Techniques. (CRC Press, Boca Raton). https://doi.org/10.1201/b16452 (2014).

Schreurs, M. et al. Data from: Predicting and improving complex beer flavor through machine learning. Zenodo https://doi.org/10.5281/zenodo.10653704 (2024).

Download references

Acknowledgements

We thank all lab members for their discussions and thank all tasting panel members for their contributions. Special thanks go out to Dr. Karin Voordeckers for her tremendous help in proofreading and improving the manuscript. M.S. was supported by a Baillet-Latour fellowship, L.C. acknowledges financial support from KU Leuven (C16/17/006), F.A.T. was supported by a PhD fellowship from FWO (1S08821N). Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16/17/006).

Author information

These authors contributed equally: Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni.

Authors and Affiliations

VIB—KU Leuven Center for Microbiology, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Florian A. Theßeling & Kevin J. Verstrepen

CMPG Laboratory of Genetics and Genomics, KU Leuven, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Leuven Institute for Beer Research (LIBR), Gaston Geenslaan 1, B-3001, Leuven, Belgium

Laboratory of Socioecology and Social Evolution, KU Leuven, Naamsestraat 59, B-3000, Leuven, Belgium

Lloyd Cool, Christophe Vanderaa & Tom Wenseleers

VIB Bioinformatics Core, VIB, Rijvisschestraat 120, B-9052, Ghent, Belgium

Łukasz Kreft & Alexander Botzki

AB InBev SA/NV, Brouwerijplein 1, B-3000, Leuven, Belgium

Philippe Malcorps & Luk Daenen

You can also search for this author in PubMed   Google Scholar

Contributions

S.P., M.S. and K.J.V. conceived the experiments. S.P., M.S. and K.J.V. designed the experiments. S.P., M.S., M.R., B.H. and F.A.T. performed the experiments. S.P., M.S., L.C., C.V., L.K., A.B., P.M., L.D., T.W. and K.J.V. contributed analysis ideas. S.P., M.S., L.C., C.V., T.W. and K.J.V. analyzed the data. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Kevin J. Verstrepen .

Ethics declarations

Competing interests.

K.J.V. is affiliated with bar.on. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Florian Bauer, Andrew John Macintosh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information, peer review file, description of additional supplementary files, supplementary data 1, supplementary data 2, supplementary data 3, supplementary data 4, supplementary data 5, supplementary data 6, supplementary data 7, reporting summary, source data, source data, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Schreurs, M., Piampongsant, S., Roncoroni, M. et al. Predicting and improving complex beer flavor through machine learning. Nat Commun 15 , 2368 (2024). https://doi.org/10.1038/s41467-024-46346-0

Download citation

Received : 30 October 2023

Accepted : 21 February 2024

Published : 26 March 2024

DOI : https://doi.org/10.1038/s41467-024-46346-0

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

By submitting a comment you agree to abide by our Terms and Community Guidelines . If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing: Translational Research newsletter — top stories in biotechnology, drug discovery and pharma.

quantitative research report writing

COMMENTS

  1. A Practical Guide to Writing Quantitative and Qualitative Research Questions and Hypotheses in Scholarly Articles

    The answer is written in length in the discussion section of the paper. Thus, the research question gives a preview of the different parts and variables of the study meant to address the problem posed in the research question.1 An excellent research question clarifies the research writing while facilitating understanding of the research topic ...

  2. Quantitative Methods

    An Overview of Quantitative Research in Composition and TESOL. Department of English, Indiana University of Pennsylvania; Hopkins, Will G. "Quantitative Research Design." Sportscience 4, 1 (2000); "A Strategy for Writing Up Research Results. The Structure, Format, Content, and Style of a Journal-Style Scientific Paper."

  3. Writing Quantitative Research Studies

    Summarizing quantitative data and its effective presentation and discussion can be challenging for students and researchers. This chapter provides a framework for adequately reporting findings from quantitative analysis in a research study for those contemplating to write a research paper. The rationale underpinning the reporting methods to ...

  4. Guidelines for Reporting Quantitative Methods and Results in Primary

    These guidelines, commissioned and vetted by the board of directors of Language Learning, outline the basic expectations for reporting of quantitative primary research with a specific focus on Method and Results sections. The guidelines are based on issues raised in: Norris, J. M., Ross, S., & Schoonen, R. (Eds.). (2015).

  5. What Is Quantitative Research?

    Revised on June 22, 2023. Quantitative research is the process of collecting and analyzing numerical data. It can be used to find patterns and averages, make predictions, test causal relationships, and generalize results to wider populations. Quantitative research is the opposite of qualitative research, which involves collecting and analyzing ...

  6. How to Write an APA Methods Section

    Research papers in the social and natural sciences often follow APA style. This article focuses on reporting quantitative research methods. In your APA methods section, you should report enough information to understand and replicate your study, including detailed information on the sample, measures, and procedures used.

  7. Quantitative research design (JARS-Quant)

    Quantitative Research Design (JARS-Quant) The current JARS-Quant standards, released in 2018, expand and revise the types of research methodologies covered in the original JARS, which were published in 2008. JARS-Quant include guidance for manuscripts that report. In addition, JARS-Quant now divides hypotheses, analyses, and conclusions ...

  8. 16. Reporting quantitative results

    Execute a quantitative research report using key elements for accuracy and openness. So you've completed your quantitative analyses and are ready to report your results. We're going to spend some time talking about what matters in quantitative research reports, but the very first thing to understand is this: openness with your data and ...

  9. Improving quantitative writing one sentence at a time

    Scientific writing, particularly quantitative writing, is difficult to master. To help undergraduate students write more clearly about data, we sought to deconstruct writing into discrete, specific elements. We focused on statements typically used to describe data found in the results sections of research articles (quantitative comparative statements, QC). In this paper, we define the ...

  10. PDF Writing up Quantitative Research in the Social and

    Section 1: Foundations for Writing Quantitative Research Reports in the Social and Behavioral Sciences Chapter 1: Methodological Elements of Quantitative Research 3 Introduction3 Forms of Quantitative Research 3 The Definitional Hierarchy 6 Types of Variables 7 Validity and Reliability 9 Summary and Practice 12

  11. Writing About Quantitative Research

    Abstract. This chapter focuses on how to communicate the results of quantitative research. The first section of this chapter focuses on writing for scholarly audiences, as in the context of a research paper or an academic conference presentation. The second section of this chapter focuses on writing for policymaker or practitioner audiences.

  12. Research Report: Definition, Types + [Writing Guide]

    A quantitative research report is a type of research report that is written for quantitative research. ... Guide to Writing a Research Report. A lot of detail goes into writing a research report, and getting familiar with the different requirements would help you create the ideal research report. A research report is usually broken down into ...

  13. Research Report

    The study utilized a quantitative research design, which involved a survey questionnaire administered to a sample of 200 high school students. ... Overall, the timing of when to write a research report depends on the purpose of the research, the expectations of the audience, and any regulatory requirements that need to be met. However, it is ...

  14. How to Write a Quantitative Analysis Report

    Step 3. Create graphs showing visual representations of the results. You can use bar graphs, line graphs or pie charts depending to convey the data. Only write about the pertinent findings, or the ones you think matter most, in the body of the report. Any other results can be attached in the appendices at the end of the report.

  15. Writing Quantitative Research Reports

    Writing Quantitative Research Reports. What makes for a great quantitative research report? In this class taught by Kathryn Korostoff, you learn how to write a great quantitative market research report—even if you are new to report writing—in a fun practical way. Get Access

  16. Techniques for Reporting Quantitative Data

    A rough sequence of steps for writing a quantitative research report describes in this section: 1. Specify a summary or abstract of the report to give a quick picture of the research article, thesis, review paper, conference proceeding, or in-depth analysis of a particular subject. 2. Define the research problem and discuss the methodology ...

  17. (PDF) Research Methodology WRITING A RESEARCH REPORT

    Quantitative Research Report A quantitative research report is a type of research report that is written for quantitative research. ... (2020). Research report: Definition, types + [writing guide ...

  18. What is Quantitative Writing?

    Quantitative writing (QW) requires students to grapple with numbers in a real world context, to describe observations using numbers, and to use the numbers in their own analyses and arguments. Good quantitative writing assignments ask students to do more than compute an answer. In addition they ask students to draw conclusions based on ...

  19. Qualitative vs. Quantitative Research

    When collecting and analyzing data, quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings. Both are important for gaining different kinds of knowledge. Quantitative research. Quantitative research is expressed in numbers and graphs. It is used to test or confirm theories and assumptions.

  20. PDF Writing about Quantitative Research

    1. Report writing—Study and teaching (Higher)—Handbooks, manuals, etc. 2. Research—Methodology—Study and teaching (Higher)—Handbooks, manuals, etc. 3. Applied linguistics—Methodology—Handbooks, manuals, etc. 4. Academic writing—Study and teaching. 5. Quantitative research—Study and teaching. I. Title. PE1478.W68 2014 418.007 ...

  21. FREE 10+ Quantitative Research Report Samples & Templates in PDF

    A quantitative research report refers to a document that conveys and interprets the data collected during the quantitative research. In this, the quantitative research data are displayed and presented in diagrams, graphs, tables, etc. to make the information more accessible and understandable by the management. How to Write a Quantitative ...

  22. Predicting and improving complex beer flavor through machine ...

    For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel and map data from over 180,000 consumer reviews to train 10 ...