The loci of Stroop effects: a critical review of methods and evidence for levels of processing contributing to color-word Stroop effects and the implications for the loci of attentional selection

  • Open access
  • Published: 13 August 2021
  • Volume 86, pages 1029–1053 (2022)

  • Benjamin A. Parris (ORCID: orcid.org/0000-0003-2402-2100) 1,
  • Nabil Hasshim 1, 2, 5,
  • Michael Wadsley 1,
  • Maria Augustinova 3 &
  • Ludovic Ferrand 4

Despite instructions to ignore the irrelevant word in the Stroop task, it robustly influences the time it takes to identify the color, leading to performance decrements (interference) or enhancements (facilitation). The present review addresses two questions: (1) What levels of processing contribute to Stroop effects; and (2) Where does attentional selection occur? The methods that are used in the Stroop literature to measure the candidate varieties of interference and facilitation are critically evaluated and the processing levels that contribute to Stroop effects are discussed. It is concluded that the literature does not provide clear evidence for a distinction between conflicting and facilitating representations at phonological, semantic and response levels (together referred to as informational conflict), because the methods do not currently permit their isolated measurement. In contrast, it is argued that the evidence for task conflict as being distinct from informational conflict is strong and, thus, that there are at least two loci of attentional selection in the Stroop task. Evidence suggests that task conflict occurs earlier, has a different developmental trajectory and is independently controlled, which supports the notion of a separate mechanism of attentional selection. The modifying effects of response modes and evidence for Stroop effects at the level of response execution are also discussed. It is argued that multiple studies claiming to have distinguished response and semantic conflict have not done so unambiguously and that models of Stroop task performance need to be modified to more effectively account for the loci of Stroop effects.

Introduction

In his doctoral dissertation, John R. Stroop was interested in the extent to which difficulties that accompany learning, such as interference, can be reduced by practice (Stroop, 1935). For this purpose, he constructed a particular type of stimulus. Stroop displayed words in a color that was different from the one that they actually designated (e.g., the word red in blue font). After he failed to observe any interference from the colors on the time it took to read the words (Exp. 1), he asked his participants to identify their font color. Because the meaning of these words (e.g., red) interfered with the to-be-named target color (e.g., blue), Stroop observed that naming aloud the color of these words took longer than naming aloud the color of small squares included in his control condition (Exp. 2). In line with both his expectations and other learning experiments carried out at the time, this interference decreased substantially over the course of practice. However, daily practice did not eliminate it completely (Exp. 3). During the next thirty years, this result, and more generally this paradigm, received only modest interest from the scientific community (see, e.g., Jensen & Rohwer, 1966; MacLeod, 1992, for discussions). Things changed dramatically when color-word stimuli, ingeniously constructed by Stroop, became a prime paradigm to study attention, and in particular selective attention (Klein, 1964).

The ability to selectively attend to and process only certain features in the environment while ignoring others is crucial in many everyday activities (e.g., Jackson & Balota, 2013). Indeed, it is this very ability that allows us to drive without being distracted by beautiful surroundings or to quickly find a friend in a hallway full of people. It is clear, then, that an ability to reduce the impact of potentially interfering information, by selectively attending to the parts of the world that are consistent with our goals, is essential to functioning in the world as a purposive individual. The Stroop task (Stroop, 1935), as this paradigm is now known, is a selective attention task in that it requires participants to focus on one dimension of the stimulus whilst ignoring another dimension of the very same stimulus. When the word dimension is not successfully ignored, it elicits interference: Naming aloud the color that a word is printed in takes longer when the word denotes a different color (incongruent trials, e.g., the word red displayed in color-incongruent blue font) compared to a baseline condition. This difference in color-naming times is often referred to as the Stroop interference effect or the Stroop effect (see the section ‘Definitional issues’ for further development and clarifications of these terms).

Evidencing its utility, the Stroop task has been widely used in clinical settings as an aid to assess disorders related to frontal lobe and executive attention impairments (e.g., in attention deficit hyperactivity disorder, Barkley, 1997; schizophrenia, Henik & Salo, 2004; dementia, Spieler et al., 1996; and anxiety, Mathews & MacLeod, 1985; see MacLeod, 1991, for an in-depth review of the Stroop task). The Stroop task is also ubiquitously used in basic and applied research, as indicated by the fact that the original paper (Stroop, 1935) is one of the most cited in the history of psychology and cognitive science (e.g., Gazzaniga et al., 2013; MacLeod, 1992). It is, however, important to understand that the Stroop task as it is currently employed in neuropsychological practice (e.g., Strauss et al., 2007), its implementations in most basic and applied research (see below), and leading accounts of the effect it produces, are profoundly rooted in the idea that the Stroop effect is a unitary phenomenon in that it is caused by the failure of a single mechanism (i.e., it has a single locus). By addressing the critical issue of whether there is a single locus or multiple loci of Stroop effects, the present review not only addresses several pending issues of theoretical and empirical importance, but also critically evaluates these current practices.

The where vs. the when and the how of attentional control

The Stroop effect has been described as the gold standard measure of selective attention (MacLeod, 1992) in which a smaller Stroop interference effect is an indication of greater attentional selectivity. However, the notion that it is selective attention that is the cognitive mechanism enabling successful performance in the Stroop task has recently been sidelined (see Algom & Chajut, 2019, for a discussion of this issue). For example, in a recent description of the Stroop task, Braem et al. (2019) noted that the size of the Stroop congruency effect is “indicative of the signal strength of the irrelevant dimension relative to the relevant dimension, as well as of the level of cognitive control applied” (p. 769). Cognitive control is a broader concept than selective attention in that it refers to the entirety of mechanisms used to control thought and behavior to ensure goal-oriented behavior (e.g., task switching, response inhibition, working memory). Its invocation in describing the Stroop task has proven to be somewhat controversial given that it implies the operation of top-down mechanisms, which might or might not be necessary to explain certain experimental findings (Algom & Chajut, 2019; Braem et al., 2019; Schmidt, 2018). It does, however, have the benefit of hypothesizing a form of attentional control that is not a static, invariant process but instead posits a more dynamic, adaptive form of attentional control, and provides foundational hypotheses about how and when attentional control might happen. However, the present work addresses that which the cognitive control approach tends to eschew (see Algom & Chajut, 2019): the question of where the conflict that causes the interference comes from. Importantly, the answer to the where question will have implications for the how and when questions.

The question of where the interference derives from has historically been referred to as the locus of the Stroop effect (e.g., Dyer, 1973; Logan & Zbrodoff, 1998; Luo, 1999; Scheibe et al., 1967; Seymour, 1977; Wheeler, 1977; see also MacLeod, 1991, and Parris, Augustinova & Ferrand, 2019). Whilst, by virtue of our interest in where attentional selection occurs, we review evidence for the early or late selection of information in the color-word Stroop task, recent models of selective attention have shown that whether selection is early or late is a function of either the attentional resources available to process the irrelevant stimulus (Lavie, 1995) or the strength of the perceptual representation of the irrelevant dimension (Tsal & Benoni, 2010). Moreover, despite being referred to as the gold standard attentional measure and as one of the most robust findings in the field of psychology (MacLeod, 1992), it is clear that Stroop effects can be substantially reduced or eliminated by making what appear to be small changes to the task. For example, Besner, Stolz, and Boutilier (1997) showed that the Stroop effect can be reduced and even eliminated by coloring a single letter instead of all letters of the irrelevant word (although, notably, they used button-press responses, which produce smaller Stroop effects (Sharma & McKenna, 1998), making it easier to eliminate interference; see also Parris, Sharma, & Weekes, 2007). In addition, Melara and Mounts (1993) showed that by making the irrelevant words smaller to equate the discriminability of word and color, the Stroop effect can be eliminated and even reversed.

Later, Dishon-Berkovits and Algom (2000) noted that often in the Stroop task the dimensions are correlated in that one dimension can be used to predict the other (i.e., when an experimenter matches the number of congruent (e.g., the word red presented in the color red) and incongruent trials in the Stroop task, the irrelevant word is more often presented in its matching color than in any other color, which sets up a response contingency). They demonstrated that when this dimensional correlation was removed the Stroop effect was substantially reduced. By showing that the Stroop effect is malleable through the modulation of dimensional uncertainty (degree of correlation of the dimensional values and how expected the co-occurrences are) or dimensional imbalance (of the salience of each dimension), their data and resulting model (Melara & Algom, 2003; see also Algom & Fitousi, 2016) indicate that selective attention fails because the experimental set-up of the Stroop task provides a context with little or no perceptual load or perceptual competition, and in which the dimensions (word and color) are often correlated and/or asymmetrical in discriminability, all of which contributes to the robust nature of the Stroop effect. In other words, the Stroop task sets selective attention mechanisms up to fail, pitting as it does the intention to ignore irrelevant information against the tendency and resources to process conspicuous and correlated characteristics of the environment (Melara & Algom, 2003). But, in the same way that neuropsychological impairments teach us something about how the mind works (Shallice, 1988), it is these failures that give us an opportunity to explore the architecture of the mechanisms of selective attention in healthy and impaired populations. We, therefore, ask the question: if control does fail, where (at what levels of processing) is conflict experienced in the color-word Stroop task?
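The dimensional correlation described by Dishon-Berkovits and Algom can be made concrete with a small calculation. The following Python sketch is illustrative only: the four-color set, the trial counts and the function names are assumptions introduced here, not details taken from any of the cited studies. It builds one trial list with equal numbers of congruent and incongruent trials and one list in which every word-color pairing is equally frequent, and then computes how well the word predicts the color in each.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical four-color stimulus set (an assumption for illustration).
COLORS = ["red", "blue", "green", "yellow"]


def build_trial_list(colors, n_per_congruent_pair, n_per_incongruent_pair):
    """Return a list of (word, color) trials with the requested pair counts."""
    trials = []
    for word in colors:
        for color in colors:
            n = n_per_congruent_pair if word == color else n_per_incongruent_pair
            trials.extend([(word, color)] * n)
    return trials


def p_color_given_word(trials, word):
    """Conditional probability of each ink color given the irrelevant word."""
    counts = Counter(color for w, color in trials if w == word)
    total = sum(counts.values())
    return {color: Fraction(n, total) for color, n in counts.items()}


# Equal numbers of congruent and incongruent trials: 4 congruent pairs x 3
# repetitions = 12 congruent trials; 12 incongruent pairs x 1 = 12 incongruent.
matched = build_trial_list(COLORS, n_per_congruent_pair=3, n_per_incongruent_pair=1)

# Every word-color pairing equally frequent, so the word carries no
# information about the color.
uncorrelated = build_trial_list(COLORS, n_per_congruent_pair=1, n_per_incongruent_pair=1)

for label, trials in [("matched", matched), ("uncorrelated", uncorrelated)]:
    probs = p_color_given_word(trials, "red")
    print(label, {color: str(p) for color, p in sorted(probs.items())})
```

Under these assumed counts, the word red predicts the color red with probability 1/2 and each other color with probability 1/6 in the matched list, whereas every color has probability 1/4 in the uncorrelated list. This is the sense in which matching congruent and incongruent trial numbers sets up a contingency.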

Given our focus on the varieties of conflict (and facilitation), the where of control, we will not concern ourselves with the how and the when of control. Manipulations and models of the Stroop task that are not designed to understand the types of conflict and facilitation that contribute to Stroop effects, such as list-wise versus item-specific congruency proportion manipulations (e.g., Botvinick et al., 2001; Bugg & Crump, 2012; Gonthier et al., 2016; Logan & Zbrodoff, 1979; Schmidt & Besner, 2008; Schmidt, Notebaert, & Van Den Bussche, 2015; see Schmidt, 2019, for a review) or memory load manipulations (e.g., De Fockert, 2013; Kalanthroff et al., 2015; Kim et al., 2005; Kim, Min, Kim & Won, 2006), will be eschewed, unless these manipulations are specifically modified in a way that permits the understanding of the processing involved in producing Stroop interference and facilitation. To reiterate the aims of the present review, here we are less concerned with the evaluative function of control which judges when and how control operates (Chuderski & Smolen, 2016), but are instead concerned with the regulative function of control and specifically at which processing levels this might occur. In short, the present review attempts to identify whether at any level, other than the historically favoured level of response output, processing reliably leads to conflict (or facilitation) between activated representations. Before we address this question, however, we must first address the terminology used here, and in the literature, to describe different types of Stroop effects.

Definitional issues to consider before we begin

A word about baselines and descriptions of Stroop effects

Given the number of studies that have employed the Stroop task since its inception in 1935, it is no surprise that a variety of modifications of the original task have been employed, including the introduction of new trial types (as exemplified by Klein, 1964) and new ways of responding, to measure and understand mechanisms of selective attention. This has led to disagreement over what is being measured by each manipulation, obfuscating the path to theoretical enlightenment. Various trial types have been used to distinguish types of conflict and facilitation in the color-word Stroop task (see Fig. 1), although with less fervor for facilitation varieties, resulting in a lack of agreement about how one should go about indexing response conflict, semantic conflict, and other forms of conflict and facilitation. Indeed, as can be seen in Fig. 1, one person’s semantic conflict can be another person’s facilitation; a problem that arises due to the selection of the baseline control condition. Differences in performance between a critical trial and a control trial might be attributed to a specific variable but this method relies on having a suitable baseline that differs only in the specific component under test (Jonides & Mack, 1984).

Fig. 1 This figure shows examples of the various trial types that have been used to decompose the Stroop effect into various types of conflict (interference) and facilitation. This has resulted in a lack of clarity about what components are being measured. Indeed, as can be seen, one person’s semantic conflict can be another person’s facilitation, a problem that arises due to the selection of the baseline control condition.

Selecting an appropriate baseline, and indeed an appropriate critical trial, to measure the specific component under test is non-trivial. For example, congruent trials, first introduced by Dalrymple-Alford and Budayr (1966, Exp. 2), have become a popular baseline condition against which to compare performance on incongruent trials. Congruent trials are commonly responded to much faster than incongruent trials and the difference in reaction time between the two conditions has been variously referred to as the Stroop congruency effect (e.g., Egner et al., 2010), the Stroop interference effect (e.g., Leung et al., 2000), the Total Stroop Effect (Brown et al., 1998), and Color-Word Impact (Kahneman & Chajczyk, 1983). However, when compared to non-color-word neutral trials, congruent trials are often reported to be responded to faster, evidencing a facilitation effect of the irrelevant word on the task of color naming (Dalrymple-Alford, 1972; Dalrymple-Alford & Budayr, 1966). Referring to the difference between incongruent and congruent trials as Stroop interference then, as is often the case in the Stroop literature, fails to recognize the role of facilitation observed on congruent trials and epitomizes a wider problem. As already emphasized by MacLeod (1991), this difference corresponds to “(…) the sum of facilitation and interference, each in unknown amounts” (MacLeod, 1991, p. 168). Moreover, as will be discussed in detail later, congruent trial reaction times have been shown to be influenced by a newly discovered form of conflict, known as task conflict (Goldfarb & Henik, 2007) and are not, therefore, straightforwardly a measure of facilitation either.
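MacLeod's point that the incongruent-minus-congruent difference is a sum of two unknown quantities can be illustrated with simple arithmetic. The Python sketch below uses invented reaction times and a hypothetical function name, and it assumes that a non-color-word neutral condition is an acceptable baseline, which is exactly the assumption questioned in this section (neutral and congruent trials may both be affected by task conflict). It is meant only to show how the three contrasts relate to one another.

```python
from statistics import mean


def stroop_contrasts(rts_by_condition):
    """Compute the three standard contrasts from per-condition RTs (ms).

    Expects the keys 'congruent', 'neutral' and 'incongruent'. The neutral
    condition is treated as the baseline, which is an assumption.
    """
    m = {cond: mean(rts) for cond, rts in rts_by_condition.items()}
    return {
        # Slowing relative to the neutral baseline.
        "interference": m["incongruent"] - m["neutral"],
        # Speeding relative to the neutral baseline.
        "facilitation": m["neutral"] - m["congruent"],
        # Incongruent minus congruent, whatever label it is given.
        "congruency_effect": m["incongruent"] - m["congruent"],
    }


# Invented reaction times for illustration; not data from any study.
example_rts = {
    "congruent": [600, 620, 610],
    "neutral": [640, 650, 630],
    "incongruent": [720, 700, 710],
}

contrasts = stroop_contrasts(example_rts)
print(contrasts)
```

With these invented values the congruency effect (100 ms) equals interference (70 ms) plus facilitation (30 ms). Reporting only the 100 ms figure leaves the two contributions unknown, and the split itself shifts whenever a different baseline is chosen.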

Furthermore, whilst the common implementation of the Stroop task involves incongruent, congruent, and non-color-word neutral trials (or perhaps where the non-color-word neutral baseline is replaced by repeated letter strings, e.g., xxxx), this common format ignores the possibility that the difference between incongruent and neutral trials involves multiple processes (e.g., semantic and response level conflict). As Klein (1964) showed, the irrelevant word in the Stroop task can refer to concepts semantically associated with a color (e.g., sky; Klein, 1964), potentially permitting a way to answer the question of whether selection occurs early at the level of semantics, before response selection, in the processing stream. But it is unclear whether such trials are direct measures of semantic conflict or indirect measures of response conflict.

Here, we employ the following terms: We refer to the difference between incongruent and congruent conditions as the Stroop congruency effect, because it contrasts performance in conditions with opposite congruency values. For the reasons noted above, the term Stroop interference or just interference is preferentially reserved for referring to slower performance on one trial type compared to another. The word conflict will denote competing representations at any particular level that could be the cause of interference (note that interference might not result from conflict (De Houwer, 2003) as, for example, in the emotional Stroop task, interference could result without conflict from competing representations (Algom et al., 2004)). When the distinction is not critical, the terms interference and conflict will be used interchangeably. The term Stroop facilitation or just facilitation will refer to the speeding up of performance on one trial type compared to another (unless specified otherwise). In common with the literature, facilitation will also be used to refer to the opposite of conflict; that is, it will denote facilitating representations at any level. Finally, the term Stroop effect(s) will be employed to refer more generally to all of these effects.

Levels of conflict vs. levels of selection

When considering the standard incongruent Stroop trial (e.g., red in blue) where the word dimension is a color word (e.g., red) that is incongruent with the target color dimension that is being named, and where the color red is also a potential response, one might surmise numerous levels of representation where these two concepts might compete. Processing of the color dimension of a Stroop stimulus to name the color would, on a simple analysis, require initial visual processing, followed by activation of the relevant semantic representation and then word-form (phonetic) encoding of the color name in preparation for a response. For this process to advance unimpeded until response there would need to be no competing representations activated at any of those stages. Like color naming, the process of word reading also requires visual processing, but of letters and not of colors, perhaps avoiding creating conflict at this level, although there is evidence for a competition for resources at the level of visual processing under some conditions (Kahneman & Chajczyk, 1983). Word reading also requires the computation of phonology from orthography, which color processing does not. One way interference might occur at this level is if semantic processing or word-form encoding during the processing of the color dimension also leads to the unnecessary (for the purposes of providing a correct response) activation of the orthographic representation of the color name; as far as we are aware, there is no evidence for this. However, orthography does appear to lead to conflict through a different route: the presence of a word or word-like stimulus appears to activate the full mental machinery used to process words. This unintentionally activated word reading task set conflicts with the intentionally activated color identification task set, creating task conflict. Task conflict occurs whenever an orthographically plausible letter string is presented (e.g., the word table leads to interference, as does the non-word but pronounceable letter string fanit; the letter string xxxxx less so; Levin & Tzelgov, 2016; Monsell et al., 2001).

Despite being a task in which participants do not intend to engage, irrelevant word processing would also likely involve the activation of a phonological representation of the word and the activation of a semantic representation (and likely some word-form encoding), either of which could lead to the activation of representations competing for selection. However, just because the word is processed at a certain level (e.g., orthography or phonology here) does not mean that each of these levels independently leads to conflict. Phonological information would only independently contribute to conflict if the process of color naming activated a competing representation at the same level. Otherwise, the phonological representation of the irrelevant word might simply facilitate activation of the semantic representation of the irrelevant word, thereby providing competition for the semantic representation of the relevant color. In which case, whilst phonological information would contribute to Stroop effects, no selection mechanism would be required at the phonological level. And of course, there could be conflict at the phonological processing level, but with no selection mechanism available, conflict would have to be resolved later. To identify whether selection occurs at the level of phonological processing, a method would be needed to isolate phonological information from information at the semantic and response levels.

So-called late selection accounts would argue that any activated representations at these levels would result in increased activation at the response level where selection would occur with no competition or selection at earlier stages (e.g., Dyer, 1973; Logan & Zbrodoff, 1998; Luo, 1999; Scheibe et al., 1967; Seymour, 1977; Wheeler, 1977; see also MacLeod, 1991, and Parris, Augustinova & Ferrand, 2019a, 2019b, 2019c, for discussions of this topic). In contrast, so-called early selection accounts (De Houwer, 2003; Scheibe et al., 1967; Seymour, 1977; Stirling, 1979; Zhang & Kornblum, 1998; Zhang et al., 1999) argue for earlier and multiple sites of attentional selection, with Hock and Egeth (1970) even arguing that the perceptual encoding of the color dimension is slowed by the irrelevant word, although this has been shown to be a problematic interpretation of their results (Dyer, 1973). In Zhang and colleagues’ models, attentional selection occurred and was resolved at the stimulus identification stage, before any information was passed on to the response level, which had its own selection mechanism.

The organization of the review

It is important to emphasize at this point then that when considering the locus or loci of the Stroop effect, there are in fact two issues to address. The first concerns the level(s) of processing that significantly contribute to Stroop interference (and facilitation) so that a specific type of conflict actually arises at this level. The second issue concerns the level(s) of attentional selection: Is there, as Zhang and Kornblum (1998) and Zhang et al. (1999) have suggested, more than one level at which attentional selection occurs?

With regard to the first issue, we start below by critically evaluating the evidence for different levels of processing that putatively contribute to conflict with the objective of assessing the methods used to index the forms of conflict, and what we can learn from them. To do this, and to further structure the review, we employ the distinction introduced by MacLeod and MacDonald (2000), who argued for two categories of conflict: informational conflict and the aforementioned task conflict (see also Levin & Tzelgov, 2016). Informational conflict arises from the semantic and response information that the irrelevant word conveys. This roughly corresponds to the distinction between stimulus-based and response-based conflicts (Kornblum & Lee, 1995; Kornblum et al., 1990; Zhang & Kornblum, 1998; Zhang et al., 1999). According to this approach, conflict arises due to overlap between the dimensions of the Stroop stimulus at the level of stimulus processing (Stimulus–Stimulus or S–S overlap) and at the level of response production (Stimulus–Response or S–R overlap). At the level of stimulus processing, interference can occur at the perceptual encoding, memory retrieval, conceptual encoding and stimulus comparison stages. At the level of response production, interference can also occur at response selection, motor programming and response execution. In the Stroop task, the relevant and irrelevant dimensions both involve colors and would, thus, produce Stimulus–Stimulus conflict, and both stimuli overlap with the response (S–R overlap) because the response involves color classification. We also include phonological processing and word frequency in the informational conflict taxon (cf. Levin & Tzelgov, 2016). We discuss informational conflict and its varieties in the first section, which is entitled ‘Decomposing informational conflict’.

Task conflict, as noted above, arises when two task sets compete for resources. In the Stroop task, the task set for color identification is endogenously and purposively activated, and the task set for word reading is exogenously activated on presentation of the word. The simultaneous activation of two task sets creates conflict even before the identities of the Stroop dimensions have been processed. Therefore, this form of conflict is generated by all irrelevant words in the Stroop task including congruent and neutral words (Monsell et al., 2001). We discuss task conflict in the section ‘Task conflict’. We then discuss the often overlooked phenomenon of Stroop facilitation in the section entitled ‘Informational facilitation’. In the section entitled ‘Other evidence relevant to the issue of locus vs. loci of the Stroop effect’ we consider the influence of response mode (vocal, manual, oculomotor) on the variety of conflicts and facilitation observed in the subsection ‘Response modes and the loci of the Stroop effect’ and we consider whether conflict and facilitation effects are resolved even once a response has been favored in the subsection ‘Beyond response selection: Stroop effects on response execution’. In the final section entitled ‘Locus or loci of selection?’, we use the outcome of these deliberations to discuss the second issue of whether the evidence supports attentional selection at a single or at multiple loci.

Decomposing informational conflict

A seminal paper by George S. Klein in 1964 (Klein, 1964) represents a critical impetus for understanding different types of informational conflict. Indeed, up until Klein, all studies had utilized incongruent color-word stimuli as the irrelevant dimension. Klein was the first to manipulate the relatedness of the irrelevant word to the relevant color responses to determine the “evocative strength of the printed word” (1964, p. 577). To this end, he compared color-naming times of lists of nonsense syllables, low-frequency non-color-related words, high-frequency non-color words, words with color-related meanings (semantic associates: e.g., lemon, frog, sky), color words that were not in the set of possible response colors (non-response set stimuli), and color words that were in the set of possible response colors (response set stimuli). The response times increased linearly in the order they are presented above. Whilst lists of nonsense syllables vs. low-frequency words, high-frequency words vs. semantic-associative stimuli, and semantic-associative stimuli vs. non-response set stimuli did not differ, all other comparisons were significant.

It is important to underscore that for Klein himself, there was no competition between semantic nodes or at any stage of processing, and, thus, no need for attentional selection other than at the response stage. Only when both irrelevant word and relevant color are processed to the point of providing evidence towards different motor responses, do the two sources of information compete. Said differently, whilst he questioned the effect of semantic relatedness, Klein assumed that semantic relatedness would only affect the strength of activation of alternative motor responses. Highlighting his favoring of a single late locus for attentional selection, Klein noted that words that are semantically distant from the color name would be less likely to “arouse the associated motor-response in competitive intensity” (p. 577). Although others (e.g., early selection accounts mentioned above) have argued for competition and selection occurring earlier than response output, a historically favored view of the Stroop interference effect as resulting solely from response conflict has prevailed (MacLeod, 1991) such that so-called informational conflict (MacLeod & MacDonald, 2000) is viewed as being essentially solely response conflict. That is, the color and word dimensions are processed sufficiently to produce evidence towards different responses and, before the word dimension is incorrectly selected, mechanisms of selective attention at response output have to either inhibit the incorrect response or bias the correct response.

Response and semantic level processing

To assess the extent to which we can (or cannot) move forward from this latter view, we describe and critically evaluate methods used to dissociate and measure the potentially independent contributions of response and semantic conflict. We start by considering so-called same-response trials before going on to consider semantic-associative trials, non-response set trials and a method that has used semantic distance on the electromagnetic spectrum as a way to determine the involvement of semantic conflict in the color-word Stroop task. Indeed, this is an important first step for determining whether at this point informational conflict can (or cannot) be reliably decomposed.

Same-response trials

Same-response trials utilize a two-to-one color-response mapping and have become the most popular way of distinguishing semantic and response conflict in recent studies (e.g., Chen et al., 2011; Chen, Lei, Ding, Li, & Chen, 2013a; Chen, Tang & Chen, 2013b; Jiang et al., 2015; van Veen & Carter, 2005). First introduced by De Houwer (2003), this method maps two color responses to the same response button (see Fig. 1), which allows for a distinction between stimulus–stimulus (lexico-semantic) and stimulus–response (response) conflict.

By mapping two response options onto the same response key (e.g., both ‘blue’ and ‘yellow’ are assigned to the ‘z’ key), certain stimulus combinations (e.g., when blue is printed in yellow) are purported to not involve competition at the level of response selection; thus, any interference during same-response trials is thought to involve only semantic conflict. Any additional interference on different-response incongruent trials (e.g., when red is printed in yellow and where ‘red’ and ‘yellow’ are assigned to different response keys) is taken as an index of response conflict. Performance on congruent trials (sometimes referred to as identity trials when used in the context of the two-to-one color-response mapping paradigm, hereafter the 2:1 paradigm) is compared to performance on same-response incongruent trials to reveal interference that can be attributed to only semantic conflict, whereas a different-response incongruent vs same-response incongruent trial comparison is taken as an index of response conflict. Thus, the main advantage of using same-response incongruent trials as an index of semantic conflict is that this approach claims to be able to remove all of the influence of response competition (De Houwer, 2003 ). Notably, according to some models of Stroop task performance, same-response incongruent trials should not produce interference because they do not involve response conflict (Cohen, Dunbar & McClelland, 1990 ; Roelofs, 2003 ).
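The subtraction logic of the 2:1 paradigm can be made concrete with a minimal sketch. The function name, the dictionary keys and the RT values below are illustrative assumptions and are not taken from any of the cited studies; the sketch only shows which condition means enter each difference score.

```python
from statistics import mean

def two_to_one_indices(rts):
    """Return the two difference scores of the 2:1 paradigm (in ms).

    rts maps a trial type to a list of correct-trial reaction times.
    The key names are hypothetical labels chosen for this sketch.
    """
    congruent = mean(rts["congruent"])
    same = mean(rts["same_response"])
    different = mean(rts["different_response"])
    return {
        # same-response incongruent minus congruent:
        # conventionally read as semantic conflict
        "semantic_conflict": same - congruent,
        # different-response incongruent minus same-response incongruent:
        # conventionally read as response conflict
        "response_conflict": different - same,
    }

# Made-up RTs for illustration only; not data from any study.
example_rts = {
    "congruent": [600, 620, 610],
    "same_response": [640, 650, 660],
    "different_response": [690, 700, 710],
}
indices = two_to_one_indices(example_rts)
```

With these made-up values the two scores are 40 ms and 50 ms. Whether such scores actually isolate semantic and response conflict depends on the choice of baseline.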

Despite providing a seemingly convenient measure of semantic and response conflict, the studies that have employed the 2:1 paradigm share one major issue: that of an inappropriate baseline (see MacLeod, 1992 ). Same-response incongruent trials have consistently been compared to congruent trials to index semantic conflict. However, congruent trials also involve facilitation (both response and semantic facilitation; see below for more discussion of this) and thus, the difference between these two trial types could simply be facilitation and not semantic interference, a possibility De Houwer ( 2003 ) alluded to in his original paper (see also Schmidt et al., 2018 ). And whilst same-response trials plausibly involve semantic conflict, they are also likely to involve response facilitation because despite being semantically incongruent, the two dimensions of this type of Stroop stimulus provide evidence towards the same response. This means that both same-response and congruent trials involve response facilitation. Therefore, the difference between same-response and congruent trials would actually be semantic conflict (experienced on same-response trials) + semantic facilitation (experienced on congruent trials), not just semantic conflict. This also has ramifications for the difference between different-response and same-response trials since the involvement of response facilitation on same-response trials means that the comparison of these two trial types would actually be response conflict plus response facilitation, not just response conflict.
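The baseline argument can be written out as a short worked decomposition. The symbols are introduced here for illustration only, and the additivity of the components is a simplifying assumption rather than a claim made in the literature: B is a baseline RT, S_c and S_f are semantic conflict and facilitation, and R_c and R_f are response conflict and facilitation.

```latex
\begin{align*}
RT_{\text{congruent}} &= B - S_f - R_f \\
RT_{\text{same}}      &= B + S_c - R_f \\
RT_{\text{different}} &= B + S_c + R_c \\[4pt]
RT_{\text{same}} - RT_{\text{congruent}} &= S_c + S_f \\
RT_{\text{different}} - RT_{\text{same}} &= R_c + R_f
\end{align*}
```

Under these assumptions neither difference score isolates a single component, which is the point made in the preceding paragraph.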

Hasshim and Parris ( 2014 ) explored this possibility by comparing same-response incongruent trials to non-color-word neutral trials. They reasoned that this comparison could reveal faster RTs to same-response incongruent trials, thereby providing evidence for response facilitation on same-response trials. Alternatively, it could reveal faster RTs to non-color-word neutral trials, which would provide evidence for semantic interference (and would indicate that whatever response facilitation is present is hidden by an opposing and greater amount of semantic conflict). Hasshim and Parris reported no statistical difference between the RTs of the two trial types and reported Bayes Factors indicating evidence in favor of the null hypothesis of no difference. This would suggest that, when using reaction time as the index of performance, same-response incongruent trials cannot be employed as a measure of semantic conflict since they are not different from non-color-word neutral trials. In a later study, the same researchers investigated whether the two-to-one color-response mapping paradigm could still be used to reveal semantic conflict when using a more sensitive measure of performance than RT (Hasshim & Parris, 2015 ). They attempted to provide evidence for semantic conflict using an oculomotor Stroop task and an early, pre-response pupillometric measure of effort, which had previously been shown to provide a reliable alternative measure of the potential differences between conditions (Hodgson et al., 2009 ). However, in line with their previous findings, they reported Bayes Factors indicating evidence for no statistical difference between the same-response incongruent trials and non-color-word neutral trials.
These findings, therefore, suggest that the difference between same-response incongruent trials and congruent trials indexes facilitation on congruent trials, and that the former trials are not therefore a reliable measure of semantic conflict when reaction times or pupillometry are used as the dependent variable. Notably, Hershman and Henik ( 2020 ) included neutral trials in their study of the 2:1 paradigm, but did not report statistics comparing same-response and neutral trials (although they did report differences between same-response and congruent trials where the latter had similar RTs to their neutral trials). It is clear from their Fig. 1, however, that pupil sizes for neutral and same-response trials do begin to diverge at around the time the button press response was made. This divergence gets much larger ~ 500 ms post-response, indicating that a difference between the two trial types is detectable using pupillometry. Importantly, however, Hershman and Henik employed repeated letter strings as their neutral condition, which does not involve task conflict (see the section on task conflict below for more details). This means that any differences between their neutral trial and the same-response trial could be entirely due to task and not semantic conflict.
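To illustrate how a Bayes factor can quantify evidence for the null hypothesis of no difference between same-response and neutral trials, the following sketch uses the BIC approximation for a paired comparison. This is an assumption made for illustration: the text above does not specify how the reported Bayes Factors were computed, and the participant means below are made up.

```python
import math
from statistics import mean, stdev

def bf01_paired(x, y):
    """BIC approximation to the Bayes factor for H0 (no difference) over H1.

    x and y are per-participant mean RTs in two conditions.
    Values above 1 favor the null; values below 1 favor a difference.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # paired-samples t statistic
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    # BF01 is approximated by sqrt(n) * (1 + t^2 / (n - 1)) ** (-n / 2)
    return math.sqrt(n) * (1 + t * t / (n - 1)) ** (-n / 2)

# Made-up participant means (ms), for illustration only.
same_response = [652, 640, 671, 633, 660, 645, 658, 649]
neutral = [650, 643, 668, 636, 661, 642, 660, 648]
different_response = [680, 671, 701, 665, 692, 669, 692, 678]

bf_null_like = bf01_paired(same_response, neutral)
bf_effect_like = bf01_paired(different_response, neutral)
```

In this toy data set the first comparison yields a Bayes factor above 1 (favoring no difference) and the second a value far below 1. Default Bayesian t-tests with explicit priors would give different numerical values.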

However, despite Hasshim and Parris consistently reporting no difference between same-response and non-color-word neutral trials, in an unpublished study, Lakhzoum ( 2017 ) has reported a significant difference between non-color-word neutral trials and same-response trials. Lakhzoum’s study contained no special modifications to induce a difference between these two trial types, and had roughly similar trial and participant numbers and a similar experimental set-up to Hasshim and Parris. Yet Lakhzoum observed the effect that Hasshim and Parris have consistently failed to observe. The one clear difference between Lakhzoum ( 2017 ) and Hasshim and Parris ( 2014 , 2015 ), however, was that Lakhzoum used French participants and presented the stimuli in French, whereas Hasshim and Parris conducted their studies in English. A question for further research then is whether and to what extent language, including issues such as orthographic depth of the written script of that language, might modify the utility of same-response trials as an index of semantic conflict.

Indeed, even though the 2:1 paradigm is prone to limitations, more research is needed to assess its utility for distinguishing response and semantic conflict. Notably, in both their studies Hasshim and Parris used colored patches as the response targets (at least initially; Hasshim & Parris, 2015 , replaced the colored patches with white patches after practice trials), which could have reduced the magnitude of the Stroop effect (Sugg & McDonald, 1994 ). Same-response trials cannot, for obvious reasons, be used with the commonly used vocal response as a means to increase Stroop effects (see Response Modes and varieties of conflict section below), but future studies could use written word labels, a manipulation that has also been shown to increase Stroop effects (Sugg & McDonald, 1994 ), and thus might reveal a difference between same-response incongruent and non-color-word neutral conditions. At the very least, future studies employing same-response incongruent trials should also employ a neutral non-color-word baseline (as opposed to color patches used by Shichel & Tzelgov, 2018 ) to properly index semantic conflict and should avoid the confounding issues associated with congruent trials (see also the section on Informational Facilitation below).

As noted above, same-response incongruent trials are also likely to involve response facilitation since both dimensions (word and color) provide evidence toward the same response. Since congruent trials and same-response incongruent trials both involve response facilitation, the difference between the two conditions likely represents semantic facilitation, not semantic conflict. As a consequence, indexing response conflict via the difference between different-response and same-response trials is also problematic. Until further work is done to clarify these issues, work applying the 2:1 color-response paradigm to understand the neural substrates of semantic and response conflicts (e.g., Van Veen & Carter, 2005 ) or wider issues such as anxiety (Berggren & Derakshan, 2014 ) remains difficult to interpret.

Non-response set trials

Non-response set trials are trials on which the irrelevant color word used is not part of the response set (e.g., the word ‘orange’ in blue, where orange is not a possible response option and blue is; originally introduced by Klein, 1964 ). Since the non-response set color word will activate color-processing systems, interference on such trials has been interpreted as evidence for conflict occurring at the semantic level. These trials should in theory remove the influence of response conflict because the irrelevant color word is not a possible response option and thus, conflict at the response level is not present. The difference in performance between the non-response set trials and a non-color-word neutral baseline condition (e.g., the word ‘table’ in red) is taken as evidence of interference caused by the semantic processing of the irrelevant color word (i.e., semantic conflict). In contrast, response conflict can be isolated by comparing the difference between the performance on incongruent trials and the non-response set trials. This index of response conflict has been referred to as the response set effect (Hasshim & Parris, 2018 ; Lamers et al., 2010 ) or the response set membership effect (Sharma & McKenna, 1998 ) and describes the interference that is a result of the irrelevant word denoting a color that is also a possible response option. The aim of non-response set trials is to provide a condition where the irrelevant word is semantically incongruent with the relevant color such that the resultant semantic conflict is the only form of conflict present.
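The two difference scores described above can be sketched as follows. The function name, the condition labels and the RT values are hypothetical and serve only to show which condition means are subtracted from which.

```python
from statistics import mean

def response_set_indices(rts):
    """Return the non-response set effect and the response set effect (ms).

    rts maps a trial type to a list of correct-trial reaction times.
    The key names are hypothetical labels chosen for this sketch.
    """
    neutral = mean(rts["neutral"])                    # e.g., 'table' in red
    non_response_set = mean(rts["non_response_set"])  # e.g., 'orange' in blue
    incongruent = mean(rts["incongruent"])            # e.g., 'red' in blue
    return {
        # conventionally read as semantic conflict
        "non_response_set_effect": non_response_set - neutral,
        # conventionally read as response conflict
        "response_set_effect": incongruent - non_response_set,
    }

# Made-up RTs for illustration only; not data from any study.
effects = response_set_indices({
    "neutral": [600, 610, 620],
    "non_response_set": [630, 640, 650],
    "incongruent": [680, 690, 700],
})
```

With these made-up values the non-response set effect is 30 ms and the response set effect is 50 ms.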

It has been argued that the interference measured using non-response set trials, the non-response set effect, is an indirect measure of response conflict (Cohen et al., 1990 ; Roelofs, 2003 ) and is, thus, not a measure of semantic conflict. That is, the non-response set effect results from the semantic link between the non-response set words and the response set colors and indirect activation of the other response set colors leads to response competition with the target color. As far as we are aware there is no study that has provided or attempted to provide evidence that is inconsistent with this argument. Thus, for non-response set trials to have utility in distinguishing response and semantic conflict, future research will need to evidence the independence of these types of conflict in RTs and other dependent measures.

Semantic-associative trials

Another method that has been used to tease apart semantic and response conflict employs words that are semantically associated with colors (e.g., sky-blue, frog-green). In trials of this kind (e.g., sky printed in green), first introduced by Klein ( 1964 ), the irrelevant words are semantically related to each of the response colors. Recall that for Klein this was a way of investigating different magnitudes of response conflict (the indirect response conflict interpretation). Indeed, the notion of comparing RTs on color-associated incongruent trials to those on color-neutral trials to specifically isolate semantic conflict (i.e., the so-called “sky-put” design) was first suggested by Neely and Kahan ( 2001 ). It was later implemented empirically by Manwell, Roberts and Besner ( 2004 ) and has since been used in multiple studies investigating Stroop interference (e.g., Augustinova & Ferrand, 2014 ; Risko et al., 2006 ; Sharma & McKenna, 1998 ; White et al., 2016 ).

Interference observed when using semantic associates tends to be smaller than when using non-response set trials (Klein, 1964 ; Sharma & McKenna, 1998 ). This suggests that semantic associates may not capture semantic interference in its entirety (or alternatively that non-response set trials involve some response conflict). Sharma and McKenna ( 1998 ) postulated that this is because non-response set trials involve an additional level of semantic processing which, following Neumann ( 1980 ) and La Heij, Van der Heijden, and Schreuder ( 1985 ), they called semantic relevance (due to the fact that color words are also relevant in a task in which participants identify colors). It is, however, also the case that smaller interference observed with semantic associates compared to non-response set trials can be conceptualized simply as less semantic association with the response colors for non-color words (sky-blue) than for color words (red–blue).

As with non-response set trials, it is unclear whether semantic associates exclude the influence of response competition because they too can be modeled as indirect measures of response conflict (e.g., Roelofs, 2003 ). Since semantic-associative interference could be the result of the activation of the set of response colors to which they are associated (for instance when sky in red activates competing response set option blue), it does not allow for a clear distinction between semantic and response processes. In support of this possibility, Risko et al. ( 2006 ) reported that approximately half of the semantic-associative Stroop effect is due to response set membership and therefore response level conflict. The raw effect size of pure semantic-associative interference (after interference due to response set membership was removed) in their study was only between 6 ms (manual response, 112 participants) and 10 ms (vocal response, 30 participants).

When the same group investigated this issue with a different approach (i.e., ex-Gaussian analysis), their conclusions were quite different. White and colleagues ( 2016 ) found the semantic Stroop interference effect (difference between semantic-associative and color-neutral trials) in the mean of the normal distribution (mu) and in the standard deviation of the normal distribution (sigma), but not the tail of the RT distribution (tau). This finding was different from past studies that found standard Stroop interference in all three parameters (see, e.g., Heathcote et al., 1991 ). Therefore, White and colleagues reasoned that the source of the semantic (as opposed to standard) Stroop effect is different such that the interference associated with response competition on standard color-incongruent trials (that is to be seen in tau) is absent in incongruent semantic associates. However, White et al. only investigated semantic conflict. A more recent study that considered both response and semantic conflict in the same experiment found they influence similar portions of the RT distribution (Hasshim, Downes, Bate, & Parris, 2019 ), suggesting that ex-Gaussian analysis cannot be used to distinguish the two types of conflict.
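For readers unfamiliar with the three ex-Gaussian parameters, the sketch below recovers mu, sigma and tau from simulated RTs with a simple method-of-moments estimator. This is an illustrative assumption: published analyses typically fit the distribution by other means (e.g., maximum likelihood), and the simulated parameter values are arbitrary.

```python
import random
from statistics import fmean

def ex_gaussian_moments(rts):
    """Method-of-moments estimates of (mu, sigma, tau) for a list of RTs.

    Uses the ex-Gaussian identities: mean = mu + tau,
    variance = sigma^2 + tau^2, third central moment = 2 * tau^3.
    """
    m = fmean(rts)
    m2 = fmean([(x - m) ** 2 for x in rts])  # variance (population form)
    m3 = fmean([(x - m) ** 3 for x in rts])  # third central moment
    tau = (max(m3, 0.0) / 2.0) ** (1.0 / 3.0)
    sigma = max(m2 - tau ** 2, 0.0) ** 0.5
    mu = m - tau
    return mu, sigma, tau

# Simulate RTs as a normal component (mu = 500, sigma = 50) plus an
# exponential tail (tau = 100); the values are arbitrary, chosen for the sketch.
rng = random.Random(1)
simulated = [rng.gauss(500, 50) + rng.expovariate(1 / 100) for _ in range(50000)]
mu_hat, sigma_hat, tau_hat = ex_gaussian_moments(simulated)
```

An effect confined to mu and sigma shifts or widens the bulk of the distribution, whereas an effect in tau lengthens its slow tail. Comparing the fitted parameters across conditions is the logic of the analyses described above.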

Interestingly, Schmidt and Cheesman ( 2005 ) explored whether semantic-associative trials involve response conflict by employing the 2:1 paradigm depicted above. With the standard Stroop stimuli, they reported the common differences between same- and different-response incongruent trials (that are thought to indicate response conflict) and between congruent and same-response incongruent trials (that are thought to indicate semantic conflict in the 2:1 paradigm). However, with semantic-associative stimuli they only observed an effect of semantic conflict, a finding that differs from that of Risko et al. ( 2006 ), whose results indicate an effect of response conflict with semantic-associative stimuli. But, as already noted, the issues associated with employing just congruent trials as a baseline in the 2:1 paradigm and the potential response facilitation on same-response trials lessen the interpretability of this result.

Complicating matters further still, Lorentz et al. ( 2016 ) showed that the semantic-associative Stroop effect is not present in reaction time data when response contingency (a measure of how often an irrelevant word is paired with any particular color) is controlled by employing two separate contingency-matched non-color-word neutral conditions (but see Selimbegovic, Juneau, Ferrand, Spatola & Augustinova, 2019 ). There was, however, evidence for Stroop facilitation with these stimuli and for interference effects in the error data. Nevertheless, studies utilizing semantic-associative stimuli that have not controlled for response contingency might not have accurately indexed semantic-associative interference. Future research should focus on assessing the magnitude of the semantic-associative Stroop interference effect after the influences of response set membership and response contingency have been controlled.
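Response contingency, as defined above, can be computed directly from a trial list. The sketch below uses a hypothetical design with made-up words and trial counts; it is not the design of Lorentz et al. ( 2016 ) and only illustrates how a word that appears in few colors becomes more predictive of a color than a word that appears in many.

```python
from collections import Counter

def word_color_contingency(trials):
    """Return P(color | word) for every word-color pair in a trial list.

    trials is a list of (word, ink_color) tuples.
    """
    pair_counts = Counter(trials)
    word_counts = Counter(word for word, _ in trials)
    return {
        (word, color): count / word_counts[word]
        for (word, color), count in pair_counts.items()
    }

# Hypothetical trial list: 'sky' appears equally often in two colors,
# whereas the neutral word 'table' always appears in red.
trials = [("sky", "red")] * 3 + [("sky", "green")] * 3 + [("table", "red")] * 6
contingency = word_color_contingency(trials)
```

Here 'table' predicts red with probability 1.0 and 'sky' predicts red with probability 0.5, so a comparison between these two conditions would confound semantic association with contingency.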

Levin and Tzelgov ( 2016 ) also reported that they failed to observe the semantic-associative Stroop effect across multiple experiments using a vocal response (in both Hebrew and Russian). Only when the semantic associations were primed via a training protocol were semantic-associative Stroop effects observed, although they were not able to consistently report evidence for the null hypothesis of no difference. They subsequently argued that the semantic-associative Stroop effect is probably present but is a small and “unstable” contributor to Stroop interference. This is a somewhat surprising conclusion given the small but consistent effects reported by others with a vocal response (Klein, 1964 ; Risko et al., 2006 ; Scheibe et al., 1967 ; White et al., 2016 ; see Augustinova & Ferrand, 2014 , for a review). However, it seems reasonable to conclude that the semantic-associative Stroop effect is not easily observed, especially with a manual response (e.g., Sharma & McKenna, 1998 ).

Finally, any observed semantic-associative interference could be interpreted as being an indirect measure of response competition (even after factors such as response set membership and response contingency are controlled). Indeed, the colors associated with the semantic-associative stimuli are also linked to the response set colors (Cohen et al., 1990 ; Roelofs, 2003 ) and thus, semantic associates do not generate an unambiguous measure of semantic conflict, at least when only RTs are used. Thus, it seems essential for future research to investigate this issue with additional, and perhaps more refined indicators of response processing such as EMGs.

Semantics as distance on the electromagnetic spectrum

Klopfer ( 1996 ) demonstrated that RTs were slower when both dimensions of the Stroop stimulus were closely related on the electromagnetic spectrum. The electromagnetic spectrum is the range of frequencies of electromagnetic radiation and their wavelengths, including those for visible light. The visible light portion of the spectrum runs from red, with the longest wavelengths, to violet, with the shortest, with orange, yellow, green and blue (amongst others) in between. The Stroop effect has been reported to be larger when the color and word dimensions of the Stroop stimulus are close on the spectrum (e.g., blue in green) compared to when the colors were distantly related (e.g., blue in red; see also Laeng et al., 2005 , for an effect of color opponency on Stroop interference). In other words, Stroop interference is greater when the semantic distance between the color denoted by the word and the target color in “color space” is smaller, making it seemingly difficult to argue that semantic conflict does not contribute to Stroop interference. However, Kinoshita, Mills, and Norris ( 2018 ) recently failed to replicate this electromagnetic spectrum effect, indicating that more research is needed to assess whether this is a robust effect. Even if replicated, however, this manipulation cannot escape the interpretation of semantic conflict as being the indirect indexing of response conflict. Therefore, any such replications also call for additional indicators of response processing, or the lack thereof.
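The notion of distance on the spectrum can be illustrated with a minimal sketch. The wavelength values are rough nominal figures assumed for illustration; they are not the values used by Klopfer ( 1996 ), and the function name is hypothetical.

```python
# Approximate nominal wavelengths in nanometres (illustrative assumptions).
NOMINAL_WAVELENGTH_NM = {
    "red": 700, "orange": 620, "yellow": 580,
    "green": 530, "blue": 470, "violet": 400,
}

def spectral_distance(word_color, ink_color):
    """Absolute wavelength difference between the word's color and the ink."""
    return abs(NOMINAL_WAVELENGTH_NM[word_color] - NOMINAL_WAVELENGTH_NM[ink_color])

near = spectral_distance("blue", "green")  # the word 'blue' shown in green
far = spectral_distance("blue", "red")     # the word 'blue' shown in red
```

On these nominal values the blue-green pair is much closer than the blue-red pair, which is the contrast on which the reported effect rests.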

Can we distinguish the contribution of response and semantic processing?

Perhaps due to the past competition between early- and late-selection single-stage accounts of Stroop interference (Logan & Zbrodoff, 1998 ; MacLeod, 1991 ), response and semantic conflict have historically been the most studied and, therefore, the most compared types of conflict. For instance, there is a multitude of studies indicating that semantic conflict is often preserved when response conflict is reduced by experimental manipulations including hypnosis-like suggestion (Augustinova & Ferrand, 2012 ), priming (Augustinova & Ferrand, 2014 ), Response–Stimulus Interval (Augustinova et al., 2018a ), viewing position (Ferrand & Augustinova, 2014a ) and single letter coloring (Augustinova & Ferrand, 2007 ; Augustinova et al., 2010 , 2015 , 2018a , 2018b ). This dissociative pattern (i.e., significant semantic conflict while response conflict is reduced or even eliminated) is often viewed as indicating two qualitatively distinct types of conflict, suggesting that these manipulations result in response conflict being prevented. However, these studies have commonly employed semantic-associative conflict which could be indirectly measuring response conflict and it could, therefore, be argued that it is not the type of conflict but simply residual response conflict that remains (Cohen et al., 1990 ; Roelofs, 2003 ). Therefore, it still remains plausible that the dissociative pattern simply indicates quantitative differences in response conflict.

As we have discussed in this section, interference generated by both non-response set trials and trials that manipulate proximity on the electromagnetic spectrum is prone to the same limitations. The 2:1 paradigm could in principle remove response conflict from the conflict equation, but the issues surrounding this manipulation need to be further researched before we can be confident of its utility. Therefore, at this point, it seems reasonable to conclude that published research conducted so far with additional color-incongruent trial types (same-response, non-response set, or semantic-associative trials) does not permit the unambiguous conclusion that the informational conflict generated by standard color-incongruent trials (word ‘red’ presented in blue) can be decomposed into semantic and response conflicts. More than ever then, cumulative evidence from more time- and process-sensitive measures is required.

Other types of informational conflict: considering the role of phonological processing and word frequency

Whilst participants are asked to ignore the irrelevant word in the color-word Stroop task, it is clear that their attempts to do so are not successful. If word processing proceeds in an obligatory fashion such that before accessing the semantic representation of the irrelevant word, the letters, orthography, and phonology are also processed, interference could happen at these levels of processing. But, as anticipated by Klein ( 1964 ), just because the word is processed at these levels does not mean that each leads to level-specific conflict. To determine whether or not these different levels of processing also independently contribute to Stroop interference, various trial types and manipulations have been employed that have attempted to dissociate pre-semantic levels of processing. The most notable methods are: (1) phonological overlap between the irrelevant word and color name; (2) the use of pseudowords; and (3) manipulation of word frequency. This section attempts to identify whether pre-semantic processing of the irrelevant word reliably leads to conflict (or facilitation) at levels other than response output.

Phonological overlap between word and color name

A study by Dalrymple-Alford ( 1972 ) presented evidence for solely phonological interference in the Stroop task. Dalrymple-Alford manipulated the phonemic overlap between the irrelevant word and color name. For example, if the color to be named was red, the to-be-ignored word would be rat (sharing initial phoneme) or pod (sharing the end phoneme) or a word that shares no phoneme at all (e.g., fit ). Dalrymple-Alford reported evidence for greater interference at the initial letter than at the end letter position (similar effects were observed for facilitation). Using a more carefully designed set of stimuli (originally created by Coltheart et al., 1999 , who focused on just facilitation), Marmurek et al. ( 2006 ) also showed greater interference and facilitation at the initial letter position than the end letter position; although, in their study effects at the end letter position did not reach significance. This paradigm represents a direct measure of phonological processing that, importantly, does not have a semantic component (other than the weak conflict that would result from the activation of two semantic representations with unrelated meanings). However, in line with the interpretation by Coltheart et al. ( 1999 ), Marmurek and colleagues argued it was evidence for phonological processing of the irrelevant word that either facilitates or interferes with the production of the color name at the response output stage (see also Parris et al., 2019a , 2019b , 2019c ; Regan, 1978; Singer et al., 1975 ). Thus, whilst the word is processed phonologically, the only phonological representation with which the resulting representation could compete is that created during the phonological encoding of the color name, which would only be produced at later response processing levels. In sum, it is not possible to conclude in favor of qualitatively different conflict (or facilitation) other than that at the response level using this approach.
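The stimulus manipulation described above can be sketched as a simple classification of distractor words by where they overlap with the color name. As a simplifying assumption, letters are used here as a stand-in for phonemes, which holds for the examples in the text but not for English spelling in general; the function name is hypothetical.

```python
def overlap_position(word, color_name):
    """Classify a distractor by its overlap with the color name.

    Letters are used as a proxy for phonemes (an assumption of this sketch).
    Returns 'initial', 'end' or 'none'.
    """
    word, color_name = word.lower(), color_name.lower()
    if word[0] == color_name[0]:
        return "initial"
    if word[-1] == color_name[-1]:
        return "end"
    return "none"

# The examples given in the text for the color name 'red'.
labels = {w: overlap_position(w, "red") for w in ("rat", "pod", "fit")}
```

A proper implementation would compare phonemic transcriptions rather than spellings.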

Pseudowords

A pseudoword is a non-word that is pronounceable (e.g., veglid ). In fact, some real words are so rare (e.g., helot , eft ) that to most they are equivalent to pseudowords. As noted above, Klein ( 1964 ) used rare words in the Stroop task and showed that they interfered less than higher-frequency words but more than consonant strings (e.g., GTBND ). Both Burt’s ( 2002 ) and Monsell et al.’s ( 2001 ) studies later supported the finding that pseudowords result in more interference than consonant strings. In recent work, Kinoshita et al. ( 2017 ) asked which aspects of the reading process are triggered by the irrelevant word stimulus to produce interference in the color-word Stroop task. They compared performance on five types of color-neutral letter strings to incongruent words. They included real words (e.g., hat ), pronounceable non-words (or pseudowords; e.g., hix ), consonant strings (e.g., hdk ), non-alphabetic symbol strings (e.g., &@£ ), and a row of Xs. They reported that there was a word-likeness or pronounceability gradient with real words and pseudowords showing an equal amount of interference (with interference increasing with string length) and more than that produced by the consonant strings. Consonant strings produced more interference than the symbol strings and the row of Xs, which did not differ from each other. The absence of the lexicality effect (defined by color-neutral real words producing more interference than pseudowords) was explained by Kinoshita and colleagues as being a consequence of the pre-lexically generated phonology from the pronounceable irrelevant words interfering with the speech production processes involved in naming the color. Under this account, the process of phonological encoding (the segment-to-frame association processes in articulation planning) of the color name must be slowed by the computation of phonology that occurs independent of lexical status (because it happens with pronounceable pseudowords).
Notably, the authors reported evidence for pre-lexically generated phonology when participants responded vocally (by saying aloud the color name), but not when participants responded manually (by pressing a key that corresponds to the target color) suggesting the effects were the result of the need to articulate the color name.

Some pseudowords can sound like color words (e.g., bloo), and are known as pseudohomophones. Besner and Stolz ( 1998 ) employed pseudohomophones as the irrelevant dimension, and found substantial Stroop effects when compared to a neutral baseline (see also Lorentz et al., 2016 ; Monahan, 2001 ), suggesting that there is phonological conflict in the Stroop task. However, pseudohomophones do not involve only phonological conflict since they contain substantial orthographic overlap with their base words (e.g., bloo , yeloe , grene , wred ) and will likely activate the semantic representations of the colors indicated by the word via their shared phonology. In short, interference produced by pseudohomophones could result from phonological, orthographic, or semantic processing but, importantly, it could also simply result from response conflict (see also Tzelgov et al., 1996 , for work on cross-script homophones, which shows phonologically mediated semantic/response conflict, but not phonological conflict).

Taken together, this work shows a clear effect of phonological processing of the irrelevant word on Stroop task performance; and one that likely results from the pre-lexical phonological processing of the irrelevant word. Again, however, it is unclear whether the resulting competition arises at the pre-lexical level (suggesting the color name’s pre-lexical phonological representation is unnecessarily activated) or whether phonological processing of the irrelevant word leads to phonological encoding of that word that then interferes with the phonological encoding of the relevant color name. The latter seems more likely than the former.

High- vs. low-frequency words

In support of the notion that non-semantic lexical factors contribute to Stroop effects, studies have shown an effect of the word frequency of non-color-related words on Stroop interference. Word frequency refers to the likelihood of encountering that word in reading and conversation. It is a factor that has long been known to contribute to word reading latency, and given that color words tend to be high-frequency words, it is possible word frequency contributes to Stroop effects. Whilst the locus of word frequency effects in word reading is unclear, it is known that it takes longer to access lexico-semantic (phonological/semantic) representations of low-frequency words (Gerhand & Barry, 1998 , 1999 ; Monsell et al., 1989 ).

According to influential models of the Stroop task, the magnitude of Stroop interference is determined by the strength of the connection between the irrelevant word and the response output level (Cohen et al., 1990 ; Kalanthroff et al., 2018 ; Zhang et al., 1999 ). Since high-frequency words are by definition encountered more often, their strength of connection to the response output level would be higher than that for low-frequency words. This leads to the prediction that color-naming times should be longer when the distractor word is of a higher frequency. Evidence in support of this has been reported by Klein ( 1964 ), Fox et al. ( 1971 ) and Scheibe et al. ( 1967 ). However, Monsell et al. ( 2001 ) pointed out methodological issues in these older studies that could have confounded the results. First, these previous studies employed the card presentation version of the Stroop task in which the items from each stimulus condition (e.g., all the high-frequency words) are placed on different cards and the time taken to respond to all the items on one card is recorded. This method, it was argued, could result in the adoption of different response criteria for the different cards and permits previews of the next stimulus, which could result in overlap of processing. Second, Monsell et al. noted that these studies employed a limited set of 4–5 stimuli in each condition which were repeated numerous times on each card, potentially leading to practice effects that could nullify any effects of word frequency. After addressing these issues, Monsell et al. ( 2001 ) reported no effects of word frequency on color-naming times, although there was a non-significant tendency for low-frequency words to result in more interference than high-frequency words.
With the same methodological control as Monsell et al., but with a greater difference in frequency between the high and low conditions, Burt ( 1994 , 1999 , 2002 ) has repeatedly reported that low-frequency words produce significantly more interference than high-frequency words (findings recently replicated by Navarrete et al., 2015 ). A recent study by Levin and Tzelgov ( 2016 ) also reported more interference to low-frequency words although their effects were not consistent across experiments, a finding that could be attributed to their use of a small set of words for each class of words.

The repeated finding of greater interference for low-frequency words is consistent with the notion that word frequency contributes to determining response times in the Stroop task, but is inconsistent with predictions from models of the class exemplified by Cohen et al. ( 1990 ). The finding of larger Stroop effects for lower-frequency words provides a potent challenge to the many models based on the Parallel Distributed Processing (PDP) connectionist framework (Cohen et al., 1990 ; Kalanthroff et al., 2018 ; Kornblum et al., 1990 ; Kornblum & Lee, 1995 ; Zhang & Kornblum, 1998 ; Zhang et al., 1999 ; see Monsell et al., 2001 for a full explanation of this). As noted, these models would argue, on the basis of a fundamental tenet of their architectures, that higher-frequency words should produce greater interference because they have stronger connection strengths with their word forms. Notably, whilst unsupported by later studies, the lack of an effect of word frequency in Monsell et al.’s data led them to the conclusion that there was another type of conflict involved in the Stroop task, called task conflict. It is to the topic of task conflict that we now turn.

Task conflict

The presence of task conflict in the Stroop task was first proposed in MacLeod and MacDonald’s ( 2000 ) review of brain imaging studies (see also Monsell et al., 2001 ; see Littman et al., 2019 , for a mini review). The authors proposed its existence because the anterior cingulate cortex (ACC) appeared to be more activated by incongruent and congruent stimuli when compared to repeated letter neutral stimuli such as xxxx (e.g., Bench et al., 1993 ). MacLeod and MacDonald suggested that increased ACC activation by congruent and incongruent stimuli reflects the signaling of the need for control recruitment in response to task conflict. Since task conflict is produced by the activation of the mental machinery used to read, interference at this level occurs with any stimulus that is found in the mental lexicon. Studies have used this logic to isolate task conflict from informational conflict (e.g., Entel & Tzelgov, 2018 ).

Congruent trials, proportion of repeated letter strings trials and negative facilitation

In contrast to color-incongruent trials that are thought to produce both task and informational conflicts, color-congruent trials are only thought to produce task conflict. Conflict of any type, by definition, increases response times and thus, congruent trial reaction times can be expected to be longer than those on trials that do not activate a task set for word reading. Repeated color patches, symbols or letters (e.g., ■■■, xxxx or ####) have, therefore, been introduced as a baseline for such a comparison. Indeed, these trials are not expected to generate task conflict as they do not activate an item in the mental lexicon. The difference between these non-linguistic baselines and congruent trials would therefore represent a measure of task conflict, and has been referred to as negative facilitation. However, a common finding in such experiments is that congruent trials still produce faster RTs than neutral non-word stimuli, that is, positive facilitation (Entel et al., 2015 ; see also Augustinova et al., 2019 ; Levin & Tzelgov, 2016 ; Shichel & Tzelgov, 2018 ), indicating that task conflict is not fully measured under such conditions. Goldfarb and Henik ( 2007 ) reasoned that faster responses on congruent trials than on a non-linguistic baseline arise when task conflict control is highly efficient, permitting the expression of positive facilitation.

To circumvent this issue, they attempted to reduce task conflict control by increasing the proportion of non-word neutral trials (repeated letter strings) to 75% (see also Kalanthroff et al., 2013 ). Increasing the proportion of non-word neutral trials would create the expectation for a low task conflict context and so task conflict monitoring would effectively be offline. In addition to increasing the proportion of non-word neutral trials, on half of the trials, the participants received cues that indicated whether the following stimulus would be a non-word or a color word, giving another indication as to whether the mechanisms that control task conflict should be activated. For non-cued trials, when presumably task conflict control was at its nadir, and therefore task conflict at its peak, RTs were slower for congruent trials than for non-word neutral trials, producing a negative facilitation effect. Goldfarb and Henik ( 2007 ) suggested that previous studies had not detected a negative facilitation effect because resolving task conflict for congruent stimuli does not take long, and thus, as mentioned above, the effects of positive facilitation had hidden those of negative facilitation. In sum, by reducing task control both globally (by increasing the proportion of neutral trials) and locally (by adding cues to half of the trials), Goldfarb and Henik were able to increase task conflict enough to demonstrate a negative facilitation effect; an effect that has been shown to be a robust and prime signature of task conflict (Goldfarb & Henik, 2006 , 2007 ; Kalanthroff et al., 2013 ).

Steinhauser and Hübner ( 2009 ) manipulated task conflict control by combining the Stroop task with a task-switching paradigm. In this paradigm participants switch between color naming and reading the irrelevant word (see Kalanthroff et al., 2013 , for a discussion on task switching and task conflict). Thus, the two task sets are active in this task context. This means that during color-naming Stroop trials, the word dimension of the stimulus will be more strongly associated with word processing than it otherwise would. This would have the effect of increasing the conflict between the task set for color naming and the task set of word reading. Steinhauser and Hübner ( 2009 ) found that under these experimental conditions, participants performed worse on congruent (and incongruent) trials than they did on the non-word neutral trials, evidencing negative facilitation, the key marker of task conflict. These results showing increasing task conflict when there is less control over the task set for word reading on color-naming trials reaffirmed Goldfarb and Henik’s ( 2007 ) findings that showed that reducing task control on color-naming trials leads to task conflict.

Whilst both of the above methods are useful in showing that task conflict can influence the magnitude of Stroop interference and facilitation, both manipulations result in magnifying task conflict (and likely other forms of conflict) to levels greater than is present when such targeted manipulations are not used.
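The RT contrasts described in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the condition labels and the RT values are hypothetical and are not taken from any of the cited studies. Facilitation is computed as neutral minus congruent, so a negative value corresponds to negative facilitation.

```python
from statistics import mean


def stroop_contrasts(rts):
    """Return interference and facilitation (ms) from per-condition RT lists.

    `rts` maps the hypothetical labels 'incongruent', 'congruent' and
    'neutral' (repeated letter strings) to lists of correct-trial RTs in ms.
    """
    m = {condition: mean(values) for condition, values in rts.items()}
    interference = m["incongruent"] - m["neutral"]
    # Positive value: congruent faster than baseline (positive facilitation).
    # Negative value: congruent slower than baseline (negative facilitation).
    facilitation = m["neutral"] - m["congruent"]
    return {
        "interference": interference,
        "facilitation": facilitation,
        "negative_facilitation": facilitation < 0,
    }


# Made-up RTs (ms) mimicking a context with a high proportion of neutral trials.
example = {
    "incongruent": [720, 740, 760],
    "congruent": [655, 665, 675],
    "neutral": [640, 650, 660],
}
result = stroop_contrasts(example)
print(result)
```

With these made-up values the congruent mean exceeds the neutral mean, so the facilitation score is negative, which is the pattern treated above as the marker of task conflict.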

Repeated letter strings without a task conflict control manipulation

As has been noted, task conflict appears to be present whenever the irrelevant stimulus has an entry in the lexical system. Consequently, studies have used the contrast in mean color-naming latencies between color-neutral words and repeated letter strings to index task conflict (Augustinova et al., 2018a ; Levin & Tzelgov, 2016 ). However, Augustinova et al. argued that both of these stimuli might include task conflict in different quantities. This is because the processing activated by a string of repeated letters (e.g., xxx) stops at the orthographic pre-lexical level, whereas the one activated by color-neutral words (e.g., dog) proceeds through to access to meaning (see also Augustinova et al., 2019 ; Ferrand et al., 2020 ), and as such the latter might more strongly activate the task set for word reading. Augustinova et al. ( 2019 ) reported task conflict (color-neutral—repeated letter strings) with vocal responses but not manual responses. Likewise, in a manual response study, Hershman et al. ( 2020 ) reported that repeated letter strings did not differ in terms of Stroop interference relative to symbol strings, consonant strings and color-neutral words. All were responded to more slowly than congruent trials, however, evidencing facilitation on congruent trials. Levin and Tzelgov ( 2016 ) compared vocal response color-naming times of repeated letter strings and shapes and found that repeated letter strings had longer color-naming times indicating some level of extra conflict with repeated letter strings, which they referred to as orthographic conflict, but which could also be expected to activate a task set for word reading. 
The implication of this work is that whilst repeated letter strings can be used as a baseline against which to measure task conflict relative to color-neutral words, they are likely to be useful mainly with vocal responses (Augustinova et al., 2019 ), and moreover can be expected to lead to some level of task conflict (Levin & Tzelgov, 2016 ).

For a purer measure of task conflict, when eschewing manipulations needed to produce negative facilitation, future research would do better to compare response times for color-neutral stimuli with those for shapes whilst employing a vocal response (Levin & Tzelgov, 2016 ; see Parris et al., 2019a , 2019b , 2019c , who reported no difference between color-neutral stimuli and unnamable/novel shapes with a manual response in an fMRI experiment). This does not mean, however, that task conflict is not measurable with manual responses in designs that eschew manipulations that produce negative facilitation: Continuing with their exploration of Stroop effects in pupillometric data, Hershman et al. ( 2020 ) reported that pupil size data revealed larger pupils to congruent stimuli than to repeated letter strings (and also symbol strings, consonant strings and non-color-related words); in other words, they reported negative facilitation.

Does task conflict precede informational conflict?

The studies discussed above also suggest that task conflict occurs earlier than informational conflict. Hershman and Henik ( 2019 ) recently provided evidence that supports this supposition. Using incongruent, congruent and a repeated letter string baseline, but without manipulating the task conflict context in a way that would produce negative facilitation, Hershman and Henik observed a large interference effect and small non-significant, positive facilitation. However, the authors also recorded pupil dilations during task performance and reported both interference and negative facilitation (pupils were smaller for the repeated letter string condition than for congruent stimuli). Importantly, the pupil data began to distinguish between the repeated letter string condition and the two word conditions (incongruent and congruent) up to 500 ms before there was divergence between the incongruent and congruent trials. In other words, task conflict appeared earlier than informational conflict in the pupil data.

If it is not firmly established that task conflict comes before informational conflict on a single trial, recent research has shown that it certainly seems to come first developmentally. By comparing performance in 1st, 3rd and 5th graders, Ferrand and colleagues ( 2020 ) showed that 1st graders experience smaller Stroop interference effects (even when controlling for processing speed differences) compared to 3rd and 5th graders. Importantly, whereas the Stroop interference effect in these older children is largely driven by the presence of response, semantic and task conflict, in the 1st graders (i.e., pre-readers) this interference effect was entirely due to task conflict. Indeed, these children produced slower color-naming latencies for all items using words as distractors compared to repeated letter strings, without being sensitive to color-(in)congruency and to the informational (phonological, semantic or response) conflict that it generates. The finding of task conflict’s developmental precedence is consistent with the idea that visual expertise for letters (as evidenced by the aforementioned N170 tuning for print) is known to be present even in pre‐readers (Maurer et al., 2005 ).

A model of task conflict

Kalanthroff et al. ( 2018 ) presented a model of Stroop task performance that is based on processing principles of Cohen and colleagues’ models (Botvinick et al., 2001 ; Cohen et al., 1990 ). What is unique about their model is the role proactive (intentional, sustained) control plays in modifying task conflict (see Braver, 2012 ). When proactive control is strong, bottom-up activation of word reading is weak, and top-down control resolves any remaining task competition rapidly. Conversely, when proactive control is weak, bottom-up information can activate task representations more readily leading to greater task conflict. According to their model, the presence of task conflict inhibits all response representations, effectively raising the response threshold and slowing responses. This raising of the response threshold would not happen for repeated letter string trials (e.g., xxxx) because the task unit for word reading would not be activated. Since responses for congruent trials would be slowed, negative facilitation results. To control task conflict when it arises, Kalanthroff et al. ( 2018 ) argued that due to the low level of proactive control, reactive control is triggered to resolve task conflict via the weak top-down input from the controlling module in the Anterior Cingulate Cortex. Thus, in contrast to Botvinick et al.’s ( 2001 ) model, reactive control is triggered by weak proactive control, not the detection of informational conflict. When proactive control is high, there is no task conflict, and the reactive control mechanism is not triggered, and the response convergence at the response level leads to response facilitation which can be fully expressed. Since task conflict control is not reliant on the presence of intra-trial informational conflict, and it is not resolved at the response output level, it is resolved by an independent control mechanism. Thus, the Kalanthroff et al. model predicts the independent resolution of response and task conflict.

In sum, task conflict has been shown to be an important contributor to both Stroop interference and Stroop facilitation effects. Task conflict can result in the reduction of the Stroop facilitation effect, increased Stroop interference, and in its more extreme form, it can produce negative facilitation (RTs to congruent trials are longer than those to a non-word neutral baseline). A concomitant decrease in Stroop facilitation and increase in Stroop interference (or vice versa) is another potential marker of task conflict (Parris, 2014 ), although since a reduced Stroop facilitation and an increased Stroop interference can be produced by other mechanisms (i.e., decreased word reading/increased attention to the color dimension and increased response conflict, respectively), at this point, negative facilitation is clearly the best marker of task conflict (in RT or pupil data; Hershman & Henik, 2019 ). Kalanthroff et al. ( 2018 ) have argued that task conflict is a result of low levels of proactive control. However, more work is perhaps needed to identify what triggers activation of the task set for word reading and how types of informational conflict might interact with task conflict. Levin and Tzelgov ( 2016 ) describe informational conflict as being an “episodic amplification of task interference” (p. 3), where task conflict is a marker of the automaticity of reading and informational conflict the effect of dimensional overlap between stimuli and responses. With recent evidence suggesting readability is a key factor in producing task conflict (Hershman et al., 2020 ), task conflict is possibly closely related to the ease with which a string of letters is phonologically encoded, its pronounceability (Kinoshita et al., 2017 ), suggesting a link between task and phonological conflict. Indeed, Levin and Tzelgov ( 2016 ) associated the orthographic and lexical components of word reading with task conflict.
However, it is unclear how phonological processing is categorized in their framework and importantly how facilitation effects are accounted for under such a taxonomy.

Informational facilitation

As already mentioned, Dalrymple-Alford and Budayr ( 1966 , Exp. 2) were the first to report a facilitation effect of the irrelevant word on color naming (see also Dalrymple-Alford, 1972 for coining the term). Since then, the Stroop facilitation effect has become an oft-present effect in Stroop task performance and is usually measured by the difference in color-naming performance on non-color-word trials and color-congruent trials. However, the use of congruent trials is, more than any other trial type, fraught with confounding issues. As amply developed in the previous section, when task conflict is high, congruent word trial RTs can actually be longer than non-color-word trial RTs eliminating the expression of positive facilitation in the RT data and even producing negative facilitation (Goldfarb & Henik, 2007 ). Indeed, in what is perhaps the first record of task conflict in the Stroop literature, Heathcote et al. ( 1991 ) reported that whilst the arithmetic mean difference between color-congruent and color-neutral trial types reveals facilitation in the Gaussian portion of the RT distribution, it actually reveals interference in the tail of the RT distribution. In sum, congruent trial RTs are clearly influenced by processes that pull RTs in different directions. Moreover, it has been argued that Stroop facilitation effects are not true facilitation effects at all, in the sense that the faster RTs on congruent trials do not represent the benefit of converging information from the two dimensions of the Stroop stimulus (see below for a further discussion of this issue). Thus, before considering what levels of processing contribute to facilitation effects, we must first consider the nature of such effects.

Accounting for positive facilitation

Since clear empirical demonstrations of task conflict being triggered by color-congruent trials were reported (see above), it has become difficult to consider the Stroop facilitation effect as the flip side of Stroop interference (Dalrymple-Alford & Budayr, 1966 ). Stroop facilitation is often observed to be smaller, and less consistent, than Stroop interference (MacLeod, 1991 ) and this asymmetry is largely dependent on the baseline used (Brown, 2011 ). Yet, this asymmetrical effect has been accounted for by models of the Stroop task via informational facilitation (i.e., without considering the opposing effect of task conflict). For example, in Cohen et al.’s ( 1990 ) model smaller positive facilitation is accounted for via a non-linear activation function which imposes a ceiling effect on the activation of the correct response; in other words, double the input (convergence) does not translate into double the output (Cohen et al., 1990 ).
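The ceiling effect of a non-linear activation function can be illustrated with a single logistic unit. This is a deliberately simplified sketch and not the Cohen et al. ( 1990 ) network: the net input values are arbitrary, and treating an incongruent word as simply subtracting input from the correct response unit is an assumption made here for illustration only.

```python
import math


def logistic(net_input):
    """Standard logistic activation function, bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-net_input))


# Arbitrary illustrative net inputs to the correct response unit.
baseline = logistic(1.0)    # color pathway alone
converging = logistic(2.0)  # color plus a congruent word (double the input)
opposing = logistic(0.0)    # color input offset by an incongruent word

gain = converging - baseline  # benefit of convergence
loss = baseline - opposing    # cost of opposition

print(f"baseline={baseline:.3f} converging={converging:.3f} opposing={opposing:.3f}")
print(f"gain={gain:.3f} loss={loss:.3f}")
```

Under these assumed values, doubling the input raises activation by far less than a factor of two, and the gain from convergence is smaller than the loss from an equal-sized opposing input, which is the kind of asymmetry between facilitation and interference that the ceiling account describes.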

MacLeod and MacDonald ( 2000 ) and Kane and Engle ( 2003 ) have argued that the facilitating effect of the color-congruent irrelevant word is not true facilitation from any level of processing and is instead the result of ‘inadvertent reading’. That is, on some color-congruent trials, participants use only the word dimension to generate a response, meaning that these responses would be 100 ms–200 ms faster than if they were color naming (because word reading is that much faster than color naming). The argument is that it happens on only the occasional congruent trial (because of the penalty (error or large RTs) that would result from carrying it over to incongruent trials). Doing this occasionally would equate to the roughly 25 ms Stroop facilitation effect observed in most studies and would explain why facilitation is generally smaller than interference. Since the color-naming goal is not predicted to be active on these occasional congruent trials, it implies that only the task set for word reading is active, and hence the absence (or a large reduction) of task conflict, which fits with the finding of more informational facilitation in low task conflict contexts. Inadvertent reading would also be expected to produce facilitation in the early portion of the reaction time distribution (as supported by Heathcote et al.’s findings).
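The arithmetic behind the inadvertent reading account can be made explicit. The sketch below assumes a simple two-state mixture, which is an assumption of this illustration rather than a claim made in the cited papers: on a proportion p of congruent trials the word is read (saving the full reading advantage), and on the remaining trials the color is named as usual, so mean facilitation equals p times the reading advantage. The function name is hypothetical.

```python
def reading_proportion(facilitation_ms, reading_advantage_ms):
    """Proportion of congruent trials on which the word would have to be read.

    Assumes mean facilitation = p * reading advantage (two-state mixture).
    """
    return facilitation_ms / reading_advantage_ms


# Values taken from the text: ~25 ms facilitation, 100-200 ms reading advantage.
for advantage in (100, 200):
    p = reading_proportion(25, advantage)
    print(f"advantage={advantage} ms -> p={p:.3f}")
```

Under this assumption, a 25 ms facilitation effect would require reading the word on somewhere between one in eight and one in four congruent trials.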

Roelofs ( 2010 ) argued, however, that with cross-language stimuli presented to bilingual participants, words cannot be read aloud to produce facilitation between languages (i.e., the Dutch word Rood, meaning ‘red’, cannot be read aloud to produce the response ‘red’ by Dutch–English bilinguals). Roelofs ( 2010 ) asked Dutch–English bilingual participants to name color patches either in Dutch or English whilst trying to ignore contiguously presented Dutch or English words. Given that informational facilitation effects were observed both within and between languages, Roelofs argued that the Stroop facilitation effect cannot be based on inadvertent reading. However, whilst Rood (Red), Groen (Green), and Blauw (Blue) are not necessarily phonologically similar to their English counterparts, they clearly share orthographic similarities, which could produce facilitation effects (including semantic facilitation). Still, Roelofs observed large magnitudes of facilitation effects rendering it less likely that facilitation was based solely on orthography, although this was primarily when the word preceded the onset of the color patch. There were indeed relatively small facilitation effects when the word and color were presented at the same time. Nevertheless, the inadvertent reading account also cannot easily explain facilitation on semantic-associative congruent trials (see below for evidence of this) since the word does not match the response.

Another influence that can account for the facilitating effect of congruent trials is response contingency. Response contingency refers to the association between an irrelevant word and a response. In a typical Stroop task set-up, the numbers of congruent and incongruent trials are matched (e.g., 48 congruent/48 incongruent). Since in each congruent trial, there is only one possible word to pair with each color, it means that each color word is more frequently paired with its corresponding color (when the word red is displayed, there is a higher probability of its color being red). This would mean that responses on congruent trials would be further facilitated through learned word–response associations, and those on incongruent trials further slowed, by something other than and additional to the consequence of word processing (Melara & Algom, 2003 ; Schmidt & Besner, 2008 ). Indeed, it is as yet unclear as to whether informational facilitation would remain if facilitative effects of response contingency were controlled. Therefore, future studies are needed to address this still open issue (see Lorentz et al., 2016 for this type of endeavor but with semantic associates).
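The contingency described above can be checked by counting trials in a balanced design. The sketch below uses the 48 congruent/48 incongruent split mentioned in the text; the four-color set and the even distribution of incongruent pairings are assumptions made for illustration, as are all variable and function names.

```python
from collections import Counter
from itertools import product

colors = ["red", "green", "blue", "yellow"]  # assumed four-color set
n_congruent = n_incongruent = 48             # split given in the text

trials = Counter()  # keys are (word, ink color) pairs

# Congruent trials: each word appears equally often in its own color.
for color in colors:
    trials[(color, color)] += n_congruent // len(colors)

# Incongruent trials: spread evenly over all mismatching word-color pairs.
incongruent_pairs = [(w, c) for w, c in product(colors, colors) if w != c]
for pair in incongruent_pairs:
    trials[pair] += n_incongruent // len(incongruent_pairs)


def p_color_given_word(word, color):
    """Probability that the ink is `color` given that the word is `word`."""
    word_total = sum(n for (w, _), n in trials.items() if w == word)
    return trials[(word, color)] / word_total


print(p_color_given_word("red", "red"))    # matching color
print(p_color_given_word("red", "green"))  # one specific mismatching color
```

Under these assumptions the word red predicts the ink color red on half of its presentations, against one in six for each other color, so the word carries information about the correct response even though congruent and incongruent trials are equally frequent overall.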

Decomposing informational facilitation

Perhaps because it has been perceived as the lesser, and less stable effect, the Stroop facilitation effect has not been explored as much as the Stroop interference effect in terms of potential varieties of which it may be comprised (Brown, 2011 ). Coltheart et al. ( 1999 ) have shown that when the irrelevant word and the color share phonemes (e.g., rack in red, boss in blue), participants are faster to name the color than when they do not (e.g., hip in red, mock in blue). Given that none of the words used in their experiment contained color relations, their effect was likely entirely based on phonological facilitation (see also Dennis & Newstead, 1981 ; Marmurek et al., 2006 ; Parris et al., 2019a , 2019b , 2019c ; Regan, 1979). Notably, effects such as this could not be explained by either the inadvertent reading or the response convergence account of Stroop facilitation and could not have resulted from response contingency (whilst any word in red, green or blue would have a greater chance of beginning with an ‘r’, ‘g’ and ‘b’ than any other letter respectively, there were three times as many trials in which the words did not begin with those letters). It is possible, however, that phonological facilitation operates on a different mechanism to semantic and response facilitation effects.

To the best of our knowledge only four published studies have explored this variety of informational facilitation directly. Dalrymple-Alford ( 1972 ) reported a 42 ms semantic-associative facilitation effect (non-color-word neutral—semantic-associative congruent) and a 67 ms standard facilitation effect (non-color-word neutral—congruent) suggesting a response facilitation effect of 25 ms (see Glaser & Glaser, 1989 ; and Mahon et al., 2012 , for replications of this effect). Interestingly, however, when compared to a letter string baseline (e.g., xxxx), the congruent semantic associates actually produced interference—a finding implicating an influence of task conflict. More recently, Augustinova et al. ( 2019 ) reported semantic (11 ms) and response (39 ms) facilitation effects with vocal responses but only semantic facilitation (14 ms) with manual responses (response facilitation was a non-significant 7 ms). Interestingly, the comparison between the letter string baseline and congruent semantic associates produced 9 ms facilitation with the manual response, but 33 ms interference with the vocal response suggesting a complex relationship between response mode, semantic facilitation and task conflict. Indeed, exactly like color-congruent items discussed above, both congruent semantic-associative trials and their color-neutral counterpart with no facilitatory components still involve task conflict.
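The subtraction logic behind the Dalrymple-Alford ( 1972 ) figures reported above can be written out as a worked equation. The symbols are introduced here for illustration only: RT_N, RT_SA and RT_C denote mean color-naming times on non-color-word neutral, semantic-associative congruent and congruent trials, respectively.

```latex
\begin{align*}
F_{\text{semantic}} &= RT_{N} - RT_{SA} = 42~\text{ms} \\
F_{\text{standard}} &= RT_{N} - RT_{C} = 67~\text{ms} \\
F_{\text{response}} &= F_{\text{standard}} - F_{\text{semantic}}
                     = RT_{SA} - RT_{C} = 67 - 42 = 25~\text{ms}
\end{align*}
```

Because the neutral term cancels, the response facilitation estimate does not depend on the baseline, whereas the semantic and standard facilitation estimates do, which is why replacing the non-color-word baseline with a letter string baseline can turn apparent facilitation into interference.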

These (potentially) isolable forms of facilitation are interesting, require further study, and have the potential to shed light on impairments in selective attention and cognitive control. Of particular interest is how these forms of facilitation are modified by the presence of various levels of task conflict. Nevertheless, as with semantic conflict, it is possible that apparent semantic facilitation effects result from links between the irrelevant dimension and the response set colors (Roelofs, 2003 ) meaning that they are response- and not semantically based effects. Therefore, other approaches are needed to tackle the issue of semantic (vs. response) facilitation. It might be useful to recall at this point that both Roelofs’ ( 2010 ) cross-language findings and the differences in reaction times between congruent and same-response trials (e.g., De Houwer, 2003 ) possibly result from semantic facilitation and so would not be helpful in this regard.

Other evidence relevant to the issue of locus vs. loci of the Stroop effect

Response modes and the loci of the Stroop effect

Responding manually (via keypress) in the Stroop task consistently leads to smaller Stroop effects when compared to responding vocally (saying the name aloud, e.g., Augustinova et al., 2019 ; McClain, 1983 ; Redding & Gerjets, 1977 ; Repovš, 2004 ; Sharma & McKenna, 1998 ). It has been argued that this is because each response type has differential access to the lexicon where interference is proposed to occur (Glaser & Glaser, 1989 ; Kinoshita et al., 2017 ; Sharma & McKenna, 1998 ). Indeed, smaller Stroop effects with manual (as opposed to vocal) responses have been attributed to one of their components (i.e., semantic conflict) being significantly reduced (Brown & Besner, 2001 ; Sharma & McKenna, 1998 ). Therefore, the manipulation of response mode has been used to address the issue of the locus of the Stroop effect.

In response to reports of failing to observe Stroop effects with manual responses (e.g., McClain, 1983 ), Glaser and Glaser ( 1989 ) proposed in their model that manual responses with color patches on the response keys could not produce interference because perception of the color and the response to it were handled by the semantic system with little or no involvement of the lexical system where interference was proposed to occur. However, based on the earlier translation models (e.g., Virzi & Egeth, 1985 ), Sugg and McDonald ( 1994 ) showed that Stroop interference was obtained with manual responses when the response buttons were labeled with written color words instead of colored patches. Sugg and McDonald argued that written label responses must have direct access to the lexical system.

Using written label manual responses, Sharma and McKenna ( 1998 ) tested Glaser and Glaser’s model and showed that response mode matters when considering the types of conflict that participants experience in the Stroop task. They reported that in contrast to vocal responses, manual responses produced no lexico-semantic interference as measured by comparing semantic-associative and non-color-word neutral trials, and by comparing non-response set trials with semantic-associative trials, although they did report a response set effect (response set minus non-response set) with both vocal (spoken) and manual responses. Sharma and McKenna interpreted their results as being partially consistent with Glaser and Glaser’s model, suggesting that the types of conflict experienced in the Stroop task are different between response modes. However, Brown and Besner ( 2001 ) later re-analyzed the data from Sharma and McKenna and showed that when the analysis is not restricted to adjacent conditions (with condition order determined by a priori beliefs about the magnitude of Stroop effects) and non-adjacent conditions such as non-response set and non-color-word neutral trials are also compared (the non-response set effect), semantic conflict is observed with a manual response.

Roelofs ( 2003 ) has theorized that interference with manual responses only occurs because verbal labels are attached to the response keys; such a position predicts that manual and vocal responses should lead to similar conflict and facilitation effects, but smaller overall effects with manual responses due to the proposed mediated nature of manual Stroop effects. Consistently, many studies have since reported robust interference effects including semantic conflict effects with manual responses using colored patch labels (as measured by non-response set minus non-color-word neutral, e.g., Hasshim & Parris, 2018 ; or as measured by semantic-associative Stroop trials, e.g., Augustinova et al., 2018a ). Parris et al. ( 2019a , 2019b ), Zahedi et al. ( 2019 ) and Kinoshita et al. ( 2017 ) have reported data indicating that the difference between manual and vocal responses occurs later in the phonological encoding or articulation planning stage where vocal responses encourage greater phonological encoding than does the manual response (see Van Voorhis & Dark, 1995 for a similar argument).

Augustinova et al. ( 2019 ) have reported that the difference between manual and vocal responses is largely due to a larger contribution of response conflict with vocal responses. Yet, in addition they also reported a much larger contribution of task conflict with vocal responses. Notably, the contribution of both semantic conflict and semantic facilitation remained roughly the same for the response modes, whereas response facilitation increased dramatically (from non-significant 7 ms to 39 ms) with vocal responses indicating that response and semantic forms of facilitation are independent. Therefore, the research to date suggests that there are larger response- and task-based effects with vocal responses. Since negative facilitation, which has been reported with manual responses (e.g., Goldfarb & Henik, 2007 ), was not used as a measure of performance in this study, one needs to be careful about what conclusions are drawn about task conflict; nevertheless, task conflict does seem to contribute less to Stroop effects with manual responses under common Stroop task conditions in which task conflict control is not manipulated. Importantly, this only applies to response times. As already noted, Hershman and Henik ( 2019 ) reported no task conflict with manual responses but also showed that in the same participants pupil size changes revealed task conflict in the form of negative facilitation on the very same trials.

It is important that more research investigating how the make-up of Stroop interference might change with response mode is conducted, especially since other response modes such as typing (Logan & Zbrodoff, 1998 ), oculomotor (Hasshim & Parris, 2015 ; Hodgson et al., 2009 ) and mouse (Bundt, Ruitenberg, Abrahamse, & Notebaert, 2018 ) responses have been utilized. This is especially important given that a lesion to the ACC has been reported to affect manual but not vocal response Stroop effects (Turken & Swick, 1999 ). Until very recently, little consideration has been given to how response mode might affect Stroop facilitation effects (Augustinova et al., 2019 ), so more research is needed to better understand the influence of response mode on facilitation effects. Indeed, as noted above, models have proposed either the same or different processes underlying manual and vocal Stroop effects, providing predictions that need to be more fully tested. Issues surrounding the measurement of the varieties of conflict and facilitation that underlie Stroop effects with manual and vocal responses limit the conclusions that can be drawn from the work summarized in this section. Even so, it is interesting that the way we act on the Stroop stimulus can potentially change how it is processed.

Beyond response selection: Stroop effects on response execution

So far, we have concentrated on Stroop effects that occur before response selection. However, it is also possible that Stroop effects could be observed after (or during) response selection. When addressing questions about the locus of the Stroop effect, some studies have questioned the commonly held assumption that there is modularity between response selection and response execution; that is, they have considered whether interference experienced at the level of response selection spills over into the actual motoric action of the effectors (e.g., the time it takes to articulate the color name) or whether interference is entirely resolved before then. Researchers have considered this possibility with vocal (measuring the time between the production of the first phoneme and the end of the last; Kello et al., 2000 ), type-written (measuring the time between the pressing of the first letter key and the pressing of the last letter key; Logan & Zbrodoff, 1998 ), oculomotor (measuring the amplitude (size) of the saccade (eye movement) to the target color patch; Hodgson, Parris, Jarvis & Gregory, 2009 ), and mouse movement (Bundt et al., 2018 ; Yamamoto, Incera & McLennan, 2016 ) responses.
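The distinction between response selection and response execution measures can be illustrated for type-written responses, where latency runs from stimulus onset to the first key press and execution duration runs from the first key press to the last. The following is a minimal sketch under those assumptions; the function names, the trial format and the timestamps are hypothetical and do not reproduce any published analysis.

```python
from statistics import mean

def split_response(stimulus_onset_ms, keypress_times_ms):
    """Return (latency, execution_duration) in ms for one typed response."""
    first_key = keypress_times_ms[0]
    last_key = keypress_times_ms[-1]
    latency = first_key - stimulus_onset_ms   # onset to first key press
    duration = last_key - first_key           # first key press to last key press
    return latency, duration

def congruency_effects(trials):
    """Incongruent minus congruent difference, separately for latency and duration."""
    by_condition = {"congruent": [], "incongruent": []}
    for trial in trials:
        by_condition[trial["condition"]].append(
            split_response(trial["onset"], trial["keypresses"])
        )

    def cond_mean(condition, index):
        return mean(pair[index] for pair in by_condition[condition])

    return {
        "latency_effect": cond_mean("incongruent", 0) - cond_mean("congruent", 0),
        "execution_effect": cond_mean("incongruent", 1) - cond_mean("congruent", 1),
    }

# Hypothetical trials (times in ms), for illustration only.
trials = [
    {"condition": "congruent", "onset": 0, "keypresses": [600, 700, 800]},
    {"condition": "congruent", "onset": 0, "keypresses": [620, 720, 820]},
    {"condition": "incongruent", "onset": 0, "keypresses": [680, 780, 880]},
    {"condition": "incongruent", "onset": 0, "keypresses": [700, 800, 900]},
]

effects = congruency_effects(trials)
```

In this made-up data set the congruency effect appears on latency only, which is the pattern that would be expected if interference were fully resolved before execution begins. A non-zero execution effect would instead suggest spill-over into the motor response.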

In Hodgson et al.’s ( 2009 ) study, participants responded by making an eye movement to one of four color patches located in a plus-sign configuration around the centrally presented Stroop stimulus to indicate the font color of the Stroop stimulus. In two experiments, one in which the target’s color remained in the same location throughout the experiment and one in which the colors occupied a different patch location (still in the plus-sign configuration) on every trial, Stroop interference effects were observed on saccadic latency, but not on saccade amplitude or velocity indicating that all interference is resolved before a motor movement is made and, therefore, that Stroop interference does not affect response execution. Similar null effects on response execution were reported for type-written responses across four experiments by Logan and Zbrodoff ( 1998 ).

Kello et al. ( 2000 ) initially also observed no Stroop effects on vocal naming durations (the time it takes to actually vocalize the response). In a follow-up experiment, however, in which they introduced a response deadline of 575 ms, they observed Stroop congruency effects on response durations. This dependence on task conditions likely holds for the other studies on response execution mentioned here. Indeed, Hodgson et al. pointed out that they could not exclude the possibility that under some circumstances the spatial characteristics of saccades would also show effects on incongruent trials, given previous work showing that increasing spatial separation between target and distractor stimuli leads to an increase in the effect of the distractor on characteristics of the saccadic response (Findlay, 1982 ; McSorley et al., 2004 ; Walker et al., 1997 ).

Bundt et al. ( 2018 ) recently reported a Stroop congruency effect on response execution times in a study requiring participants to use a computer mouse to point to the target patch on the screen. Response targets were all in the upper half of the computer screen and participants guided the mouse from a start position in the lower half of the screen. They observed this effect despite not separating the target and distractor or enforcing a response time deadline. The configuration differences, the use of mouse-tracking vs. the oculomotor methodology and the language of the stimuli (Dutch vs. English) might have contributed to producing the different results. Unfortunately, Bundt and colleagues did not employ a neutral trial baseline, so it is not clear whether their effect represents interference, facilitation, or both.

In summary, two studies have reported Stroop effects on response execution, findings that represent a challenge to the currently assumed modularity between response selection and execution. More work is needed to determine what conditions produce Stroop effects on response execution and in which response modalities. Furthermore, it would be interesting for future research to reveal whether semantic and task conflict are registered at this very late stage of selection. For now, this work suggests that even if selection only occurred at the level of response output and not before, it is not always entirely successful, even if the eventual response is correct.

Locus or loci of selection?

In many early considerations of the Stroop effect, a putative explanation was that interference would not occur unless a name has been generated for the irrelevant dimension; and interference was a form of response conflict due to there being a single response channel (Morton, 1969 ). Since word reading would more quickly produce a name than color naming it was thought that the word name would be sat in the response buffer before the color name arrived and, thus, would have to be expunged before the correct name could be produced. Thus, Stroop interference was thought to be a consequence of the time it took to process each of the dimensions.

Treisman ( 1969 ) questioned why selective attention did not gate the irrelevant word. Treisman concluded that the task of focusing on one dimension whilst excluding the other was impossible, especially when the dimensions are presented simultaneously. Parallel processing of both dimensions would, therefore, occur and, thus, response competition could be conceived of as the failure of selective attention to fully focus on the color dimension and gate the input from word processing. Bringing Treisman’s ( 1969 ) and Morton’s ( 1969 ) positions together, Dyer ( 1973 ) proposed that interference results from both a failure in selective attention and a bottleneck at the level of response (at which the word information arrives more quickly). However, the speed-of-processing account has been shown to be unsupported (Glaser & Glaser, 1982; MacLeod & Dunbar, 1988 ), leaving the failure of attentional selection as the main mechanism leading to Stroop interference.

Whilst it is clear that participants must select a single response in the Stroop task and, thus, that selection occurs at response output, conflict stems from incompatibility between task-relevant and task-irrelevant stimulus features (Egner et al., 2007 ), and is, thus, stimulus-based conflict. However, even if stimulus incompatibility does make an independent contribution to Stroop interference, it might not have an independent selection mechanism; all interference produced at all levels might accumulate and be resolved only later when a single response has to be selected. One way to investigate whether selection occurs at any level other than response output would be to show successful resolution of conflict in the complete absence of response conflict. The 2:1 color-response mapping paradigm is the closest method so far devised that would permit this but, as we have explained, it is problematic; moreover, it only addresses the distinction between semantic and response conflict.

There are now accounts of the Stroop task which argue that selection occurs both at early and late stages of processing (Altmann & Davidson, 2001 ; Kornblum & Lee, 1995 ; Kornblum et al., 1990 ; Phaf et al., 1990 ; Sharma & McKenna, 1998 ; Zhang & Kornblum, 1998 ; Zhang et al., 1999 ). For example, in Kornblum and colleagues’ models selection occurs for both SS-conflict and SR-conflict, independently. We have provided evidence for multiple levels of processing contributing to Stroop interference—both stimulus- and response-based contributions. At the level of the stimulus, we have argued that there is good evidence for task conflict. At the level of response, we have argued that the current methods used to dissociate forms of informational conflict including phonological, semantic (stimulus) and response conflict do not permit us to conclude in favor of separate selection mechanisms for each. Moreover, we have discussed evidence that selection at the level of response output is not entirely successful given that response execution effects have been reported.

Another approach would be to show that the different forms of conflict are independently affected by experimental manipulations. Above we alluded to Augustinova and colleagues’ research showing that semantic conflict is often reported to be preserved in contexts where response conflict is reduced (e.g., Augustinova & Ferrand, 2012 ). However, we discussed the potential limitations of this approach. Taking another example, in an investigation of the response set effect and non-response set effect, Hasshim and Parris ( 2018 ) reported within-subjects experiments in which the trial types (e.g., response set, non-response set, non-color-word neutral) were presented either in separate blocks (pure) or in blocks containing all trial types in a random order (mixed). They observed a decrease in RTs to response set trials when trials were presented in mixed blocks compared with the RTs to response set trials in pure blocks. These findings demonstrate that presentation format modulates the magnitude of the response set effect, substantially reducing it when trials are presented in mixed blocks. Importantly for present purposes, the non-response set effect was not affected by the manipulation, suggesting that the response set and non-response set effects are driven by independent mechanisms. However, Hasshim and Parris’s effect could also be a consequence of the limited effect of presentation format and simply be showing that some conflict is left over, and we do not know which type of conflict it is because the measure was not sensitive enough (see also Hershman et al., 2020 ; Hershman & Henik, 2019 , 2020 , showing that conflict can be present but not expressed in the RT data). Future research could further investigate the effect of mixing trial types in blocks on the expression of types of conflict and facilitation in both within- and between-subjects designs.

Kinoshita et al. ( 2018 ) argued that semantic Stroop interference can be endogenously controlled evincing independent selection. The authors reported that a high proportion (75%) of non-readable neutral trials (#s) magnified semantic conflict (in the same way this manipulation increases task conflict). This means that a low proportion of non-readable neutral trials leads to reduced semantic conflict. However, since their manipulation was based on the number of non-readable stimuli, Kinoshita et al. ( 2018 ) would have also increased task conflict. Neatly, their non-color-related neutral word baseline condition permitted them to show that the semantic component of informational conflict was modulated. Uniquely, in their study they employed both semantic-associative and non-response set trials to measure semantic conflict, perhaps providing converging evidence for a modification of semantic conflict. Problematically, however, they did not include a measure of response conflict in their study so it is not known whether purported indices of response conflict are also affected along with the indices of semantic conflict and thus, their results do not unambiguously represent a modification of semantic conflict. Their study does, however, provide evidence that as task conflict increases, so inevitably does informational conflict because task conflict is an indication that the word is being processed (assuming a sufficient reading age; see Ferrand et al., 2020 ).

It is our contention that despite attempts to show independence of control of semantic and response conflict, the published evidence so far does not permit a clear conclusion on the matter because the measures themselves are problematic. Future research could combine the semantic distance manipulation (Klopfer, 1996 ) with a corollary for responses (see, e.g., Chen & Proctor, 2014 ; Wühr & Heuer, 2018 ). For example, an effect of the physical (e.g., red in blue, where red is next to blue on a response box vs. red in green when green is further away from the red response key) and conceptual (e.g., red in blue, where the red response is indicated by the key labeled ‘5’ and the blue by a key labeled ‘6’) distance of the response keys has been reported whereby the closer physically or conceptually the response keys, the greater the amount of interference experienced (Chen & Proctor, 2014 ). Controlling for semantic distance whilst manipulating response distance and vice versa might give an insight into the contributions of semantic and response conflict to Stroop interference by allowing the independent manipulation of both.
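The logic of crossing the two distance manipulations can be sketched as a 2 × 2 analysis of cell means. This is an illustrative sketch only: it assumes two levels (near, far) of each factor and that nearer distances yield slower responses; the function names and cell means are hypothetical and are not taken from Klopfer (1996) or Chen and Proctor (2014).

```python
from statistics import mean

# Hypothetical mean RTs (ms) keyed by (semantic_distance, response_distance).
cell_means = {
    ("near", "near"): 760,
    ("near", "far"): 730,
    ("far", "near"): 740,
    ("far", "far"): 710,
}

def main_effect(cells, factor_index):
    """Near minus far difference for one factor, averaged over the other factor."""
    near = [rt for levels, rt in cells.items() if levels[factor_index] == "near"]
    far = [rt for levels, rt in cells.items() if levels[factor_index] == "far"]
    return mean(near) - mean(far)

def interaction(cells):
    """Response-distance effect at near semantic distance minus that at far."""
    effect_at_near = cells[("near", "near")] - cells[("near", "far")]
    effect_at_far = cells[("far", "near")] - cells[("far", "far")]
    return effect_at_near - effect_at_far

semantic_distance_effect = main_effect(cell_means, 0)
response_distance_effect = main_effect(cell_means, 1)
distance_interaction = interaction(cell_means)
```

In these made-up numbers the two main effects are additive (the interaction is zero), which is the pattern that independent semantic and response contributions would be expected to produce. An interaction would instead suggest that the two manipulations do not act on separable sources of conflict.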

In our opinion, methods addressing task conflict, particularly those demonstrating negative facilitation and its control, are evidence for a form of conflict that is independent from response conflict. The evidence for an earlier locus (Hershman & Henik, 2019 ), distinct developmental trajectory (Ferrand et al., 2020 ) and independent control (Goldfarb & Henik, 2007 ; Kalanthroff et al., 2013 ) supports the notion that task conflict has a different locus and selection mechanism to response conflict. Therefore, any model of Stroop performance that does not account for task conflict does not provide a full account of factors contributing to Stroop effects. Only one model currently accounts for task conflict (Kalanthroff et al., 2018 ), although this model employs the PDP connectionist architecture that falls foul of the word frequency findings noted above.

Unambiguous evidence that interference (or facilitation) is observed even in the absence of response competition (or convergence) constitutes a necessary prerequisite for moving beyond the historically favored response locus of Stroop effects. In our opinion, task conflict has been shown to be an independent locus for Stroop interference, but phonological, semantic and response conflict (collectively informational conflict) have not been shown to be independent forms of conflict. One could argue that models that incorporate early selection mechanisms are better supported by the evidence, at least in their ability to represent multiple levels of selection that might possibly occur, if not necessarily where that selection occurs since these models do not account for task conflict. Moreover, no extant model can currently predict interference that is observed to occur at the level of response execution and only one model seems able to account for differences in magnitudes of Stroop effects as a function of response modes (Roelofs, 2003 ).

In short, if the conclusions drawn here are accepted, models of Stroop task performance will have to be modified so they can more effectively account for multiple loci of both Stroop interference and facilitation. This also applies to the implementations of the Stroop task that are currently used in neuropsychological practice (e.g., Strauss et al., 2007 ) and in basic and applied research. As discussed by Ferrand and colleagues (2020), the extra sensitivity of the Stroop test (stemming from the ability to detect and rate each of these components separately) would provide clinical practitioners with invaluable information since the different forms of conflict are possibly detected and resolved by different neural regions. In sum, this review also calls for changes in Stroop research practices in basic, applied and clinical research.

Availability of data and material

Not applicable.

Algom, D., & Chajut, E. (2019). Reclaiming the Stroop effect back from control to input-driven attention and perception. Frontiers in Psychology, 10 , 1683. https://doi.org/10.3389/fpsyg.2019.01683

Algom, D., Chajut, E., & Lev, S. (2004). A rational look at the emotional stroop phenomenon: A generic slowdown, not a stroop effect. Journal of Experimental Psychology: General, 133 (3), 323–338.

Algom, D., & Fitousi, D. (2016). Half a century of research on Garner interference and the separability–integrality distinction. Psychological Bulletin, 142 (12), 1352–1383.

Altmann, E. M., & Davidson, D. J. (2001). An integrative approach to Stroop: Combining a language model and a unified cognitive theory. In J. D. Moore & K. Stenning (Eds.), Proceedings of the 23rd Annual Conference of the Cognitive Science Society (pp. 21–26). Hillsdale, NJ: Lawrence Erlbaum.

Augustinova, M., Clarys, D., Spatola, N., & Ferrand, L. (2018b). Some further clarifications on age-related differences in Stroop interference. Psychonomic Bulletin & Review, 25 , 767–774.

Augustinova, M., & Ferrand, L. (2007). Influence de la présentation bicolore des mots sur l’effet Stroop [First letter coloring and the Stroop effect]. Annee Psychologique, 107 , 163–179.

Augustinova, M., & Ferrand, L. (2012). Suggestion does not de-automatize word reading: Evidence from the semantically based Stroop task. Psychonomic Bulletin & Review, 19 (3), 521–527.

Augustinova, M., & Ferrand, L. (2014). Automaticity of word reading evidence from the semantic stroop paradigm. Current Directions in Psychological Science, 23 (5), 343–348.

Augustinova, M., Flaudias, V., & Ferrand, L. (2010). Single-letter coloring and spatial cueing do not eliminate or reduce a semantic contribution to the Stroop effect. Psychonomic Bulletin & Review, 17 , 827–833.

Augustinova, M., Parris, B. A., & Ferrand, L. (2019). The loci of Stroop interference and facilitation effects with manual and vocal responses. Frontiers in Psychology, 10 , 1786.

Augustinova, M., Silvert, L., Ferrand, L., Llorca, P. M., & Flaudias, V. (2015). Behavioral and electrophysiological investigation of semantic and response conflict in the Stroop task. Psychonomic Bulletin & Review, 22 , 543–549.

Augustinova, M., Silvert, L., Spatola, N., & Ferrand, L. (2018a). Further investigation of distinct components of Stroop interference and of their reduction by short response stimulus intervals. Acta Psychologica, 189 , 54–62.

Barkley, R. A. (1997). Behavioral inhibition, sustained attention, and executive functions: Constructing a unifying theory of ADHD. Psychological Bulletin, 121 (1), 65.

Bench, C. J., Frith, C. D., Grasby, P. M., Friston, K. J., Paulesu, E., Frackowiak, R. S. J., & Dolan, R. J. (1993). Investigations of the functional anatomy of attention using the Stroop test. Neuropsychologia, 31 (9), 907–922.

Berggren, N., & Derakshan, N. (2014). Inhibitory deficits in trait anxiety: Increased stimulus-based or response-based interference? Psychonomic Bulletin & Review, 21 (5), 1339–1345.

Besner, D., Stolz, J. A., & Boutilier, C. (1997). The stroop effect and the myth of automaticity. Psychonomic Bulletin & Review , 4 (2), 221–225. https://doi.org/10.3758/BF03209396

Besner, D., & Stolz, J. A. (1998). Unintentional reading: Can phonological computation be controlled? Canadian Journal of Experimental Psychology-Revue Canadienne De Psychologie Experimentale, 52 (1), 35–43.

Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., & Cohen, J. D. (2001). Conflict monitoring and cognitive control. Psychological Review, 108 (3), 624–652.

Braem, S., Bugg, J. M., Schmidt, J. R., Crump, M. J., Weissman, D. H., Notebaert, W., & Egner, T. (2019). Measuring adaptive control in conflict tasks. Trends in Cognitive Sciences, 23 (9), 769–783.

Braver, T. S. (2012). The variable nature of cognitive control: A dual mechanisms framework. Trends in Cognitive Sciences, 16 (2), 106–113.

Brown, M., & Besner, D. (2001). On a variant of Stroop’s paradigm: Which cognitions press your buttons? Memory & Cognition, 29 (6), 903–904.

Brown, T. L. (2011). The relationship between Stroop interference and facilitation effects: Statistical artifacts, baselines, and a reassessment. Journal of Experimental Psychology: Human Perception and Performance, 37 (1), 85–99.

Brown, T. L., Gore, C. L., & Pearson, T. (1998). Visual half-field Stroop effects with spatial separation of word and color targets. Brain and Language, 63 (1), 122–142.

Bugg, J. M., & Crump, M. J. C. (2012). In support of a distinction between voluntary and stimulus-driven control: A review of the literature on proportion congruent effects. Frontiers in Psychology, 3 , 367.

Bundt, C., Ruitenberg, M. F., Abrahamse, E. L., & Notebaert, W. (2018). Early and late indications of item-specific control in a Stroop mouse tracking study. PLoS One, 13 (5), e0197278.

Burt, J. S. (1994). Identity primes produce facilitation in a colour naming task. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 47 (A), 957–1000.

Burt, J. S. (1999). Associative priming in color naming: Interference and facilitation. Memory and Cognition, 27 (3), 454–464.

Burt, J. S. (2002). Why do non-colour words interfere with colour naming? Journal of Experimental Psychology-Human Perception and Performance, 28 (5), 1019–1038.

Chen, A., Bailey, K., Tiernan, B. N., & West, R. (2011). Neural correlates of stimulus and response interference in a 2–1 mapping Stroop task. International Journal of Psychophysiology, 80 (2), 129–138.

Chen, A., Tang, D., & Chen, X. (2013b). Training reveals the sources of Stroop and Flanker interference effects. PLoS ONE, 8 (10), e76580. https://doi.org/10.1371/journal.pone.0076580

Chen, J., & Proctor, R. W. (2014). Conceptual response distance and intervening keys distinguish action goals in the Stroop Colour-Identification Task. Psychonomic Bulletin and Review, 21 (5), 1238–1243.

Chen, Z., Lei, X., Ding, C., Li, H., & Chen, A. (2013a). The neural mechanisms of semantic and response conflicts: An fMRI study of practice-related effects in the Stroop task. NeuroImage, 66 , 577–584.

Chuderski, A., & Smolen, T. (2016). An integrated utility-based model of conflict evaluation and resolution in the Stroop task. Psychological Review, 123 (3), 255–290.

Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97 (3), 332.

Coltheart, M., Woollams, A., Kinoshita, S., & Perry, C. (1999). A position-sensitive Stroop effect: Further evidence for a left-to-right component in print-to-speech conversion. Psychonomic Bulletin & Review, 6 (3), 456–463.

Dalrymple-Alford, E. C. (1972). Associative facilitation and interference in the Stroop color-word task. Perception & Psychophysics, 11 (4), 274–276.

Dalrymple-Alford, E. C., & Budayr, B. (1966). Examination of some aspects of the Stroop color-word test. Perceptual and Motor Skills, 23 , 1211–1214.

De Fockert, J. W. (2013). Beyond perceptual load and dilution: A review of the role of working memory in selective attention. Frontiers in Psychology, 4 , 287.

De Houwer, J. (2003). On the role of stimulus-response and stimulus-stimulus compatibility in the Stroop effect. Memory & Cognition, 31 (3), 353–359.

Dennis, I., & Newstead, S. E. (1981). Is phonological recoding under strategic control? Memory & Cognition, 9 (5), 472–477.

Dishon-Berkovits, M., & Algom, D. (2000). The Stroop effect: It is not the robust phenomenon that you have thought it to be. Memory and Cognition , 28 , 1437–1449.

Dyer, F. N. (1973). The Stroop phenomenon and its use in the study of perceptual, cognitive and response processes. Memory & Cognition, 1 (2), 106–120.

Egner, T., Delano, M., & Hirsch, J. (2007). Separate conflict-specific cognitive control mechanisms in the human brain. NeuroImage, 35 (2), 940–948.

Egner, T., Ely, S., & Grinband, J. (2010). Going, going, gone: Characterising the time-course of congruency sequence effects. Frontiers in Psychology, 1 , 154.

Entel, O., & Tzelgov, J. (2018). Focussing on task conflict in the Stroop effect. Psychological Research Psychologische Forschung, 82 (2), 284–295.

Entel, O., Tzelgov, J., Bereby-Meyer, Y., & Shahar, N. (2015). Exploring relations between task conflict and informational conflict in the Stroop task. Psychological Research Psychologische Forschung, 79 , 913–927.

Ferrand, L., & Augustinova, M. (2014). Differential effects of viewing positions on standard versus semantic Stroop interference. Psychonomic Bulletin & Review, 21 (2), 425–431.

Ferrand, L., Ducrot, S., Chausse, P., Maïonchi-Pino, N., O’Connor, R. J., Parris, B. A., Perret, P., Riggs, K. J., & Augustinova, M. (2020). Stroop interference is a composite phenomenon: Evidence from distinct developmental trajectories of its components. Developmental Science, 23 (2), e12899.

Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22 (8), 1033–1045.

Fox, L. A., Schor, R. E., & Steinman, R. J. (1971). Semantic gradients and interference in color, spatial direction, and numerosity. Journal of Experimental Psychology, 91 (1), 59–65.

Gazzaniga, M. S., Ivry, R., & Mangun, G. R. (2013). Cognitive Neuroscience: The Biology of Mind (IV). Norton.

Gerhand, S., & Barry, C. (1998). Word frequency effects in oral reading are not merely age-of-acquisition effects in disguise. Journal of Experimental Psychology: Learning, Memory and Cognition, 24 , 267–283.

Gerhand, S., & Barry, C. (1999). Age of acquisition, word frequency, and the role of phonology in the lexical decision task. Memory & Cognition, 27 (4), 592–602.

Glaser, W. R., & Glaser, M. O. (1989). Context effects in stroop-like word and picture processing. Journal of Experimental Psychology: General, 118 (1), 13–42.

Goldfarb, L., & Henik, A. (2006). New data analysis of the Stroop matching task calls for a reevaluation of theory. Psychological Science, 17 (2), 96–100.

Goldfarb, L., & Henik, A. (2007). Evidence for task conflict in the Stroop effect. Journal of Experimental Psychology: Human Perception and Performance, 33 (5), 1170–1176.

Gonthier, C., Braver, T. S., & Bugg, J. M. (2016). Dissociating proactive and reactive control in the Stroop task. Memory and Cognition, 44 (5), 778–788.

Hasshim, N., Bate, S., Downes, M., & Parris, B. A. (2019). Response and semantic Stroop effects in mixed and pure blocks contexts: An ex-Gaussian analysis. Experimental Psychology, 66 (3), 231–238.

Hasshim, N., & Parris, B. A. (2014). Two-to-one color-response mapping and the presence of semantic conflict in the Stroop task. Frontiers in Psychology, 5 , 1157.

Hasshim, N., & Parris, B. A. (2015). Assessing stimulus-stimulus (semantic) conflict in the Stroop task using saccadic two-to-one colour response mapping and preresponse pupillary measures. Attention, Perception and Psychophysics, 77 , 2601–2610.

Hasshim, N., & Parris, B. A. (2018). Trial type mixing substantially reduces the response set effect in the Stroop task. Acta Psychologica, 189 , 43–53.

Heathcote, A., Popiel, S. J., & Mewhort, D. J. K. (1991). Analysis of response time distributions: An example using the Stroop task. Psychological Bulletin, 109 , 340–347.

Henik, A., & Salo, R. (2004). Schizophrenia and the stroop effect. Behavioral and Cognitive Neuroscience Reviews, 3 (1), 42–59.

Hershman, R., & Henik, A. (2019). Dissociation between reaction time and pupil dilation in the Stroop task. Journal of Experimental Psychology: Learning, Memory and Cognition, 45 (10), 1899–1909.

Hershman, R., & Henik, A. (2020). Pupillometric contributions to deciphering Stroop conflicts. Memory & Cognition, 48 (2), 325–333.

Hershman, R., Levin, Y., Tzelgov, J., & Henik, A. (2020). Neutral stimuli and pupillometric task conflict. Psychological Research Psychologische Forschung . https://doi.org/10.1007/s00426-020-01311-6

Hock, H. S., & Egeth, H. (1970). Verbal interference with encoding in a perceptual classification task.  Journal of Experimental Psychology, 83 (2, Pt.1), 299–303.

Hodgson, T. L., Parris, B. A., Gregory, N. J., & Jarvis, T. (2009). The saccadic Stroop effect: Evidence for involuntary programming of eye movements by linguistic cues. Vision Research, 49 (5), 569–574.

Jackson, J. D., & Balota, D. A. (2013). Age-related changes in attentional selection: Quality of task set or degradation of task set across time? Psychology and Aging , 28 (3), 744–753. https://doi.org/10.1037/a0033159

Jiang, J., Zhang, Q., & van Gaal, S. (2015). Conflict awareness dissociates theta-band neural dynamics of the medial frontal and lateral frontal cortex during trial-by-trial cognitive control. NeuroImage, 116 , 102–111.

Jonides, J. & Mack, R. (1984). On the Cost and Benefit of Cost and Benefit. Psychological Bulletin , 96 (1), 29–44.

Kahneman, D., & Chajczyk, D. (1983). Tests of automaticity of reading: Dilution of Stroop effects by color-irrelevant stimuli. Journal of Experimental Psychology: Human Perception and Performance, 9 (4), 497–509.

Kalanthroff, E., Goldfarb, L., Usher, M., & Henik, A. (2013). Stop interfering: Stroop task conflict independence from informational conflict and interference. Quarterly Journal of Experimental Psychology , 66 , 1356–1367. https://doi.org/10.1080/17470218.2012.741606

Kalanthroff, E., Avnit, A., Henik, A., Davelaar, E., & Usher, M. (2015). Stroop proactive control and task conflict are modulated by concurrent working memory load. Psychonomic Bulletin and Review, 22 (3), 869–875.

Kalanthroff, E., Davelaar, E., Henik, A., Goldfarb, L., & Usher, M. (2018). Task conflict and proactive control: A computational theory of the Stroop task. Psychological Review, 125 (1), 59–82.

Kane, M. J., & Engle, R. W. (2003). Working-memory capacity and the control of attention: The contributions of goal neglect, response competition, and task set to Stroop interference. Journal of Experimental Psychology: General, 132 (1), 47–70.

Kello, C. T., Plaut, D. C., & MacWhinney, B. (2000). The task-dependence of staged versus cascaded processing: An empirical and computational study of Stroop interference in speech production. Journal of Experimental Psychology: General, 129 (3), 340–360.

Kim, M.-S., Min, S.-J., Kim, K., & Won, B.-Y. (2006). Concurrent working memory load can reduce distraction: An fMRI study [Abstract]. Journal of Vision, 6 (6):125, 125a, http://journalofvision.org/6/6/125/ , doi: https://doi.org/10.1167/6.6.125

Kim, S.-Y., Kim, M.-S., & Chun, M. M. (2005). Concurrent working memory load can reduce distraction. Proceedings of the National Academy of Sciences, 102 (45), 16524–16529.

Kinoshita, S., De Wit, B., & Norris, D. (2017). The magic of words reconsidered: Investigating the automaticity of reading color-neutral words in the Stroop task. Journal of Experimental Psychology: Learning Memory and Cognition, 43 (3), 369–384.

Kinoshita, S., Mills, L., & Norris, D. (2018). The semantic stroop effect is controlled by endogenous attention.  Journal of Experimental Psychology: Learning Memory and Cognition . DOI:  https://doi.org/10.1037/xlm0000552

Klein, G. S. (1964). Semantic power measured through the interference of words with color-naming. The American Journal of Psychology, 77 (4), 576–588.

Klopfer, D. S. (1996). Stroop interference and color-word similarity. Psychological Science, 7 (3), 150–157.

Kornblum, S., Hasbroucq, T., & Osman, A. (1990). Dimensional overlap: Cognitive basis for stimulus-response compatibility–a model and taxonomy. Psychological Review, 97 (2), 253–270.

Kornblum, S., & Lee, J. W. (1995). Stimulus-response compatibility with relevant and irrelevant stimulus dimensions that do and do not overlap with the response. Journal of Experimental Psychology: Human Perception and Performance, 21 (4), 855–875.

La Heij, W., Van der Heijden, A. H. C., & Schreuder, R. (1985). Semantic priming and Stroop-like interference in word-naming tasks. Journal of Experimental Psychology: Human Perception and Performance, 11, 60–82.

Laeng, B., Torstein, L., & Brennan, T. (2005). Reduced Stroop interference for opponent colours may be due to input factors: Evidence from individual differences and a neural network simulation. Journal of Experimental Psychology: Human Perception and Performance, 31 (3), 438–452.

Lakhzoum, D. (2017). Dissociating semantic and response conflicts in the Stroop task: evidence from a response-stimulus interval effect in a two-to-one paradigm. Master’s thesis in partial fulfilment of the requirements for the research Master’s degree in Psychology. Faculty of Psychology, Social Sciences and Education Science Clermont-Ferrand.

Lamers, M. J., Roelofs, A., & Rabeling-Keus, I. M. (2010). Selective attention and response set in the Stroop task. Memory & Cognition, 38 (7), 893–904.

Leung, H.-C., Skudlarski, P., Gatenby, J. C., Peterson, B. S., & Gore, J. C. (2000). An event-related functional MRI study of the Stroop color word interference task. Cerebral Cortex, 10 (6), 552–560.

Levin, Y., & Tzelgov, J. (2016). What Klein’s “semantic gradient” does and does not really show: Decomposing Stroop interference into task and informational conflict components. Frontiers in Psychology, 7, 249.


Littman, R., Keha, E., & Kalanthroff, E. (2019). Task conflict and task control: A mini-review. Frontiers in Psychology, 10 , 1598.

Logan, G. D., & Zbrodoff, N. J. (1979). When it helps to be misled: Facilitative effects of increasing the frequency of conflicting stimuli in a Stroop-like task. Memory and Cognition, 7 , 166–174.

Logan, G. D., & Zbrodoff, N. J. (1998). Stroop-type interference: Congruity effects in colour naming with typewritten responses. Journal of Experimental Psychology-Human Perception and Performance, 24 (3), 978–992.

Lorentz, E., McKibben, T., Ekstrand, C., Gould, L., Anton, K., & Borowsky, R. (2016). Disentangling genuine semantic Stroop effects in reading from contingency effects: On the need for two neutral baselines. Frontiers in Psychology, 7 , 386.

Luo, C. R. (1999). Semantic competition as the basis of Stroop interference: Evidence from Color-Word matching tasks. Psychological Science, 10 (1), 35–40.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109 (2), 163–203.

MacLeod, C. M. (1992). The Stroop task: The "gold standard" of attentional measures. Journal of Experimental Psychology: General, 121 (1), 12–14.

MacLeod, C. M., & Dunbar, K. (1988). Training and Stroop-like interference: Evidence for a continuum of automaticity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14 (1), 126–135.

MacLeod, C. M., & MacDonald, P. A. (2000). Interdimensional interference in the Stroop effect: Uncovering the cognitive and neural anatomy of attention. Trends in Cognitive Sciences, 4 (10), 383–391.

Mahon, B. Z., Garcea, F. E., & Navarrete, E. (2012). Picture-word interference and the Response-Exclusion Hypothesis: A response to Mulatti and Coltheart. Cortex, 48 , 373–377.

Manwell, L. A., Roberts, M. A., & Besner, D. (2004). Single letter colouring and spatial cuing eliminates a semantic contribution to the Stroop effect. Psychonomic Bulletin & Review, 11 (3), 458–462.

Marmurek, H. H. C., Proctor, C., & Javor, A. (2006). Stroop-like serial position effects in color naming of words and nonwords. Experimental Psychology, 53 (2), 105–110.

Mathews, A., & MacLeod, C. (1985). Selective processing of threat cues in anxiety states. Behaviour Research and Therapy, 23 (5), 563–569.

Maurer, U., Brem, S., Bucher, K., & Brandeis, D. (2005). Emerging neurophysiological specialization for letter strings. Journal of Cognitive Neuroscience, 17 (10), 1532–1552.

McClain, L. (1983). Effects of response type and set size on Stroop color-word performance. Perceptual & Motor Skills, 56 , 735–743.

McSorley, E., Haggard, P., & Walker, R. (2004). Distractor modulation of saccade trajectories: Spatial separation and symmetry effects. Experimental Brain Research, 155 , 320–333.

Melara, R. D., & Algom, D. (2003). Driven by information: A tectonic theory of Stroop effects. Psychological Review, 110 (3), 422–471.

Melara, R. D., & Mounts, J. R. W. (1993). Selective attention to Stroop dimension: Effects of baseline discriminability, response mode, and practice. Memory & Cognition , 21 , 627–645.

Monahan, J. S. (2001). Coloring single Stroop elements: Reducing automaticity or slowing color processing? The Journal of General Psychology, 128 (1), 98–112.

Monsell, S., Doyle, M. C., & Haggard, P. N. (1989). Effects of frequency on visual word recognition tasks: Where are they? Journal of Experimental Psychology: General, 118, 43–71.

Monsell, S., Taylor, T. J., & Murphy, K. (2001). Naming the colour of a word: Is it responses or task sets that compete? Memory & Cognition, 29 (1), 137–151.

Morton, J. (1969). Categories of interference: Verbal mediation and conflict in card sorting. British Journal of Psychology, 60 (3), 329–346.

Navarrete, E., Sessa, P., Peressotti, F., & Dell’Acqua, R. (2015). The distractor frequency effect in the colour-naming Stroop task: An overt naming event-related potential study. Journal of Cognitive Psychology, 27 (3), 277–289.

Neely, J. H., & Kahan, T. A. (2001). Is semantic activation automatic? A critical re-evaluation. In H.L. Roediger, J.S. Nairne, I. Neath, & A.M. Surprenant (Eds.), The Nature of Remembering: Essays in Honor of Robert G. Crowder (pp. 69–93). Washington, DC: American Psychological Association.

Neumann, O. (1980). Selection of information and control of action. Unpublished doctoral dissertation, University of Bochum, Bochum, Germany.

Parris, B. A. (2014). Task conflict in the Stroop task: When Stroop interference decreases as Stroop facilitation increases in a low task conflict context. Frontiers in Psychology, 5 , 1182.

Parris, B. A., Sharma, D., & Weekes, B. (2007). An Optimal Viewing Position Effect in the Stroop Task When Only One Letter Is the Color Carrier. Experimental Psychology , 54 (4), 273–280. https://doi.org/10.1027/1618-3169.54.4.273 .

Parris, B. A., Augustinova, M., & Ferrand, L. (2019a). Editorial: The locus of the Stroop effect. Frontiers in Psychology . https://doi.org/10.3389/fpsyg.2019.02860

Parris, B. A., Sharma, D., Weekes, B. S. H., Momenian, M., Augustinova, M., & Ferrand, L. (2019b). Response modality and the Stroop task: Are there phonological Stroop effects with manual responses? Experimental Psychology, 66 (5), 361–367.

Parris, B. A., Wadsley, M. G., Hasshim, N., Benattayallah, A., Augustinova, M., & Ferrand, L. (2019c). An fMRI study of Response and Semantic conflict in the Stroop task. Frontiers in Psychology, 10 , 2426.

Phaf, R. H., Van Der Heijden, A. H. C., & Hudson, P. T. W. (1990). SLAM: A connectionist model for attention in visual selection tasks. Cognitive Psychology, 22 , 273–341.

Redding, G. M., & Gerjets, D. A. (1977). Stroop effects: Interference and facilitation with verbal and manual responses. Perceptual & Motor Skills, 45 , 11–17.

Regan, J. E. (1979). Automatic processing . (Doctoral dissertation, University of California, Berkeley, 1977). Dissertation Abstracts International 39, 1018-B.

Repovš, G. (2004). The mode of response and the Stroop effect: A reaction time analysis. Horizons of Psychology, 13 , 105–114.

Risko, E. F., Schmidt, J. R., & Besner, D. (2006). Filling a gap in the semantic gradient: Color associates and response set effects in the Stroop task. Psychonomic Bulletin & Review, 13 (2), 310–315.

Roelofs, A. (2003). Goal-referenced selection of verbal action: Modeling attentional control in the Stroop task. Psychological Review, 110 (1), 88–125.

Roelofs, A. (2010). Attention and Facilitation: Converging information versus inadvertent reading in Stroop task performance. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36 , 411–422.

Scheibe, K. E., Shaver, P. R., & Carrier, S. C. (1967). Color association values and response interference on variants of the Stroop test. Acta Psychologica, 26 , 286–295.

Schmidt, J. R. (2019). Evidence against conflict monitoring and adaptation: An updated review. Psychonomic Bulletin and Review, 26 (3), 753–771.

Schmidt, J. R., & Besner, D. (2008). The Stroop effect: Why proportion congruent has nothing to do with congruency and everything to do with contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34 (3), 514–523.

Schmidt, J. R., & Cheesman, J. (2005). Dissociating stimulus-stimulus and response-response effects in the Stroop task. Canadian Journal of Experimental Psychology, 59 (2), 132–138.

Schmidt, J. R., Hartsuiker, R. J., & De Houwer, J. (2018). Interference in Dutch-French bilinguals: Stimulus and response conflict in intra- and interlingual Stroop. Experimental Psychology, 65 (1), 13–22.

Schmidt, J. R., Notebaert, W., & Van den Bussche, E. (2015). Is conflict adaptation an illusion? Frontiers in Psychology, 6, 172.

Selimbegovič, L., Juneau, C., Ferrand, L., Spatola, N., & Augustinova, M. (2019). The Impact of Exposure to Unrealistically High Beauty standards on inhibitory control. L’année Psychologique/topics in Cognitive Psychology, 119 , 473–493.

Seymour, P. H. K. (1977). Conceptual encoding and locus of the Stroop effect. Quarterly Journal of Experimental Psychology, 29 (2), 245–265.

Shallice, T. (1988). From Neuropsychology to Mental Structure. Cambridge University Press; Cambridge.

Sharma, D., & McKenna, F. P. (1998). Differential components of the manual and vocal Stroop tasks. Memory & Cognition, 26 (5), 1033–1040.

Shichel, I., & Tzelgov, J. (2018). Modulation of conflicts in the Stroop effect. Acta Psychologica, 189 , 93–102.

Singer, M. H., Lappin, J. S., & Moore, L. P. (1975). The interference of various word parts on colour naming in the Stroop test. Perception & Psychophysics, 18 (3), 191–193.

Spieler, D. H., Balota, D. A., & Faust, M. E. (1996). Stroop performance in healthy younger and older adults and in individuals with dementia of the Alzheimer’s type. Journal of Experimental Psychology: Human Perception and Performance, 22 (2), 461.

Steinhauser, M., & Hübner, R. (2009). Distinguishing response conflict and task conflict in the Stroop task: Evidence from ex-Gaussian distribution analysis. Journal of Experimental Psychology: Human Perception and Performance, 35 (5), 1398–1412.

Stirling, N. (1979). Stroop interference: An input and an output phenomenon. The Quarterly Journal of Experimental Psychology, 31 (1), 121–132.

Strauss, E., Sherman, E., & Spreen, O. (2007). A compendium of neuropsychological tests: Administration, Norms and Commentary (3rd ed.). Oxford University Press.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18 (6), 643–662.

Sugg, M. J., & McDonald, J. E. (1994). Time course of inhibition in color-response and word-response versions of the Stroop task. Journal of Experimental Psychology: Human Perception and Performance, 20 (3), 647–675.

Treisman, A. M. (1969). Strategies and models of selective attention. Psychological Review, 76 (3), 282–299.

Tsal, Y., & Benoni, H. (2010). Diluting the burden of load: Perceptual load effects are simply dilution effects. Journal of Experimental Psychology: Human Perception and Performance, 36 (6), 1645–1656.

Turken, A. U., & Swick, D. (1999). Response selection in the human anterior cingulate cortex. Nature Neuroscience, 2 , 920–924.

Tzelgov, J., Henik, A., Sneg, R., & Baruch, O. (1996). Unintentional word reading via the phonological route: The Stroop effect with cross-script homophones. Journal of Experimental Psychology: Learning, Memory and Cognition, 22 (2), 336–349.

Van Veen, V., & Carter, C. S. (2005). Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage, 27 (3), 497–504.

Van Voorhis, B. A., & Dark, V. J. (1995). Semantic matching, response mode, and response mapping as contributors to retroactive and proactive priming. Journal of Experimental Psychology: Learning, Memory and Cognition, 21 , 913–932.

Virzi, R. A., & Egeth, H. E. (1985). Toward a Translational Model of Stroop Interference. Memory & Cognition, 13 (4), 304–319.

Walker, R., Deubel, H., Schneider, W., & Findlay, J. (1997). Effect of remote distractors on saccade programming: Evidence for an extended fixation zone. Journal of Neurophysiology, 78 , 1108–1119.

Wheeler, D. D. (1977). Locus of interference on the Stroop test. Perceptual and Motor Skills, 45 , 263–266.

White, D., Risko, E. F., & Besner, D. (2016). The semantic Stroop effect: An ex-Gaussian analysis. Psychonomic Bulletin & Review, 23 (5), 1576–1581.

Wühr, P., & Heuer, H. (2018). The impact of anatomical and spatial distance between responses on response conflict. Memory and Cognition, 46 , 994–1009.

Yamamoto, N., Incera, S., & McLennan, C. T. (2016). A reverse Stroop task with mouse tracking. Frontiers in Psychology, 7, 670.

Zahedi, A., Rahman, R. A., Stürmer, B., & Sommer, W. (2019). Common and specific loci of Stroop effects in vocal and manual tasks, revealed by event-related brain potentials and post-hypnotic suggestions. Journal of Experimental Psychology: General. Epub ahead of print. https://doi.org/10.1037/xge0000574

Zhang, H., & Kornblum, S. (1998). The effects of stimulus–response mapping and irrelevant stimulus–response and stimulus–stimulus overlap in four-choice Stroop tasks with single-carrier stimuli. Journal of Experimental Psychology: Human Perception and Performance, 24 (1), 3–19.

Zhang, H. H., Zhang, J., & Kornblum, S. (1999). A parallel distributed processing model of stimulus–stimulus and stimulus–response compatibility. Cognitive Psychology, 38 (3), 386–432.


The work reported was supported in part by ANR Grant ANR-19-CE28-0013 and RIN Tremplin Grant 19E00851 of Normandie Région, France.

Author information

Authors and affiliations.

Department of Psychology, Faculty of Science and Technology, Bournemouth University, Talbot Campus, Poole, Fern Barrow, BH12 5BB, UK

Benjamin A. Parris, Nabil Hasshim & Michael Wadsley

School of Psychology, University College Dublin, Dublin, Ireland

Nabil Hasshim

Normandie Université, UNIROUEN, CRFDP, 76000, Rouen, France

Maria Augustinova

Université Clermont Auvergne, CNRS, LAPSCO, 63000, Clermont-Ferrand, France

Ludovic Ferrand

School of Applied Social Sciences, De Montfort University, Leicester, UK


Corresponding author

Correspondence to Benjamin A. Parris .

Ethics declarations

Conflict of interest, additional information, publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Parris, B.A., Hasshim, N., Wadsley, M. et al. The loci of Stroop effects: a critical review of methods and evidence for levels of processing contributing to color-word Stroop effects and the implications for the loci of attentional selection. Psychological Research 86 , 1029–1053 (2022). https://doi.org/10.1007/s00426-021-01554-x


Received : 10 July 2020

Accepted : 27 June 2021

Published : 13 August 2021

Issue Date : June 2022

DOI : https://doi.org/10.1007/s00426-021-01554-x


HYPOTHESIS AND THEORY article

Reclaiming the Stroop effect back from control to input-driven attention and perception

Daniel Algom

  • 1 School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
  • 2 Department of Education and Psychology, Open University of Israel, Ra’anana, Israel

According to a growing consensus, the Stroop effect is understood as a phenomenon of conflict and cognitive control. A tidal wave of recent research alleges that incongruent Stroop stimuli generate conflict, which is then managed and resolved by top-down cognitive control. We argue otherwise: control studies fail to account for major Stroop results obtained over a century-long history of research. We list some of the most compelling developments and show that no control account can serve as a viable explanation for major Stroop phenomena and that there exist more parsimonious explanations for other Stroop related phenomena. Against a wealth of studies and emerging consensus, we posit that data-driven selective attention best accounts for the gamut of existing Stroop results. The case for data-driven attention is not new: a mere twenty-five years ago, the Stroop effect was considered “the gold standard” of attention ( MacLeod, 1992 ). We identify four pitfalls plaguing conflict monitoring and control studies of the Stroop effect and show that the notion of top-down control is gratuitous. Looking at the Stroop effect from a historical perspective, we argue that the recent paradigm change from stimulus-driven selective attention to control is unwarranted. Applying Occam’s razor, the effects marshaled in support of the control view are better explained by a selectivity of attention account. Moreover, many Stroop results, ignored in the control literature, are inconsistent with any control account of the effect.

Everyday functioning requires a modicum of ability to attend selectively to the relevant feature of objects, excluding irrelevant or distracting features. In the absence of this ability, one cannot concentrate on texting a friend in the cafeteria, listening to a presentation in class, or negotiating the traffic when driving or walking. Facility at isolating the task-relevant attribute is indispensable for adaptation and survival. The Stroop effect (Stroop, 1935) assays this vital mental faculty. In fact, the Stroop effect is psychology’s oldest and still most popular tool for assessing the ability to focus exclusively on the attribute of interest in the object (Eidels et al., 2010). In Stroop’s (1935) original setup, the objects were color words printed in color, and the relevant attribute for responding was the color (while ignoring the carrier word). To gauge the influence of the task-irrelevant words, the Stroop effect is defined as the difference in color-naming performance between congruent stimuli (the word names its own print color, such as RED in red, where the uppercase term denotes the word and the lowercase term the color) and incongruent stimuli (word and color conflict, such as RED in green). Better performance with congruent than with incongruent stimuli shows that people paid attention to the task-irrelevant words, thereby compromising exclusive focus on the print colors. Had people focused exclusively on the target color, no word-dependent difference in color naming (i.e., no Stroop effect) would have emerged. Nearly a century after Stroop’s landmark study, the effect bearing his name continues to fascinate researchers, sustaining an ever-growing number of studies. Despite the vast literature, the effect has eluded a consensual theoretical resolution.
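The definition above can be written as a short computation. The following Python sketch is only an illustration: the function name `stroop_effect`, the trial format, and the reaction times (in ms) are assumptions made up for the example and do not come from any study.

```python
from statistics import mean

def stroop_effect(trials):
    """Return mean incongruent RT minus mean congruent RT (ms).

    `trials` is a list of (condition, rt_ms) pairs; this format is a
    hypothetical convention chosen for the illustration.
    """
    congruent = [rt for cond, rt in trials if cond == "congruent"]
    incongruent = [rt for cond, rt in trials if cond == "incongruent"]
    return mean(incongruent) - mean(congruent)

# Made-up color-naming RTs, for illustration only.
trials = [
    ("congruent", 600),    # e.g., RED printed in red
    ("congruent", 620),
    ("incongruent", 700),  # e.g., RED printed in green
    ("incongruent", 720),
]
effect = stroop_effect(trials)
print(effect)
```

A value of zero would indicate that the word had no measurable influence on color naming, which is the outcome expected under perfectly selective attention to the color.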

A Bit of History

The Stroop effect boasts a convoluted history. In the first period, between 1935 and 1964, the effect attracted little interest and was discussed as a learning phenomenon ( MacLeod, 1992 ). In Stevens’ (1951) celebrated handbook, there is but a single passing reference to Stroop in a chapter on learning and retention. After 1964, the theoretical interpretation of the effect changed dramatically to one of attention ( Klein, 1964 ; Jensen and Rohwer, 1966 ). The number of publications rose quickly, and the pace shows no signs of abating to date. The new construal of the Stroop effect occurred contemporaneously with the advent of the cognitive paradigm in psychology. The trend of accommodating attention peaked in the last decade of the Twentieth century. Colin M. MacLeod, author of the definitive review ( MacLeod, 1991 ), called the Stroop effect “one of the benchmark measures of attention ” ( MacLeod, 1992 , p. 12).

However, the dominant conceptual framing of the Stroop effect changed yet again at around the turn of the twenty-first century. The new approach centered on the notions of “conflict” and “control.” It was actually the latter term that was first popularized by Posner and his associates (e.g., Posner and Petersen, 1990 ; Posner and Raichle, 1994 ; see also, Petersen and Posner, 2012 ). These authors conceived performance in the Stroop task to be under “executive control” ( Fan et al., 2002 , p. 341) or simply as an “executive function” ( Petersen and Posner, 2012 , p. 73) under the control of well localized brain loci (in particular, the anterior cingulate system). Of course, it would be absurd to deny brain control of whatever we do, but assuming minute monitoring and very-small-scale response adjustments via central command ignores the influence of input-driven bottom-up processes. An all-engulfing central control view would still need to explain the ways and means of top-down penetration of Stroop performance on such a fine-grain scale. For all his efforts at identification of brain loci for cognitive functions, Posner was aware of the fact that these associations did not amount to a (Stroop) theory, to wit, “much needs to be learned about the mechanisms ” used by the “executive system” ( Posner and Raichle, 1994 , p. 174, emphasis added). Subsequent development of the control view claimed to identify such a specific top-down mechanism – conflict monitoring and management – which governs Stroop performance. This novel theory of the Stroop effect rests on the original observation by Posner and Raichle, 1994 that “the anterior cingulate system is more active during trials of the Stroop task in which conflict exists than during trials in which it does not” (p. 171). 
However, more recent research increasingly questions an exclusive connection between enhanced activity of the anterior cingulate system and conflict (e.g., Steinhauser and Hübner, 2009; Grinband et al., 2011a; Levin and Tzelgov, 2016; see also again, Posner and Raichle, 1994).

Conflict monitoring theory ( Botvinick et al., 2001 ) proposes that performance in the Stroop task is governed by central control, which adjusts the attention allocated to the target color on a trial-to-trial basis. In particular, Stroop-incongruent stimuli generate a large amount of conflict (due to the mismatch between the color and the word). This conflict, in turn, invites increased control, which subsequently reduces the attention allocated to the task-irrelevant word. It is difficult to overstate the grip on current research of the control account. The fad of conflict monitoring and control is unprecedented within the Stroop milieu; following Schmidt’s (2019) observation, the first few articles published between 1998 and 2004 now combine for over 30,000 citations in the literature (e.g., Carter et al., 1998 ; Botvinick et al., 1999 , 2001 , 2004 ; MacDonald et al., 2000 ; Miller and Cohen, 2001 ; Kerns et al., 2004 ; see Schmidt, 2019 , for an extensive bibliography). The upshot is, the Stroop effect has been appropriated from being an index of input-driven selective attention to a tool for generating conflict and measuring control.

Goal of The Present Review

We believe that the recent paradigm shift in the construal of the Stroop effect is unwarranted. Our goal in this review is to show, against a wealth of recent studies and emerging consensus, that there is in fact no compelling evidence for control or top-down influence in the Stroop effect. Certainly, the term “top-down” is used in a variety of ways in different domains of cognitive psychology (see Firestone and Scholl, 2016 ). Within the Stroop milieu, “top-down” influence is currently conceived as an overall strategy, which is typically determined in advance. It is exercised through control and results in adaptation to conflict. It is this meaning of “top-down” influence that we challenge as a valid theory of the Stroop effect.

We are not alone in challenging the conflict monitoring account. In the face of an overwhelming literature, James Schmidt has mounted a powerful attack on the psychological reality of conflict monitoring and control, repeatedly dubbing them “an illusion” (e.g., Schmidt et al., 2015, 2018). In two comprehensive reviews, Schmidt concluded that data-driven explanations (e.g., biased learning and memory) provide a sufficient account of the findings subsumed under conflict monitoring and control (Schmidt, 2013, 2019; see also, Schmidt and Besner, 2008; Schmidt, 2016a, b). Notably, Schmidt’s alternative explanation does not appeal to the notions of conflict and control. Schmidt addresses in admirable detail the various biases lurking in major control studies and concludes that those biases compromise their validity as well as the attendant explanation in terms of conflict and control. Given Schmidt’s contribution and the availability of further comprehensive reviews of the control literature (e.g., Egner, 2008, 2014; Bugg and Chanani, 2011; Bugg and Crump, 2012; Bugg and Hutchison, 2013; Bugg, 2014; Abrahamse et al., 2016; Cohen-Shikora et al., in press), we eschew another general review. Instead, the present article is a theoretical critique of the control account, one rooted in bona fide Stroop literature.

The present review takes the neglect of basic Stroop results in control studies as a point of departure and expands the analysis to show that conflict monitoring and control cannot serve as a viable theory of the Stroop effect. As we recounted, the Stroop effect boasts a long and rich history (rapidly approaching the century mark), but large chunks of this research are ignored in the control literature. We show that factoring in basic findings of proper Stroop research challenges the validity of any theory of conflict monitoring and control.

The Structure of the Review

To anticipate the development, we first state in a concise fashion our main argument. Four pitfalls plaguing control studies of the Stroop effect are then pinpointed. We follow by discussing each point in detail. These discussions, informed by basic Stroop literature, form the backbone of the paper. The understanding that conflict or control accounts do not comprise a viable candidate explanation of the Stroop effect is stated in the section “Conclusion.”

The Main Argument: What is and what is not Explained by Conflict and Control?

Very succinctly, the conflict monitoring account proposes that attention is dynamically allocated to either the target (color) or the distractor (color word) via central control. Each time high conflict is met (by a Stroop-incongruent stimulus), control is engaged to enhance focus on the target. This amplified control is relaxed when high conflict is not experienced (by a Stroop-congruent stimulus). Of the wide range of Stroop-related phenomena (see, e.g., MacLeod, 1991; Melara and Algom, 2003 , or Sabri et al., 2001 , for reviews), the evidence for the conflict monitoring account is based almost exclusively on two effects: the proportion congruent (PC) effect and the sequential effect known as the Gratton effect ( Gratton et al., 1992 ).

What is Explained by Conflict Monitoring and Control?

The PC effect is the observation that the Stroop effect is smaller when there are a disproportionately large number of incongruent stimuli in the set. For example, the Stroop effect is smaller when the stimulus ensemble includes 80% incongruent stimuli (hence 20% congruent stimuli) than when the ensemble includes 20% incongruent stimuli (hence 80% congruent stimuli). The conflict monitoring account provides a ready explanation for this modulation of the Stroop effect: Participants experience a great deal of conflict in the mostly incongruent set, a condition that is bound to summon strong central control. The enhanced control, in turn, results in focused attention to the target attribute. The task-irrelevant word is less attended, and the net result is a small Stroop effect. Therefore, the greater the number of incongruent stimuli, the smaller the Stroop effect.
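The PC effect amounts to comparing the Stroop effect between two stimulus sets that differ in the proportion of incongruent items. The sketch below assumes a hypothetical trial format `(block, congruency, rt_ms)` and uses made-up reaction times; the block labels and numbers are illustrative and are not data from the literature.

```python
from collections import defaultdict
from statistics import mean

def stroop_effect_by_block(trials):
    """Compute the Stroop effect (incongruent minus congruent mean RT)
    separately for each block label."""
    rts = defaultdict(lambda: defaultdict(list))
    for block, congruency, rt in trials:
        rts[block][congruency].append(rt)
    return {
        block: mean(conds["incongruent"]) - mean(conds["congruent"])
        for block, conds in rts.items()
    }

# Made-up RTs (ms): 80% incongruent in one block, 20% in the other.
trials = [
    ("mostly_incongruent", "incongruent", 680),
    ("mostly_incongruent", "incongruent", 690),
    ("mostly_incongruent", "incongruent", 700),
    ("mostly_incongruent", "incongruent", 710),
    ("mostly_incongruent", "congruent", 640),
    ("mostly_congruent", "congruent", 600),
    ("mostly_congruent", "congruent", 610),
    ("mostly_congruent", "congruent", 620),
    ("mostly_congruent", "congruent", 630),
    ("mostly_congruent", "incongruent", 730),
]
effects = stroop_effect_by_block(trials)
# The PC effect is the reduction of the Stroop effect in the
# mostly incongruent block relative to the mostly congruent block.
pc_effect = effects["mostly_congruent"] - effects["mostly_incongruent"]
print(effects, pc_effect)
```

Note that the computation itself is neutral with respect to theory: it describes the pattern to be explained, whether by control or by properties of the stimulus set.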

The Gratton effect is the observation that the (color) response to an incongruent stimulus that follows an incongruent stimulus is faster than the response to an incongruent stimulus that does not follow an incongruent stimulus (i.e., it is preceded by a congruent stimulus). The same explanation is offered by the conflict account, now on a smaller scale. After experiencing conflict on trial n −1, control is invited to exert its influence, so that its salutary effect is observed on trial n . In other words, due to enhanced control, the participant adapts to conflict and maximizes the ability to ignore the task-irrelevant word.
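The sequential contrast just described can be sketched as follows. The function name `gratton_contrast`, the sequence format, and the reaction times are hypothetical choices for illustration; the sketch considers only incongruent trials on trial n, as in the description above.

```python
from statistics import mean

def gratton_contrast(sequence):
    """Mean RT of incongruent trials preceded by a congruent trial
    minus mean RT of incongruent trials preceded by an incongruent trial.

    `sequence` is an ordered list of (congruency, rt_ms) pairs.
    A positive value is the pattern known as the Gratton effect.
    """
    after_incongruent, after_congruent = [], []
    # Pair each trial n with its predecessor, trial n-1.
    for (prev_cond, _), (cond, rt) in zip(sequence, sequence[1:]):
        if cond != "incongruent":
            continue
        if prev_cond == "incongruent":
            after_incongruent.append(rt)
        else:
            after_congruent.append(rt)
    return mean(after_congruent) - mean(after_incongruent)

# Made-up trial sequence (RTs in ms), for illustration only.
sequence = [
    ("congruent", 600),
    ("incongruent", 720),
    ("incongruent", 690),
    ("congruent", 610),
    ("incongruent", 730),
    ("incongruent", 680),
]
contrast = gratton_contrast(sequence)
print(contrast)
```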

In summary, this new account provides reasonably straightforward explanations for these two effects in terms of conflict, control, and conflict adaptation. There is a pitfall, though: Much simpler explanations are available based on properties of the data at hand. We discuss these stimulus-driven explanations and show that they are to be favored over control on grounds of both parsimony and general applicability.

What is not Explained by Conflict Monitoring and Control?

Whereas alternative explanations exist for the PC and the Gratton effects (Schmidt, 2019), conflict monitoring and control theory has real difficulty explaining the following Stroop finding. Presenting the same number of incongruent stimuli can result in a large Stroop effect, a zero Stroop effect, or a reverse Stroop effect (where colors intrude on word naming more than vice versa). The trifling stimulus manipulation that produces these diverse outcomes is a slight change in the relative salience of the color and the word components of the stimulus. It is important to note that the changes of salience are so slight that the words remain eminently legible and the colors remain eminently identifiable under all conditions. These findings are devastating for the control account (e.g., Garner and Felfoldy, 1970; Garner, 1974; Pomerantz, 1983; Melara and Mounts, 1993; Algom et al., 1996; Melara and Algom, 2003; Algom and Fitousi, 2016). Presumably the same amount of conflict is experienced, yet performance changes dramatically regardless of “conflict.”

Quite apart from these observations, portions of the Stroop literature contain studies in which presentation of Stroop stimuli – i.e., conflict generating stimuli – does not yield a Stroop effect (e.g., Flowers et al., 1979 ; McClain, 1983a , b ; Glaser and Glaser, 1989 ). Again, no control explanation is able to account for such results. In general, control theory is unable to explain variation in Stroop results when the amount of conflict is held constant.

A further observation is arguably fatal for control theory: congruent stimuli produce Stroop facilitation (faster color naming to congruent than to neutral stimuli) just as incongruent stimuli produce Stroop interference (faster color naming to neutral than to incongruent stimuli), and the Stroop effect entails both , i.e., the effect is not solely interference. Thus, participants respond “red” faster to the word RED in red than to the word TABLE in red, a result called facilitation, and the Stroop effect is sometimes generated wholly or mostly by facilitation rather than by interference ( Brown, 2011 ; Eidels, 2012 ). The faster RTs to congruent than to neutral stimuli – Stroop facilitation – is not a transient or ephemeral result; it is a systematic effect (as much as Stroop interference), and conflict monitoring theory seems unable to account for a Stroop effect produced by facilitation. Finally, control theory faces difficulty in accounting for Stroop’s original results ( Stroop, 1935 ). In Stroop’s experimental condition, all of the stimuli were incongruent, so that control was presumably very strong. Conflict monitoring theory predicts a small Stroop effect (interference). In sharp contrast to this prediction, Stroop recorded what is arguably the largest Stroop effect in the literature.

In the remainder of the review, we expand on all the above points. We show that effects attributed to central top-down control are actually changes in the stimulus input; the effects are well captured by input-driven attention or its failure. Next, we identify four pitfalls lurking in studies performed under the control approach.

Four Pitfalls in Control Studies of the Stroop Effect

First, arguably the most severe pitfall is that the key term “conflict” in the “conflict-generated-control” approach is vague and imprecise. The problem is already apparent in the widely cited study of Botvinick et al. (2001) , a pioneering undertaking in the field. The notions of “conflict monitoring” and “control” are thoroughly discussed, but what is missing from the text is a clear, unambiguous theoretical definition of the key term “conflict.” Monitoring is rightly showcased as the new development (the added component to the computational model of Cohen et al., 1990 , or that of Cohen and Huston, 1994 ), but what is being monitored is underdefined. In lieu of a theoretical definition, Botvinick et al. (2001) ponder how “conflict might be measured ” or “ operationally defined” (p. 630; emphases added). For a tool, the authors elected to use Hopfield’s (1982) measure of “energy” in a recurrent neural network to indicate the level of conflict; in words, “conflict” is conceived as “the simultaneous activation of incompatible representations … e.g., representations of alternate responses” ( Botvinick et al., 2001 , p. 630). This definition is imprecise as it stands. In particular, the notion of “incompatible representations” is left hopelessly ambiguous.

To understand the cost of the ambiguity, consider the following critical question. Do “conflict” and “incompatible representations” apply only to logically contradictory responses (hence, to truly incompatible responses) or to all possible responses to multidimensional stimuli? To render the question more concrete: Are a circle in green and the word RED in green both conflict stimuli? With the first stimulus, there is no logical or semantic conflict (or agreement) between color and shape. There cannot be congruent and incongruent cases with stimuli composed of color and shape – a green circle is neither more nor less congruent or incongruent than, say, a blue rectangle. The Stroop effect cannot be calculated for such stimuli simply because the Stroop effect is defined by the difference between congruent and incongruent cases. A certain shape and a certain color cannot be in conflict because neither excludes the other; the responses to the shape and the color of a green apple are never incompatible. By contrast, the second stimulus is a Stroop stimulus: The word and the color can match (=congruent stimulus) or conflict (=incongruent stimulus). An incongruent Stroop stimulus is a genuine conflict stimulus because the response to the word excludes the response to the color. The responses to the word and to the color are inescapably incompatible. Conversely, for the congruent Stroop stimulus, RED in red, the responses to the word and the color do not compete with one another as they are the very same single response. Because the responses are compatible (not incompatible), congruent Stroop stimuli are free of conflict. Considering the Botvinick et al. (2001) model, the approach called “conflict monitoring and control” does not appreciate or recognize the qualitative difference between Stroop or conflict stimuli, on the one hand, and non-Stroop or non-conflict stimuli, on the other hand. Adverse consequences ensue for theory and research alike.

In the computational model of Botvinick et al. (2001) , virtually all multidimensional stimuli are conflict stimuli, i.e., Stroop-congruent stimuli such as RED in red and non-Stroop stimuli such as a green apple all are conflict stimuli. This feature alone defies common sense and violates fundamental laws of logic. For common sense, to maintain the absurd thesis that RED in red produces conflict – when both components agree, support, and converge on the same single response – is tantamount to leaving the notion of conflict void of meaning. For logic, to discount the structural difference between the Stroop-incongruent stimulus, RED in green, and the non-Stroop stimulus, green apple, means ignoring the basic law of non-contradiction. For RED in green, the possible responses (red, green) cannot both be true (for that ink color), so that the responses are mutually exclusive. By contrast, for a green apple, the possible responses (green for color and apple for shape) can both be true at the same time, so that the responses are not mutually exclusive. In logic, the truth-functionally compound statements (e.g., Copi, 2015 ) that are (or that can be) associated with RED in green and with a green apple are fundamentally different. Again, this difference is ignored in the model. Thus, Botvinick et al. (2001) affirm in their text that on “ incongruent trials … the intersection of … two pathways … causes conflict ” (p. 631, emphasis added), but this tells only part of the story; in their model, congruent trials also generate (less) conflict.

To recap, the Botvinick et al. model holds that Stroop-congruent stimuli, Stroop-incongruent stimuli, non-Stroop stimuli, all produce conflict to a different degree. The difference is merely quantitative. By contrast, common sense, logic, and insights based on a century of Stroop research hold that (1) incongruent stimuli entail conflict, (2) non-Stroop and neutral stimuli lack the quality of conflict (conflict is orthogonal to such stimuli), and (3) congruent stimuli are free of conflict. Although computationally elegant and manageable (and parsimonious), the idea that Stroop-congruent (and non-Stroop) stimuli cause conflict is conceptually untenable.

The tenuous relation in the model between Stroop-congruity and conflict came to the fore in subsequent extensions of the model, which also included errors ( Yeung et al., 2004 , 2011 ; Yeung and Nieuwenhuis, 2009 ). The extended versions each used a different implementation of the model, which, in turn, affected the Congruity-Conflict predictions to the extent that it was questioned “whether a single unified model of conflict monitoring exists” ( Grinband et al., 2011b , p. 321). In the more recent version of Yeung et al. (2011) , “conflict” is conceived as enhanced anterior cingulate activity that can result from a large variety of sources, including sensory noise, attention fluctuation, and response bias – all of which can and often do “dwarf” congruity-related conflict. Maintaining that “conflict” corresponds to any unrelated sensorimotor activity (that affects RT) leads to the absurd idea that “conflict” exists even when detecting a simple one-dimensional signal with a single response option. This “diffuse definition” of conflict (if it is a definition in the first place) “trivializes” the concept of conflict, making it practically useless ( Grinband et al., 2011b , pp. 321–322). In the final analysis, “conflict” in the Yeung et al. (2011) model is basically independent of congruity and is independent of response compatibility (see again, Grinband et al., 2011b ); the notions of congruity and (in)compatibility that first motivated the Botvinick et al. (2001) effort are trivialized in later implementations of the model. As a result, the model is an ill-suited candidate theory of the Stroop effect.

We identify three fundamental problems with the Botvinick et al. (2001) approach (and its various offspring). First, as noted in Grinband et al. (2011b) , conflict monitoring was never tested against the natural null hypothesis that enhanced anterior cingulate activity is associated with task general processes of perception, attention, and memory, rather than with conflict. When tested against this null hypothesis ( Grinband et al., 2011a ), no evidence for involvement of conflict (monitoring) was found beyond the generic effect of task engagement. The second fundamental problem is that the model couples a highly specific and richly developed concept from cognitive psychology to electrophysiological activity in a certain brain region – ignoring throughout the loaded ramifications of the concept within cognitive science and philosophy. Instead, the model (especially in recent implementations) stretches the notion of conflict beyond reasonable limits (the model might well have used “energy” or any other term to replace the increasingly debilitated “conflict”). The third fundamental problem concerns methodology, namely the scientific value and usefulness of the concepts of “conflict” and “control.” In the model, virtually any act of perception and cognition is marked by conflict. Conflict is lurking beneath such quotidian actions as reading familiar words, deciding between independent non-opposing alternatives, or just responding to any stimulus in an unspecified manner. However, if everything is conflict, then conflict becomes an empty, useless concept. A useful scientific definition should specify not only what is included, but also what is excluded.

Finally, inconsistent with the computational model discussed, the majority of Stroop studies subsumed under the control idea do place conflict quite naturally in Stroop-incongruent stimuli. As a rule, Stroop-incongruent trials are defined as “conflict stimuli,” implying that Stroop-congruent stimuli are free of conflict. This binary conception is the dominant and accepted view in large portions of the control literature. The terms “incongruent stimuli” and “conflict stimuli” are used interchangeably in the control literature (e.g., see the titles of Bugg and Smallwood, 2016 , or of Mayr et al., 2003 ). We reiterate, the term “conflicting stimuli” implies non-conflicting stimuli (i.e., congruent or neutral stimuli), and this distinction actually informs much discussion of the Stroop effect in the control literature. Nevertheless, we return to discuss the implications of basic Stroop findings for the continuum conception entailed in the computational model and show that “conflict” and “control” are superfluous to an explanation of the varieties of Stroop effects.

Second, in the “conflict-generated-control” approach, parallel processing or cross-talk is typically tailored to result in interference. However, cross-talk can also result in facilitation and in a gain to performance ( MacLeod, 1991 ; MacLeod and MacDonald, 2000 ; Roelofs, 2010 ). Again, the prime example in the control literature of interference produced by cross-talk is the Stroop effect. However, the Stroop effect is not solely interference; it is also facilitation. Stroop effects attributed to interference may well be those of facilitation. Unless the effect is partitioned into interference and facilitation, which is rarely done in control studies, one cannot decide the source. Without appropriate measurement, the Stroop effect cannot serve as arbiter of conflict.

Arguably, too, the notion of a Stroop effect produced by facilitation is anathema to the conflict-control approach (e.g., Lindsay and Jacoby, 1994 ; Brown, 2011 ; Eidels, 2012 ). After all, conflict is supposed to generate interference. However, if the same Stroop presentation systematically generates facilitation (rather than conflict and interference), the notion of enhanced control summoned by conflict is called into question.

Third, it is not completely clear where the conflict resides (e.g., Levin and Tzelgov, 2016 ). Does the conflict reside in the stimulus, i.e., impacting early input-driven processing, or does it mainly reside in the response? In the face of a certain level of ambiguity, most discussions and modeling efforts focus on late processing, close to the response. However, this conception can be challenged. Following Garner ( Garner, 1962 , 1970 , 1974 ; Garner and Felfoldy, 1970 ; see also Melara and Algom, 2003 ; Algom and Fitousi, 2016 ), it is eminently possible that the conflict (mainly) resides in the stimulus. The problem is that authors within the control approach ignore the makeup of the stimulus. The perceptual properties of the Stroop stimulus – the physical features of the colors and the fonts used – are neglected. However, these basic perceptual properties can predict whether there will be a Stroop effect to begin with, as well as its direction (standard or reverse). For example, the relative perceptual salience of the presented color and word can determine if there is a Stroop effect, and, if there is, its magnitude ( Garner, 1974 ; Melara and Mounts, 1993 ; Melara and Algom, 2003 ). Presenting Stroop stimuli does not ipso facto guarantee that there is a Stroop effect! Depending on the perceptual properties of the stimuli, the same Stroop presentation can generate a Stroop effect, a zero Stroop effect, or a reverse Stroop effect (by which colors intrude on word reading more than vice versa; e.g., Pomerantz, 1983 ; Pomerantz and Pristach, 1989 ; Algom et al., 1996 ; Dishon-Berkovits and Algom, 2000 ). The upshot is that stimulus properties can determine the Stroop effect without the need to engage any central control mechanism.

Fourth, the makeup of the stimulus is not the only data-driven mechanism governing the Stroop effect. Another data-driven influence on the Stroop effect is the correlation introduced over the experimental trials between the target colors and the task-irrelevant words. Because the Stroop task entails naming the color and because the Stroop effect measures the ability to attend selectively to the color, any color-word correlation introduced compromises exclusive attention to the color. A fair number of control experiments jeopardize the Stroop task by introducing just such a correlation between the relevant ink colors and the irrelevant words. The correlation makes the nominally irrelevant words predictive of the target color, so that attending to the word helps maximize color performance. Inevitably, exclusive attention to the target colors is compromised. The original Stroop task as a measure of the selectivity of attention is disabled.

In several studies within the control approach (e.g., Bugg and Smallwood, 2016 ; Hutchison et al., 2016 ), the correlation between word and color over the experimental trials was created by the lopsided makeup of the block (for example, of a block of 10 trials, eight were congruent). In this case, the nominally irrelevant word largely predicts the target color. The situation is exacerbated by instructions that augment the actual correlation. For example, the participants are told that the majority (say, 80%) of the next block (of, say, 10 trials) will be congruent. The problem again is that this instruction and the attendant design already create a correlation between the nominally irrelevant words and the relevant colors, which is fatal for the selective attention tested ( Dishon-Berkovits and Algom, 2000 ; Melara and Algom, 2003 ; Schmidt and Besner, 2008 ). Apart from the instructions, virtually all control studies entailed a word-color correlation by presenting a (grossly) unequal number of congruent and incongruent stimuli. One must realize that imbalanced presentation of congruent and incongruent stimuli necessarily creates a correlation between the color and word components. Because (1) the Stroop effect measures (the failure of) selective attention to the color and (2) a color-word correlation diverts attention to the irrelevant word, a large Stroop effect is thereby created. Most important, this factor of correlation is stimulus dependent, i.e., it does not invite a central control mechanism to account for the Stroop results. All that is involved is simply the perception of correlation ( Kareev, 1995a , b , 2000 ; Kareev et al., 1997 ).

We note that, in the control approach, providing advance information or biasing the probability of congruent and incongruent stimuli (by grossly imbalanced presentation) is legitimate. In this approach, these procedures are merely a means for generating conflict. What is not recognized though is that this way of generating conflict comes at the expense of compromising the meaning and the serviceability of the original Stroop test (as a tool of measuring selective attention). The manipulation is still called “Stroop,” but, in truth, it has almost nothing to do with the Stroop effect. It is thus hardly surprising that the Stroop effect itself is not calculated or is rendered marginal in a fair number of studies within the control approach (e.g., Hutchison et al., 2016 ; Kleiman et al., 2016 ; see also, Wegner and Erber, 1992 ; Wegner et al., 1993 , on the use of the Stroop task without the calculation of the Stroop effect in “mental control”).

Resolving the Pitfalls within Bona Fide Stroop Research

We proceed by elucidating the problems mentioned, benefiting from the results and insights obtained within Stroop research proper. To anticipate, resolution within genuine Stroop research shows that the notion of control is simply gratuitous as a means for explaining the Stroop phenomenon.

Pitfall 1: General Definition of Conflict and Non-Conflict Stimuli

In the absence of a definition for the basic term, “conflict,” the control approach considers the Stroop stimulus as representative of all multidimensional stimuli. However, not all multidimensional stimuli are conflict or Stroop stimuli. As we recounted, badly missing is the distinction between Stroop and non-Stroop stimuli. The missing distinction is conducive to the absurd notion that the ink-color response “green” to the word RED in green is comparable to the ink-color response “green” to a triangle in green. The missing distinction similarly leads to the notion that these ink-color responses are on the same footing as categorization responses to the word TABLE. Control theory holds that whenever there are multiple alternative responses to the (multidimensional) stimulus, there is conflict (in need of control). This idea, however, ignores the nature of the relations between the alternatives. The alternatives can be conflicting or matching as they are in Stroop-congruent stimuli (e.g., RED in red) or non-conflicting and non-opposing or simply logically unrelated. Stroop stimuli belong in the first class, but other multidimensional stimuli belong in the second class. Control studies blur the all-important dividing line between Stroop and non-Stroop stimuli.

What is the one property telling Stroop and non-Stroop stimuli apart? The defining feature of all Stroop stimuli is the existence of a logical relationship, compatibility or incompatibility, between their components. Each and every Stroop stimulus falls into one of the mutually exclusive and exhaustive classes of congruent or incongruent combinations. For example, all conceivable combinations of a color word and a print color must result in either a congruent (the word naming its color) or an incongruent (word and color mismatch) stimulus. Precluded is any other type of combination. By contrast, there is no logical conflict between the shape and the color of a green triangle. Again, an adequate theory of the Stroop effect must entail the uniqueness of Stroop stimuli as well as their distinct processing.

A ready example highlighting the last point is the so-called “emotional Stroop effect” (e.g., Algom et al., 2004 , 2009 ). The emotional Stroop effect is the difference in color-naming performance between emotional (e.g., the word DEATH printed in red) and neutral (e.g., the word DOOR printed in red) stimuli. Because the words are not color words, these stimuli lack the logical relationship of conflict or correspondence between their attributes. The word DISEASE printed in blue is neither more nor less congruent than the word LECTURE presented in pink. The stimuli in the emotional Stroop task do not divide into congruent and incongruent combinations. Consequently, the Stroop effect cannot be calculated in studies of the emotional Stroop effect. Given a color-naming task, as in the classic Stroop task, the word BLUE printed in yellow (or in blue) is a Stroop stimulus, but the word CANCER printed in yellow (or in any other color) is not a Stroop stimulus. Conflict resides in the first type of stimuli but not in the second type of stimuli. Note that color naming may nonetheless be slower to CANCER than to TABLE, but that slowdown is not a Stroop effect. Clearly, all differences in performance do not derive from conflict.

Pitfall 2: The Stroop Effect: Conflict and Facilitation

The control approach (as a Stroop theory) fails to account for Stroop facilitation. The standard Stroop experiment includes three types of stimuli: congruent stimuli (e.g., the word RED in red), incongruent stimuli (RED in green), and neutral stimuli (e.g., TABLE in red). The following equation defines the Stroop effect in all experimental designs:

Stroop effect = MRT (incongruent) – MRT (congruent)

where MRT is the mean reaction time (RT) to name the ink color. The Stroop effect can be partitioned into Stroop interference (SI), so that SI = MRT (incongruent) – MRT (neutral), and Stroop facilitation (SF), so that SF = MRT (neutral) – MRT (congruent). Therefore, the Stroop effect equals the simple algebraic sum of interference and facilitation,

Stroop effect = SI + SF

Note that the congruent stimulus “RED in red” does not entail any conflict, yet it is often a major contributor to the Stroop effect. People usually respond “red” to “RED in red” faster than they respond “red” to “TABLE in red”(=SF), and this facilitation enhances the observed Stroop effect. The Stroop effect is not equivalent to interference and conflict. It is also possible that the entire Stroop effect is produced by facilitation (e.g., Eidels et al., 2010 ; Eidels, 2012 ). A recognized theory of the Stroop effect, Tectonic theory ( Melara and Algom, 2003 ), ascribes a major part of the Stroop effect to facilitation (rather than to interference).
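The partition can be sketched in a few lines of code. The function name and the mean reaction times below are hypothetical, chosen only to illustrate a case in which facilitation contributes most of the effect; they are not values reported in the studies cited.

```python
def partition_stroop(mrt_congruent, mrt_neutral, mrt_incongruent):
    """Split the Stroop effect into interference (SI) and facilitation (SF).

    All arguments are mean color-naming reaction times in milliseconds.
    """
    interference = mrt_incongruent - mrt_neutral   # SI
    facilitation = mrt_neutral - mrt_congruent     # SF
    stroop_effect = mrt_incongruent - mrt_congruent
    return {"SI": interference, "SF": facilitation, "stroop": stroop_effect}

# Hypothetical mean RTs (ms), invented for illustration.
result = partition_stroop(mrt_congruent=600, mrt_neutral=650, mrt_incongruent=660)
# Here SI = 10 ms and SF = 50 ms, so a 60 ms Stroop effect is mostly facilitation.
```

Without the neutral baseline, only the 60 ms total would be observable, and it could not be attributed to either component.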

It is worth pausing for a moment on the extreme theoretical version developed by Eidels (2012 ; see also Eidels et al., 2010) . Eidels shows that a behavioral Stroop effect can derive from independent processing of the word and the color (i.e., there is an independent horse race between the processing channels). In Eidels’ theory, the color horse does not know the position, speed, or, indeed, the very existence of the word horse. Eidels (2012) uses stochastic modeling based on the following simple idea: For congruent stimuli, both processing channels (word, color) count for the same (correct) response, whereas for incongruent stimuli, only the color channel does. For example, for the congruent stimulus, RED in red, the fastest channel wins the race, producing the correct response for the experimenter regardless of whether it comes from the color (correctly) or from the word (incorrectly, but undetectably). Again, processing is completely independent. If so, there cannot be interference (or facilitation) simply because there does not exist any cross-talk between the processing channels. The notion of control and conflict is gratuitous in Eidels’ theory.
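The logic of an independent race can be illustrated with a toy simulation. This is a simplified sketch, not Eidels' actual stochastic model: the Gaussian finishing times, their parameters, and the function name are assumptions made purely for illustration, and error responses on incongruent trials are ignored.

```python
import random
from statistics import mean

def simulate_independent_race(n_trials=20000, seed=0):
    """Toy race between independent color and word channels.

    Congruent trials: either channel yields the correct response,
    so RT is the faster of two independent finishing times.
    Incongruent trials: only the color channel yields the correct
    response, so RT is the color finishing time alone.
    """
    rng = random.Random(seed)
    congruent_rts, incongruent_rts = [], []
    for _ in range(n_trials):
        # Assumed finishing-time distributions (ms); word is slightly faster.
        congruent_rts.append(min(rng.gauss(650, 80), rng.gauss(600, 80)))
        incongruent_rts.append(rng.gauss(650, 80))
    return mean(congruent_rts), mean(incongruent_rts)

congruent_mean, incongruent_mean = simulate_independent_race()
# The Stroop effect arises with no cross-talk between the channels.
stroop_effect = incongruent_mean - congruent_mean
```

In this sketch the color channel has the same finishing-time distribution in both conditions, so the positive `stroop_effect` comes entirely from taking the faster of two independent channels on congruent trials.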

Ignoring theory, our main point is that merely observing a Stroop effect does not reveal the ingredients of interference and facilitation. Partitioning the effect by including the baseline condition of neutral stimuli is essential for arguing the case of conflict. In this respect, the majority of control studies of the Stroop effect did not include a baseline. Consequently, the Stroop effect cannot serve as a pure assay of conflict and control because the effect entails a significant non-conflict (i.e., facilitation) component. As a result, control cannot serve as a (parsimonious) theory of the Stroop effect.

Pitfall 3: Physical Determinants of the Stroop Effect: The Relative Discriminability of the Words and the Colors

A major determinant of the Stroop effect is the relative salience or discriminability of the different words and ink colors used. When dimensional discriminability is matched, the time and accuracy needed to tell apart the words from one another is the same as the time and accuracy needed to tell apart the ink colors from one another. However, mismatched discriminability favoring words was present in virtually all control studies of the Stroop effect. Without dedicated preparation of the stimulus (not implemented in control studies), it takes participants longer to tell apart the ink colors from one another (e.g., red from green) than the words from one another (e.g., RED from GREEN). The presence of this asymmetry is critical because the more discriminable dimension disrupts performance on the less discriminable dimension ( Sabri et al., 2001 ). Consequently, the task-irrelevant words affect performance with the ink colors (=Stroop effect) not because word reading is the habitual response (which generates conflict), but simply because the words differ perceptually from one another more than do the colors from one another. This factor of relative dimensional salience has been ignored in the control literature with serious consequences for Stroop theory.

To recap, when the words are more salient than the colors (the default Stroop setup in the control literature), the usual Stroop effect appears. However, when the dimensions are made equally discriminable (by presenting appropriately matched values), the Stroop effect collapses. And, when the ink colors are made purposely more salient than the carrier words, a reverse Stroop effect emerges by which the ink colors intrude on word reading. We hasten to add that manipulations of salience entail nothing more than slight adjustment of the fonts (e.g., size, shape) and the colors (intensity, focality); they do not affect legibility or identification. Experimenters were able to produce a Stroop effect and a reverse Stroop effect or to eliminate the effect altogether at will ( Garner and Felfoldy, 1970 ; Pomerantz, 1983 ; Melara and Mounts, 1993 ; Algom et al., 1996 ; Pansky and Algom, 1999 , 2002 ; Sabri et al., 2001 ; Fitousi and Algom, 2006 ; Fitousi et al., 2009 ). A schematic summary of these results is provided in Figure 1 .

Figure 1 . Schematics of the influence of relative salience on the outcome of the Stroop experiment. (Left-hand panel) The words (W) are more discriminable than the ink colors (C), the default setup in control studies. As a result, the irrelevant words intrude on color naming, thereby generating the Stroop effect. (Middle panel) The word and the colors are matched in discriminability, resulting in the elimination of the Stroop asymmetry in interference favoring words. (Right-hand panel) The colors are more discriminable than the words, so that word reading is now subject to interference from the ink colors more than vice versa (= reverse Stroop effect).

The vital role of relative salience was discovered in a seminal work by Garner and Felfoldy (1970) . More recently, Melara and Algom (2003) culled a sample of 35 published results from the Stroop literature and examined the relation between the Stroop effect , on the one hand, and the difference in baseline salience between word and color, on the other hand. The color Baseline task measures pure color performance: neutral words (e.g., TABLE, STREET, and CLOCK) in different colors are presented for color identification. The word Baseline task measures pure word-reading performance: Color words in uniform black are presented for word identification. Performance in these Baseline tasks can be compared to assess the ease or difficulty of classification along each dimension. Note that the Baseline tasks are non-conflict tasks in which the stimuli are one-dimensional. The Pearson correlation found between the word-color difference at baseline and the Stroop effect amounted to 0.78. This means that well over half of the variance in published values of the Stroop effect derives from mismatched salience between word and color. This relation is illustrated in Figure 2 .
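The step from the correlation to the proportion of variance is the usual squaring of the Pearson coefficient, shown here as a worked line using the value reported above:

```latex
r = 0.78 \quad\Rightarrow\quad r^{2} = 0.78^{2} = 0.6084 \approx 0.61
```

That is, roughly 61% of the variance in the sampled Stroop effects is shared with the baseline word-color difference.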

Figure 2 . The influence of stimulus makeup on the Stroop effect: the larger the baseline word-color difference in salience (favoring word), the larger the Stroop effect.

The effect of relative dimensional salience is evident already in Stroop’s classic study ( Stroop, 1935 ). Stroop’s participants named the colors of 100 squares (pure color condition) in 63.3 s, on average, but read 100 words in black (pure word condition) in 41 s, on average – a staggering 22 s mismatch in task difficulty favoring words. When Stroop combined the two dimensions to produce color-word stimuli, word reading remained almost the same as in the pure word condition (mean of 43 s), but color naming was worse in the combined condition than in the pure color condition (mean of 110 s). The literature focused on this asymmetry in interference rather than on the prior asymmetry in baseline performance. However, given the summary of Figure 1 , it is the latter that produced the former. Stroop’s results thus form a special case of the law by which the more salient dimension intrudes on the less salient dimension more than vice versa.
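The two asymmetries can be read directly off the means quoted above (times in seconds per 100 items):

```latex
\begin{aligned}
\text{baseline mismatch (color} - \text{word)} &= 63.3 - 41 = 22.3\ \text{s} \\
\text{cost to color naming in the combined condition} &= 110 - 63.3 = 46.7\ \text{s} \\
\text{cost to word reading in the combined condition} &= 43 - 41 = 2\ \text{s}
\end{aligned}
```

The first line is the asymmetry in baseline performance; the second and third lines together are the asymmetry in interference.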

Implications for the Control Approach

The results obtained with respect to the factor of relative salience are devastating for a control-based explanation of the Stroop effect. Conflict and control are said to depend on the number of conflict stimuli presented, those that produce the Stroop effect. In contrast to this notion, the literature shows that the Stroop effect can differ dramatically even when the number of conflict stimuli is kept constant. The Stroop result depends critically on the input-driven feature of word-color salience – with the same number of conflict stimuli presented. The condition entailing equal discriminability of word and color ( Figure 1 , middle panel) is particularly notable. In this condition, word and color are of equal salience, so that the typical perceptual advantage favoring the word dimension is removed. Despite the presence of a large number of conflict stimuli, the Stroop effect evaporates. In summary, the overall Stroop results mandate a stimulus-driven explanation. When the nominally irrelevant dimension (word) is more salient than the target dimension (color), attention to the color is compromised and expressed as the Stroop effect. However, this result is neither robust nor inevitable ( Dishon-Berkovits and Algom, 2000 ; Melara and Algom, 2003 ). The upshot is that control cannot serve as a viable explanation of the Stroop effect.

Pitfall 4: Color-Word Correlation and Word-Response Contingency Render Central Control Gratuitous

Another major factor affecting the Stroop effect is the number of congruent and incongruent stimuli included in the set. Any imbalance in the respective frequencies introduces a color-word correlation over the experimental presentations. This contextual effect has been attributed to conflict and control. By contrast, we show that the effect is data driven. Let us note that virtually all Stroop studies in the literature entail a biased design in the sense that there is a difference in the frequency of congruent and incongruent stimuli – so that the study entails a color-word correlation. The presence of this correlation renders the nominally irrelevant word predictive of the target ink color. On a trial, first noticing the word provides the participant with a greater than chance probability of guessing the to-be-reported color. By attending to the irrelevant word, the participant thus maximizes color performance. Because the Stroop effect gauges the influence of the irrelevant word (if there is no such influence, the Stroop effect is zero), a large color-word correlation encourages attention to the word, thereby producing a large Stroop effect. Notably, this large Stroop effect is generated by data-driven correlation, not by central control.

It might come as a surprise to realize that biased designs are used in the vast majority of published Stroop studies. Consider the standard and most popular Stroop design in the literature. Four color words are combined with the corresponding four colors in a factorial design to yield the basic matrix of 16 color-word stimuli (see Figure 3). Of these 16 stimuli, four are congruent (in the diagonal of the matrix) and 12 are incongruent (off diagonal). In the face of this asymmetry, investigators typically present an equal number of congruent and incongruent stimuli in the experimental block. The typical block thus includes 36 congruent and 36 incongruent stimuli. Note that this parity is only possible by presenting each congruent stimulus more often than each incongruent stimulus. In the popular design, each congruent stimulus is presented nine times, whereas each incongruent stimulus is presented three times to create the matched frequency of 36 presentations. The a priori probability of a color given a word is not equal across all colors, so that the word becomes predictive of the target color. A color-word correlation thus is created in this standard Stroop design.


Figure 3. Anatomy of the standard Stroop experiment: Four color words are combined factorially with four ink colors to produce 16 color-word combinations. The entries are frequencies of presentations in 72 trials in the typical “balanced” experiment where trials in the congruent and incongruent conditions occur with equal frequency (36 congruent stimuli and 36 incongruent stimuli). The four combinations on the minor diagonal are congruent stimuli, whereas the 12 off-diagonal combinations are incongruent stimuli. The only way to equate the frequency of congruent and incongruent stimuli in the experimental block – the popular practice – is to present each congruent stimulus more often than each incongruent stimulus (in this case, three times as often). This design creates a correlation over the experimental trials between the nominally irrelevant words and the target ink colors.
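The conditional probabilities behind this correlation can be read off the frequency matrix. The following is a minimal sketch, not part of the original study: it assumes the 72-trial block described above (nine presentations per congruent stimulus, three per incongruent stimulus), and the color names and the helper `p_color_given_word` are illustrative choices.

```python
# Frequency matrix of the standard "balanced" 4 x 4 Stroop block (72 trials).
# Color names and function names are illustrative, not taken from any study.
COLORS = ["red", "green", "blue", "yellow"]

def build_block(congruent_reps=9, incongruent_reps=3):
    """Return {(word, color): number of presentations} for the 16 stimuli."""
    return {
        (word, color): congruent_reps if word == color else incongruent_reps
        for word in COLORS
        for color in COLORS
    }

def p_color_given_word(block, word, color):
    """Conditional probability that the ink is `color` given the word."""
    word_total = sum(n for (w, _), n in block.items() if w == word)
    return block[(word, color)] / word_total

block = build_block()
print(sum(block.values()))                        # 72 trials in the block
print(p_color_given_word(block, "red", "red"))    # 9 / 18 = 0.5
print(p_color_given_word(block, "red", "green"))  # 3 / 18, about 0.167
```

With an unbiased allocation, each ink color would follow a given word with probability 0.25; in this block the congruent color is three times as likely as any single incongruent color.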

In point of fact, biased Stroop designs started with Stroop himself ( Stroop, 1935 ). In his experimental block, Stroop used only incongruent stimuli. None of the color words appeared in its own color. Unwittingly, Stroop introduced a correlation between words and colors in his list. Noticing first that the word was RED, the participant could safely infer that the ink color is not red. A sizable correlation was thus created, which, in turn, generated the large Stroop effect observed (see Figure 4 ).


Figure 4. Allocation of colors to words to form the set of color-word stimuli in two experimental situations. The left-hand panel depicts a “negative” correlation, in which only incongruent stimuli are included in the set. This was Stroop’s experimental design in his original study (Stroop, 1935). The negative slope of the regression line illustrates the fact that one dimension is predictive of the other. The right-hand panel depicts a “positive” correlation, in which the conditional probability of a color (word) given a word (color) is greatest for the congruent combinations. This predictive relation is illustrated by the positive slope of the regression line. This relation lurks in the standard, most popular Stroop design in the literature.

In an effort to estimate the influence on the Stroop effect of word-color correlation, Melara and Algom (2003) calculated the correlations lurking in the designs of 35 experiments from the literature. They plotted the Stroop effect against the built-in correlation in the design. The results are noteworthy: the correlation between the Stroop effect and the word-color contingency in the design amounted to 0.69. This means that close to 50% of the variability in the published Stroop effects is attributable to the word-color correlation built into the design of the experiment ( Figure 5 ).
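The step from a correlation of 0.69 to "close to 50%" is the usual squaring of a correlation coefficient to obtain the proportion of shared variance. Written out for convenience:

```latex
r^2 = (0.69)^2 \approx 0.476
```

That is, roughly 48% of the variability in the published Stroop effects.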


Figure 5 . The relation between the color-word correlation built into the experimental design, usually by unequal presentation of congruent and incongruent stimuli (measured by the contingency coefficient, C) and the Stroop effect. The larger the correlation built into the design, the larger the Stroop effect.

If a built-in correlation exists in most standard Stroop studies, the correlation is even more marked and extreme in control studies. As we just recounted, the standard 50–50% congruency design (with four colors and four color words) already entails an appreciable correlation between the words and the colors. The grossly imbalanced congruency structure created in control studies produces an even larger color-word correlation. The common design in control studies typically entails 80% (in)congruent stimuli, which translates to a sizeable color-word correlation. Perception of this correlation suffices to explain the results.
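To give a feel for how the built-in correlation grows with the proportion of congruent trials, the sketch below computes a contingency coefficient for idealized k × k designs. It rests on assumptions that are not stated in the source: C is taken to be Pearson's contingency coefficient, C = sqrt(χ² / (χ² + N)); congruent trials are spread evenly over the k congruent stimuli and incongruent trials evenly over the k(k − 1) incongruent ones; and the function names are hypothetical. It does not reproduce the calculations of Melara and Algom (2003).

```python
import math

def design_table(k, p_congruent, n_trials=72.0):
    """k x k table of presentation frequencies (rows: words, columns: colors)."""
    diag = p_congruent * n_trials / k
    off = (1.0 - p_congruent) * n_trials / (k * (k - 1))
    return [[diag if i == j else off for j in range(k)] for i in range(k)]

def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N)) for a two-way frequency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))

for p in (0.25, 0.50, 0.80):
    c = contingency_coefficient(design_table(4, p))
    print(f"{p:.0%} congruent: C = {c:.2f}")
```

Under these assumptions the coefficient is 0 for chance allocation (25% congruent), 0.50 for the "balanced" 50–50% design, and about 0.79 for an 80% congruent design. Note that C indexes the strength of the contingency, not its direction.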

The upshot is that the notion of fine-grain, centrally imposed control is gratuitous when explaining the Stroop effect. When a correlation makes the words predictive of the colors, people attend to the word, so that exclusive attention to the color is compromised – and a large Stroop effect emerges. People are eminently sensitive to correlations between stimuli in their environment, and the Stroop effect is a manifestation of this sensitivity (Kareev, 2000).

Directional Proportion-Congruity (PC) Effects

Proponents of control or conflict point to the directional effects observed in biased designs: the larger the proportion of incongruent stimuli in the set, the smaller the Stroop effect. At first glance, color-word correlation cannot generate this asymmetric outcome (the PC effect). The PC effect is a major source of evidence presented in support of the control and conflict monitoring account of the Stroop effect. On close scrutiny though, the PC effect results from a correlation between specific words and specific responses in the experiment. In all 2 (word) × 2 (color) designs or in designs in which incongruent stimuli come in a favored color (e.g., the word RED comes mostly in green), the larger the relative number of incongruent stimuli, the larger the correlation between a given word and a given response. This relation is termed the contingency-learning account of Stroop and PC effects (Schmidt and Besner, 2008; Schmidt, 2016a, b, 2019; Schmidt et al., 2018). The contingency account readily explains the PC effect:

… in the mostly congruent condition, words are presented most often in their congruent color (e.g., RED 75% of the time in red). As such, color words are strongly predictive of the congruent response, which benefits congruent trials. On incongruent trials (e.g., RED in green), however, the word mispredicts the color response, resulting in a cost. The net result is an increased Stroop effect. In the mostly incongruent condition, the situation is reversed. Depending on the exact manipulation, color words might be presented most often in a specific incongruent color (e.g., GREEN most often in red). Thus, words are accurately predictive of the incongruent response, and mispredict a congruent response. The net effect is a reduced congruency Stroop effect. What is most interesting about the contingency learning account of the PC effect is that it is unrelated to conflict, control… [On this account], learning of stimulus–response correspondences is all that matters. ( Schmidt, 2016a , p. 1, emphasis added)

Schmidt’s stimulus-driven account shows that the correlation created in biased Stroop designs between the words and the (color) responses readily explains the PC effects, which are otherwise attributed to conflict and control. Applying Occam’s razor, Schmidt’s account is favored over the central control account. We should mention that in general contingency learning is not related to attention per se. However, it is an important contextual factor within the Stroop domain (after all, Stroop is a test of selective attention). Within the Stroop task, contingency affects the selectivity of attention to the stimulus attributes, hence the magnitude of the Stroop effect observed.
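The logic of the contingency account can be made concrete with a toy 2 (word) × 2 (color) design. The sketch below is only an illustration under assumed frequencies (three presentations of the frequent pairing for every one of the rare pairing, i.e., 75%), and the helper name is hypothetical; it simply reports which response each word predicts best.

```python
def predicted_response(counts, word):
    """Return (color, P(color | word)) for the color that `word` predicts best."""
    by_color = {c: n for (w, c), n in counts.items() if w == word}
    total = sum(by_color.values())
    best = max(by_color, key=by_color.get)
    return best, by_color[best] / total

# Hypothetical presentation counts, keyed by (word, ink color).
mostly_congruent = {
    ("RED", "red"): 3, ("RED", "green"): 1,
    ("GREEN", "green"): 3, ("GREEN", "red"): 1,
}
mostly_incongruent = {
    ("RED", "red"): 1, ("RED", "green"): 3,
    ("GREEN", "green"): 1, ("GREEN", "red"): 3,
}

print(predicted_response(mostly_congruent, "RED"))    # ('red', 0.75)
print(predicted_response(mostly_incongruent, "RED"))  # ('green', 0.75)
```

In both conditions the word predicts a response with probability 0.75; what differs is whether the predicted response is the congruent or the incongruent one.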

Are Color-Word Correlation and Word-Response Contingency Both Necessary?

The color-word correlation account by Melara and Algom (2003) and the word-response contingency account by Schmidt (2019) explain variations in the magnitude of the Stroop effect without any reference to the notions of control and conflict adaptation. The two accounts actually complement each other. On both views, the Stroop effect is the result of perception of correlation or contingency in the data (see also Lorentz et al., 2016 ). The correlation and contingency accounts rest on a common principle, but a word seems in order to clarify their distinct roles in the Stroop domain.

Contingency learning best explains the PC effects observed in 2 (word) × 2 (color) designs and in multi-valued designs with favorite pairings of incongruent stimuli. Color-word correlation readily explains the Stroop results obtained in the standard 4 (word) × 4 (color) designs that do not include favorite incongruent pairings. This account also explains the appearance of the Stroop effect in so-called balanced designs entailing 50–50% of congruent and incongruent stimuli. In the study by Dishon-Berkovits and Algom (2000) , incongruent stimuli appeared only once under some conditions (so that contingency learning was impossible), yet the authors showed how color-word correlation produced their results in this unusual matrix. In summary, both the correlation and the contingency varieties are useful in accounting for Stroop results. Significantly, they do so without appeal to central control, conflict, or conflict adaptation.

The Gratton Effect

As we recounted at the outset, the Gratton effect (Gratton et al., 1992) or, more appropriately, the Congruency Sequence effect (Schmidt, 2013, 2019; Weissman et al., 2014), comprises arguably the strongest piece of evidence marshaled in support of the conflict monitoring account. To reconstruct the chronology, the original finding by Gratton and colleagues (Gratton et al., 1992) had lain dormant for almost a decade when it was resuscitated and brought to the fore by Botvinick et al. (2001) to support their newly formed theory of central conflict monitoring. Since the publication of the Botvinick et al. model, research on the Gratton effect has intensified appreciably, sustaining a vigorous debate on the source of the effect: genuine on-line conflict monitoring or yet another trial-sequence-based facilitation (e.g., Effler, 1978; MacLeod, 1991). Given the role of the Gratton effect in deciding the fate of the conflict-monitoring model as a Stroop theory, we devote some space to elucidating the ongoing debate.

The Gratton effect is the sequential variation by which the RT to a Stroop-incongruent stimulus is faster after experiencing another Stroop-incongruent stimulus than after experiencing a Stroop-congruent stimulus (e.g., Mordkoff, 2012; Weissman et al., 2014; Schmidt, 2019). Less attention has been given to the parallel observation that RT to a Stroop-congruent stimulus is usually faster after experiencing another Stroop-congruent stimulus than after experiencing a Stroop-incongruent stimulus (e.g., Mayr et al., 2003). This latter observation alone should have cast doubts on the validity of the conflict monitoring model as a Stroop theory. After all, congruent-congruent sequences do not entail (high) conflict, yet these sequences affect Stroop performance to the same extent as do incongruent-incongruent sequences. The possibility that both types of sequences are accounted for by factors unrelated to conflict becomes all the more likely. The focus on incongruent-incongruent sequences in the literature comes from the theoretical stress on conflict and its on-line resolution. On that view, the role of fine-grain central control during Stroop performance is to enhance target (color) processing and reduce task-irrelevant (word) processing on a trial-by-trial basis. It is these top-down penetrations that produce the Gratton effect: experiencing conflict instantly triggers control activity, which results in better performance on the immediately following trial.
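For concreteness, the sequential effect described above is commonly quantified as the congruency effect after congruent trials minus the congruency effect after incongruent trials. The sketch below computes that quantity and its two components from a trial list. The trial format, the function name, and the RT values are made up for illustration, and error trials and outlier trimming are ignored.

```python
from collections import defaultdict
from statistics import mean

def congruency_sequence_effect(trials):
    """trials: list of (is_congruent, rt_ms) in presentation order.

    Returns (cse, incongruent_benefit, congruent_benefit), where
    cse = (CI - CC) - (II - IC) and the first letter is the previous trial.
    """
    cells = defaultdict(list)
    for (prev_congruent, _), (cur_congruent, rt) in zip(trials, trials[1:]):
        cells[(prev_congruent, cur_congruent)].append(rt)
    cc = mean(cells[(True, True)])
    ci = mean(cells[(True, False)])
    ic = mean(cells[(False, True)])
    ii = mean(cells[(False, False)])
    incongruent_benefit = ci - ii   # incongruent faster after incongruent
    congruent_benefit = ic - cc     # congruent faster after congruent
    return (ci - cc) - (ii - ic), incongruent_benefit, congruent_benefit

# Invented RTs (ms), for illustration only.
trials = [(True, 500.0), (True, 480.0), (False, 620.0), (False, 580.0),
          (True, 520.0), (False, 630.0), (True, 530.0), (True, 490.0)]
print(congruency_sequence_effect(trials))  # (85.0, 45.0, 40.0)
```

The overall effect is the sum of the two components, so a congruent-congruent benefit contributes to the measured effect just as an incongruent-incongruent benefit does.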

The Mayr et al. Challenge

Barely a year after the formal development of the central-conflict-monitoring model (Botvinick et al., 2001), Mayr et al. (2003) challenged the ability of the model to provide a valid account of the Gratton effect. In their seminal study, Mayr et al. (2003) correctly pinpointed a central (if implicit at that point) assumption of the conflict monitoring model: The conflict that regulates performance is stimulus-independent. According to the conflict monitoring model, the incongruent-incongruent sequence of RED in green-RED in green (complete repetition) should produce the same adaptation as the incongruent-incongruent sequence of RED in green-BLUE in yellow (complete change). According to conflict monitoring theory, it is the conflict that counts, not the means of generating it. Mayr et al. (2003) showed, in contrast, that the Gratton effect is profoundly stimulus dependent.

Mayr et al. (2003) used the flanker task [2(targets) × 2(flankers)], noting that complete repetitions comprise 50% of the incongruent-incongruent sequences in any standard flanker task (as do 50% of the congruent-congruent sequences). They recorded the typical Gratton effect in their experiment. However, when the authors examined their data separately for sequences of complete repetition and sequences entailing change, they found the Gratton effect only for the former. Mayr et al. (2003) concluded that “stimulus specific repetition … can provide a complete explanation of the … pattern observed” (p. 451). The authors then conceived a second flanker experiment where immediate complete repetitions were eliminated altogether and where response repetitions were also eliminated (by presenting the flanker display horizontally or vertically on alternate trials and requiring appropriate left-right or up-down responses). Note that the absence of repetitions is irrelevant for the conflict monitoring account, but it is critical for accounts based on input-driven processes (in particular, on priming of complete repetitions). The latter account predicts that eliminating repetitions should eliminate the Gratton effect. Consistent with this prediction, no Gratton effect was observed in Mayr et al.’s (2003) second experiment.
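The 50% figure follows from simple enumeration. As a sketch, assuming each incongruent (or congruent) display is equally likely on every trial and that successive trials are sampled independently, one can list all ordered pairs of displays and count the exact repeats. The labels and the function name are illustrative.

```python
from itertools import product

def complete_repetition_share(values, congruent):
    """Share of same-congruency trial pairs that repeat the whole display."""
    displays = [
        (target, flanker)
        for target, flanker in product(values, repeat=2)
        if (target == flanker) == congruent
    ]
    pairs = list(product(displays, repeat=2))   # (trial n-1, trial n)
    repeats = [pair for pair in pairs if pair[0] == pair[1]]
    return len(repeats) / len(pairs)

print(complete_repetition_share(["<", ">"], congruent=False))  # 0.5
print(complete_repetition_share(["<", ">"], congruent=True))   # 0.5
```

With four target and flanker values instead of two, the same enumeration gives 1/12 for incongruent-incongruent sequences.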

Mayr et al. (2003) noticed a further feature of the data that was inconsistent with the conflict monitoring account. Although immediate repetitions were avoided in their second experiment, such repetitions could and did occur between trial n −2 and trial n . Stimulus-driven accounts predict that an attenuated Gratton effect should still appear on such trial n −2 to trial n repetitions. The conflict monitoring account, by contrast, lacks a mechanism that allows for adaptation to occur across non-conflicting intermediate trials. The results disconfirmed the central-control model, showing instead the presence of adaptation across non-adjacent repetitions. Mayr et al. (2003) stated in their conclusion that “conflict-triggered control is not necessary to explain the [Gratton] effect” (p. 452), that “regulative demands are bypassed by stimulus-driven repetitions” (p. 452), thereby justifying their title on the presence of the Gratton effect “in the absence of executive control.”

Recent Gratton Research

Mayr et al.’s (2003) formative study heavily impacted Gratton research in the ensuing two decades (see Schmidt, 2019, for a review of this research). The Mayr et al. (2003) study made it clear that the standard 2 (targets) × 2 (flankers) flanker task is hopelessly biased by stimulus-stimulus and stimulus-response correlations. The same confounds apply to the Simon task (Simon, 1969; Simon and Berbaum, 1990; see also Hatukai and Algom, 2017) and to the small-set version [2 (words) × 2 (colors)] of the Stroop task. To remove the biases from the Stroop, Simon, and flanker tasks (the last being by far the most popular test used), succeeding investigators applied both of Mayr et al.’s (2003) strategies: statistical and experimental. The first approach allows for stimulus repetitions (complete or of component features) to occur but removes them statistically in subsequent analysis (e.g., Schmidt and De Houwer, 2011; see also Mordkoff, 2012). In the second approach, stimulus and response repetitions are not presented or allowed in the experiment itself. To exclude repetitions from the experimental design, most researchers employed Mayr et al.’s (2003) alternate horizontal-vertical procedure, often extending the flanker design in time (e.g., Schmidt and Weissman, 2014). The overall results obtained (in both approaches) do not support the conflict monitoring account.

Because our goal in this critique is conceptual scrutiny, we next highlight just a few important points (again, see Schmidt, 2019, for a detailed review of recent research). The goal of studies adopting the second “experimental approach” was to test the presence of the Gratton effect under sterile, confound-free stimulus conditions. If the Gratton effect still emerges under such conditions, the central control account receives powerful support. Consequently, strenuous attempts have been made to purge all species of stimulus- and response-based contingencies from the experiment. Unfortunately, the elimination of the confounds came at the cost of eliminating the flanker task itself, i.e., deforming it in a significant way. The popular tactic has been using Mayr et al.’s (2003) horizontal-vertical alternation and extending the task in time, so that the target display is preceded by an advance cue (e.g., Kunde and Wühr, 2006; Schmidt and Weissman, 2014; Weissman et al., 2014). However, this tactic likely compromised the nature of the flanker task as an interference design, so that the results obtained probably hinged on the perceived validity of the advance cue. We note parenthetically that the alternation procedure itself might invite unrelated processes into the experiment (e.g., benefits/costs of switching; see also Schmidt and De Houwer, 2011). It is moot whether the “Gratton effect” observed in such temporal prime-probe tasks is truly comparable with the original effect observed in the standard flanker task. The following Gedanken experiment can clarify this issue, i.e., how the “Gratton effect” can be observed in the absence of conflict or interference.

Suppose that the target display is a shape in color and that the task is to name the color. On different trials, the shape can be a triangle or a circle and its color can be red or green. Suppose further that the display is preceded by a prime, a patch of red or green color. Clearly, a red triangle is not a conflict stimulus, yet a spurious “Gratton effect” may well be observed in this conflict-free task. The prime-probe experiments in the literature, while tightly controlled for stimulus and response confounds, might not comprise a real test of the source of the Gratton effect. The results obtained in the confound-free, prime-probe, and temporal flanker experiments are commensurably mixed and difficult to interpret. Some studies reported the Gratton effect (e.g., Schmidt and Weissman, 2014; Weissman et al., 2014), but further features of the results are difficult to interpret and are certainly inconsistent with a conflict monitoring account. For example, Weissman et al. (2014) did not find a correlation between the Gratton effect and the flanker effect and have sometimes recorded a negative Gratton effect (a larger flanker effect after incongruent-incongruent sequences). Note that a negative Gratton effect is impossible under conflict monitoring.

Considering the Stroop effect itself, methodological problems have been plaguing that research, too. Following the Mayr et al. (2003) study, the 2 (words) × 2 (colors) task is no longer feasible due to the stimulus and response correlations inhering in this design. The popular 4 (words) × 4 (colors) design (see Figure 2) obviously is more appropriate, but there exists the problem of the relative number of congruent stimuli. As we have shown, the popular 50%–50% congruent-incongruent ratio entails a sizeable correlation, biasing performance (Dishon-Berkovits and Algom, 2000; Melara and Algom, 2003; Schmidt and Besner, 2008). Only a truly random allocation of the colors to the words can eliminate this bias. Random combinations in a 4 × 4 design entail a rate of 25% congruent stimuli. However, even this regime is open to further biases related to stimulus sequences. Removing all confounds from the Stroop task (if at all possible) remains a daunting task (Mordkoff, 2012; see also Sabri et al., 2001; Melara and Algom, 2003; Hommel et al., 2004; Schmidt and De Houwer, 2011). Existing research did not match those exacting standards. For example, Weissman et al. (2014) used four color words and four colors but paired each word with only two of the colors. The study by Mayr and Awh (2009) came close, with the authors using a large set of 6 (words) × 6 (colors) and changing the rate of congruent stimuli across separate blocks of the Stroop task. The block with the lowest rate included 30% congruent stimuli, a figure which still deviated appreciably from random allocation (the full matrix of 36 color-word combinations includes six congruent stimuli or 17%, not 30%; see also Schmidt and De Houwer, 2011).
The problems granted, most important for the present concerns is the uniform absence of adaptation or the Gratton effect in the classic Stroop task, a consistent result in studies using either the statistical approach or the experimental approach [we should mention that Duthoo et al. (2014) recorded the Gratton effect in their Stroop tasks, but, again, the control against biases was less than compelling].
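The chance rates quoted above (25% for a 4 × 4 set, 17% for a 6 × 6 set) follow from counting the congruent cells of the full matrix. A minimal check, with a hypothetical helper name and assuming every word-color combination is equally likely:

```python
def chance_congruent_rate(k):
    """Proportion of congruent cells in a full k x k word-by-color matrix."""
    cells = [(word, color) for word in range(k) for color in range(k)]
    congruent = [cell for cell in cells if cell[0] == cell[1]]
    return len(congruent) / len(cells)

print(chance_congruent_rate(4))            # 0.25
print(round(chance_congruent_rate(6), 2))  # 0.17
```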

We conclude with four final observations. First, the hallmark of modern Gratton research is the stimulus dependence of adaptation. Minor changes in preparation and paradigm can determine the presence or magnitude of the Gratton effect. For example, in prime-probe studies, the spatial location of the prime and the probe (same, different) greatly affects the outcome. In a similar vein, stimulus overlap and response overlap in cross-task Gratton studies are a major determinant of adaptation. These observations violate the basic assumption of the conflict monitoring account on the stimulus-independence of adaptation. Second, another basic (if unarticulated) assumption of conflict monitoring is that adaptation is task-independent. In violation of this assumption, recent research has shown that adaptation is singularly task-dependent. The Gratton effect can be observed in the Simon task but not in the Stroop or in the flanker task using the same design within the same study (Weissman et al., 2014). Conflict adaptation typically does not generalize across tasks. And, when conflict in the Stroop task results in adaptation on the next conflict trial in the Simon task, the transfer is typically explained by shared features and task sources. Third, the observation that congruent-congruent sequences produce the same result as incongruent-incongruent sequences implies that the Gratton effect is not related to conflict. Our fourth and final observation is methodological. Extant Gratton research treats “interference tasks” such as those of Stroop, Simon, and flanker on the same footing. However, not all interference or conflict tasks are the same (Chajut et al., 2009). Thus, the flanker and Simon tasks entail spatial attention, with targets and distractors separated in space. The Stroop task, by contrast, does not entail spatial attention: The color and the word occupy the same location in space, so that space-based attention to isolate the target is impossible.
In the Stroop task, people dissect mentally the stimulus object in order to respond to the task-relevant feature.

On balance, the available evidence with regard to the Stroop or Gratton effect is inconsistent with the centrally guided conflict monitoring account. Instead, it is local, input-driven bottom-up processes that likely generate the Gratton phenomenon (when it is observed). It is important to bear in mind that there is in fact a long history of research on sequential effects in the Stroop task. Dalrymple-Alford and Budayr (1966) may have been the first authors to report such effects more than half a century ago. In subsequent research, a fair number of sequential effects have been documented, some entailing interference and some, like the Gratton effect, facilitation (see MacLeod, 1991, for a review). Notably, none of the authors associated with the various effects thought it necessary to invoke the heavy machinery of centrally controlled conflict management as an explanatory device. Given the variety of sequential effects identified within basic Stroop research, the reader may well perceive that there is something not altogether satisfactory about the disproportionate exposure and study of a single facilitatory effect. The reason (not justification) for that one-sided research is obvious: the Gratton effect has been imported to a theory and domain, which, at its roots, is foreign to the Stroop effect.

Performance in the Stroop task and the resulting Stroop effect do not seem to involve higher-order cognitive processes of control, nor does it seem likely that minute top-down penetrations determine responding in the Stroop and allied tasks. The particular theoretical embodiment assuming such trial-by-trial top-down penetrations, the account called conflict monitoring, is not optimally suited to explain the gamut of results obtained over the years in the vast Stroop literature. The conflict monitoring account does not even recognize the existence of major Stroop variables apart from the duo of the PC and Gratton effects (see MacLeod, 1991 and Melara and Algom, 2003, for reviews of Stroop research). Focusing solely on that pair of effects, most monitoring studies are compromised by the input-based confounds noted. The few confound-free studies that did demonstrate adaptation (most did not) – allegedly supporting central control – ignored alternative input-based explanations, at once more plausible and parsimonious. We believe that the converging evidence provided by the findings reviewed in this article confirms the lawful dependence of the Stroop effect on input factors and seriously challenges centrally controlled conflict monitoring as a valid theory of the Stroop effect. All facets of the effect are explained in a straightforward fashion by input-driven selective attention (indeed, its failure). Concerning the PC and Gratton effects in particular, all that is truly involved is perception of color-word correlation and of word-response contingency.

This much granted, we realize that conflict monitoring modelers (e.g., Yeung et al., 2011) may agree with the importance of the factors uncovered in basic Stroop research but maintain that conflict monitoring also plays a role in addition to these factors. This way of reasoning is depicted in Figure 6. Conflict monitoring theory basically entails that conflict (B) drives control (C) so that they produce the Stroop outcome including notably PC and Gratton effects (D). Monitoring modelers probably have no problems with the link between (A), the basic Stroop variables reviewed in this paper, and (B). At first glance, the relation between (A) and (B), the primary theme of this review, might be regarded as orthogonal to the validity of the conflict monitoring account. However, the present review makes it eminently clear that one can get directly from (A) to (D), so that (B) and (C) are not needed. In other words, once one is willing to accept the principles learned from basic Stroop research, then conflict monitoring and control are superfluous added assumptions.


Figure 6 . Possible chain of reasoning accommodating both the basic Stroop findings reviewed in the paper and the conflict monitoring and control account. Briefly, basic Stroop variables (A) drive conflict (B), which, in turn, drives control (C), so that they produce (D) the Stroop outcome, including PC and Gratton effects. The conflict monitoring model basically entails that B and C produce D. However, since it is possible to get directly from A to D, the conflict monitoring model is gratuitous as a Stroop theory.

Of course, there is a trivial sense in which people willfully apply control over what they do and experience. They come to the lab as planned, they choose to perform with their eyes open, and they are in charge of many other perfunctory chores. In the Stroop task itself, people follow quite successfully the instructions to name the colors and ignore (overtly at the least) the words. Indeed, there are task-demand units already included in the computational model of Cohen et al. (1990) . For example, in the study by Bauer and Besner (1997) , the mental set espoused by the observer determined the Stroop outcome with the same stimuli and the same responses. We acknowledge of course these instances of control, but they do not serve (nor are they meant to serve) as a comprehensive theory of the Stroop effect.

Pursuant to the previous point, we also acknowledge that the control and conflict monitoring account includes the notion of attention. However, “attention” in this model is a generic process, governed centrally (by a homunculus?), and, like “conflict,” is not rigorously defined. By contrast, attention as studied in the Stroop literature is a well-defined process of selectivity. It is concerned with determining the quality of focusing on the task-relevant attribute while ignoring irrelevant information. The whole process is governed by bottom-up contextual factors.

Perhaps, also, there would be something instructive to be gained from the way that proponents of control theory come close to espousing the present view in certain cases. These researchers are just unable to jettison the underdefined concept of control even when it is clearly unwarranted to make their case. Thus, Julie Bugg, a leading investigator of control, proposed to classify the accounts of Stroop performance into expectation-based and strategically guided accounts versus experience-based and reactive adjustment accounts (e.g., Bugg et al., 2015). The latter class is comparable to the present approach, but then the authors hasten to add that “experience-based accounts also subsume conflict-monitoring accounts” (Bugg et al., 2015, p. 1350). The same indetermination marks Tom Braver’s influential model, the Dual Mechanisms of Control (DMC; Braver, 2012). Braver, a foremost researcher of control, proposes to distinguish between two species of control, “proactive control” and “reactive control.” The former acts strategically through top-down adjustments, whereas the latter acts locally in response to the stimulus that has just occurred. Concerning reactive control, Braver states that “[it] is stimulus driven and transient … is stimulus dependent … [and] is reliant on strong bottom-up … cues” (Braver, 2012, p. 108). Remove “control” from Braver’s depiction and you have the view that we are presenting here. The problem we noted is that there does not seem to be any process exempt from control in the view of Braver (and of other proponents of control), thereby undermining the value of “control” as a useful scientific concept. Retaining “control” in all places and instances may be due to the peculiarity of these investigators’ disposition: associating each trifling mental act with a specific brain structure and activation (Braver, for one, claimed to pinpoint different loci and activation for proactive and reactive control).
However, such activations have not been shown to be uniquely linked to a specific act or task, and, in any case, recording activation in brain loci does not ipso facto comprise a theory and explanation.

Our skeptical conclusions agree with those arrived at by Schmidt (2019) and by Firestone (2013) and Firestone and Scholl (2016) in the general domain of alleged top-down influences in perception. To echo Firestone (2013), the deepest shortcoming of central conflict monitoring theory is not the lack of support in most available evidence, but that it is simply the wrong kind of theory for the Stroop effect that it has appropriated from input-driven attention.

Author Contributions

Both authors contributed equally to the manuscript.

Preparation of this paper was supported, in part, by an Israel Science Foundation Grant (ISF-274-15) to DA.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We thank our two reviewers for their very helpful comments on earlier drafts of this project. In particular, James Schmidt’s expert input was invaluable in improving the present paper. We also thank Hagar Cohen for her generous assistance with all phases of the work on this project.

Abrahamse, E., Braem, S., Notebaert, W., and Verguts, T. (2016). Grounding cognitive control in associative learning. Psychol. Bull. 142, 693–728. doi: 10.1037/bul0000047

Algom, D., Chajut, E., and Lev, S. (2004). A rational look at the emotional Stroop phenomenon: a generic slowdown, not a Stroop effect. J. Exp. Psychol. Gen. 133, 323–338. doi: 10.1037/0096-3445.133.3.323

Algom, D., Dekel, A., and Pansky, A. (1996). The perception of number from the separability of the stimulus: the Stroop effect revisited. Mem. Cogn. 24, 557–572. doi: 10.3758/BF03201083

Algom, D., and Fitousi, D. (2016). Half a century of research on Garner interference and the separability–integrality distinction. Psychol. Bull. 142, 1352–1383. doi: 10.1037/bul0000072

Algom, D., Zakay, D., Monar, O., and Chajut, E. (2009). Wheel chairs and arm chairs: a novel experimental design for the emotional Stroop effect. Cognit. Emot. 23, 1552–1564. doi: 10.1080/02699930802490243

Bauer, B., and Besner, D. (1997). Processing in the Stroop task: mental set as a determinant of performance. Can. J. Exp. Psychol. 51, 61–68.

Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., and Cohen, J. D. (2001). Conflict monitoring and cognitive control. Psychol. Rev. 108, 624–652.

Botvinick, M. M., Cohen, J. D., and Carter, C. S. (2004). Conflict monitoring and anterior cingulate cortex: an update. Trends Cogn. Sci. 8, 539–546. doi: 10.1016/j.tics.2004.10.003

Botvinick, M. M., Nystrom, L. E., Fissell, K., Carter, C. S., and Cohen, J. D. (1999). Conflict monitoring versus selection-for-action in anterior cingulate cortex. Nature 402, 179–181. doi: 10.1038/46035

Braver, T. S. (2012). The variable nature of cognitive control: a dual mechanisms framework. Trends Cogn. Sci. 16, 106–113. doi: 10.1016/j.tics.2011.12.010

Brown, T. L. (2011). The relationship between Stroop interference and facilitation effects: statistical artifacts, baselines, and a reassessment. J. Exp. Psychol. Hum. Percept. Perform. 37, 85–99. doi: 10.1037/a0019252

Bugg, J. M. (2014). Conflict-triggered top-down control: default mode, last resort, or no such thing? J. Exp. Psychol. Learn. Mem. Cogn. 40, 567–587. doi: 10.1037/a0035032

Bugg, J. M., and Chanani, S. (2011). List-wide control is not entirely elusive: evidence from picture–word Stroop. Psychon. Bull. Rev. 18, 930–936. doi: 10.3758/s13423-011-0112-y

Bugg, J. M., and Crump, M. J. C. (2012). In support of a distinction between voluntary and stimulus-driven control: a review of the literature on proportion congruent effects. Front. Psychol. 3:367.

Bugg, J. M., Diede, N. T., Cohen-Shikora, E. R., and Selmeczy, D. (2015). Expectations and experience: dissociable bases for cognitive control? J. Exp. Psychol. Learn. Mem. Cogn. 41, 1349–1373. doi: 10.1037/xlm0000106

Bugg, J. M., and Hutchison, K. A. (2013). Converging evidence for control of color-word Stroop interference at the item level. J. Exp. Psychol. Hum. Percept. Perform. 39, 433–449. doi: 10.1037/a0029145

Bugg, J. M., and Smallwood, A. (2016). The next trial will be conflicting! Effects of explicit congruency pre-cues on cognitive control. Psychol. Res. 80, 16–33. doi: 10.1007/s00426-014-0638-5

Carter, C. S., Braver, T., Barch, D. M., Botvinick, M., Noll, D., and Cohen, J. D. (1998). Anterior cingulate, error detection and performance monitoring: an event related fMRI study. J. Cogn. Neurosci. 10, 107–107.

Chajut, E., Schupak, A., and Algom, D. (2009). Are spatial and dimensional attention separate? Evidence from Posner, Stroop, and Eriksen tasks. Mem. Cogn. 37, 924–934. doi: 10.3758/MC.37.6.924

Cohen, J. D., Dunbar, K., and McClelland, J. L. (1990). On the control of automatic processes: a parallel distributed processing account of the Stroop effect. Psychol. Rev. 97, 332–361.

Cohen, J. D., and Huston, T. A. (1994). “Progress in the use of interactive models for understanding attention and performance” in Attention and Performance. Vol. XV, eds. C. Umilta and M. Moscovitch (Cambridge, MA: MIT Press), 453–456.

Cohen-Shikora, E. R., Suh, J., and Bugg, J. M. (in press). Assessing the temporal learning account of the list-wide proportion congruence effect. J. Exp. Psychol. Learn. Mem. Cogn. doi: 10.1037/xlm0000670

Copi, I. M. (2015). Symbolic logic. 5th Edn. Indiana: Pearson.

Dalrymple-Alford, E. C., and Budayr, B. (1966). Examination of some aspects of the Stroop Color-Word test. Percept. Mot. Skills 23, 1211–1214.

Dishon-Berkovits, M., and Algom, D. (2000). The Stroop effect: it is not the robust phenomenon that you have thought it to be. Mem. Cogn. 28, 1437–1449. doi: 10.3758/BF03211844

Duthoo, W., Abrahamse, E. L., Braem, S., Boehler, C. N., and Notebaert, W. (2014). The heterogeneous world of congruency sequence effects: an update. Front. Psychol. 5:1001. doi: 10.3389/fpsyg.2014.01001

Effler, M. (1978). The influence of similarity in names of Stroop items. Arch. Psychol. 131, 21–37. (From Psychological Abstracts, 1981, 65, Abstract No. 11873).

Egner, T. (2008). Multiple conflict-driven control mechanisms in the human brain. Trends Cogn. Sci. 12, 374–380. doi: 10.1016/j.tics.2008.07.001

Egner, T. (2014). Creatures of habit (and control): a multi-level learning perspective on the modulation of congruency effects. Front. Psychol. 5:1247. doi: 10.3389/fpsyg.2014.01247

Eidels, A. (2012). Independent race of color and word can predict the Stroop effect. Aust. J. Psychol. 64, 189–198. doi: 10.1111/j.1742-9536.2012.00052.x

Eidels, A., Townsend, J. T., and Algom, D. (2010). Comparing perception of Stroop stimuli in focused versus divided attention: evidence for dramatic processing differences. Cognition 114, 129–150. doi: 10.1016/j.cognition.2009.08.008

Fan, J., McCandliss, B. D., Sommer, T., Raz, A., and Posner, M. I. (2002). Testing the efficiency and independence of attentional networks. J. Cogn. Neurosci. 14, 340–347. doi: 10.1162/089892902317361886

Firestone, C. (2013). How “paternalistic” is spatial perception? Why wearing a heavy backpack doesn’t—and couldn’t—make hills look steeper. Perspect. Psychol. Sci. 8, 455–473. doi: 10.1177/1745691613489835

Firestone, C., and Scholl, B. J. (2016). Cognition does not affect perception: evaluating the evidence for “top-down” effects. Behav. Brain Sci. 39:e229. doi: 10.1017/S0140525X15000965

Fitousi, D., and Algom, D. (2006). Size congruity effects with two-digit numbers: expanding the number line? Mem. Cogn. 34, 445–457. doi: 10.3758/BF03193421

Fitousi, D., Shaki, S., and Algom, D. (2009). The role of parity, physical size, and magnitude in numerical cognition: the SNARC effect revisited. Atten. Percept. Psychophys. 71, 143–155. doi: 10.3758/APP.71.1.143

Flowers, J. H., Warner, J. L., and Polansky, M. L. (1979). Response and encoding factors in “ignoring” irrelevant information. Mem. Cogn. 7, 86–94. doi: 10.3758/BF03197589

Garner, W. R. (1962). Uncertainty and structure as psychological concepts. New York, NY: Wiley.

Garner, W. R. (1970). The stimulus in information processing. Am. Psychol. 25, 350–358. doi: 10.1037/h0029456

Garner, W. R. (1974). The processing of information and structure. Oxford: Erlbaum.

Garner, W. R., and Felfoldy, G. L. (1970). Integrality of stimulus dimensions in various types of information processing. Cogn. Psychol. 1, 225–241. doi: 10.1016/0010-0285(70)90016-2

Glaser, W. R., and Glaser, M. O. (1989). Context effects in Stroop-like word and picture processing. J. Exp. Psychol. Gen. 118, 13–42. doi: 10.1037/0096-3445.118.1.13

Gratton, G., Coles, M. G. H., and Donchin, E. (1992). Optimizing the use of information: strategic control of activation of responses. J. Exp. Psychol. Gen. 121, 480–506. doi: 10.1037/0096-3445.121.4.480

Grinband, J., Savitskaya, J., Wager, T. D., Teichert, T., Ferrera, V. P., and Hirsch, J. (2011a). Conflict, error likelihood, and RT: response to Brown & Yeung et al. NeuroImage 57, 320–322. doi: 10.1016/j.neuroimage.2011.04.027

Grinband, J., Savitskaya, J., Wager, T. D., Teichert, T., Ferrera, V. P., and Hirsch, J. (2011b). The dorsal medial frontal cortex is sensitive to time on task, not response conflict or error likelihood. NeuroImage 57, 303–311. doi: 10.1016/j.neuroimage.2010.12.027

Hatukai, T., and Algom, D. (2017). The Stroop incongruity effect: congruity relationship reaches beyond the Stroop task. J. Exp. Psychol. Hum. Percept. Perform. 43, 1098–1114. doi: 10.1037/xhp0000381

Hommel, B., Proctor, R. W., and Vu, K. P. L. (2004). A feature-integration account of sequential effects in the Simon task. Psychol. Res. 68, 1–17. doi: 10.1007/s00426-003-0132-y

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79, 2554–2558.

Hutchison, K. A., Bugg, J. M., Lim, Y. B., and Olsen, M. R. (2016). Congruency precues moderate item-specific proportion congruency effects. Atten. Percept. Psychophys. 78, 1087–1103. doi: 10.3758/s13414-016-1066-y

Jensen, A. R., and Rohwer, W. D. (1966). The Stroop Color-Word test: a review. Acta Psychol. 25, 36–93. doi: 10.1016/0001-6918(66)90004-7

Kareev, Y. (1995a). Through a narrow window: working memory capacity and the detection of covariation. Cognition 56, 263–269.

Kareev, Y. (1995b). Positive bias in the perception of covariation. Psychol. Rev. 102, 490–502.

Kareev, Y. (2000). Seven (indeed, plus or minus two) and the detection of correlations. Psychol. Rev. 107, 397–402. doi: 10.1037/0033-295X.107.2.397

Kareev, Y., Lieberman, I., and Lev, M. (1997). Through a narrow window: sample size and the perception of correlation. J. Exp. Psychol. Gen. 126, 278–287. doi: 10.1037/0096-3445.126.3.278

Kerns, J. G., Cohen, J. D., MacDonald, A. W., Cho, R. Y., Stenger, V. A., and Carter, C. S. (2004). Anterior cingulate conflict monitoring and adjustments in control. Science 303, 1023–1026. doi: 10.1126/science.1089910

Kleiman, T., Trope, Y., and Amodio, D. M. (2016). Cognitive control modulates attention to food cues: support for the control readiness model of self-control. Brain Cogn. 110, 94–101. doi: 10.1016/j.bandc.2016.04.006

Klein, G. S. (1964). Semantic power measured through the interference of words with color-naming. Am. J. Psychol. 77, 576–588. doi: 10.2307/1420768

Kunde, W., and Wühr, P. (2006). Sequential modulations of correspondence effects across spatial dimensions and tasks. Mem. Cogn. 34, 356–367. doi: 10.3758/BF03193413

Levin, Y., and Tzelgov, J. (2016). Contingency learning is not affected by conflict experience: Evidence from a task conflict-free, item-specific Stroop paradigm. Acta Psychol. 164, 39–45. doi: 10.1016/j.actpsy.2015.12.009

Lindsay, D. S., and Jacoby, L. L. (1994). Stroop process dissociations: the relationship between facilitation and interference. J. Exp. Psychol. Hum. Percept. Perform. 20, 219–234. doi: 10.1037/0096-1523.20.2.219

Lorentz, E., McKibben, T., Ekstrand, C., Gould, L., Anton, K., and Borowsky, R. (2016). Disentangling genuine semantic Stroop effects in reading from contingency effects: on the need for two neutral baselines. Front. Psychol. 7:386. doi: 10.3389/fpsyg.2016.00386

MacDonald, A. W., Cohen, J. D., Stenger, V. A., and Carter, C. S. (2000). Dissociating the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive control. Science 288, 1835–1838. doi: 10.1126/science.288.5472.1835

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: an integrative review. Psychol. Bull. 109, 163–203. doi: 10.1037/0033-2909.109.2.163

MacLeod, C. M. (1992). The Stroop task: the “gold standard” of attentional measures. J. Exp. Psychol. Gen. 121, 12–14. doi: 10.1037/0096-3445.121.1.12

MacLeod, C. M., and MacDonald, P. A. (2000). Interdimensional interference in the Stroop effect: uncovering the cognitive and neural anatomy of attention. Trends Cogn. Sci. 4, 383–391. doi: 10.1016/S1364-6613(00)01530-8

Mayr, U., and Awh, E. (2009). The elusive link between conflict and conflict adaptation. Psychol. Res. 73, 794–802. doi: 10.1007/s00426-008-0191-1

Mayr, U., Awh, E., and Laurey, P. (2003). Conflict adaptation effects in the absence of executive control. Nat. Neurosci. 6, 450–452. doi: 10.1038/nn1051

McClain, L. (1983a). Effects of response type and set size on Stroop color-word performance. Percept. Mot. Skills 56, 735–743.

McClain, L. (1983b). Stimulus–response compatibility affects auditory Stroop interference. Percept. Psychophys. 33, 266–270.

Melara, R. D., and Algom, D. (2003). Driven by information: a tectonic theory of Stroop effects. Psychol. Rev. 110, 422–471. doi: 10.1037/0033-295X.110.3.422

Melara, R. D., and Mounts, J. R. W. (1993). Selective attention to Stroop dimensions: effects of baseline discriminability, response mode, and practice. Mem. Cogn. 21, 627–645. doi: 10.3758/BF03197195

Miller, E. K., and Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202.

Mordkoff, J. T. (2012). Observation: three reasons to avoid having half of the trials be congruent in a four-alternative forced-choice experiment on sequential modulation. Psychon. Bull. Rev. 19, 750–757. doi: 10.3758/s13423-012-0257-3

Pansky, A., and Algom, D. (1999). Stroop and Garner effects in comparative judgment of numerals: the role of attention. J. Exp. Psychol. Hum. Percept. Perform. 25, 39–58.

Pansky, A., and Algom, D. (2002). Comparative judgment of numerosity and numerical magnitude: attention preempts automaticity. J. Exp. Psychol. Learn. Mem. Cogn. 28, 259–274. doi: 10.1037/0278-7393.28.2.259

Petersen, S. E., and Posner, M. I. (2012). The attention system of the human brain: 20 years after. Annu. Rev. Neurosci. 35, 73–89. doi: 10.1146/annurev-neuro-062111-150525

Pomerantz, J. R. (1983). Global and local precedence: Selective attention in form and motion perception. J. Exp. Psychol. Gen. 112, 516–540. doi: 10.1037/0096-3445.112.4.516

Pomerantz, J. R., and Pristach, E. A. (1989). Emergent features, attention, and perceptual glue in visual form perception. J. Exp. Psychol. Hum. Percept. Perform. 15, 635–649. doi: 10.1037/0096-1523.15.4.635

Posner, M. I., and Petersen, S. E. (1990). The attention system of the human brain. Annu. Rev. Neurosci. 13, 25–42. doi: 10.1146/annurev.ne.13.030190.000325

Posner, M. I., and Raichle, M. E. (1994). Images of mind. New York: Scientific Researcher.

Roelofs, A. (2010). Attention and facilitation: converging information versus inadvertent reading in Stroop task performance. J. Exp. Psychol. Learn. Mem. Cogn. 36, 411–422. doi: 10.1037/a0018523

Sabri, M., Melara, R. D., and Algom, D. (2001). A confluence of contexts: asymmetric versus global failure of selective attention to Stroop dimensions. J. Exp. Psychol. Hum. Percept. Perform. 27, 515–537. doi: 10.1037/0096-1523.27.3.515

Schmidt, J. R. (2013). Questioning conflict adaptation: proportion congruent and Gratton effects reconsidered. Psychon. Bull. Rev. 20, 615–630. doi: 10.3758/s13423-012-0373-0

Schmidt, J. R. (2016a). Context-specific proportion congruent effects: an episodic learning account and computational model. Front. Psychol. 7:1806. doi: 10.3389/fpsyg.2016.01806

Schmidt, J. R. (2016b). Proportion congruency and practice: a contingency learning account of asymmetric list shifting effects. J. Exp. Psychol. Learn. Mem. Cogn. 42, 1496–1505. doi: 10.1037/xlm0000254

Schmidt, J. R. (2019). Evidence against conflict monitoring and adaptation: an updated review. Psychon. Bull. Rev. 26, 753–771. doi: 10.3758/s13423-018-1520-z

Schmidt, J. R., Augustinova, M., and De Houwer, J. (2018). Category learning in the colour-word contingency learning paradigm. Psychon. Bull. Rev. 25, 658–666. doi: 10.3758/s13423-018-1430-0

Schmidt, J. R., and Besner, D. (2008). The Stroop effect: why proportion congruent has nothing to do with congruency and everything to do with contingency. J. Exp. Psychol. Learn. Mem. Cogn. 34, 514–523. doi: 10.1037/0278-7393.34.3.514

Schmidt, J. R., and De Houwer, J. (2011). Now you see it, now you don’t: controlling for contingencies and stimulus repetitions eliminates the Gratton effect. Acta Psychol. 138, 176–186. doi: 10.1016/j.actpsy.2011.06.002

Schmidt, J. R., Notebaert, W., and Van den Bussche, E. (2015). Is conflict adaptation an illusion? Front. Psychol. 6:172. doi: 10.3389/fpsyg.2015.00172

Schmidt, J. R., and Weissman, D. H. (2014). Congruency sequence effect without feature integration or contingency learning confounds. PLoS One 9:e0102337. doi: 10.1371/journal.pone.0102337

Simon, J. R. (1969). Reactions toward the source of stimulation. J. Exp. Psychol. 81, 174–176.

Simon, J. R., and Berbaum, K. (1990). Effect of conflicting cues on information processing: the ‘Stroop effect’ vs. the ‘Simon effect’. Acta Psychol. 73, 159–170. doi: 10.1016/0001-6918(90)90077-S

Steinhauser, M., and Hübner, R. (2009). Distinguishing response conflict and task conflict in the Stroop task: evidence from ex-Gaussian distribution analysis. J. Exp. Psychol. Hum. Percept. Perform. 35, 1398–1412. doi: 10.1037/a0016467

Stevens, S. S. (1951). Handbook of experimental psychology. New York, NY: Wiley.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. J. Exp. Psychol. 18, 643–662. doi: 10.1037/h0054651

Wegner, D. M., and Erber, R. (1992). The hyperaccessibility of suppressed thoughts. J. Pers. Soc. Psychol. 63, 903–912. doi: 10.1037/0022-3514.63.6.903

Wegner, D. M., Erber, R., and Zanakos, S. (1993). Ironic processes in the mental control of mood and mood-related thought. J. Pers. Soc. Psychol. 65, 1093–1104. doi: 10.1037/0022-3514.65.6.1093

Weissman, D. H., Jiang, J. F., and Egner, T. (2014). Determinants of congruency sequence effects without learning and memory confounds. J. Exp. Psychol. Hum. Percept. Perform. 40, 2022–2037. doi: 10.1037/a0037454

Yeung, N., Botvinick, M. M., and Cohen, J. D. (2004). The neural basis of error detection: conflict monitoring and the error-related negativity. Psychol. Rev. 111, 931–959. doi: 10.1037/0033-295X.111.4.931

Yeung, N., Cohen, J. D., and Botvinick, M. M. (2011). Errors of interpretation and modeling: a response to Grinband et al. NeuroImage 57, 316–319. doi: 10.1016/j.neuroimage.2011.04.029

Yeung, N., and Nieuwenhuis, S. (2009). Dissociating response conflict and error likelihood in the anterior cingulate cortex. J. Neurosci. 29, 14506–14510. doi: 10.1523/JNEUROSCI.3615-09.2009

Keywords: Stroop, control, conflict, salience, congruity, contingency

Citation: Algom D and Chajut E (2019) Reclaiming the Stroop Effect Back From Control to Input-Driven Attention and Perception. Front. Psychol. 10:1683. doi: 10.3389/fpsyg.2019.01683

Received: 08 May 2019; Accepted: 03 July 2019; Published: 02 August 2019.

Copyright © 2019 Algom and Chajut. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Daniel Algom, [email protected]

Stroop Effect Experiment in Psychology

Charlotte Ruhl

Research Assistant & Psychology Graduate

BA (Hons) Psychology, Harvard University

Charlotte Ruhl, a psychology graduate from Harvard College, boasts over six years of research experience in clinical and social psychology. During her tenure at Harvard, she contributed to the Decision Science Lab, administering numerous studies in behavioral economics and social psychology.

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, Ph.D., is a qualified psychology teacher with over 18 years of experience working in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

The Stroop effect is a psychological phenomenon demonstrating interference in the reaction time of a task. It occurs when the name of a color is printed in a color not denoted by the name, making it difficult for participants to identify the color of the word quickly and accurately.

Take-home Messages

  • In psychology, the Stroop effect is the delay in reaction time that arises when automatic and controlled processing of information compete: the meaning of a word interferes with the ability to name the color of the ink in which it is printed.
  • The Stroop test requires individuals to view a list of words printed in a different color than the word’s meaning. Participants are tasked with naming the color of the word, not the word itself, as fast as they can.
  • For example, when presented with the word “green” written in red ink, it is much easier to read the word than to name the color of the ink in which it is written.
  • The interference, or the delay in response time, is measured by comparing results from the conflict condition (word and color mismatch) with a baseline: either a neutral condition (e.g., a block of color) or a congruent condition (a color word in matching ink). Subtracting the baseline from the conflict condition helps to eliminate the influence of general motor responses.
  • Reading, a more powerful automatic process, takes some precedence over color naming, which places higher demands on cognition.
  • Since psychologist John Ridley Stroop first developed this paradigm in 1935, the Stroop task has been modified to help understand additional brain mechanisms and expanded to aid research on brain damage and psychopathology.

What Is The Stroop Effect?

The Stroop effect refers to the difference in reaction times between congruent and incongruent stimuli (MacLeod, 1991).

Congruency, or agreement, occurs when a word’s meaning and font color are the same, for example, when the word “green” is printed in green.

Incongruent stimuli are just the opposite. That is, the word’s meaning and the color in which it is written do not align. For example, the word “green” might be printed in red ink.

The Stroop task asks individuals to name the color of the word instead of reading the word itself.

The delay in reaction time reveals that it is much harder to name the color of a word when the word itself spells another color (an incongruent stimulus) than it is to name the color of the word when the word itself spells that same color (a congruent stimulus).

The First Stroop Experiment

The Stroop effect was first published in 1935 by American psychologist John Ridley Stroop, although discoveries of this phenomenon date back to the nineteenth century (Stroop, 1935).

Building off previous research, Stroop had two main aims in his groundbreaking paper:

  • To examine how incongruency between the color of the word and the word’s content will impair the ability to name the color.
  • To measure what effect practicing reacting to color stimuli in the presence of conflicting word stimuli would have upon the reaction times.

To empirically study these two major aims, Stroop ran three different experiments:

1) Experiment 1:

Participants (70 college undergraduates) were tasked with reading the word aloud, irrespective of its color. In other words, participants must read aloud the word “green” even if written in a different color.

2) Experiment 2:

The second experiment was the opposite of the first. Participants (100 college students) were first asked to name the color of individual squares (instead of the color of words) as a training mechanism for the subsequent task. Afterward, participants had to say the color of the word, regardless of its meaning – the opposite of the experiment 1 procedure.

3) Experiment 3:

The third and final experiment integrated all of the previously mentioned tests with an undergraduate population of 32 participants.

The independent variable (IV) was the congruency of the word name and the font color.

  • Congruent (word name and font color are the same)
  • Incongruent (word name and font color are different)

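The dependent variable (DV) was the time taken to respond. Stroop timed whole lists and reported totals in seconds, as the figures below show; modern versions usually record reaction time (ms) per trial.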
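
The congruency manipulation described above can be sketched in a few lines of Python. This is a minimal illustration, not Stroop's procedure: the color set and the function name `make_trials` are assumptions made for the example.

```python
from itertools import product

# Hypothetical color set, chosen only for illustration.
COLORS = ["red", "green", "blue", "yellow"]

def make_trials(colors):
    """Return every word/ink pairing, labeled by congruency."""
    trials = []
    for word, ink in product(colors, repeat=2):
        # Congruent when the word names its own ink color.
        condition = "congruent" if word == ink else "incongruent"
        trials.append({"word": word, "ink": ink, "condition": condition})
    return trials

trials = make_trials(COLORS)
n_congruent = sum(t["condition"] == "congruent" for t in trials)
print(len(trials), n_congruent)
```

Crossing every word with every ink yields far more incongruent than congruent pairings, so real experiments typically rebalance the proportions.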

After running the three experiments, Stroop drew two main conclusions:

  • The interference of conflicting word stimuli upon the time for naming colors caused an increase of 47.0 seconds or 74.3 percent of the normal time for naming colors printed in just squares.
  • The interference of conflicting color stimuli upon the time for reading words caused an increase of only 2.3 seconds or 5.6 percent over the normal time for reading the same words printed in black.
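
The two conclusions pair an absolute increase with a percentage, so the baseline times they imply can be back-calculated. The sketch below does that arithmetic; the helper name `implied_baseline` is an assumption, and the results are derived from the rounded figures quoted above rather than taken from Stroop's tables.

```python
def implied_baseline(increase_s, percent_increase):
    """Baseline time implied by an absolute and a relative increase."""
    return increase_s / (percent_increase / 100.0)

# Color naming: +47.0 s is stated to be 74.3% of the baseline.
color_baseline = implied_baseline(47.0, 74.3)
# Word reading: +2.3 s is stated to be 5.6% of the baseline.
word_baseline = implied_baseline(2.3, 5.6)
# How many times larger the color-naming cost is in absolute terms.
ratio = 47.0 / 2.3

print(round(color_baseline, 1), round(word_baseline, 1), round(ratio, 1))
```

On these figures, conflicting words slowed color naming by roughly twenty times as many seconds as conflicting colors slowed word reading.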

These tests demonstrate a disparity in the speed of naming colors and reading the names of colors, which may be explained by a difference in training in the two activities.

The word stimulus has been associated with the specific response “to read,” while the color stimulus has been associated with various responses: “to admire,” “to name,” etc.

The observed results might reflect the fact that people have more experience consciously reading words than consciously labeling colors, illustrating a difference in the mechanisms that control these two processes.

How the Stroop Effect Works

Why does the Stroop effect occur? We can tell our brain to do lots of things – store memories, sleep, think, etc. – so why can’t we tell it to do something as easy as naming a color? Isn’t that something we learn to do at a very young age?

Researchers have analyzed this question and come up with several theories that seek to explain the occurrence of the Stroop effect (Sahinoglu & Dogan, 2016).

Speed of processing theory:

The processing speed theory claims that people can read words much faster than they can name colors (i.e., word processing is much faster than color processing).

When we look at the incongruent stimuli (the word “green” printed in red, for example), our brain first reads the word, making it much more difficult to then have to name the color.

As a result, a delay occurs when trying to name the color because doing so is not our brain’s first instinct (McMahon, 2013).

Selective attention theory:

The theory of selective attention holds that recognizing colors, compared to reading words, requires more attention.

Because of this, the brain needs to use more attention when attempting to name a color, making this process take slightly longer (McMahon, 2013).

Automaticity:

A prevalent explanation for the Stroop effect is the automatic nature of reading. When we see a word, its meaning is almost instantly recognized. Thus, when presented with a conflicting color, there’s interference between the automatic reading process and the task of naming the ink color.

This theory argues that recognizing colors is not an automatic process, and thus there is a slight hesitancy when carrying out this action.

Automatic processing is processing in the mind that is relatively fast and requires few cognitive resources.

This type of information processing generally occurs outside of conscious awareness and is common when undertaking familiar and highly practiced tasks.

However, the brain is able to automatically understand the meaning of a word as a result of habitual reading (think back to Stroop’s initial study in 1935 – this theory explains why he wanted to test the effects of practice on the ability to name colors).

Word reading, being more automatic and faster than color naming, results in involuntary intrusions during the color-naming task. Conversely, reading isn’t affected by the conflicting print color.

Researchers in support of this theory posit that automatic reading does not need controlled attention but still uses enough of the brain’s attentional resources to reduce the amount left for color processing (Monahan, 2001).

In a way, this parallels the brain’s dueling modes of thinking – that of “System 1” and “System 2.” Whereas the former is more automatic and instinctive, the latter is slower and more controlled (Kahneman, 2011).

This is similar to the Stroop effect, in which we see a more automatic process trying to dominate over a more deliberative one. The interference occurs when we try to use System 2 to override System 1, thus producing that delay in reaction time.

Parallel distributed processing:

The fourth and final theory proposes that unique pathways are developed when the brain completes different tasks. Some of these pathways, such as reading words, are stronger than others, such as naming colors (Cohen et al., 1990).

Thus, interference is not an issue of processing speed, attention, or automaticity but rather a battle between the stronger and weaker neural pathways.
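
The idea of stronger and weaker pathways competing can be illustrated with a toy calculation. This is a deliberately simplified sketch and not the Cohen et al. (1990) network: the pathway strengths, the attentional gains, the threshold, and the function `response_time` are all hypothetical values and names chosen for the example.

```python
# Toy illustration only: hypothetical numbers, not the Cohen et al. (1990) model.
PATHWAY_STRENGTH = {"color": 1.0, "word": 2.0}  # reading is the stronger pathway
ATTENDED_GAIN, UNATTENDED_GAIN = 1.0, 0.3       # assumed attentional weights
THRESHOLD = 10.0                                 # evidence needed to respond

def response_time(task, condition):
    """Time for the correct response to reach threshold against its rival."""
    other = "word" if task == "color" else "color"
    relevant = PATHWAY_STRENGTH[task] * ATTENDED_GAIN
    irrelevant = PATHWAY_STRENGTH[other] * UNATTENDED_GAIN
    if condition == "congruent":
        drift = relevant + irrelevant  # both pathways support the same answer
    elif condition == "incongruent":
        drift = relevant - irrelevant  # irrelevant pathway supports a rival answer
    else:
        drift = relevant               # neutral: irrelevant pathway is silent
    return THRESHOLD / drift

for task in ("color", "word"):
    interference = response_time(task, "incongruent") - response_time(task, "neutral")
    print(task, round(interference, 2))
```

With these made-up values, the strong word pathway slows color naming far more than the weak color pathway slows word reading, which is the asymmetry the pathway-strength account is meant to capture.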

Additional Research

John Ridley Stroop helped lay the groundwork for future research in this field.

Numerous studies have tried to identify the specific brain regions responsible for this phenomenon, identifying two key regions: the anterior cingulate cortex (ACC) and dorsolateral prefrontal cortex (DLPFC).

Functional MRI (fMRI) scans show activity in the ACC and DLPFC while completing the Stroop test or related tasks (Milham et al., 2003).

The DLPFC assists with memory and executive functioning, and its role during the task is to activate color perception and inhibit word encoding. The ACC is responsible for selecting the appropriate response and properly allocating attentional resources (Banich et al., 2000).

Countless studies that repeatedly test the Stroop effect reveal a few key recurring findings (van Maanen et al., 2009):
  • Semantic interference: Naming the ink color of neutral stimuli (where the color is shown only in blocks, not as a written word) is faster than naming the ink color of incongruent stimuli (where the word differs from its printed color).
  • Semantic facilitation: Naming the ink color of congruent stimuli (where the word and its printed color agree) is faster than for neutral stimuli.
  • Stroop asynchrony: The previous two findings disappear when reading the word, not naming the color, is the task at hand – supporting the claim that it is much more automatic to read words than to name colors.
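
The first two findings in the list above are both differences from the neutral condition, which a short sketch can make concrete. The reaction times below are invented for illustration, and `stroop_effects` is a hypothetical helper, not a standard function.

```python
from statistics import mean

# Hypothetical color-naming reaction times in milliseconds (illustrative only).
rts = {
    "congruent": [610, 590, 640, 600],
    "neutral": [650, 670, 640, 660],
    "incongruent": [760, 790, 740, 770],
}

def stroop_effects(rts_by_condition):
    """Interference and facilitation, each measured against the neutral mean."""
    means = {cond: mean(values) for cond, values in rts_by_condition.items()}
    return {
        "interference": means["incongruent"] - means["neutral"],
        "facilitation": means["neutral"] - means["congruent"],
    }

effects = stroop_effects(rts)
print(effects)
```

Both scores are positive when the findings hold; running the same calculation on word-reading times would be expected to give values near zero.
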
Other experiments have slightly modified the original Stroop test paradigm to provide additional findings.

One study found that participants were slower to name the color of emotion words as opposed to neutral words (Larsen et al., 2006).

Another experiment compared participants with panic disorder, participants with OCD, and control participants. Even when threat words were used as stimuli, the researchers found no difference among the three groups in color-naming performance (Kampman et al., 2002).

A third experiment investigated the relationship between duration and numerosity processing instead of word and color processing.

Participants were shown two series of dots in succession and asked either (1) which series contained more dots or (2) which series lasted longer from the appearance of the first to the last dots of the series.

The incongruency occurred when fewer dots were shown on the screen for longer, and a congruent series was marked by a series with more dots that lasted longer.

The researchers found that numerical cues interfered with duration processing. That is, when fewer dots were shown for longer, it was harder for participants to figure out which set of dots appeared on the screen for longer (Dormal et al., 2006).

Thus, there is a difference between the processing of numerosity and duration. Together, these experiments not only illustrate the many lines of research that Stroop’s initial work opened but also shed light on the intricate processing associations that occur in our brains.

Other Uses and Versions

The purpose of the Stroop task is to measure interference that occurs in the brain. The initial paradigm has since been adapted in several different ways to measure other forms of interference (such as duration and numerosity, as mentioned earlier).

Additional variations measure interference between picture and word processing, direction and word processing, digit and numerosity processing, and central vs. peripheral letter identification (MacLeod, 2015).

The figure below illustrates these four variations:

[Figure: Stroop picture-word experiment]

The Stroop task is also used as a mechanism for measuring selective attention, processing speed, and cognitive flexibility (Howieson et al., 2004).

The Stroop task has also been utilized to study populations with brain damage or mental disorders, such as dementia, depression, or ADHD (Lansbergen et al., 2007; Spreen & Strauss, 1998).

For individuals with depression, an emotional Stroop task (where negative words, such as “grief,” “violence,” and “pain,” are used in conjunction with more neutral words, such as “clock,” “door,” and “shoe”) has been developed.

Research reveals that individuals who struggle with depression are more likely to name the color of a negative word more slowly than that of a neutral word (Frings et al., 2010).

The versatility of the Stroop task paradigm makes it useful in a wide variety of fields within psychology. What was once a test that only examined the relationship between word and color processing has since been expanded to investigate additional processing interferences and to contribute to the study of psychopathology and brain damage.

The development of the Stroop task not only provides novel insights into the ways in which our brain mechanisms operate but also sheds light on the power of psychology to expand and build on past research methods as we continue to uncover more and more about ourselves.

Critical Evaluation

Dishon-Berkovits and Algom (2000) argue that the Stroop effect is not a result of automatic processes but is due to incidental correlations between the word and its color across stimuli.

They suggest that participants unconsciously recognize these correlations, using word cues to anticipate the correct color hue they should name.

When testing with word-word stimuli, Dishon-Berkovits and Algom created positive, negative, and zero correlations.

They observed that zero correlations nearly eliminated Stroop effects, implying that the effects might be more about the way stimuli are presented rather than true indicators of automaticity or attention.

However, their methodology raised concerns:

  • They had difficulty creating zero correlations with color-hue situations.
  • Their study didn’t include a neutral condition, which means interference and facilitation were not examined.
  • There’s a general finding that facilitation effects are smaller than interference effects, which their findings don’t necessarily support.

Despite these considerations, the correlational approach does not invalidate Stroop’s original paradigm or the many studies based on it.

Stroop-based findings have been instrumental in understanding various clinical conditions like anxiety, schizophrenia, ADHD, dyslexia, PTSD, racial attributions, and others.

The takeaway is that while the theory proposed by Dishon-Berkovits and Algom introduces a fresh perspective, it does not negate the established findings and implications of the Stroop effect.

Instead, it encourages a deeper examination of how automaticity and attention might be influenced by certain environmental factors and correlations.

Describe why the Stroop test is challenging for us.

The Stroop test is challenging due to the cognitive conflict it creates between two mental processes: reading and color recognition. Reading is a well-learned, automatic process, whereas color recognition requires more cognitive effort.

When the word’s color and its semantic meaning don’t match, our brain’s automatic response to reading the word interferes with naming the color, causing a delay in response time and an increase in mistakes. This is known as the Stroop effect.

Banich, M. T., Milham, M. P., Atchley, R., Cohen, N. J., Webb, A., Wszalek, T., … & Magin, R. (2000). fMRI studies of Stroop tasks reveal unique roles of anterior and posterior brain systems in attentional selection . Journal of cognitive neuroscience, 12 (6), 988-1000.

Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic processes: a parallel distributed processing account of the Stroop effect . Psychological Review, 97 (3), 332.

Dishon-Berkovits, M., & Algom, D. (2000). The Stroop effect: It is not the robust phenomenon that you have thought it to be .  Memory & Cognition ,  28 , 1437-1449.

Dormal, V., Seron, X., & Pesenti, M. (2006). Numerosity-duration interference: A Stroop experiment . Acta psychologica, 121 (2), 109-124.

Frings, C., Englert, J., Wentura, D., & Bermeitinger, C. (2010). Decomposing the emotional Stroop effect . Quarterly journal of experimental psychology, 63 (1), 42-49.

Howieson, D. B., Lezak, M. D., & Loring, D. W. (2004). Orientation and attention. Neuropsychological assessment , 365-367.

Kahneman, D. (2011). Thinking, fast and slow . Macmillan.

Kampman, M., Keijsers, G. P., Verbraak, M. J., Näring, G., & Hoogduin, C. A. (2002). The emotional Stroop: a comparison of panic disorder patients, obsessive–compulsive patients, and normal controls, in two experiments. Journal of anxiety disorders, 16 (4), 425-441.

Lansbergen, M. M., Kenemans, J. L., & Van Engeland, H. (2007). Stroop interference and attention-deficit/hyperactivity disorder: a review and meta-analysis . Neuropsychology, 21 (2), 251.

Larsen, R. J., Mercer, K. A., & Balota, D. A. (2006). Lexical characteristics of words used in emotional Stroop experiments . Emotion, 6 (1), 62.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: an integrative review . Psychological bulletin, 109 (2), 163.

MacLeod, C. M. (2015). The stroop effect. Encyclopedia of Color Science and Technology.

McMahon, M. (2013). What Is the Stroop Effect. Retrieved November, 11 .

Milham, M. P., Banich, M. T., Claus, E. D., & Cohen, N. J. (2003). Practice-related effects demonstrate complementary roles of anterior cingulate and prefrontal cortices in attentional control . Neuroimage, 18 (2), 483-493.

Monahan, J. S. (2001). Coloring single Stroop elements: Reducing automaticity or slowing color processing? . The Journal of general psychology, 128 (1), 98-112.

Sahinoglu B, Dogan G. (2016). Event-Related Potentials and the Stroop Effect. Eurasian J Med , 48(1), 53‐57.

Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological tests: Administration, norms, and commentary . Oxford University Press.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions . Journal of experimental psychology, 18 (6), 643.

van Maanen, L., van Rijn, H., & Borst, J. P. (2009). Stroop and picture—word interference are two sides of the same coin . Psychonomic bulletin & review, 16 (6), 987-999.

Further information

  • Example of a Stroop effect lab report
  • Picture-word interference is a Stroop effect: A theoretical analysis and new empirical findings


The reverse Stroop effect

Affiliation.

  • 1 Department of Psychology, Swarthmore College, PA 19081, USA.
  • PMID: 10780025
  • DOI: 10.3758/bf03210730

In classic Stroop interference, manual or oral identification of sensory colors presented as incongruent color words is delayed relative to simple color naming. In the experiment reported here, this effect was shown to all but disappear when the response was simply to point to a matching patch of color. Conversely, strong reverse Stroop interference occurred with the pointing task. That is, when the sensory color of a color word was incongruent with that word, responses to color words were delayed by an average of 69 msec relative to a word presented in gray. Thus, incongruently colored words interfere strongly with pointing to a color patch named by the words, but little interference from incongruent color words is found when the goal is to match the color of the word. These results suggest that Stroop effects arise from response compatibility of irrelevant information rather than automatic processing or habit strength.

Publication types

  • Research Support, Non-U.S. Gov't
  • Color Perception / physiology*
  • Random Allocation
  • Reaction Time
  • Visual Perception / physiology*

Experiment in Cognition: Stroop Effect Research Paper

The current research paper investigates the effect of interference in cognitive processes known as the Stroop effect. The study gives an overview of the research conducted on the topic within a century of scientific work in the field of psychology. A variety of modifications of the experiment are discussed, and the theoretical explanations behind the investigated cognitive effect are set out. The method section describes the experimental design, including the characteristics of the participant, the materials used, and the procedure. The results are presented in a table of descriptive statistics. Finally, the findings are discussed to evaluate the hypothesis and explain the theory behind the identified cognitive effect.

Introduction

The human cognitive processes have long been the focus of psychological research. The way the brain processes different kinds of information affects the psychological and emotional spheres of individuals’ lives and constitutes a relevant research topic. The issues of people’s abilities to multitask and manage some physical or cognitive processes automatically present a number of research cases for psychologists. The patterns and specific features of brain work in general, and the functions of attention in particular, are at the center of scientists’ attention. One such pattern is the Stroop effect named after an American psychologist of the first half of the twentieth century who introduced the phenomenon of cognitive interference to the field of neuropsychology. In this research paper, the fundamentals of the Stroop effect, as they are presented by Stroop (1935), will be used as a basis for an experiment investigating the impact of cognitive interference on the time of response to stimuli.

The experiment conducted for this research study is a modification of the original experiment presented by Stroop (1935) in his article “Studies of interference in serial verbal reactions.” The work by Stroop (1935) comprised three related experiments. The first required reading the names of colors when the ink color differed from the color named on the card. The second asked the participants to name the color of the ink in which the words were printed when the color named and the ink used on the same card were different. The third experiment investigated how practice influenced the time of response to interfering stimuli. The results showed that naming the ink colors of conflicting words took 74 percent longer than naming the colors of plain colored squares (Stroop, 1935, p. 651). However, practice reduced the reaction time, which suggests that the underlying cognitive processes can be trained.

The experiment discussed above became a significant benchmark in cognitive psychology that shed light on important questions about cognition. It gave rise to the phenomenon called the Stroop effect, the delay in response to incongruent stimuli, which illustrates basic principles of how the human brain allocates attention and processes information. Many researchers in neuropsychology, emotional psychology, and other branches of science continue to refer to the Stroop effect. Several variations have entered scholarly circles since the first publication of Stroop’s (1935) article, and much work has been done to expand the clinical, theoretical, and scientific implications of the effect. Indeed, according to MacLeod (1991), such psychologists as McCown and Arnoult, Regan, Kipnis and Glickman, Golden, and others have significantly modified the original experiment by changing the conditions of the process, the stimuli, or the presentation of the word. In such a manner, they attempted to measure more complex issues related to interference. The Stroop effect contributes to the study of individual differences in cognition, mental conditions, the flexibility of attention, and other psychological characteristics.

Today, the Stroop task is very well-known and is applied in both clinical settings and for educational or training purposes. Mainly it is used to test “the ability to inhibit cognitive interference, which occurs when the processing of a stimulus feature affects the simultaneous processing of another attribute of the same stimulus” (Scarpina & Tagini, 2017, p. 1). More variations continue to appear, where the semantics of the stimuli is altered to test emotional spheres, or the words are substituted with pictures, which allows obtaining more substantial results (MacLeod, 1991). However, the original test, which is the serial color-word test, remains highly relevant for testing cognitive abilities and the particularities of attention in individuals. Therefore, the standard version of the experiment is going to be carried out within this research study.

The experimental procedure records the exact time the participant takes to respond to the colors of the letter strings under different conditions. The conditions differ in the absence or presence of interfering stimuli. As the academic literature suggests, participants spend more time naming the colors of letter strings under incongruent conditions because of the automaticity of the reading process. Indeed, Stroop (1935), in his third experiment, concluded that practice improves performance. The researcher stated that responses are delayed when the subjects’ “habitual reaction pattern is interfering with reactions to a stimulus for which the subjects do not have a habitual reaction pattern” (Stroop, 1935, p. 658). In other words, people have an automatic habit of reading and understanding the meaning of the words but have to make an effort to name the colors since this process is not automatic.

This explanation refers to the reading habits and might be presented in a modified form. Indeed, the issues of the reading process automaticity are claimed by Megherbi, Elbro, Oakhill, Segui, and New (2017). The scholars investigated the Stroop effect in children whose reading skills are very low. The absence of the effect in such conditions proves the complexity of the cognitive processes that deal with decoding interfering stimuli when participating in the experiment. The researchers argue that this process contains two effects, including “the obligatory decoding of the distracting words” and the ability to “block out, to suppress, or inhibit the potential distraction” (Megherbi et al., 2017, p.657). Therefore, the delay in response to incongruent cards represents the multi-staged cognitive processes.

However, the neuropsychological perspective on the nature of the Stroop effect provides a more detailed explanation of the processes taking place in the human brain when completing the Stroop task. According to Banich (2019), there is a cascade-of-control model that includes four processes behind the Stroop effect. Each of them is processed in separate brain regions and deals with different goals necessary for task completion. Firstly, prefrontal and posterior brain regions come in conflict when processing the information that is either relevant or irrelevant for the task. Secondly, the mid regions of the brain process the working memory to include the information necessary for the task. Thirdly, separate areas of the brain are involved in giving responses in the later stages. Fourthly, “rostral dorsal regions of the anterior cingulate cortex … evaluate the appropriateness of the response selected and send feedback to lateral prefrontal regions to make adjustments in control as needed” (Banich, 2019, p. 2). Thus the inclusion of various loci of the brain to respond to inconsistent stimuli explains the existence of the Stroop effect.

All the above-mentioned theoretical explanations allow for anticipating the results of the current research. Since the experiment involves three conditions (neutral, consistent, and inconsistent), the response time is expected to differ among them according to the level of interference between stimuli. Based on the literature review, the hypothesis is that the time spent on responding to inconsistent stimuli will be longer than the time spent on responding to consistent or neutral stimuli. Also, the response time under the consistent condition will be shorter than under the neutral one due to the facilitating effect of color-word congruence. It is anticipated that the results of the study will be compatible with those obtained in the classical serial color-word experiment.

Participant

For the experiment held in a digital setting, it is sufficient to engage one participant to investigate the particularities of his or her cognition and attention. The participant in the current study is a 24-year-old student at Athabasca University, a healthy individual without any physical or mental issues that might affect the results of the procedure. Before the beginning of the task, the student was instructed on the conditions and rules relevant to the successful completion of the experiment. Also, the individual was informed that no personal information would be disclosed after the procedure; only the results in the form of response time and errors would be collected. The participant has signed the informed consent form, thus agreeing to cooperate.

The experiment is conducted digitally with electronic cards. Therefore, the materials used for the procedure include a computer and software that presents the experiment. The program contains three types of cards representing each of the three task conditions. The four colors used either as the ink color or as the words naming the colors are red, green, blue, and yellow. The first set of cards tests responses in a neutral condition and contains colored letter strings in the form of Xs. The second set measures interference and consists of color words whose meanings do not match the ink color in which they are typed. The third set tests a facilitating condition and consists of color words whose meanings match their ink color. Overall, there are 36 cards of each type, which are displayed twice; ultimately, a participant is asked to respond to 216 cards.

The procedure of the experiment is estimated to last for a maximum of thirty minutes. In the beginning, the participant is instructed on the specifications of the experiment and its features. The informed consent form is signed to obtain agreement for processing the information collected during the experiment. The task is to name the color of the letter strings on the screen regardless of the meaning of the words typed in those colors. The colors used for the experiment are limited to four (red, green, blue, and yellow), each assigned to a particular key on the computer keyboard. More specifically, red is assigned to the ‘z’ key, green to the ‘x’ key, blue to the ‘.’ key, and yellow to the ‘/’ key. The keys assigned to the colors are displayed at the top of the screen throughout the whole experiment; however, it is advised to mark the respective keys with colored patches for better recognition.

It is explained to the participant that the task is to respond to the color of the letters displayed on the screen by pressing the respective key as quickly as possible. Prior to the beginning of the test, a training session is run. Each card is shown on the screen for 1500 milliseconds. The pause between consecutive cards is marked by a white ‘x’ on a black screen and lasts for 500 milliseconds, after which the next card is shown. If the participant does not respond within the designated time or makes a mistake, a short sound is played, and the next card appears on the screen. The sets of cards (two sets of 36 cards for each of the three conditions) are displayed in random order with pauses between them. The participant has to press any key to start a new set of cards.
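The trial structure described above can be sketched in a few lines of Python. This is only an illustration, not the software used in the experiment: the key mapping, the timing values, and the block counts come from the text, while the composition of each 36-card set (equal numbers of each color or word-color pairing), the `XXXX` neutral string, and the function names `make_block`, `make_session`, and `is_correct` are assumptions made for the example.

```python
import itertools
import random

COLORS = ["red", "green", "blue", "yellow"]
# Key assignments as given in the procedure.
KEY_FOR_COLOR = {"red": "z", "green": "x", "blue": ".", "yellow": "/"}
STIMULUS_MS = 1500       # each card stays on screen for 1500 ms
FIXATION_MS = 500        # pause between consecutive cards
CARDS_PER_BLOCK = 36
BLOCKS_PER_CONDITION = 2


def make_block(condition, rng):
    """Return 36 (condition, text, ink) cards for one condition.

    Assumption: every color (or word-color pairing) appears equally often.
    """
    if condition == "neutral":
        pairs = [("XXXX", ink) for ink in COLORS]
    elif condition == "congruent":
        pairs = [(ink, ink) for ink in COLORS]
    elif condition == "incongruent":
        # All 12 ordered pairs in which the word differs from the ink.
        pairs = list(itertools.permutations(COLORS, 2))
    else:
        raise ValueError(f"unknown condition: {condition}")
    repeats = CARDS_PER_BLOCK // len(pairs)
    cards = [(condition, text, ink) for text, ink in pairs] * repeats
    rng.shuffle(cards)
    return cards


def make_session(seed=0):
    """Return six blocks (two per condition) in random order."""
    rng = random.Random(seed)
    blocks = [
        make_block(condition, rng)
        for condition in ("neutral", "incongruent", "congruent")
        for _ in range(BLOCKS_PER_CONDITION)
    ]
    rng.shuffle(blocks)
    return blocks


def is_correct(card, key_pressed):
    """A response is correct when the key matches the ink color."""
    _, _, ink = card
    return key_pressed == KEY_FOR_COLOR[ink]


session = make_session()
total_cards = sum(len(block) for block in session)
print(len(session), total_cards)
```

Under these assumptions the session contains six blocks and 216 cards in total, which matches the count given in the materials description.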

Upon the completion of the experiment, a results table is displayed that shows the data collected during the procedure. Descriptive data showing the time spent on each response, the code of the color, the text of a stimulus, and the type of condition are available to review. Also, the summarized data showing the mean response time, standard deviation, and the number of trials (indicating errors) is presented. Finally, the group results collected from all participants taking part in the experiment and saved on a server are available for a researcher to compare the results of the participant with the average.

The participant followed the instructions thoroughly and responded to each card attentively. The color patches were attached to the keys of the keyboard for convenience. The pauses between different sets of cards were minimal. The summarized data is presented in the form of descriptive statistics in Table 1 and shows the results of the task completion under three different conditions. The mean time of response is indicated in milliseconds spent per card. The standard deviation describes how widely the individual reaction times are spread around the mean within the specific condition. The number of trials shows if there were errors made during the procedure.

As can be seen from the table, the neutral condition required more time to respond than the facilitation condition and less time than the interference condition. The interference condition, the one with the inconsistent representation of words and colors, took the longest time of reaction. Finally, the facilitation condition, the one with the consistent representation of colors and words, was the easiest task and took the least time to react. The number of errors (two) is small and does not affect the results of the study. Overall, the results obtained in the experiment are consistent with the average summarized data collected from 1476 participants. The group mean time is 744.23 in condition one, 809.65 in condition two, and 679.88 in condition three.
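Interference and facilitation scores can be derived from the three group means by comparing each word condition with the neutral baseline. The short sketch below assumes that conditions one, two, and three correspond to the neutral, interference, and facilitation card sets in the order in which the materials were described, and that the values are in milliseconds; the variable names are illustrative only.

```python
# Group mean reaction times as reported in the text (assumed to be in ms).
# Assumption: condition 1 = neutral, 2 = incongruent, 3 = congruent.
group_mean_ms = {
    "neutral": 744.23,
    "incongruent": 809.65,
    "congruent": 679.88,
}

# Interference: how much slower incongruent cards are than the neutral baseline.
interference_ms = round(group_mean_ms["incongruent"] - group_mean_ms["neutral"], 2)

# Facilitation: how much faster congruent cards are than the neutral baseline.
facilitation_ms = round(group_mean_ms["neutral"] - group_mean_ms["congruent"], 2)

print(interference_ms)  # 65.42
print(facilitation_ms)  # 64.35
```

Under this mapping the group data show the expected ordering: congruent cards are fastest, incongruent cards are slowest, and the neutral cards fall in between.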

The current research hypothesized that the time spent on the responses under an inconsistent condition would be longer than that for responding under neutral or consistent conditions. The hypothesis was supported: the dependent variable (the time of reaction) changed with the independent variable (the congruence between word and ink color). The mean time for reacting to the stimuli where words and colors are different is 986.63. The time of reaction to the stimuli where the word meanings and the ink color match is 675.86. Therefore, the reaction to incongruent stimuli is delayed by about 46 percent in comparison to the reaction to congruent word-color representation; put differently, the congruent reaction time is 68.5 percent of the incongruent one. These results match the initial findings made by Stroop (1935). As for the neutral condition, where the participant had to name the color of strings of Xs, the reaction took longer than in the facilitating condition.
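The arithmetic behind the comparison can be checked from the two means quoted in this paragraph. Two different quantities can be derived from them: the relative delay of the incongruent condition, and the congruent time expressed as a share of the incongruent time. The sketch below computes both; the function names are illustrative, and the values are assumed to be in milliseconds.

```python
incongruent_ms = 986.63  # mean reaction time, word and ink color differ
congruent_ms = 675.86    # mean reaction time, word and ink color match


def percent_delay(slow, fast):
    """Extra time taken by the slow condition, as a percentage of the fast one."""
    return (slow - fast) / fast * 100


def percent_of(fast, slow):
    """The fast time expressed as a percentage of the slow time."""
    return fast / slow * 100


delay = percent_delay(incongruent_ms, congruent_ms)
share = percent_of(congruent_ms, incongruent_ms)

print(f"{delay:.1f}")  # 46.0
print(f"{share:.1f}")  # 68.5
```

The two numbers answer different questions, so it matters which one is reported as the delay: the incongruent responses took about 46 percent longer, while the congruent responses took 68.5 percent of the incongruent time.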

To explain the Stroop effect identified within the conducted experiment, it is relevant to refer to the theory of automatic reading reaction and the complexity of the brain processes related to selective attention. In accordance with the claims made by Megherbi et al. (2017), reading is an automatic and uncontrolled reaction to words that occurs effortlessly and provokes an immediate reaction. However, the naming of the colors is not an automatic process; it requires a more complex analysis of information that involves blocking the data retrieved upon reading to respond according to the task requirements. Moreover, because the response has to be given not verbally but by pressing the respective key, the brain takes time to validate the choice of the key. Therefore, the delay in the reaction under the inconsistent condition represents the Stroop effect. On the contrary, when the word is written in the color it names (for example, when the word ‘red’ is typed in red ink), which complies with the facilitating condition, the reaction is significantly faster since it is based solely on automatic reading.

The results of the experiment are valid and consistent with those obtained within the original study conducted by Stroop (1935). The indicators of reaction time might be used to interpret the individual cognitive characteristics of the participant. Nonetheless, the limitations of the research might include the engagement of only one participant, whose results were compared to those of a group. Also, the possible confusion when pressing a button representing a particular color might have biased the speed of reaction. However, the overall findings prove the validity of the Stroop effect and might be used for further research in cognitive psychology.

Banich, M. T. (2019). The Stroop effect occurs at multiple points along a cascade of control: Evidence from cognitive neuroscience approaches. Frontiers in Psychology, 10, 1-12.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109 , 163-203.

Megherbi, H., Elbro, C., Oakhill, J., Segui, J., & New, B. (2017). The emergence of automaticity in reading: Effects of orthographic depth and word decoding ability on an adjusted Stroop measure. Journal of Experimental Child Psychology, 166 , 652-663.

Scarpina, F., & Tagini, S. (2017). The Stroop color and word test. Frontiers in Psychology, 8 , 1-8.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18 , 643-662.

IvyPanda. (2021, August 8). Experiment in Cognition: Stroop Effect. https://ivypanda.com/essays/experiment-in-cognition-stroop-effect/


Published on 19.2.2024 in Vol 26 (2024)

Media Use Behavior Mediates the Association Between Family Health and Intention to Use Mobile Health Devices Among Older Adults: Cross-Sectional Study

Original Paper

  • Jinghui Chang 1*, PhD
  • Yanshan Mai 2*
  • Dayi Zhang 2
  • Xixi Yang 1
  • Anqi Li 1, MSc
  • Wende Yan 2
  • Yibo Wu 3, PhD
  • Jiangyun Chen 1, PhD

1 School of Health Management, Southern Medical University, Guangzhou, China

2 School of Public Health, Southern Medical University, Guangzhou, China

3 School of Public Health, Peking University, Beijing, China

*these authors contributed equally

Corresponding Author:

Jiangyun Chen, PhD

School of Health Management

Southern Medical University

Number 1023, South Shatai Road

Baiyun District

Guangzhou, 510515

Phone: 86 1 858 822 0304

Email: [email protected]

Background: With the advent of a new era for health and medical treatment, characterized by the integration of mobile technology, a significant digital divide has surfaced, particularly in the engagement of older individuals with mobile health (mHealth). The health of a family is intricately connected to the well-being of its members, and the use of media plays a crucial role in facilitating mHealth care. Therefore, it is important to examine the mediating role of media use behavior in the connection between the family health of older individuals and their inclination to use mHealth devices.

Objective: This study aims to investigate the impact of family health and media use behavior on the intention of older individuals to use mHealth devices in China. The study aims to delve into the intricate dynamics to determine whether media use behavior serves as a mediator in the relationship between family health and the intention to use mHealth devices among older adults. The ultimate goal is to offer well-founded and practical recommendations to assist older individuals in overcoming the digital divide.

Methods: The study used data from 3712 individuals aged 60 and above, sourced from the 2022 Psychology and Behavior Investigation of Chinese Residents study. Linear regression models were used to assess the relationships between family health, media use behavior, and the intention to use mHealth devices. To investigate the mediating role of media use behavior, we used the Sobel-Goodman Mediation Test. This analysis focused on the connection between 4 dimensions of family health and the intention to use mHealth devices.

Results: A positive correlation was observed among family health, media use behavior, and the intention to use mHealth devices (r=0.077-0.178, P<.001). Notably, media use behavior was identified as a partial mediator in the relationship between the overall score of family health and the intention to use mHealth devices, as indicated by the Sobel test (z=5.451, P<.001). Subgroup analysis further indicated that a complete mediating effect was observed specifically between family health resources and the intention to use mHealth devices in older individuals with varying education levels.

Conclusions: The study revealed the significance of family health and media use behavior in motivating older adults to adopt mHealth devices. Media use behavior was identified as a mediator in the connection between family health and the intention to use mHealth devices, with more intricate dynamics observed among older adults with lower education levels. Going forward, the critical role of home health resources must be maximized, such as initiatives to develop digital education tailored for older adults and the creation of media products specifically designed for them. These measures aim to alleviate technological challenges associated with using media devices among older adults, ultimately bolstering their inclination to adopt mHealth devices.

Introduction

The 2022 United Nations report on “World Population Prospects” predicted that by 2050, the global population will reach 9.7 billion. Within this demographic shift, 1.5 billion individuals aged 65 and above are anticipated, constituting 16% of the total population [ 1 ]. Notably, the trend of population aging is intensifying. In the context of population dynamics, China, as a heavily populated nation, is undergoing significant and intricate transformations. The Seventh National Population Census of China revealed that there are 264 million individuals aged 60 or older in the country, comprising 18.7% of the overall population [ 2 ]. This underscores the profound changes in China’s demographic landscape. The rapidly increasing aging rate in China poses substantial challenges for the future development of the country’s medical services. Over 180 million older adults in China grapple with chronic diseases, and a staggering 75% of them contend with multiple chronic illnesses [ 3 ]. This places older individuals in a high-risk and vulnerable category, imposing considerable financial and operational burdens on China’s medical and health sector.

Mobile health (mHealth) devices typically encompass mHealth programs and wearable devices [ 4 ]. Functioning as portable tools leveraging internet communication technology, these devices continuously monitor diverse physiological conditions. They have the capability to track and record users’ daily lifestyle and health status data in real-time [ 5 ]. These real-time data are instrumental for users to make informed adjustments to their health behaviors, facilitated by prompt feedback on health information [ 6 ]. The utilization of mHealth devices addresses the emerging need for self-monitoring and self-management within the expanding medical service market, aligning with heightened health awareness among consumers. These devices play a pivotal role in enabling early diagnosis, intervention, clinical treatment, and monitoring of various diseases by continuously supervising vital signs in real-time. However, it is noteworthy that despite the potential benefits, mHealth devices are not widely embraced by older individuals [ 7 ]. Consequently, the robust functionalities and inherent advantages of these devices remain underutilized within this demographic group. Emerging as an inevitable outcome of the internet era and the aging society, mHealth holds substantial potential to offer a promising solution to meet the escalating demands for medical services in developing countries [ 8 ]. Recognizing that older individuals constitute the most frequent and substantial users of health services [ 9 ], it becomes imperative to cultivate a new social trend, encouraging the integration of older individuals with mHealth [ 10 ].

Prior research has demonstrated that mHealth can significantly enhance the health, well-being, and longevity of older individuals in the digital era. However, it also introduces a new social governance challenge—the digital divide among older individuals [ 11 , 12 ]. This divide arises from challenges in accessing or utilizing information infrastructure coupled with a lower level of digital education, resulting in difficulties for older individuals to stay abreast of social, economic, and technological advancements [ 13 ]. As outlined in the 50th Statistical Report on the Development of the Internet in China by the China Internet Network Information Center, individuals aged 60 and above constitute the predominant group of non-netizens, comprising 41.6% of this demographic [ 14 ]. A confluence of personal, family, social, and technological factors collectively contributes to the estrangement of older individuals from engaging with new media, such as the internet [ 15 ]. Research indicates that the motivation for older individuals to actively seek health information on the internet is closely tied to their interactions with family or friends [ 16 ]. Older adults primarily rely on their families for social support, and the cohesion within the family unit significantly influences their overall health status [ 17 , 18 ].

Family health represents a collective resource that emerges from the interconnected well-being of each family member, encompassing their health, interactions, capacities, and the family’s overall physical, social, emotional, economic, and medical resources [ 19 ]. As an interdisciplinary concept, evaluating family health necessitates a thorough examination of various factors, including but not limited to family functioning, emotional support, financial resources, and access to external services [ 20 ]. Existing literature demonstrates that family support plays a pivotal role in motivating older individuals to seek medical services [ 21 ]. Additionally, family function and overall health serve as crucial indicators for assessing the mental well-being of older individuals [ 22 ]. Communication within the family, involving interactions with children, grandchildren, and peer groups, influences older individuals’ inclination to adopt smart senior care solutions [ 23 ]. While numerous articles predominantly explore family health from a singular dimension [ 24 - 26 ], there exists a research gap concerning the specific influence of family health on older individuals’ intention to adopt mHealth devices.

The evolution of mHealth is intricately linked to the technical backing of media. Media technology plays a dual role—it not only generates visual data representing health conditions detected by mHealth devices [ 27 ] but also serves as a platform for the public to exchange and share medical information. In the case of older adults, their acceptance of new health services and access to health information are influenced in distinct ways by the utilization of media devices [ 28 , 29 ]. A Chinese empirical analysis revealed a fundamental correlation between media use and the health level of older adults [ 30 ]. Social media communication is considered an intervention measure to alleviate the loneliness experienced by older adults, achieved by enhancing social support and contact levels, thereby fostering positive responses to emerging technologies [ 31 , 32 ]. Furthermore, the utilization of mobile phones and other media significantly influences disparities in medical care. Increasing the frequency of contact and sustained use of media by older individuals can contribute to unlocking the considerable potential of mobile medical technology in the health care of older individuals [ 33 ].

In summary, there is an immediate and practical need to reduce the digital divide among older adults. The willingness of older individuals to embrace mHealth devices, as reflected in surveys, signifies their acceptance of new health technologies and, to a certain extent, their integration into the era of mHealth. Previous research on factors influencing the intention to use mHealth devices among older adults has predominantly centered on understanding the behavioral motivations and mechanisms behind users’ intentions to use, emphasizing the impact of technical and social aspects on actual usage behavior [ 34 ]. Research on influencing factors has primarily delved into age, gender, education level, BMI, income, and health status, among other individual aspects [ 35 - 37 ]. However, there is a paucity of studies examining external environmental factors, notably the influence of family and social dynamics, particularly among the older adult population in China. A previous study indicated that family internet access enhances older adults’ cognitive function and increases the frequency of media use [ 38 ]. Moreover, family support has been identified as a crucial factor aiding older adults in overcoming barriers to the utilization of mHealth services [ 39 ]. Considering the substantial impact of family factors on the proactive health information-seeking behavior of older individuals [ 40 - 43 ], it becomes imperative to delve deeper into the relationship between family health, media use behavior, and the older individual’s intention to use mHealth devices. Additionally, exploring the mediating role of media use behavior between family health and the older individual’s intention to use mHealth devices is crucial. 
This comprehensive investigation aims to facilitate the integration of older individuals into the “digital age” starting from the family level, foster the adoption of mHealth in the health care sector, enhance societal healthy aging, and contribute to the realization of the objectives outlined in the “Healthy China 2030 Plan.”

In this study, information pertaining to family health, media use behavior, and the intention to use mHealth devices among older adults was gathered from the Psychology and Behavior Investigation of Chinese Residents (PBICR) study. The primary objective of this study was to examine the impact of family health and media use behavior on the intention of older individuals to use mHealth devices in China. Furthermore, the study aimed to assess whether media use behavior acts as a mediating factor in the relationship between family health and the intention to use mHealth devices among older adults. Drawing upon the insights gained from the literature review, the following hypotheses were formulated: (1) family health has a direct impact on the intention to use mHealth devices among older adults; (2) family health exerts an indirect influence on the intention to use mHealth devices through the mediating factor of media use behavior; in other words, media use behavior serves as a mediator in the relationship between family health and the intention to use mHealth devices.

Study Design and Setting

The data for this study were sourced from the PBICR survey, a comprehensive cross-sectional survey initiated by the Peking University School of Public Health in 2022. The survey covered 148 cities spanning 23 provinces, 5 autonomous regions, and 4 municipalities directly under the central government in China. Using a multistage sampling approach, the survey applied stratified sampling at the city, district, county, and community levels and quota sampling from the community level down to the individual level.

The survey was carried out by trained investigators. Electronic questionnaires (developed previously [ 44 ]) were distributed directly to the public through one-on-one, face-to-face interactions on-site. Respondents could access the questionnaire by scanning the provided QR code. Where face-to-face investigation was impeded by the constraints of the COVID-19 epidemic, investigators distributed the electronic questionnaire on a one-on-one basis through instant communication tools such as WeChat (Tencent Holdings Ltd.). Additionally, online video investigations were conducted through platforms such as Tencent Meeting (Tencent Holdings Ltd.) and WeChat video [ 45 ].

Within the PBICR survey, investigators underwent comprehensive training in sampling methods, research tools, and quality control. Only those investigators who strictly adhered to the trained survey procedures were deemed qualified and eligible to participate in the study. Furthermore, during the data processing phase, 2 researchers were designated to perform logical checks. Questionnaires that did not meet the predetermined screening criteria were excluded, ensuring the quality and reliability of the data. Additionally, in this study, further screening was implemented to eliminate questionnaires completed in an excessively short time, those containing outliers, or those with missing values.

In the 2022 PBICR survey, a total of 23,414 questionnaires were collected. Following logical checks and the elimination of outliers, 21,916 questionnaires were deemed valid. This study was confined to respondents aged 60 years and above, giving a final sample of 3712 older adults.

Participants

Of the 21,916 valid questionnaires, those from respondents aged 60 years and above with no missing data or logic errors were retained. After screening, 3712 valid survey responses were available for analysis in this study.

The inclusion criteria for participants in this study were as follows: (1) age of 60 years or older; (2) possession of the nationality of the People’s Republic of China; (3) status as a Chinese permanent resident with an annual travel time of 1 month or less; (4) willing participation in the study and voluntary completion of the informed consent form; (5) ability to complete the questionnaire survey independently or with the assistance of investigators; (6) capacity to comprehend the meaning of each item in the questionnaire.

The exclusion criteria for participants in this study were as follows: (1) individuals with unconsciousness or mental disorders; (2) individuals with cognitive impairment; (3) those currently participating in other similar research projects; and (4) individuals unwilling to collaborate or reluctant to participate in the study.

Ethics Approval

The study adhered to the principles outlined in the Declaration of Helsinki. Ethical approval for all experimental protocols was granted by the ethics research committees of the Health Culture Research Center of Shaanxi (approval number JKWH-2022-02) and Second Xiangya Hospital of Central South University (approval number 2022-K050). The cover page of the questionnaire provided a clear explanation of the study’s purpose and assured participants of anonymity, confidentiality, and the right to refuse participation. Informed consent was obtained from all participants involved in the study.

All participants voluntarily signed an informed consent form before taking part in the study. Although respondents did not benefit directly from the survey, their input contributed to a more comprehensive and systematic understanding of the physical and mental health status of the public. The data from this study will be managed and used strictly in accordance with the Statistics Law of the People’s Republic of China. The research data are intended for academic purposes only, and no information identifying individual participants will be disclosed when the research findings are published.

Measurements

General Demographic Information

The basic demographic information of the older individuals included gender, age rank, nationality, religion, BMI rank, political status, status of occupation, education level, chronic diseases, and family type (conjugal family, core family, backbone family, and other family).

Family types were defined as follows:

  • Conjugal family: a family consisting of only husband and wife.
  • Core family: a family consisting of parents and unmarried children.
  • Backbone family: a family consisting of parents and married children.
  • Other family: other families including joint families, single-parent families, DINK (dual income, no kids) families, and single families.

Short-Form of the Family Health Scale

The assessment of family health in this study used the Chinese version of The Short-Form of the Family Health Scale (FHS-SF), developed by Crandall et al [ 20 ]. Wang et al [ 46 ] introduced the FHS-SF cross-culturally to create a Chinese version as a quantitative tool for evaluating family health issues in China. The scale comprises 10 items, encompassing 4 dimensions: family social and emotional health processes, family health lifestyle, family health resources, and family external social supports. A 5-point Likert scale was used for each item of the FHS-SF, with response options ranging from 1=strongly disagree to 5=strongly agree. Items with negative wording were scored in reverse. The final score on the scale ranged from 10 to 50, where higher scores indicated higher levels of family health. Wang et al [ 46 ] reported that the Cronbach α for the FHS-SF was .83. Additionally, the Cronbach α for the 4 subscales ranged from .70 to .90, and the retest reliability of the scale was 0.75.
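The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' scoring script: the function name `score_fhs_sf` is hypothetical, and because the text does not say which of the 10 items are negatively worded, the reverse-scored positions in the example are assumptions made purely for demonstration.

```python
def score_fhs_sf(responses, reverse_items=frozenset()):
    """Total an FHS-SF response set (10 items, each rated 1-5).

    reverse_items holds the 0-based positions of negatively worded items.
    The source does not list which items these are, so the caller must
    supply them; the positions used in the example below are hypothetical.
    """
    if len(responses) != 10:
        raise ValueError("FHS-SF has 10 items")
    total = 0
    for position, value in enumerate(responses):
        if value not in (1, 2, 3, 4, 5):
            raise ValueError(f"item {position} is outside the 1-5 range")
        # Reverse scoring maps 1<->5 and 2<->4 and leaves 3 unchanged.
        total += 6 - value if position in reverse_items else value
    return total


# Hypothetical respondent; positions 5-7 are treated as negatively worded.
example = [4, 5, 4, 4, 5, 2, 1, 2, 4, 5]
print(score_fhs_sf(example, reverse_items={5, 6, 7}))
```

With every item answered 1 or 5 and no reversal, the function returns the scale bounds of 10 and 50 stated in the text.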

In our study, the composite reliability values for the 4 dimensions were 0.912, 0.848, 0.781, and 0.806, respectively. All these values surpass the reliability threshold of 0.7. The average variance extracted values for the dimensions were 0.775, 0.736, 0.553, and 0.677, respectively, all of which exceed the threshold of 0.5. The Cronbach α of the FHS-SF was .90, and the factor loadings ranged from 0.73 to 0.90, all within an acceptable range.
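For readers unfamiliar with these indices, the sketch below shows how composite reliability and average variance extracted are conventionally computed from standardized factor loadings. The text does not state which formulas were used, so the conventional definitions (with uncorrelated error terms) are an assumption here, and the loadings in the example are hypothetical values, not the study's estimates.

```python
def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Assumes standardized loadings, so each error variance is 1 - loading^2.
    """
    total = sum(loadings)
    error = sum(1 - loading ** 2 for loading in loadings)
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


# Hypothetical loadings for a 3-item dimension (not taken from the study).
hypothetical_loadings = [0.73, 0.75, 0.74]
cr = composite_reliability(hypothetical_loadings)
ave = average_variance_extracted(hypothetical_loadings)
print(f"CR={cr:.3f} (threshold 0.7), AVE={ave:.3f} (threshold 0.5)")
```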

Media Use Behavior Scale

The frequency of media use in this study was gauged using the Media Use Behavior Scale developed by the PBICR survey of Peking University. The scale encompasses various media channels such as newspapers, radio, television, the internet, and mobile phones. Comprising 6 items related to social contact, self-presentation, social behavior, leisure and entertainment, access to information, and business transactions, the scale uses options that signify the degree of media use frequency, ranging from “1=infrequent” to “5=frequent.” The total score on the scale ranges from 6 to 30, with higher scores indicative of more frequent use of the media [ 45 ].

In this study, the composite reliability for the Media Use Behavior Scale was 0.894, and the average variance extracted was 0.585. The Cronbach α for the Media Use Behavior Scale was .89, indicating strong internal consistency. Additionally, the standardized factor loadings obtained from the validation factor analysis were above 0.50, all falling within acceptable limits.
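As an illustration of the internal consistency statistic reported above, Cronbach α can be computed from a respondent-by-item score matrix as follows. This is a generic sketch using the standard formula, not the study's SPSS or R procedure, and the small data set is invented for demonstration.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, each holding k item values."""
    k = len(item_scores[0])
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*item_scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented responses to a 6-item, 5-point scale (4 respondents).
invented = [
    [1, 2, 1, 2, 1, 2],
    [3, 3, 2, 3, 3, 3],
    [4, 4, 4, 5, 4, 4],
    [5, 5, 4, 5, 5, 5],
]
print(round(cronbach_alpha(invented), 3))
```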

Intention to Use mHealth Devices

The intention to use mHealth devices in this study was assessed through subjective evaluations. Participants were required to provide a numerical response ranging from 0 to 100 based on their individual subjective awareness. This formed a continuous variable, where a higher numerical value indicated a stronger intention to use mHealth devices.

Data Analysis

Continuous variables were assessed for normality using the Kolmogorov-Smirnov test and presented as the median and IQR. Categorical variables were reported in terms of frequency and percentage. Nonparametric methods were used to test the differences in characteristics related to the total score of the intention to use mHealth devices. Specifically, the Mann-Whitney U test was used for dichotomous variables, while the Kruskal-Wallis H test was used for multicategorical variables. The partial correlation coefficient between family health scores, media use behavior scores, and intention to use mHealth devices scores was calculated using a regression model. Linear regression models were used to assess the association between family health scores and media use behavior/intention to use mHealth devices scores, both with and without adjustment for covariates. The associations between media use behavior and intention to use mHealth devices scores were also examined. The results are reported as coefficients along with 95% CIs. Covariates, determined based on previous studies and general knowledge, were included in the models for adjustment. To examine the mediating role of media use behavior scores in the association between family health scores and intention to use mHealth devices scores, we conducted a Sobel-Goodman Mediation Test. This analysis was performed while controlling for all selected covariates. The significance of the indirect effect, direct effect, and the total effect was determined using the bootstrap algorithm.
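The core of the mediation test can be sketched as follows. The function computes the first-order Sobel z statistic for the indirect effect a×b, where a is the coefficient of family health on media use behavior and b is the coefficient of media use behavior on intention with family health in the model. The text does not specify which variant of the Sobel standard error was used, so the first-order form is an assumption, and the coefficients and standard errors in the example are hypothetical. The bootstrap step mentioned above is not reproduced here.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Return (indirect effect, z, two-sided P value) for the path a*b.

    Uses the first-order standard error sqrt(b^2*se_a^2 + a^2*se_b^2).
    """
    indirect = a * b
    standard_error = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = indirect / standard_error
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return indirect, z, p_value


# Hypothetical path estimates and standard errors (not the study's values).
indirect, z, p_value = sobel_test(a=0.5, se_a=0.1, b=0.5, se_b=0.1)
print(f"indirect={indirect:.3f}, z={z:.3f}, P={p_value:.4f}")
```

A z value whose absolute size exceeds 1.96 corresponds to P<.05 under this test, which matches the 2-sided significance level used in the study.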

All P values were 2-sided, with a significance level (α) of .05 used to define statistical significance. The data were analyzed using IBM SPSS Statistics 26 and R version 4.1.3 (R Foundation).

Subgroup Analysis

Empirical studies have consistently indicated a positive association between education and health. Individuals with higher levels of education tend to adopt healthier lifestyles, and their higher income may lead to greater investment in health-related expenses [ 47 ]. Education is also closely linked to internet participation: individuals with higher educational attainment are generally more likely to use online platforms to access health-related information [ 48 ]. Across educational and cultural backgrounds, individuals may differ in their concern about health risks, which in turn influences their acceptance of health care technology [ 49 ]. In addition, preliminary analysis in our study revealed significant differences in the total score of family health across education levels ( P <.001). Given the established influence of education on health behavior and media use, and our own preliminary results, we conducted a subgroup analysis by education level. The aim was to explore the mediating role of media use behavior in the relationship between family health and the intention to use mHealth devices among older adults with different education levels.

General Characteristics

A total of 3712 older individuals aged 60 and above participated in this study, with an average age of 69.23 (SD 6.13) years. The majority of older adults (3036/3712, 81.79%) fell within the age range of 60-74 years. Basic demographic data for the 3712 older adult participants are detailed in Table 1 . Among them, 1839 were males (49.54%) and 1873 were females (50.46%). The majority identified as Han nationality (3370/3712, 90.79%) and nonreligious (3416/3712, 92.03%), and most reported their political status as “the masses” (3151/3712, 84.89%). The willingness to use mHealth devices differed significantly among older adults with different political statuses, occupational statuses, and chronic disease conditions ( P <.001). However, no significant differences were observed in the willingness to use mHealth devices among older adults with different family types ( P =.97; Table 1 ).

a Median (IQR) was used to describe the continuous variable, whereas n (%) was used to describe the categorical variable.

Association Analysis

After adjusting for covariates, the intention to use mHealth devices exhibited a positive correlation with the total score of family health ( r =0.077, P <.001) and the media use behavior score ( r =0.178, P <.001). Additionally, the total score of family health was positively correlated with the media use behavior score ( r =0.079, P <.001; Table 2 ).

a The model was adjusted for various covariates, including religion, BMI rank, political status, occupational status, education degree, and chronic diseases. Variables achieved statistical significance at P ≤.05.

b N/A: not applicable.

Relationship Between Family Health and Media Use Behavior Score/Intention to Use mHealth Devices

In the unadjusted linear regression models, the 4 dimensions of family health (ie, family social and emotional health processes, family health lifestyle, family health resources, and family external social supports) and the total score were significantly ( P <.001) associated with media use behavior. They were also significantly ( P <.001) related to the intention to use mHealth devices, except for family health resources ( P =.15). After adjusting for gender, age rank, political status, nationality, religion, BMI rank, occupation status, education level, family type, and chronic diseases, all dimensions remained statistically significant ( P <.001) except for family health resources ( P =.29; Table 3 ).

a Data were adjusted for gender and age rank, political status, nation, religion, BMI rank, status of occupation, education degree, family type, and chronic diseases.

Relationship Between Media Use Behavior Score and Intention to Use mHealth Devices

In the linear regression models before adjustment, media use behavior was significantly ( P <.001) associated with the intention to use mHealth devices. After adjusting for gender and age rank, as well as political status, nationality, religion, BMI rank, occupation status, education level, family type, and chronic diseases, the association remained statistically significant ( P <.001; Table 4 ).

Mediation Analysis

The family health total score demonstrated a positive association with the intention to use mHealth devices among older adults. Mediation analysis showed that media use behavior partially mediated this association, accounting for nearly a quarter (22.46%) of it after adjustment for covariates. The total score of family health was associated with media use behavior (β=.088, P <.001) and intention to use mHealth devices (β=.244, P <.001). Additionally, media use behavior was linked to the intention to use mHealth devices (β=.810, P <.001). The final mediation model, with the total score of family health as the independent variable, media use behavior as the mediating variable, and the intention to use mHealth devices as the dependent variable, is illustrated in Figure 1 .
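The proportion of the association carried by the mediator can be checked from the reported path coefficients. The sketch below assumes that β=.244 is the direct effect of family health on intention with media use behavior in the model; the text does not state this explicitly, and the function name is illustrative. Under that assumption, the rounded coefficients give a share of roughly 22.6%, close to the reported 22.46%; the small gap is what one would expect from rounding of the published coefficients.

```python
def proportion_mediated(a, b, direct):
    """Share of the total effect carried by the indirect path a*b.

    a: coefficient of the predictor on the mediator
    b: coefficient of the mediator on the outcome
    direct: coefficient of the predictor on the outcome, mediator included
            (assumed here to be the reported .244)
    """
    indirect = a * b
    return indirect / (indirect + direct)


# Rounded coefficients as reported in the text.
share = proportion_mediated(a=0.088, b=0.810, direct=0.244)
print(f"{share:.1%}")
```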

The 4 dimensions of family health were positively associated with the intention to use mHealth devices among older adults, except for family health resources, for which the association was nonsignificant ( P =.72). In the mediation analysis, neither the direct nor the total effect of family health resources was significant ( P =.72 and P =.20, respectively). Media use behavior therefore acted as a full mediator for this dimension after adjustment for covariates. Media use behavior partially mediated the relationships of family social and emotional health processes, family health lifestyle, and family external social supports with the intention to use mHealth devices, accounting for 35.18%, 31.78%, and 31.33% of the respective associations after adjustment ( Table 5 ).

a The Sobel-Goodman Mediation Test was applied in adjusted models for religion, BMI rank, political status, occupation status, education level, and chronic diseases.

b The Sobel test was used to assess the hypothesis that the indirect role was equal to 0, adjusting for covariates such as religion, BMI rank, political status, occupation status, education level, and chronic diseases. Values reach statistical significance at P ≤.05.

Subgroup analyses by education level are presented in Table 6 . Among older adults with primary school education or below, media use behavior showed no mediating effect between the total score of family health and the intention to use mHealth devices ( z =–0.942; indirect effect=–0.019, P =.35; direct effect=0.252, P =.007). Additionally, the mediating effect of media use behavior between family health lifestyle and the intention to use mHealth devices was not significant ( z =1.953, P =.052). Media use behavior fully mediated the association between family health resources scores and intention to use mHealth devices scores at every education level: primary school or below ( z =–5.832; indirect effect=–0.331, P <.001; direct effect=0.218, P =.29), middle school, vocational school, or high school ( z =–3.439; indirect effect=–0.136, P <.001; direct effect=–0.066, P =.76), and college or above ( z =–2.516; indirect effect=–0.212, P =.01; direct effect=0.026, P =.93).

a The Sobel-Goodman Mediation Test was applied in adjusted models for religion, BMI rank, political status, status of occupation, and chronic diseases.

Principal Findings

Previous studies have consistently demonstrated that family factors play a crucial role in influencing the frequency of media use and the acceptance of mHealth among older adults [ 50 ]. The findings of our study further confirm that family health positively contributes to increasing the willingness of older adults to use mHealth devices. Additionally, a high frequency of media use behavior emerges as a significant driver for the utilization of mHealth devices, a behavior that is profoundly influenced by the state of family health. The results align with previous research on the digital divide among older adults, indicating that those with higher family health scores tend to engage in more frequent media contact behaviors. This heightened connectivity to the internet makes them more adaptable to a big data–based mHealth environment, fostering a greater willingness to use mHealth devices. Before conducting the mediation analysis, the study also observed, through univariate analysis, that older individuals over 90 years and those who were unemployed exhibited a lower willingness to use mobile medical devices. The results confirm the existence of differences in the digital divide among age groups, especially with older age groups experiencing inequalities in social and economic support [ 51 , 52 ]. These disparities may further impact their access to and utilization of media devices.

In addition to the descriptive findings, this study examined the relationship between family health and the willingness to use mHealth devices and identified a mediating role for media use behavior. First, media use behavior partially mediated the influence of overall family health on the intention to use mHealth devices. Second, media use behavior fully mediated the association for the family health resources dimension. In other words, the link between family health resources and the willingness to use mHealth devices appears to operate through media use: older adults who lack family health resources may be less willing to use mHealth devices mainly because they have difficulty accessing or using media. This underscores the crucial role of family health resources in integrating older adults into the internet sphere and enabling them to benefit from mHealth technology. The study emphasizes the practical importance of addressing resource-related health inequities, with financial support from the family being identified as a critical factor in the daily lives of seniors [ 52 ]. To address the imbalance in the distribution of resources among families in different regions at the societal level, it is crucial for the government to assist socioeconomically disadvantaged older adults in gaining greater access to various devices. This can be achieved through economic empowerment initiatives and the development of policies aimed at bridging the digital divide [ 53 ].

Building upon the crucial role of media contacts in linking family health resources and the willingness to use mHealth devices among the older population, there is an opportunity to further motivate the desire for mHealth device usage. Leveraging the positive influence of family health resources to increase the frequency of media exposure can enhance the motivation of older individuals. Effective communication within the family emerges as a catalyst for improving the technology literacy and information-seeking skills of older adults [ 16 ]. Family members play a crucial role in supporting seniors to build confidence in using internet technology while alleviating their anxiety and fear of new technologies. Encouraging older adults to adapt and learn information technology, such as WeChat and health-related mobile apps, through straightforward and repeated demonstrations can be an effective strategy [ 54 ]. Additionally, family support may help mitigate the economic challenges associated with using health care services by influencing older adults’ subjective perceptions of financial accessibility [ 55 ]. To address financial challenges and enhance older adults’ access to technology, a comprehensive approach can be adopted. This involves leveraging both the financial support within the family and external economic resources. Encouraging family members to provide suitable financial assistance to each other, coupled with ensuring stable financial security for older individuals, can be achieved by gradually increasing pensions for retirees. This approach aims to augment the purchasing power of older adults, enabling them to acquire media devices and enhancing their ability to use technological devices in the health care sector to a greater extent.

The subgroup analysis further indicated that media use behavior did not mediate the relationship between the total family health score and the intention to use mHealth devices among older adults with primary school education or below. However, it did partially mediate the association among those with primary school education and above, aligning with the study hypothesis. Given that the older adult population with low education levels may experience relatively weak cognitive function and lack personal health literacy [ 56 , 57 ], the mechanisms by which they are influenced by family, social, and economic environments in the acceptance of new health technologies become more intricate. Conversely, older adults with a high school education or higher often perceive themselves as having an above-average ability to learn, making them less uncomfortable with the changing social environment brought about by technological developments [ 58 ]. Moreover, older individuals with limited education often lack access to information technology education or the ability to operate mobile devices [ 59 ]. For these individuals, exposure to media devices or mHealth devices is relatively homogeneous. Consequently, they may lack a progressive transition from regular media contact behaviors to the use of mHealth devices.

Disparities in internet participation levels due to education constitute a significant barrier hindering older adults from using media devices to access the mHealth era. To bridge the “digital divide” and enhance the effective use of mHealth devices among older individuals, it is imperative to consider implementing relevant education measures. These measures can focus on improving their ability to use smart technology, thus empowering them to navigate and benefit from the advancements in health care technology. In alignment with the comprehensive “Smart Senior Care” action plan in China [ 60 ], communities can implement health education initiatives through a blend of technology-supported learning and traditional lectures. For instance, using touchscreen tablets for courses on healthy diet and nutrition guidance can enhance the older individual’s interest in the internet while imparting essential health and hygiene knowledge [ 61 ]. This approach serves to bridge the transition from traditional modes of access to mobile health care. Adopting adaptive behaviors and learning strategies can further enhance the efficiency and effectiveness of mobile health care apps [ 62 ]. In the mHealth era, the design of mHealth devices should be tailored to the cognitive abilities and mindset of older individuals. Full consideration should be given to their eHealth literacy, incorporating improvements in usability, emphasizing the responsiveness of operations, and integrating monitoring functions that align with the physical activities of older individuals [ 63 ]. Such considerations aim to enhance the overall satisfaction of older individuals with mobile health care apps [ 64 ]. Moreover, due to prevailing stereotypes about older people, digital platforms often harbor ageist mechanisms that categorize them as users uninterested in technology [ 65 ]. This results in an unfavorable digital environment for older individuals. 
In general, the development and application of internet technology must not overlook the realistic capacity and objective demands of older individuals [ 66 ]. Digital platforms should strive to create more inclusive algorithms and use statistical models of social digital media practices that cater to all literacy levels [ 65 ]. This may involve reducing complex and lengthy text that is difficult to understand, avoiding in-depth and complex hierarchical options, and adopting simple page designs [ 67 ] to mitigate the impact of technological differences on the accessibility of digital health care for older adults.

Strengths and Limitations

This study contributes to the existing literature by evaluating the connection between family health, media use behavior, and the intention to use mHealth devices among older adults, using cross-sectional data from the PBICR survey. The findings support our hypothesis that media use behavior serves as a mediator between family health status and the intention to use mHealth devices among older adults. Furthermore, a subgroup analysis based on education level revealed that the impact of family health on the willingness to use mHealth devices through media use behavior was not significant among older adults with lower education levels, indicating a nuanced mechanism at play. Together, these findings contribute to the body of research on the digital divide among older individuals.

Despite comprehensive consideration, the results of this study have several limitations. First, due to the exploratory cross-sectional design, no causal inferences can be drawn. Second, the majority of seniors included in this study were in the young-old age group (60 to 74 years old), lacking representation of the entire age spectrum of older adults and potentially neglecting variations in social background associated with age factors. Third, the results obtained in this study may be influenced by economic factors and psychological variables. As mHealth devices represent an evolving component of the health system, their development trajectory is still undergoing exploration. It is possible that various latent factors influencing the relationship between family health, media use behavior, and the intention to use mHealth devices are yet to be uncovered.

Conclusions

In conclusion, this study highlights the substantial impact of family health and media use behavior on the intention of older adults to use mHealth devices. Media use behavior acts as a mediator in the relationship between family health and the intention to use mHealth devices, with more intricate dynamics observed among older adults with lower educational levels. These findings emphasize that robust family health, particularly sufficient family health resources, plays a crucial role in enhancing the media engagement of older individuals, ultimately fostering their interest in embracing mHealth devices. The insights from this work provide valuable recommendations for bridging the gap in digital health adoption among older adults. Furthermore, encouraging teaching by family members can create a supportive environment for seniors to embrace mobile technology, while financial support can enhance their accessibility to health-related mobile devices. Additionally, developing age-specific digital education programs and media products tailored to the needs and preferences of older individuals can contribute to overcoming technological barriers and fostering a positive digital experience for older adults in the realm of mobile health care. These strategies align with the goal of promoting inclusive and user-friendly digital solutions for seniors, ensuring they can benefit from advancements in health technology.

Acknowledgments

This study was conducted with the support of data from the Psychology and Behavior Investigation of Chinese Residents (PBICR). We appreciate all the participants who showed great patience in answering the questionnaires. None of the portions of this article used generative artificial intelligence. This work was supported by the 2023 Guangdong Province Education Science Planning Project (Specialized in Higher Education; 2023GXJK252), the Science and Technology Program of Guangzhou (grant numbers 2023A04J2267 and 2024A04J02668), the Guangdong Basic and Applied Basic Research Foundation (grant number 2021A1515110743), the Health Economics Association of Guangdong Province (grant number 2023-WJMZ-51), the Student Innovation and Entrepreneurship Training Program of Guangdong Province (grant number S202312121283), the Key Laboratory of Philosophy and Social Sciences of Guangdong Higher Education Institutions for Health Policies Research and Evaluation (grant number 2015WSY0010), and the Research Base for Development of Public Health Service System of Guangzhou.

Data Availability

The data sets generated and analyzed during this study are not publicly available because the data still need to be used for other research but are available from the corresponding author on reasonable request.

Authors' Contributions

JHC, YBW, and JYC designed and conducted this study. YBW collected data. YSM, AQL, and XXY participated in the data screening. DYZ and WDY conducted data analysis. JHC and YSM wrote the first draft of the paper. JYC contributed to supervising data analysis and developing the manuscript. All authors made contributions to the critical revision of the manuscript. The authors read and approved the final manuscript.

Conflicts of Interest

None declared.

  1. World population prospects 2022: summary of results. United Nations Department of Economic and Social Affairs, Population Division. 2022. URL: https://www.un.org/development/desa/pd/sites/www.un.org.development.desa.pd/files/wpp2022_summary_of_results.pdf [accessed 2022-07-26]
  2. Communique of the Seventh National Population Census (No. 5). National Bureau of Statistics. May 11, 2021. URL: https://www.stats.gov.cn/sj/zxfb/202302/t20230203_1901085.html [accessed 2021-05-11]
  3. Guiding opinions on establishing and improving the health service system for the elderly. National Health Commission/State Council Information Office. Nov 01, 2019. URL: http://www.scio.gov.cn/xwfb/bwxwfb/gbwfbh/wsjkwyh/202307/t20230703_721062.html [accessed 2024-02-11]
  4. Wang X, Wu Y, Meng Z, Li J, Xu L, Sun X, et al. Willingness to use mobile health devices in the post-COVID-19 era: nationwide cross-sectional study in China. J Med Internet Res. Feb 17, 2023;25:e44225.
  5. Tran V, Riveros C, Ravaud P. Patients' views of wearable devices and AI in healthcare: findings from the ComPaRe e-cohort. NPJ Digit Med. 2019;2:53.
  6. Piwek L, Ellis DA, Andrews S, Joinson A. The rise of consumer health wearables: promises and barriers. PLoS Med. Feb 2016;13(2):e1001953.
  7. Kekade S, Hseieh C, Islam MM, Atique S, Mohammed Khalfan A, Li Y, et al. The usefulness and actual use of wearable devices among the elderly population. Comput Methods Programs Biomed. Jan 2018;153:137-159.
  8. Li H, Zhang T, Chi H, Chen Y, Li Y, Wang J. Mobile health in China: current status and future development. Asian J Psychiatr. Aug 2014;10:101-104.
  9. Choi N. Relationship between health service use and health information technology use among older adults: analysis of the US National Health Interview Survey. J Med Internet Res. Apr 20, 2011;13(2):e33.
  10. Mo XT, Deng ZH. An empirical study on mobile health service acceptance behavior of middle-aged and elderly users in Wuhan. Chinese Journal of Health Statistics. 2015;32(02):324-327.
  11. Yang JH, Liu YF. Longevity bonus of digital era: the capability and endogenous motivation of old people's digital life. Administration Reform. 2022;1(1):26-36.
  12. Lu JH, Wei XD. Analysis framework, concept, and pathways of digital divide governance for older adults: from the perspective of digital divide and knowledge gap theory. Population Research. 2021;45(03):17-30.
  13. Huang CX. Status, challenges and countermeasures of the digital divide in older adults. People's Tribune. 2020;29:126-128.
  14. The 50th Statistical Report on Internet Development in China. China Internet Network Information Center (CNNIC). 2022. URL: https://www.cnnic.net.cn/n4/2022/0914/c88-10226.html [accessed 2022-08-31]
  15. Yang B, Jin DC. The manifestation, motivation and solution of elderly digital divide. Academic Journal of Zhongzhou. 2021(12):74-80.
  16. Magsamen-Conrad K, Dillon JM, Billotte Verhoff C, Faulkner SL. Online health-information seeking among older populations: family influences and the role of the medical professional. Health Commun. Jul 2019;34(8):859-871.
  17. Kim S, Sok SR. Relationships among the perceived health status, family support and life satisfaction of older Korean adults. Int J Nurs Pract. Aug 2012;18(4):325-331.
  18. Phillips DR, Feng Z. Challenges for the aging family in the People's Republic of China. Can J Aging. Sep 2015;34(3):290-304.
  19. Weiss-Laxer NS, Crandall A, Okano L, Riley AW. Building a foundation for family health measurement in national surveys: a modified Delphi expert process. Matern Child Health J. Mar 2020;24(3):259-266.
  20. Crandall A, Weiss-Laxer NS, Broadbent E, Holmes EK, Magnusson BM, Okano L, et al. The Family Health Scale: reliability and validity of a short- and long-form. Front Public Health. 2020;8:587125.
  21. Yuan B, Zhang T, Li J. Family support and transport cost: understanding health service among older people from the perspective of social-ecological model. Arch Public Health. Jul 19, 2022;80(1):173.
  22. Leung K, Chen C, Lue B, Hsu S. Social support and family functioning on psychological symptoms in elderly Chinese. Arch Gerontol Geriatr. 2007;44(2):203-213.
  23. Zhang Z, Mao YH, Hu YC. A study on willingness to use intelligent elderly care services from the perspective of elderly digital divide. Northwest Population Journal. 2023;21:1-12.
  24. Molina-Mula J, Gallo-Estrada J, González-Trujillo A. Self-perceptions and behavior of older people living alone. Int J Environ Res Public Health. Nov 24, 2020;17(23):8739.
  25. Nguyen T, Irizarry C, Garrett R, Downing A. Access to mobile communications by older people. Australas J Ageing. Jun 2015;34(2):E7-E12.
  26. Bisschop MI, Kriegsman DMW, van Tilburg TG, Penninx BWJH, van Eijk JTM, Deeg DJH. The influence of differing social ties on decline in physical functioning among older people with and without chronic diseases: the Longitudinal Aging Study Amsterdam. Aging Clin Exp Res. Apr 2003;15(2):164-173.
  27. Martínez-Pérez B, de la Torre-Díez I, López-Coronado M. Mobile health applications for the most prevalent conditions by the World Health Organization: review and analysis. J Med Internet Res. Jun 14, 2013;15(6):e120.
  28. Li C, Neugroschl J, Zhu CW, Aloysi A, Schimming CA, Cai D, et al. Design considerations for mobile health applications targeting older adults. J Alzheimers Dis. 2021;79(1):1-8.
  29. Devos P, Min Jou A, De Waele G, Petrovic M. Design for personalized mobile health applications for enhanced older people participation. European Geriatric Medicine. Dec 2015;6(6):593-597.
  30. Wang H, Sun X, Wang R, Yang Y, Wang Y. The impact of media use on disparities in physical and mental health among the older people: an empirical analysis from China. Front Public Health. 2022;10:949062.
  31. Zhang K, Kim K, Silverstein NM, Song Q, Burr JA. Social media communication and loneliness among older adults: the mediating roles of social support and social contact. Gerontologist. Aug 13, 2021;61(6):888-896.
  32. Ma Y, Liang C, Gu D, Zhao S, Yang X, Wang X. How social media use at work affects improvement of older people's willingness to delay retirement during transfer from demographic bonus to health bonus: causal relationship empirical study. J Med Internet Res. Feb 10, 2021;23(2):e18264.
  33. Zhang C. Smartphones and telemedicine for older people in China: opportunities and challenges. Digit Health. 2022;8:20552076221133695.
  34. Lee SM, Lee D. Healthcare wearable devices: an analysis of key factors for continuous use intention. Serv Bus. Oct 15, 2020;14(4):503-531.
  35. Krebs P, Duncan DT. Health app use among US mobile phone owners: a national survey. JMIR Mhealth Uhealth. Nov 04, 2015;3(4):e101.
  36. Li J, Ma Q, Chan AH, Man S. Health monitoring through wearable technologies for older adults: smart wearables acceptance model. Appl Ergon. Feb 2019;75:162-169.
  37. Stühmann LM, Paprott R, Heidemann C, Baumert J, Hansen S, Zahn D, et al. Health app use and its correlates among individuals with and without type 2 diabetes: nationwide population-based survey. JMIR Diabetes. May 20, 2020;5(2):e14396.
  38. Li Y, Han W, Hu M. Does internet access make a difference for older adults' cognition in urban China? The moderating role of living arrangements. Health Soc Care Community. Jul 2022;30(4):e909-e920.
  39. Mizrachi Y, Shahrabani S, Nachmani M, Hornik A. Obstacles to using online health services among adults age 50 and up and the role of family support in overcoming them. Isr J Health Policy Res. Aug 21, 2020;9(1):42.
  40. Tu J, Shen M, Zhong J, Yuan G, Chen M. The perceptions and experiences of mobile health technology by older people in Guangzhou, China: a qualitative study. Front Public Health. 2021;9:683712.
  41. Navabi N, Ghaffari F, Jannat-Alipoor Z. Older adults' attitudes and barriers toward the use of mobile phones. Clin Interv Aging. 2016;11:1371-1378.
  42. Harris T, Cook DG, Victor CR, Beighton C, Dewilde S, Carey IM. Linking survey data with computerised records to predict consulting by older people. Br J Gen Pract. Dec 2004;54(509):928-931.
  43. Gao M, Li Y, Zhang S, Gu L, Zhang J, Li Z, et al. Does an empty nest affect elders' health? Empirical evidence from China. Int J Environ Res Public Health. Apr 27, 2017;14(5):463.
  44. Wenjuanxing. URL: https://www.wjx.cn/ [accessed 2024-02-07]
  45. Wang Y, Kaierdebieke A, Fan S, Zhang R, Huang M, Li H, et al. Study protocol: a cross-sectional study on psychology and behavior investigation of Chinese residents, PBICR. Psychosom Med Res. 2022;4(3):19.
  46. Wang F, Wu Y, Sun X, Wang D, Ming W, Sun X, et al. Reliability and validity of the Chinese version of a short form of the family health scale. BMC Prim Care. May 06, 2022;23(1):108.
  47. Wang W, Dong Y, Liu X, Zhang L, Bai Y, Hagist S. The more educated, the healthier: evidence from rural China. Int J Environ Res Public Health. Dec 13, 2018;15(12):2848.
  48. Oh YS, Choi EY, Kim YS. Predictors of smartphone uses for health information seeking in the Korean elderly. Soc Work Public Health. 2018;33(1):43-54.
  49. Nadal C, Sas C, Doherty G. Technology acceptance in mobile health: scoping review of definitions, models, and measurement. J Med Internet Res. Jul 06, 2020;22(7):e17256.
  50. Chew TH, Chin CP, Leau Y. Untangling factors influencing social networking sites use among older adults: a literature review. Univ Access Inf Soc. Mar 17, 2022;22(3):687-698.
  51. Hunsaker A, Hargittai E. A review of internet use among older adults. New Media & Society. Jul 16, 2018;20(10):3937-3954.
  52. García MF, Hessel P, Rodríguez-Lesmes P. Wealth and inequality gradients for the detection and control of hypertension in older individuals in middle-income economies around 2007-2015. PLoS One. 2022;17(7):e0269118.
  53. Guo W, Chen L, Perez C. Economic status, family dependence, and health outcomes of older people in western rural China. J Gerontol Soc Work. Oct 2019;62(7):762-775.
  54. Shi Y, Ma D, Zhang J, Chen B. In the digital age: a systematic literature review of the e-health literacy and influencing factors among Chinese older adults. Z Gesundh Wiss. 2023;31(5):679-687.
  55. Di X, Wang L, Yang L, Dai X. Impact of economic accessibility on realized utilization of home-based healthcare services for the older adults in China. Healthcare (Basel). Feb 17, 2021;9(2):218.
  56. Liu H, Byles JE, Xu X, Zhang M, Wu X, Hall JJ. Evaluation of successful aging among older people in China: results from China health and retirement longitudinal study. Geriatr Gerontol Int. Aug 2017;17(8):1183-1190.
  57. Quenzel G, Vogt D, Schaeffer D. Differences in health literacy of adolescents with lower educational attainment, older people and migrants. Gesundheitswesen. Nov 2016;78(11):708-710.
  58. Berkowsky RW, Sharit J, Czaja SJ. Factors predicting decisions about technology adoption among older adults. Innov Aging. Jan 2018;2(1):igy002.
  59. Jun W. A study on cause analysis of digital divide among older people in Korea. Int J Environ Res Public Health. Aug 14, 2021;18(16):8586.
  60. National Health Commission; National Council on the Aging. Notice on the in-depth implementation of the “Smart Help for the Elderly” Action in 2022. National Health Commission. 2022. URL: http://www.nhc.gov.cn/lljks/zcwj2/202206/24a5b60b8789409c9053b38e4aab19e7.shtml [accessed 2022-06-16]
  61. Chiu C, Kuo S, Lin D. Technology-embedded health education on nutrition for middle-aged and older adults living in the community. Glob Health Promot. Sep 2019;26(3):80-87.
  62. Yu Y, Yan XD, Z X, Zhou SL. What they gain depends on what they do: an exploratory empirical research on effective use of mobile healthcare applications. Presented at: Hawaii International Conference on System Sciences; January 8-11, 2019; Maui, HI. URL: https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1477&context=hicss-52
  63. Ma Z, Gao Q, Yang M. Adoption of wearable devices by older people: changes in use behaviors and user experiences. International Journal of Human–Computer Interaction. Aug 31, 2022;39(5):964-987.
  64. Zhang X, Yan X, Cao X, Sun Y, Chen H, She J. The role of perceived e-health literacy in users’ continuance intention to use mobile healthcare applications: an exploratory empirical study in China. Information Technology for Development. Mar 09, 2017;24(2):198-223.
  65. Rosales A, Fernández-Ardèvol M. Ageism in the era of digital platforms. Convergence (Lond). Dec 2020;26(5-6):1074-1087.
  66. Zhou X, Chen L. Digital health care in China and access for older people. The Lancet Public Health. Dec 2021;6(12):e873-e874.
  67. Gao Q, Ebert D, Chen X, Ding Y. Design of a mobile social community platform for older Chinese people in urban areas. Hum Factors Man. Jun 27, 2012;25(1):66-89.

Edited by T de Azevedo Cardoso; submitted 18.06.23; peer-reviewed by R Sun, X Zhang; comments to author 08.08.23; revised version received 29.08.23; accepted 28.01.24; published 19.02.24.

©Jinghui Chang, Yanshan Mai, Dayi Zhang, Xixi Yang, Anqi Li, Wende Yan, Yibo Wu, Jiangyun Chen. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 19.02.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

  • Front Psychol

The Stroop Color and Word Test

Federica Scarpina

1 “Rita Levi Montalcini” Department of Neuroscience, University of Turin, Turin, Italy

2 IRCCS Istituto Auxologico Italiano, Ospedale San Giuseppe, Piancavallo, Italy

Sofia Tagini

3 CiMeC Center for the Mind/Brain Sciences, University of Trento, Rovereto, Italy

The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used to assess the ability to inhibit cognitive interference, which occurs when the processing of a specific stimulus feature impedes the simultaneous processing of a second stimulus attribute; this phenomenon is known as the Stroop effect. The aim of the present work is to verify the theoretical adequacy of the various scoring methods used to measure the Stroop effect. We present a systematic review of studies that have provided normative data for the SCWT. We searched both electronic databases (i.e., PubMed, Scopus, Google Scholar) and citations. Our findings show that while several scoring methods have been reported in the literature, none of the reviewed methods enables us to fully assess the Stroop effect. Furthermore, we discuss several normative scoring methods from the Italian context as reported in the literature. We argue for an alternative scoring method that takes into consideration both the speed and the accuracy of the response. Finally, we underline the importance of assessing performance in all Stroop Test conditions (word reading, color naming, named color-word).

Introduction

The Stroop Color and Word Test (SCWT) is a neuropsychological test extensively used for both experimental and clinical purposes. It assesses the ability to inhibit cognitive interference, which occurs when the processing of a stimulus feature affects the simultaneous processing of another attribute of the same stimulus (Stroop, 1935). In the most common version of the SCWT, originally proposed by Stroop in 1935, subjects are required to read three different tables as fast as possible. Two of them represent the “congruous condition,” in which participants are required to read names of colors (henceforth referred to as color-words) printed in black ink (W) and to name different color patches (C). Conversely, in the third table, named the color-word (CW) condition, color-words are printed in an inconsistent ink color (for instance, the word “red” is printed in green ink). Thus, in this incongruent condition, participants are required to name the color of the ink instead of reading the word. In other words, the participants are required to perform a less automated task (i.e., naming ink color) while inhibiting the interference arising from a more automated task (i.e., reading the word; MacLeod and Dunbar, 1988; Ivnik et al., 1996). This difficulty in inhibiting the more automated process is called the Stroop effect (Stroop, 1935). While the SCWT is widely used to measure the ability to inhibit cognitive interference, previous literature also reports its application to measure other cognitive functions such as attention, processing speed, cognitive flexibility (Jensen and Rohwer, 1966), and working memory (Kane and Engle, 2003). Thus, it may be possible to use the SCWT to measure multiple cognitive functions.

In the present article, we present a systematic review of the SCWT literature in order to assess the theoretical adequacy of the different scoring methods proposed to measure the Stroop effect (Stroop, 1935). We focus on the Italian literature, which reports the use of several versions of the SCWT that vary in terms of stimuli, administration protocol, and scoring methods. Finally, we attempt to indicate a scoring method that measures the ability to inhibit cognitive interference on the basis of the subjects' performance in the SCWT.

We looked for normative studies of the SCWT. All studies included a healthy adult population. Since our aim was to understand the various available scoring methods, no studies were excluded on the basis of age, gender, and/or education of participants, or the specific version of the SCWT used (e.g., short or long, computerized or paper). Studies were identified using electronic databases and citations from a selection of relevant articles. The electronic databases searched included PubMed (all years), Scopus (all years), and Google Scholar (all years). The last search was run on 22 February 2017, using the following search terms: “Stroop; test; normative.” All studies written in English or Italian were included.

Two independent reviewers screened the papers according to their titles and abstracts; no disagreements about the suitability of the studies were recorded. Thereafter, a summary chart was prepared to highlight the mandatory information that had to be extracted from each report (see Table 1).

Table 1. Summary of data extracted from reviewed articles; those related to the Italian normative data are in bold.

One author extracted data from the papers while the second author provided further supervision. No disagreements about the extracted data emerged. We did not seek additional information from the original reports, except for Caffarra et al. (2002), whose full text was not available: the relevant information was extracted from Barletta-Rodolfi et al. (2011).

We extracted the following information from each article:

  • Year of publication.
  • Indexes whose normative data were provided.

Finally, regarding the variables of interest, we focused on the scores used in the reviewed studies to assess performance on the SCWT.

We identified 44 articles from our electronic search and screening process. Twelve of them were judged inadequate for our purpose and excluded. Four papers were excluded as they were written in languages other than English or Italian (Bast-Pettersen, 2006; Duncan, 2006; Lopez et al., 2013; Rognoni et al., 2013); two were excluded because they included children (Oliveira et al., 2016) or a clinical population (Venneri et al., 1992), respectively. Lastly, we excluded six Stroop Test manuals, since they could not be obtained in full (Trenerry et al., 1989; Artiola and Fortuny, 1999; Delis et al., 2001; Golden and Freshwater, 2002; Mitrushina et al., 2005; Strauss et al., 2006a). At the end of the selection process we had 32 articles suitable for review (Figure 1).

Figure 1. Flow diagram of the study selection process.

From the systematic review, we extracted five studies with Italian normative data. Details are reported in Table 1. Of the remaining 27 studies that provide normative data for non-Italian populations, 16 studies (Ivnik et al., 1996; Ingraham et al., 1988; Rosselli et al., 2002; Moering et al., 2004; Lucas et al., 2005; Steinberg et al., 2005; Seo et al., 2008; Peña-Casanova et al., 2009; Al-Ghatani et al., 2011; Norman et al., 2011; Andrews et al., 2012; Llinàs-Reglà et al., 2013; Morrow, 2013; Lubrini et al., 2014; Rivera et al., 2015; Waldrop-Valverde et al., 2015) adopted the scoring method proposed by Golden (1978). In this method, the number of items correctly named in 45 s in each condition is calculated (i.e., W, C, CW). Then the predicted CW score (Pcw) is calculated using the following formula:

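$$Pcw = \frac{45}{\dfrac{(45 \times W) + (45 \times C)}{W \times C}}$$

equivalent to:

$$Pcw = \frac{W \times C}{W + C}$$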

Then, the Pcw value is subtracted from the actual number of items correctly named in the incongruous condition (CW) (i.e., IG = CW − Pcw): this procedure yields an interference score (IG) based on the performance in both the W and C conditions. Thus, a negative IG value indicates an impaired ability to inhibit interference, where a lower score means greater difficulty in inhibiting interference.
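The computation described above can be sketched in a few lines. This is a minimal illustration, assuming the commonly cited form of Golden's predicted score, Pcw = (W × C)/(W + C); the function names and the raw scores in the example are hypothetical and are not taken from any of the reviewed studies.

```python
def golden_predicted_cw(w: float, c: float) -> float:
    """Predicted CW score, assuming Pcw = (W * C) / (W + C).

    w, c: items correctly named in 45 s in the W and C conditions.
    """
    if w + c <= 0:
        raise ValueError("W + C must be positive")
    return (w * c) / (w + c)


def golden_interference(w: float, c: float, cw: float) -> float:
    """Interference score IG = CW - Pcw (negative values indicate
    performance below what W and C would predict)."""
    return cw - golden_predicted_cw(w, c)


# Hypothetical raw scores, for illustration only.
ig = golden_interference(w=100, c=100, cw=45)
print(ig)
```

Because Pcw is derived from the participant's own W and C scores, the same CW count can yield a different IG for a fast reader than for a slow one.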

Six articles (Troyer et al., 2006 ; Bayard et al., 2011 ; Campanholo et al., 2014 ; Bezdicek et al., 2015 ; Hankee et al., 2016 ; Tremblay et al., 2016 ) adopted the Victoria Stroop Test. In this version, three conditions are assessed: the C and the CW correspond to the equivalent conditions of the original version of the test (Stroop, 1935 ), while the W condition includes common words which do not refer to colors. This condition represents an intermediate inhibition condition, as the interference effect between the written word and the color name is not present. In this SCWT form (Strauss et al., 2006b ), for each condition, the completion time and the number of errors (corrected, non-corrected, and total errors) are recorded and two interference scores are computed:

Five studies (Strickland et al., 1997; Van der Elst et al., 2006; Zalonis et al., 2009; Kang et al., 2013; Zimmermann et al., 2015) adopted different SCWT versions. Three of them (Strickland et al., 1997; Van der Elst et al., 2006; Kang et al., 2013) computed, independently, the completion time and the number of errors for each condition. Additionally, Van der Elst et al. (2006) computed an interference score based on speed of performance only:

where WT, CT, and CWT represent the time to complete the W, C, and CW table, respectively. Zalonis et al. (2009) recorded: (i) the time; (ii) the number of errors; and (iii) the number of self-corrections in the CW condition. Moreover, they computed an interference score by subtracting the number of errors in the CW condition from the number of items properly named in 120 s in the same table. Lastly, Zimmermann et al. (2015) computed the number of errors and the number of correct answers given in 45 s in each condition. Additionally, they calculated an interference score derived from the original scoring method provided by Stroop (1935).
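Two of the scores in this paragraph can be illustrated with a short sketch. The time-based score is written under the assumption that it subtracts the mean of the W and C completion times from the CW completion time, a form often attributed to Van der Elst et al. (2006); the exact expression should be checked against the original report. The second function follows the verbal description of the Zalonis et al. (2009) score given above. Function names and example values are hypothetical.

```python
def time_based_interference(wt: float, ct: float, cwt: float) -> float:
    """Speed-only interference, ASSUMING the form CWT - (WT + CT) / 2.

    wt, ct, cwt: completion times (s) for the W, C, and CW tables.
    """
    return cwt - (wt + ct) / 2.0


def items_minus_errors(items_named: int, errors: int) -> int:
    """Score described for Zalonis et al. (2009): items properly named
    in 120 s in the CW table minus the errors made in that table."""
    return items_named - errors


# Hypothetical values, for illustration only.
print(time_based_interference(wt=40.0, ct=60.0, cwt=110.0))
print(items_minus_errors(items_named=85, errors=4))
```

Note that the first score ignores accuracy entirely, while the second ignores the W and C conditions.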

Of the five studies (Barbarotto et al., 1998; Caffarra et al., 2002; Amato et al., 2006; Valgimigli et al., 2010; Brugnolo et al., 2015) that provide normative data for the Italian population, two were originally written in Italian (Caffarra et al., 2002; Valgimigli et al., 2010), while the others are written in English (Barbarotto et al., 1998; Amato et al., 2006; Brugnolo et al., 2015). An English translation of the title and abstract of Caffarra et al. (2002) is available. Three of the studies consider performance only on the SCWT (Caffarra et al., 2002; Valgimigli et al., 2010; Brugnolo et al., 2015), while the others also include other neuropsychological tests in the experimental assessment (Barbarotto et al., 1998; Amato et al., 2006). The studies are heterogeneous in that they differ in terms of administered conditions, scoring procedures, number of items, and colors used. Three studies adopted a 100-item version of the SCWT (Barbarotto et al., 1998; Amato et al., 2006; Brugnolo et al., 2015), which is similar to the original version proposed by Stroop (1935). In this version, in every condition (i.e., W, C, CW), items are arranged in a 10 × 10 matrix; the colors are red, green, blue, brown, and purple. However, while two of these studies administered the W, C, and CW conditions once (Amato et al., 2006; Brugnolo et al., 2015), Barbarotto et al. (1998) administered the CW table twice, requiring participants to read the word during the first administration and then to name the ink color during the second administration. Additionally, they also administered a computerized version of the SCWT in which 40 stimuli are presented in each condition; red, blue, green, and yellow are used. Valgimigli et al. (2010) and Caffarra et al. (2002) administered shorter paper versions of the SCWT including only three colors (i.e., red, blue, green). More specifically, the former administered only the C and CW conditions including 60 items each, arranged in six columns of 10 items. The latter employed a version of 30 items for each condition (i.e., W, C, CW), arranged in three columns of 10 items each.

Only two of the five studies assessed and provided normative data for all the conditions of the SCWT (i.e., W, C, CW; Caffarra et al., 2002; Brugnolo et al., 2015), while the others provide only partial results. Valgimigli et al. (2010) provided normative data only for the C and CW conditions, while Amato et al. (2006) and Barbarotto et al. (1998) administered all the SCWT conditions (i.e., W, C, CW) but provide normative data only for the CW condition, and for the C and CW conditions, respectively.

These studies use different methods to compute subjects' performance. Some studies record the time needed, independently in each condition, to read all (Amato et al., 2006 ) or a fixed number (Valgimigli et al., 2010 ) of presented stimuli. Others consider the number of correct answers produced in a fixed time (30 s; Amato et al., 2006 ; Brugnolo et al., 2015 ). Caffarra et al. ( 2002 ) and Valgimigli et al. ( 2010 ) provide a more complex interference index that relates the subject's performance in the incongruous condition with the performance in the others. In Caffarra et al. ( 2002 ), two interference indexes based on reading speed and accuracy, respectively, are computed using the following formula:

Furthermore, in Valgimigli et al. ( 2010 ) an interference score is computed using the formula:

where DC represents the correct answers produced in 20 s in naming colors and DI corresponds to the correct answers achieved in 20 s in the interference condition. However, they do not take into account the performance on the word reading condition.

According to the present review, multiple SCWT scoring methods are available in the literature, with Golden's (1978) version being the most widely used. In the Italian literature, the heterogeneity in SCWT scoring methods increases dramatically. The parameters of speed and accuracy of the performance, essential for proper detection of the Stroop effect, are scored differently between studies, thus highlighting methodological inconsistencies. Some of the reviewed studies score solely the speed of the performance (Amato et al., 2006; Valgimigli et al., 2010). Others measure both the accuracy and speed of performance (Barbarotto et al., 1998; Brugnolo et al., 2015); however, they provide no comparisons between subjects' performance on the different SCWT conditions. On the other hand, Caffarra et al. (2002) compared performance in the W, C, and CW conditions; however, they computed speed and accuracy independently. Only Valgimigli et al. (2010) present a scoring method in which an index merging speed and accuracy is computed; however, the authors assessed solely the performance in the C and the CW conditions, neglecting the subject's performance in the W condition.

In our opinion, the reported scoring methods impede an exhaustive description of the performance on the SCWT, as suggested by clinical practice. For instance, if only the reading time is scored, while accuracy is not computed (Amato et al., 2006) or is computed independently (Caffarra et al., 2002), the consequences of possible inhibition difficulties on the processing speed cannot be assessed. Indeed, patients may show a non-pathological reading speed in the incongruous condition, despite extremely poor performance, if they do not apply the rule "name the ink color" and simply read the word (e.g., in the CW condition, when the stimulus is the word "red" printed in green ink, the patient says "Red" instead of "Green"). Such behaviors indicate a failure to maintain consistent activation of the intended response in the incongruent Stroop condition, even if the participants properly understand the task. Such scenarios are often reported in different clinical populations. For example, in the incongruous condition, patients with frontal lesions (Vendrell et al., 1995; Stuss et al., 2001; Swick and Jovanovic, 2002) as well as patients affected by Parkinson's Disease (Fera et al., 2007; Djamshidian et al., 2011) showed significant impairments in terms of accuracy, but not in terms of processing speed. Counting the number of correct answers in a fixed time (Amato et al., 2006; Valgimigli et al., 2010; Brugnolo et al., 2015) may be a plausible solution.

Moreover, it must be noted that error rate (and not speed) is an index of inhibitory control (McDowd et al., 1995) or an index of the ability to maintain the task's goal temporarily in a highly retrievable state (Kane and Engle, 2003). Nevertheless, computing exclusively the error rate (i.e., the accuracy of the performance), without measuring the speed of performance, would be insufficient for an extensive evaluation of the performance in the SCWT. In fact, the behavior in the incongruous condition (i.e., CW) may be affected by difficulties that are not directly related to an impaired ability to suppress the interference process, which may lead to misinterpretation of the patient's performance. People affected by color-blindness or dyslexia would represent the extreme case. More commonly, slowness due to clinical circumstances such as dysarthria, mood disorders such as depression, or medication side effects may substantially affect the performance in the SCWT. In Parkinson's Disease, ideomotor slowness (Gardner et al., 1959; Jankovic et al., 1990) impacts the processing speed in all SCWT conditions, producing a global difficulty in response execution rather than a specific impairment in the CW condition (Stacy and Jankovic, 1992; Hsieh et al., 2008). Consequently, it seems necessary to relate the performance in the incongruous condition to word reading and color naming abilities when inhibition capability has to be assessed, as proposed by Caffarra et al. (2002). In this method, the W score and C score are subtracted from the CW score. However, as previously mentioned, the scoring method suggested by Caffarra et al. (2002) computes errors and speed separately. Thus, so far, none of the proposed Italian normative scoring methods seems adequate to assess patients' performance in the SCWT properly and informatively.

Examples of more suitable interference scores can be found in non-Italian literature. Stroop ( 1935 ) proposed that the ability to inhibit cognitive interference can be measured in the SCWT using the formula:

where, total time is the overall time for reading; mean time per word is the overall time for reading divided by the number of items; and the number of uncorrected errors is the number of errors not spontaneously corrected. Gardner et al. ( 1959 ) also propose a similar formula:

where 100 refers to the number of stimuli used in this version of the SCWT. When speed and errors are computed together, patients who show difficulties in inhibiting interference despite a non-pathological reading time are more likely to be correctly identified. However, both the mentioned scores (Stroop, 1935; Mitrushina et al., 2005) may be susceptible to criticism (Jensen and Rohwer, 1966). In fact, even though accuracy and speed are merged into a global score in these studies (Stroop, 1935; Mitrushina et al., 2005), they are not computed independently. In Gardner et al. (1959) the number of errors is computed in relation to the mean time per item and then added to the total time, which may be redundant and lead to a miscomputation.
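The two combined speed-and-error scores discussed above can be sketched as follows. The expressions are assumptions reconstructed from the verbal definitions in the text (total time, mean time per word, number of errors, and 100 stimuli), in the forms commonly attributed to Stroop (1935) and Gardner et al. (1959); they should be verified against the original sources. All names and values are hypothetical.

```python
def stroop_1935_score(total_time: float, n_items: int,
                      uncorrected_errors: int) -> float:
    """ASSUMED form: total time + (2 * mean time per word) * uncorrected errors."""
    mean_time_per_word = total_time / n_items
    return total_time + 2.0 * mean_time_per_word * uncorrected_errors


def gardner_1959_score(total_time: float, errors: int,
                       n_items: int = 100) -> float:
    """ASSUMED form: total time + (total time / n_items) * number of errors."""
    return total_time + (total_time / n_items) * errors


# Hypothetical example: 100 items read in 100 s with 5 errors.
print(stroop_1935_score(total_time=100.0, n_items=100, uncorrected_errors=5))
print(gardner_1959_score(total_time=100.0, errors=5))
```

In both sketches the error penalty is itself a function of the total time, which illustrates the redundancy noted above: a slow participant is penalized more for each error than a fast one.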

The most widely adopted scoring method internationally is that of Golden (1978). Lansbergen et al. (2007) point out that the index IG might not be adequately corrected for inter-individual differences in reading ability, despite its effective adjustment for color naming. The authors highlight that the reading process is more automated in expert readers, who may consequently be more susceptible to interference (Lansbergen et al., 2007), thus requiring that the score be weighted according to individual reading ability. However, experimental data suggest that increased reading practice does not affect the susceptibility to interference in the SCWT (Jensen and Rohwer, 1966). The article by Chafetz and Matthews (2004) might be useful for a deeper understanding of the relationship between reading words and naming colors, but the debate about the role of reading ability in the inhibition process is still open. The question of the role of reading ability in SCWT performance cannot be adequately resolved even if the Victoria Stroop Test scoring method (Strauss et al., 2006b) is adopted, given the absence of the standard W condition.

In light of the previous considerations, we recommend that a scoring method for the SCWT should fulfill two main requirements. First, both accuracy and speed must be computed for all SCWT conditions. Second, a global index must be calculated to relate the performance in the incongruous condition to word reading and color naming abilities. The first requirement can be achieved by counting the number of correct answers in each condition within a fixed time (Amato et al., 2006; Valgimigli et al., 2010; Brugnolo et al., 2015). The second requirement can be achieved by subtracting the W score and C score from the CW score, as suggested by Caffarra et al. (2002). None of the studies reviewed satisfies both these requirements.
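The two requirements can be illustrated with a minimal sketch. It assumes that each response is logged with its elapsed time and correctness, and that the W and C scores enter the global index as their mean; the text only states that they are subtracted from the CW score, so the weighting is an assumption. The 45 s limit, the data layout, and all names are hypothetical.

```python
def correct_within(responses, time_limit_s):
    """Count correct answers given within the time limit.

    responses: iterable of (elapsed_seconds, is_correct) pairs.
    """
    return sum(1 for t, ok in responses if ok and t <= time_limit_s)


def global_index(w_score, c_score, cw_score):
    """CW performance relative to W and C, ASSUMING the mean of W and C
    is subtracted from CW."""
    return cw_score - (w_score + c_score) / 2.0


# Hypothetical CW log: the last answer falls outside a 45 s limit.
cw_log = [(1.0, True), (2.0, False), (3.0, True), (50.0, True)]
cw_score = correct_within(cw_log, time_limit_s=45)
print(cw_score)
print(global_index(w_score=90, c_score=70, cw_score=40))
```

Counting correct answers within a fixed time folds speed and accuracy into one number per condition, so a participant who reads the words quickly but names the wrong colors no longer obtains a normal-looking score.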

According to the review, the studies with Italian normative data present different theoretical interpretations of the SCWT scores. Amato et al. (2006) and Caffarra et al. (2002) describe the SCWT score as a measure of fronto-executive functioning, while others use it as an index of attentional functioning (Barbarotto et al., 1998; Valgimigli et al., 2010) or of general cognitive efficiency (Brugnolo et al., 2015). Slowing in response to conflict could be attributed to a failure of selective attention or a lack of cognitive efficiency rather than to a failure of response inhibition (Chafetz and Matthews, 2004); however, performance in the SCWT is not exclusively related to concentration, attention, or cognitive effectiveness, but relies on a more specific executive-frontal domain. Indeed, to solve the task correctly, subjects have to selectively process a specific visual feature while continuously blocking out the automatic processing of reading (Zajano and Gorman, 1986; Shum et al., 1990). The specific involvement of executive processes is supported by clinical data. Patients with anterior frontal lesions, and not those with posterior cerebral damage, show significant difficulties in maintaining a consistent activation of the intended response (Valgimigli et al., 2010). Furthermore, Parkinson's Disease patients, characterized by executive dysfunction due to the disruption of the dopaminergic pathway (Fera et al., 2007), showed difficulties in the SCWT despite unimpaired attentional abilities (Fera et al., 2007; Djamshidian et al., 2011).

According to the present review, the heterogeneity of SCWT scoring methods in the international literature, and even more so in the Italian literature, calls for an alternative, shared scoring system that allows a more appropriate interpretation of performance in the SCWT. We propose to adopt a scoring method in which (i) the number of correct answers in a fixed time in each SCWT condition (W, C, CW) and (ii) a global index relating the CW performance to reading and/or color naming abilities are computed. Further studies are required to collect normative data for this scoring method and to study its applicability in clinical settings.

Author contributions

Conception of the work: FS. Acquisition of data: ST. Analysis and interpretation of data for the work: FS and ST. Writing: ST, and revising the work: FS. Final approval of the version to be published and agreement to be accountable for all aspects of the work: FS and ST.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The Authors thank Prerana Sabnis for her careful proofreading of the manuscript.

  • Al-Ghatani A. M., Obonsawin M. C., Binshaig B. A., Al-Moutaery K. R. (2011). Saudi normative data for the Wisconsin Card Sorting test, Stroop test, test of non-verbal intelligence-3, picture completion and vocabulary (subtest of the wechsler adult intelligence scale-revised). Neurosciences 16, 29–41.
  • Amato M. P., Portaccio E., Goretti B., Zipoli V., Ricchiuti L., De Caro M. F., et al. (2006). The Rao's brief repeatable battery and stroop test: normative values with age, education and gender corrections in an Italian population. Mult. Scler. 12, 787–793. doi: 10.1177/1352458506070933
  • Andrews K., Shuttleworth-Edwards A., Radloff S. (2012). Normative indications for Xhosa speaking unskilled workers on the Trail Making and Stroop Tests. J. Psychol. Afr. 22, 333–341. doi: 10.1080/14330237.2012.10820538
  • Artiola L., Fortuny L. A. I. (1999). Manual de Normas Y Procedimientos Para la Bateria Neuropsicolog. Tucson, AZ: Taylor & Francis.
  • Barbarotto R., Laiacona M., Frosio R., Vecchio M., Farinato A., Capitani E. (1998). A normative study on visual reaction times and two Stroop colour-word tests. Neurol. Sci. 19, 161–170. doi: 10.1007/BF00831566
  • Barletta-Rodolfi C., Gasparini F., Ghidoni E. (2011). Kit del Neuropsicologo Italiano. Bologna: Società Italiana di Neuropsicologia.
  • Bast-Pettersen R. (2006). The Hugdahl Stroop Test: A normative study involving male industrial workers. J. Norwegian Psychol. Assoc. 43, 1023–1028.
  • Bayard S., Erkes J., Moroni C. (2011). Collège des psychologues cliniciens spécialisés en neuropsychologie du languedoc roussillon (CPCN Languedoc Roussillon). Victoria Stroop Test: normative data in a sample group of older people and the study of their clinical applications in the assessment of inhibition in Alzheimer's disease. Arch. Clin. Neuropsychol. 26, 653–661. doi: 10.1093/arclin/acr053
  • Bezdicek O., Lukavsky J., Stepankova H., Nikolai T., Axelrod B. N., Michalec J., et al. (2015). The Prague Stroop Test: normative standards in older Czech adults and discriminative validity for mild cognitive impairment in Parkinson's disease. J. Clin. Exp. Neuropsychol. 37, 794–807. doi: 10.1080/13803395.2015.1057106
  • Brugnolo A., De Carli F., Accardo J., Amore M., Bosia L. E., Bruzzaniti C., et al. (2015). An updated Italian normative dataset for the Stroop color word test (SCWT). Neurol. Sci. 37, 365–372. doi: 10.1007/s10072-015-2428-2
  • Caffarra P., Vezzaini G., Dieci F., Zonato F., Venneri A. (2002). Una versione abbreviata del test di Stroop: dati normativi nella popolazione italiana. Nuova Rivis. Neurol. 12, 111–115.
  • Campanholo K. R., Romão M. A., Machado M. A. D. R., Serrao V. T., Coutinho D. G. C., Benute G. R. G., et al. (2014). Performance of an adult Brazilian sample on the Trail Making Test and Stroop Test. Dement. Neuropsychol. 8, 26–31. doi: 10.1590/S1980-57642014DN81000005
  • Chafetz M. D., Matthews L. H. (2004). A new interference score for the Stroop test. Arch. Clin. Neuropsychol. 19, 555–567. doi: 10.1016/j.acn.2003.08.004
  • Delis D. C., Kaplan E., Kramer J. H. (2001). Delis-Kaplan Executive Function System (D-KEFS). San Antonio, TX: Psychological Corporation.
  • Djamshidian A., O'Sullivan S. S., Lees A., Averbeck B. B. (2011). Stroop test performance in impulsive and non impulsive patients with Parkinson's disease. Parkinsonism Relat. Disord. 17, 212–214. doi: 10.1016/j.parkreldis.2010.12.014
  • Duncan M. T. (2006). Assessment of normative data of Stroop test performance in a group of elementary school students Niterói. J. Bras. Psiquiatr. 55, 42–48. doi: 10.1590/S0047-20852006000100006
  • Fera F., Nicoletti G., Cerasa A., Romeo N., Gallo O., Gioia M. C., et al. (2007). Dopaminergic modulation of cognitive interference after pharmacological washout in Parkinson's disease. Brain Res. Bull. 74, 75–83. doi: 10.1016/j.brainresbull.2007.05.009
  • Gardner R. W., Holzman P. S., Klein G. S., Linton H. P., Spence D. P. (1959). Cognitive control: a study of individual consistencies in cognitive behaviour. Psychol. Issues 1, 1–186.
  • Golden C. J. (1978). Stroop Color and Word Test: A Manual for Clinical and Experimental Uses. Chicago, IL: Stoelting Co.
  • Golden C. J., Freshwater S. M. (2002). The Stroop Color and Word Test: A Manual for Clinical and Experimental Uses. Chicago, IL: Stoelting.
  • Hankee L. D., Preis S. R., Piers R. J., Beiser A. S., Devine S. A., Liu Y., et al. (2016). Population normative data for the CERAD word list and Victoria Stroop Test in younger-and middle-aged adults: cross-sectional analyses from the framingham heart study. Exp. Aging Res. 42, 315–328. doi: 10.1080/0361073X.2016.1191838
  • Hsieh Y. H., Chen K. J., Wang C. C., Lai C. L. (2008). Cognitive and motor components of response speed in the Stroop test in Parkinson's disease patients. Kaohsiung J. Med. Sci. 24, 197–203. doi: 10.1016/S1607-551X(08)70117-7
  • Ingraham L. J., Chard F., Wood M., Mirsky A. F. (1988). An Hebrew language version of the Stroop test. Percept. Mot. Skills 67, 187–192. doi: 10.2466/pms.1988.67.1.187
  • Ivnik R. J., Malec J. F., Smith G. E., Tangalos E. G., Petersen R. C. (1996). Neuropsychological tests' norms above age 55: COWAT, BNT, MAE token, WRAT-R reading, AMNART, STROOP, TMT, and JLO. Clin. Neuropsychol. 10, 262–278. doi: 10.1080/13854049608406689
  • Jankovic J., McDermott M., Carter J., Gauthier S., Goetz C., Golbe L., et al. (1990). Parkinson Study Group. Variable expression of Parkinson's disease: a base-line analysis of DATATOP cohort. Neurology 40, 1529–1534.
  • Jensen A. R., Rohwer W. D. (1966). The Stroop Color-Word Test: a Review. Acta Psychol. 25, 36–93. doi: 10.1016/0001-6918(66)90004-7
  • Kane M. J., Engle R. W. (2003). Working-memory capacity and the control of attention: the contributions of goal neglect, response competition, and task set to Stroop interference. J. Exp. Psychol. Gen. 132, 47–70. doi: 10.1037/0096-3445.132.1.47
  • Kang C., Lee G. J., Yi D., McPherson S., Rogers S., Tingus K., et al. (2013). Normative data for healthy older adults and an abbreviated version of the Stroop test. Clin. Neuropsychol. 27, 276–289. doi: 10.1080/13854046.2012.742930
  • Lansbergen M. M., Kenemans J. L., van Engeland H. (2007). Stroop interference and attention-deficit/hyperactivity disorder: a review and meta-analysis. Neuropsychology 21:251. doi: 10.1037/0894-4105.21.2.251
  • Llinàs-Reglà J., Vilalta-Franch J., López-Pousa S., Calvó-Perxas L., Garre-Olmo J. (2013). Demographically adjusted norms for Catalan older adults on the Stroop Color and Word Test. Arch. Clin. Neuropsychol. 28, 282–296. doi: 10.1093/arclin/act003
  • Lopez E., Salazar X. F., Villasenor T., Saucedo C., Pena R. (2013). Validez y datos normativos de las pruebas de nominación en personas con educación limitada, in Poster Presented at The Congress of the "Sociedad Latinoamericana de Neuropsicología" (Montreal, QC).
  • Lubrini G., Periañez J. A., Rios-Lago M., Viejo-Sobera R., Ayesa-Arriola R., Sanchez-Cubillo I., et al. (2014). Clinical Spanish norms of the Stroop test for traumatic brain injury and schizophrenia. Span. J. Psychol. 17:E96. doi: 10.1017/sjp.2014.90
  • Lucas J. A., Ivnik R. J., Smith G. E., Ferman T. J., Willis F. B., Petersen R. C., et al. (2005). Mayo's older african americans normative studies: norms for boston naming test, controlled oral word association, category fluency, animal naming, token test, wrat-3 reading, trail making test, stroop test, and judgment of line orientation. Clin. Neuropsychol. 19, 243–269. doi: 10.1080/13854040590945337
  • MacLeod C. M., Dunbar K. (1988). Training and Stroop-like interference: evidence for a continuum of automaticity. J. Exp. Psychol. Learn. Mem. Cogn. 14, 126–135. doi: 10.1037/0278-7393.14.1.126
  • McDowd J. M., Oseas-Kreger D. M., Filion D. L. (1995). Inhibitory processes in cognition and aging, in Interference and Inhibition in Cognition, eds Dempster F. N., Brainerd C. J. (San Diego, CA: Academic Press), 363–400.
  • Mitrushina M., Boone K. B., Razani J., D'Elia L. F. (2005). Handbook of Normative Data for Neuropsychological Assessment. New York, NY: Oxford University Press.
  • Moering R. G., Schinka J. A., Mortimer J. A., Graves A. B. (2004). Normative data for elderly African Americans for the Stroop color and word test. Arch. Clin. Neuropsychol. 19, 61–71. doi: 10.1093/arclin/19.1.61
  • Morrow S. A. (2013). Normative data for the stroop color word test for a north american population. Can. J. Neurol. Sci. 40, 842–847. doi: 10.1017/S0317167100015997
  • Norman M. A., Moore D. J., Taylor M., Franklin D., Jr., Cysique L., Ake C., et al. (2011). Demographically corrected norms for African Americans and Caucasians on the hopkins verbal learning test–revised, brief visuospatial memory test–revised, stroop color and word test, and wisconsin card sorting test 64-card version. J. Clin. Exp. Neuropsychol. 33, 793–804. doi: 10.1080/13803395.2011.559157
  • Oliveira R. M., Mograbi D. C., Gabrig I. A., Charchat-Fichman H. (2016). Normative data and evidence of validity for the Rey Auditory Verbal Learning Test, Verbal Fluency Test, and Stroop Test with Brazilian children. Psychol. Neurosci. 9, 54–67. doi: 10.1037/pne0000041
  • Peña-Casanova J., Quiñones-Ubeda S., Gramunt-Fombuena N., Quintana M., Aguilar M., Molinuevo J. L., et al. (2009). Spanish multicenter normative studies (NEURONORMA Project): norms for the Stroop color-word interference test and the Tower of London-Drexel. Arch. Clin. Neuropsychol. 24, 413–429. doi: 10.1093/arclin/acp043
  • Rivera D., Perrin P. B., Stevens L. F., Garza M. T., Weil C., Saracho C. P., et al. (2015). Stroop color-word interference test: normative data for the Latin American Spanish speaking adult population. Neurorehabilitation 37, 591–624. doi: 10.3233/NRE-151281
  • Rognoni T., Casals-Coll M., Sánchez-Benavides G., Quintana M., Manero R. M., Calvo L., et al. (2013). Spanish normative studies in a young adult population (NEURONORMA young adults Project): norms for the Boston Naming Test and the Token Test. Neurología 28, 73–80. doi: 10.1016/j.nrl.2012.02.009
  • Rosselli M., Ardila A., Santisi M. N., Arecco Mdel R., Salvatierra J., Conde A., et al. (2002). Stroop effect in Spanish–English bilinguals. J. Int. Neuropsychol. Soc. 8, 819–827. doi: 10.1017/S1355617702860106
  • Seo E. H., Lee D. Y., Kim S. G., Kim K. W., Youn J. C., Jhoo J. H., et al. (2008). Normative study of the Stroop Color and Word Test in an educationally diverse elderly population. Int. J. Geriatr. Psychiatry 23, 1020–1027. doi: 10.1002/gps.2027
  • Shum D. H. K., McFarland K. A., Brain J. D. (1990). Construct validity of eight tests of attention: comparison of normal and closed head injured samples. Clin. Neuropsychol. 4, 151–162. doi: 10.1080/13854049008401508
  • Stacy M., Jankovic J. (1992). Differential diagnosis of parkinson's disease and the parkinsonism plus syndrome. Neurol. Clin. 10, 341–359.
  • Steinberg B. A., Bieliauskas L. A., Smith G. E., Ivnik R. J. (2005). Mayo's older Americans normative studies: age-and IQ-adjusted norms for the trail-making test, the stroop test, and MAE controlled oral word association test. Clin. Neuropsychol. 19, 329–377. doi: 10.1080/13854040590945210
  • Strauss E., Sherman E. M., Spreen O. (2006a). A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. Oxford: American Chemical Society.
  • Strauss E., Sherman E. M. S., Spreen O. (2006b). A Compendium of Neuropsychological Tests, 3rd Edn. New York, NY: Oxford University Press.
  • Strickland T. L., D'Elia L. F., James R., Stein R. (1997). Stroop color-word performance of African Americans. Clin. Neuropsychol. 11, 87–90. doi: 10.1080/13854049708407034
  • Stroop J. R. (1935). Studies of interference in serial verbal reactions. J. Exp. Psychol. 18, 643–662. doi: 10.1037/h0054651
  • Stuss D. T., Floden D., Alexander M. P., Levine B., Katz D. (2001). Stroop performance in focal lesion patients: dissociation of processes and frontal lobe lesion location. Neuropsychologia 39, 771–786. doi: 10.1016/S0028-3932(01)00013-6
  • Swick D., Jovanovic J. (2002). Anterior cingulate cortex and the Stroop task: neuropsychological evidence for topographic specificity. Neuropsychologia 40, 1240–1253. doi: 10.1016/S0028-3932(01)00226-3
  • Tremblay M. P., Potvin O., Belleville S., Bier N., Gagnon L., Blanchet S., et al. (2016). The victoria stroop test: normative data in Quebec-French adults and elderly. Arch. Clin. Neuropsychol. 31, 926–933. doi: 10.1093/arclin/acw029
  • Trenerry M. R., Crosson B., DeBoe J., Leber W. R. (1989). Stroop Neuropsychological Screening Test. Odessa, FL: Psychological Assessment Resources.
  • Troyer A. K., Leach L., Strauss E. (2006). Aging and response inhibition: normative data for the Victoria Stroop Test. Aging Neuropsychol. Cogn. 13, 20–35. doi: 10.1080/138255890968187
  • Valgimigli S., Padovani R., Budriesi C., Leone M. E., Lugli D., Nichelli P. (2010). The Stroop test: a normative Italian study on a paper version for clinical use. G. Ital. Psicol. 37, 945–956. doi: 10.1421/33435
  • Van der Elst W., Van Boxtel M. P., Van Breukelen G. J., Jolles J. (2006). The Stroop Color-Word Test influence of age, sex, and education; and normative data for a large sample across the adult age range. Assessment 13, 62–79. doi: 10.1177/1073191105283427
  • Vendrell P., Junqué C., Pujol J., Jurado M. A., Molet J., Grafman J. (1995). The role of prefrontal regions in the Stroop task. Neuropsychologia 33, 341–352. doi: 10.1016/0028-3932(94)00116-7
  • Venneri A., Molinari M. A., Pentore R., Cotticelli B., Nichelli P., Caffarra P. (1992). Shortened Stroop color-word test: its application in normal aging and Alzheimer's disease . Neurobiol. Aging 13 , S3–S4. 10.1016/0197-4580(92)90135-K [ CrossRef ] [ Google Scholar ]
  • Waldrop-Valverde D., Ownby R. L., Jones D. L., Sharma S., Nehra R., Kumar A. M., et al.. (2015). Neuropsychological test performance among healthy persons in northern India: development of normative data . J. Neurovirol. 21 , 433–438. 10.1007/s13365-015-0332-4 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zajano M. J., Gorman A. (1986). Stroop interference as a function of percentage of congruent items . Percept. Mot. Skills 63 , 1087–1096. 10.2466/pms.1986.63.3.1087 [ CrossRef ] [ Google Scholar ]
  • Zalonis I., Christidi F., Bonakis A., Kararizou E., Triantafyllou N. I., Paraskevas G., et al.. (2009). The stroop effect in Greek healthy population: normative data for the Stroop Neuropsychological Screening Test . Arch. Clin. Neuropsychol. 24 , 81–88. 10.1093/arclin/acp011 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zimmermann N., Cardoso C. D. O., Trentini C. M., Grassi-Oliveira R., Fonseca R. P. (2015). Brazilian preliminary norms and investigation of age and education effects on the Modified Wisconsin Card Sorting Test, Stroop Color and Word test and Digit Span test in adults . Dement. Neuropsychol. 9 , 120–127. 10.1590/1980-57642015DN92000006 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
What New Love Does to Your Brain

Roses are red, violets are blue. Romance can really mess with you.

By Dana G. Smith

New love can consume our thoughts, supercharge our emotions and, on occasion, cause us to act out of character.

“People pine for love, they live for love, they kill for love and they die for love,” said Helen Fisher, a senior research fellow at the Kinsey Institute at Indiana University. “It’s one of the most powerful brain systems the human animal has ever evolved.”

Scientists have studied what is happening in our brains when we are in those early, heady days of infatuation, and whether it can actually alter how we think and what we do. Their findings suggest that song lyrics and dramatic plotlines don’t overstate it: New love can mess with our heads.

Experts define “romantic love” as a connection deeper than lust, but distinct from the attachment associated with a long-term partnership. In a few of the small studies that have examined this googly-eyed state, researchers put people in the early stages of a romantic relationship (typically less than a year) in M.R.I. scanners to see what was happening in their brains while they looked at pictures of their paramours. They found that the participants showed increased activity in areas of the brain that are rich in the neurochemical dopamine and control feelings of wanting and desire. These regions are also activated by drugs like cocaine, leading some experts to liken love to a sort of “natural addiction.”

Studies on prairie voles (yes, you read that right) back up these findings. The rodents are one of the few mammal species that mate for life, so researchers sometimes use them as a scientific model for human partnerships. Studies show that when these animals pair up, the brain’s reward system is similarly activated, triggering the release of dopamine.

“Romantic love does not emanate from your cerebral cortex, where you do your thinking; it does not emanate from the brain regions in the middle of your head, linked with the limbic areas, linked with emotions,” said Dr. Fisher, who conducted one of the first human studies on the topic and, along with her role at the Kinsey Institute, is the chief science adviser to Match.com. “It’s based in the brain regions linked with drive, with focus, with motivation.”

This type of dopamine activity may explain why, in the early stages of love, you have the irresistible urge to be with your beloved constantly — what the addiction literature calls “craving.” Indeed, preliminary research conducted by Sandra Langeslag, an associate professor in behavioral neuroscience at the University of Missouri, St. Louis, suggests that some people crave their lover like they crave a drug.

In one of the few studies to directly compare love and addiction, which is still ongoing and has not yet been published, Dr. Langeslag showed 10 people who vaped nicotine either pictures of their lover or pictures of other people vaping (a classic experiment used to invoke craving). The participants ranked their desire to be with their partner higher than their desire to vape.

Other research by Dr. Langeslag’s lab looked at the single-mindedness of love, the state of being unable to think about anything besides your paramour. In a series of small studies on people in the throes of new love, Dr. Langeslag found that participants reported thinking about the object of their desire roughly 65 percent of their waking hours and said they had trouble focusing on unrelated topics. However, when people were prompted with information related to their beloved, they showed increased attention and had enhanced memory.

There is also some evidence that love can render people oblivious to a new partner’s faults — the “love is blind” phenomenon. Lucy Brown, a professor of neuroscience at Albert Einstein College of Medicine, found that when some study participants were shown pictures of their lover early in a relationship, they had less activity in a part of the prefrontal cortex that is important for decision-making and evaluating others. The findings suggest that we might “suspend negative judgments of the person we’re in love with,” she said.

If love can alter our motivation and attention, perhaps it’s no surprise that people sometimes go to extremes when they’re in its thrall. But giving in to your obsession with your lover isn’t necessarily “irrational” behavior, at least from an evolutionary perspective, Dr. Langeslag said.

Scientists believe humans evolved to have these types of responses — which seem to be consistent across age, gender and culture — because bonding and mating are essential for the survival of the species.

“Romantic love is a drive,” Dr. Fisher said. “It’s a basic mating drive that evolved millions of years ago to send your DNA onto tomorrow. And it can overlook just about anything.”

Dana G. Smith is a Times reporter covering personal health, particularly aging and brain health.
