Graphical presentation of confounding in directed acyclic graphs
Abstract
Since confounding obscures the real effect of the exposure, it is important to adequately address confounding for making valid causal inferences from observational data. Directed acyclic graphs (DAGs) are visual representations of causal assumptions that are increasingly used in modern epidemiology. They can help to identify the presence of confounding for the causal question at hand. This structured approach serves as a visual aid in the scientific discussion by making underlying relations explicit. This article explains the basic concepts of DAGs and provides examples in the field of nephrology with and without presence of confounding. Ultimately, these examples will show that DAGs can be preferable to the traditional methods to identify sources of confounding, especially in complex research questions.
INTRODUCTION
Traditionally, the gold standard of investigating a causal relationship is an experiment. For example, to investigate the effect of erythropoietin on blood pressure in patients with chronic kidney disease (CKD), the ideal experiment would be a randomized controlled trial. Randomization is especially important when investigating intended treatment effects to avoid confounding by indication [1]. By randomly assigning erythropoietin versus control treatment, we aim to make groups that are comparable with respect to their risk of developing hypertension. Provided the study is of sufficient size, all other factors influencing blood pressure will be more or less equally distributed between erythropoietin and control groups and therefore any difference in blood pressure at the end of the study can be attributed to the erythropoietin.
However, most questions on causal mechanisms of disease cannot be studied in randomized trials and we must rely on results of observational studies [2]. For instance, it is unethical to randomly expose people to cigarette smoke or lead to study their effect on kidney function, as negative effects can be foreseen. Other determinants of interest, like sex, cannot be assigned. But unlike well-performed randomized trials, observational studies often suffer from an inherent incomparability between the exposed and the unexposed. For example, when studying the effect of smoking on the risk of renal disease, the tendency of smokers to have an unfavourable lifestyle, such as high alcohol or salt intake, could distort the comparison. If these other factors are also causes of renal disease, the effect of the exposure, in this case smoking, is easily mixed with the effect of those other factors. This mixing of effects is better known as confounding [3].
For making valid causal inferences from observational data, it is important to adequately address confounding. However, confounding is not always easy to recognize. In the traditional definition, a confounder is a factor that is associated with the exposure, is associated with the outcome and is not in the causal path between the exposure and outcome [4]. Although this definition of a confounder is clear, we will show later that it may be insufficient in practice. Causal diagrams called directed acyclic graphs (DAGs) are increasingly used in modern epidemiology, mainly due to the popularization of this technique by Sander Greenland and, more recently, Miguel Hernan [5–9]. DAGs provide a structured way to present an overview of the causal research question and its context. They serve as a visual representation of causal assumptions by making underlying relations explicit [8]. DAGs can therefore help to identify the presence of confounding and ways to resolve it. This article aims to introduce DAGs as a useful tool to present a causal research question and to identify confounding.
First, the traditional definition of a confounder will be discussed. Then, the basic aspects of DAGs will be explained using several examples with and without presence of confounding. In addition, we will discuss how DAGs can be used to determine the most efficient way to deal with the identified confounding. For educational purposes, the DAGs in this article are used as simple examples and are assumed to represent the truth.
TRADITIONAL DEFINITION OF CONFOUNDING
Example 1. We are assessing the causal relationship between CKD and mortality. Is confounding by age present?
Traditionally, a confounder is defined by three criteria. First, it must have an association with the outcome, meaning that it should be a risk factor for the outcome. Second, it must be associated with the exposure. Last, it must not be in the causal path from exposure to outcome, and thus must not be a consequence of the exposure [4]. Using these criteria, age classifies as a confounder in the relationship between CKD and mortality. In the general population, people with CKD are on average older than people without CKD. Among elderly subjects, the risk of mortality is also higher. Therefore, if we simply compared mortality risk in patients with CKD with that in patients without CKD, we would indirectly compare old with young people. Age is associated with the exposure CKD and is a risk factor for the outcome, but is not a consequence of the exposure. As identified with the traditional method, the effect of CKD on mortality is mixed with the effect of age and confounding by age is present. Usually we would want to remove this confounding effect of age, and in order to do so we must first have identified potential confounding. We will show that DAGs extend and formalize the traditional method of identifying confounding.
DAGS
A DAG is a directed acyclic graph (Figure 1). A graph is called directed if all variables in the graph are connected by arrows. Arrows in DAGs represent direct causal effects of one factor on another, either protective or harmful [9]. A cause is a factor that produces an effect on another factor. The causal nature of such a factor is inferred from the fact that the effect is no longer observed when the factor in question is (hypothetically) removed. Causes are seldom sufficient or necessary, especially in a multifactorial disease such as CKD. An arrow reflects a causal pathway: one factor causes the other and not the other way around. The arrows and their direction are based on a priori knowledge. A path in a DAG is a sequence of arrows connecting the exposure and outcome studied, irrespective of the direction of the arrows. A directed path is a sequence of arrows in which every arrow points in the same direction. The graphs are acyclic because causes always precede their effects, i.e. the future cannot cause the past. In DAGs, this means that no directed path can form a closed loop [8]. Thus one can never start from one factor, follow the direction of the arrows and then end up at the same factor [9]. To increase the readability of a DAG, it is therefore good practice to respect chronology, with causes drawn to the left of their effects. For clarity and explanatory purposes, we indicate the research question at hand with a question mark above the arrow from exposure to outcome.
FIGURE 1:
A graphical presentation of confounding in DAGs. (a) The structure of confounding in DAGs. Since age is a common cause of CKD and mortality, confounding is present when we want to assess the causal relationship between the exposure CKD and the outcome mortality (b). The backdoor path from CKD via age to mortality can be blocked by conditioning on age, as depicted by a box around age in (c). Similarly, ethnicity is a common cause of obesity and decline in kidney function (d). The backdoor path from obesity via ethnicity to decline in kidney function can be blocked by conditioning on ethnicity. If ethnicity is not measured or not properly measured, residual confounding remains present.
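The two defining properties can be made concrete with a minimal sketch. It assumes a DAG is stored as a list of (cause, effect) pairs; the helper names `children` and `is_acyclic` are our own illustrative choices, and the edge list encodes the arrows of Figure 1b as described in the caption.

```python
# Figure 1b as an edge list: each pair (cause, effect) is one arrow.
EDGES = [("age", "CKD"), ("age", "mortality"), ("CKD", "mortality")]


def children(edges):
    """Map every node to the list of nodes it points to."""
    out = {}
    for cause, effect in edges:
        out.setdefault(cause, []).append(effect)
        out.setdefault(effect, [])
    return out


def is_acyclic(edges):
    """Return True if no directed path leads from a node back to itself."""
    graph = children(edges)
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return True
        if state.get(node) == "visiting":
            return False  # the arrows led back to a node on the current path
        state[node] = "visiting"
        ok = all(visit(child) for child in graph[node])
        state[node] = "done"
        return ok

    return all(visit(node) for node in graph)


print(is_acyclic(EDGES))                           # True
print(is_acyclic(EDGES + [("mortality", "age")]))  # False
```

Adding the hypothetical arrow from mortality back to age creates a closed loop, which is exactly what the acyclic requirement forbids.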
CONFOUNDING IN DAGS
Figure 1a shows the general structure of confounding in a DAG and Figure 1b shows the DAG of the first example, in which confounding by age was identified in the causal relationship between CKD and mortality. The arrows are drawn based on a priori knowledge. In this case, age is a cause of both CKD and mortality. Therefore, the arrows point away from age towards CKD as well as towards mortality. Age is thus a common cause of CKD and mortality. The presence of a common cause in a DAG is equivalent to the presence of confounding. The DAG in Figure 1b indicates two paths from CKD to mortality. One path leads directly from CKD to mortality, representing the effect of CKD on mortality, which is the research question at hand. There is, however, another path from CKD to mortality, via their common cause age. In DAG terms, this path is called a backdoor path because it starts with an arrowhead towards CKD, the exposure. Thus, the presence of a common cause or backdoor path in a DAG identifies the presence of confounding. A DAG represents an overview of all causes in the causal mechanism under study. When a DAG contains all relevant variables and their causal relationships, that is the exposure, outcome and their context, the presence of ‘confounding’ in general can be identified. This is inherently different from the traditional three criteria approach, in which every factor is judged as a ‘confounder’ separately. Therefore, in DAGs we do not speak of ‘confounders’ but only of ‘confounding’.
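As a rough illustration of how backdoor paths can be found mechanically, the sketch below lists every path between exposure and outcome in Figure 1b, ignoring arrow direction, and flags those that start with an arrow pointing into the exposure. The function names `all_paths` and `is_backdoor` are hypothetical helpers, not part of any DAG software.

```python
# Arrows of Figure 1b as (cause, effect) pairs.
EDGES = [("age", "CKD"), ("age", "mortality"), ("CKD", "mortality")]


def all_paths(edges, start, end):
    """List every path from start to end, ignoring arrow direction.

    Each path is a list of nodes that visits no node twice.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths = []

    def walk(path):
        if path[-1] == end:
            paths.append(path)
            return
        for nxt in sorted(neighbours[path[-1]]):
            if nxt not in path:
                walk(path + [nxt])

    walk([start])
    return paths


def is_backdoor(edges, path):
    """A backdoor path starts with an arrow pointing into the exposure."""
    return (path[1], path[0]) in edges


for p in all_paths(EDGES, "CKD", "mortality"):
    label = "backdoor path" if is_backdoor(EDGES, p) else "not a backdoor path"
    print(" - ".join(p), ":", label)
```

For Figure 1b this reports the path via age as a backdoor path and the direct arrow from CKD to mortality as the effect under study.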
HOW TO DEAL WITH CONFOUNDING AND ITS REPRESENTATION IN DAGS
Since confounding obscures the real effect of an exposure, the effect of confounding should be removed as much as possible. In the analysis phase, this can be done by means of restriction, stratification and subsequent pooling, or by adjusting in multivariable regression analysis. For instance, in the previous example, the relationship between CKD and mortality could be assessed in different age categories separately. In these separate groups with the same age, confounding by age cannot be present. All these methods accomplish the same thing: they allow the estimation of the causal effect of the exposure on the outcome in the absence of confounding effects. In DAG terms, adjusting for confounding by means of restriction, stratification or multivariable analysis is called conditioning. In a DAG, conditioning on a factor is often depicted by a box around this factor, which is a graphic indication that the backdoor path from the exposure to the outcome that went through the common cause is blocked. Since this backdoor path is blocked, the confounding has been removed. An example of this is shown in Figure 1c. In the remainder of this article, the terms ‘adjusting for’ and ‘conditioning on’ a factor are used interchangeably to indicate that this factor is included in the analysis in order to reduce confounding.
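To show what stratification does numerically, the following sketch compares a crude risk ratio with age-specific risk ratios. All counts are invented for illustration and do not come from any study; the two age categories and the function names are likewise assumptions.

```python
# Hypothetical counts, invented for illustration: (deaths, people) per group.
COUNTS = {
    "young": {"CKD": (10, 100), "no CKD": (45, 900)},
    "old": {"CKD": (160, 400), "no CKD": (120, 600)},
}


def risk_ratio(exposed, unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    (d1, n1), (d0, n0) = exposed, unexposed
    return (d1 / n1) / (d0 / n0)


def pooled(group):
    """Add up deaths and people over the age strata (the crude comparison)."""
    deaths = sum(COUNTS[age][group][0] for age in COUNTS)
    people = sum(COUNTS[age][group][1] for age in COUNTS)
    return deaths, people


crude = risk_ratio(pooled("CKD"), pooled("no CKD"))
print(f"crude risk ratio: {crude:.2f}")
for age, groups in COUNTS.items():
    rr = risk_ratio(groups["CKD"], groups["no CKD"])
    print(f"{age}: risk ratio {rr:.2f}")
```

With these invented counts the crude risk ratio is about 3.1, whereas it is 2.0 within each age category. The difference arises because patients with CKD are concentrated in the older, higher-risk category, and it disappears once the comparison is made within groups of the same age.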
RESIDUAL CONFOUNDING
Example 2. We are assessing the causal relationship between obesity and decline in kidney function. Is confounding by ethnicity present?
Suppose the aim is to study the causal relationship between obesity and decline in kidney function. It has been shown that black patients have a faster decline in kidney function and progression to end-stage renal disease [10]. Also, obesity rates are higher in African American patients than in white patients [11]. Ethnicity could therefore be regarded as a cause of decline in kidney function and a cause of obesity. Therefore, in the DAG in Figure 1d the arrows point away from ethnicity towards obesity and decline in kidney function. Ethnicity is thus a common cause of obesity and decline in kidney function and a backdoor path from obesity via ethnicity to decline in kidney function is identified. We conclude that confounding is present and we should condition on ethnicity to remove confounding. It is, however, possible to identify confounding in a DAG that cannot be adjusted for. For instance, it could be that physicians did not record ethnicity, and ethnicity is thus unavailable in the data analyses. The investigator cannot adjust for a factor that is not measured. Similarly, it is possible that adjustments are only partly successful in controlling for confounding. For example, even if ethnicity was recorded and adjusted for in the analyses, some residual confounding can remain present. The reason for this is that self-reported or physician-reported race does not always completely represent the racial background of an individual. When a source of confounding is unknown, unmeasured or only partially measured and adjusted for, residual confounding will remain present. This is also the problem with confounding by indication. A physician’s treatment decision is based on many factors, including the physician’s preference and estimation of the patient’s outcome, and it is almost impossible to completely measure all these factors. Randomized controlled trials are therefore the best way to avoid confounding by indication [1, 12].
NO CONFOUNDING: MEDIATION
Example 3. We are assessing the causal relationship between ethnicity and decline in kidney function. Is confounding by obesity present?
Suppose this time we want to study the causal relationship between ethnicity and decline in kidney function and want to determine if confounding by obesity is present. In the DAG, ethnicity is the exposure and decline in kidney function the outcome. Again the arrow from ethnicity to obesity is drawn, because obesity rates are higher in African American patients than in white patients. Furthermore, a higher body mass index is associated with a faster decline in kidney function [13], so an arrow from obesity to decline in kidney function can be drawn. The DAG in Figure 2a shows that obesity is not a common cause of ethnicity and decline in kidney function and we can conclude that there is no confounding by obesity. The path from ethnicity via obesity to decline in kidney function is not a backdoor path, as the first arrow points away from the exposure ethnicity. Obesity is not a cause of ethnicity, but ethnicity can be regarded as a cause of obesity. Obesity is therefore in the causal pathway between ethnicity and decline in kidney function. Part of the effect of ethnicity on the decline in kidney function is via obesity, thus the effect of ethnicity is mediated by obesity. This is also captured in the last part of the traditional definition of a confounder: it should not be in the causal path between exposure and outcome. If we adjusted for obesity (sometimes called ‘overadjustment’) [4], thereby comparing black with white patients within the same level of obesity, we would take away the effect of obesity on the decline of kidney function. The part of the effect of ethnicity that is mediated through obesity would then not be accounted for, and the total effect of ethnicity on decline of kidney function would be underestimated. Of course, these decisions on modelling depend on the research question being asked.
We are interested in the total causal effect of ethnicity on decline of kidney function and therefore do not adjust for obesity, because there is no confounding by obesity. If one wants to know why ethnicity has an effect on decline of kidney function, we could deliberately adjust for obesity to see which part of the effect of ethnicity is mediated by obesity or perform more advanced mediation analysis [14, 15]. Importantly, the interpretation of results should be consistent with the performed analyses and a DAG can be a useful tool in this process. In our specific example, the DAG shows that obesity is a mediator and therefore there is no confounding by obesity present in the causal relationship between ethnicity and decline in kidney function. This is in contrast to the previous example, in which confounding by ethnicity was identified in the causal relationship between obesity and decline in kidney function.
FIGURE 2:
No confounding: mediation. The path from the exposure to outcome via mediator (a) is not a backdoor path, because it does not start with an arrowhead towards the exposure. Therefore, no confounding by obesity is present in the causal relation between ethnicity and decline in kidney function (b).
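The consequence of adjusting for a mediator can be written out under a simplifying assumption that is ours and not part of the example: linear effects without interaction, and no confounding of the relation between obesity and decline in kidney function. The symbols are hypothetical: E denotes ethnicity, O obesity, D decline in kidney function, and a, b and c are the assumed effect sizes along the three arrows.

```latex
\begin{align}
O &= a\,E + \varepsilon_O \\
D &= c\,E + b\,O + \varepsilon_D \\
  &= (c + ab)\,E + b\,\varepsilon_O + \varepsilon_D
\end{align}
```

Under these assumptions the total effect of ethnicity is c + ab, the sum of the direct part c and the part ab that runs through obesity. A model that adjusts for O estimates only c, so when a, b and c have the same sign the total effect is underestimated by ab.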
NO CONFOUNDING: COMMON EFFECT
Example 4. Assessing the causal effect of lead poisoning on developing polycystic kidney disease. Is confounding by glomerular filtration rate (GFR) present?
Before we knew that polycystic kidney disease (PKD) was a genetic disorder, we could have hypothesized that lead poisoning could cause PKD. Of course, we now know that these two are not causally related, but in practice we sometimes unknowingly study a causal relationship that later turns out to be absent. In this case, the question is whether confounding by glomerular filtration rate (GFR) is present. A valid question it seems, since a priori knowledge shows that GFR is associated with both lead poisoning and PKD and is not in the causal path between lead poisoning and PKD. By drawing a DAG, the causal assumptions about the underlying relations are made explicit. In this case, lead poisoning is a cause of renal failure, affecting GFR. GFR is thus an effect of lead poisoning and the arrow points from lead poisoning, our exposure, to GFR. PKD is also a cause of renal failure. Again, the arrow is drawn from PKD to GFR. The resulting DAG is depicted in Figure 3a. There is no backdoor path via GFR, because GFR is not a common cause of lead poisoning and PKD. The DAG therefore shows that GFR does not cause confounding. The traditional definition would also not identify GFR as a confounder, because although GFR is associated with the outcome, GFR is not a risk factor for or cause of PKD. In contrast, the DAG clearly shows that GFR is a common effect of lead poisoning and PKD. In DAG terms, a common effect is called a collider, because two arrowheads collide at this factor. A collider blocks a path. So, before we knew about genetics, what would have happened if we had investigated the causal relationship between lead poisoning and PKD and had wrongly adjusted for GFR? In the extreme case, imagine that lead poisoning and PKD are the only two causes of kidney disease.
If we only conduct our study in patients with a low GFR, then absence of lead poisoning would perfectly predict the presence of PKD, because otherwise the patient would not have had a low GFR. In addition, the absence of PKD would perfectly predict the presence of lead poisoning. So restricting our study to only those patients with a low GFR leads to an inverse association between lead poisoning and PKD. We would have concluded that lead poisoning has a protective effect on PKD, although we know now that PKD is a genetic disorder and there is actually no causal effect. This demonstrates that adjusting for a variable that is a common effect of the exposure and outcome (a collider) can introduce erroneous results. In DAG terms, conditioning on a collider opens a path. This bias is called collider-stratification bias and is extensively discussed in the literature [16, 17]. Collider-stratification bias is an example of selection bias, which will be discussed and explained in DAGs in a separate paper. We refer to Box 1 for a more technical overview of confounding in DAGs.
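The extreme case can be checked with a small numerical sketch. The counts below are invented for illustration: lead poisoning and PKD are made independent of each other, and, as in the thought experiment, a low GFR occurs only when at least one of the two is present. The function names are hypothetical.

```python
# Hypothetical counts, invented for illustration. Keys are
# (lead poisoning, PKD). The two occur independently of each other:
# 10% of people have PKD whether or not they have lead poisoning.
COUNTS = {
    (True, True): 20,
    (True, False): 180,
    (False, True): 80,
    (False, False): 720,
}


def low_gfr(lead, pkd):
    """Extreme case from the text: these are the only two causes of low GFR."""
    return lead or pkd


def pkd_risk(lead, restrict_to_low_gfr=False):
    """Proportion with PKD among people with the given lead status."""
    rows = [(pkd, n) for (lead_status, pkd), n in COUNTS.items()
            if lead_status == lead
            and (low_gfr(lead_status, pkd) or not restrict_to_low_gfr)]
    return sum(n for pkd, n in rows if pkd) / sum(n for _, n in rows)


print("everyone:     lead", pkd_risk(True), "| no lead", pkd_risk(False))
print("low GFR only: lead", pkd_risk(True, True), "| no lead", pkd_risk(False, True))
```

In the full hypothetical population the proportion with PKD is 10% regardless of lead poisoning. Among patients with a low GFR it is 10% in those with lead poisoning and 100% in those without, so lead poisoning appears protective even though no causal effect was built into the counts.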
Box 1
DAG: directed acyclic graph
- Directed: the factors in the graph are connected with arrows; the arrows represent the direction of the causal relationship
- Acyclic: no directed path can form a closed loop, as a factor cannot cause itself
DAG definitions and identifying confounding [18]
- A ‘path’ is a sequence of arrows, irrespective of the direction of the arrows
- A ‘directed path’ is a sequence of arrows in which every arrow points in the same direction, representing the causal relationship
- A ‘backdoor path’ is a sequence of arrows from exposure to outcome that starts with an arrowhead towards the exposure and ends with an arrowhead towards the outcome (Figure 1a and b)
- Two factors are associated if they are connected by an ‘open path’
- A ‘collider’ is a common effect; a factor on which two arrowheads collide (Figure 3a)
- A collider blocks a path
- A collider that has been conditioned on no longer blocks a path; conditioning on a collider could therefore introduce a form of selection bias and should be done with caution. See also [16, 17]
- Any path that contains only non-colliders is open, unless one of these non-colliders has been conditioned on, in which case the path is blocked (Figure 1c)
- ‘Blocked paths’ do not affect the direct causal relationship between the exposure and the outcome
- ‘Confounding’ is identified by an open backdoor path
- The causal relationship between exposure and outcome will be unconfounded if the only open paths from exposure to outcome are directed paths from exposure to outcome [18]
FIGURE 3:
No confounding: collider. A collider is a common effect (a). GFR is a common effect of lead poisoning and polycystic kidney disease (b). The path from lead poisoning to polycystic kidney disease via GFR is not a backdoor path; it is blocked by the collider GFR. Therefore, no confounding by GFR is present in the causal relationship between lead poisoning and polycystic kidney disease.
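The rules in Box 1 for open and blocked paths are mechanical enough to be written as a short function. The sketch below applies them to one path at a time. It covers only the rules listed in Box 1 and the function names are hypothetical. The edge sets encode the arrows of Figure 1b and the two arrows into GFR from Figure 3b.

```python
def is_collider(edges, prev, node, nxt):
    """True if both neighbouring arrows on the path point into `node`."""
    return (prev, node) in edges and (nxt, node) in edges


def path_is_open(edges, path, conditioned=frozenset()):
    """Apply the Box 1 rules to one path, given as a list of nodes."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        if is_collider(edges, prev, node, nxt):
            if node not in conditioned:
                return False  # an unconditioned collider blocks the path
        elif node in conditioned:
            return False  # a conditioned non-collider blocks the path
    return True


FIG_1B = {("age", "CKD"), ("age", "mortality"), ("CKD", "mortality")}
FIG_3B = {("lead", "GFR"), ("PKD", "GFR")}

print(path_is_open(FIG_1B, ["CKD", "age", "mortality"]))           # True
print(path_is_open(FIG_1B, ["CKD", "age", "mortality"], {"age"}))  # False
print(path_is_open(FIG_3B, ["lead", "GFR", "PKD"]))                # False
print(path_is_open(FIG_3B, ["lead", "GFR", "PKD"], {"GFR"}))       # True
```

The first two calls correspond to Figure 1b and 1c: the backdoor path via age is open until age is conditioned on. The last two show the opposite behaviour of a collider: the path via GFR is blocked until GFR is conditioned on.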
USE OF DAGS TO IDENTIFY A MINIMUM SET OF FACTORS TO ELIMINATE CONFOUNDING
So far, the traditional approach identified the same sources of confounding as the DAG approach. So how do DAGs improve on the traditional approach? In the traditional approach, the three criteria are applied for each ‘potential confounder’ separately. In DAGs, all assumptions on all factors and their relationships in a causal mechanism are made explicit in order to identify confounding in general. As a consequence, DAGs allow the investigator to oversee all information needed to judge whether conditioning on a certain factor might introduce collider-stratification bias, something that is not possible in the traditional three criteria approach, which only focuses on a single factor. Furthermore, because DAGs provide an overview of the causal relationships, they allow the investigator to identify a minimum but sufficient set of factors to adjust for in the analysis to remove confounding [19]. For illustration, let us go back to the first simple example in which the relationship between CKD and mortality was confounded by age. This DAG could be extended as presented in Figure 4a. In this example, the effect of age on mortality operates through two mechanisms, i.e. a higher incidence of cancer and dementia in the elderly. Using the traditional definition of a confounder, we would probably conclude that we should adjust for age, cancer and dementia, because all three are associated with the exposure, are risk factors for the outcome and are not in the causal path between CKD and mortality. However, the DAG shows that it is sufficient to only adjust for age to eliminate the confounding, because the backdoor path is blocked by adjusting for the common cause age. Note that this is only true in this simplified example, in which we assume that cancer and dementia do not directly affect the presence of CKD. It can be argued that cancer also causes CKD, which could be a valid assumption for renal cancer or other types of cancer that will be treated with nephrotoxic chemotherapy.
Then, an arrow should also be drawn from cancer to CKD, as depicted in Figure 4b. In that case, two backdoor paths would be identified: the first via age and then cancer and dementia, as in Figure 4a, and the second via common cause cancer. Although in Figure 4a it is sufficient to adjust for age to block the backdoor paths and eliminate confounding, in Figure 4b it is necessary to adjust for two factors to eliminate confounding. The two backdoor paths can be blocked by either adjusting for age and cancer, or by adjusting for cancer and dementia. The use of DAGs allows for better insight into the assumed causal mechanisms and can aid in the discussion and selection of factors to adjust for in order to remove the confounding. Readers interested in examples of more complex causal mechanisms can refer to articles by Hernan or Shrier [9, 20]. DAGs can be drawn by hand, but several computer-based approaches, such as DAGitty and dagR, have been developed to identify the minimal sufficient adjustment set [21, 22]. If drawn and discussed prior to data collection, DAGs may help identify the best and most parsimonious set of factors to be measured and adjusted for. This will prevent loss of statistical power and funds, and also avoids problems such as collider-stratification bias and collinearity [18, 19, 23].
FIGURE 4:
Identification of a minimal set of factors to resolve confounding. In (a), the backdoor path from CKD to mortality can be blocked by just conditioning on age, as depicted by the box around age. However if we assume that cancer also causes CKD (b), the backdoor paths can only be closed by conditioning on two factors, either age and cancer (as depicted) or cancer and dementia.
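For small DAGs such as Figure 4, the search for a smallest adjustment set can be sketched by brute force: list the backdoor paths, then try sets of increasing size until one blocks them all. This is a simplified illustration and not the algorithm used by DAGitty or dagR. It assumes that none of the candidate factors is a consequence of the exposure, and the edge sets are our reading of the arrows described for Figure 4a and 4b.

```python
from itertools import combinations

FIG_4A = {("age", "CKD"), ("age", "cancer"), ("age", "dementia"),
          ("cancer", "mortality"), ("dementia", "mortality"),
          ("CKD", "mortality")}
FIG_4B = FIG_4A | {("cancer", "CKD")}


def all_paths(edges, start, end):
    """Every path from start to end, ignoring arrow direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        stack.extend(path + [n] for n in neighbours[path[-1]] if n not in path)
    return paths


def is_open(edges, path, conditioned):
    """Colliders block unless conditioned on; non-colliders block when
    conditioned on."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider and node not in conditioned:
            return False
        if not collider and node in conditioned:
            return False
    return True


def smallest_adjustment_sets(edges, exposure, outcome):
    """Smallest sets of factors that block every backdoor path."""
    nodes = {n for edge in edges for n in edge} - {exposure, outcome}
    backdoor = [p for p in all_paths(edges, exposure, outcome)
                if (p[1], p[0]) in edges]  # first arrow points into exposure
    for size in range(len(nodes) + 1):
        hits = [set(z) for z in combinations(sorted(nodes), size)
                if not any(is_open(edges, p, set(z)) for p in backdoor)]
        if hits:
            return hits
    return []


for name, dag in [("Figure 4a", FIG_4A), ("Figure 4b", FIG_4B)]:
    sets = smallest_adjustment_sets(dag, "CKD", "mortality")
    print(name, [sorted(s) for s in sets])
```

Under these assumptions the search returns age alone for Figure 4a, and the two pairs age with cancer and cancer with dementia for Figure 4b, in line with the reasoning above.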
CONCLUSION
In the above examples, we demonstrated the use of DAGs as a visual aid in identifying the presence of confounding. For explanatory purposes, the examples were relatively easy with limited factors. Examples of more complex DAGs can be found elsewhere [9, 20]. Especially in more complex situations, DAGs can be preferable to the traditional definition of confounding, as they allow the investigator to identify the presumed causal mechanism and thereby the possibility of collider-stratification bias with certain adjustments, as well as a minimum set of factors to adjust for to remove the unwanted confounding. These attributes derive from the fact that all relevant factors and their causal relationships are depicted in DAGs in chronological order, with a view to the question of whether confounding is present. As a result, relevant paths can be blocked without unblocking others, so that confounding is removed without inducing collider-stratification bias. In contrast, the traditional three criteria approach is based on a case-by-case judgement of whether a factor is a confounder, without any acknowledgement of the context. The use of DAGs in identifying confounding still relies on prior knowledge and assumed causal effects. A DAG therefore says nothing about whether those assumptions are true. It may well be that different physicians have different beliefs on which factor causes the other and this may result in different choices regarding factors to adjust for. DAGs can aid in this discussion among physicians and researchers by providing a visual representation of the causal research question that makes the underlying assumptions about causal mechanisms explicit.
CONFLICT OF INTEREST STATEMENT
All authors declare no conflict of interest.
REFERENCES
1. Confounding by indication? Epidemiology 1997; 8: 110–111
2. et al. The valuable contribution of observational studies to nephrology. Kidney Int 2007; 72: 671–675
3. et al. Confounding. Nephron Clin Pract 2010; 116: c143–c147
4. et al. Confounding: what it is and how to deal with it. Kidney Int 2008; 73: 256–260
5. et al. Directed acyclic graphs helped to identify confounding in the association of disability and electrocardiographic findings: results from the KORA-Age study. J Clin Epidemiol 2014; 67: 199–206
6. et al. Communication and medication refill adherence: the Diabetes Study of Northern California. JAMA Intern Med 2013; 173: 210–218
7. et al. Triglycerides-diabetes association in healthy middle-aged men: modified by physical fitness? A long term follow-up of 1962 Norwegian men in the Oslo Ischemia Study. Diabetes Res Clin Pract 2013; 101: 201–209
8. Causal diagrams for epidemiologic research. Epidemiology 1999; 10: 37–48
9. et al. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 2002; 155: 176–184
10. et al. Differences in progression to ESRD between black and white patients receiving predialysis care in a universal health care system. Clin J Am Soc Nephrol 2013; 8: 1540–1547
11. et al. Association of race and body mass index with ESRD and mortality in CKD stages 3–4: results from the Kidney Early Evaluation Program (KEEP). Am J Kidney Dis 2013; 61: 404–412
12. Confounding by indication. Epidemiology 1996; 7: 335–336
13. et al. Body mass index and early kidney function decline in young adults: a longitudinal analysis of the CARDIA (Coronary Artery Risk Development in Young Adults) Study. Am J Kidney Dis 2013; 63: 590–597
14. Mediation analysis. Annu Rev Psychol 2007; 58: 593–614
15. Mediation analysis in epidemiology: methods, interpretation and bias. Int J Epidemiol 2013; 42: 1511–1519
16. Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology 2003; 14: 300–306
17. et al. Illustrating bias due to conditioning on a collider. Int J Epidemiol 2010; 39: 417–420
18. Confounding. In: (eds). Causal Inference. Chapman & Hall/CRC, 2014, pp. 83–94
19. Causal diagrams. In: , eds. Modern Epidemiology. Philadelphia: Lippincott Williams & Wilkins, 2008; 183–209
20. Reducing bias through directed acyclic graphs. BMC Med Res Methodol 2008; 8: 70
21. DAGitty: a graphical tool for analyzing causal diagrams. Epidemiology 2011; 22: 745
22. dagR: a suite of R functions for directed acyclic graphs. Epidemiology 2010; 21: 586–587
23. Confronting multicollinearity in ecological multiple regression. Ecology 2003; 84: 2809–2815
© The Author 2014. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.