Plot risk estimates from estimate_risk()

Generate horizontal bar charts of the risk estimates generated from estimate_risk()

Usage

plot_risk(
  risk_dat,
  add_to_dat = TRUE,
  collapse = is.data.frame(risk_dat),
  progress = TRUE,
  outcomes = "all",
  color_scheme = "single",
  color_dat = get_default_color("single"),
  color_for_last_group = get_default_color("threshold_final"),
  annotation = TRUE,
  legend = TRUE,
  lines = TRUE,
  line_text = TRUE,
  base_size = 12
)

Arguments

risk_dat

Data frame or list of data frames (required).This either needs to be from the output of estimate_risk() or needs to match the output format of risk estimation output of estimate_risk(). However, the input need not have all columns; in fact, the function will still run if risk_dat is a data frame with just a single column with one of the risk estimates.

add_to_dat

Logical (optional behavior variable): Whether to add the plots as a list column in the data frame passed via risk_dat, either TRUE or FALSE (the default is TRUE). This argument is strict, so 1 or 0 are not accepted, and moreover, anything other than TRUE will be treated as FALSE. If TRUE, plot_risk() will return the plots in a newly-created list-column containing the plots; if FALSE, plot_risk() will return the plots directly and only the plots (no data frame component). See the "Value" section for more information.

collapse

Logical (optional behavior variable): Whether to collapse the output into a single data frame if applicable, either TRUE or FALSE; this argument is strict, so 1 or 0 are not accepted, and moreover, anything other than TRUE will be treated as FALSE. This argument is only considered if add_to_dat is TRUE and risk_dat is a list of data frames (as happens with est_risk() when the collapse argument for that function is FALSE). This maintains consistency with the API for est_risk(). See the "Value" section for more information.

outcomes

Character (optional behavior variable): The outcome(s) to plot. This should be a character vector listing the outcomes in the order of desired plotting from top to bottom. The default of "all" gets translated internally to c("total_cvd", "ascvd", "heart_failure", "chd", "stroke"), and it overrides anything else that might be passed to outcomes.

color_scheme

Character (optional behavior variable): The color scheme to use, one of "single" (the default) or "categories". This argument is interdependent with argument color_dat.

color_dat

Character or data frame (optional behavior variable): This argument is interdependent with argument color_scheme.

If color_scheme = "single", color_dat should be a character vector of length 1 specifying the color to use, in a format recognized by R. For example, one can specify either "dodgerblue" or "#1e90ff" as input. One can also specify the argument as a call to rgb() (this just returns the hexadecimal color code as a character vector). The default is "#3c8dbc".
If color_scheme = "categories", color_dat should be a data frame with columns threshold (numeric, 0.001 < value < 0.999) and color (character, adhering to specifications delineated for indicating desired color when color_scheme = "single").
- The numeric limits are set as such (rather than, e.g., being inclusive or being 0 and 1) because the risk estimates only extend to 3 decimal places and it would not make sense to set a threshold of 0.001 or 0.999 (one could, of course, argue even less extreme values are also not terribly sensible, either, but 0.001 and 0.999 are unequivocal, simply based on how risk estimation and plotting work).
- The data frame can have up to 3 rows specifying pairs of threshold values and corresponding colors (threshold-color pairs) to use for values below the given threshold.
- A final threshold will always be added for values that are at or above the highest valid user-specified threshold in the data frame, and the color for this final threshold can be specified using the color_for_last_group argument.
- Thresholds should be entered in increasing order.
- The function will disregard threshold-color pairs where the threshold is empty, duplicated, or outside the aforementioned limits. The function will then sort the threshold-color pairs based on the threshold value to prevent problematic requests (e.g., threshold 1 at 0.15 and threshold 2 at 0.10 will be rearranged to 0.10 and 0.15, but the colors entered for those thresholds will be preserved during the sort).
- See details for further information.

color_for_last_group

Character (optional behavior variable): The color to use for the de facto last group (i.e., values that fall at or above the highest valid user-specified threshold in the color_dat data frame). This argument is only considered when color_scheme = "categories". The default is "#dd4b39". Entry should adhere to the specifications delineated for indicating desired color when color_scheme = "single". See details as well.

annotation

Logical (optional behavior variable): Whether to include a title and caption, with the title specifying the time frame of the estimates and the caption specifying the model used to generate the risk estimates. The default is TRUE.

legend

Logical (optional behavior variable): Whether to include a legend with the plot. Only considered when color_scheme = "categories". The default is TRUE.

lines

Logical (optional behavior variable): Whether to include vertical, dashed lines at the threshold values. Only considered when color_scheme = "categories". The default is TRUE.

line_text

Logical (optional behavior variable): Whether to include caption text describing the values at which the lines are drawn (i.e., the threshold values). Only considered when color_scheme = "categories" and lines = TRUE. The default is TRUE.

base_size

Numeric (optional behavior variable): The base font size to use for the plot. The default is 12. Invalid entries will be discarded in favor of the default.

Value

A ggplot object or list of ggplot objects, depending on the input format of risk_dat. If risk_dat is a list, the output will be a list of the same length of ggplot objects. If risk_dat is a data frame, the output will be a single ggplot object.

Because the function returns a ggplot object, it can be further customized, e.g.,


# Let:
#  `res` be the result of a valid `estimate_risk()` call
#   and
#  `res_10yr <- res$risk_est_10yr`

# Customization via {ggplot2}
p <- plot_risk(res_10yr)
# Note `labs()`, `theme()`, and `margin()` are from {ggplot2}, so one would
# need to get access to them via, e.g., `library(ggplot2)`, `ggplot2::` prefixing,
# `importFrom()` (if developing a package; for example, {preventr} `importFrom()`s
# all three), etc.
p + labs(title = "Lorem ipsum") + theme(plot.margin = margin(20, 20, 20, 20))
# etc.

# Easy to combine the 10- and 30-year plots if desired, e.g., via {patchwork}
# Basic example, side-by-side:
patchwork::wrap_plots(plot_risk(res))
# Many additional options for composing plots via {patchwork}

Details

Specifying `color_dat` and `color_for_last_group` when `color_scheme = "categories"`

See also the color_dat and color_scheme arguments. An example color_dat data frame might be something like the following:


 color_dat_v1 <- data.frame(
   threshold = 0.075,
   color = "#00A65A"
 )

 # or

 color_dat_v2 <- data.frame(
   threshold = c(0.075, 0.20),
   color = c("#00A65A", "#FFFDAF")
 )

 # or

 color_dat_v3 <- data.frame(
   threshold = c(0.075, 0.20, 0.35),
   color = c("#00A65A", "#FFFDAF", "#FF8C00")
 )

 # Not-great entries that will be cleaned

 color_dat_why_v1 <- data.frame(
   threshold = c(0.075, 0.20, 0.20),
   color = c("#00A65A", "#FFFDAF", "#FF8C00")
 )

 color_dat_why_v2 <- data.frame(
   threshold = c(0.35, 0.075, 1.5),
   color = c("#00A65A", "#FFFDAF", "#FF8C00")
 )

In all the above cases, users can specify color_for_last_group, and plot_risk() will automatically assign that color to values at or above the highest valid user-specified threshold. For example, suppose color_for_last_group is set as "#dd4b39" (this is also the default color for this argument). Using the above examples:

For color_dat_v1, "#00A65A" will be applied for values below 0.075, and "#dd4b39" will be applied for values at or above 0.075.
For color_dat_v2, "#00A65A" will be applied for values below 0.075, "#FFFDAF" will be applied for values at or above 0.075 and below 0.20, and "#dd4b39" will be applied for values at or above 0.20.
For color_dat_v3, "#00A65A" will be applied for values below 0.075, "#FFFDAF" will be applied for values at or above 0.075 and below 0.20, "#FF8C00" will be applied for values at or above 0.20 and below 0.35, and "#dd4b39" will be applied for values at or above 0.35.
For color_dat_why_v1, the function will clean the data frame to remove the duplicate threshold value, and the result will be the same as color_dat_v2.
For color_dat_why_v2, the function will clean the data frame to remove the invalid threshold value of 1.5, then the thresholds will be arranged in increasing order, so the final result will be threshold 1 being at 0.075 with color "#FFFDAF", threshold 2 being at 0.35 with color "#00A65A", and the final group being at or above 0.35 with color "#dd4b39".

Examples

res <- estimate_risk(
  age = 50, 
  sex = "female",    
  sbp = 160, 
  bp_tx = TRUE,      
  total_c = 200,     
  hdl_c = 45,        
  statin = FALSE,    
  dm = TRUE,         
  smoking = FALSE,   
  egfr = 90,
  bmi = 35
)
#> PREVENT estimates are from: Base model.

plot_risk(res)
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================================================| 100%
#> $risk_est_10yr
#> # A tibble: 1 × 9
#>   total_cvd ascvd heart_failure   chd stroke model over_years input_problems
#>       <dbl> <dbl>         <dbl> <dbl>  <dbl> <chr>      <int> <chr>         
#> 1     0.147 0.092         0.081 0.044  0.054 base          10 NA            
#> # ℹ 1 more variable: plot <list>
#> 
#> $risk_est_30yr
#> # A tibble: 1 × 9
#>   total_cvd ascvd heart_failure   chd stroke model over_years input_problems
#>       <dbl> <dbl>         <dbl> <dbl>  <dbl> <chr>      <int> <chr>         
#> 1      0.53 0.354          0.39 0.198  0.221 base          30 NA            
#> # ℹ 1 more variable: plot <list>
#> 

# Remove annotation (title and caption)
plot_risk(res, annotation = FALSE)
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================================================| 100%
#> $risk_est_10yr
#> # A tibble: 1 × 9
#>   total_cvd ascvd heart_failure   chd stroke model over_years input_problems
#>       <dbl> <dbl>         <dbl> <dbl>  <dbl> <chr>      <int> <chr>         
#> 1     0.147 0.092         0.081 0.044  0.054 base          10 NA            
#> # ℹ 1 more variable: plot <list>
#> 
#> $risk_est_30yr
#> # A tibble: 1 × 9
#>   total_cvd ascvd heart_failure   chd stroke model over_years input_problems
#>       <dbl> <dbl>         <dbl> <dbl>  <dbl> <chr>      <int> <chr>         
#> 1      0.53 0.354          0.39 0.198  0.221 base          30 NA            
#> # ℹ 1 more variable: plot <list>
#> 

# Plot only a subset of the outcomes
# (e.g., excluding total CVD and heart failure)

plot_risk(res, outcomes = c("ascvd", "chd", "stroke"))
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================================================| 100%
#> $risk_est_10yr
#> # A tibble: 1 × 9
#>   total_cvd ascvd heart_failure   chd stroke model over_years input_problems
#>       <dbl> <dbl>         <dbl> <dbl>  <dbl> <chr>      <int> <chr>         
#> 1     0.147 0.092         0.081 0.044  0.054 base          10 NA            
#> # ℹ 1 more variable: plot <list>
#> 
#> $risk_est_30yr
#> # A tibble: 1 × 9
#>   total_cvd ascvd heart_failure   chd stroke model over_years input_problems
#>       <dbl> <dbl>         <dbl> <dbl>  <dbl> <chr>      <int> <chr>         
#> 1      0.53 0.354          0.39 0.198  0.221 base          30 NA            
#> # ℹ 1 more variable: plot <list>
#> 

# Need to plot risk estimates you already have? No problem.
risk_dat <- data.frame(
  total_cvd = 0.15,
  ascvd = 0.10,
  heart_failure = 0.05,
  chd = 0.07,
  stroke = 0.03
)

# Note because `risk_dat` does not specify the model or time horizon, these
# elements will not be included in the output, regardless of the value for the
# argument `annotation`
plot_risk(risk_dat)
#> Error in format.default(unlist(xx), ...): c("Found no format() method for class \"ggplot2::ggplot\"", "Found no format() method for class \"ggplot\"", "Found no format() method for class \"ggplot2::gg\"", "Found no format() method for class \"S7_object\"", "Found no format() method for class \"gg\"")

# ... but if you specify that as part of the data frame, it will plot it
risk_dat$model <- "base"
plot_risk(risk_dat)
#> Error in format.default(unlist(xx), ...): c("Found no format() method for class \"ggplot2::ggplot\"", "Found no format() method for class \"ggplot\"", "Found no format() method for class \"ggplot2::gg\"", "Found no format() method for class \"S7_object\"", "Found no format() method for class \"gg\"")

risk_dat$over_years <- 10
plot_risk(risk_dat)
#> Error in format.default(unlist(xx), ...): c("Found no format() method for class \"ggplot2::ggplot\"", "Found no format() method for class \"ggplot\"", "Found no format() method for class \"ggplot2::gg\"", "Found no format() method for class \"S7_object\"", "Found no format() method for class \"gg\"")

# Rest of examples limited to interactive sessions
if (FALSE) { # interactive()
  res_10yr <- res$risk_est_10yr
  res_30yr <- res$risk_est_30yr
  
  # Change color for `color_scheme = "single"`
  plot_risk(res_10yr, color_scheme = "single", color_dat = "darkgreen")
  
  # Use `color_scheme = "categories"`
  color_dat <- data.frame(
    threshold = c(0.075, 0.20),
    color = c("#00A65A", "#FF8C00")
  )
   
  plot_risk(res_30yr, color_scheme = "categories", color_dat = color_dat)
  
  # Change color for final group
  plot_risk(
    res_30yr, 
    color_scheme = "categories", 
    color_dat = color_dat,
    color_for_last_group = "maroon"
  )
  
  # Remove legend
  plot_risk(
    res_10yr,
    color_scheme = "categories",
    color_dat = color_dat,
    legend = FALSE
  )
  
  # Remove legend and lines
  plot_risk(
    res_10yr,
    color_scheme = "categories",
    color_dat = color_dat,
    legend = FALSE,
    lines = FALSE
  )
  
  # Remove legend and line text (but keep lines)
  plot_risk(
   res_10yr,
   color_scheme = "categories",
   color_dat = color_dat,
   legend = FALSE,
   line_text = FALSE
  )
  
  # Run `plot_risk()` on a data frame of combined results from `estimate_risk()`
  # (see examples from `estimate_risk()` for other ideas about how one can 
  # combine results)
  res_no_smoking <- estimate_risk(
    age = 50, 
    sex = "female",    
    sbp = 160, 
    bp_tx = TRUE,      
    total_c = 200,     
    hdl_c = 45,        
    statin = FALSE,    
    dm = TRUE,         
    smoking = FALSE,   
    egfr = 90,
    bmi = 35,
    time = "10yr"
  )
  
  res_smoking <- estimate_risk(
    age = 50, 
    sex = "female",    
    sbp = 160, 
    bp_tx = TRUE,      
    total_c = 200,     
    hdl_c = 45,        
    statin = FALSE,    
    dm = TRUE,         
    smoking = TRUE,   
    egfr = 90,
    bmi = 35,
    time = "10yr"
  )
  
  res_combined <- rbind(res_no_smoking, res_smoking)
  
  lapply(seq_len(nrow(res_combined)), function(x) plot_risk(res_combined[x, ]))
}