Generate horizontal bar charts of the risk estimates generated
from estimate_risk()
Usage
plot_risk(
risk_dat,
add_to_dat = TRUE,
collapse = is.data.frame(risk_dat),
progress = TRUE,
outcomes = "all",
color_scheme = "single",
color_dat = get_default_color("single"),
color_for_last_group = get_default_color("threshold_final"),
annotation = TRUE,
legend = TRUE,
lines = TRUE,
line_text = TRUE,
base_size = 12
)Arguments
- risk_dat
Data frame or list of data frames (required).This either needs to be from the output of
estimate_risk()or needs to match the output format of risk estimation output ofestimate_risk(). However, the input need not have all columns; in fact, the function will still run ifrisk_datis a data frame with just a single column with one of the risk estimates.- add_to_dat
Logical (optional behavior variable): Whether to add the plots as a list column in the data frame passed via
risk_dat, eitherTRUEorFALSE(the default isTRUE). This argument is strict, so 1 or 0 are not accepted, and moreover, anything other thanTRUEwill be treated asFALSE. IfTRUE,plot_risk()will return the plots in a newly-created list-column containing the plots; ifFALSE,plot_risk()will return the plots directly and only the plots (no data frame component). See the "Value" section for more information.- collapse
Logical (optional behavior variable): Whether to collapse the output into a single data frame if applicable, either
TRUEorFALSE; this argument is strict, so 1 or 0 are not accepted, and moreover, anything other thanTRUEwill be treated asFALSE. This argument is only considered ifadd_to_datisTRUEandrisk_datis a list of data frames (as happens withest_risk()when thecollapseargument for that function isFALSE). This maintains consistency with the API forest_risk(). See the "Value" section for more information.- outcomes
Character (optional behavior variable): The outcome(s) to plot. This should be a character vector listing the outcomes in the order of desired plotting from top to bottom. The default of
"all"gets translated internally toc("total_cvd", "ascvd", "heart_failure", "chd", "stroke"), and it overrides anything else that might be passed tooutcomes.- color_scheme
Character (optional behavior variable): The color scheme to use, one of "single" (the default) or "categories". This argument is interdependent with argument
color_dat.- color_dat
Character or data frame (optional behavior variable): This argument is interdependent with argument
color_scheme.If
color_scheme = "single",color_datshould be a character vector of length 1 specifying the color to use, in a format recognized by R. For example, one can specify either "dodgerblue" or "#1e90ff" as input. One can also specify the argument as a call torgb()(this just returns the hexadecimal color code as a character vector). The default is "#3c8dbc".If
color_scheme = "categories",color_datshould be a data frame with columnsthreshold(numeric, 0.001 < value < 0.999) andcolor(character, adhering to specifications delineated for indicating desired color whencolor_scheme = "single").The numeric limits are set as such (rather than, e.g., being inclusive or being 0 and 1) because the risk estimates only extend to 3 decimal places and it would not make sense to set a threshold of 0.001 or 0.999 (one could, of course, argue even less extreme values are also not terribly sensible, either, but 0.001 and 0.999 are unequivocal, simply based on how risk estimation and plotting work).
The data frame can have up to 3 rows specifying pairs of threshold values and corresponding colors (threshold-color pairs) to use for values below the given threshold.
A final threshold will always be added for values that are at or above the highest valid user-specified threshold in the data frame, and the color for this final threshold can be specified using the
color_for_last_groupargument.Thresholds should be entered in increasing order.
The function will disregard threshold-color pairs where the threshold is empty, duplicated, or outside the aforementioned limits. The function will then sort the threshold-color pairs based on the threshold value to prevent problematic requests (e.g., threshold 1 at 0.15 and threshold 2 at 0.10 will be rearranged to 0.10 and 0.15, but the colors entered for those thresholds will be preserved during the sort).
See details for further information.
- color_for_last_group
Character (optional behavior variable): The color to use for the de facto last group (i.e., values that fall at or above the highest valid user-specified threshold in the
color_datdata frame). This argument is only considered whencolor_scheme = "categories". The default is "#dd4b39". Entry should adhere to the specifications delineated for indicating desired color whencolor_scheme = "single". See details as well.- annotation
Logical (optional behavior variable): Whether to include a title and caption, with the title specifying the time frame of the estimates and the caption specifying the model used to generate the risk estimates. The default is
TRUE.- legend
Logical (optional behavior variable): Whether to include a legend with the plot. Only considered when
color_scheme = "categories". The default isTRUE.- lines
Logical (optional behavior variable): Whether to include vertical, dashed lines at the threshold values. Only considered when
color_scheme = "categories". The default isTRUE.- line_text
Logical (optional behavior variable): Whether to include caption text describing the values at which the lines are drawn (i.e., the threshold values). Only considered when
color_scheme = "categories"andlines = TRUE. The default isTRUE.- base_size
Numeric (optional behavior variable): The base font size to use for the plot. The default is 12. Invalid entries will be discarded in favor of the default.
Value
A ggplot object or list of ggplot objects, depending on the input
format of risk_dat. If risk_dat is a list, the output will
be a list of the same length of ggplot objects. If risk_dat is a data frame,
the output will be a single ggplot object.
Because the function returns a ggplot object, it can be further customized, e.g.,
# Let:
# `res` be the result of a valid `estimate_risk()` call
# and
# `res_10yr <- res$risk_est_10yr`
# Customization via {ggplot2}
p <- plot_risk(res_10yr)
# Note `labs()`, `theme()`, and `margin()` are from {ggplot2}, so one would
# need to get access to them via, e.g., `library(ggplot2)`, `ggplot2::` prefixing,
# `importFrom()` (if developing a package; for example, {preventr} `importFrom()`s
# all three), etc.
p + labs(title = "Lorem ipsum") + theme(plot.margin = margin(20, 20, 20, 20))
# etc.
# Easy to combine the 10- and 30-year plots if desired, e.g., via {patchwork}
# Basic example, side-by-side:
patchwork::wrap_plots(plot_risk(res))
# Many additional options for composing plots via {patchwork}Details
Specifying color_dat and color_for_last_group when color_scheme = "categories"
See also the color_dat and color_scheme arguments. An example color_dat
data frame might be something like the following:
color_dat_v1 <- data.frame(
threshold = 0.075,
color = "#00A65A"
)
# or
color_dat_v2 <- data.frame(
threshold = c(0.075, 0.20),
color = c("#00A65A", "#FFFDAF")
)
# or
color_dat_v3 <- data.frame(
threshold = c(0.075, 0.20, 0.35),
color = c("#00A65A", "#FFFDAF", "#FF8C00")
)
# Not-great entries that will be cleaned
color_dat_why_v1 <- data.frame(
threshold = c(0.075, 0.20, 0.20),
color = c("#00A65A", "#FFFDAF", "#FF8C00")
)
color_dat_why_v2 <- data.frame(
threshold = c(0.35, 0.075, 1.5),
color = c("#00A65A", "#FFFDAF", "#FF8C00")
)In all the above cases, users can specify color_for_last_group, and
plot_risk() will automatically assign that color to values at or above the
highest valid user-specified threshold. For example, suppose color_for_last_group
is set as "#dd4b39" (this is also the default color for this argument). Using
the above examples:
For
color_dat_v1, "#00A65A" will be applied for values below 0.075, and "#dd4b39" will be applied for values at or above 0.075.For
color_dat_v2, "#00A65A" will be applied for values below 0.075, "#FFFDAF" will be applied for values at or above 0.075 and below 0.20, and "#dd4b39" will be applied for values at or above 0.20.For
color_dat_v3, "#00A65A" will be applied for values below 0.075, "#FFFDAF" will be applied for values at or above 0.075 and below 0.20, "#FF8C00" will be applied for values at or above 0.20 and below 0.35, and "#dd4b39" will be applied for values at or above 0.35.For
color_dat_why_v1, the function will clean the data frame to remove the duplicate threshold value, and the result will be the same ascolor_dat_v2.For
color_dat_why_v2, the function will clean the data frame to remove the invalid threshold value of 1.5, then the thresholds will be arranged in increasing order, so the final result will be threshold 1 being at 0.075 with color "#FFFDAF", threshold 2 being at 0.35 with color "#00A65A", and the final group being at or above 0.35 with color "#dd4b39".
Examples
res <- estimate_risk(
age = 50,
sex = "female",
sbp = 160,
bp_tx = TRUE,
total_c = 200,
hdl_c = 45,
statin = FALSE,
dm = TRUE,
smoking = FALSE,
egfr = 90,
bmi = 35
)
#> PREVENT estimates are from: Base model.
plot_risk(res)
#>
|
| | 0%
|
|=================================== | 50%
|
|======================================================================| 100%
#> $risk_est_10yr
#> # A tibble: 1 × 9
#> total_cvd ascvd heart_failure chd stroke model over_years input_problems
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr>
#> 1 0.147 0.092 0.081 0.044 0.054 base 10 NA
#> # ℹ 1 more variable: plot <list>
#>
#> $risk_est_30yr
#> # A tibble: 1 × 9
#> total_cvd ascvd heart_failure chd stroke model over_years input_problems
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr>
#> 1 0.53 0.354 0.39 0.198 0.221 base 30 NA
#> # ℹ 1 more variable: plot <list>
#>
# Remove annotation (title and caption)
plot_risk(res, annotation = FALSE)
#>
|
| | 0%
|
|=================================== | 50%
|
|======================================================================| 100%
#> $risk_est_10yr
#> # A tibble: 1 × 9
#> total_cvd ascvd heart_failure chd stroke model over_years input_problems
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr>
#> 1 0.147 0.092 0.081 0.044 0.054 base 10 NA
#> # ℹ 1 more variable: plot <list>
#>
#> $risk_est_30yr
#> # A tibble: 1 × 9
#> total_cvd ascvd heart_failure chd stroke model over_years input_problems
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr>
#> 1 0.53 0.354 0.39 0.198 0.221 base 30 NA
#> # ℹ 1 more variable: plot <list>
#>
# Plot only a subset of the outcomes
# (e.g., excluding total CVD and heart failure)
plot_risk(res, outcomes = c("ascvd", "chd", "stroke"))
#>
|
| | 0%
|
|=================================== | 50%
|
|======================================================================| 100%
#> $risk_est_10yr
#> # A tibble: 1 × 9
#> total_cvd ascvd heart_failure chd stroke model over_years input_problems
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr>
#> 1 0.147 0.092 0.081 0.044 0.054 base 10 NA
#> # ℹ 1 more variable: plot <list>
#>
#> $risk_est_30yr
#> # A tibble: 1 × 9
#> total_cvd ascvd heart_failure chd stroke model over_years input_problems
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr>
#> 1 0.53 0.354 0.39 0.198 0.221 base 30 NA
#> # ℹ 1 more variable: plot <list>
#>
# Need to plot risk estimates you already have? No problem.
risk_dat <- data.frame(
total_cvd = 0.15,
ascvd = 0.10,
heart_failure = 0.05,
chd = 0.07,
stroke = 0.03
)
# Note because `risk_dat` does not specify the model or time horizon, these
# elements will not be included in the output, regardless of the value for the
# argument `annotation`
plot_risk(risk_dat)
#> Error in format.default(unlist(xx), ...): c("Found no format() method for class \"ggplot2::ggplot\"", "Found no format() method for class \"ggplot\"", "Found no format() method for class \"ggplot2::gg\"", "Found no format() method for class \"S7_object\"", "Found no format() method for class \"gg\"")
# ... but if you specify that as part of the data frame, it will plot it
risk_dat$model <- "base"
plot_risk(risk_dat)
#> Error in format.default(unlist(xx), ...): c("Found no format() method for class \"ggplot2::ggplot\"", "Found no format() method for class \"ggplot\"", "Found no format() method for class \"ggplot2::gg\"", "Found no format() method for class \"S7_object\"", "Found no format() method for class \"gg\"")
risk_dat$over_years <- 10
plot_risk(risk_dat)
#> Error in format.default(unlist(xx), ...): c("Found no format() method for class \"ggplot2::ggplot\"", "Found no format() method for class \"ggplot\"", "Found no format() method for class \"ggplot2::gg\"", "Found no format() method for class \"S7_object\"", "Found no format() method for class \"gg\"")
# Rest of examples limited to interactive sessions
if (FALSE) { # interactive()
res_10yr <- res$risk_est_10yr
res_30yr <- res$risk_est_30yr
# Change color for `color_scheme = "single"`
plot_risk(res_10yr, color_scheme = "single", color_dat = "darkgreen")
# Use `color_scheme = "categories"`
color_dat <- data.frame(
threshold = c(0.075, 0.20),
color = c("#00A65A", "#FF8C00")
)
plot_risk(res_30yr, color_scheme = "categories", color_dat = color_dat)
# Change color for final group
plot_risk(
res_30yr,
color_scheme = "categories",
color_dat = color_dat,
color_for_last_group = "maroon"
)
# Remove legend
plot_risk(
res_10yr,
color_scheme = "categories",
color_dat = color_dat,
legend = FALSE
)
# Remove legend and lines
plot_risk(
res_10yr,
color_scheme = "categories",
color_dat = color_dat,
legend = FALSE,
lines = FALSE
)
# Remove legend and line text (but keep lines)
plot_risk(
res_10yr,
color_scheme = "categories",
color_dat = color_dat,
legend = FALSE,
line_text = FALSE
)
# Run `plot_risk()` on a data frame of combined results from `estimate_risk()`
# (see examples from `estimate_risk()` for other ideas about how one can
# combine results)
res_no_smoking <- estimate_risk(
age = 50,
sex = "female",
sbp = 160,
bp_tx = TRUE,
total_c = 200,
hdl_c = 45,
statin = FALSE,
dm = TRUE,
smoking = FALSE,
egfr = 90,
bmi = 35,
time = "10yr"
)
res_smoking <- estimate_risk(
age = 50,
sex = "female",
sbp = 160,
bp_tx = TRUE,
total_c = 200,
hdl_c = 45,
statin = FALSE,
dm = TRUE,
smoking = TRUE,
egfr = 90,
bmi = 35,
time = "10yr"
)
res_combined <- rbind(res_no_smoking, res_smoking)
lapply(seq_len(nrow(res_combined)), function(x) plot_risk(res_combined[x, ]))
}
