Compact letter displays (CLDs) label treatment groups with letters so that groups sharing a letter are not significantly different according to some statistical test. It is often desirable to include CLDs on graphs. Here I show how to add them to a box plot created with ggplot2.

First, make an example plot using the iris data:

```r
library(ggplot2)
library(car)
library(QsRutils)
data("iris")
plt1 <- ggplot(data = iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot()
plt1
```

I want to place each letter just above the top of its group's upper whisker, or above the highest outlier if there is one. The base R `boxplot` function, called with `plot = FALSE`, returns the coordinates of these points without drawing anything.

```r
box.rslt <- with(iris, graphics::boxplot(Petal.Length ~ Species, plot = FALSE))
str(box.rslt)
## List of 6
##  $ stats: num [1:5, 1:3] 1.1 1.4 1.5 1.6 1.9 3.3 4 4.35 4.6 5.1 ...
##  $ n    : num [1:3] 50 50 50
##  $ conf : num [1:2, 1:3] 1.46 1.54 4.22 4.48 5.37 ...
##  $ out  : num [1:2] 1 3
##  $ group: num [1:2] 1 2
##  $ names: chr [1:3] "setosa" "versicolor" "virginica"
```

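The fifth row of `box.rslt$stats` gives the y coordinate of the upper end of each whisker. Outliers are not part of `stats`; their values are stored in `box.rslt$out`, and the group each one belongs to in `box.rslt$group`. Here both outliers (1.0 for setosa and 3.0 for versicolor) lie below their boxes, so the whisker tops are also the highest points in every group.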

```r
box.rslt$stats
##      [,1] [,2] [,3]
## [1,]  1.1 3.30 4.50
## [2,]  1.4 4.00 5.10
## [3,]  1.5 4.35 5.55
## [4,]  1.6 4.60 5.90
## [5,]  1.9 5.10 6.90
```
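For iris the fifth row is enough, but if a group had an outlier above its upper whisker, a letter placed at the whisker top would overlap that outlier. A minimal base R sketch for that case takes, for each group, the larger of the whisker end and the group's highest outlier. `y_top` and `out_max` are illustrative names, not part of the workflow above.

```r
# Sketch: y coordinate of the highest drawn point in each group,
# i.e. the upper whisker end, or a higher outlier if one exists.
box.rslt <- with(iris, graphics::boxplot(Petal.Length ~ Species, plot = FALSE))

# Upper whisker ends, one per group (fifth row of stats)
y_top <- box.rslt$stats[5, ]

if (length(box.rslt$out) > 0) {
  # Largest outlier per group; groups without outliers get NA
  out_max <- tapply(box.rslt$out,
                    factor(box.rslt$group, levels = seq_along(y_top)),
                    max)
  # Keep whichever is higher; na.rm = TRUE skips groups with no outliers
  y_top <- pmax(y_top, as.vector(out_max), na.rm = TRUE)
}
names(y_top) <- box.rslt$names
y_top
# For iris: setosa 1.9, versicolor 5.1, virginica 6.9 (both outliers are low)
```

Using `y_top` in place of `box.rslt$stats[5, ]` when building the letter coordinates keeps the letters clear of any high outliers.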

Next I need a vector of letters to add to the plot. For this example, I run pairwise t-tests, which return a matrix of p-values.

```r
ptt.rslt <- with(iris, pairwise.t.test(Petal.Length, Species, pool.sd = FALSE))
```

Looking at the structure of `ptt.rslt`, we see that `ptt.rslt$p.value` gives a matrix of p-values, adjusted for multiple comparisons with the Holm method:

```r
str(ptt.rslt)
## List of 4
##  $ method         : chr "t tests with non-pooled SD"
##  $ data.name      : chr "Petal.Length and Species"
##  $ p.value        : num [1:2, 1:2] 1.99e-45 2.78e-49 NA 4.90e-22
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "versicolor" "virginica"
##   .. ..$ : chr [1:2] "setosa" "versicolor"
##  $ p.adjust.method: chr "holm"
##  - attr(*, "class")= chr "pairwise.htest"
ptt.rslt$p.value
##                  setosa   versicolor
## versicolor 1.986887e-45           NA
## virginica  2.780888e-49 4.900288e-22
```

From these p-values, `QsRutils::make_letter_assignments` produces the letters for our CLDs.

```r
ltrs <- make_letter_assignments(ptt.rslt)
str(ltrs)
## List of 3
##  $ Letters          : Named chr [1:3] "a" "b" "c"
##   ..- attr(*, "names")= chr [1:3] "setosa" "versicolor" "virginica"
##  $ monospacedLetters: Named chr [1:3] "a " " b " " c"
##   ..- attr(*, "names")= chr [1:3] "setosa" "versicolor" "virginica"
##  $ LetterMatrix     : logi [1:3, 1:3] TRUE FALSE FALSE FALSE TRUE FALSE ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:3] "setosa" "versicolor" "virginica"
##   .. ..$ : chr [1:3] "a" "b" "c"
##  - attr(*, "class")= chr "multcompLetters"
ltrs$Letters
##     setosa versicolor  virginica
##        "a"        "b"        "c"
```
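The `multcompLetters` class in this output suggests that `make_letter_assignments` builds on the multcompView package, though I have not checked its source. If QsRutils is not available, a sketch along these lines, assuming multcompView is installed from CRAN and a 0.05 significance threshold, gets the same kind of letters directly from `ptt.rslt$p.value`. `pm`, `p_vec`, and `ltrs2` are illustrative names. `multcompLetters` accepts a vector of p-values whose names identify the compared pair as `"groupA-groupB"`.

```r
# Sketch: compact letters straight from the pairwise p-value matrix,
# assuming the CRAN package multcompView is installed.
library(multcompView)

ptt.rslt <- with(iris, pairwise.t.test(Petal.Length, Species, pool.sd = FALSE))
pm <- ptt.rslt$p.value  # lower-triangular matrix, NA above the diagonal

# Turn every non-NA cell into a p-value named "rowgroup-colgroup"
idx <- which(!is.na(pm), arr.ind = TRUE)
p_vec <- pm[idx]
names(p_vec) <- paste(rownames(pm)[idx[, 1]], colnames(pm)[idx[, 2]], sep = "-")

# Pairs with p < 0.05 are treated as different and share no letter
ltrs2 <- multcompLetters(p_vec, threshold = 0.05)
ltrs2$Letters
```

The groupings should agree with `ltrs$Letters`, but which group receives which letter may differ.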

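`ltrs$Letters` gives the vector of letters that we want. Its order (setosa, versicolor, virginica) matches the column order of `box.rslt$stats`, because both follow the levels of `Species`. Now we can build a data frame to add the CLDs to the box plot.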

```r
x <- seq_along(ltrs$Letters)  # x positions of the boxes: 1, 2, 3
y <- box.rslt$stats[5, ]      # tops of the upper whiskers
cbd <- ltrs$Letters
ltr_df <- data.frame(x, y, cbd)
ltr_df
##            x   y cbd
## setosa     1 1.9   a
## versicolor 2 5.1   b
## virginica  3 6.9   c
```

If we plot the CLDs at the coordinates in `ltr_df`, they will overplot the tops of the whiskers (or the highest outlier, if present). We need to nudge them upward to avoid the overlap. To decide how far, I get the range of the y-axis and nudge the letters up by 5% of that range.

```r
lmts <- get_plot_limits(plt1)
y.range <- lmts$ymax - lmts$ymin
y.nudge <- 0.05 * y.range
plt1 + geom_text(data = ltr_df, aes(x = x, y = y, label = cbd), nudge_y = y.nudge)
```
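`get_plot_limits` comes from QsRutils. Without that package, one way to get a comparable y-axis range is to read it from ggplot2's built plot object. This sketch assumes that `ggplot_build(...)$layout$panel_params[[1]]$y.range` holds the expanded y limits of the panel, which is internal ggplot2 structure rather than a documented API and could change between versions. I have not checked that it matches `get_plot_limits` exactly. The letters are copied from `ltrs$Letters` above; `y.lims` and `plt2` are illustrative names.

```r
# Sketch: the same 5% nudge without QsRutils.
# Assumes panel_params[[1]]$y.range (ggplot2 internals) holds the
# expanded y-axis limits of the single panel.
library(ggplot2)

plt1 <- ggplot(data = iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot()

box.rslt <- with(iris, graphics::boxplot(Petal.Length ~ Species, plot = FALSE))
ltr_df <- data.frame(
  x   = seq_along(box.rslt$names),  # box positions 1, 2, 3
  y   = box.rslt$stats[5, ],        # tops of the upper whiskers
  cbd = c("a", "b", "c")            # copied from ltrs$Letters above
)

# Expanded y-axis limits (lower, upper) as drawn
y.lims  <- ggplot_build(plt1)$layout$panel_params[[1]]$y.range
y.nudge <- 0.05 * diff(y.lims)

plt2 <- plt1 +
  geom_text(data = ltr_df, aes(x = x, y = y, label = cbd), nudge_y = y.nudge)
plt2
```

With ggplot2's default 5% expansion of continuous axes, the iris data (1.0 to 6.9) give a y-axis from about 0.705 to 7.195, so the nudge works out to about 0.32.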

The CLDs sit just above each whisker, and placing them took no trial and error.