Compact Letter Displays

Compact letter displays (CBDs) label treatment groups with letters so that groups sharing a letter are not significantly different by some statistical test. It is often desirable to include CBDs on graphs. Here I show how to add them to a box plot created with ggplot2.

First, make an example plot using the iris data:

library(ggplot2)
library(car)
library(QsRutils)
data("iris")
plt1 <- ggplot(data = iris, aes(x = Species, y = Petal.Length)) +
    geom_boxplot()
plt1

I want to add CBDs just above the tops of the box whiskers (or above the highest outlier, if there is one). I can use the base boxplot function with plot = FALSE to capture these coordinates without drawing a plot.

box.rslt <- with(iris, graphics::boxplot(Petal.Length ~ Species, plot = FALSE))
str(box.rslt)
List of 6
 $ stats: num [1:5, 1:3] 1.1 1.4 1.5 1.6 1.9 3.3 4 4.35 4.6 5.1 ...
 $ n    : num [1:3] 50 50 50
 $ conf : num [1:2, 1:3] 1.46 1.54 4.22 4.48 5.37 ...
 $ out  : num [1:2] 1 3
 $ group: num [1:2] 1 2
 $ names: chr [1:3] "setosa" "versicolor" "virginica"

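The fifth row of box.rslt$stats gives the y coordinates for the tops of the whiskers. Outliers are not part of stats; they are returned in box.rslt$out, with their group indices in box.rslt$group. In this example the two outliers (1 and 3) lie below the lower whiskers, so the whisker tops are also the highest points in each group.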

box.rslt$stats
     [,1] [,2] [,3]
[1,]  1.1 3.30 4.50
[2,]  1.4 4.00 5.10
[3,]  1.5 4.35 5.55
[4,]  1.6 4.60 5.90
[5,]  1.9 5.10 6.90
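
Because stats holds only the whisker ends, a plot with outliers above a whisker would also need the out and group components. Here is a minimal sketch, in base R only, of one way to combine them. The names whisker.top and top.y are mine for illustration and are not part of any package.

```r
# Base R only; iris ships with R.
box.rslt <- with(iris, graphics::boxplot(Petal.Length ~ Species, plot = FALSE))

# Upper whisker ends, one per group (fifth row of stats).
whisker.top <- box.rslt$stats[5, ]

# Start from the whisker tops, then raise any group that has a higher outlier.
top.y <- whisker.top
for (i in seq_along(box.rslt$out)) {
  g <- box.rslt$group[i]
  top.y[g] <- max(top.y[g], box.rslt$out[i])
}
names(top.y) <- box.rslt$names
top.y
```

For the iris petal lengths both outliers are low, so top.y is the same as the fifth row of stats. With high outliers, top.y could replace box.rslt$stats[5, ] as the y coordinate for the letters.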

Next I need a vector of the letters to add to the plot. For this example, I will run a pairwise t-test, which returns a matrix of p-values.

ptt.rslt <- with(iris, pairwise.t.test(Petal.Length, Species, pool.sd = FALSE))

Looking at the structure of ptt.rslt, we see that ptt.rslt$p.value holds a lower-triangular matrix of p-values:

str(ptt.rslt)

List of 4
 $ method         : chr "t tests with non-pooled SD"
 $ data.name      : chr "Petal.Length and Species"
 $ p.value        : num [1:2, 1:2] 1.99e-45 2.78e-49 NA 4.90e-22
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "versicolor" "virginica"
  .. ..$ : chr [1:2] "setosa" "versicolor"
 $ p.adjust.method: chr "holm"
 - attr(*, "class")= chr "pairwise.htest"

ptt.rslt$p.value

                 setosa   versicolor
versicolor 1.986887e-45           NA
virginica  2.780888e-49 4.900288e-22
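
To see how the cells of this matrix map onto individual comparisons, the sketch below flattens it into a named vector with one entry per pair of groups. It uses base R only. I have not reproduced the internals of make_letter_assignments here; the names p.mat, idx and p.vec are mine. A hyphen-named vector of this kind is the sort of input that multcompView::multcompLetters accepts, which may be of interest given the multcompLetters class of the letter result.

```r
# Base R only: rerun the pairwise test so the block stands alone.
ptt.rslt <- with(iris, pairwise.t.test(Petal.Length, Species, pool.sd = FALSE))
p.mat <- ptt.rslt$p.value

# Row and column positions of the filled (non-NA) cells.
idx <- which(!is.na(p.mat), arr.ind = TRUE)

# One p-value per comparison, named "rowgroup-colgroup".
p.vec <- p.mat[idx]
names(p.vec) <- paste(rownames(p.mat)[idx[, "row"]],
                      colnames(p.mat)[idx[, "col"]],
                      sep = "-")
p.vec
```

All three p-values are far below 0.05, which is why each species ends up with its own letter.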

We can pass the test result to QsRutils::make_letter_assignments to get a vector of letters for our CBDs.

ltrs <- make_letter_assignments(ptt.rslt)
str(ltrs)

List of 3
 $ Letters          : Named chr [1:3] "a" "b" "c"
  ..- attr(*, "names")= chr [1:3] "setosa" "versicolor" "virginica"
 $ monospacedLetters: Named chr [1:3] "a  " " b " "  c"
  ..- attr(*, "names")= chr [1:3] "setosa" "versicolor" "virginica"
 $ LetterMatrix     : logi [1:3, 1:3] TRUE FALSE FALSE FALSE TRUE FALSE ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "setosa" "versicolor" "virginica"
  .. ..$ : chr [1:3] "a" "b" "c"
 - attr(*, "class")= chr "multcompLetters"

ltrs$Letters
    setosa versicolor  virginica 
       "a"        "b"        "c"

ltrs$Letters gives the vector of letters that we want. Now we can make a data frame to add the CBDs to the box plot.

x <- seq_along(ltrs$Letters)
y <- box.rslt$stats[5, ]
cbd <- ltrs$Letters
ltr_df <- data.frame(x, y, cbd)
ltr_df
           x   y cbd
setosa     1 1.9   a
versicolor 2 5.1   b
virginica  3 6.9   c
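
The data frame above relies on the letters and the box plot statistics being in the same group order. As a hedged alternative, the sketch below matches the letters to the groups by name. So that it runs without QsRutils, letters.vec is a stand-in that I copied from the ltrs$Letters output shown earlier; in practice you would use ltrs$Letters itself.

```r
# Base R only; box statistics as computed earlier in the post.
box.rslt <- with(iris, graphics::boxplot(Petal.Length ~ Species, plot = FALSE))

# Stand-in for ltrs$Letters (assumption: copied from the output above).
letters.vec <- c(setosa = "a", versicolor = "b", virginica = "c")

# Look the letters up by group name so the order always follows the boxes.
ltr_df <- data.frame(
  x = seq_along(box.rslt$names),
  y = box.rslt$stats[5, ],
  cbd = unname(letters.vec[box.rslt$names]),
  row.names = box.rslt$names,
  stringsAsFactors = FALSE
)
ltr_df
```

If the letters vector ever arrived in a different order, for example sorted by mean, the lookup by name would still put each letter over the right box.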

If we plot the CBDs at the coordinates in ltr_df, they will overplot the tops of the whiskers. We need to nudge the CBDs upward to avoid the overlap. To determine how much to nudge, I will get the range of the y-axis and nudge upward by 5% of this range.

lmts <- get_plot_limits(plt1)
y.range <- lmts$ymax - lmts$ymin
y.nudge <- 0.05 * y.range
plt1 + 
    geom_text(data = ltr_df, aes(x=x, y=y, label=cbd), nudge_y = y.nudge)

The CBDs sit just above the whiskers, with no trial and error.
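
To get a feel for the size of the nudge, here is the arithmetic worked by hand in base R. It rests on two assumptions that are mine: that get_plot_limits reports the panel limits after axis expansion, and that the plot uses ggplot2's default expansion of 5% on each side of a continuous axis. If either differs, the numbers change.

```r
# Data range of the plotted variable (1.0 to 6.9 for iris petal length).
y.data <- range(iris$Petal.Length)

# Assumed: default continuous-scale expansion of 5% on each side.
expand.mult <- 0.05
pad <- expand.mult * diff(y.data)
panel.ymin <- y.data[1] - pad
panel.ymax <- y.data[2] + pad

# Same arithmetic as in the post: nudge by 5% of the panel height.
y.range <- panel.ymax - panel.ymin
y.nudge <- 0.05 * y.range
c(ymin = panel.ymin, ymax = panel.ymax, nudge = y.nudge)
```

Under these assumptions the panel runs from 0.705 to 7.195 and the nudge comes to about 0.32 units of petal length.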

Get ggplot plot panel limits

I have added a new function (get_plot_limits) to my package QsRutils. It extracts the minimum and maximum X and Y values for a ggplot panel. This is useful when formatting ggplots. For example, you may wish to expand the panel to keep text from running out of it, or nudge text relative to some point. For an example, see my post on adding compact letter displays to box plots created with ggplot2.
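
If you do not have QsRutils installed, the same kind of information can be read from a built plot. The sketch below is not the QsRutils implementation. The helper panel_limits_sketch is a hypothetical name of mine, and it assumes ggplot2 3.0.0 or later, Cartesian coordinates, and a single panel. It reads internal fields of the built plot, which may change between ggplot2 versions.

```r
library(ggplot2)

# Hypothetical helper, not the QsRutils function: read the panel ranges
# from the built plot (first panel only, Cartesian coordinates assumed).
panel_limits_sketch <- function(plt) {
  # Building the plot trains and expands the scales.
  params <- ggplot_build(plt)$layout$panel_params[[1]]
  list(xmin = params$x.range[1], xmax = params$x.range[2],
       ymin = params$y.range[1], ymax = params$y.range[2])
}

plt1 <- ggplot(iris, aes(x = Species, y = Petal.Length)) +
    geom_boxplot()
lims <- panel_limits_sketch(plt1)

# Panel height, usable for sizing a nudge as a fraction of the axis.
lims$ymax - lims$ymin
```

The returned list uses the same ymin and ymax names as the lmts object in the compact letter display example, so it could stand in for it when computing a nudge.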