Introduction

ggplot2 is an R package designed and implemented on the basis of The Grammar of Graphics, a book by Leland Wilkinson.

ggplot2 mainly works with dataframes, in contrast to base R plotting functions, which also work with vectors. A dataframe is usually passed to the ggplot() function, and from there on the desired parts of the dataframe (e.g. specific columns) are mapped to geom layers.
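The contrast between the two approaches can be sketched as follows. The vectors x and y and the dataframe df are made-up illustration values, not part of the course data.

```r
library(ggplot2)

# Hypothetical example values (not from the course dataset)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Base R: plot() accepts plain vectors directly
plot(x, y)

# ggplot2: put the vectors into a dataframe first, then map its columns
df <- data.frame(x = x, y = y)
p_vec <- ggplot(df, aes(x = x, y = y)) + geom_point()
print(p_vec)
```

Both calls draw the same five points; the ggplot2 version simply needs the data to arrive as columns of a dataframe.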

One of the most useful features of plots created with ggplot2 is that they can be enhanced step by step by adding more layers on top of each other. This makes improving your plots a relatively easy and efficient task. It is possible because ggplot2 is built on the Grammar of Graphics, a framework that breaks graphics down into semantic components, so how ggplot2 works can be understood intuitively. These components are:

Data: The actual variables to be plotted.
Aesthetics: The scales onto which we will map our data.
Geometries: Shapes used to represent our data.
Facets: Rows and columns of sub-plots.
Statistics: Statistical models & summaries.
Coordinates: The plotting space we are using.
Theme: Describes non-data ink.
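As a rough illustration of how these components fit together, the sketch below builds a single plot in which each line corresponds to one component. It assumes the built-in mtcars dataset as a stand-in for real data; the object name p_grammar is arbitrary.

```r
library(ggplot2)

# Assumption: the built-in mtcars dataset stands in for the course data
p_grammar <- ggplot(data = mtcars,                             # Data
                    mapping = aes(x = wt, y = mpg)) +          # Aesthetics
  geom_point() +                                               # Geometries
  facet_wrap(~cyl) +                                           # Facets
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +    # Statistics
  coord_cartesian(xlim = c(1, 6)) +                            # Coordinates
  theme_minimal()                                              # Theme

print(p_grammar)
```

Not every plot needs all seven components spelled out; ggplot2 supplies defaults for facets, statistics, coordinates and theme when they are omitted.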


First of all, we need to install and load required packages.

### This installs 'pacman' if it is not already installed
if(!require(pacman)){install.packages("pacman")}

### Install packages that are not installed and load them
pacman::p_load(ggplot2, ggrepel, ggthemes, plotly,
               openxlsx, dplyr, tidyr, RColorBrewer, 
               Hmisc, devtools, pander, stringr,
               knitr, kableExtra, rmarkdown)

### Install a theme package for ggplot2 from GitHub (https://github.com/cttobin/ggthemr)
if(!require(ggthemr)){devtools::install_github('cttobin/ggthemr')}

### An alternative way to install packages from GitHub is using biocLite (note: in current Bioconductor releases biocLite has been replaced by BiocManager::install())
### You do not need an 'if' to check whether the package is already installed, biocLite will do that for you
#source("https://bioconductor.org/biocLite.R")
#if(!require(ggthemr)){biocLite('cttobin/ggthemr')}

library(ggthemr)

Eyeball/Inspect your dataset

An important and helpful step before visualizing any dataset is to inspect the data manually.

We use the dataset mice.xlsx.

Let’s find out how many rows and columns there are in the dataset we intend to visualize. Then let’s print the names of the columns.

mice.data <- openxlsx::read.xlsx("data/mice.xlsx")
dim(mice.data) ### mice.data %>% dim
## [1] 640 106
names(mice.data) ### mice.data %>% names
ID strain sex group cohort Tail.presence Tail.length Forelimb.digit.number Hindlimb.digit.number weight_w9 TransferArousal Gait TailElevation Unexpected_behavior weight_w13 Glucose.conc.0 Glucose.conc.15 Glucose.conc.30 Glucose.conc.60 Glucose.conc.120 albumin ALP Bili-T Ca cholesterol Fe AST ALT glucose TP Urea HDL phosphate TG CREZ K Cl Whole.arena.resting.time.5 Whole.arena.resting.time.10 Whole.arena.resting.time.15 Whole.arena.resting.time.20 Whole.arena.average.speed.5 Whole.arena.average.speed.10 Whole.arena.average.speed.15 Whole.arena.average.speed.20 Periphery.distance.travelled.5 Periphery.distance.travelled.10 Periphery.distance.travelled.15 Periphery.distance.travelled.20 Periphery.resting.time.5 Periphery.resting.time.10 Periphery.resting.time.15 Periphery.resting.time.20 Distance.travelled.5 Distance.travelled.10 Distance.travelled.15 Distance.travelled.20 Whole.arena.resting.time.(s) Whole.arena.permanence.(s) Whole.arena.average.speed.(cm/s) Periphery.distance.travelled.(cm) Periphery.resting.time.(s) Periphery.permanence.time.(s) Periphery.average.speed.(cm/s) Center.distance.travelled.(cm) Center.resting.time.(s) Center.permanence.time.(s) Center.average.speed.(cm/s) Latency.to.center.entry.(s) Number.of.center.entries Distance.travelled.-.total.(cm) Percentage.center.time.(%) Periphery.permanence.time.5 Periphery.permanence.time.10 Periphery.permanence.time.15 Periphery.permanence.time.20 Periphery.average.speed.5 Periphery.average.speed.10 Periphery.average.speed.15 Periphery.average.speed.20 Center.distance.travelled.5 Center.distance.travelled.10 Center.distance.travelled.15 Center.distance.travelled.20 Center.resting.time.5 Center.resting.time.10 Center.resting.time.15 Center.resting.time.20 Center.permanence.time.5 Center.permanence.time.10 Center.permanence.time.15 Center.permanence.time.20 Center.average.speed.5 Center.average.speed.10 Center.average.speed.15 Center.average.speed.20 Number.of.center.entries.5 Number.of.center.entries.10 
Number.of.center.entries.15 Number.of.center.entries.20 Click 30.kHz 24.kHz 18.kHz 12.kHz 6.kHz
064-EPL19 2510009E07Rik (indel) Male Male KO_E07Rik c007 Present Long As expected As expected 24.6 Immediate movement Fluid movement Straub / elevated tail None 22.3 3.3 12.4 16.399999999999999 6.5 4.2 26.77 85 3.93 2.14 1.80 16.83 55 25 7.53 46.494 7.80 1.179 2.26 0.755 7.641 5.83 107.8 254.7 254.1 255.0 248.0173 2.4 2.7 2.6 3.0 411.9 611.8 571.2 669.3 192.8076 225.6801 223.9341 184.6691 718.7 804.7 780.5 907.9 1011.6 1200 2.7 2264.2 826.2880 960.8 2.4 947.6 185.8584 239.2 3.8 2.4 51 3211.8 19.933333 220.1 260.3 257.1 223.3 2.0 2.3 2.2 3.0 306.9 192.8 209.3 238.6 62.8014 27.7503 31.4886 63.3482 79.9 39.7 42.9 76.6 3.3 5.5 4.6 3.1 17 11 12 11 15 25 10 15 10 15
064-EPL20 2510009E07Rik (indel) Male Male KO_E07Rik c007 Present As expected One forelimb - increased number Both hindlimbs - decreased number 26.8 Immediate movement Lack of fluidity in movement As expected Circling 24.4 3.8 13.3 14.3 11.8 5.6 27.33 83 2.72 2.18 2.43 17.82 44 23 11.89 48.744 8.58 1.555 1.97 1.352 7.707 6.05 108.3 255.6 266.7 269.7 255.2149 2.3 1.9 1.7 2.5 494.5 388.9 343.6 491.0 242.9133 239.9250 236.6049 203.9328 699.7 561.8 522.8 763.1 1047.6 1200 2.1 1718.1 923.5427 1027.3 1.7 829.2 122.4443 172.7 5.0 24.3 35 2547.3 14.391667 275.1 262.5 256.9 232.8 1.8 1.5 1.3 2.1 205.2 172.8 179.1 272.1 12.8750 25.8750 33.3163 50.6605 25.0 37.5 43.1 67.1 8.0 5.3 4.0 4.4 9 8 8 10 20 15 15 15 15 20
064-EPL40 2510009E07Rik (indel) Male Male KO_E07Rik c007 Present As expected As expected Both hindlimbs - increased number 20.0 As expected Lack of fluidity in movement Dragging Other 18.7 4.4000000000000004 11.8 12.7 8.3000000000000007 3.7 26.31 120 5.03 2.28 1.82 16.71 81 30 7.72 46.215 5.87 1.128 2.37 0.699 8.859 5.77 109.9 287.1 234.3 251.1 284.3052 0.8 3.4 2.8 0.9 223.9 627.7 489.3 196.3 283.2886 188.5950 207.2700 275.8054 231.8 1031.2 828.7 276.3 1057.2 1200 2.0 1537.2 954.6328 1045.6 1.4 830.8 98.2891 154.3 6.0 11.6 42 2368.0 12.858333 295.4 228.6 235.0 286.7 0.8 2.7 2.0 0.7 8.0 403.5 339.4 79.9 4.0138 44.9820 41.2734 8.0520 4.7 71.4 65.1 13.2 1.8 5.9 6.2 6.7 2 19 17 4 15 20 5 10 25 50
064-EPL69 2510009E07Rik (indel) Male Male KO_E07Rik c007 Present As expected Both forelimbs - decreased number One hindlimb - increased number 26.3 Extended freeze Fluid movement Dragging None 24.6 4.2 10.5 12 8.4 5.9 26.20 96 2.12 2.16 2.60 23.87 44 25 13.26 47.813 7.90 1.730 2.55 1.162 7.229 5.68 108.7 245.1 275.7 270.6 276.2079 3.0 1.2 1.5 1.2 659.6 261.6 341.0 290.5 233.6184 252.5435 246.6438 262.4425 903.4 374.6 461.0 350.5 1068.0 1200 1.7 1552.7 994.9914 1094.6 1.4 536.9 72.6570 105.3 5.1 91.8 30 2089.6 8.775000 274.2 270.1 267.8 282.5 2.4 1.0 1.3 1.0 243.8 113.0 120.0 60.1 11.2488 23.4715 23.6670 13.9125 25.8 29.9 32.2 17.5 9.8 3.5 4.0 3.3 11 7 7 5 10 20 10 10 10 15
064-EPL75 2510009E07Rik (indel) Male Male KO_E07Rik c007 Present Long As expected As expected 25.5 Immediate movement Fluid movement Straub / elevated tail None 21.9 3.3 13.4 13.1 7.1 3.6 26.60 79 2.17 2.19 2.40 21.55 54 38 11.70 46.984 8.42 1.571 1.98 0.900 5.574 5.00 110.8 257.7 252.3 269.4 281.4000 2.3 2.9 1.8 1.1 614.8 696.4 467.0 283.5 249.6637 243.3660 262.8660 281.0430 696.6 865.8 540.7 326.7 1060.8 1200 2.0 2061.7 1037.0217 1156.1 1.8 368.1 23.5743 43.9 8.6 8.5 20 2429.8 3.658333 287.3 282.0 289.5 297.4 2.1 2.5 1.6 1.0 81.8 169.4 73.7 43.3 7.8976 9.2340 6.1950 0.1066 12.8 18.0 10.5 2.6 6.4 9.1 7.8 20.2 5 9 4 2 10 60 5 5 10 15
064-EPL82 2510009E07Rik (indel) Male Male KO_E07Rik c007 Present As expected One forelimb - increased number Both hindlimbs - increased number 24.4 As expected Lack of fluidity in movement Straub / elevated tail None 22.2 4.4000000000000004 11.9 10 7.7 5.6 23.85 83 1.09 2.24 2.27 24.40 71 48 15.04 46.191 8.59 1.330 2.52 0.981 8.113 5.33 109.6 252.0 261.6 247.2 278.7000 2.5 2.0 2.8 1.1 603.7 405.7 564.6 201.9 235.2465 191.8430 192.7445 221.3700 738.9 600.8 844.3 336.8 1039.2 1200 2.1 1775.9 840.4998 956.2 1.9 744.8 199.4284 243.8 3.0 19.8 50 2520.8 20.316667 274.5 218.5 228.1 235.0 2.2 1.8 2.5 0.9 135.1 195.1 279.8 134.8 16.4475 69.6010 54.3526 57.9800 25.5 81.5 71.8 65.0 5.7 2.4 3.9 1.8 8 13 19 10 20 85 60 40 10 10
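Beyond dim() and names(), a few more base R calls are often useful at this stage. The sketch below assumes the built-in iris dataset as a stand-in for mice.data, so that it runs without the Excel file; the object names are arbitrary.

```r
# Assumption: the built-in iris dataset stands in for mice.data
inspect <- iris

# Compact overview: one line per column with its type and first values
str(inspect)

# Class of every column, and how many columns there are of each class
col_classes <- sapply(inspect, class)
table(col_classes)

# Number of missing values per column (only columns that have any)
na_per_column <- colSums(is.na(inspect))
na_per_column[na_per_column > 0]

# First few rows
head(inspect, n = 3)
```

Checking column classes early tends to pay off, because a numeric column that was read in as text will not behave as expected in a plot.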

Your first plot in ggplot2

Now let’s create our first ggplot by plotting two of the columns from the data.
In the example below, we have chosen to plot Whole.arena.resting.time.5 against Distance.travelled.5. Feel free to use any other columns.

If you feel stuck at any point, check out the cheatsheet for ggplot2.

ggplot(mice.data, aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5))

The plot is empty and nothing has been plotted! Can you find out why there are no datapoints on our plot?

We have properly supplied our dataset to the ggplot function and have also mapped specific columns to each axis (x and y) using the aes function. In other words, we have constructed aesthetic mappings using aes. What is missing? What have we done wrong?

We have not yet told ggplot2 how to draw the data: the plot has no geom layer. By adding the proper geom layer, we can now see the datapoints plotted.

ggplot(data = mice.data, mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) + geom_point()

Note that the aesthetic mapping can also be written directly inside geom_point(). The resulting plot is the same as the one above. Can you, however, find out the technical differences? When would you use the first approach and when the second?

ggplot(data = mice.data) + geom_point(mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5))

Everything you define inside ggplot() is automatically passed on to any further layers you add to your plot. In other words, new geom layers automatically inherit whatever is specified inside the ggplot() function: “Aesthetics supplied to ggplot() are used as defaults for every layer”. This is particularly useful when you have several layers at the same time (e.g. geom_point(), geom_smooth(), and geom_rug()) and want all of them to use the same color and aes mapping. A mapping written inside a single geom, in contrast, applies to that layer only.
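The inheritance rule described above can be checked by looking at where the mapping is stored in the plot object. This is a minimal sketch that assumes the built-in mtcars dataset instead of mice.data; p_global and p_local are illustrative names.

```r
library(ggplot2)

# Assumption: mtcars stands in for mice.data
# Mapping defined in ggplot(): every layer inherits x and y
p_global <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Mapping defined inside the geom: only that layer knows about x and y
p_local <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg))

# The plot-level mapping holds x and y in the first case and is empty in the second
names(p_global$mapping)
length(p_local$mapping)

# In the second case the mapping lives in the layer itself
names(p_local$layers[[1]]$mapping)
```

Adding geom_smooth() to p_local without its own aes() would fail when the plot is drawn, because that layer has no x and y to inherit.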

In the next steps, we will add more layers to our ggplot to make it more sophisticated. You can keep on adding more layers by adding a + sign and specifying more ggplot functions. Alternatively, you can save the base of your plot into a variable at any given time to avoid writing repetitive lines later on.


Saving the plot as an object for further addition of new layers

p <- ggplot(data = mice.data, mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) + geom_point()

Now p contains our ggplot object.


Plot titles and axis labels

Now let’s add descriptive labels to our x and y axes and specify a plot title and subtitle.

We already have our ggplot object saved into p and we can reuse it by only adding new layers.

p + labs(title = "Mice data scatterplot", subtitle = "Resting time and travel distance", y = "Distance Travelled", x = "Whole arena resting time")

The same can be achieved in the following way.

p + ggtitle("Mice data scatterplot", subtitle = "Resting time and travel distance") + 
  xlab("Whole arena resting time") +
  ylab("Distance Travelled")


Smoothing line

We now add a smoothing line, which “aids the eye in seeing patterns in the presence of overplotting.”

p + geom_smooth(method = "lm", se = TRUE)


Zoom

This example demonstrates how to change the theme (using one of the built-in ones) and how to zoom in so that only a portion of the data is shown.

p +
  theme_dark() +
  coord_cartesian(xlim = c(200, 240), ylim = c(1000, 2000)) +
  geom_smooth(method = "lm")

There is another way to achieve a similar result: we can use xlim and ylim instead of coord_cartesian. However, this removes the data points outside the limits from the plot data, so they can no longer be used elsewhere in the plot (e.g. for fitting a smoothing line). coord_cartesian, in contrast, only ‘zooms’ into the plot and keeps all data points.

Note: The data points are NOT removed from the original dataset itself, but rather from the ggplot object.

p +
  theme_dark() +
  xlim(c(200, 240)) +
  ylim(c(1000, 2000)) +
  geom_smooth(method = "lm")
## Warning: Removed 435 rows containing non-finite values (stat_smooth).
## Warning: Removed 435 rows containing missing values (geom_point).
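The difference between the two approaches can be made visible by inspecting the data that ggplot2 prepares for the point layer. The sketch below assumes the built-in mtcars dataset and arbitrary limits of 2 and 4 on the x axis; layer_data() returns the data of a given layer after the plot has been built.

```r
library(ggplot2)

# Assumption: mtcars stands in for mice.data; the limits 2 and 4 are arbitrary
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Zooming: all x values are still present in the layer data
built_zoom <- layer_data(base_plot + coord_cartesian(xlim = c(2, 4)), 1)

# Limiting the scale: x values outside the limits are replaced by NA
built_lim <- layer_data(base_plot + xlim(c(2, 4)), 1)

n_zoom_missing <- sum(is.na(built_zoom$x))
n_lim_missing  <- sum(is.na(built_lim$x))
n_outside      <- sum(mtcars$wt < 2 | mtcars$wt > 4)

n_zoom_missing
n_lim_missing
n_outside
```

With xlim(), the number of missing x values should match the number of cars whose weight lies outside the limits, whereas coord_cartesian() leaves the data untouched. This is why a smoothing line fitted after xlim() can differ from the one fitted on the full data.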


Rug plots on x and y axes

In ggplot2, it is possible to add extra plots on x and y axes. One of the easiest ways to do this is using rug plots from the ggplot2 package itself.

A rug plot, according to MATLAB Central, “is a compact way of illustrating the marginal distributions of a variable along x and y. Positions of the data points along x and y are denoted by tick marks, reminiscent of the tassels on a rug.”

The example below is relatively more advanced and uses many layers at the same time. Try to understand what each line of code does.

g <- ggplot(data = mice.data) + 
     geom_rug(aes(Whole.arena.resting.time.5, Distance.travelled.5), color = "red") + 
     geom_point(mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5, col = Tail.length)) +
     geom_smooth(mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5), method = "lm", color = "yellow") +
     ggthemes::theme_solarized_2(light = FALSE) +
     scale_colour_solarized("blue") +
     theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
     labs(title = "Distance traveled vs. Resting time", subtitle = "Mice measurement data", y = "Distance traveled 5", x = "Whole arena resting time 5", caption = "Phenotyping data")

plot(g)


Conditionally change the color of datapoints

This example uses a theme from the ggthemr package and demonstrates how to change the color of points that satisfy a particular condition.

small <- filter(iris, Sepal.Length > 7)

ggthemr::ggthemr('dust')
## Warning: New theme missing the following elements: panel.grid, plot.tag,
## plot.tag.position
ggplot(iris) +
  geom_rug(aes(Sepal.Length,Sepal.Width) ) +
  geom_point(aes(Sepal.Length, Sepal.Width, col=Species)) +
  geom_point(data = small, mapping = aes(Sepal.Length, Sepal.Width), col = "white", shape = 12)
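An alternative to drawing a second point layer is to map the condition itself to the colour aesthetic. The sketch below is one possible variant, not the approach used above; the colours and the legend title are arbitrary choices.

```r
library(ggplot2)

# The logical expression is evaluated per row and mapped to colour
p_cond <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(colour = Sepal.Length > 7)) +
  scale_colour_manual(values = c("FALSE" = "grey50", "TRUE" = "firebrick"),
                      name = "Sepal.Length > 7")

print(p_cond)

# Number of points that satisfy the condition
n_highlighted <- sum(iris$Sepal.Length > 7)
n_highlighted
```

This variant gives up colouring by Species, so it fits best when the condition is the main message of the plot.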


Alternative shapes for datapoints

There are several different shapes available in R. Each shape has its own number.
You can explore them yourself as follows.

shapes <- data.frame(
  shape = c(0:25),
  x = 0:25 %/% 5,
  y = -(0:25 %% 5)
)
ggplot(shapes, aes(x, y)) + 
  geom_point(aes(shape = shape), size = 5, fill = "red") +
  geom_text(aes(label = shape), hjust = 0, nudge_x = 0.15) +
  scale_shape_identity() +
  expand_limits(x = 4.1) +
  scale_x_continuous(NULL, breaks = NULL) + 
  scale_y_continuous(NULL, breaks = NULL)


Static shape and colors

In the following example, we will use static color and size for the points, change the smoothing to loess and finally choose a different theme from the ggthemes package.

ggthemr_reset()
ggplot(data = mice.data, mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) +
    geom_point(col = "steelblue", size = 4, shape = 6) +
    geom_smooth(method = "loess", color = "firebrick") +
    ### 'style' replaces the deprecated 'bgcolor' argument of theme_hc()
    ggthemes::theme_hc(style = "darkunica") +
    ggthemes::scale_fill_hc("darkunica") +
    ### theme() tweaks go after the complete theme, otherwise theme_hc() would overwrite them
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(title = "Distance traveled vs. Resting time", subtitle = "Mice measurement data", y = "Distance traveled 5", x = "Whole arena resting time 5", caption = "Phenotyping data")


Color palettes

ggplot(data = mice.data, mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) +
       geom_point(aes(col = sex), size = 3) +
       geom_smooth(method = "loess", color = "firebrick") +
       theme(axis.text.x = element_text(angle = 90, hjust = 1), legend.position = "top") +
       scale_colour_brewer(palette = "Set2")

Why are there two male categories? And how should we fix this? This is the type of problem that you can expect to encounter on a regular basis.

unique(mice.data$sex)
## [1] "Male"   "Female" "Male "
#View(mice.data[mice.data$sex == "Male ", ])
mice.data$sex[mice.data$sex == "Male "] <- "Male"
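Fixing one value by hand works here, but stray whitespace can be removed from a whole column at once with trimws() from base R. The vector sex_raw below is a made-up example, not the actual column.

```r
# Hypothetical values with leading and trailing whitespace
sex_raw <- c("Male", "Female", "Male ", " Female")

# trimws() strips whitespace from both ends of every element
sex_clean <- trimws(sex_raw)

unique(sex_clean)
```

Applied to the dataset, the equivalent call would be mice.data$sex <- trimws(mice.data$sex).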

Now let’s visualize again:

ggplot(data = mice.data, mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) +
       geom_point(aes(col = sex), size = 3) +
       geom_smooth(method = "loess", color = "firebrick") +
       theme(axis.text.x = element_text(angle = 90, hjust = 1), legend.position = "top") +
       scale_colour_brewer(palette = "Set2")

You can explore other color palettes in the following way.

RColorBrewer::brewer.pal.info
##          maxcolors category colorblind
## BrBG            11      div       TRUE
## PiYG            11      div       TRUE
## PRGn            11      div       TRUE
## PuOr            11      div       TRUE
## RdBu            11      div       TRUE
## RdGy            11      div      FALSE
## RdYlBu          11      div       TRUE
## RdYlGn          11      div      FALSE
## Spectral        11      div      FALSE
## Accent           8     qual      FALSE
## Dark2            8     qual       TRUE
## Paired          12     qual       TRUE
## Pastel1          9     qual      FALSE
## Pastel2          8     qual      FALSE
## Set1             9     qual      FALSE
## Set2             8     qual       TRUE
## Set3            12     qual      FALSE
## Blues            9      seq       TRUE
## BuGn             9      seq       TRUE
## BuPu             9      seq       TRUE
## GnBu             9      seq       TRUE
## Greens           9      seq       TRUE
## Greys            9      seq       TRUE
## Oranges          9      seq       TRUE
## OrRd             9      seq       TRUE
## PuBu             9      seq       TRUE
## PuBuGn           9      seq       TRUE
## PuRd             9      seq       TRUE
## Purples          9      seq       TRUE
## RdPu             9      seq       TRUE
## Reds             9      seq       TRUE
## YlGn             9      seq       TRUE
## YlGnBu           9      seq       TRUE
## YlOrBr           9      seq       TRUE
## YlOrRd           9      seq       TRUE

[Figure: RColorBrewer]


Label specific data points

ggthemr::ggthemr("earth")
## Warning: New theme missing the following elements: panel.grid, plot.tag,
## plot.tag.position
select.mice <- mice.data %>%
  group_by(cohort) %>% filter(Distance.travelled.5 > 2000)

ggplot(mice.data, aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) +
  geom_point(aes(colour = Tail.length)) +
  geom_label(aes(label = Gait), data = select.mice, nudge_y = 2, alpha = 1) +
  coord_cartesian(ylim = c(1200, 2300), xlim = c(180, 350))

ggthemr_reset()

Almost all of the labels are overlapping. We can fix this by using the ggrepel package.

ggthemr::ggthemr("earth")
## Warning: New theme missing the following elements: panel.grid, plot.tag,
## plot.tag.position
select.mice <- mice.data %>%
  group_by(cohort) %>% filter(Distance.travelled.5 > 2000)

ggplot(mice.data, aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) +
  geom_point(aes(colour = Tail.length)) +
  coord_cartesian(ylim = c(1200, 2300), xlim = c(180, 350)) +
  geom_point(size = 3, shape = 1, data = select.mice) +
  ggrepel::geom_label_repel(aes(label = Gait), data = select.mice, segment.color = "white")

ggthemr_reset()

Boxplots

ggplot(mice.data) + geom_boxplot(mapping = aes(Gait, weight_w13))

We can see that there is another problem in our dataset that needs to be fixed. Contrary to what we expected, weight_w13 is not numeric.

summary(mice.data$weight_w13)
##    Length     Class      Mode 
##       640 character character
Hmisc::describe(mice.data$weight_w13)
## mice.data$weight_w13 
##        n  missing distinct 
##      640        0      140 
## 
## lowest : 13.5         13.9         14           14.3         14.4        
## highest: 27.4         27.5         27.9         28           IMPC_PSC_003
class(mice.data$weight_w13)
## [1] "character"
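## Note: apply() first converts the dataframe to a character matrix, so every column is reported as "character" here;
## sapply(mice.data, class) shows the actual class of each column
apply(X = mice.data, MARGIN = 2, class) %>% table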
## .
## character 
##       106
mice.data$weight_w13 <- as.numeric(mice.data$weight_w13) ## You can also use the compound assignment pipe-operator from magrittr package: mice.data$weight_w13 %<>% as.numeric
## Warning: NAs introduced by coercion
ggplot(mice.data) + geom_boxplot(mapping = aes(Gait, weight_w13))
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).
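Before coercing a column, it can help to see which entries will fail to convert. The sketch below uses a made-up vector weights_raw; only the entry "IMPC_PSC_003" is taken from the output above, the other values are illustrative.

```r
# Hypothetical column content; "IMPC_PSC_003" mimics the stray entry seen above
weights_raw <- c("22.3", "24.4", "IMPC_PSC_003", "18.7", NA)

# Convert quietly, then look up which original entries turned into NA
weights_num <- suppressWarnings(as.numeric(weights_raw))
bad_values  <- weights_raw[is.na(weights_num) & !is.na(weights_raw)]

bad_values
```

Entries that were already missing are excluded, so bad_values lists only the text that could not be read as a number.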

ggplot(mice.data) + 
  geom_boxplot(mapping = aes(cohort, weight_w13, col = sex)) +
  theme(axis.text.x = element_text(angle = 90)) +
  ylab(label = "Weight")
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).

ggplot(mice.data) + 
  geom_boxplot(mapping = aes(Gait, weight_w13, fill = sex)) + 
  geom_jitter(aes(Gait, weight_w13), width = 0.1) +
  ylab("Weight (w13)")
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).
## Warning: Removed 3 rows containing missing values (geom_point).

ggplot(mice.data) + 
  geom_boxplot(mapping = aes(Gait, weight_w13, fill = sex)) + 
  geom_jitter(aes(Gait, weight_w13), width = 0.3) +
  ylab("Weight (w13)") +
  scale_fill_brewer(palette="Dark2") + 
  theme_minimal()
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).

## Warning: Removed 3 rows containing missing values (geom_point).

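### Helper for stat_summary(): returns the group mean as the y position and, rounded, as the text label
fun_mean <- function(x) {
  data.frame(y = mean(x, na.rm = TRUE), label = round(mean(x, na.rm = TRUE), 2))
}
ggplot(mice.data, mapping = aes(Gait, weight_w13)) + 
  geom_boxplot() + 
  ylab("Weight (w13)") +
  scale_fill_brewer(palette="Dark2") + 
  theme_minimal() +
  ### 'fun' replaces the deprecated 'fun.y' argument (ggplot2 >= 3.3.0)
  stat_summary(fun = mean, geom = "point", colour = "darkred", size = 3) +
  stat_summary(fun.data = fun_mean, geom = "text", vjust = -0.7)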
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).
## Warning: Removed 3 rows containing non-finite values (stat_summary).

## Warning: Removed 3 rows containing non-finite values (stat_summary).


Histograms

ggplot(mice.data) + geom_bar(aes(Tail.length), color = "black") + 
  xlab(label = "Tail length") +
  ylab(label = "Count")

ggplot(mice.data) + geom_histogram(aes(weight_w13, fill = Gait), color = "black", bins = 20) + xlab(label = "Weight w13")
## Warning: Removed 3 rows containing non-finite values (stat_bin).

ggplot(mice.data) + geom_histogram(aes(weight_w13, fill = Gait), color = "black", bins = 20, position = "dodge") + xlab(label = "Weight w13")
## Warning: Removed 3 rows containing non-finite values (stat_bin).

ggplot(mice.data, aes(weight_w13)) + 
  geom_histogram(aes(y = ..density.., fill = Gait), color = "black", bins = 20, position = "dodge") + 
  xlab(label = "Weight w13") + geom_density(col = "red")
## Warning: Removed 3 rows containing non-finite values (stat_bin).
## Warning: Removed 3 rows containing non-finite values (stat_density).

### na.rm = TRUE keeps the missing weights from turning a group mean into NA
mu <- mice.data %>%
  group_by(sex) %>%
  summarise(grp.mean = mean(weight_w13, na.rm = TRUE))

mice.data %>%
ggplot(., aes(x = weight_w13, color = sex, fill = sex)) +
  geom_histogram(aes(y = ..density..), position = "identity", alpha = 0.5) +
  geom_density(alpha = 0.6) +
  geom_vline(data = mu, aes(xintercept = grp.mean, color = sex),
             linetype = "dashed")+
  scale_color_manual(values = c("#999999", "#E69F00", "#56B4E9"))+
  scale_fill_manual(values = c("#999999", "#E69F00", "#56B4E9"))+
  labs(title = "Male vs. Female (Weight)", x = "Weight (w13)", y = "Density")+
  theme_classic() -> density_hist_mice

density_hist_mice
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 3 rows containing non-finite values (stat_bin).

## Warning: Removed 3 rows containing non-finite values (stat_density).


Violin Plots

mice.data %>%
    filter(Tail.length == "As expected" & group %in% c("Female CTRL", "Male CTRL")) %>% 
      ggplot(aes(x = sex, y = weight_w9 %>% as.numeric)) +
      geom_violin(aes(fill = sex)) +
      theme_minimal() +
      theme(axis.title.x = element_blank(),
          axis.text.x = element_blank(),
          axis.ticks.x = element_blank()) +
      ### Summary function: the mean as the point, mean +/- one standard deviation as the range
      stat_summary(fun.data = function(x) {
          m <- mean(x)
          data.frame(y = m, ymin = m - sd(x), ymax = m + sd(x))
          }) +
      labs(y = "Weight (w9)", fill = "Sex", title = "Weight Difference Among Sex Groups of CTRL Mice")


Facets

The first plot below does not show much information in a clear manner. It’s chaotic and cluttered. The lines are overlapping and it is impossible to compare the trends in the dataset. Sometimes we need to use several plots for proper visualization of our data points of interest. Compare the plots below and see how they differ.

p5 <- ggplot(mice.data, aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5))
p5 + geom_line(aes(col = cohort))

p5 + geom_line() + facet_wrap(~cohort, ncol = 5)

p5 + geom_line(aes(col = cohort)) + facet_wrap(~cohort, ncol = 5)

mice.data %>% 
  filter(cohort %in% c("c015", "c016", "c017", "c018", "c020")) %>% 
  select(c("sex", "cohort", "strain", "Glucose.conc.0", "Glucose.conc.15", "Glucose.conc.30", "Glucose.conc.60", "Glucose.conc.120")) %>%  
  gather("time_point", "Glc_c", 4:8)  %>% 
  mutate(time = as.numeric(str_extract(time_point, "\\d+"))) %>% 
  mutate(Glc_c = as.numeric(Glc_c)) %>% 
  group_by(strain, sex, time, cohort) %>% 
  summarise(Glc_avg =  mean(Glc_c)) %>% 
  ggplot(aes(x=time, y=Glc_avg, color=strain)) +
  geom_line()+
  theme_bw() +
  ylab("Glucose concentration [mmol/L]") +
  facet_grid(cohort~sex)


Saving (exporting) plots

ggplot2 lets you easily export your plots to a file with customized image quality attributes. ggsave is the function that will do this for you. If you are familiar with LaTeX, you might want to consider exporting your plots as PDF files. This lets you scale them easily in your LaTeX document and place them anywhere without needing to worry about aspect ratio. Another appropriate option is SVG. However, a high-dpi PNG or TIFF should do fine too.

ggsave(filename = "R_course_test_plot.png", 
       plot = density_hist_mice, 
       device = "png",
       width = 15, 
       height = 7, 
       units = "in", 
       dpi = 1200)
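For the PDF route mentioned above, a minimal sketch could look like this. It assumes the built-in mtcars dataset and writes to a temporary directory; the file name and the plot size are arbitrary.

```r
library(ggplot2)

# Assumption: a small stand-in plot built from mtcars
p_export <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Write to a temporary directory so that the example leaves no files behind
out_file <- file.path(tempdir(), "example_plot.pdf")

# The output format is inferred from the file extension
ggsave(filename = out_file, plot = p_export, width = 6, height = 4, units = "in")

file.exists(out_file)
```

Because PDF is a vector format, the dpi argument has no practical effect on it; width and height determine the size of the page.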

Interactive plots with Plotly

plotly::ggplotly(g)