ggplot2 is an R package designed and implemented on the basis of The Grammar of Graphics, a book by Leland Wilkinson.
ggplot2 works mainly with data frames, in contrast to the base R plotting functions, which also accept plain vectors. A data frame is usually passed to the ggplot() function. From there, the desired parts of it (e.g. specific columns) are mapped to geom layers through aesthetics.
One of the most important and useful features of plots created with the ggplot2 package is that they can be refined step by step, by adding more and more layers on top of each other. This makes improving your plots a relatively easy and efficient task. It works because ggplot2 is built on the Grammar of Graphics, a principle that breaks a graphic down into semantic components, so how ggplot2 works can be understood intuitively. These components are:
| Component | Description |
|---|---|
| Data | The actual variables to be plotted. |
| Aesthetics | The scales onto which we will map our data. |
| Geometries | Shapes used to represent our data. |
| Facets | Rows and columns of sub-plots. |
| Statistics | Statistical models & summaries. |
| Coordinates | The plotting space we are using. |
| Theme | Describes non-data ink. |
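To connect the components listed above to actual code, here is a minimal sketch that names each one in a single plot. It uses R's built-in mtcars dataset rather than the mice data, and the particular choices (a linear fit, faceting by am, the y range) are only illustrative.

```r
library(ggplot2)

# Each grammar component named in one plot; mtcars ships with R
p <- ggplot(data = mtcars,                                          # Data
            mapping = aes(x = wt, y = mpg, colour = factor(cyl))) + # Aesthetics
  geom_point() +                                                    # Geometries
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +         # Statistics
  facet_wrap(~ am) +                                                # Facets
  coord_cartesian(ylim = c(10, 35)) +                               # Coordinates
  theme_minimal()                                                   # Theme
print(p)
```

Each layer can be removed or swapped independently. For example, replacing `facet_wrap(~ am)` with `facet_grid(am ~ cyl)` changes only the faceting.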
First of all, we need to install and load the required packages.
### This installs 'pacman' if it is not already installed
if(!require(pacman)){install.packages("pacman")}
### Install packages that are not installed and load them
pacman::p_load(ggplot2, ggrepel, ggthemes, plotly,
openxlsx, dplyr, tidyr, RColorBrewer,
Hmisc, devtools, pander, stringr,
knitr, kableExtra, rmarkdown)
### Install a theme package for ggplot2 from GitHub (https://github.com/cttobin/ggthemr)
if(!require(ggthemr)){devtools::install_github('cttobin/ggthemr')}
### An alternative way to install packages from GitHub is BiocManager::install(), which replaced the deprecated biocLite
### BiocManager::install() treats names of the form 'user/repo' as GitHub repositories
#if(!require(BiocManager)){install.packages("BiocManager")}
#if(!require(ggthemr)){BiocManager::install('cttobin/ggthemr')}
library(ggthemr)
An important and helpful step before visualizing any dataset is to inspect and eyeball the data manually.
We use the dataset mice.xlsx.
Let’s find out how many rows and columns the dataset we intend to visualize has. Then let’s print the names of the columns.
mice.data <- openxlsx::read.xlsx("data/mice.xlsx")
dim(mice.data) ### mice.data %>% dim
## [1] 640 106
names(mice.data) ### mice.data %>% names
ID | strain | sex | group | cohort | Tail.presence | Tail.length | Forelimb.digit.number | Hindlimb.digit.number | weight_w9 | TransferArousal | Gait | TailElevation | Unexpected_behavior | weight_w13 | Glucose.conc.0 | Glucose.conc.15 | Glucose.conc.30 | Glucose.conc.60 | Glucose.conc.120 | albumin | ALP | Bili-T | Ca | cholesterol | Fe | AST | ALT | glucose | TP | Urea | HDL | phosphate | TG | CREZ | K | Cl | Whole.arena.resting.time.5 | Whole.arena.resting.time.10 | Whole.arena.resting.time.15 | Whole.arena.resting.time.20 | Whole.arena.average.speed.5 | Whole.arena.average.speed.10 | Whole.arena.average.speed.15 | Whole.arena.average.speed.20 | Periphery.distance.travelled.5 | Periphery.distance.travelled.10 | Periphery.distance.travelled.15 | Periphery.distance.travelled.20 | Periphery.resting.time.5 | Periphery.resting.time.10 | Periphery.resting.time.15 | Periphery.resting.time.20 | Distance.travelled.5 | Distance.travelled.10 | Distance.travelled.15 | Distance.travelled.20 | Whole.arena.resting.time.(s) | Whole.arena.permanence.(s) | Whole.arena.average.speed.(cm/s) | Periphery.distance.travelled.(cm) | Periphery.resting.time.(s) | Periphery.permanence.time.(s) | Periphery.average.speed.(cm/s) | Center.distance.travelled.(cm) | Center.resting.time.(s) | Center.permanence.time.(s) | Center.average.speed.(cm/s) | Latency.to.center.entry.(s) | Number.of.center.entries | Distance.travelled.-.total.(cm) | Percentage.center.time.(%) | Periphery.permanence.time.5 | Periphery.permanence.time.10 | Periphery.permanence.time.15 | Periphery.permanence.time.20 | Periphery.average.speed.5 | Periphery.average.speed.10 | Periphery.average.speed.15 | Periphery.average.speed.20 | Center.distance.travelled.5 | Center.distance.travelled.10 | Center.distance.travelled.15 | Center.distance.travelled.20 | Center.resting.time.5 | Center.resting.time.10 | Center.resting.time.15 | Center.resting.time.20 | Center.permanence.time.5 | Center.permanence.time.10 | Center.permanence.time.15 | Center.permanence.time.20 | Center.average.speed.5 | Center.average.speed.10 | Center.average.speed.15 | Center.average.speed.20 | Number.of.center.entries.5 | Number.of.center.entries.10 | Number.of.center.entries.15 | Number.of.center.entries.20 | Click | 30.kHz | 24.kHz | 18.kHz | 12.kHz | 6.kHz |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
064-EPL19 | 2510009E07Rik (indel) | Male | Male KO_E07Rik | c007 | Present | Long | As expected | As expected | 24.6 | Immediate movement | Fluid movement | Straub / elevated tail | None | 22.3 | 3.3 | 12.4 | 16.399999999999999 | 6.5 | 4.2 | 26.77 | 85 | 3.93 | 2.14 | 1.80 | 16.83 | 55 | 25 | 7.53 | 46.494 | 7.80 | 1.179 | 2.26 | 0.755 | 7.641 | 5.83 | 107.8 | 254.7 | 254.1 | 255.0 | 248.0173 | 2.4 | 2.7 | 2.6 | 3.0 | 411.9 | 611.8 | 571.2 | 669.3 | 192.8076 | 225.6801 | 223.9341 | 184.6691 | 718.7 | 804.7 | 780.5 | 907.9 | 1011.6 | 1200 | 2.7 | 2264.2 | 826.2880 | 960.8 | 2.4 | 947.6 | 185.8584 | 239.2 | 3.8 | 2.4 | 51 | 3211.8 | 19.933333 | 220.1 | 260.3 | 257.1 | 223.3 | 2.0 | 2.3 | 2.2 | 3.0 | 306.9 | 192.8 | 209.3 | 238.6 | 62.8014 | 27.7503 | 31.4886 | 63.3482 | 79.9 | 39.7 | 42.9 | 76.6 | 3.3 | 5.5 | 4.6 | 3.1 | 17 | 11 | 12 | 11 | 15 | 25 | 10 | 15 | 10 | 15 |
064-EPL20 | 2510009E07Rik (indel) | Male | Male KO_E07Rik | c007 | Present | As expected | One forelimb - increased number | Both hindlimbs - decreased number | 26.8 | Immediate movement | Lack of fluidity in movement | As expected | Circling | 24.4 | 3.8 | 13.3 | 14.3 | 11.8 | 5.6 | 27.33 | 83 | 2.72 | 2.18 | 2.43 | 17.82 | 44 | 23 | 11.89 | 48.744 | 8.58 | 1.555 | 1.97 | 1.352 | 7.707 | 6.05 | 108.3 | 255.6 | 266.7 | 269.7 | 255.2149 | 2.3 | 1.9 | 1.7 | 2.5 | 494.5 | 388.9 | 343.6 | 491.0 | 242.9133 | 239.9250 | 236.6049 | 203.9328 | 699.7 | 561.8 | 522.8 | 763.1 | 1047.6 | 1200 | 2.1 | 1718.1 | 923.5427 | 1027.3 | 1.7 | 829.2 | 122.4443 | 172.7 | 5.0 | 24.3 | 35 | 2547.3 | 14.391667 | 275.1 | 262.5 | 256.9 | 232.8 | 1.8 | 1.5 | 1.3 | 2.1 | 205.2 | 172.8 | 179.1 | 272.1 | 12.8750 | 25.8750 | 33.3163 | 50.6605 | 25.0 | 37.5 | 43.1 | 67.1 | 8.0 | 5.3 | 4.0 | 4.4 | 9 | 8 | 8 | 10 | 20 | 15 | 15 | 15 | 15 | 20 |
064-EPL40 | 2510009E07Rik (indel) | Male | Male KO_E07Rik | c007 | Present | As expected | As expected | Both hindlimbs - increased number | 20.0 | As expected | Lack of fluidity in movement | Dragging | Other | 18.7 | 4.4000000000000004 | 11.8 | 12.7 | 8.3000000000000007 | 3.7 | 26.31 | 120 | 5.03 | 2.28 | 1.82 | 16.71 | 81 | 30 | 7.72 | 46.215 | 5.87 | 1.128 | 2.37 | 0.699 | 8.859 | 5.77 | 109.9 | 287.1 | 234.3 | 251.1 | 284.3052 | 0.8 | 3.4 | 2.8 | 0.9 | 223.9 | 627.7 | 489.3 | 196.3 | 283.2886 | 188.5950 | 207.2700 | 275.8054 | 231.8 | 1031.2 | 828.7 | 276.3 | 1057.2 | 1200 | 2.0 | 1537.2 | 954.6328 | 1045.6 | 1.4 | 830.8 | 98.2891 | 154.3 | 6.0 | 11.6 | 42 | 2368.0 | 12.858333 | 295.4 | 228.6 | 235.0 | 286.7 | 0.8 | 2.7 | 2.0 | 0.7 | 8.0 | 403.5 | 339.4 | 79.9 | 4.0138 | 44.9820 | 41.2734 | 8.0520 | 4.7 | 71.4 | 65.1 | 13.2 | 1.8 | 5.9 | 6.2 | 6.7 | 2 | 19 | 17 | 4 | 15 | 20 | 5 | 10 | 25 | 50 |
064-EPL69 | 2510009E07Rik (indel) | Male | Male KO_E07Rik | c007 | Present | As expected | Both forelimbs - decreased number | One hindlimb - increased number | 26.3 | Extended freeze | Fluid movement | Dragging | None | 24.6 | 4.2 | 10.5 | 12 | 8.4 | 5.9 | 26.20 | 96 | 2.12 | 2.16 | 2.60 | 23.87 | 44 | 25 | 13.26 | 47.813 | 7.90 | 1.730 | 2.55 | 1.162 | 7.229 | 5.68 | 108.7 | 245.1 | 275.7 | 270.6 | 276.2079 | 3.0 | 1.2 | 1.5 | 1.2 | 659.6 | 261.6 | 341.0 | 290.5 | 233.6184 | 252.5435 | 246.6438 | 262.4425 | 903.4 | 374.6 | 461.0 | 350.5 | 1068.0 | 1200 | 1.7 | 1552.7 | 994.9914 | 1094.6 | 1.4 | 536.9 | 72.6570 | 105.3 | 5.1 | 91.8 | 30 | 2089.6 | 8.775000 | 274.2 | 270.1 | 267.8 | 282.5 | 2.4 | 1.0 | 1.3 | 1.0 | 243.8 | 113.0 | 120.0 | 60.1 | 11.2488 | 23.4715 | 23.6670 | 13.9125 | 25.8 | 29.9 | 32.2 | 17.5 | 9.8 | 3.5 | 4.0 | 3.3 | 11 | 7 | 7 | 5 | 10 | 20 | 10 | 10 | 10 | 15 |
064-EPL75 | 2510009E07Rik (indel) | Male | Male KO_E07Rik | c007 | Present | Long | As expected | As expected | 25.5 | Immediate movement | Fluid movement | Straub / elevated tail | None | 21.9 | 3.3 | 13.4 | 13.1 | 7.1 | 3.6 | 26.60 | 79 | 2.17 | 2.19 | 2.40 | 21.55 | 54 | 38 | 11.70 | 46.984 | 8.42 | 1.571 | 1.98 | 0.900 | 5.574 | 5.00 | 110.8 | 257.7 | 252.3 | 269.4 | 281.4000 | 2.3 | 2.9 | 1.8 | 1.1 | 614.8 | 696.4 | 467.0 | 283.5 | 249.6637 | 243.3660 | 262.8660 | 281.0430 | 696.6 | 865.8 | 540.7 | 326.7 | 1060.8 | 1200 | 2.0 | 2061.7 | 1037.0217 | 1156.1 | 1.8 | 368.1 | 23.5743 | 43.9 | 8.6 | 8.5 | 20 | 2429.8 | 3.658333 | 287.3 | 282.0 | 289.5 | 297.4 | 2.1 | 2.5 | 1.6 | 1.0 | 81.8 | 169.4 | 73.7 | 43.3 | 7.8976 | 9.2340 | 6.1950 | 0.1066 | 12.8 | 18.0 | 10.5 | 2.6 | 6.4 | 9.1 | 7.8 | 20.2 | 5 | 9 | 4 | 2 | 10 | 60 | 5 | 5 | 10 | 15 |
064-EPL82 | 2510009E07Rik (indel) | Male | Male KO_E07Rik | c007 | Present | As expected | One forelimb - increased number | Both hindlimbs - increased number | 24.4 | As expected | Lack of fluidity in movement | Straub / elevated tail | None | 22.2 | 4.4000000000000004 | 11.9 | 10 | 7.7 | 5.6 | 23.85 | 83 | 1.09 | 2.24 | 2.27 | 24.40 | 71 | 48 | 15.04 | 46.191 | 8.59 | 1.330 | 2.52 | 0.981 | 8.113 | 5.33 | 109.6 | 252.0 | 261.6 | 247.2 | 278.7000 | 2.5 | 2.0 | 2.8 | 1.1 | 603.7 | 405.7 | 564.6 | 201.9 | 235.2465 | 191.8430 | 192.7445 | 221.3700 | 738.9 | 600.8 | 844.3 | 336.8 | 1039.2 | 1200 | 2.1 | 1775.9 | 840.4998 | 956.2 | 1.9 | 744.8 | 199.4284 | 243.8 | 3.0 | 19.8 | 50 | 2520.8 | 20.316667 | 274.5 | 218.5 | 228.1 | 235.0 | 2.2 | 1.8 | 2.5 | 0.9 | 135.1 | 195.1 | 279.8 | 134.8 | 16.4475 | 69.6010 | 54.3526 | 57.9800 | 25.5 | 81.5 | 71.8 | 65.0 | 5.7 | 2.4 | 3.9 | 1.8 | 8 | 13 | 19 | 10 | 20 | 85 | 60 | 40 | 10 | 10 |
Now let’s create our first ggplot by plotting two of the columns from the data.
In the example below, we plot Distance.travelled.5 (y axis) against Whole.arena.resting.time.5 (x axis). Feel free to use any other columns.
If you feel stuck at any point, check out the cheatsheet for ggplot2.
ggplot(mice.data, aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5))
The plot is empty and nothing has been plotted! Can you find out why there are no data points on our plot?
We have properly supplied our dataset to the ggplot function and have also mapped specific columns to each axis (x and y) using the aes function. In other words, we have constructed aesthetic mappings using aes. What is missing? What have we done wrong?
The answer is that we have told ggplot2 which variables go on which axis, but not how to draw them: the plot has no geom layer. By adding the proper geom layer (geom_point() for a scatterplot), we can now see the data points plotted.
ggplot(data = mice.data, mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) + geom_point()
Note that the mapping argument (with its aes() call) can also be written directly inside geom_point() instead of ggplot(), as in the code that closes this step. The resulting plot is the same as the one above. Can you, however, find out the technical differences? When would you use the first approach and when the second?
The data and aesthetic mappings you define inside ggplot() are automatically passed on to any further layers you add to your plot. In other words, new geom layers automatically inherit whatever is specified inside the ggplot() function, unless a layer sets inherit.aes = FALSE. As the ggplot2 documentation puts it, “Aesthetics supplied to ggplot() are used as defaults for every layer”. This is particularly useful when you have several layers at the same time (e.g. geom_point(), geom_smooth(), and geom_rug()) and want all of them to use the same aes mapping, including color.
ggplot(data = mice.data) + geom_point(mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5))
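One technical difference between the two approaches shows up once a second layer is added. The sketch below uses the built-in mtcars dataset instead of the mice data. When the colour mapping lives in ggplot(), geom_smooth() inherits it and fits one line per group. When it lives only in geom_point(), the smoother fits a single line through all points. layer_data() is used here only to count the fitted groups.

```r
library(ggplot2)

# Mapping in ggplot(): both layers inherit x, y and colour,
# so geom_smooth() fits one line per cylinder group
p_global <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Mapping only in geom_point(): geom_smooth() needs its own x/y mapping,
# has no colour grouping and fits a single line through all points
p_local <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_smooth(aes(x = wt, y = mpg), method = "lm", formula = y ~ x, se = FALSE)

# layer_data() returns the computed data of a layer; count its groups
n_global <- length(unique(layer_data(p_global, 2)$group))
n_local  <- length(unique(layer_data(p_local, 2)$group))
c(n_global, n_local)
```

Put mappings in ggplot() when every layer should share them. Put them in a single geom when only that layer should use them.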
In the next steps, we will add more layers to our ggplot to make it more sophisticated. You can keep adding layers with a + sign followed by more ggplot2 functions. Alternatively, you can save the base of your plot into a variable at any time to avoid writing repetitive lines later on.
p <- ggplot(data = mice.data, mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) + geom_point()
Now p contains our ggplot object.
Now let’s add descriptive labels to our x and y axes and specify a plot title and subtitle.
We already have our ggplot object saved in p, and we can reuse it by only adding new layers.
p + labs(title = "Mice data scatterplot", subtitle = "Resting time and travel distance", y = "Distance Travelled", x = "Whole arena resting time")
The same can be achieved in the following way.
p + ggtitle("Mice data scatterplot", subtitle = "Resting time and travel distance") +
xlab("Whole arena resting time") +
ylab("Distance Travelled")
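We now add a smoothing line, which “aids the eye in seeing patterns in the presence of overplotting.” Here method = "lm" fits a linear model (a straight line), and se = TRUE adds a shaded confidence band around it.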
p + geom_smooth(method = "lm", se = TRUE)
This example demonstrates how to switch to one of the built-in themes and how to zoom in so that only a portion of the data is shown instead of the entire dataset.
p +
theme_dark() +
coord_cartesian(xlim = c(200, 240), ylim = c(1000, 2000)) +
geom_smooth(method = "lm")
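There is another way to get a similar zoom: we can use xlim and ylim instead of coord_cartesian. However, xlim and ylim set the limits of the scales, so every data point outside the range is removed before anything else is computed. Those points can then no longer be used anywhere else in the plot (e.g. for fitting the regression line, as the warnings below show). coord_cartesian, in contrast, acts as a ‘zooming’ function: all data points are still used, and only the visible area changes.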
Note: The data points are NOT removed from the original dataset itself, but rather from the ggplot object.
p +
theme_dark() +
xlim(c(200, 240)) +
ylim(c(1000, 2000)) +
geom_smooth(method = "lm")
## Warning: Removed 435 rows containing non-finite values (stat_smooth).
## Warning: Removed 435 rows containing missing values (geom_point).
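To make the difference concrete, here is a small sketch on made-up toy data (the data frame toy is hypothetical, not part of the mice dataset). It uses ggplot2's layer_data() to inspect the line that geom_smooth() actually fits under each approach. With coord_cartesian() the fit still sees all ten points. With xlim() it sees only the five points inside the limits.

```r
library(ggplot2)

# Hypothetical toy data: y rises for x = 1..5, then falls symmetrically for x = 6..10
toy <- data.frame(x = 1:10, y = c(1:5, 5:1))

base <- ggplot(toy, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p_zoom <- base + coord_cartesian(xlim = c(1, 5))  # zoom only: the fit still uses all 10 points
p_lim  <- base + xlim(1, 5)                       # scale limits: x = 6..10 become NA and are dropped

# layer_data(plot, 2) returns the fitted line computed by the smooth layer
fit_zoom <- layer_data(p_zoom, 2)
fit_lim  <- suppressWarnings(layer_data(p_lim, 2))  # silences the "Removed 5 rows" warning

range(fit_zoom$y)  # a flat line at the overall mean of y
range(fit_lim$y)   # a rising line through the five remaining points
```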
In ggplot2, it is possible to add marginal plots along the x and y axes. One of the easiest ways to do this is to use rug plots (geom_rug()) from the ggplot2 package itself.
A rug plot, according to MATLAB Central, “is a compact way of illustrating the marginal distributions of a variable along x and y. Positions of the data points along x and y are denoted by tick marks, reminiscent of the tassels on a rug.”
The example below is more advanced and uses many layers at the same time. Try to understand what each line of code does.
g <- ggplot(data = mice.data) +
geom_rug(aes(Whole.arena.resting.time.5, Distance.travelled.5), color = "red") +
geom_point(mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5, col = Tail.length)) +
geom_smooth(mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5), method = "lm", color = "yellow") +
ggthemes::theme_solarized_2(light = FALSE) +
scale_colour_solarized("blue") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title = "Distance traveled vs. Resting time", subtitle = "Mice measurement data", x = "Whole arena resting time 5", y = "Distance traveled 5", caption = "Phenotyping data")
plot(g)
This example uses a theme from the ggthemr package and demonstrates how to change the color of points that satisfy a particular condition.
small <- filter(iris, Sepal.Length > 7)
ggthemr::ggthemr('dust')
## Warning: New theme missing the following elements: panel.grid, plot.tag,
## plot.tag.position
ggplot(iris) +
geom_rug(aes(Sepal.Length,Sepal.Width) ) +
geom_point(aes(Sepal.Length, Sepal.Width, col=Species)) +
geom_point(data = small, mapping = aes(Sepal.Length, Sepal.Width), col = "white", shape = 12)
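Another way to highlight points that satisfy a condition is to map the condition itself (a logical value) to colour, instead of drawing a second point layer from a filtered data frame. This is a minimal sketch on the same iris data with the same threshold of 7. The grey/red colour choice is arbitrary.

```r
library(ggplot2)

# Map a logical condition to colour instead of adding a second point layer
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(colour = Sepal.Length > 7)) +
  scale_colour_manual(name = "Sepal.Length > 7",
                      values = c(`FALSE` = "grey60", `TRUE` = "firebrick"))
print(p)
```

This approach also produces a legend for the condition automatically. If a ggthemr theme is active, its palette does not override the manual scale.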
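There are several different point shapes available in R, each identified by a number from 0 to 25. Shapes 21 to 25 can also take a fill color, which is why the code below sets fill = "red". You can explore them yourself in the following way.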
shapes <- data.frame(
shape = c(0:25),
x = 0:25 %/% 5,
y = -(0:25 %% 5)
)
ggplot(shapes, aes(x, y)) +
geom_point(aes(shape = shape), size = 5, fill = "red") +
geom_text(aes(label = shape), hjust = 0, nudge_x = 0.15) +
scale_shape_identity() +
expand_limits(x = 4.1) +
scale_x_continuous(NULL, breaks = NULL) +
scale_y_continuous(NULL, breaks = NULL)
In the following example, we will use a fixed (static) color, size, and shape for all points, change the smoothing method to loess, and finally choose a different theme from the ggthemes package.
ggthemr_reset()
ggplot(data = mice.data, mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) +
geom_point(col = "steelblue", size = 4, shape = 6) +
geom_smooth(method = "loess", color = "firebrick") +
labs(title = "Distance traveled vs. Resting time", subtitle = "Mice measurement data", x = "Whole arena resting time 5", y = "Distance traveled 5", caption = "Phenotyping data") +
ggthemes::theme_hc(style = "darkunica") + ### 'style' replaces the deprecated 'bgcolor' argument
ggthemes::scale_fill_hc("darkunica") + ### no visible effect here, since no fill aesthetic is mapped
### theme() tweaks must come after a complete theme such as theme_hc(); otherwise they are overwritten
theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(data = mice.data, mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) +
geom_point(aes(col = sex), size = 3) +
geom_smooth(method = "loess", color = "firebrick") +
theme(axis.text.x = element_text(angle = 90, hjust = 1), legend.position = "top") +
scale_colour_brewer(palette = "Set2")
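Why are there two male categories? And how should we fix this? This is the type of problem that you can expect to encounter on a regular basis. As unique() shows below, some entries contain a trailing space ("Male "), which R treats as a different value from "Male".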
unique(mice.data$sex)
## [1] "Male" "Female" "Male "
#View(mice.data[mice.data$sex == "Male ", ])
mice.data$sex[mice.data$sex == "Male "] <- "Male"
Now let’s visualize again:
ggplot(data = mice.data, mapping = aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) +
geom_point(aes(col = sex), size = 3) +
geom_smooth(method = "loess", color = "firebrick") +
theme(axis.text.x = element_text(angle = 90, hjust = 1), legend.position = "top") +
scale_colour_brewer(palette = "Set2")
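The fix above only handles the exact string "Male ". A more general sketch, assuming stray whitespace may affect any text column, is to trim leading and trailing whitespace in every character column at once with base R's trimws(). stringr::str_trim() would work the same way. The toy data frame is hypothetical.

```r
# Hypothetical toy data with the same kind of problem as mice.data$sex
toy <- data.frame(sex = c("Male", "Female", "Male ", " Female"),
                  weight = c(24.6, 20.1, 26.8, 19.5),
                  stringsAsFactors = FALSE)
unique(toy$sex)  # four "different" values

# Trim leading/trailing whitespace in every character column; numeric columns are untouched
chr_cols <- vapply(toy, is.character, logical(1))
toy[chr_cols] <- lapply(toy[chr_cols], trimws)
unique(toy$sex)  # only "Male" and "Female" remain
```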
You can explore other color palettes in the following way.
RColorBrewer::brewer.pal.info
## maxcolors category colorblind
## BrBG 11 div TRUE
## PiYG 11 div TRUE
## PRGn 11 div TRUE
## PuOr 11 div TRUE
## RdBu 11 div TRUE
## RdGy 11 div FALSE
## RdYlBu 11 div TRUE
## RdYlGn 11 div FALSE
## Spectral 11 div FALSE
## Accent 8 qual FALSE
## Dark2 8 qual TRUE
## Paired 12 qual TRUE
## Pastel1 9 qual FALSE
## Pastel2 8 qual FALSE
## Set1 9 qual FALSE
## Set2 8 qual TRUE
## Set3 12 qual FALSE
## Blues 9 seq TRUE
## BuGn 9 seq TRUE
## BuPu 9 seq TRUE
## GnBu 9 seq TRUE
## Greens 9 seq TRUE
## Greys 9 seq TRUE
## Oranges 9 seq TRUE
## OrRd 9 seq TRUE
## PuBu 9 seq TRUE
## PuBuGn 9 seq TRUE
## PuRd 9 seq TRUE
## Purples 9 seq TRUE
## RdPu 9 seq TRUE
## Reds 9 seq TRUE
## YlGn 9 seq TRUE
## YlGnBu 9 seq TRUE
## YlOrBr 9 seq TRUE
## YlOrRd 9 seq TRUE
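Besides printing the table, the brewer.pal.info data frame can be filtered directly, and brewer.pal() returns the hex codes of a palette. This sketch picks out the qualitative palettes marked as colour-blind friendly and draws swatches of all colour-blind-friendly palettes. Set2, used in the plots above, is one of them.

```r
library(RColorBrewer)

# Qualitative palettes flagged as colour-blind friendly in brewer.pal.info
info <- brewer.pal.info
safe_qual <- rownames(info)[info$category == "qual" & info$colorblind]
safe_qual

# The first three colours of "Set2" as hex codes, e.g. for scale_colour_manual(values = ...)
set2 <- brewer.pal(n = 3, name = "Set2")
set2

# Draw swatches of all colour-blind-friendly palettes
display.brewer.all(colorblindFriendly = TRUE)
```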
ggthemr::ggthemr("earth")
## Warning: New theme missing the following elements: panel.grid, plot.tag,
## plot.tag.position
select.mice <- mice.data %>%
group_by(cohort) %>% filter(Distance.travelled.5 > 2000)
ggplot(mice.data, aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) +
geom_point(aes(colour = Tail.length)) +
geom_label(aes(label = Gait), data = select.mice, nudge_y = 2, alpha = 1) +
coord_cartesian(ylim = c(1200, 2300), xlim = c(180, 350))
ggthemr_reset()
Almost all of the labels are overlapping. We can fix this by using the ggrepel package.
ggthemr::ggthemr("earth")
## Warning: New theme missing the following elements: panel.grid, plot.tag,
## plot.tag.position
select.mice <- mice.data %>%
group_by(cohort) %>% filter(Distance.travelled.5 > 2000)
ggplot(mice.data, aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5)) +
geom_point(aes(colour = Tail.length)) +
coord_cartesian(ylim = c(1200, 2300), xlim = c(180, 350)) +
geom_point(size = 3, shape = 1, data = select.mice) +
ggrepel::geom_label_repel(aes(label = Gait), data = select.mice, segment.color = "white")
ggthemr_reset()
ggplot(mice.data) + geom_boxplot(mapping = aes(Gait, weight_w13))
We can see that there is another problem in our dataset that needs to be fixed: weight_w13 is not numeric, although we expected it to be.
summary(mice.data$weight_w13)
## Length Class Mode
## 640 character character
Hmisc::describe(mice.data$weight_w13)
## mice.data$weight_w13
## n missing distinct
## 640 0 140
##
## lowest : 13.5 13.9 14 14.3 14.4
## highest: 27.4 27.5 27.9 28 IMPC_PSC_003
class(mice.data$weight_w13)
## [1] "character"
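apply(X = mice.data, MARGIN = 2, class) %>% table
## .
## character
## 106
Careful: apply() first converts the data frame into a matrix. A matrix can hold only one type, so every column is coerced to character as soon as a single column is character. This output therefore does not show the real column classes. sapply(mice.data, class) checks each column as it is stored in the data frame.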
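A minimal illustration of this pitfall, on a hypothetical two-column data frame: apply() reports both columns as character, while sapply() reports the class each column actually has.

```r
# Hypothetical toy data frame: one numeric and one character column
toy <- data.frame(weight = c(24.6, 26.8), id = c("a", "b"), stringsAsFactors = FALSE)

apply(toy, MARGIN = 2, class)  # both "character": apply() first turns toy into a character matrix
sapply(toy, class)             # "numeric" and "character": the classes actually stored
```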
mice.data$weight_w13 <- as.numeric(mice.data$weight_w13) ## You can also use the compound assignment pipe-operator from magrittr package: mice.data$weight_w13 %<>% as.numeric
## Warning: NAs introduced by coercion
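The warning tells us that some entries could not be converted, but not which ones. A sketch of one way to find them, run on the original character values before overwriting the column, is to repeat the conversion and keep the entries that turned into NA. The vector raw below is hypothetical and only mimics the column.

```r
# Hypothetical values resembling mice.data$weight_w13 before conversion
raw <- c("24.6", "22.3", "IMPC_PSC_003", NA, "21.9")

num <- suppressWarnings(as.numeric(raw))  # same conversion, warning silenced
bad <- raw[is.na(num) & !is.na(raw)]      # entries that were present but not numeric
bad
```

Knowing the offending values (such as the IMPC_PSC_003 entry that Hmisc::describe() listed above) helps decide whether they should become NA or be corrected at the source.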
ggplot(mice.data) + geom_boxplot(mapping = aes(Gait, weight_w13))
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).
ggplot(mice.data) + geom_boxplot(mapping = aes(cohort, weight_w13, col = sex)) +
theme(axis.text.x = element_text(angle = 90)) +
ylab(label = "Weight")
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).
ggplot(mice.data) +
geom_boxplot(mapping = aes(Gait, weight_w13, fill = sex)) +
geom_jitter(aes(Gait, weight_w13), width = 0.1) +
ylab("Weight (w13)")
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).
## Warning: Removed 3 rows containing missing values (geom_point).
ggplot(mice.data) +
geom_boxplot(mapping = aes(Gait, weight_w13, fill = sex)) +
geom_jitter(aes(Gait, weight_w13), width = 0.3) +
ylab("Weight (w13)") +
scale_fill_brewer(palette="Dark2") +
theme_minimal()
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).
## Warning: Removed 3 rows containing missing values (geom_point).
### Label each box with its (rounded) mean
fun_mean <- function(x){return(data.frame(y = mean(x), label = round(mean(x, na.rm = TRUE), 1)))}
ggplot(mice.data, mapping = aes(Gait, weight_w13)) +
geom_boxplot() +
ylab("Weight (w13)") +
theme_minimal() +
stat_summary(fun = mean, geom = "point", colour = "darkred", size = 3) + ### 'fun' replaces the deprecated 'fun.y'
stat_summary(fun.data = fun_mean, geom = "text", vjust = -0.7)
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).
## Warning: Removed 3 rows containing non-finite values (stat_summary).
## Warning: Removed 3 rows containing non-finite values (stat_summary).
ggplot(mice.data) + geom_bar(aes(Tail.length), color = "black") +
xlab(label = "Tail length") +
ylab(label = "Count")
ggplot(mice.data) + geom_histogram(aes(weight_w13, fill = Gait), color = "black", bins = 20) + xlab(label = "Weight w13")
## Warning: Removed 3 rows containing non-finite values (stat_bin).
ggplot(mice.data) + geom_histogram(aes(weight_w13, fill = Gait), color = "black", bins = 20, position = "dodge") + xlab(label = "Weight w13")
## Warning: Removed 3 rows containing non-finite values (stat_bin).
ggplot(mice.data, aes(weight_w13)) +
geom_histogram(aes(y = after_stat(density), fill = Gait), color = "black", bins = 20, position = "dodge") +
xlab(label = "Weight w13") + geom_density(col = "red")
## Warning: Removed 3 rows containing non-finite values (stat_bin).
## Warning: Removed 3 rows containing non-finite values (stat_density).
### Mean weight per sex; na.rm = TRUE skips the missing weights
mu <- mice.data %>% group_by(sex) %>% summarise(grp.mean = mean(weight_w13, na.rm = TRUE))
mice.data %>%
ggplot(., aes(x = weight_w13, color = sex, fill = sex)) +
geom_histogram(aes(y = after_stat(density)), position = "identity", alpha = 0.5) +
geom_density(alpha = 0.6) +
geom_vline(data = mu, aes(xintercept = grp.mean, color = sex),
linetype = "dashed")+
scale_color_manual(values = c("#999999", "#E69F00", "#56B4E9"))+
scale_fill_manual(values = c("#999999", "#E69F00", "#56B4E9"))+
labs(title = "Male vs. Female (Weight)", x = "Weight (w13)", y = "Density")+
theme_classic() -> density_hist_mice
density_hist_mice
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 3 rows containing non-finite values (stat_bin).
## Warning: Removed 3 rows containing non-finite values (stat_density).
Violin plots show the whole distribution of a variable. Here, stat_summary() adds a point range marking the mean plus or minus one standard deviation.
### Mean +/- one standard deviation, in the format stat_summary(fun.data = ...) expects
data_summary <- function(x) {
m <- mean(x)
ymin <- m - sd(x)
ymax <- m + sd(x)
return(c(y = m, ymin = ymin, ymax = ymax))
}
mice.data %>%
filter(Tail.length == "As expected" & group %in% c("Female CTRL", "Male CTRL")) %>%
ggplot(aes(x = sex, y = as.numeric(weight_w9))) +
geom_violin(aes(fill = sex)) +
theme_minimal() +
theme(axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()) +
stat_summary(fun.data = data_summary) +
labs(y = "Weight (w9)", fill = "Sex", title = "Weight Difference Among Sex Groups of CTRL Mice")
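ggplot2 also ships a ready-made summary function, mean_sdl(), that computes the same mean plus or minus a multiple of the standard deviation. It relies on the Hmisc package, which was installed at the start. This sketch checks on a few hypothetical numbers that it agrees with the hand-written data_summary() when mult = 1.

```r
library(ggplot2)  # mean_sdl() additionally needs the Hmisc package installed

# Hypothetical weights, just to compare the two summaries
x <- c(18.2, 20.5, 21.0, 22.4, 24.9)

# The hand-written summary used for the violin plot
data_summary <- function(x) {
  m <- mean(x)
  c(y = m, ymin = m - sd(x), ymax = m + sd(x))
}

data_summary(x)
mean_sdl(x, mult = 1)  # same numbers, returned as a one-row data frame
```

In the plot, this would read `stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1))`. Note that mean_sdl() defaults to mult = 2, i.e. two standard deviations.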
The first plot below does not convey much information clearly: it is chaotic and cluttered, the lines overlap, and it is impossible to compare the trends across cohorts. Sometimes we need several plots to properly visualize the data points of interest. facet_wrap() does this by drawing one sub-plot per cohort. Compare the plots below and see how they differ.
p5 <- ggplot(mice.data, aes(x = Whole.arena.resting.time.5, y = Distance.travelled.5))
p5 + geom_line(aes(col = cohort))
p5 + geom_line() + facet_wrap(~cohort, ncol = 5)
p5 + geom_line(aes(col = cohort)) + facet_wrap(~cohort, ncol = 5)
mice.data %>%
filter(cohort %in% c("c015", "c016", "c017", "c018", "c020")) %>%
select(c("sex", "cohort", "strain", "Glucose.conc.0", "Glucose.conc.15", "Glucose.conc.30", "Glucose.conc.60", "Glucose.conc.120")) %>%
gather("time_point", "Glc_c", 4:8) %>%
mutate(time = as.numeric(str_extract(time_point, "\\d+"))) %>%
mutate(Glc_c = as.numeric(Glc_c)) %>%
group_by(strain, sex, time, cohort) %>%
summarise(Glc_avg = mean(Glc_c, na.rm = TRUE)) %>%
ggplot(aes(x=time, y=Glc_avg, color=strain)) +
geom_line()+
theme_bw() +
ylab("Glucose concentration [mmol/L]") +
facet_grid(cohort~sex)
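gather() still works but has been superseded in tidyr by pivot_longer(). This minimal sketch shows the same reshaping step (wide glucose columns to one row per time point), run on a hypothetical two-row data frame whose values are stored as text like the original columns.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy data shaped like the glucose columns of mice.data (values stored as text)
toy <- data.frame(sex = c("Male", "Female"),
                  Glucose.conc.0  = c("3.3", "4.4"),
                  Glucose.conc.15 = c("12.4", "11.8"),
                  stringsAsFactors = FALSE)

long <- toy %>%
  pivot_longer(cols = starts_with("Glucose.conc."),        # columns to stack
               names_to = "time_point", values_to = "Glc_c") %>%
  mutate(time  = as.numeric(str_extract(time_point, "\\d+")),  # "Glucose.conc.15" -> 15
         Glc_c = as.numeric(Glc_c))
long
```

Selecting columns by name with starts_with() is also more robust than the positional 4:8 used above, which breaks silently if the column order changes.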
ggplot2 lets you easily export your plots to a file with customized image quality attributes; ggsave is the function that will do this for you. If you are familiar with LaTeX, you might want to consider exporting your plots as PDF files. PDF is a vector format, so you can scale the plots freely in your LaTeX document and place them anywhere without losing quality. SVG is another suitable vector format. However, a high-dpi png or tiff should do fine too.
ggsave(filename = "R_course_test_plot.png",
plot = density_hist_mice,
device = "png",
width = 15,
height = 7,
units = "in",
dpi = 1200)
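For the PDF route mentioned above, a minimal sketch follows. ggsave() picks the device from the file extension, so no device or dpi argument is needed for a vector format. The plot and the file name example_plot.pdf are placeholders, and the file is written to a temporary directory. Saving to .svg works the same way but, in recent ggplot2 versions, requires the svglite package.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Device inferred from the ".pdf" extension; width/height set the page size in inches
out <- file.path(tempdir(), "example_plot.pdf")
ggsave(filename = out, plot = p, width = 7, height = 4, units = "in")
file.exists(out)
```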
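Finally, the ggplotly() function from the plotly package converts a ggplot object, such as g from the rug plot example, into an interactive plot.
plotly::ggplotly(g)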