---
title: "An introduction to the gapmap package"
date: 2 September 2014
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{"An introduction to the gapmap package"}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
This document explains the basic functions of the `gapmap` package for drawing a gapped cluster heatmap. The plot is generated with the `ggplot2` package.
Let's load the library first.
```{r, message=FALSE}
library(gapmap)
```
We will simulate a simple dataset.
```{r}
set.seed(1234)
x <- rnorm(10, mean=rep(1:5, each=2), sd=0.4)
y <- rnorm(10, mean=rep(c(1,2), each=5), sd=0.4)
dataFrame <- data.frame(x=x, y=y, row.names=c(1:10))
#calculate distance matrix. default is Euclidean distance
distxy <- dist(dataFrame)
#perform hierarchical clustering. default is complete linkage.
hc <- hclust(distxy)
dend <- as.dendrogram(hc)
```
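The gaps drawn later are derived from the heights at which clusters merge, so it can help to look at those heights directly. The following is a minimal sketch using only base R; it rebuilds the simulated data so that it runs on its own, and the name `merge_heights` is introduced here purely for illustration.

```r
# Rebuild the simulated data so this block is self-contained.
set.seed(1234)
x <- rnorm(10, mean = rep(1:5, each = 2), sd = 0.4)
y <- rnorm(10, mean = rep(c(1, 2), each = 5), sd = 0.4)
hc <- hclust(dist(data.frame(x = x, y = y)))

# One height per merge: 10 observations give 9 merges.
merge_heights <- hc$height
length(merge_heights)

# Order of the leaves when the dendrogram is drawn.
hc$order
```

With complete linkage the merge heights never decrease, so the last entries of `merge_heights` correspond to the joins near the root of the tree.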
To make a gapped cluster heatmap, you need to pass a `matrix` object for the heatmap and `dendrogram` class objects for drawing the dendrograms and ordering the rows and columns.
```{r, fig.width= 6.5, fig.height=6}
grey_scale =c("#333333", "#5C5C5C", "#757575", "#8A8A8A", "#9B9B9B", "#AAAAAA", "#B8B8B8", "#C5C5C5", "#D0D0D0", "#DBDBDB", "#E6E6E6")
gapmap(m = as.matrix(distxy), d_row= rev(dend), d_col=dend, col = grey_scale)
```
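The colour vector does not have to be typed by hand. As a sketch, base R's `colorRampPalette` can interpolate between the two end colours used above; the name `grey_ramp` is hypothetical, and the intermediate shades will be similar to, but not identical with, the hand-picked `grey_scale` values.

```r
# Interpolate 11 greys between the darkest and lightest colours used above.
grey_ramp <- colorRampPalette(c("#333333", "#E6E6E6"))(11)
grey_ramp
```

The resulting character vector can be passed to the `col` argument in the same way as `grey_scale`.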
By default, the `gapmap` function runs in **quantitative** mode with **exponential** mapping. You can choose between two modes: **quantitative** and **threshold**.

* **quantitative**: the size of a gap is mapped quantitatively to the distance (height) at which two nodes merge in the hierarchical clustering. There are two options for mapping the distance: **linear** or **exponential**.
    + **linear**: the size of a gap is a linear function of the distance.
    + **exponential**: the size of a gap is an exponential function of the distance. The scale log base can be set with the **scale** argument; the default is 0.5.
* **threshold**: gaps are introduced by "cutting the tree". By setting the **threshold** argument, or alternatively **row_threshold** and **col_threshold**, you divide the tree into clusters and introduce gaps between them.
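To make the two quantitative mappings concrete, here is a small sketch of what they might look like. The helpers `linear_gap` and `exp_gap` are hypothetical and are not the package's own code; in particular, the power form `f^(1 / scale)` is an assumption, inferred from the way this vignette computes label positions for its scale-comparison plot.

```r
# Hypothetical helpers for illustration only (not part of gapmap).

# Linear: rescale a distance from [d_min, d_max] to a gap in [g_min, g_max].
linear_gap <- function(d, d_min, d_max, g_min = 0, g_max = 1) {
  g_min + (g_max - g_min) * (d - d_min) / (d_max - d_min)
}

# Exponential (assumed form): raise the normalised distance to 1 / scale.
exp_gap <- function(d, d_min, d_max, g_min = 0, g_max = 1, scale = 0.5) {
  f <- (d - d_min) / (d_max - d_min)
  g_min + (g_max - g_min) * f^(1 / scale)
}

d <- c(0, 1, 2.5, 4, 5)
linear_gap(d, 0, 5)            # 0.00 0.20 0.50 0.80 1.00
exp_gap(d, 0, 5, scale = 0.5)  # 0.00 0.04 0.25 0.64 1.00
```

Under this assumption, a scale of 0.5 shrinks the gaps for small distances much more than for large ones, which is why the exponential mapping makes the large gaps stand out.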
The following example uses linear mapping. It produces more gaps, whereas the exponential mapping in the previous example emphasizes the large gaps.
```{r, fig.width= 6.5, fig.height=6}
gapmap(m = as.matrix(distxy), d_row= rev(dend), d_col=dend, mode = "quantitative", mapping="linear", col = grey_scale)
```
The following plot illustrates the difference between the two mapping schemes. For the exponential mapping, the scale log base is set to 0.5.
```{r, echo=FALSE, fig.width= 6.5, fig.height=6}
library(ggplot2)  # ggplot() is called directly below
distances <- seq(0, 5, 0.1)
data <- data.frame(distance = distances)
s <- 0.5   # scale used for the exponential mapping
l <- data  # will hold the linear mapping
e <- data  # will hold the exponential mapping
for (i in 1:nrow(data)) {
  d <- data$distance[i]
  # map each distance in [0, 5] to a gap size in [0, 1]
  l$gap[i] <- map(d, 0, 5, 0, 1)
  e$gap[i] <- map.exp(d, 0, 5, 0, 1, scale = s)
  l$type[i] <- "linear"
  e$type[i] <- "exponential"
}
gaps <- rbind(l, e)
ggplot(gaps, aes(x = gap, y = distance, group = type)) +
  geom_line(aes(color = type)) +
  theme_bw() +
  theme(legend.position = c(0.9, 0.1))
```
The following plot shows how the exponential mapping changes with the scale log base. Each curve is annotated with its scale value.
```{r, echo=FALSE, fig.width= 6.5, fig.height=6}
scales <- seq(0.1, 3, 0.3)
distances <- seq(0, 5, 0.1)
# compute one curve per scale value
D <- data.frame()
for (j in 1:length(scales)) {
  s <- scales[j]
  data <- data.frame(distance = distances)
  for (i in 1:nrow(data)) {
    data$gap[i] <- map.exp(data$distance[i], 0, 5, 0, 1, scale = s)
    data$scale[i] <- s
  }
  D <- rbind(D, data)
}
# place a label on each curve at gap = 0.4 (the x position of the label)
d_min <- 0  # distance range
d_max <- 5
g_min <- 0  # gap range
g_max <- 1
g <- 0.4
labels <- data.frame()
for (j in 1:length(scales)) {
  s <- scales[j]
  # distance at which the curve for scale s reaches gap g
  v <- d_min + ((g / (g_max - g_min))^s) * (d_max - d_min)
  labels <- rbind(labels, data.frame(scale = s, distance = v, gap = g))
}
ggplot() +
  geom_line(data = D, aes(x = gap, y = distance, group = scale), color = "#56B1F7") +
  scale_y_continuous(limits = c(0, 5)) +
  geom_text(data = labels, aes(x = gap, y = distance, label = scale), hjust = -0.2, vjust = 0) +
  geom_point(data = labels, aes(x = gap, y = distance)) +
  theme_bw() +
  theme(legend.position = "none")
```
Besides the **quantitative** mode, there is a **threshold** mode that introduces gaps by a threshold. In the following example, the dendrograms for rows and columns are cut at a threshold distance of 2, and gaps of the same size are introduced between the clusters.
```{r, fig.width= 6.5, fig.height=6}
gapmap(m = as.matrix(distxy), d_row= rev(dend), d_col=dend, mode = "threshold", row_threshold = 2, col_threshold = 2, col = grey_scale)
```
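To see which observations end up on either side of a gap, the same cut can be reproduced with base R's `cutree`. This is only a sketch: it assumes that the threshold mode corresponds to cutting the tree at the given height, as described above, and it rebuilds the simulated data so that it runs on its own.

```r
# Rebuild the simulated data and clustering so this block is self-contained.
set.seed(1234)
x <- rnorm(10, mean = rep(1:5, each = 2), sd = 0.4)
y <- rnorm(10, mean = rep(c(1, 2), each = 5), sd = 0.4)
hc <- hclust(dist(data.frame(x = x, y = y)))

# Cut the tree at height 2 and list the cluster of each observation.
clusters <- cutree(hc, h = 2)
table(clusters)

# k clusters along one axis are separated by k - 1 gaps.
length(unique(clusters)) - 1
```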
In addition, this package works well with our dendrogram sorting package, `dendsort`. For details on `dendsort`, please see our [paper](https://f1000research.com/articles/3-177/v1).
```{r, fig.width= 6.5, fig.height=6}
library(dendsort)
gapmap(m = as.matrix(distxy), d_row= rev(dendsort(dend)), d_col=dendsort(dend), mode = "quantitative", col = grey_scale)
```