Model architectures

DeepCpG consists of a DNA model to recognize features in the DNA sequence, a CpG model to recognize features in the methylation neighborhood of multiple cells, and a Joint model to combine the features from the DNA and CpG model.

DeepCpG provides different architectures for the DNA, CpG, and joint model. Architectures differ in the number of layers and neurons, and are hence more or less complex. More complex models are usually more accurate, but more expensive to train. You can select a certain architecture using the --dna_model, --cpg_model, and --joint_model argument of dcpg_train.py, for example:

dcpg_train.py
    --dna_model CnnL2h128
    --cpg_model RnnL1
    --joint_model JointL2h512

In the following, the following layer specifications will be used:

Specification Description
conv[x@y] Convolutional layer with x filters of size y
mp[x] Max-pooling layer with size x
fc[x] Full-connected layer with x units
do Dropout layer
bgru[x] Bidirectional GRU with x units
gap Global average pooling layer
resb[x,y,z] Residual network with three bottleneck residual units of size x, y, z
resc[x,y,z] Residual network with three convolutional residual units of size x, y, z
resa[x,y,z] Residual network with three Atrous residual units of size x, y, z

DNA model architectures

Name Parameters Specification
CnnL1h128 4,100,000 conv[128@11]_mp[4]_fc[128]_do
CnnL1h256 8,100,000 conv[128@11]_mp[4]_fc[256]_do
CnnL2h128 4,100,000 conv[128@11]_mp[4]_conv[256@3]_mp[2]_fc[128]_do
CnnL2h256 8,100,000 conv[128@11]_mp[4]_conv[256@3]_mp[2]_fc[256]_do
CnnL3h128 4,400,000 conv[128@11]_mp[4]_conv[256@3]_mp[2]_conv[512@3]_mp[2]_fc[128]_do
CnnL3h256 8,300,000 conv[128@11]_mp[4]_conv[256@3]_mp[2]_conv[512@3]_mp[2]_fc[128]_do
CnnRnn01 1,100,000 conv[128@11]_pool[4]_conv[256@7]_pool[4]_bgru[256]_do
ResNet01 1,700,000 conv[128@11]_mp[2]_resb[2x128|2x256|2x512|1x1024]_gap_do
ResNet02 2,000,000 conv[128@11]_mp[2]_resb[3x128|3x256|3x512|1x1024]_gap_do
ResConv01 2,800,000 conv[128@11]_mp[2]_resc[2x128|1x256|1x256|1x512]_gap_do
ResAtrous01 2,000,000 conv[128@11]_mp[2]_resa[3x128|3x256|3x512|1x1024]_gap_do

Th prefixes Cnn, CnnRnn, ResNet, ResConv, and ResAtrous denote the class of the DNA model.

Models starting with Cnn are convolutional neural networks (CNNs). DeepCpG CNN architectures consist of a series of convolutional and max-pooling layers, which are followed by one fully-connected layer. Model CnnLxhy has x convolutional-pooling layers, and one fully-connected layer with y units. For example, CnnL2h128 has two convolutional layers, and one fully-connected layer with 128 units. CnnL3h256 has three convolutional layers and one fully-connected layer with 256 units. CnnL1h128 is the fastest model, but models with more layers and neurons usually perform better. In my experiments, CnnL2h128 provided a good trade-off between performance and runtime, which I recommend as default.

CnnRnn01 is a convolutional-recurrent neural network. It consists of two convolutional-pooling layers, which are followed by a bidirectional recurrent neural network (RNN) with one layer and gated recurrent units (GRUs). CnnRnn01 is slower than Cnn architectures and did not perform better in my experiments.

Models starting with ResNet are residual neural networks. ResNets are very deep networks with skip connections to improve the gradient flow and to allow learning how many layers to use. A residual network consists of multiple residual blocks, and each residual block consists of multiple residual units. Residual units have a bottleneck architecture with three convolutional layers to speed up computations. ResNet01 and ResNet02 have three residual blocks with two and three residual units, respectively. ResNets are slower than CNNs, but can perform better on large datasets.

Models starting with ResConv are ResNets with modified residual units that have two convolutional layers instead of a bottleneck architecture. ResConv models performed worse than ResNet models in my experiments.

Models starting with ResAtrous are ResNets with modified residual units that use Atrous convolutional layers instead of normal convolutional layers. Atrous convolutional layers have dilated filters, i.e. filters with ‘holes’, which allow scanning wider regions in the inputs sequence and thereby better capturing distant patters in the DNA sequence. However, ResAtrous models performed worse than ResNet models in my experiments

CpG model architectures

Name Parameters Specification
FcAvg 54,000 fc[512]_gap
RnnL1 810,000 fc[256]_bgru[256]_do
RnnL2 1,100,000 fc[256]_bgru[128]_bgru[256]_do

FcAvg is a lightweight model with only 54000 parameters, which first transforms observed neighboring CpG sites of all cells independently, and than averages the transformed features across cells. FcAvg is very fast, but performs worse than RNN models.

Rnn models consists of bidirectional recurrent neural networks (RNNs) with gated recurrent units (GRUs) to summarize the methylation neighborhood of cells in a more clever way than averaging. RnnL1 consists of one fully-connected layer with 256 units to transform the methylation neighborhood of each cell independently, and one bidirectional GRU with 2x256 units to summarize the transformed methylation neighborhood of cells. RnnL2 has two instead of one GRU layer. RnnL1 is faster and performed as good as RnnL2 in my experiments.

Joint model architectures

Name Parameters Specification
JointL0 0  
JointL1h512 524,000 fc[512]
JointL2h512 786,000 fc[512]_fc[512]
JointL3h512 1,000,000 fc[512]_fc[512]_fc[512]

Joint models join the feature from the DNA and CpG model. JointL0 simply concatenates the features and has no learnable parameters (ultra fast). JointLXh512 has X fully-connect layers with 512 neurons. Models with more layers usually perform better, at the cost of a higher runtime. I recommend using JointL2h512 or JointL3h12.