models

Package for building and training DeepCpG models.

models.utils

Functions for building, training, and loading models.

class deepcpg.models.utils.DataReader(output_names=None, use_dna=True, dna_wlen=None, replicate_names=None, cpg_wlen=None, cpg_max_dist=25000, encode_replicates=False)[source]

Read data from dcpg_data.py output files.

Generator to read data batches from dcpg_data.py output files. Reads data using hdf.reader() and pre-processes data.

Parameters:
output_names: list

Names of outputs to be read.

use_dna: bool

If True, read DNA sequence windows.

dna_wlen: int

Maximum length of DNA sequence windows.

replicate_names: list

Names of cells (profiles) whose neighboring CpG sites are read.

cpg_wlen: int

Maximum number of neighboring CpG sites.

cpg_max_dist: int

Value to threshold the distance of neighboring CpG sites.

encode_replicates: bool

If True, encode replicate names in the keys of the returned dict. This option is deprecated and will be removed in the future.

Returns:
tuple

tuple (inputs, outputs, weights), where inputs, outputs, and weights are each a dict of model inputs, outputs, and output weights, respectively. outputs and weights are not returned if output_names is undefined.

class deepcpg.models.utils.Model(dropout=0.0, l1_decay=0.0, l2_decay=0.0, init='glorot_uniform')[source]

Abstract model class.

Abstract class of DNA, CpG, and Joint models.

Parameters:
dropout: float

Dropout rate.

l1_decay: float

L1 weight decay.

l2_decay: float

L2 weight decay.

init: str

Name of Keras initialization.

inputs(*args, **kwargs)[source]

Return list of Keras model inputs.

class deepcpg.models.utils.ScaledSigmoid(scaling=1.0, **kwargs)[source]

Scaled sigmoid activation function.

Scales the maximum of the sigmoid function from one to the provided value.

Parameters:
scaling: float

Maximum of sigmoid function.
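
As a numeric illustration of the scaling described above, here is a minimal sketch in plain Python rather than Keras. It assumes that the layer multiplies the standard sigmoid by scaling; the function name scaled_sigmoid is hypothetical and not part of DeepCpG.

```python
import math


def scaled_sigmoid(x, scaling=1.0):
    """Sigmoid whose upper bound is `scaling` instead of one (assumed form)."""
    # Numerically stable sigmoid: avoids exp() overflow for large |x|.
    if x >= 0:
        sig = 1.0 / (1.0 + math.exp(-x))
    else:
        ex = math.exp(x)
        sig = ex / (1.0 + ex)
    return scaling * sig


print(scaled_sigmoid(0.0, scaling=2.0))   # midpoint is scaling / 2
print(scaled_sigmoid(50.0, scaling=2.0))  # approaches the maximum, scaling
```

With scaling=1.0 the function reduces to the ordinary sigmoid.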

call(x, mask=None)[source]

This is where the layer’s logic lives.

Parameters:
inputs

Input tensor, or list/tuple of input tensors.

**kwargs

Additional keyword arguments.

Returns:
A tensor or list/tuple of tensors.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Returns:
Python dictionary.
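
The round trip from a layer to its config and back can be sketched without Keras. The class below is a hypothetical stand-in, not the DeepCpG or Keras implementation, and the from_config helper is only assumed to mirror the usual Keras pattern.

```python
import json


class ScaledSigmoidSketch:
    """Hypothetical stand-in for a layer; not the DeepCpG or Keras class."""

    def __init__(self, scaling=1.0):
        self.scaling = scaling

    def get_config(self):
        # Only constructor arguments go into the config: no weights,
        # no connectivity information, and no class name.
        return {'scaling': self.scaling}

    @classmethod
    def from_config(cls, config):
        # Rebuild an equivalent, untrained layer from the dictionary.
        return cls(**config)


layer = ScaledSigmoidSketch(scaling=2.0)
# The config is a plain dict, so it survives JSON serialization.
config = json.loads(json.dumps(layer.get_config()))
clone = ScaledSigmoidSketch.from_config(config)
print(clone.scaling)
```
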
deepcpg.models.utils.add_output_layers(stem, output_names, init='glorot_uniform')[source]

Add output layers to a given layer and return them.

Adds output layer for each output in output_names to layer stem.

Parameters:
stem: Keras layer

Keras layer to which output layers are added.

output_names: list

List of output names.

Returns:
list

Output layers added to stem.

deepcpg.models.utils.copy_weights(src_model, dst_model, must_exist=True)[source]

Copy weights from src_model to dst_model.

Parameters:
src_model

Keras source model.

dst_model

Keras destination model.

must_exist: bool

If True, raises ValueError if a layer in dst_model does not exist in src_model.

Returns:
list

Names of layers that were copied.
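
The semantics of must_exist can be illustrated with plain dictionaries. This is a simplified sketch that assumes layers are matched by name and represents each model as a hypothetical {layer_name: weights} dict; the real function operates on Keras models.

```python
def copy_weights_sketch(src_weights, dst_weights, must_exist=True):
    """Copy weights between two {layer_name: weights} dicts (hypothetical)."""
    copied = []
    for name in dst_weights:
        if name not in src_weights:
            if must_exist:
                raise ValueError('Layer "%s" not found in source!' % name)
            # Leave the destination layer untouched and move on.
            continue
        dst_weights[name] = list(src_weights[name])
        copied.append(name)
    return copied


src = {'conv1': [0.1, 0.2], 'fc1': [0.3]}
dst = {'conv1': [0.0, 0.0], 'out': [0.0]}
print(copy_weights_sketch(src, dst, must_exist=False))  # ['conv1']
```

With must_exist=True, the same call raises ValueError because the layer out has no counterpart in the source.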

deepcpg.models.utils.data_reader_from_model(model, outputs=True, replicate_names=None)[source]

Return DataReader from model.

Builds a DataReader for reading data for model.

Parameters:
model: :class:`Model`.

Model.

outputs: bool

If True, return output labels.

replicate_names: list

Names of input cells of model.

Returns:
:class:`DataReader`

Instance of DataReader.

deepcpg.models.utils.decode_replicate_names(replicate_names)[source]

Decode string of replicate names and return names as list.

Note

Deprecated: This function is used to support legacy models and will be removed in the future.

deepcpg.models.utils.encode_replicate_names(replicate_names)[source]

Encode list of replicate names as single string.

Note

Deprecated: This function is used to support legacy models and will be removed in the future.
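
The two functions are inverses of each other. A minimal sketch of such an encode/decode pair follows; the separator and the function names are assumptions made for illustration, and the separator actually used by DeepCpG may differ.

```python
SEP = '--'  # hypothetical separator; must not occur inside a replicate name


def encode_names(replicate_names):
    """Join a list of replicate names into one string."""
    return SEP.join(replicate_names)


def decode_names(encoded):
    """Split an encoded string back into the list of replicate names."""
    return encoded.split(SEP)


names = ['cell_01', 'cell_02', 'cell_03']
encoded = encode_names(names)
print(encoded)
print(decode_names(encoded) == names)
```
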

deepcpg.models.utils.evaluate_generator(model, generator, return_data=False, *args, **kwargs)[source]

Evaluate model on generator.

Uses predict_generator to obtain predictions and ev.evaluate to evaluate predictions.

Parameters:
model

Model to be evaluated.

generator

Data generator.

return_data: bool

If True, return predictions and labels in addition to the performance metrics.

*args: list

Unnamed arguments passed to predict_generator.

**kwargs: dict

Named arguments passed to predict_generator.

Returns:
If `return_data=False`, pandas data frame with performance metrics. If
`return_data=True`, tuple (`perf`, `data`) with performance metrics `perf`
and the predictions and labels `data`.

deepcpg.models.utils.get_first_conv_layer(layers, get_act=False)[source]

Return the first convolutional layer in a stack of layers.

Parameters:
layers: list

List of Keras layers.

get_act: bool

If True, also return the activation layer after the convolutional layer.

Returns:
Keras layer

Convolutional layer or tuple of convolutional layer and activation layer if get_act=True.

deepcpg.models.utils.get_objectives(output_names)[source]

Return training objectives for a list of output names.

Returns:
dict

dict with output_names as keys and the name of the assigned Keras objective as values.

deepcpg.models.utils.get_sample_weights(y, class_weights=None)[source]

Compute sample weights for model training.

Computes sample weights given a vector of output labels y. Sets weights of samples without label (CPG_NAN) to zero.

Parameters:
y: :class:`numpy.ndarray`

1d numpy array of output labels.

class_weights: dict

Weight of output classes, e.g. methylation states.

Returns:
:class:`numpy.ndarray`

Sample weights of size y.
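
The weighting rule can be sketched with plain lists. The value of CPG_NAN below is an assumption made for illustration (check the DeepCpG data module for the actual constant), and the function name is hypothetical.

```python
CPG_NAN = -1  # assumed marker for a missing label


def sample_weights_sketch(y, class_weights=None):
    """Return one weight per label; unlabeled samples get weight zero."""
    weights = []
    for label in y:
        if label == CPG_NAN:
            # Samples without a label must not contribute to the loss.
            weights.append(0.0)
        elif class_weights is not None:
            weights.append(float(class_weights[label]))
        else:
            weights.append(1.0)
    return weights


labels = [0, 1, CPG_NAN, 1]
print(sample_weights_sketch(labels))
print(sample_weights_sketch(labels, class_weights={0: 0.5, 1: 2.0}))
```
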

deepcpg.models.utils.is_input_layer(layer)[source]

Test if layer is an input layer.

deepcpg.models.utils.is_output_layer(layer, model)[source]

Test if layer is an output layer.

deepcpg.models.utils.load_model(model_files, custom_objects={'ScaledSigmoid': <class 'deepcpg.models.utils.ScaledSigmoid'>}, log=None)[source]

Load Keras model from a list of model files.

Loads a Keras model from a list of filenames, e.g. from search_model_files. model_files can be a single HDF5 file, or a JSON file and a weights file.

Parameters:
model_files: list

Input model file names.

custom_objects: dict

Custom objects for loading models that were trained with custom objects, e.g. ScaledSigmoid.

Returns:
Keras model.
deepcpg.models.utils.predict_generator(model, generator, nb_sample=None)[source]

Predict model outputs using generator.

Calls model.predict for at most nb_sample samples from generator.

Parameters:
model: Keras model

Model to be evaluated.

generator: generator

Data generator.

nb_sample: int

Maximum number of samples.

Returns:
list

list [inputs, outputs, predictions].

deepcpg.models.utils.read_from(reader, nb_sample=None)[source]

Read nb_sample samples from reader.
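
Reading a bounded number of samples from a batch generator, as read_from and predict_generator do, can be sketched as follows. The sketch assumes each batch is a plain list of samples, whereas the real readers yield dicts of arrays; the function name is hypothetical.

```python
def read_samples(batches, nb_sample=None):
    """Collect samples from an iterable of batches, at most nb_sample."""
    samples = []
    for batch in batches:
        samples.extend(batch)
        if nb_sample is not None and len(samples) >= nb_sample:
            # Stop early and trim the surplus from the last batch.
            return samples[:nb_sample]
    return samples


batches = iter([[1, 2, 3], [4, 5, 6]])
print(read_samples(batches, nb_sample=4))  # [1, 2, 3, 4]
```

Because the loop returns as soon as enough samples are collected, it also terminates on generators that never stop yielding.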

deepcpg.models.utils.save_model(model, model_file, weights_file=None)[source]

Save Keras model to file.

If model_file ends with ‘.h5’, saves model description and model weights in HDF5 file. Otherwise, saves JSON model description in model_file and model weights in weights_file if provided.

Parameters:
model

Keras model.

model_file: str

Output file.

weights_file: str

Weights file.
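
The dispatch on the file extension might look like the sketch below. It is not the DeepCpG source: it only assumes that the model object exposes the standard Keras methods save, to_json, and save_weights, and it uses a hypothetical stub in place of a real model so that the example runs without Keras.

```python
import os
import tempfile


def save_model_sketch(model, model_file, weights_file=None):
    """Save to one HDF5 file, or to a JSON file plus optional weights file."""
    if model_file.endswith('.h5'):
        # Architecture and weights together in a single HDF5 file.
        model.save(model_file)
    else:
        with open(model_file, 'w') as f:
            f.write(model.to_json())
        if weights_file is not None:
            model.save_weights(weights_file)


class StubModel:
    """Stand-in that records calls instead of writing HDF5 files."""

    def __init__(self):
        self.calls = []

    def save(self, path):
        self.calls.append(('save', path))

    def to_json(self):
        return '{}'

    def save_weights(self, path):
        self.calls.append(('save_weights', path))


stub = StubModel()
save_model_sketch(stub, 'model.h5')
json_file = os.path.join(tempfile.mkdtemp(), 'model.json')
save_model_sketch(stub, json_file, 'model_weights.h5')
print(stub.calls)
```
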

deepcpg.models.utils.search_model_files(dirname)[source]

Search for model files in the given directory.

Parameters:
dirname: str

Directory name.

Returns:
Model JSON file and weights file if they exist, otherwise the HDF5 file. None if no
model files are found.
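
The lookup order described above can be sketched with the standard library. The file names used here (model.json, model_weights.h5, model.h5) are assumptions made for illustration and are not confirmed by this documentation.

```python
import os
import tempfile


def search_model_files_sketch(dirname, json_name='model.json',
                              weights_name='model_weights.h5',
                              h5_name='model.h5'):
    """Prefer JSON + weights, fall back to a single HDF5 file, else None."""
    json_file = os.path.join(dirname, json_name)
    weights_file = os.path.join(dirname, weights_name)
    if os.path.isfile(json_file) and os.path.isfile(weights_file):
        return [json_file, weights_file]
    h5_file = os.path.join(dirname, h5_name)
    if os.path.isfile(h5_file):
        return [h5_file]
    return None


model_dir = tempfile.mkdtemp()
print(search_model_files_sketch(model_dir))  # None: directory is empty
open(os.path.join(model_dir, 'model.h5'), 'w').close()
print(search_model_files_sketch(model_dir))  # list with the HDF5 file only
```
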

models.cpg

CpG models.

Provides models trained with observed neighboring methylation states of multiple cells.

class deepcpg.models.cpg.CpgModel(*args, **kwargs)[source]

Abstract class of a CpG model.

inputs(cpg_wlen, replicate_names)[source]

Return list of Keras model inputs.

class deepcpg.models.cpg.FcAvg(*args, **kwargs)[source]

Fully-connected layer followed by global average layer.

Parameters: 54,000
Specification: fc[512]_gap
class deepcpg.models.cpg.RnnL1(act_replicate='relu', *args, **kwargs)[source]

Bidirectional GRU with one layer.

Parameters: 810,000
Specification: fc[256]_bgru[256]_do
class deepcpg.models.cpg.RnnL2(act_replicate='relu', *args, **kwargs)[source]

Bidirectional GRU with two layers.

Parameters: 1,100,000
Specification: fc[256]_bgru[128]_bgru[256]_do
deepcpg.models.cpg.get(name)[source]

Return object from module by its name.

deepcpg.models.cpg.list_models()[source]

Return the names of models in the module.

models.dna

DNA models.

Provides models trained with DNA sequence windows.

class deepcpg.models.dna.CnnL1h128(nb_hidden=128, *args, **kwargs)[source]

CNN with one convolutional and one fully-connected layer with 128 units.

Parameters: 4,100,000
Specification: conv[128@11]_mp[4]_fc[128]_do
class deepcpg.models.dna.CnnL1h256(*args, **kwargs)[source]

CNN with one convolutional and one fully-connected layer with 256 units.

Parameters: 8,100,000
Specification: conv[128@11]_mp[4]_fc[256]_do
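
The parameter counts listed for CnnL1h128 and CnnL1h256 can be reproduced approximately with a little arithmetic. The sketch below is a back-of-the-envelope check, not DeepCpG code. It reads conv[128@11] as 128 filters of length 11 and assumes a one-hot DNA input with 4 channels, a window length of 1001 (a hypothetical dna_wlen), unpadded convolutions, and non-overlapping max-pooling; the output layers are ignored.

```python
def cnn_l1_param_count(nb_hidden, dna_wlen=1001, nb_filter=128,
                       filter_len=11, pool_len=4, nb_channel=4):
    """Approximate weight count of conv[128@11]_mp[4]_fc[nb_hidden]."""
    # Convolution: one kernel of filter_len x nb_channel per filter, plus bias.
    conv = filter_len * nb_channel * nb_filter + nb_filter
    # Sequence length after an unpadded convolution and max-pooling.
    conv_out_len = dna_wlen - filter_len + 1
    pool_out_len = conv_out_len // pool_len
    # Fully-connected layer on the flattened feature map, plus bias.
    fc = pool_out_len * nb_filter * nb_hidden + nb_hidden
    return conv + fc


print(cnn_l1_param_count(128))  # 4052736, about 4,100,000
print(cnn_l1_param_count(256))  # 8099712, about 8,100,000
```

Under these assumptions almost all weights sit in the fully-connected layer, which is why doubling the number of hidden units roughly doubles the total.
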
class deepcpg.models.dna.CnnL2h128(nb_hidden=128, *args, **kwargs)[source]

CNN with two convolutional and one fully-connected layer with 128 units.

Parameters: 4,100,000
Specification: conv[128@11]_mp[4]_conv[256@3]_mp[2]_fc[128]_do
class deepcpg.models.dna.CnnL2h256(*args, **kwargs)[source]

CNN with two convolutional and one fully-connected layer with 256 units.

Parameters: 8,100,000
Specification: conv[128@11]_mp[4]_conv[256@3]_mp[2]_fc[256]_do
class deepcpg.models.dna.CnnL3h128(nb_hidden=128, *args, **kwargs)[source]

CNN with three convolutional and one fully-connected layer with 128 units.

Parameters: 4,400,000
Specification: conv[128@11]_mp[4]_conv[256@3]_mp[2]_conv[512@3]_mp[2]_fc[128]_do
class deepcpg.models.dna.CnnL3h256(*args, **kwargs)[source]

CNN with three convolutional and one fully-connected layer with 256 units.

Parameters: 8,300,000
Specification: conv[128@11]_mp[4]_conv[256@3]_mp[2]_conv[512@3]_mp[2]_fc[256]_do
class deepcpg.models.dna.CnnRnn01(*args, **kwargs)[source]

Convolutional-recurrent model.

Convolutional-recurrent model with two convolutional layers followed by a bidirectional GRU layer.

Parameters: 1,100,000
Specification: conv[128@11]_pool[4]_conv[256@7]_pool[4]_bgru[256]_do
class deepcpg.models.dna.DnaModel(*args, **kwargs)[source]

Abstract class of a DNA model.

inputs(dna_wlen)[source]

Return list of Keras model inputs.

class deepcpg.models.dna.ResAtrous01(*args, **kwargs)[source]

Residual network with Atrous (dilated) convolutional layers.

Residual network with Atrous (dilated) convolutional layers in bottleneck units. Atrous convolutional layers enlarge the receptive field and hence make it easier to model long-range dependencies.

Parameters: 2,000,000
Specification: conv[128@11]_mp[2]_resa[3x128|3x256|3x512|1x1024]_gap_do

He et al., ‘Identity Mappings in Deep Residual Networks.’ Yu and Koltun, ‘Multi-Scale Context Aggregation by Dilated Convolutions.’

class deepcpg.models.dna.ResConv01(*args, **kwargs)[source]

Residual network with two convolutional layers in each residual unit.

Parameters: 2,800,000
Specification: conv[128@11]_mp[2]_resc[2x128|1x256|1x256|1x512]_gap_do

He et al., ‘Identity Mappings in Deep Residual Networks.’

class deepcpg.models.dna.ResNet01(*args, **kwargs)[source]

Residual network with bottleneck residual units.

Parameters: 1,700,000
Specification: conv[128@11]_mp[2]_resb[2x128|2x256|2x512|1x1024]_gap_do

He et al., ‘Identity Mappings in Deep Residual Networks.’

class deepcpg.models.dna.ResNet02(*args, **kwargs)[source]

Residual network with bottleneck residual units.

Parameters: 2,000,000
Specification: conv[128@11]_mp[2]_resb[3x128|3x256|3x512|1x1024]_gap_do

He et al., ‘Identity Mappings in Deep Residual Networks.’

deepcpg.models.dna.get(name)[source]

Return object from module by its name.

deepcpg.models.dna.list_models()[source]

Return the names of models in the module.