callbacks

Keras callback classes used by dcpg_train.py.

class deepcpg.callbacks.PerformanceLogger(metrics=['loss', 'acc'], log_freq=0.1, precision=4, callbacks=[], verbose=bool, logger=print)[source]

Logs performance metrics during training.

Stores and prints performance metrics for each batch, epoch, and output.

Parameters:

metrics: list

Names of the metrics to be logged.

log_freq: float

Logging frequency, expressed as a fraction of the training samples per epoch; for example, 0.1 logs roughly every 10% of samples.

precision: int

Floating point precision.

callbacks: list

List of functions with parameters epoch, epoch_logs, and val_epoch_logs that are called at the end of each epoch.

verbose: bool

If True, log performance metrics of individual outputs.

logger: function

Logging function.
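
A minimal usage sketch; the Keras model and the training arrays x and y are hypothetical and assumed to be defined and compiled elsewhere:

    from deepcpg.callbacks import PerformanceLogger

    # Log loss and accuracy roughly every 10% of the training samples
    # in each epoch (hypothetical `model`, `x`, and `y`).
    perf_logger = PerformanceLogger(metrics=['loss', 'acc'], log_freq=0.1)
    model.fit(x, y, callbacks=[perf_logger])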

class deepcpg.callbacks.TrainingStopper(max_time=None, stop_file=None, verbose=1, logger=print)[source]

Stop training after a given amount of time or when a stop file is detected.

Parameters:

max_time: int

Maximum training time in seconds.

stop_file: str

Name of the stop file; training ends as soon as this file exists.

verbose: bool

If True, log message when training is stopped.
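
A minimal sketch, again with a hypothetical compiled Keras model and training data; training stops after eight hours or as soon as the file ./STOP appears:

    from deepcpg.callbacks import TrainingStopper

    # Stop after 8 hours (max_time is in seconds) or when ./STOP exists.
    stopper = TrainingStopper(max_time=8 * 3600, stop_file='./STOP')
    model.fit(x, y, callbacks=[stopper])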

evaluation

Functions for evaluating prediction performance.

deepcpg.evaluation.acc(y, z, round=True)[source]

Compute accuracy.

deepcpg.evaluation.auc(y, z, round=True)[source]

Compute area under the ROC curve.

deepcpg.evaluation.cat_acc(y, z)[source]

Compute categorical accuracy given one-hot matrices.

deepcpg.evaluation.cor(y, z)[source]

Compute Pearson’s correlation coefficient.

deepcpg.evaluation.evaluate(y, z, mask=-1, metrics=[auc, acc, tpr, tnr, f1, mcc])[source]

Compute multiple performance metrics.

Computes evaluation metrics using functions in metrics.

Parameters:

y: numpy.ndarray

numpy.ndarray vector with labels.

z: numpy.ndarray

numpy.ndarray vector with predictions.

mask: scalar

Value to mask unobserved labels in y.

metrics: list

List of evaluation functions to be used.

Returns:

Ordered dict

Ordered dict with name of evaluation functions as keys and evaluation metrics as values.
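
A small sketch with hypothetical labels and predictions; the label -1 marks an unobserved value that is excluded via mask:

    import numpy as np
    from deepcpg.evaluation import evaluate

    y = np.array([1, 0, 1, -1, 0])            # labels; -1 is unobserved
    z = np.array([0.9, 0.2, 0.6, 0.5, 0.1])   # predicted probabilities

    perf = evaluate(y, z)                      # OrderedDict of metric -> value
    for name, value in perf.items():
        print(name, value)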

deepcpg.evaluation.evaluate_cat(y, z, metrics=[cat_acc], binary_metrics=None)[source]

Compute multiple performance metrics for categorical outputs.

Computes evaluation metrics for categorical (one-hot encoded) labels using the functions in metrics.

Parameters:

y: numpy.ndarray

numpy.ndarray matrix with one-hot encoded labels.

z: numpy.ndarray

numpy.ndarray matrix with class probabilities in rows.

metrics: list

List of evaluation functions to be used.

binary_metrics: list

List of binary evaluation metrics to be computed for each category (class) separately. Results are stored as name_i in the output dictionary, where name is the name of the evaluation metric and i is the index of the category.

Returns:

Ordered dict

Ordered dict with name of evaluation functions as keys and evaluation metrics as values.
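
A sketch with a hypothetical three-class example, using one-hot labels and per-row class probabilities:

    import numpy as np
    from deepcpg.evaluation import evaluate_cat

    y = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 1]])                  # one-hot labels
    z = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])            # class probabilities per row

    print(evaluate_cat(y, z))                  # e.g. OrderedDict([('cat_acc', 1.0)])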

deepcpg.evaluation.evaluate_curve(outputs, preds, fun=roc_curve, mask=-1, nb_point=None)[source]

Evaluate performance curves of multiple outputs.

Given the labels and predictions of multiple outputs, computes a performance curve, e.g. a ROC or PR curve, for each output.

Parameters:

outputs: dict

dict with the name of outputs as keys and a numpy.ndarray vector with labels as value.

preds: dict

dict with the name of outputs as keys and a numpy.ndarray vector with predictions as value.

fun: function

Function to compute the performance curves.

mask: scalar

Value to mask unobserved labels in outputs.

nb_point: int

Maximum number of points per curve, to reduce memory usage.

Returns:

pandas.DataFrame

pandas.DataFrame with columns output, x, y, thr.
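
A sketch with two hypothetical output names; by default a ROC curve is computed for each output:

    import numpy as np
    from deepcpg.evaluation import evaluate_curve

    outputs = {'cpg/cell1': np.array([1, 0, 1, 0]),
               'cpg/cell2': np.array([0, 1, -1, 1])}    # -1 is masked
    preds = {'cpg/cell1': np.array([0.8, 0.3, 0.6, 0.2]),
             'cpg/cell2': np.array([0.4, 0.7, 0.5, 0.9])}

    curves = evaluate_curve(outputs, preds)   # DataFrame: output, x, y, thr
    print(curves.head())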

deepcpg.evaluation.evaluate_outputs(outputs, preds)[source]

Evaluate performance metrics of multiple outputs.

Given the labels and predictions of multiple outputs, chooses and computes performance metrics of each output depending on its name.

Parameters:

outputs: dict

dict with the name of outputs as keys and a numpy.ndarray vector with labels as value.

preds: dict

dict with the name of outputs as keys and a numpy.ndarray vector with predictions as value.

Returns:

pandas.DataFrame

pandas.DataFrame with columns metric, output, value.
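
A sketch with a single hypothetical CpG output; the 'cpg/' name prefix is assumed here to select the binary classification metrics:

    import numpy as np
    from deepcpg.evaluation import evaluate_outputs

    outputs = {'cpg/cell1': np.array([1, 0, 1, 0, 1])}
    preds = {'cpg/cell1': np.array([0.9, 0.1, 0.7, 0.4, 0.6])}

    report = evaluate_outputs(outputs, preds)   # columns: metric, output, value
    print(report)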

deepcpg.evaluation.f1(y, z, round=True)[source]

Compute F1 score.

deepcpg.evaluation.get(name)[source]

Return object from module by its name.

deepcpg.evaluation.get_output_metrics(output_name)[source]

Return list of evaluation metrics for model output name.

deepcpg.evaluation.is_binary_output(output_name)[source]

Return True if output_name is binary.

deepcpg.evaluation.kendall(y, z, nb_sample=100000)[source]

Compute Kendall’s correlation coefficient.

deepcpg.evaluation.mad(y, z)[source]

Compute mean absolute deviation.

deepcpg.evaluation.mcc(y, z, round=True)[source]

Compute Matthews correlation coefficient.

deepcpg.evaluation.mse(y, z)[source]

Compute mean squared error.

deepcpg.evaluation.rmse(y, z)[source]

Compute root mean squared error.

deepcpg.evaluation.tnr(y, z, round=True)[source]

Compute true negative rate.

deepcpg.evaluation.tpr(y, z, round=True)[source]

Compute true positive rate.

deepcpg.evaluation.unstack_report(report)[source]

Unstack performance report.

Reshapes a pandas.DataFrame returned by evaluate_outputs() such that performance metrics are listed as columns.

Parameters:

report: pandas.DataFrame

Returns:

pandas.DataFrame

pandas.DataFrame with performance metrics as columns.
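
A sketch using a hypothetical long-format report with the columns produced by evaluate_outputs():

    import pandas as pd
    from deepcpg.evaluation import unstack_report

    # Hypothetical long-format report (metric, output, value).
    report = pd.DataFrame({'metric': ['auc', 'acc', 'auc', 'acc'],
                           'output': ['cpg/cell1', 'cpg/cell1',
                                      'cpg/cell2', 'cpg/cell2'],
                           'value': [0.91, 0.86, 0.88, 0.84]})

    print(unstack_report(report))   # one row per output, metrics as columns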

motifs

Motif analysis.

deepcpg.motifs.get_report(filter_stats_file, tomtom_file, meme_motifs)[source]

Read and join filter_stats_file and tomtom_file.

Used by dcpg_filter_motifs.py to read and join output files.

Returns:

pandas.DataFrame

pandas.DataFrame with columns from the Tomtom output and the filter statistics file.

deepcpg.motifs.read_meme_db(meme_db_file)[source]

Read MEME database as Pandas DataFrame.

Parameters:

meme_db_file: str

File name of MEME database.

Returns:

pandas.DataFrame

pandas.DataFrame with columns ‘id’, ‘protein’, ‘url’.
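
A minimal sketch; the database path is hypothetical:

    from deepcpg.motifs import read_meme_db

    # Path to a MEME motif database file (hypothetical).
    motifs = read_meme_db('./motif_databases/CIS-BP/Homo_sapiens.meme')
    print(motifs[['id', 'protein', 'url']].head())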

deepcpg.motifs.read_tomtom(path)[source]

Read Tomtom output file.

utils

General-purpose functions.

class deepcpg.utils.ProgressBar(nb_tot, logger=print, interval=0.1)[source]

Vertical progress bar.

Unlike the progressbar2 package, it logs progress as multiple lines instead of updating a single line, which enables printing to a file. Used, for example, in dcpg_eval.py.

Parameters:

nb_tot: int

Maximum value of the progress counter.

logger: function

Function that takes a str and prints it.

interval: float

Logging frequency as fraction of one. For example, 0.1 logs every tenth value.

See also

dcpg_eval.py
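
A sketch of the intended usage; the update() and close() methods assumed here are not documented in this excerpt and follow typical progress-bar APIs:

    from deepcpg.utils import ProgressBar

    nb_sample = 1000
    pbar = ProgressBar(nb_sample, interval=0.1)   # log every 10%
    for batch_size in [250] * 4:
        # ... process one batch of `batch_size` samples ...
        pbar.update(batch_size)                   # assumed method name
    pbar.close()                                  # assumed method name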

deepcpg.utils.filter_regex(values, regexs)[source]

Filter a list of values by a list of regular expressions.

Returns:

list

Sorted list of values in values that match any regex in regexs.
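
A sketch with hypothetical output names:

    from deepcpg.utils import filter_regex

    names = ['cpg/cell1', 'cpg/cell2', 'dna', 'stats/mean']
    print(filter_regex(names, ['cpg/.*']))   # ['cpg/cell1', 'cpg/cell2']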

deepcpg.utils.fold_dict(data, nb_level=100000)[source]

Fold dict data.

Turns dictionary keys, e.g. ‘level1/level2/level3’, into sub-dicts, e.g. data[‘level1’][‘level2’][‘level3’].

Parameters:

data: dict

dict to be folded.

nb_level: int

Maximum recursion depth.

Returns:

dict

Folded dict.
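
A sketch with a hypothetical flat dict whose keys use '/' as the level separator:

    from deepcpg.utils import fold_dict

    flat = {'cpg/cell1/cov': 5,
            'cpg/cell1/dist': 120,
            'dna': 'ACGT'}
    nested = fold_dict(flat)
    print(nested['cpg']['cell1']['cov'])   # 5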

deepcpg.utils.format_table(table, colwidth=None, precision=2, header=True, sep=' | ')[source]

Format a table of values as a string.

Formats a table represented as a dict whose keys are column headers and whose values are lists of column values.

Parameters:

table: dict or OrderedDict

dict or OrderedDict with keys as column headers and values as lists of values in each column.

precision: int or list of ints

Precision of floating point values in each column. If int, uses same precision for all columns, otherwise formats columns with different precisions.

header: bool

If True, print column names.

sep: str

Column separator.

Returns:

str

String of formatted table values.
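
A sketch with a hypothetical table of metric values; an OrderedDict preserves the column order:

    from collections import OrderedDict
    from deepcpg.utils import format_table

    table = OrderedDict()
    table['loss'] = [0.412, 0.383, 0.359]
    table['acc'] = [0.812, 0.845, 0.861]
    print(format_table(table, precision=3))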

deepcpg.utils.format_table_row(values, widths=None, sep=' | ')[source]

Format a row with values of a table.

deepcpg.utils.get_from_module(identifier, module_params, ignore_case=True)[source]

Return object from module.

Return the object named identifier from a module whose items are given by module_params.

Parameters:

identifier: str

Name of object, e.g. a function, in module.

module_params: dict

dict of items in module, e.g. globals()

ignore_case: bool

If True, ignore case of identifier.

Returns:

object

Object with name identifier in module, e.g. a function or class.
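
A sketch that looks up a function in deepcpg.evaluation by name:

    from deepcpg import evaluation
    from deepcpg.utils import get_from_module

    fun = get_from_module('auc', vars(evaluation))   # look up by name
    print(fun)                                       # the `auc` function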

deepcpg.utils.linear_weights(length, start=0.1)[source]

Create linear-triangle weights.

Create an array x of length length with linear weights, where the weight is highest (one) at the center x[length//2] and lowest (start) at the ends x[0] and x[-1].

Parameters:

length: int

Length of the weight array.

start: float

Minimum weight at the ends.

Returns:

np.ndarray

Array of length length with weights.
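
A short sketch; the printed values are illustrative:

    from deepcpg.utils import linear_weights

    w = linear_weights(5, start=0.1)
    print(w)   # e.g. [0.1, 0.55, 1.0, 0.55, 0.1]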

deepcpg.utils.make_dir(dirname)[source]

Create directory dirname if it does not exist.

Parameters:

dirname: str

Path of directory to be created.

Returns:

bool

True if the directory did not exist and was created.

deepcpg.utils.move_columns_front(frame, columns)[source]

Move columns of Pandas DataFrame to the front.

deepcpg.utils.slice_dict(data, idx)[source]

Slice elements in dict data by idx.

Slices array-like objects in data by index idx. data can be tree-like with sub-dicts, where the leaves must be sliceable by idx.

Parameters:

data: dict

dict to be sliced.

idx: slice

Slice index.

Returns:

dict

dict with the same elements as data, sliced by idx.
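
A sketch with a hypothetical nested dict of NumPy arrays:

    import numpy as np
    from deepcpg.utils import slice_dict

    data = {'inputs': {'dna': np.arange(10)},
            'outputs': {'cpg/cell1': np.arange(10) * 2}}

    first = slice_dict(data, slice(0, 3))
    print(first['inputs']['dna'])   # [0 1 2]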

deepcpg.utils.to_list(value)[source]

Convert value to a list.