callbacks
Keras callback classes used by dcpg_train.py.
-
class
deepcpg.callbacks.
PerformanceLogger
(metrics=['loss', 'acc'], log_freq=0.1, precision=4, callbacks=[], verbose=<class 'bool'>, logger=<built-in function print>)
Logs performance metrics during training.
Stores and prints performance metrics for each batch, epoch, and output.
Parameters: metrics: list
Name of metrics to be logged.
log_freq: float
Logging frequency as a fraction of training samples per epoch (the default 0.1 corresponds to 10%).
precision: int
Floating point precision.
callbacks: list
List of functions with parameters epoch, epoch_logs, and val_epoch_logs that are called at the end of each epoch.
verbose: bool
If True, log performance metrics of individual outputs.
logger: function
Logging function.
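The callbacks parameter can be illustrated with a small example. The following is only a sketch: the function name record_epoch and the history list are hypothetical, and the exact structure of epoch_logs and val_epoch_logs is an assumption (here they are treated as dict-like objects).

```python
# Hypothetical epoch-end callback. Only the three parameter names come from
# the documentation above; the structure of the log objects is an assumption.
history = []


def record_epoch(epoch, epoch_logs, val_epoch_logs):
    """Store a snapshot of the training and validation logs for one epoch."""
    history.append({
        'epoch': epoch,
        'train': dict(epoch_logs),
        'val': dict(val_epoch_logs) if val_epoch_logs else {},
    })


# Assumed usage (requires DeepCpG and Keras to be installed):
# from deepcpg.callbacks import PerformanceLogger
# perf_logger = PerformanceLogger(callbacks=[record_epoch])

# Stand-alone demonstration with made-up log values.
record_epoch(0, {'loss': [0.7]}, {'loss': [0.8]})
```

Keeping the callback free of side effects other than appending to a list makes it easy to inspect the collected logs after training.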
-
class
deepcpg.callbacks.
TrainingStopper
(max_time=None, stop_file=None, verbose=1, logger=<built-in function print>)
Stop training after a certain time or when a stop file is detected.
Parameters: max_time: int
Maximum training time in seconds.
stop_file: str
Name of the stop file that triggers the end of training when it exists.
verbose: bool
If True, log message when training is stopped.
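The two stopping conditions can be sketched as a plain function. This is a minimal illustration under stated assumptions, not the class's actual code: the helper should_stop and its now argument are hypothetical, and whether the real check uses a strict comparison is not documented here.

```python
import os
import time


def should_stop(start_time, max_time=None, stop_file=None, now=None):
    """Return True if the time budget is used up or the stop file exists."""
    now = time.time() if now is None else now
    # Condition 1: more than max_time seconds have elapsed (assumed strict).
    if max_time is not None and now - start_time > max_time:
        return True
    # Condition 2: a file with the given name exists on disk.
    if stop_file is not None and os.path.isfile(stop_file):
        return True
    return False
```

A check of this kind would typically run at the end of each batch or epoch, so training stops at the next checkpoint after the condition becomes true rather than immediately.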
evaluation
Functions for evaluating prediction performance.
-
deepcpg.evaluation.
evaluate
(y, z, mask=-1, metrics=[<function auc>, <function acc>, <function tpr>, <function tnr>, <function f1>, <function mcc>])
Compute multiple performance metrics.
Computes evaluation metrics using functions in metrics.
Parameters: y: numpy.ndarray
numpy.ndarray vector with labels.
z: numpy.ndarray
numpy.ndarray vector with predictions.
mask: scalar
Value to mask unobserved labels in y.
metrics: list
List of evaluation functions to be used.
Returns: Ordered dict
Ordered dict with name of evaluation functions as keys and evaluation metrics as values.
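The role of mask can be sketched with NumPy. The code below is an illustrative reimplementation under assumptions, not the library's source: evaluate_sketch and the simple acc metric (with a 0.5 threshold) are hypothetical stand-ins.

```python
from collections import OrderedDict

import numpy as np


def acc(y, z):
    """Accuracy after thresholding predictions at 0.5 (assumed threshold)."""
    return float(np.mean((z >= 0.5).astype(int) == y))


def evaluate_sketch(y, z, mask=-1, metrics=(acc,)):
    """Drop masked labels, then apply every metric function to (y, z)."""
    y = np.asarray(y).ravel()
    z = np.asarray(z).ravel()
    if mask is not None:
        observed = y != mask
        y, z = y[observed], z[observed]
    # Keys are the names of the metric functions, as described above.
    return OrderedDict((fun.__name__, fun(y, z)) for fun in metrics)


# Illustrative values: the third label is unobserved and is ignored.
scores = evaluate_sketch([1, 0, -1, 1], [0.9, 0.2, 0.5, 0.4])
```

Of the three observed labels, two are predicted correctly, so the accuracy in this example is 2/3.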
-
deepcpg.evaluation.
evaluate_cat
(y, z, metrics=[<function cat_acc>], binary_metrics=None)
Compute multiple performance metrics for categorical outputs.
Computes evaluation metrics for categorical (one-hot encoded labels) using functions in metrics.
Parameters: y: numpy.ndarray
numpy.ndarray matrix with one-hot encoded labels.
z: numpy.ndarray
numpy.ndarray matrix with class probabilities in rows.
metrics: list
List of evaluation functions to be used.
binary_metrics: list
List of binary evaluation metrics to be computed for each category, e.g. class, separately. Will be encoded as name_i in the output dictionary, where name is the name of the evaluation metrics and i the index of the category.
Returns: Ordered dict
Ordered dict with name of evaluation functions as keys and evaluation metrics as values.
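The name_i encoding of binary_metrics can be sketched as follows. This is a hedged illustration: evaluate_cat_sketch, cat_acc, and tpr are hypothetical stand-ins written for this example, and the 0.5 threshold in tpr is an assumption.

```python
from collections import OrderedDict

import numpy as np


def cat_acc(y, z):
    """Fraction of rows whose most probable class equals the labelled class."""
    return float(np.mean(y.argmax(axis=1) == z.argmax(axis=1)))


def tpr(y, z):
    """True positive rate of one category, thresholding at 0.5 (assumed)."""
    positives = y == 1
    return float(np.mean(z[positives] >= 0.5))


def evaluate_cat_sketch(y, z, metrics=(cat_acc,), binary_metrics=None):
    y = np.asarray(y)
    z = np.asarray(z)
    scores = OrderedDict((fun.__name__, fun(y, z)) for fun in metrics)
    # Binary metrics are computed per category (column) and stored as name_i.
    for fun in binary_metrics or []:
        for i in range(y.shape[1]):
            scores['%s_%d' % (fun.__name__, i)] = fun(y[:, i], z[:, i])
    return scores


y = [[1, 0], [0, 1], [1, 0]]
z = [[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
scores = evaluate_cat_sketch(y, z, binary_metrics=[tpr])
```

In this example the output dictionary has the keys cat_acc, tpr_0, and tpr_1, one binary entry per category.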
-
deepcpg.evaluation.
evaluate_curve
(outputs, preds, fun=<function roc_curve>, mask=-1, nb_point=None)
Evaluate performance curves of multiple outputs.
Given the labels and predictions of multiple outputs, computes a performance curve, e.g. ROC or PR curve, for each output.
Parameters: outputs: dict
dict with the name of outputs as keys and a numpy.ndarray vector with labels as value.
preds: dict
dict with the name of outputs as keys and a numpy.ndarray vector with predictions as value.
fun: function
Function to compute the performance curves.
mask: scalar
Value to mask unobserved labels in y.
nb_point: int
Maximum number of points per curve, to reduce memory usage.
Returns: pandas.DataFrame
with columns output, x, y, thr.
-
deepcpg.evaluation.
evaluate_outputs
(outputs, preds)
Evaluate performance metrics of multiple outputs.
Given the labels and predictions of multiple outputs, chooses and computes performance metrics of each output depending on its name.
Parameters: outputs: dict
dict with the name of outputs as keys and a numpy.ndarray vector with labels as value.
preds: dict
dict with the name of outputs as keys and a numpy.ndarray vector with predictions as value.
Returns: pandas.DataFrame
with columns metric, output, value.
-
deepcpg.evaluation.
get_output_metrics
(output_name)
Return list of evaluation metrics for model output name.
-
deepcpg.evaluation.
kendall
(y, z, nb_sample=100000)
Compute Kendall’s correlation coefficient.
-
deepcpg.evaluation.
unstack_report
(report)
Unstack performance report.
Reshapes the pandas.DataFrame returned by evaluate_outputs() such that performance metrics are listed as columns.
Parameters: report: pandas.DataFrame
Returns: pandas.DataFrame
with performance metrics as columns.
motifs
Motif analysis.
-
deepcpg.motifs.
get_report
(filter_stats_file, tomtom_file, meme_motifs)
Read and join filter_stats_file and tomtom_file.
Used by dcpg_filter_motifs.py to read and join output files.
Returns: pandas.DataFrame
with columns from Tomtom and statistic file.
-
deepcpg.motifs.
read_meme_db
(meme_db_file)
Read MEME database as Pandas DataFrame.
Parameters: meme_db_file: str
File name of MEME database.
Returns: pandas.DataFrame
with columns ‘id’, ‘protein’, ‘url’.
utils
General-purpose functions.
-
class
deepcpg.utils.
ProgressBar
(nb_tot, logger=<built-in function print>, interval=0.1)
Vertical progress bar.
Unlike the progressbar2 package, logs progress as multiple lines instead of a single line, which enables printing to a file. Used, for example, in
Parameters: nb_tot: int
Maximum value.
logger: function
Function that takes a str and prints it.
interval: float
Logging frequency as fraction of one. For example, 0.1 logs every tenth value.
See also
dcpg_eval.py
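The idea of a line-based progress log can be sketched as below. This is not the ProgressBar implementation: the class name LineProgress, its update method, and the message format are all hypothetical, and only the constructor parameters mirror the documented ones.

```python
class LineProgress(object):
    """Hypothetical line-based progress logger (one message per interval)."""

    def __init__(self, nb_tot, logger=print, interval=0.1):
        if nb_tot <= 0:
            raise ValueError('nb_tot must be positive')
        self.nb_tot = nb_tot
        self.logger = logger
        self.nb_cur = 0
        # Number of items between two log lines, at least one.
        self.step = max(1, int(round(nb_tot * interval)))
        self.next_log = min(self.step, nb_tot)

    def update(self, amount=1):
        """Advance the counter and log a new line when a threshold is hit."""
        self.nb_cur = min(self.nb_tot, self.nb_cur + amount)
        if self.nb_cur >= self.next_log:
            self.logger('%5.1f%% (%d/%d)' % (
                100.0 * self.nb_cur / self.nb_tot, self.nb_cur, self.nb_tot))
            while self.next_log <= self.nb_cur:
                self.next_log += self.step
            if self.nb_cur < self.nb_tot:
                # Make sure the final value is always logged.
                self.next_log = min(self.next_log, self.nb_tot)


lines = []
bar = LineProgress(20, logger=lines.append, interval=0.25)
for _ in range(20):
    bar.update()
```

Because every message is a separate line passed to logger, the output can be redirected to a file without the carriage-return tricks that single-line progress bars rely on.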
-
deepcpg.utils.
filter_regex
(values, regexs)
Filters list of values by list of regexs.
Returns: list
Sorted list of values in values that match any regex in regexs.
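The documented behaviour can be approximated with the standard re module. The sketch below is an assumption-laden illustration: filter_regex_sketch is a hypothetical name, and whether the library matches with re.search or re.match is not stated here.

```python
import re


def filter_regex_sketch(values, regexs):
    """Return the sorted values that match at least one regular expression."""
    if not isinstance(values, list):
        values = [values]
    if not isinstance(regexs, list):
        regexs = [regexs]
    matched = set()
    for value in values:
        # re.search is an assumption; it matches anywhere in the string.
        if any(re.search(regex, value) for regex in regexs):
            matched.add(value)
    return sorted(matched)


# Illustrative output names.
selected = filter_regex_sketch(['cpg/cell2', 'cpg/cell1', 'bulk/mean'],
                               [r'^cpg/'])
```

Collecting matches in a set before sorting means a value that matches several patterns is returned only once.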
-
deepcpg.utils.
fold_dict
(data, nb_level=100000)
Fold dict data.
Turns dictionary keys, e.g. 'level1/level2/level3', into sub-dicts, e.g. data['level1']['level2']['level3'].
Parameters: data: dict
dict to be folded.
nb_level: int
Maximum recursion depth.
Returns: dict
Folded dict.
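Folding can be sketched as a short recursive function. This is an illustrative reimplementation, assuming '/' is the only separator and that no key is both a leaf and a prefix of another key; fold_dict_sketch is a hypothetical name.

```python
def fold_dict_sketch(data, nb_level=100000):
    """Turn keys such as 'a/b/c' into nested dicts, up to nb_level levels."""
    if nb_level <= 0:
        return data
    folded = {}
    heads = set()
    for key, value in data.items():
        idx = key.find('/')
        if idx > 0:
            # Split off the first level and collect the rest in a sub-dict.
            head, tail = key[:idx], key[idx + 1:]
            folded.setdefault(head, {})[tail] = value
            heads.add(head)
        else:
            folded[key] = value
    for head in heads:
        folded[head] = fold_dict_sketch(folded[head], nb_level - 1)
    return folded


nested = fold_dict_sketch({'a/b/c': 1, 'a/d': 2, 'e': 3})
```

With nb_level=1 only the first path component is folded, so the remaining keys of each sub-dict keep their slashes.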
-
deepcpg.utils.
format_table
(table, colwidth=None, precision=2, header=True, sep=' | ')
Format a table of values as string.
Formats a table represented as a dict with keys as column headers and values as lists of values in each column.
Parameters: table: `dict` or `OrderedDict`
dict or OrderedDict with keys as column headers and values as lists of values in each column.
precision: int or list of ints
Precision of floating point values in each column. If int, uses same precision for all columns, otherwise formats columns with different precisions.
header: bool
If True, print column names.
sep: str
Column separator.
Returns: str
String of formatted table values.
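How such a table might be rendered can be sketched as follows. The layout choices here (right alignment, a dashed rule under the header) are assumptions made for the example, the undocumented colwidth parameter is left out, and format_table_sketch is a hypothetical name rather than the library function.

```python
from collections import OrderedDict


def format_table_sketch(table, precision=2, header=True, sep=' | '):
    """Format a dict of columns as right-aligned text rows."""
    columns = list(table.keys())
    if isinstance(precision, int):
        precision = [precision] * len(columns)
    # Render every cell as text; floats use the precision of their column.
    cells = OrderedDict()
    for name, prec in zip(columns, precision):
        cells[name] = ['{:.{p}f}'.format(v, p=prec) if isinstance(v, float)
                       else str(v) for v in table[name]]
    # Each column is as wide as its longest cell or its header.
    widths = [max([len(name)] + [len(c) for c in cells[name]])
              for name in columns]
    lines = []
    if header:
        lines.append(sep.join(n.rjust(w) for n, w in zip(columns, widths)))
        lines.append('-' * len(lines[0]))
    nb_row = len(cells[columns[0]]) if columns else 0
    for i in range(nb_row):
        lines.append(sep.join(cells[n][i].rjust(w)
                              for n, w in zip(columns, widths)))
    return '\n'.join(lines)


# Illustrative table with one text column and one float column.
table = OrderedDict([('output', ['a', 'b']), ('auc', [0.5, 0.75])])
text = format_table_sketch(table)
```

Passing a list such as precision=[0, 3] would format each column with its own number of decimals, mirroring the documented int-or-list behaviour.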
-
deepcpg.utils.
format_table_row
(values, widths=None, sep=' | ')
Format a row with values of a table.
-
deepcpg.utils.
get_from_module
(identifier, module_params, ignore_case=True)
Return object from module.
Return object with name identifier from module with items module_params.
Parameters: identifier: str
Name of object, e.g. a function, in module.
module_params: dict
dict of items in module, e.g. globals()
ignore_case: bool
If True, ignore case of identifier.
Returns: object
Object with name identifier in module, e.g. a function or class.
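A case-insensitive lookup of this kind can be sketched in a few lines. The function get_from_module_sketch is hypothetical, and raising ValueError for an unknown identifier is an assumption, since the error behaviour is not documented above.

```python
import math


def get_from_module_sketch(identifier, module_params, ignore_case=True):
    """Look up an object by name in a dict of module items."""
    if ignore_case:
        lookup = {key.lower(): value for key, value in module_params.items()}
        item = lookup.get(identifier.lower())
    else:
        item = module_params.get(identifier)
    if item is None:
        # Assumed error handling for unknown names.
        raise ValueError('Invalid identifier "%s"!' % identifier)
    return item


# vars(math) plays the role of globals() for the math module.
sqrt = get_from_module_sketch('SQRT', vars(math))
```

This pattern lets command-line options such as a metric or model name be resolved to the matching function or class without a hand-written mapping.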
-
deepcpg.utils.
linear_weights
(length, start=0.1)
Create linear-triangle weights.
Create array x of length length with linear weights, where the weight is highest (one) at the center x[length//2] and lowest (start) at the ends x[0] and x[-1].
Parameters: length: int
Length of the weight array.
start: float
Minimum weight.
Returns: np.ndarray
Array of length length with weights.
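The triangle shape can be sketched with two np.linspace ramps. This is one possible construction that satisfies the description, assuming length is at least 3; linear_weights_sketch is a hypothetical name and the library may build the array differently.

```python
import numpy as np


def linear_weights_sketch(length, start=0.1):
    """Triangle weights: `start` at both ends, one at index length // 2."""
    if length < 3:
        raise ValueError('this sketch assumes length >= 3')
    center = length // 2
    # Rising ramp covers x[0] .. x[center]; falling ramp covers the rest.
    rising = np.linspace(start, 1.0, center + 1)
    falling = np.linspace(1.0, start, length - center)[1:]
    return np.concatenate([rising, falling])


weights = linear_weights_sketch(5)
```

For length 5 and the default start, the weights rise from 0.1 through 0.55 to 1.0 at the center and then fall symmetrically back to 0.1.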
-
deepcpg.utils.
make_dir
(dirname)
Create directory dirname if it does not exist.
Parameters: dirname: str
Path of directory to be created.
Returns: bool
True, if directory did not exist and was created.
-
deepcpg.utils.
move_columns_front
(frame, columns)
Move columns of Pandas DataFrame to the front.
-
deepcpg.utils.
slice_dict
(data, idx)
Slice elements in dict data by idx.
Slices array-like objects in data by index idx. data can be tree-like with sub-dicts, where the leaves must be sliceable by idx.
Parameters: data: dict
dict to be sliced.
idx: slice
Slice index.
Returns: dict
dict with the same elements as data, each sliced by idx.
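Slicing a tree-like dict can be sketched recursively. The function slice_dict_sketch is a hypothetical stand-in, and plain lists are used as leaves here only to keep the example free of dependencies; any object sliceable by idx would work the same way.

```python
def slice_dict_sketch(data, idx):
    """Apply idx to every leaf of a (possibly nested) dict."""
    if isinstance(data, dict):
        return {key: slice_dict_sketch(value, idx)
                for key, value in data.items()}
    # Leaves must support indexing with idx, e.g. lists or arrays.
    return data[idx]


# Illustrative batch with nested inputs and flat outputs.
batch = {'inputs': {'dna': [1, 2, 3, 4]}, 'outputs': [5, 6, 7, 8]}
subset = slice_dict_sketch(batch, slice(1, 3))
```

The result has the same keys and nesting as the input, with each leaf reduced to the selected elements.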