Metadata-Version: 2.1
Name: deepac
Version: 0.9.2
Summary: Predicting pathogenic potentials of novel DNA with reverse-complement neural networks.
Home-page: https://gitlab.com/rki_bioinformatics/DeePaC
Author: Jakub Bartoszewicz
Author-email: bartoszewiczj@rki.de
License: MIT
Keywords: deep learning DNA sequencing synthetic biology pathogenicity prediction
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3
Description-Content-Type: text/markdown
Requires-Dist: keras (>=2.2.4)
Requires-Dist: tensorflow
Requires-Dist: biopython
Requires-Dist: scikit-learn
Requires-Dist: matplotlib
Requires-Dist: numpy
Requires-Dist: h5py
Requires-Dist: psutil

<!-- {#mainpage} -->

# DeePaC

DeePaC is a Python package for predicting labels (e.g. pathogenic potentials) from short DNA sequences (e.g. Illumina
reads) with reverse-complement neural networks. For details, see our preprint on bioRxiv:
<https://www.biorxiv.org/content/10.1101/535286v2>.

Documentation can be found here:
<https://rki_bioinformatics.gitlab.io/DeePaC/>.
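
For readers unfamiliar with the term, the sketch below shows what the reverse complement of a read is. A sequencer may report either strand of a DNA fragment, so a read and its reverse complement describe the same molecule. The helper name `reverse_complement` is made up for this illustration and is not part of the DeePaC API.

```python
# Illustrative only: a plain-Python helper, not code taken from DeePaC.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


read = "AACG"
print(read, "<->", reverse_complement(read))
```

Applying the function twice returns the original read, which is a quick sanity check for any implementation.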


## Installation

### With conda
 [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/deepac/README.html)

You can install DeePaC with `conda` and use it as a Python package or as a CLI tool. Set up the
[bioconda channel](https://bioconda.github.io/index.html#set-up-channels) first, and remember to activate your conda
environment before using DeePaC:
```
conda create -c bioconda -n my_env
conda activate my_env
conda install deepac
```

### With pip

You can install DeePaC with `pip` and use it as a Python package or as a CLI tool.
Remember to activate your virtual environment before using DeePaC:
```
virtualenv --system-site-packages my_env
source my_env/bin/activate
pip install deepac
```

### GPU support

To use GPUs, you need to install the GPU version of TensorFlow. With conda, this takes a single command:
```
conda install tensorflow-gpu deepac
```

If you're using `pip`, you need to install CUDA and cuDNN first (see the TensorFlow installation guide for details).
Then replace the CPU version of TensorFlow with the GPU version:
```
pip uninstall tensorflow
pip install tensorflow-gpu
```


### Help

To see help, use:
```
deepac --help
deepac predict --help
deepac train --help
# Etc.
```

## Prediction

You can predict pathogenic potentials with one of the built-in models out of the box:
```
# A rapid CNN (trained on IMG/M data)
deepac predict -r input.fasta
# A sensitive LSTM (trained on IMG/M data)
deepac predict -s input.fasta
# With GPU support
deepac predict -s -g 1 input.fasta
```

The rapid and the sensitive models are trained to predict pathogenic potentials of novel bacterial species.
For details, see <https://www.biorxiv.org/content/10.1101/535286v2>.

To quickly filter your data according to predicted pathogenic potentials, you can use:
```
deepac predict -r input.fasta
deepac filter input.fasta input_predictions.npy -t 0.5
```
After running `predict` once, you can reuse `input_predictions.npy` to filter your fasta file with different
thresholds. You can also add pathogenic potentials to the fasta headers in the output files:
```
deepac filter input.fasta input_predictions.npy -t 0.75 -p -o output-75.fasta
deepac filter input.fasta input_predictions.npy -t 0.9 -p -o output-90.fasta
```
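
The thresholding step can be pictured with the minimal Python sketch below. It rests on assumptions that are not confirmed here: that `input_predictions.npy` holds one score per read, in the same order as the reads in the fasta file, and that reads scoring above the threshold are the ones kept. The function `filter_reads` and the toy data are hypothetical and only illustrate the idea. With real data, the scores could be loaded with `numpy.load("input_predictions.npy")`.

```python
def filter_reads(records, scores, threshold=0.5):
    """Keep the (header, sequence) records whose score exceeds the threshold.

    Assumption: scores[i] belongs to records[i]. Whether DeePaC itself
    uses `>` or `>=` at the threshold is not specified here.
    """
    if len(records) != len(scores):
        raise ValueError("need exactly one score per read")
    return [rec for rec, score in zip(records, scores) if score > threshold]


# Toy data standing in for a fasta file and its predictions.
records = [("read1", "ACGT"), ("read2", "TTGA"), ("read3", "GGCC")]
scores = [0.92, 0.40, 0.78]

for t in (0.5, 0.75, 0.9):
    kept = filter_reads(records, scores, t)
    print(t, [header for header, _ in kept])
```

Because the scores are computed only once, trying several thresholds is cheap compared to rerunning the prediction.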

## Preprocessing

For more complex analyses, it can be useful to preprocess the fasta files by converting them to binary numpy arrays. Use:
```
deepac preproc preproc_config.ini
```
See the `config_templates` directory of the [GitLab repository](https://gitlab.com/rki_bioinformatics/DeePaC/) for a sample configuration file.
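
To give an idea of what a binary representation of a read can look like, here is a small one-hot encoding sketch. It is an assumption-laden illustration, not the encoding `deepac preproc` necessarily uses: the channel order `ACGT`, the all-zero row for `N` and for padding, and the helper name `one_hot` are all made up for this example.

```python
ALPHABET = "ACGT"  # assumed channel order, not taken from DeePaC


def one_hot(seq, length):
    """Encode a read as a `length` x 4 binary matrix (list of rows).

    Reads longer than `length` are truncated; shorter reads are padded
    with all-zero rows. Unknown bases such as N also become all-zero rows.
    """
    rows = [
        [1 if base == letter else 0 for letter in ALPHABET]
        for base in seq.upper()[:length]
    ]
    # Pad with fresh zero rows up to the requested length.
    rows.extend([0] * len(ALPHABET) for _ in range(length - len(rows)))
    return rows


for row in one_hot("ACN", 4):
    print(row)
```

A list of such matrices could be turned into an array with `numpy.array(...)` and written to disk with `numpy.save`.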

## Training
You can use the built-in architectures to train a new model:
```
# Rapid CNN
deepac train -r -g 1 -T train_data.npy -t train_labels.npy -V val_data.npy -v val_labels.npy
# Sensitive LSTM
deepac train -s -g 1 -T train_data.npy -t train_labels.npy -V val_data.npy -v val_labels.npy
```

To train a new model based on your custom configuration, use:
```
deepac train -c nn_train_config.ini
```

If you train an LSTM on a GPU, the CuDNNLSTM implementation will be used. To make the resulting model
CPU-compatible, use `deepac convert`. You can also use it to save the weights of a model, or to recompile a model
from a set of weights so that it can be used with a different Python binary.

## Evaluation

To evaluate a trained model, use:
```
# Read-by-read performance
deepac eval -r eval_config.ini
# Species-by-species performance
deepac eval -s eval_species_config.ini
# Ensemble performance
deepac eval -e eval_ens_config.ini
```
See the configs directory for sample configuration files. Note that `deepac eval -s` requires precomputed predictions
and a CSV file with the number of DNA reads for each species in each of the classes.
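
One way to see why per-species read counts are needed: they allow read-level predictions to be grouped by species. The sketch below is a guess at the general idea, not a description of how `deepac eval -s` works. It assumes a hypothetical two-column layout `species,n_reads`, a single class, and predictions stored in the same order as the species rows. The actual CSV format is defined by DeePaC and may differ.

```python
import csv
import io


def species_means(predictions, counts_csv):
    """Average read-level predictions per species.

    counts_csv: CSV text with rows `species,n_reads` (hypothetical layout),
    listed in the same order as the blocks of reads in `predictions`.
    """
    means = {}
    start = 0
    for species, n_reads in csv.reader(io.StringIO(counts_csv)):
        n = int(n_reads)
        block = predictions[start:start + n]
        if len(block) != n:
            raise ValueError("fewer predictions than reads for " + species)
        if n > 0:
            means[species] = sum(block) / n
        start += n
    return means


# Toy data: two reads from species_a, three from species_b.
counts = "species_a,2\nspecies_b,3\n"
preds = [0.9, 0.7, 0.2, 0.1, 0.3]
print(species_means(preds, counts))
```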



## Dependencies
DeePaC requires TensorFlow, Keras, Biopython, scikit-learn and matplotlib. Python 3.4+ is supported.


