1Department of Pathology, Norwegian Radium Hospital, PO Box 4953 Nydalen, 0424 Oslo, Norway
2Department of Pathology, University Health Network, Toronto, Ontario, Canada
BayesPI-BAR2 is a package designed to predict how non-coding somatic mutations in cancer samples affect protein-DNA binding at the mutated site. Changes in the binding of transcription factors to mutated regulatory sequences can disrupt gene regulation, which may promote tumorigenesis. BayesPI-BAR2 takes into account the possibility that several nearby mutations affect the binding of the same protein. The predicted effects are tested for significance in the given patient cohort, and only those that appear in patient samples more frequently than expected by chance are reported.
BayesPI-BAR2 is written in Python 2. It includes our BayesPI2 software in binary form, which is available for Linux and OS X operating systems. Here is the full list of dependencies:
You can use the pip install scipy matplotlib command to install the Python libraries. Both bedtools and samtools are available in many Linux package repositories.
The BayesPI-BAR2 package can be downloaded here.
To test the basic functionality, go to the demo/melanoma_small folder and run the command python melanoma_small_pipeline.py. After downloading the reference human genome, the test pipeline should complete without errors in a few minutes and produce the result file data/skin_cancer_small/out/foreground/block_0_5_1295228_1295253/result.tsv, with several ETS factors mentioned in it.
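One quick way to look at the test output is to load the result file and print its rows. The following is a minimal sketch, not part of the package: read_tsv is a hypothetical helper, and the code assumes only that result.tsv is tab-separated with a header row and that the path is relative to the package root. The column names are not documented here, so the sketch prints whatever it finds.

```python
import csv
import io
import os

# Path quoted in the text above; assumed to be relative to the package root.
RESULT_PATH = (
    "data/skin_cancer_small/out/foreground/"
    "block_0_5_1295228_1295253/result.tsv"
)


def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with io.open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Print the rows only if the test pipeline has already produced the file.
if os.path.exists(RESULT_PATH):
    for row in read_tsv(RESULT_PATH):
        print(row)
```

If the file is present, each printed row can be checked by eye for ETS family transcription factor names.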
The package has four subfolders:
bin: the binaries of BayesPI2
demo: the two example pipelines, melanoma_small for a quick test and melanoma_full for a complete application
data: the folder from which the demos take their input data and where they put their outputs
python: the folder with the package Python source code
The main package is a set of command line tools residing in the python folder. Run the python <tool_name.py> --help command to see the full usage information for a particular tool. The detailed description of every tool is here.
The package includes an example analysis pipeline that reproduces the known finding that mutations in the TERT gene promoter create binding sites for ETS family transcription factors. The pipeline calls the main package tools in the appropriate sequence and reports the progress of the computation.
To run the pipeline, go to the demo/melanoma_full folder and run the following commands:
python get_and_preprocess_data.py to download the input and reference data and preprocess them into the right format.
python bayespi_bar2_pipeline.py to execute the main pipeline code. This takes about one full day of computation on a multi-core machine. The computation can be sped up considerably by running the pipeline on a cluster that supports the SLURM queue manager. Edit the parallel_options.txt file in the same folder to specify the desired parallelization configuration. Check the help of bayespi_bar.py from the main package to learn about the parallelization options.
python make_plots.py to make the heatmaps for the significantly affected transcription factors in the foreground blocks.
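The three commands above can also be launched from a small driver script. The sketch below is an illustration only: build_commands and run_steps are hypothetical helpers, the script names are taken from the steps above, and the interpreter name is an assumption (BayesPI-BAR2 itself needs Python 2, so pass whichever executable points to it on your system).

```python
import subprocess

# Script names come from the steps above; they must run in this order.
PIPELINE_STEPS = [
    "get_and_preprocess_data.py",
    "bayespi_bar2_pipeline.py",
    "make_plots.py",
]


def build_commands(interpreter="python"):
    """Return one command (as an argument list) per pipeline step."""
    return [[interpreter, script] for script in PIPELINE_STEPS]


def run_steps(workdir, interpreter="python", runner=subprocess.check_call):
    """Run every step inside workdir, stopping at the first failure.

    subprocess.check_call raises CalledProcessError on a non-zero exit
    status, so a failed step prevents the later ones from starting.
    """
    for command in build_commands(interpreter):
        print("Running: " + " ".join(command))
        runner(command, cwd=workdir)
```

A call such as run_steps("demo/melanoma_full") would then execute the whole sequence from the package root. The runner argument exists only so that the command order can be checked without launching anything.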
The main pipeline script, bayespi_bar2_pipeline.py, is designed to be robust to interruptions. If the pipeline execution is interrupted at any point, simply run the script again and it will resume the calculation from where it stopped. You can see the progress of the computation, as well as the main pipeline parameters, in the log file, whose location is printed on the screen when the pipeline starts.
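How the package implements resumption is not described here. As a generic illustration of the idea, the sketch below uses marker files to record finished stages, which is one common way to make a multi-stage computation restartable. All names (run_resumable, the .done marker files, the state directory) are hypothetical and are not taken from BayesPI-BAR2.

```python
import os


def run_resumable(stages, state_dir):
    """Run (name, function) stages in order, skipping those already done.

    A stage counts as done when its marker file exists in state_dir.
    Returns the names of the stages executed during this call.
    """
    if not os.path.isdir(state_dir):
        os.makedirs(state_dir)
    executed = []
    for name, function in stages:
        marker = os.path.join(state_dir, name + ".done")
        if os.path.exists(marker):
            continue  # this stage finished in an earlier run
        function()
        # Write the marker only after the stage has completed successfully,
        # so an interrupted stage is repeated on the next run.
        with open(marker, "w") as handle:
            handle.write("done\n")
        executed.append(name)
    return executed
```

With this design, rerunning the same call after an interruption repeats only the stage that was cut short and the ones after it.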
The get_and_preprocess_data.py script downloads about 2 GB of data necessary for the pipeline. Here is the full list of additional files that will be downloaded:
The bayespi_bar2_pipeline.py script is the starting point for users who wish to use BayesPI-BAR2 to process their own datasets. The instructions for customizing the default pipeline can be found here.