motrpac-rna-splicing-pipeline

Snakemake Alternative Splicing Pipeline

This repository contains a cluster-ready Snakemake pipeline for analyzing alternative splicing from high-throughput bulk RNA-seq data. The pipeline is modular and scalable, and it can be adapted to a variety of experimental designs and computational environments.


Features


Prerequisites


Getting Started

1. Clone the Repository

git clone https://github.com/stasrira/snakemake_alternative_splicing.git
cd snakemake_alternative_splicing

2. Prepare Your Data

Place your input FASTQ files in a fastq_raw sub-folder of the main data location. The sub-folder name can be customized through the data_path_required_folders variable, and the main data location itself is set through the data_path variable.
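The layout described above can be sketched as follows. This is a minimal illustration, not part of the repository: the fallback location `$PWD/data` is hypothetical, and the `*.fastq`, `*.fastq.gz`, `*.fq`, and `*.fq.gz` file name patterns are assumptions about how your files are named.

```shell
# Use data_path if it is already set; otherwise fall back to a
# hypothetical ./data folder (an assumption made for this sketch).
data_path="${data_path:-$PWD/data}"

# fastq_raw is the default sub-folder name mentioned above.
mkdir -p "$data_path/fastq_raw"

# Count FASTQ files directly inside fastq_raw.
# The name patterns are assumptions; adjust them to your naming scheme.
# tr strips the padding that some wc implementations add.
fastq_count=$(find "$data_path/fastq_raw" -maxdepth 1 -type f \
    \( -name '*.fastq' -o -name '*.fastq.gz' -o -name '*.fq' -o -name '*.fq.gz' \) \
    | wc -l | tr -d ' ')

echo "data_path: $data_path"
echo "FASTQ files found in fastq_raw: $fastq_count"
```

A count of 0 means no files matching those patterns were found in fastq_raw.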

3. Create Your .env File

Create a .env file in the config directory to specify your environment variables. You can use the provided example file as a template:

cp config/.env_example config/.env

Edit config/.env to set the variables for your environment as needed.
See the example file here: config/.env_example

4. Create the Base Conda Environment

Important:
The base conda environment (conda_base) must be pre-created using the provided environment YAML file:

conda env create -p /desired/path/to/conda_base -f conda_envs/snakemake_base.yml

For multi-user setups:

5. Genome reference data

All references used in this implementation, except those noted below, are identical to the ones used in https://github.com/yongchao/motrpac_rnaseq

References specific to alternative splicing:

rMATS

SUPPA


Configurable Variables

All arguments to the pipeline are passed as environment variables. These variables are categorized into two sets:

1. Variables for Configuring the alter_splicing.smk File

2. Variables for Configuring the Pipeline’s Path of Execution

The following variables can be configured for the pipeline, and their default values are listed below.

To view the complete list dynamically at runtime, set the help variable to True and run the pipeline as shown below:

export help=True
./run_pipeline.sh

This will display the help information and terminate the pipeline execution. Set help=False or unset the variable to enable normal operation afterward.
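As an alternative to exporting and later unsetting the variable, standard shell behavior lets you set a variable for a single command only. The sketch below demonstrates that mechanism with a stand-in command (`sh -c`), since it assumes nothing about what run_pipeline.sh prints; the commented last line shows how the same form would apply to the pipeline.

```shell
# Start from a clean state so an earlier export does not interfere.
unset help

# A per-command assignment places the variable in that command's
# environment only; the calling shell is left unchanged.
help=True sh -c 'echo "inside the command: help=$help"'
echo "after the command: help=${help:-<unset>}"

# Applied to the pipeline (run from the repository root):
# help=True ./run_pipeline.sh
```

With this form there is nothing to reset afterward, because `help` never becomes part of the interactive shell's environment.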


Running the Pipeline

  1. Set up your environment variables as needed.
  2. Execute the pipeline:
    ./run_pipeline.sh
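
Before launching, it can help to confirm that the key variables point at real directories. The following is a hypothetical pre-flight helper, not part of the repository. It assumes that conda_base and data_path (the two variables set in the Example Usage section) are both required and that both refer to existing directories.

```shell
# Hypothetical helper: report a variable that is empty or not a directory.
# $1 = variable name (used in messages), $2 = its current value
check_dir_var() {
    if [ -z "$2" ]; then
        echo "missing variable: $1" >&2
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo "not a directory: $1=$2" >&2
        return 1
    fi
}

# Check both variables so that every problem is reported in one pass.
check_pipeline_env() {
    status=0
    check_dir_var conda_base "${conda_base:-}" || status=1
    check_dir_var data_path "${data_path:-}" || status=1
    return "$status"
}
```

One way to use it is `check_pipeline_env && ./run_pipeline.sh`, which starts the pipeline only when both checks pass.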
    

File Structure


Example Usage

# Pre-create the shared base environment
conda env create -p /shared/path/to/conda_base -f conda_envs/snakemake_base.yml

# Set up your variables
export conda_base="/shared/path/to/conda_base"
export data_path="/path/to/your/data"

# Set dry_run to True to list the steps that would be executed, without running them
export dry_run=True

# Run the pipeline
./run_pipeline.sh

Environment & Dependencies

Note:


Troubleshooting