Curated Pipelines

Overview

OmicsPipelines comes with a number of curated pipelines for you to explore. These pipelines have been validated by the MoTrPAC Bioinformatics Coordinating Center and are designed to run in the cloud.

Available Pipelines

Proteomics

MSGF+

MSGF+ is a search engine for peptide identification in mass spectrometry-based proteomics. It searches tandem mass spectra against large sequence databases to find the peptide that best matches each spectrum.

The MSGF+ pipeline was designed in collaboration with the Pacific Northwest National Laboratory (PNNL) and the Broad Institute.

See the documentation for this pipeline here: MSGF+

MaxQuant

MaxQuant is a quantitative proteomics software package for analyzing large mass-spectrometry data sets. It identifies and quantifies proteins from shotgun proteomics data.

See the documentation for this pipeline here: MaxQuant

Genomics

RNA-seq

RNA-seq is a technique used to study the expression levels of genes in a sample. The RNA molecules in the sample are fragmented and converted to complementary DNA (cDNA), and high-throughput DNA sequencing is then used to determine the sequence of the fragments. The resulting reads can be used to identify which genes are active in a given cell or tissue type and to estimate how strongly each gene is expressed.

The OmicsPipelines-provided RNA-seq pipeline is designed to process RNA-seq data from a number of samples. It takes as input a set of forward read files, reverse read files, and, optionally, index files (FASTQ/GZ format) containing the raw RNA-seq reads, and produces as output a BAM file containing the aligned reads. The pipeline also produces a number of QC metrics, a gene expression matrix, and a number of plots. The references used for annotation are provided for human and rat.
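The input layout described above (forward reads, reverse reads, and optional index reads per sample) can be sketched as a small grouping step. This is only an illustration: the file-naming convention (`<sample>_R1.fastq.gz`, `<sample>_R2.fastq.gz`, `<sample>_I1.fastq.gz`) and the function name `group_fastqs` are assumptions made for the example, not the pipeline's documented interface.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: <sample>_R1.fastq.gz (forward),
# <sample>_R2.fastq.gz (reverse), <sample>_I1.fastq.gz (optional index).
PATTERN = re.compile(r"^(?P<sample>.+)_(?P<kind>R1|R2|I1)\.fastq\.gz$")


def group_fastqs(filenames):
    """Group FASTQ file names by sample and check that each sample is paired."""
    samples = defaultdict(dict)
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the assumed convention
        samples[match["sample"]][match["kind"]] = name

    # Forward and reverse reads are required; index reads are optional.
    incomplete = sorted(
        sample for sample, files in samples.items()
        if not {"R1", "R2"} <= files.keys()
    )
    if incomplete:
        raise ValueError(f"samples missing R1 or R2: {incomplete}")
    return dict(samples)


files = [
    "s1_R1.fastq.gz", "s1_R2.fastq.gz", "s1_I1.fastq.gz",
    "s2_R1.fastq.gz", "s2_R2.fastq.gz",
]
grouped = group_fastqs(files)
```

Here `grouped["s1"]` holds all three files, while `grouped["s2"]` holds only the forward and reverse reads, since index files are optional.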

See the documentation for this pipeline here: RNA-seq

The GitHub repository for this pipeline is here: RNA-seq

WGS

In WGS (whole-genome sequencing), a DNA library is sequenced using a high-throughput DNA sequencing platform. This generates a large number of short DNA sequence reads that represent the original genome. The sequence data are then analyzed to generate a complete picture of the genome. This typically involves aligning the sequence reads to a reference genome and using computational tools to identify variations in the genome, such as SNPs (single nucleotide polymorphisms) and indels (insertions and deletions). This information can then be used for a wide range of applications, such as identifying genetic risk factors for diseases or studying the evolution of species.
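The idea of identifying SNPs from reads aligned to a reference can be illustrated with a toy pileup. This is a deliberately simplified sketch, not the method used by the OmicsPipelines WGS pipeline: it assumes ungapped alignments given as `(start, sequence)` pairs, ignores base qualities and indels, and the function name and thresholds (`min_depth`, `min_fraction`) are hypothetical.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=2, min_fraction=0.8):
    """Report positions where most aligned reads disagree with the reference.

    aligned_reads is a list of (start, sequence) pairs with 0-based start
    positions and no gaps (a simplifying assumption for this sketch).
    Returns a list of (position, reference_base, observed_base) tuples.
    """
    # Count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base))
    return snps


reference = "ACGTACGT"
reads = [(0, "ACGA"), (2, "GAAC"), (3, "AACG")]
snps = call_snps(reference, reads)
```

In this example all three reads show an A at position 3 where the reference has a T, so that position is the only one reported.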

The OmicsPipelines WGS pipeline is designed to process WGS data from a number of instruments, including Illumina, PacBio, and Oxford Nanopore. It takes as input a set of forward read files, reverse read files, and, optionally, index files (FASTQ/GZ or BAM) containing the raw WGS reads, and produces indexed and aligned BAM files. It optionally provides annotations for the aligned reads. The references used for annotation are provided for human and rat.

See the documentation for this pipeline here: WGS

The GitHub repository for this pipeline is here: WGS

RRBS

Reduced Representation Bisulfite Sequencing (RRBS) is a method for studying DNA methylation. It involves digesting DNA with a restriction enzyme, and then bisulfite converting the DNA. Bisulfite treatment converts unmethylated cytosines to uracils, which are read as thymines after amplification and sequencing, while methylated cytosines are left unchanged. The DNA is then sequenced, and the methylation status of each cytosine can be determined by comparing the base observed in the sequenced DNA to the cytosine in the reference genome: a cytosine that still reads as C was methylated, and one that reads as T was unmethylated.
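The comparison against the reference can be sketched in a few lines. The example below assumes that bisulfite treatment leaves methylated cytosines as C and that unmethylated cytosines are read as T. It also assumes ungapped forward-strand alignments given as `(start, sequence)` pairs. The function name is hypothetical, and real methylation callers also handle reverse-strand reads, base qualities, and sequence context, none of which is modeled here.

```python
def methylation_levels(reference, aligned_reads):
    """Estimate the methylated fraction at each reference cytosine.

    At a reference C, a read base of C counts as methylated (protected from
    conversion) and a read base of T counts as unmethylated (converted).
    Returns {position: methylated_fraction} for covered cytosines.
    """
    counts = {}  # position -> [methylated, unmethylated]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue  # only reference cytosines are informative
            tally = counts.setdefault(pos, [0, 0])
            if base == "C":
                tally[0] += 1
            elif base == "T":
                tally[1] += 1
            # any other base is treated as a sequencing error and ignored

    return {
        pos: meth / (meth + unmeth)
        for pos, (meth, unmeth) in sorted(counts.items())
        if meth + unmeth > 0
    }


reference = "ACGTCGA"  # cytosines at positions 1 and 4
reads = [(0, "ACGTTGA"), (0, "ACGTCGA")]
levels = methylation_levels(reference, reads)
```

Both reads keep the C at position 1, so it is called fully methylated, while position 4 is read as T in one read and C in the other, giving a methylated fraction of one half.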

The OmicsPipelines-provided RRBS pipeline is designed to process RRBS data from a number of samples. It takes as input a set of forward read files, reverse read files, and, optionally, index files (FASTQ/GZ format) containing the raw RRBS reads, and produces as output a BAM file containing the aligned reads. The pipeline also produces a number of QC metrics, a methylation matrix, and a number of plots. The references used for annotation are provided for human and rat.

See the documentation for this pipeline here: RRBS

The GitHub repository for this pipeline is here: RRBS
