ATAC-seq experimental and quantification QC metrics for chromatin accessibility (ATAC) data

Usage

ATAC_META

Format

A data frame with 520 rows and 106 variables:

viallabel

character, sample identifier

general.description

character, pipeline workflow description

replicate

character, replicate in the pipeline workflow

general.date

double, date the pipeline workflow was run

general.title

character, pipeline workflow title

general.pipeline_ver

character, ENCODE ATAC-seq pipeline version

general.pipeline_type

character, "atac"

general.genome

character, reference genome

general.aligner

character, read aligner

general.peak_caller

character, peak caller

general.seq_endedness.paired_end

logical, whether the reads are paired-end

replication.num_peaks.num_peaks

integer, number of replication peaks (max 300000)

peak_stat.peak_region_size.min_size

integer, minimum peak width

peak_stat.peak_region_size.25_pct

integer, 25th percentile of peak width

peak_stat.peak_region_size.50_pct

integer, 50th percentile of peak width

peak_stat.peak_region_size.75_pct

integer, 75th percentile of peak width

peak_stat.peak_region_size.max_size

integer, maximum peak width

peak_stat.peak_region_size.mean

double, mean peak width

peak_enrich.frac_reads_in_peaks.macs2.frip

double, fraction of reads in MACS2 peaks

align.samstat.total_reads

integer, total number of alignments, including multimappers

align.samstat.mapped_reads

integer, total number of mapped reads

align.samstat.pct_mapped_reads

double, percent of reads that mapped

align.samstat.paired_reads

integer, number of paired reads

align.samstat.read1

integer, number of read 1 reads

align.samstat.read2

integer, number of read 2 reads

align.samstat.properly_paired_reads

integer, number of properly paired reads

align.samstat.pct_properly_paired_reads

double, percent of reads that were properly paired

align.samstat.with_itself

integer, number of reads for which both the read and its mate mapped

align.samstat.singletons

integer, number of singleton reads

align.samstat.pct_singletons

double, percent of reads that were singleton

align.samstat.diff_chroms

integer, number of reads whose mate mapped to a different chromosome

align.dup.paired_reads

integer, number of paired reads for duplication step

align.dup.paired_duplicate_reads

integer, number of duplicate paired reads

align.dup.paired_optical_duplicate_reads

integer, number of optical duplicate paired reads

align.dup.pct_duplicate_reads

double, percent of reads that are duplicate

align.frac_mito.non_mito_reads

integer, number of reads that align to non-mitochondrial DNA

align.frac_mito.mito_reads

integer, number of reads that align to mitochondrial DNA

align.frac_mito.frac_mito_reads

double, fraction of reads that align to mitochondrial DNA

align.nodup_samstat.total_reads

integer, number of reads after applying all filters

align.nodup_samstat.mapped_reads

integer, number of mapped reads after applying all filters

align.nodup_samstat.paired_reads

integer, number of paired reads after applying all filters

align.nodup_samstat.read1

integer, number of read 1 reads after applying all filters

align.nodup_samstat.read2

integer, number of read 2 reads after applying all filters

align.nodup_samstat.properly_paired_reads

integer, number of properly paired reads after applying all filters

align.nodup_samstat.with_itself

integer, number of reads for which both the read and its mate mapped, after applying all filters

align.frag_len_stat.frac_reads_in_nfr

double, fraction of reads in the nucleosome-free region. Should be greater than 0.4.

align.frag_len_stat.frac_reads_in_nfr_qc_pass

logical, does align.frag_len_stat.frac_reads_in_nfr pass the cutoff?

align.frag_len_stat.frac_reads_in_nfr_qc_reason

character, reason for align.frag_len_stat.frac_reads_in_nfr_qc_pass

align.frag_len_stat.nfr_over_mono_nuc_reads

double, ratio of reads in the nucleosome-free region to reads in the mononucleosomal peak. Should be greater than 2.5.

align.frag_len_stat.nfr_over_mono_nuc_reads_qc_pass

logical, does align.frag_len_stat.nfr_over_mono_nuc_reads pass the cutoff?

align.frag_len_stat.nfr_over_mono_nuc_reads_qc_reason

character, reason for align.frag_len_stat.nfr_over_mono_nuc_reads_qc_pass

align.frag_len_stat.nfr_peak_exists

logical, does a nucleosome-free peak exist?

align.frag_len_stat.mono_nuc_peak_exists

logical, does a mono-nucleosomal peak exist?

align.frag_len_stat.di_nuc_peak_exists

logical, does a di-nucleosomal peak exist?

lib_complexity.lib_complexity.total_fragments

integer, total number of fragments

lib_complexity.lib_complexity.distinct_fragments

integer, number of distinct fragments

lib_complexity.lib_complexity.positions_with_one_read

integer, number of positions with one read

lib_complexity.lib_complexity.NRF

double, non-redundant fraction. Measure of library complexity. Ideally >0.9

lib_complexity.lib_complexity.PBC1

double, PCR bottlenecking coefficient 1. Measure of library complexity. Ideally >0.9

lib_complexity.lib_complexity.PBC2

double, PCR bottlenecking coefficient 2. Measure of library complexity. Ideally >3

align_enrich.tss_enrich.tss_enrich

double, transcription start site enrichment of peaks

2D_barcode

double, sample barcode

Tissue

character, tissue description

Species

character, species

Sample_category

character, study sample ("study") or reference standard ("ref")

GET_site

character, which Genomics, Epigenomics, and Transcriptomics (GET) site performed the assay, "Stanford" or "MSSM" (Icahn School of Medicine at Mount Sinai)

Sample_batch

integer, numeric batch number for batch in which this sample was manually processed

Lib_adapter_1

character, Adapter sequence for read 1

Lib_adapter_2

character, Adapter sequence for read 2

Lib_index_1

character, i7 index

Lib_index_2

character, i5 index

Nuclei_extr_date

character, nuclei extraction date

Nuclei_extr_count

integer, nuclei count

Nuclei_tagmentation

integer, number of nuclei used in each tagmentation reaction

Tagmentation_date

character, tagmentation date, MM/DD/YYYY format

Tagmentation_enzyme_cat

integer, catalog number of tagmentation enzyme TDE1 (Tn5)

Tagmentation_enzyme_lot

integer, lot number of tagmentation enzyme TDE1 (Tn5)

Tagmentation_buffer_cat

integer, catalog number of tagmentation buffer

Tagmentation_buffer_lot

integer, lot number of tagmentation buffer

Tagmentation_reaction_vol

integer, volume of tagmentation (uL)

Tagmentation_purification_kit

character, purification kit

Tagmentation_purified_DNA_vol

double, volume of purified DNA (uL)

PCR_date

character, PCR date, MM/DD/YYYY format

PCR_cycle_nr

integer, number of PCR cycles

PCR_purification_beads_ul

character, volume of SPRIselect beads for lower size selection

Lib_DNA_conc

double, DNA concentration for the library (ng/uL)

Lib_DNA_molarity

double, DNA molarity of library (nM)

Lib_frag_size

integer, average library fragment size

Lib_BA_quality

integer, visual inspection of the library quality with the Bioanalyzer track (1=good, 0=bad)

Seq_DNA_molarity

double, DNA molarity for sequencing (nM)

Seq_platform

character, sequencing platform

Seq_date

integer, sequencing date, YYMMDD format

Seq_machine_ID

character, serial number of the sequencer

Seq_flowcell_ID

character, flow cell ID

Seq_flowcell_run

integer, flow cell run

Seq_flowcell_lane

character, flow cell lane

Seq_flowcell_type

character, flow cell type, e.g., S4

Seq_length

double, read length

Seq_end_type

integer, 1=single-end, 2=paired-end

total_primary_alignments

integer, number of primary alignments

pct_chrX

double, percent of reads mapped to chromosome X

pct_chrY

double, percent of reads mapped to chromosome Y

pct_chrM

double, percent of reads mapped to chromosome M

pct_auto

double, percent of reads mapped to autosomal chromosomes

pct_contig

double, percent of reads mapped to contigs

Seq_batch

character, unique identifier for sequencing batch
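
The cutoffs quoted in the column descriptions above (fraction of reads in the nucleosome-free region, NFR to mononucleosomal ratio, NRF, PBC1, PBC2) can be checked against the data frame directly. The following is a minimal sketch in base R. It uses a small toy data frame with invented values in place of ATAC_META so that it runs without the package, and the helper name flag_atac_qc is hypothetical. It treats every cutoff as a strict "greater than" comparison, which is an assumption and may not match how the pipeline derives its own *_qc_pass columns.

```r
# Toy stand-in for ATAC_META (values are invented for illustration only).
toy <- data.frame(
  viallabel = c("A", "B", "C"),
  align.frag_len_stat.frac_reads_in_nfr = c(0.55, 0.35, 0.48),
  align.frag_len_stat.nfr_over_mono_nuc_reads = c(3.1, 2.0, 2.7),
  lib_complexity.lib_complexity.NRF = c(0.95, 0.92, 0.80),
  lib_complexity.lib_complexity.PBC1 = c(0.96, 0.93, 0.85),
  lib_complexity.lib_complexity.PBC2 = c(12.0, 8.5, 2.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compare each metric with the cutoff stated in its
# column description (assumed here to be strict "greater than" cutoffs).
flag_atac_qc <- function(meta) {
  cutoffs <- c(
    align.frag_len_stat.frac_reads_in_nfr = 0.4,
    align.frag_len_stat.nfr_over_mono_nuc_reads = 2.5,
    lib_complexity.lib_complexity.NRF = 0.9,
    lib_complexity.lib_complexity.PBC1 = 0.9,
    lib_complexity.lib_complexity.PBC2 = 3
  )
  # Logical matrix: one row per sample, one column per metric;
  # TRUE means the value exceeds its cutoff.
  passes <- do.call(
    cbind,
    lapply(names(cutoffs), function(col) meta[[col]] > cutoffs[[col]])
  )
  n_failed <- rowSums(!passes)
  data.frame(
    viallabel = meta$viallabel,
    n_failed = n_failed,
    all_pass = n_failed == 0,
    stringsAsFactors = FALSE
  )
}

qc <- flag_atac_qc(toy)
print(qc)
```

With the package loaded, the same helper could be called as flag_atac_qc(ATAC_META). Rows with missing metric values would yield NA rather than TRUE or FALSE.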

Source

pass1b-06/results/epigenomics/qa-qc/motrpac_pass1b-06_epigen-atac-seq_qa-qc-metrics.csv

Details

The ENCODE ATAC-seq pipeline v1.7.0 was used to quantify ATAC-seq data. Columns whose names contain a period are QC metrics from this pipeline. Note that the ENCODE pipeline counts alignments per read rather than per read pair, so for paired-end data align.samstat.total_reads corresponds to twice the number of sequenced fragments.
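
As a rough illustration of the note on read counts, the sketch below halves align.samstat.total_reads to approximate the number of sequenced fragments. It uses a toy data frame with invented values rather than ATAC_META itself, and the column name approx_fragments is hypothetical. Because align.samstat.total_reads is described as including multimappers, the result should be read as an approximation.

```r
# Toy stand-in for two ATAC_META columns (values invented for illustration).
toy <- data.frame(
  viallabel = c("A", "B"),
  align.samstat.total_reads = c(80000000, 61234568),
  stringsAsFactors = FALSE
)

# Each paired-end fragment contributes two reads (read 1 and read 2),
# so halving the read count approximates the fragment count.
toy$approx_fragments <- toy$align.samstat.total_reads / 2

print(toy)
```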