ATAC-seq metadata and QC — ATAC_META • MotrpacRatTraining6moData

ATAC-seq experimental and quantification QC metrics for chromatin accessibility (ATAC) data

Usage

ATAC_META

Format

A data frame with 520 rows and 106 variables:

viallabel: character, sample identifier
general.description: character, pipeline workflow description
replicate: character, replicate in the pipeline workflow
general.date: double, date the pipeline workflow was run
general.title: character, pipeline workflow title
general.pipeline_ver: character, ENCODE ATAC-seq pipeline version
general.pipeline_type: character, "atac"
general.genome: character, reference genome
general.aligner: character, read aligner
general.peak_caller: character, peak caller
general.seq_endedness.paired_end: logical, are the reads paired-end?
replication.num_peaks.num_peaks: integer, number of replication peaks (max 300000)
peak_stat.peak_region_size.min_size: integer, minimum peak width
peak_stat.peak_region_size.25_pct: integer, 25th percentile of peak width
peak_stat.peak_region_size.50_pct: integer, 50th percentile of peak width
peak_stat.peak_region_size.75_pct: integer, 75th percentile of peak width
peak_stat.peak_region_size.max_size: integer, max peak width
peak_stat.peak_region_size.mean: double, mean peak width
peak_enrich.frac_reads_in_peaks.macs2.frip: double, fraction of reads in MACS2 peaks
align.samstat.total_reads: integer, total number of alignments, including multimappers
align.samstat.mapped_reads: integer, total number of mapped reads
align.samstat.pct_mapped_reads: double, percent of reads that mapped
align.samstat.paired_reads: integer, number of paired reads
align.samstat.read1: integer, number of read 1 reads
align.samstat.read2: integer, number of read 2 reads
align.samstat.properly_paired_reads: integer, number of properly paired reads
align.samstat.pct_properly_paired_reads: double, percent of reads that were properly paired
align.samstat.with_itself: integer, number of reads whose mate is also mapped
align.samstat.singletons: integer, number of singleton reads
align.samstat.pct_singletons: double, percent of reads that were singleton
align.samstat.diff_chroms: integer, number of reads whose mate mapped to a different chromosome
align.dup.paired_reads: integer, number of paired reads for duplication step
align.dup.paired_duplicate_reads: integer, number of duplicate paired reads
align.dup.paired_optical_duplicate_reads: integer, number of optical duplicate paired reads
align.dup.pct_duplicate_reads: double, percent of reads that are duplicate
align.frac_mito.non_mito_reads: integer, number of reads that align to non-mitochondrial DNA
align.frac_mito.mito_reads: integer, number of reads that align to mitochondrial DNA
align.frac_mito.frac_mito_reads: double, fraction of reads that align to mitochondrial DNA
align.nodup_samstat.total_reads: integer, number of reads after applying all filters
align.nodup_samstat.mapped_reads: integer, number of mapped reads after applying all filters
align.nodup_samstat.paired_reads: integer, number of paired reads after applying all filters
align.nodup_samstat.read1: integer, number of read 1 reads after applying all filters
align.nodup_samstat.read2: integer, number of read 2 reads after applying all filters
align.nodup_samstat.properly_paired_reads: integer, number of properly paired reads after applying all filters
align.nodup_samstat.with_itself: integer, number of reads whose mate is also mapped, after applying all filters
align.frag_len_stat.frac_reads_in_nfr: double, fraction of reads in the nucleosome-free region. Should be a value greater than 0.4.
align.frag_len_stat.frac_reads_in_nfr_qc_pass: logical, does align.frag_len_stat.frac_reads_in_nfr pass the cutoff?
align.frag_len_stat.frac_reads_in_nfr_qc_reason: character, reason for align.frag_len_stat.frac_reads_in_nfr_qc_pass
align.frag_len_stat.nfr_over_mono_nuc_reads: double, ratio of reads in the nucleosome-free region to reads in the mononucleosomal peak. Should be a value greater than 2.5.
align.frag_len_stat.nfr_over_mono_nuc_reads_qc_pass: logical, does align.frag_len_stat.nfr_over_mono_nuc_reads pass the cutoff?
align.frag_len_stat.nfr_over_mono_nuc_reads_qc_reason: character, reason for align.frag_len_stat.nfr_over_mono_nuc_reads_qc_pass
align.frag_len_stat.nfr_peak_exists: logical, does a nucleosome-free peak exist?
align.frag_len_stat.mono_nuc_peak_exists: logical, does a mono-nucleosomal peak exist?
align.frag_len_stat.di_nuc_peak_exists: logical, does a di-nucleosomal peak exist?
lib_complexity.lib_complexity.total_fragments: integer, total number of fragments
lib_complexity.lib_complexity.distinct_fragments: integer, number of distinct fragments
lib_complexity.lib_complexity.positions_with_one_read: integer, number of positions with one read
lib_complexity.lib_complexity.NRF: double, non-redundant fraction. Measure of library complexity. Ideally >0.9
lib_complexity.lib_complexity.PBC1: double, PCR bottlenecking coefficient 1. Measure of library complexity. Ideally >0.9
lib_complexity.lib_complexity.PBC2: double, PCR bottlenecking coefficient 2. Measure of library complexity. Ideally >3
align_enrich.tss_enrich.tss_enrich: double, transcription start site enrichment of peaks
2D_barcode: double, sample barcode
Tissue: character, tissue description
Species: character, species
Sample_category: character, study sample ("study") or reference standard ("ref")
GET_site: character, which Genomics, Epigenomics, and Transcriptomics (GET) site performed the assay, "Stanford" or "MSSM" (Icahn School of Medicine at Mount Sinai)
Sample_batch: integer, numeric batch number for batch in which this sample was manually processed
Lib_adapter_1: character, Adapter sequence for read 1
Lib_adapter_2: character, Adapter sequence for read 2
Lib_index_1: character, i7 index
Lib_index_2: character, i5 index
Nuclei_extr_date: character, nuclei extraction date
Nuclei_extr_count: integer, nuclei count
Nuclei_tagmentation: integer, number of nuclei used in each tagmentation reaction
Tagmentation_date: character, tagmentation date, MM/DD/YYYY format
Tagmentation_enzyme_cat: integer, catalog number of tagmentation enzyme TDE1 (Tn5)
Tagmentation_enzyme_lot: integer, lot number of tagmentation enzyme TDE1 (Tn5)
Tagmentation_buffer_cat: integer, catalog number of tagmentation buffer
Tagmentation_buffer_lot: integer, lot number of tagmentation buffer
Tagmentation_reaction_vol: integer, volume of tagmentation (uL)
Tagmentation_purification_kit: character, purification kit
Tagmentation_purified_DNA_vol: double, volume of purified DNA (uL)
PCR_date: character, PCR date, MM/DD/YYYY format
PCR_cycle_nr: integer, number of PCR cycles
PCR_purification_beads_ul: character, volume of SPRIselect beads for lower size selection
Lib_DNA_conc: double, DNA concentration for the library (ng/uL)
Lib_DNA_molarity: double, DNA molarity of library (nM)
Lib_frag_size: integer, average library fragment size
Lib_BA_quality: integer, visual inspection of the library quality with the Bioanalyzer track (1=good, 0=bad)
Seq_DNA_molarity: double, DNA molarity for sequencing (nM)
Seq_platform: character, sequencing platform
Seq_date: integer, sequencing date, YYMMDD format
Seq_machine_ID: character, serial number of the sequencer
Seq_flowcell_ID: character, flow cell ID
Seq_flowcell_run: integer, flow cell run
Seq_flowcell_lane: character, flow cell lane
Seq_flowcell_type: character, flow cell type, e.g., S4
Seq_length: double, read length
Seq_end_type: integer, 1=single-end, 2=paired-end
total_primary_alignments: integer, number of primary alignments
pct_chrX: double, percent of reads mapped to chromosome X
pct_chrY: double, percent of reads mapped to chromosome Y
pct_chrM: double, percent of reads mapped to chromosome M
pct_auto: double, percent of reads mapped to autosomal chromosomes
pct_contig: double, percent of reads mapped to contigs
Seq_batch: character, unique identifier for sequencing batch
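
Several of the columns above come with a suggested lower bound (NFR fraction > 0.4, NFR over mononucleosomal reads > 2.5, NRF > 0.9, PBC1 > 0.9, PBC2 > 3). The following is a minimal sketch of how those bounds could be checked per sample. It runs on a small invented data frame rather than on ATAC_META itself: the column names are taken from the list above, but all values are made up, and check_cutoffs() is a hypothetical helper that is not part of the package.

```r
# Toy stand-in for ATAC_META: column names come from the Format list,
# but every value below is invented for illustration.
toy_meta <- data.frame(
  viallabel = c("sample_A", "sample_B", "sample_C"),
  align.frag_len_stat.frac_reads_in_nfr = c(0.55, 0.35, 0.48),
  align.frag_len_stat.nfr_over_mono_nuc_reads = c(3.1, 2.0, 2.7),
  lib_complexity.lib_complexity.NRF = c(0.95, 0.92, 0.80),
  lib_complexity.lib_complexity.PBC1 = c(0.96, 0.93, 0.85),
  lib_complexity.lib_complexity.PBC2 = c(12.0, 8.5, 2.4),
  stringsAsFactors = FALSE
)

# Lower bounds quoted in the column descriptions (each metric should exceed its bound).
cutoffs <- c(
  align.frag_len_stat.frac_reads_in_nfr = 0.4,
  align.frag_len_stat.nfr_over_mono_nuc_reads = 2.5,
  lib_complexity.lib_complexity.NRF = 0.9,
  lib_complexity.lib_complexity.PBC1 = 0.9,
  lib_complexity.lib_complexity.PBC2 = 3
)

# Hypothetical helper (not a package function): counts, for each sample,
# how many metrics fall at or below their bound.
# Missing metric values would propagate as NA in the result.
check_cutoffs <- function(meta, cutoffs) {
  passed <- lapply(names(cutoffs), function(metric) {
    meta[[metric]] > cutoffs[[metric]]
  })
  names(passed) <- names(cutoffs)
  passed <- do.call(cbind, passed)  # logical matrix: samples x metrics
  n_failed <- rowSums(!passed)
  data.frame(
    viallabel = meta$viallabel,
    n_failed = n_failed,
    all_pass = n_failed == 0,
    stringsAsFactors = FALSE
  )
}

qc_summary <- check_cutoffs(toy_meta, cutoffs)
print(qc_summary)
```

For the two NFR metrics, the data frame already carries the pipeline's own verdict in the corresponding _qc_pass and _qc_reason columns, so a check like this is mainly useful for the library complexity metrics or for applying different bounds.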

Source

pass1b-06/results/epigenomics/qa-qc/motrpac_pass1b-06_epigen-atac-seq_qa-qc-metrics.csv

Details

The ENCODE ATAC-seq pipeline v1.7.0 was used to quantify ATAC-seq data. Columns whose names contain a period are QC metrics from this pipeline. Note that the ENCODE pipeline counts each read of a pair separately, so align.samstat.total_reads is the number of aligned paired-end reads, which corresponds to twice the number of sequenced fragments.
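
As a rough illustration of the two points above, the sketch below picks out the pipeline QC columns by looking for a period in the column name and converts align.samstat.total_reads to an approximate fragment count by halving it. The data frame is an invented stand-in with placeholder values, not ATAC_META itself, and approx_fragments is a name introduced here for illustration only.

```r
# Toy stand-in for ATAC_META with a mix of pipeline and sample-level columns.
# All values are placeholders invented for this example.
toy_meta <- data.frame(
  viallabel = c("sample_A", "sample_B"),
  general.genome = c("genome_placeholder", "genome_placeholder"),
  align.samstat.total_reads = c(80000000L, 61234568L),
  Tissue = c("tissue_1", "tissue_2"),
  stringsAsFactors = FALSE
)

# Pipeline QC columns are the ones whose names contain a literal period.
# fixed = TRUE stops "." from being treated as a regex wildcard.
is_pipeline_col <- grepl(".", names(toy_meta), fixed = TRUE)
pipeline_cols <- names(toy_meta)[is_pipeline_col]
print(pipeline_cols)

# Each paired-end fragment contributes two reads, so halve the read count.
# Treat the result as approximate: total_reads is described as including multimappers.
toy_meta$approx_fragments <- toy_meta$align.samstat.total_reads / 2
print(toy_meta[, c("viallabel", "approx_fragments")])
```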