ATAC-seq experimental and quantification QC metrics for chromatin accessibility (ATAC) data
Format
A data frame with 520 rows and 106 variables:
- viallabel
- character, sample identifier 
- general.description
- character, pipeline workflow description 
- replicate
- character, replicate in the pipeline workflow 
- general.date
- double, date the pipeline workflow was run 
- general.title
- character, pipeline workflow title 
- general.pipeline_ver
- character, ENCODE ATAC-seq pipeline version 
- general.pipeline_type
- character, "atac" 
- general.genome
- character, reference genome 
- general.aligner
- character, read aligner 
- general.peak_caller
- character, peak caller 
- general.seq_endedness.paired_end
- logical, are the reads paired-end?
- replication.num_peaks.num_peaks
- integer, number of replication peaks (max 300000) 
- peak_stat.peak_region_size.min_size
- integer, minimum peak width 
- peak_stat.peak_region_size.25_pct
- integer, 25th percentile of peak width 
- peak_stat.peak_region_size.50_pct
- integer, 50th percentile of peak width 
- peak_stat.peak_region_size.75_pct
- integer, 75th percentile of peak width 
- peak_stat.peak_region_size.max_size
- integer, max peak width 
- peak_stat.peak_region_size.mean
- double, mean peak width 
- peak_enrich.frac_reads_in_peaks.macs2.frip
- double, fraction of reads in MACS2 peaks 
- align.samstat.total_reads
- integer, total number of alignments, including multimappers 
- align.samstat.mapped_reads
- integer, total number of mapped reads 
- align.samstat.pct_mapped_reads
- double, percent of reads that mapped 
- align.samstat.paired_reads
- integer, number of paired reads 
- align.samstat.read1
- integer, number of read 1 reads 
- align.samstat.read2
- integer, number of read 2 reads 
- align.samstat.properly_paired_reads
- integer, number of properly paired reads 
- align.samstat.pct_properly_paired_reads
- double, percent of reads that were properly paired 
- align.samstat.with_itself
- integer, number of reads for which both the read and its mate mapped
- align.samstat.singletons
- integer, number of singleton reads (mapped reads whose mate is unmapped)
- align.samstat.pct_singletons
- double, percent of reads that are singletons
- align.samstat.diff_chroms
- integer, number of reads whose mate mapped to a different chromosome
- align.dup.paired_reads
- integer, number of paired reads for duplication step 
- align.dup.paired_duplicate_reads
- integer, number of duplicate paired reads 
- align.dup.paired_optical_duplicate_reads
- integer, number of optical duplicate paired reads 
- align.dup.pct_duplicate_reads
- double, percent of reads that are duplicates
- align.frac_mito.non_mito_reads
- integer, number of reads that align to non-mitochondrial DNA
- align.frac_mito.mito_reads
- integer, number of reads that align to mitochondrial DNA 
- align.frac_mito.frac_mito_reads
- double, fraction of reads that align to mitochondrial DNA 
- align.nodup_samstat.total_reads
- integer, number of reads after applying all filters 
- align.nodup_samstat.mapped_reads
- integer, number of mapped reads after applying all filters 
- align.nodup_samstat.paired_reads
- integer, number of paired reads after applying all filters 
- align.nodup_samstat.read1
- integer, number of read 1 reads after applying all filters 
- align.nodup_samstat.read2
- integer, number of read 2 reads after applying all filters 
- align.nodup_samstat.properly_paired_reads
- integer, number of properly paired reads after applying all filters 
- align.nodup_samstat.with_itself
- integer, number of reads for which both the read and its mate mapped, after applying all filters
- align.frag_len_stat.frac_reads_in_nfr
- double, fraction of reads in the nucleosome-free region (NFR). Should be greater than 0.4.
- align.frag_len_stat.frac_reads_in_nfr_qc_pass
- logical, does align.frag_len_stat.frac_reads_in_nfr pass the cutoff?
- align.frag_len_stat.frac_reads_in_nfr_qc_reason
- character, reason for the value of align.frag_len_stat.frac_reads_in_nfr_qc_pass
- align.frag_len_stat.nfr_over_mono_nuc_reads
- double, ratio of reads in the nucleosome-free region to reads in the mononucleosomal peak. Should be greater than 2.5.
- align.frag_len_stat.nfr_over_mono_nuc_reads_qc_pass
- logical, does align.frag_len_stat.nfr_over_mono_nuc_reads pass the cutoff?
- align.frag_len_stat.nfr_over_mono_nuc_reads_qc_reason
- character, reason for the value of align.frag_len_stat.nfr_over_mono_nuc_reads_qc_pass
- align.frag_len_stat.nfr_peak_exists
- logical, does a nucleosome-free peak exist? 
- align.frag_len_stat.mono_nuc_peak_exists
- logical, does a mono-nucleosomal peak exist? 
- align.frag_len_stat.di_nuc_peak_exists
- logical, does a di-nucleosomal peak exist? 
- lib_complexity.lib_complexity.total_fragments
- integer, total number of fragments 
- lib_complexity.lib_complexity.distinct_fragments
- integer, number of distinct fragments 
- lib_complexity.lib_complexity.positions_with_one_read
- integer, number of positions with one read 
- lib_complexity.lib_complexity.NRF
- double, non-redundant fraction. Measure of library complexity. Ideally >0.9
- lib_complexity.lib_complexity.PBC1
- double, PCR bottlenecking coefficient 1. Measure of library complexity. Ideally >0.9
- lib_complexity.lib_complexity.PBC2
- double, PCR bottlenecking coefficient 2. Measure of library complexity. Ideally >3
- align_enrich.tss_enrich.tss_enrich
- double, transcription start site (TSS) enrichment score
- 2D_barcode
- double, sample barcode 
- Tissue
- character, tissue description 
- Species
- character, species 
- Sample_category
- character, study sample ("study") or reference standard ("ref")
- GET_site
- character, which Genomics, Epigenomics, and Transcriptomics (GET) site performed the assay, "Stanford" or "MSSM" (Icahn School of Medicine at Mount Sinai) 
- Sample_batch
- integer, numeric batch number for batch in which this sample was manually processed 
- Lib_adapter_1
- character, adapter sequence for read 1
- Lib_adapter_2
- character, adapter sequence for read 2
- Lib_index_1
- character, i7 index 
- Lib_index_2
- character, i5 index 
- Nuclei_extr_date
- character, nuclei extraction date 
- Nuclei_extr_count
- integer, nuclei count 
- Nuclei_tagmentation
- integer, number of nuclei used in each tagmentation reaction 
- Tagmentation_date
- character, tagmentation date, MM/DD/YYYY format 
- Tagmentation_enzyme_cat
- integer, catalog number of tagmentation enzyme TDE1 (Tn5) 
- Tagmentation_enzyme_lot
- integer, lot number of tagmentation enzyme TDE1 (Tn5) 
- Tagmentation_buffer_cat
- integer, catalog number of tagmentation buffer 
- Tagmentation_buffer_lot
- integer, lot number of tagmentation buffer 
- Tagmentation_reaction_vol
- integer, volume of the tagmentation reaction (uL)
- Tagmentation_purification_kit
- character, purification kit 
- Tagmentation_purified_DNA_vol
- double, volume of purified DNA (uL) 
- PCR_date
- character, PCR date, MM/DD/YYYY format 
- PCR_cycle_nr
- integer, number of PCR cycles 
- PCR_purification_beads_ul
- character, volume of SPRIselect beads for lower size selection 
- Lib_DNA_conc
- double, DNA concentration for the library (ng/uL) 
- Lib_DNA_molarity
- double, DNA molarity of library (nM) 
- Lib_frag_size
- integer, average library fragment size 
- Lib_BA_quality
- integer, visual inspection of the library quality with the Bioanalyzer track (1=good, 0=bad) 
- Seq_DNA_molarity
- double, DNA molarity for sequencing (nM) 
- Seq_platform
- character, sequencing platform 
- Seq_date
- integer, sequencing date, YYMMDD format 
- Seq_machine_ID
- character, serial number of the sequencer 
- Seq_flowcell_ID
- character, flow cell ID 
- Seq_flowcell_run
- integer, flow cell run 
- Seq_flowcell_lane
- character, flow cell lane 
- Seq_flowcell_type
- character, flow cell type, e.g., S4 
- Seq_length
- double, read length 
- Seq_end_type
- integer, 1=single-end, 2=paired-end 
- total_primary_alignments
- integer, number of primary alignments 
- pct_chrX
- double, percent of reads mapped to chromosome X
- pct_chrY
- double, percent of reads mapped to chromosome Y
- pct_chrM
- double, percent of reads mapped to the mitochondrial chromosome (chrM)
- pct_auto
- double, percent of reads mapped to autosomes
- pct_contig
- double, percent of reads mapped to contigs
- Seq_batch
- character, unique identifier for sequencing batch 
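The library-complexity and fragment-length cutoffs quoted in the column descriptions can be checked with a few lines of code. The Python sketch below assumes ENCODE-style definitions (NRF = distinct fragments / total fragments; PBC1 = positions with exactly one read / distinct positions; PBC2 = positions with one read / positions with two reads), treats distinct_fragments as the number of distinct positions, and uses a strict "greater than" comparison. The function names library_complexity and qc_flags are hypothetical, and the result is not guaranteed to match the pipeline's own *_qc_pass columns.

```python
from typing import Optional

# Cutoffs quoted in the column descriptions above.
NRF_MIN = 0.9
PBC1_MIN = 0.9
PBC2_MIN = 3.0
FRAC_READS_IN_NFR_MIN = 0.4
NFR_OVER_MONO_NUC_MIN = 2.5


def library_complexity(total_fragments: int,
                       distinct_fragments: int,
                       positions_with_one_read: int,
                       positions_with_two_reads: Optional[int] = None) -> dict:
    """Recompute NRF, PBC1 and (if possible) PBC2 from fragment counts.

    Assumed ENCODE-style definitions:
      NRF  = distinct / total
      PBC1 = positions with exactly one read / distinct positions
      PBC2 = positions with one read / positions with two reads
    """
    nrf = distinct_fragments / total_fragments
    pbc1 = positions_with_one_read / distinct_fragments
    # The count of positions with exactly two reads is not a column in this
    # data frame, so PBC2 is only computed when that count is supplied.
    pbc2 = (positions_with_one_read / positions_with_two_reads
            if positions_with_two_reads else None)
    return {"NRF": nrf, "PBC1": pbc1, "PBC2": pbc2}


def qc_flags(row: dict) -> dict:
    """Compare one sample's metrics (keyed by column name) to the cutoffs."""
    return {
        "NRF": row["lib_complexity.lib_complexity.NRF"] > NRF_MIN,
        "PBC1": row["lib_complexity.lib_complexity.PBC1"] > PBC1_MIN,
        "PBC2": row["lib_complexity.lib_complexity.PBC2"] > PBC2_MIN,
        "frac_reads_in_nfr":
            row["align.frag_len_stat.frac_reads_in_nfr"] > FRAC_READS_IN_NFR_MIN,
        "nfr_over_mono_nuc_reads":
            row["align.frag_len_stat.nfr_over_mono_nuc_reads"] > NFR_OVER_MONO_NUC_MIN,
    }


# Hypothetical counts, for illustration only.
print(library_complexity(1_000_000, 950_000, 900_000))
```

Because the number of positions with exactly two reads is not included in this data frame, PBC2 can be read from lib_complexity.lib_complexity.PBC2 but not recomputed from the other columns.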
Details
The ENCODE ATAC-seq pipeline v1.7.0 was used to quantify ATAC-seq data.
Columns with a period in their names are QC metrics from this pipeline. Note that the ENCODE pipeline counts each end of a paired-end read separately,
so align.samstat.total_reads is the number of aligned reads (mates), which corresponds to twice the number of sequenced fragments.
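In practice, this means an align.samstat.* read count has to be halved before it is compared with a fragment-level count such as lib_complexity.lib_complexity.total_fragments (assuming that column is counted per fragment, as its name suggests), whereas a ratio of two columns that are both counted per read needs no conversion. The Python helpers below (hypothetical names reads_to_fragments and retained_fraction) sketch both cases; because total_reads also includes multimapper alignments, the halving should be read as an approximation.

```python
def reads_to_fragments(n_reads: int) -> float:
    """Convert a per-read count from an align.samstat.* column to fragments.

    Assumes each fragment contributes two reads (read 1 and read 2); if the
    count includes multimapper alignments, the result is approximate.
    """
    return n_reads / 2


def retained_fraction(row: dict) -> float:
    """Fraction of aligned reads left after all filters.

    Both columns count reads the same way, so no halving is needed here.
    """
    return (row["align.nodup_samstat.total_reads"]
            / row["align.samstat.total_reads"])


# Hypothetical values, for illustration only.
sample = {"align.samstat.total_reads": 40_000_000,
          "align.nodup_samstat.total_reads": 30_000_000}
print(reads_to_fragments(sample["align.samstat.total_reads"]))
print(retained_fraction(sample))
```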