ATAC-seq experimental and quantification QC metrics for chromatin accessibility (ATAC) data
Format
A data frame with 520 rows and 106 variables:
viallabel
character, sample identifier
general.description
character, pipeline workflow description
replicate
character, replicate in the pipeline workflow
general.date
double, date the pipeline workflow was run
general.title
character, pipeline workflow title
general.pipeline_ver
character, ENCODE ATAC-seq pipeline version
general.pipeline_type
character, "atac"
general.genome
character, reference genome
general.aligner
character, read aligner
general.peak_caller
character, peak caller
general.seq_endedness.paired_end
logical, are the reads paired-end?
replication.num_peaks.num_peaks
integer, number of replication peaks (max 300000)
peak_stat.peak_region_size.min_size
integer, minimum peak width
peak_stat.peak_region_size.25_pct
integer, 25th percentile of peak width
peak_stat.peak_region_size.50_pct
integer, 50th percentile of peak width
peak_stat.peak_region_size.75_pct
integer, 75th percentile of peak width
peak_stat.peak_region_size.max_size
integer, max peak width
peak_stat.peak_region_size.mean
double, mean peak width
peak_enrich.frac_reads_in_peaks.macs2.frip
double, fraction of reads in MACS2 peaks
align.samstat.total_reads
integer, total number of alignments, including multimappers
align.samstat.mapped_reads
integer, total number of mapped reads
align.samstat.pct_mapped_reads
double, percent of reads that mapped
align.samstat.paired_reads
integer, number of paired reads
align.samstat.read1
integer, number of read 1 reads
align.samstat.read2
integer, number of read 2 reads
align.samstat.properly_paired_reads
integer, number of properly paired reads
align.samstat.pct_properly_paired_reads
double, percent of reads that were properly paired
align.samstat.with_itself
integer, number of reads with both the read and its mate mapped
align.samstat.singletons
integer, number of singleton reads
align.samstat.pct_singletons
double, percent of reads that were singletons
align.samstat.diff_chroms
integer, number of reads whose mate mapped to a different chromosome
align.dup.paired_reads
integer, number of paired reads for duplication step
align.dup.paired_duplicate_reads
integer, number of duplicate paired reads
align.dup.paired_optical_duplicate_reads
integer, number of optical duplicate paired reads
align.dup.pct_duplicate_reads
double, percent of reads that are duplicate
align.frac_mito.non_mito_reads
integer, number of reads that align to non-mitochondrial DNA
align.frac_mito.mito_reads
integer, number of reads that align to mitochondrial DNA
align.frac_mito.frac_mito_reads
double, fraction of reads that align to mitochondrial DNA
align.nodup_samstat.total_reads
integer, number of reads after applying all filters
align.nodup_samstat.mapped_reads
integer, number of mapped reads after applying all filters
align.nodup_samstat.paired_reads
integer, number of paired reads after applying all filters
align.nodup_samstat.read1
integer, number of read 1 reads after applying all filters
align.nodup_samstat.read2
integer, number of read 2 reads after applying all filters
align.nodup_samstat.properly_paired_reads
integer, number of properly paired reads after applying all filters
align.nodup_samstat.with_itself
integer, number of reads with both the read and its mate mapped after applying all filters
align.frag_len_stat.frac_reads_in_nfr
double, fraction of reads in the nucleosome-free region (NFR). Should be greater than 0.4.
align.frag_len_stat.frac_reads_in_nfr_qc_pass
logical, does align.frag_len_stat.frac_reads_in_nfr pass the cutoff?
align.frag_len_stat.frac_reads_in_nfr_qc_reason
character, reason for align.frag_len_stat.frac_reads_in_nfr_qc_pass
align.frag_len_stat.nfr_over_mono_nuc_reads
double, ratio of reads in the nucleosome-free region to reads in the mononucleosomal peak. Should be greater than 2.5.
align.frag_len_stat.nfr_over_mono_nuc_reads_qc_pass
logical, does align.frag_len_stat.nfr_over_mono_nuc_reads pass the cutoff?
align.frag_len_stat.nfr_over_mono_nuc_reads_qc_reason
character, reason for align.frag_len_stat.nfr_over_mono_nuc_reads_qc_pass
align.frag_len_stat.nfr_peak_exists
logical, does a nucleosome-free peak exist?
align.frag_len_stat.mono_nuc_peak_exists
logical, does a mono-nucleosomal peak exist?
align.frag_len_stat.di_nuc_peak_exists
logical, does a di-nucleosomal peak exist?
lib_complexity.lib_complexity.total_fragments
integer, total number of fragments
lib_complexity.lib_complexity.distinct_fragments
integer, number of distinct fragments
lib_complexity.lib_complexity.positions_with_one_read
integer, number of positions with one read
lib_complexity.lib_complexity.NRF
double, non-redundant fraction. Measure of library complexity. Ideally >0.9
lib_complexity.lib_complexity.PBC1
double, PCR bottlenecking coefficient 1. Measure of library complexity. Ideally >0.9
lib_complexity.lib_complexity.PBC2
double, PCR bottlenecking coefficient 2. Measure of library complexity. Ideally >3
align_enrich.tss_enrich.tss_enrich
double, transcription start site (TSS) enrichment score
2D_barcode
double, sample barcode
Tissue
character, tissue description
Species
character, species
Sample_category
character, study sample ("study") or reference standard ("ref")
GET_site
character, which Genomics, Epigenomics, and Transcriptomics (GET) site performed the assay, "Stanford" or "MSSM" (Icahn School of Medicine at Mount Sinai)
Sample_batch
integer, number of the batch in which this sample was manually processed
Lib_adapter_1
character, adapter sequence for read 1
Lib_adapter_2
character, adapter sequence for read 2
Lib_index_1
character, i7 index
Lib_index_2
character, i5 index
Nuclei_extr_date
character, nuclei extraction date
Nuclei_extr_count
integer, nuclei count
Nuclei_tagmentation
integer, number of nuclei used in each tagmentation reaction
Tagmentation_date
character, tagmentation date, MM/DD/YYYY format
Tagmentation_enzyme_cat
integer, catalog number of tagmentation enzyme TDE1 (Tn5)
Tagmentation_enzyme_lot
integer, lot number of tagmentation enzyme TDE1 (Tn5)
Tagmentation_buffer_cat
integer, catalog number of tagmentation buffer
Tagmentation_buffer_lot
integer, lot number of tagmentation buffer
Tagmentation_reaction_vol
integer, tagmentation reaction volume (uL)
Tagmentation_purification_kit
character, purification kit
Tagmentation_purified_DNA_vol
double, volume of purified DNA (uL)
PCR_date
character, PCR date, MM/DD/YYYY format
PCR_cycle_nr
integer, number of PCR cycles
PCR_purification_beads_ul
character, volume of SPRIselect beads for lower size selection
Lib_DNA_conc
double, DNA concentration for the library (ng/uL)
Lib_DNA_molarity
double, DNA molarity of library (nM)
Lib_frag_size
integer, average library fragment size
Lib_BA_quality
integer, visual inspection of the library quality with the Bioanalyzer track (1=good, 0=bad)
Seq_DNA_molarity
double, DNA molarity for sequencing (nM)
Seq_platform
character, sequencing platform
Seq_date
integer, sequencing date, YYMMDD format
Seq_machine_ID
character, serial number of the sequencer
Seq_flowcell_ID
character, flow cell ID
Seq_flowcell_run
integer, flow cell run
Seq_flowcell_lane
character, flow cell lane
Seq_flowcell_type
character, flow cell type, e.g., S4
Seq_length
double, read length
Seq_end_type
integer, 1=single-end, 2=paired-end
total_primary_alignments
integer, number of primary alignments
pct_chrX
double, percent of reads mapped to chromosome X
pct_chrY
double, percent of reads mapped to chromosome Y
pct_chrM
double, percent of reads mapped to the mitochondrial chromosome (chrM)
pct_auto
double, percent of reads mapped to autosomes
pct_contig
double, percent of reads mapped to contigs
Seq_batch
character, unique identifier for sequencing batch
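The cutoffs quoted in the column descriptions above can be checked programmatically. The following is a minimal Python sketch, assuming each sample is available as a dict keyed by the column names above (for example, one row of a CSV export of this data frame); the helpers `derived_metrics` and `failed_cutoffs` are hypothetical, not part of the ENCODE pipeline. The recomputed ratios assume the usual ENCODE-style definitions (NRF = distinct fragments / total fragments; PBC1 = positions with one read / distinct fragments) and assume the mitochondrial fraction is mito / (mito + non-mito). PBC2 cannot be recomputed here because the number of positions with exactly two reads is not a column, so only its reported value is checked.

```python
from typing import Dict, List

# Cutoffs quoted in the column descriptions ("should be"/"ideally" values).
# Assumption: a metric passes only if it is strictly greater than its cutoff.
CUTOFFS: Dict[str, float] = {
    "align.frag_len_stat.frac_reads_in_nfr": 0.4,
    "align.frag_len_stat.nfr_over_mono_nuc_reads": 2.5,
    "lib_complexity.lib_complexity.NRF": 0.9,
    "lib_complexity.lib_complexity.PBC1": 0.9,
    "lib_complexity.lib_complexity.PBC2": 3.0,
}


def derived_metrics(row: Dict[str, float]) -> Dict[str, float]:
    """Recompute NRF, PBC1 and the mitochondrial fraction from raw counts.

    Uses the assumed ENCODE-style definitions described in the lead-in.
    """
    total = row["lib_complexity.lib_complexity.total_fragments"]
    distinct = row["lib_complexity.lib_complexity.distinct_fragments"]
    one_read = row["lib_complexity.lib_complexity.positions_with_one_read"]
    mito = row["align.frac_mito.mito_reads"]
    non_mito = row["align.frac_mito.non_mito_reads"]
    return {
        "NRF": distinct / total,
        "PBC1": one_read / distinct,
        "frac_mito": mito / (mito + non_mito),
    }


def failed_cutoffs(row: Dict[str, float]) -> List[str]:
    """Return the reported metrics that are not above their cutoff.

    A missing value stored as NaN compares False, so it is reported as a failure.
    """
    return [name for name, cutoff in CUTOFFS.items() if not row[name] > cutoff]


if __name__ == "__main__":
    # Hypothetical values for illustration only; not taken from the dataset.
    example = {
        "lib_complexity.lib_complexity.total_fragments": 1000,
        "lib_complexity.lib_complexity.distinct_fragments": 950,
        "lib_complexity.lib_complexity.positions_with_one_read": 900,
        "align.frac_mito.mito_reads": 50,
        "align.frac_mito.non_mito_reads": 950,
        "align.frag_len_stat.frac_reads_in_nfr": 0.35,
        "align.frag_len_stat.nfr_over_mono_nuc_reads": 3.0,
        "lib_complexity.lib_complexity.NRF": 0.95,
        "lib_complexity.lib_complexity.PBC1": 0.947,
        "lib_complexity.lib_complexity.PBC2": 2.0,
    }
    print(derived_metrics(example))
    print(failed_cutoffs(example))
```

Comparing the recomputed NRF and PBC1 with the reported lib_complexity columns is a quick consistency check; if they disagree, the definitions assumed here probably differ from the pipeline's.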
Details
The ENCODE ATAC-seq pipeline v1.7.0 was used to quantify ATAC-seq data.
Columns whose names contain a period are QC metrics from this pipeline. Note that the ENCODE pipeline counts each read of a pair separately, so align.samstat.total_reads reports the number of paired-end reads that align, which corresponds to twice the number of sequenced fragments.
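Because of this convention, a paired-end read count can be halved to approximate the number of fragments. The following is a minimal Python sketch, assuming a sample is a dict keyed by the column names above; `estimated_fragments` is a hypothetical helper. The result is only approximate because align.samstat.total_reads includes multimapping alignments. Applying the same halving to the align.nodup_samstat.* counts is an assumption that the Details text does not state.

```python
from typing import Mapping


def estimated_fragments(row: Mapping[str, object],
                        column: str = "align.samstat.total_reads") -> float:
    """Approximate the number of sequenced fragments behind a read count.

    Paired-end data: each fragment contributes two reads (one per mate),
    so the count is halved. Single-end data: each fragment yields one read,
    so the count is returned unchanged.
    """
    count = float(row[column])
    if row.get("general.seq_endedness.paired_end", False):
        return count / 2
    return count
```

For example, `estimated_fragments(sample)` uses align.samstat.total_reads, and `estimated_fragments(sample, "align.nodup_samstat.total_reads")` gives the corresponding estimate after filtering, under the assumption noted above.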