ATAC-seq experimental and quantification QC metrics for chromatin accessibility (ATAC) data
Format
A data frame with 520 rows and 106 variables:
viallabelcharacter, sample identifier
general.descriptioncharacter, pipeline workflow description
replicatecharacter, replicate in the pipeline workflow
general.datedouble, date the pipeline workflow was run
general.titlecharacter, pipeline workflow title
general.pipeline_vercharacter, ENCODE ATAC-seq pipeline version
general.pipeline_typecharacter, "atac"
general.genomecharacter, reference genome
general.alignercharacter, read aligner
general.peak_callercharacter, peak caller
general.seq_endedness.paired_endlogical, are the reads paired-end?
replication.num_peaks.num_peaksinteger, number of replication peaks (max 300000)
peak_stat.peak_region_size.min_sizeinteger, minimum peak width
peak_stat.peak_region_size.25_pctinteger, 25th percentile of peak width
peak_stat.peak_region_size.50_pctinteger, 50th percentile of peak width
peak_stat.peak_region_size.75_pctinteger, 75th percentile of peak width
peak_stat.peak_region_size.max_sizeinteger, max peak width
peak_stat.peak_region_size.meandouble, mean peak width
peak_enrich.frac_reads_in_peaks.macs2.fripdouble, fraction of reads in MACS2 peaks
align.samstat.total_readsinteger, total number of alignments, including multimappers
align.samstat.mapped_readsinteger, total number of mapped reads
align.samstat.pct_mapped_readsdouble, percent of reads that mapped
align.samstat.paired_readsinteger, number of paired reads
align.samstat.read1integer, number of read 1 reads
align.samstat.read2integer, number of read 2 reads
align.samstat.properly_paired_readsinteger, number of properly paired reads
align.samstat.pct_properly_paired_readsdouble, percent of reads that were properly paired
align.samstat.with_itselfinteger, number of reads mapped together with their mate
align.samstat.singletonsinteger, number of singleton reads
align.samstat.pct_singletonsdouble, percent of reads that were singleton
align.samstat.diff_chromsinteger, number of reads whose mate mapped to a different chromosome
align.dup.paired_readsinteger, number of paired reads for duplication step
align.dup.paired_duplicate_readsinteger, number of duplicate paired reads
align.dup.paired_optical_duplicate_readsinteger, number of optical duplicate paired reads
align.dup.pct_duplicate_readsdouble, percent of reads that are duplicate
align.frac_mito.non_mito_readsinteger, number of reads that align to non-mitochondrial DNA
align.frac_mito.mito_readsinteger, number of reads that align to mitochondrial DNA
align.frac_mito.frac_mito_readsdouble, fraction of reads that align to mitochondrial DNA
align.nodup_samstat.total_readsinteger, number of reads after applying all filters
align.nodup_samstat.mapped_readsinteger, number of mapped reads after applying all filters
align.nodup_samstat.paired_readsinteger, number of paired reads after applying all filters
align.nodup_samstat.read1integer, number of read 1 reads after applying all filters
align.nodup_samstat.read2integer, number of read 2 reads after applying all filters
align.nodup_samstat.properly_paired_readsinteger, number of properly paired reads after applying all filters
align.nodup_samstat.with_itselfinteger, number of reads mapped together with their mate after applying all filters
align.frag_len_stat.frac_reads_in_nfrdouble, fraction of reads in the nucleosome-free region. Should be a value greater than 0.4.
align.frag_len_stat.frac_reads_in_nfr_qc_passlogical, does align.frag_len_stat.frac_reads_in_nfr pass the cutoff?
align.frag_len_stat.frac_reads_in_nfr_qc_reasoncharacter, reason for align.frag_len_stat.frac_reads_in_nfr_qc_pass
align.frag_len_stat.nfr_over_mono_nuc_readsdouble, reads in the nucleosome-free region versus reads in the mono-nucleosomal peak. Should be a value greater than 2.5.
align.frag_len_stat.nfr_over_mono_nuc_reads_qc_passlogical, does align.frag_len_stat.nfr_over_mono_nuc_reads pass the cutoff?
align.frag_len_stat.nfr_over_mono_nuc_reads_qc_reasoncharacter, reason for align.frag_len_stat.nfr_over_mono_nuc_reads_qc_pass
align.frag_len_stat.nfr_peak_existslogical, does a nucleosome-free peak exist?
align.frag_len_stat.mono_nuc_peak_existslogical, does a mono-nucleosomal peak exist?
align.frag_len_stat.di_nuc_peak_existslogical, does a di-nucleosomal peak exist?
lib_complexity.lib_complexity.total_fragmentsinteger, total number of fragments
lib_complexity.lib_complexity.distinct_fragmentsinteger, number of distinct fragments
lib_complexity.lib_complexity.positions_with_one_readinteger, number of positions with one read
lib_complexity.lib_complexity.NRFdouble, non-redundant fraction. Measure of library complexity. Ideally >0.9
lib_complexity.lib_complexity.PBC1double, PCR bottlenecking coefficient 1. Measure of library complexity. Ideally >0.9
lib_complexity.lib_complexity.PBC2double, PCR bottlenecking coefficient 2. Measure of library complexity. Ideally >3
align_enrich.tss_enrich.tss_enrichdouble, transcription start site enrichment of peaks
2D_barcodedouble, sample barcode
Tissuecharacter, tissue description
Speciescharacter, species
Sample_categorycharacter, study sample ("study") or reference standard ("ref")
GET_sitecharacter, which Genomics, Epigenomics, and Transcriptomics (GET) site performed the assay, "Stanford" or "MSSM" (Icahn School of Medicine at Mount Sinai)
Sample_batchinteger, numeric batch number for batch in which this sample was manually processed
Lib_adapter_1character, adapter sequence for read 1
Lib_adapter_2character, adapter sequence for read 2
Lib_index_1character, i7 index
Lib_index_2character, i5 index
Nuclei_extr_datecharacter, nuclei extraction date
Nuclei_extr_countinteger, nuclei count
Nuclei_tagmentationinteger, number of nuclei used in each tagmentation reaction
Tagmentation_datecharacter, tagmentation date, MM/DD/YYYY format
Tagmentation_enzyme_catinteger, catalog number of tagmentation enzyme TDE1 (Tn5)
Tagmentation_enzyme_lotinteger, lot number of tagmentation enzyme TDE1 (Tn5)
Tagmentation_buffer_catinteger, catalog number of tagmentation buffer
Tagmentation_buffer_lotinteger, lot number of tagmentation buffer
Tagmentation_reaction_volinteger, volume of tagmentation (uL)
Tagmentation_purification_kitcharacter, purification kit
Tagmentation_purified_DNA_voldouble, volume of purified DNA (uL)
PCR_datecharacter, PCR date, MM/DD/YYYY format
PCR_cycle_nrinteger, number of PCR cycles
PCR_purification_beads_ulcharacter, volume of SPRIselect beads for lower size selection
Lib_DNA_concdouble, DNA concentration for the library (ng/uL)
Lib_DNA_molaritydouble, DNA molarity of library (nM)
Lib_frag_sizeinteger, average library fragment size
Lib_BA_qualityinteger, visual inspection of the library quality with the Bioanalyzer track (1=good, 0=bad)
Seq_DNA_molaritydouble, DNA molarity for sequencing (nM)
Seq_platformcharacter, sequencing platform
Seq_dateinteger, sequencing date, YYMMDD format
Seq_machine_IDcharacter, serial number of the sequencer
Seq_flowcell_IDcharacter, flow cell ID
Seq_flowcell_runinteger, flow cell run
Seq_flowcell_lanecharacter, flow cell lane
Seq_flowcell_typecharacter, flow cell type, e.g., S4
Seq_lengthdouble, read length
Seq_end_typeinteger, 1=single-end, 2=paired-end
total_primary_alignmentsinteger, number of primary alignments
pct_chrXdouble, percent of reads mapped to chromosome X
pct_chrYdouble, percent of reads mapped to chromosome Y
pct_chrMdouble, percent of reads mapped to chromosome M
pct_autodouble, percent of reads mapped to autosomal chromosomes
pct_contigdouble, percent of reads mapped to contigs
Seq_batchcharacter, unique identifier for sequencing batch
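The cutoffs quoted in the column descriptions above (0.4, 2.5, 0.9, 0.9 and 3) can be applied to a single sample as in the following minimal sketch. It is written in Python and assumes one row of the data frame has been exported as a plain dictionary keyed by column name. The function name failed_metrics and the example values are hypothetical and are not part of the dataset or of the ENCODE pipeline.

```python
# Cutoffs as stated in the column descriptions; a metric passes when it
# is strictly greater than its cutoff.
CUTOFFS = {
    "align.frag_len_stat.frac_reads_in_nfr": 0.4,
    "align.frag_len_stat.nfr_over_mono_nuc_reads": 2.5,
    "lib_complexity.lib_complexity.NRF": 0.9,
    "lib_complexity.lib_complexity.PBC1": 0.9,
    "lib_complexity.lib_complexity.PBC2": 3.0,
}


def failed_metrics(row):
    """Return the names of metrics in `row` that do not exceed their cutoff."""
    failed = []
    for name, cutoff in CUTOFFS.items():
        value = row.get(name)
        # A missing value is reported as a failure rather than silently passed.
        if value is None or not value > cutoff:
            failed.append(name)
    return failed


# Hypothetical values for one sample (not taken from the dataset).
example_row = {
    "align.frag_len_stat.frac_reads_in_nfr": 0.45,
    "align.frag_len_stat.nfr_over_mono_nuc_reads": 2.1,
    "lib_complexity.lib_complexity.NRF": 0.95,
    "lib_complexity.lib_complexity.PBC1": 0.96,
    "lib_complexity.lib_complexity.PBC2": 12.0,
}

print(failed_metrics(example_row))
```

For the two fragment-length metrics, the pipeline's own verdicts are already recorded in the corresponding _qc_pass columns, so a check like this is mainly useful for the library complexity metrics or for trying alternative cutoffs.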
Details
The ENCODE ATAC-seq pipeline v1.7.0 was used to quantify ATAC-seq data.
Columns with a period in their names are QC metrics from this pipeline. Note that the ENCODE pipeline counts the two reads of a pair separately,
so align.samstat.total_reads counts individual reads from paired-end sequencing, which corresponds to twice the number of sequenced fragments.
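As a rough illustration of the note above, the number of sequenced fragments for a paired-end sample can be approximated by halving align.samstat.total_reads. The sketch below is in Python; the function name estimated_fragments and the example count are hypothetical. Because align.samstat.total_reads also includes multimapped alignments, the result is an approximation rather than an exact fragment count.

```python
def estimated_fragments(total_reads):
    """Approximate the number of sequenced fragments from a per-read count.

    Assumes paired-end data, where each fragment contributes two reads.
    """
    return total_reads // 2


# Hypothetical read count for one sample (not taken from the dataset).
print(estimated_fragments(80_000_000))
```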