RNA-seq metadata and QC — TRNSCRPT_META • MotrpacRatTraining6moData

RNA-seq experimental and quantification QC metrics for transcriptomic (TRNSCRPT) data

Usage

TRNSCRPT_META

Format

A data frame with 935 rows and 82 variables:

viallabel: character, sample identifier
vial_label: double, sample identifier
2D_barcode: double, sample barcode
Species: character, species
BID: integer, biospecimen ID
PID: double, participant ID, one per animal
Tissue: character, tissue description
Sample_category: character, study sample ("study") or reference standard ("ref")
GET_site: character, which Genomics, Epigenomics, and Transcriptomics (GET) site performed the assay, "Stanford" or "MSSM" (Icahn School of Medicine at Mount Sinai)
RNA_extr_plate_ID: character, RNA extraction plate ID
RNA_extr_date: character, RNA extraction date
RNA_extr_conc: double, RNA concentration (ng/uL)
RIN: double, RNA Integrity Number
r_260_280: double, 260/280 ratio
r_260_230: double, 260/230 ratio
Lib_prep_date: character, library preparation date in MM/DD/YYYY format
Lib_RNA_conc: double, RNA concentration used for library prep (ng/uL)
Lib_RNA_vol: integer, RNA volume used for library prep (uL)
Lib_robot: character, robot used for library prep
Lib_vendor: character, library prep vendor
Lib_type: character, library prep type
Lib_kit_id: character, library prep kit ID
Lib_batch_ID: character, library prep batch ID that distinguishes different sample processing batches
Lib_barcode_well: character, barcode well
Lib_index_1: character, i7 index
Lib_index_2: character, i5 index
Lib_adapter_1: character, TruSeq i7 adapter with 16 bp index
Lib_adapter_2: character, TruSeq i5 adapter with 8 bp index
Lib_UMI_cycle_num: integer, number of bases of UMI
Lib_adapter_size: integer, total size of the two adapters
Lib_frag_size: integer, average library fragment size (bp)
Lib_DNA_conc: double, DNA concentration of original stock of the library (ng/uL)
Lib_molarity: double, library molarity (nM)
Seq_platform: character, sequencing platform
Seq_date: integer, sequencing date, YYMMDD format
Seq_machine_ID: character, serial number of the sequencer
Seq_flowcell_ID: character, flow cell ID
Seq_flowcell_run: integer, flow cell run
Seq_flowcell_lane: character, flow cell lane
Seq_flowcell_type: character, flow cell type, e.g., S4
Seq_length: integer, read length
Seq_end_type: integer, 1=single-end, 2=paired-end
Phase: character, study phase, "PASS1B-06"
Seq_batch: character, unique identifier for sequencing batch
reads_raw: double, number of read pairs in the raw FASTQ
pct_adapter_detected: double, percent of reads with adapter detected
pct_trimmed: double, percent of reads that were trimmed
pct_trimmed_bases: double, percent of bases that were trimmed
reads: double, number of read pairs in the trimmed FASTQ files
pct_GC: double, percent GC content in trimmed FASTQ files
pct_dup_sequence: double, percent of duplicated sequences in trimmed FASTQ files
pct_rRNA: double, percent of rRNA reads in trimmed FASTQ files
pct_globin: double, percent of globin reads in trimmed FASTQ files
pct_phix: double, percent of PhiX reads in trimmed FASTQ files
pct_picard_dup: double, percent PCR duplication assessed by Picard's MarkDuplicates tool
pct_umi_dup: double, PCR duplication rate assessed using UMIs (Unique Molecular Identifiers)
avg_input_read_length: double, average input read length
uniquely_mapped: double, number of uniquely mapped reads
pct_uniquely_mapped: double, percent of uniquely mapped reads
avg_mapped_read_length: double, average mapped read length
num_splices: double, number of splices
num_annotated_splices: double, number of annotated splices
num_GTAG_splices: double, number of GT/AG and CT/AC splices
num_GCAG_splices: double, number of GC/AG and CT/GC splices
num_ATAC_splices: double, number of AT/AC and GT/AT splices
num_noncanonical_splices: double, number of non-canonical splices
pct_multimapped: double, percent of reads that multimapped
pct_multimapped_toomany: double, percent of reads that multimapped too many times
pct_unmapped_mismatches: double, percent of unmapped reads due to mismatches
pct_unmapped_tooshort: double, percent of unmapped reads due to shortness
pct_unmapped_other: double, percent of unmapped reads for other reasons
pct_chimeric: double, percent of chimeric reads
pct_chrX: double, percent of reads mapped to chromosome X
pct_chrY: double, percent of reads mapped to chromosome Y
pct_chrM: double, percent of reads mapped to the mitochondrial genome
pct_chrAuto: double, percent of reads mapped to autosomal chromosomes
pct_contig: double, percent of reads mapped to contigs
pct_coding: double, percent of bases mapped to coding regions
pct_utr: double, percent of bases mapped to untranslated regions (UTRs)
pct_intronic: double, percent of bases mapped to introns
pct_intergenic: double, percent of bases mapped to intergenic regions
pct_mrna: double, percent of bases mapped to mRNA
median_5_3_bias: double, median 5' to 3' bias
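
The two date columns use different encodings (Lib_prep_date is MM/DD/YYYY text, Seq_date is a YYMMDD integer), and reads_raw and reads together give the fraction of read pairs lost to trimming. The sketch below shows one way these columns might be handled in base R. It runs on a small toy data frame whose values are invented for illustration; with the package installed, TRNSCRPT_META could be substituted for it. The derived column names and the thresholds max_pct_rRNA and min_RIN are hypothetical and are not MoTrPAC QC criteria.

```r
# Toy stand-in for a few TRNSCRPT_META columns.
# All values below are invented for illustration only.
meta <- data.frame(
  viallabel     = c("sample_A", "sample_B", "sample_C"),
  Lib_prep_date = c("03/14/2019", "03/14/2019", "04/02/2019"),
  Seq_date      = c(190401L, 190401L, 190415L),
  reads_raw     = c(50e6, 42e6, 38e6),
  reads         = c(49e6, 41.5e6, 30e6),
  pct_rRNA      = c(1.2, 0.8, 25.0),
  RIN           = c(8.9, 9.1, 5.5),
  stringsAsFactors = FALSE
)

# Parse the MM/DD/YYYY text dates into Date objects.
meta$lib_prep_date_parsed <- as.Date(meta$Lib_prep_date, format = "%m/%d/%Y")

# Seq_date is an integer in YYMMDD form; zero-pad to six digits before parsing
# so that a year such as 2009 (stored as 90401) would not lose its leading zero.
meta$seq_date_parsed <- as.Date(sprintf("%06d", meta$Seq_date), format = "%y%m%d")

# Days elapsed between library prep and sequencing.
meta$days_prep_to_seq <- as.numeric(meta$seq_date_parsed - meta$lib_prep_date_parsed)

# Percent of raw read pairs that did not survive trimming.
meta$pct_reads_lost <- 100 * (meta$reads_raw - meta$reads) / meta$reads_raw

# Hypothetical thresholds, chosen only to demonstrate flagging.
max_pct_rRNA <- 20
min_RIN <- 6
meta$flagged <- meta$pct_rRNA > max_pct_rRNA | meta$RIN < min_RIN

print(meta[, c("viallabel", "seq_date_parsed", "days_prep_to_seq",
               "pct_reads_lost", "flagged")])
```

Keeping the parsed dates and derived metrics in new columns leaves the original fields untouched, so the table can still be matched against the source CSV.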

Source

pass1b-06/results/transcriptomics/qa-qc/motrpac_pass1b-06_transcript-rna-seq_qa-qc-metrics.csv