Strand Settings | Griffith Lab

RNA-seq Bioinformatics

Introduction to bioinformatics for RNA sequence analysis

Strand Settings

There are various strand-related settings for RNA-seq tools that must be adjusted to account for library construction strategy. The following table provides read orientation codes and software settings for commonly used RNA-seq analysis tools including: IGV, TopHat, HISAT2, HTSeq, Picard, Kallisto, StringTie, and others. Each of these explanations/settings is provided for several commonly used RNA-seq library construction kits that produce either stranded or unstranded data.

NOTE: A useful tool to infer strandedness of your raw sequence data is the check_strandedness tool. We provide a tutorial for using this tool here.

NOTE: In the table below, the list of methods/kits for specific strand settings assumes that these kits are used as specified by their manufacturer. It is very possible that a sequencing provider/core may make modifications to these kits. For example, in one case we obtained RNA-seq data processed with the NEBNext Ultra II Directional kit (dUTP method). However, instead of the NEB hairpin adapters, IDT xGen UDI-UMI adapters were substituted, which flipped the insert strandedness (from RF/fr-firststrand to FR/fr-secondstrand). Because this level of detail is not always provided, it is highly recommended to confirm your data’s strandedness empirically.

| Tool | RF/fr-firststrand stranded (dUTP) | FR/fr-secondstrand stranded (Ligation) | Unstranded |
|---|---|---|---|
| check_strandedness (output) | RF/fr-firststrand | FR/fr-secondstrand | unstranded |
| IGV (5p to 3p read orientation code) | F2R1 | F1R2 | F2R1 or F1R2 |
| TopHat (--library-type parameter) | fr-firststrand | fr-secondstrand | fr-unstranded |
| HISAT2 (--rna-strandness parameter) | R/RF | F/FR | NONE |
| HTSeq (--stranded/-s parameter) | reverse | yes | no |
| STAR | n/a (STAR doesn’t use library strandedness info for mapping) | NONE | NONE |
| Picard CollectRnaSeqMetrics (STRAND_SPECIFICITY parameter) | SECOND_READ_TRANSCRIPTION_STRAND | FIRST_READ_TRANSCRIPTION_STRAND | NONE |
| Kallisto quant (parameter) | --rf-stranded | --fr-stranded | NONE |
| StringTie (parameter) | --rf | --fr | NONE |
| FeatureCounts (-s parameter) | 2 | 1 | 0 |
| RSEM (--forward-prob parameter) | 0 | 1 | 0.5 |
| Salmon (--libType parameter) | ISR (assuming paired-end with inward read orientation) | ISF (assuming paired-end with inward read orientation) | IU (assuming paired-end with inward read orientation) |
| Trinity (--SS_lib_type parameter) | RF | FR | NONE |
| MGI CWL YAML (strand parameter) | first | second | NONE |
| WASHU WDL YAML (strand parameter) | first | second | unstranded |
| RegTools (strand parameter) | -s RF | -s FR | -s XS |
| Example kits | Example methods/kits: dUTP, NSR, NNSR, Illumina TruSeq Strand Specific Total RNA, NEBNext Ultra II Directional | Example methods/kits: Ligation, Standard SOLiD, NuGEN Encore, 10X 5’ scRNA data | Example kits/data: Standard Illumina, NuGEN OvationV2, SMARTer universal low input RNA kit (TaKara), GDC normalized TCGA data |
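
When several tools run in one pipeline, it can help to keep the strand settings in a single lookup so that they stay consistent with each other. The following is a minimal sketch that transcribes a subset of the table rows into a Python dictionary. The names STRAND_SETTINGS and lookup_setting are hypothetical and chosen only for illustration; the values are copied from the table as written, so an entry of NONE means whatever NONE means for that tool in the table.

```python
# Values transcribed from the strand settings table. The dictionary layout
# and the function name are illustrative assumptions, not any tool's API.
STRAND_SETTINGS = {
    "RF/fr-firststrand": {
        "TopHat": "fr-firststrand",
        "HISAT2": "R/RF",
        "HTSeq": "reverse",
        "Picard": "SECOND_READ_TRANSCRIPTION_STRAND",
        "Kallisto": "--rf-stranded",
        "StringTie": "--rf",
        "FeatureCounts": "2",
        "Salmon": "ISR",
    },
    "FR/fr-secondstrand": {
        "TopHat": "fr-secondstrand",
        "HISAT2": "F/FR",
        "HTSeq": "yes",
        "Picard": "FIRST_READ_TRANSCRIPTION_STRAND",
        "Kallisto": "--fr-stranded",
        "StringTie": "--fr",
        "FeatureCounts": "1",
        "Salmon": "ISF",
    },
    "unstranded": {
        "TopHat": "fr-unstranded",
        "HISAT2": "NONE",
        "HTSeq": "no",
        "Picard": "NONE",
        "Kallisto": "NONE",
        "StringTie": "NONE",
        "FeatureCounts": "0",
        "Salmon": "IU",
    },
}


def lookup_setting(library_type: str, tool: str) -> str:
    """Return the table entry for a library type (as reported by
    check_strandedness) and a tool name."""
    try:
        return STRAND_SETTINGS[library_type][tool]
    except KeyError:
        raise KeyError(f"No table entry for {library_type!r} / {tool!r}") from None


if __name__ == "__main__":
    # Print the settings that belong together for a dUTP-style library.
    for tool in ("HISAT2", "HTSeq", "FeatureCounts"):
        print(tool, lookup_setting("RF/fr-firststrand", tool))
```

The keys of the outer dictionary match the check_strandedness output strings in the first table row, so the result of that tool could be used directly as the lookup key.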

Notes

To identify which --library-type setting to use with TopHat, Illumina specifically documents the types in the ‘RNA Sequencing Analysis with TopHat’ Booklet. For the TruSeq RNA Sample Prep Kit, the appropriate library type is fr-unstranded. For TruSeq stranded sample prep kits, the library type is specified as fr-firststrand. These posts are also very informative: ‘How to tell which library type to use (fr-firststrand or fr-secondstrand)?’, ‘How to determine if a library Is strand-specific’, and ‘Strandness in RNASeq’ by Hong Zheng.

Another suggestion is to view aligned reads in IGV and determine the read orientation by one of two methods. First, you can have IGV color alignments according to strand using the ‘Color alignments’ by ‘First-of-pair strand’ setting. Second, to get more detailed information you can hover your cursor over a read aligned to an exon.

‘F2 R1’ means the second read in the pair aligns to the forward strand and the first read in the pair aligns to the reverse strand. For a positive DNA strand transcript (5’ to 3’) this would denote a fr-firststrand setting in TopHat, i.e. “the right-most end of the fragment (in transcript coordinates) is the first sequenced”. For a negative DNA strand transcript (3’ to 5’) this would denote a fr-secondstrand setting in TopHat. ‘F1 R2’ means the first read in the pair aligns to the forward strand and the second read in the pair aligns to the reverse strand. The assignments for ‘F1 R2’ are simply the inverse of those for ‘F2 R1’: a positive-strand transcript denotes fr-secondstrand and a negative-strand transcript denotes fr-firststrand.

Anything other than FR orientation is not covered here and discussion with the individual responsible for library creation would be required. Typically ‘RF’ orientation is reserved for large-insert mate-pair libraries. Other orientations like ‘FF’ and ‘RR’ seem impossible with Illumina sequencing technology and suggest structural variation between the sample and reference. Additional details are provided in the TopHat manual.
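
The IGV rule described above can be restated as a small function. This is only a sketch of the decision logic for one properly paired read pair from a stranded library; the function names read1_strand and infer_library_type are hypothetical. A single pair cannot reveal an unstranded library, so in practice many pairs over many genes would need to be tallied (or a tool such as check_strandedness used instead).

```python
def read1_strand(igv_code: str) -> str:
    """Convert an IGV pair orientation code to the strand of read 1.

    'F1R2': read 1 aligns to the forward strand -> '+'
    'F2R1': read 1 aligns to the reverse strand -> '-'
    """
    code = igv_code.replace(" ", "").upper()
    if code == "F1R2":
        return "+"
    if code == "F2R1":
        return "-"
    raise ValueError(f"Only F1R2 and F2R1 are handled here, got {igv_code!r}")


def infer_library_type(igv_code: str, transcript_strand: str) -> str:
    """Infer the TopHat library type from one read pair over an exon.

    transcript_strand is '+' for a positive-strand transcript and '-'
    for a negative-strand transcript. Assumes the library is stranded.
    """
    if transcript_strand not in ("+", "-"):
        raise ValueError("transcript_strand must be '+' or '-'")
    # Read 1 on the transcript's own strand means fr-secondstrand;
    # read 1 on the opposite strand means fr-firststrand.
    if read1_strand(igv_code) == transcript_strand:
        return "fr-secondstrand"
    return "fr-firststrand"


if __name__ == "__main__":
    for code in ("F2 R1", "F1 R2"):
        for strand in ("+", "-"):
            print(code, strand, infer_library_type(code, strand))
```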

For HTSeq, the htseq-count manual indicates that for the --stranded option, stranded=no means that a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.
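
The strand rule quoted from the htseq-count manual can be written out as a short predicate. This is a restatement of the rule in the paragraph above, not htseq-count's actual implementation, and the function name strand_compatible and its parameters are hypothetical.

```python
def strand_compatible(mode: str, feature_strand: str, read_strand: str,
                      is_second_in_pair: bool = False) -> bool:
    """Return True if a read's strand is compatible with a feature's
    strand under the given --stranded mode ('yes', 'no' or 'reverse').

    For single-end reads, leave is_second_in_pair as False.
    """
    for strand in (feature_strand, read_strand):
        if strand not in ("+", "-"):
            raise ValueError("strands must be '+' or '-'")
    if mode == "no":
        # Strand is ignored entirely.
        return True
    # The second read of a pair is expected on the opposite strand,
    # so flip it before comparing with the feature.
    if is_second_in_pair:
        read_strand = "-" if read_strand == "+" else "+"
    if mode == "yes":
        return read_strand == feature_strand
    if mode == "reverse":
        return read_strand != feature_strand
    raise ValueError(f"unknown mode {mode!r}")


if __name__ == "__main__":
    # First read on '+', second read on '-', feature on '+'.
    for mode in ("yes", "reverse", "no"):
        first = strand_compatible(mode, "+", "+")
        second = strand_compatible(mode, "+", "-", is_second_in_pair=True)
        print(mode, first, second)
```

With a dUTP-style (RF/fr-firststrand) library, the first read lies on the strand opposite the transcript, which is why the table lists reverse for HTSeq in that column.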

For the ‘CollectRnaSeqMetrics’ sub-command of Picard, the Picard manual indicates that one should use FIRST_READ_TRANSCRIPTION_STRAND if the reads are expected to be on the transcription strand.

Example data providers

Examples (from check_strandedness) that we have observed from different providers (note that these could be changed by the provider at any time, so you should always check your own data):

  • Boston Gene: RF/fr-firststrand
  • Personalis: RF/fr-firststrand
  • WASHU CLE Lab: RF/fr-firststrand
  • Caris: RF/fr-firststrand
  • Tempus: FR/fr-secondstrand
  • IGM @ Nationwide Children’s Hospital: FR/fr-secondstrand
Complete Result Sets | Griffith Lab

Complete Result Sets

Introduction

The following links provide examples of complete result sets for different iterations of this course. Each is meant to be the complete set of result files obtained by the instructor running through all the commands of the course. The files are made available in the same file/directory structure that you should get by following the instructions yourself.