Cost effective, experimentally robust differential-expression analysis for human/mammalian, pathogen and dual-species transcriptomics
Abstract: As sequencing read length has increased, researchers have quickly adopted longer reads for their experiments. Here, we examine 14 pathogen or host-pathogen differential gene expression data sets to assess whether using longer reads is warranted. A variety of data sets was used to assess what genomic attributes might affect the outcome of differential gene expression analysis including: gene density, operons, gene length, number of introns/exons and intron length. No genome attribute was found to influence the data in principal components analysis, hierarchical clustering with bootstrap support, or regression analyses of pairwise comparisons that were undertaken on the same reads, looking at all combinations of paired and unpaired reads trimmed to 36, 54, 72 and 101 bp. Read pairing had the greatest effect when there was little variation in the samples from different conditions or in their replicates (e.g. little differential gene expression). But overall, 54 and 72 bp reads were typically most similar. Given differences in costs and mapping percentages, we recommend 54 bp reads for organisms with no or few introns and 72 bp reads for all others. In a third of the data sets, read pairing had absolutely no effect, despite paired reads having twice as much data. Therefore, single-end reads seem robust for differential-expression analyses, but in eukaryotes paired-end reads are likely desired to analyse splice variants and should be preferred for data sets that are acquired with the intent to be community resources that might be used in secondary data analyses. Copyright 2020 The Authors.
Sponsors: National Institutes of Health, NIH: R01CA206188, R01AI124566, R01DE022600; National Institute of Allergy and Infectious Diseases, NIAID: U19AI110820
Identifier to cite or link to this item: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85079346779&doi=10.1099%2fmgen.0.000320&partnerID=40&md5=0e4a550f05032acf350f5357f25783e3; http://hdl.handle.net/10713/12055
- The impact of read length on quantification of differentially expressed genes and splice junction detection.
- Authors: Chhangawala S, Rudy G, Mason CE, Rosenfeld JA
- Issue date: 2015 Jun 23
- Differentially expressed genes from RNA-Seq and functional enrichment results are affected by the choice of single-end versus paired-end reads and stranded versus non-stranded protocols.
- Authors: Corley SM, MacKenzie KL, Beverdam A, Roddam LF, Wilkins MR
- Issue date: 2017 May 23
- Short paired-end reads trump long single-end reads for expression analysis.
- Authors: Freedman AH, Gaspar JM, Sackton TB
- Issue date: 2020 Apr 19
- Mapping accuracy of short reads from massively parallel sequencing and the implications for quantitative expression profiling.
- Authors: Palmieri N, Schlötterer C
- Issue date: 2009 Jul 28
- Assessment of the impact of using a reference transcriptome in mapping short RNA-Seq reads.
- Authors: Zhao S
- Issue date: 2014